This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.86-rc1
Todd Kjos tkjos@android.com binder: fix race that allows malicious free of live buffer
YueHaibing yuehaibing@huawei.com misc: mic/scif: fix copy-paste error in scif_create_remote_lookup
Dexuan Cui decui@microsoft.com Drivers: hv: vmbus: check the creation_status in vmbus_establish_gpadl()
Yu Zhao yuzhao@google.com mm: use swp_offset as key in shmem_replace_page()
Luis Chamberlain mcgrof@kernel.org lib/test_kmod.c: fix rmmod double free
Martin Kelly martin@martingkelly.com iio:st_magn: Fix enable device after trigger
Felipe Balbi felipe.balbi@linux.intel.com Revert "usb: dwc3: gadget: skip Set/Clear Halt when invalid"
Michael Niewöhner linux@mniewoehner.de usb: core: quirks: add RESET_RESUME quirk for Cherry G230 Stream series
Kai-Heng Feng kai.heng.feng@canonical.com USB: usb-storage: Add new IDs to ums-realtek
Larry Finger Larry.Finger@lwfinger.net staging: rtl8723bs: Add missing return for cfg80211_rtw_get_station
Ben Wolsieffer benwolsieffer@gmail.com staging: vchiq_arm: fix compat VCHIQ_IOC_AWAIT_COMPLETION
Josef Bacik josef@toxicpanda.com btrfs: release metadata before running delayed refs
Richard Genoud richard.genoud@gmail.com dmaengine: at_hdmac: fix module unloading
Richard Genoud richard.genoud@gmail.com dmaengine: at_hdmac: fix memory leak in at_dma_xlate()
Heiko Stuebner heiko@sntech.de ARM: dts: rockchip: Remove @0 from the veyron memory node
Pan Bian bianpan2016@163.com ext2: fix potential use after free
Anisse Astier anisse@astier.eu ALSA: hda/realtek - fix headset mic detection for MSI MS-B171
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Support ALC300
Takashi Iwai tiwai@suse.de ALSA: sparc: Fix invalid snd_free_pages() at error path
Takashi Iwai tiwai@suse.de ALSA: control: Fix race between adding and removing a user element
Takashi Iwai tiwai@suse.de ALSA: ac97: Fix incorrect bit shift at AC97-SPSA control write
Takashi Iwai tiwai@suse.de ALSA: wss: Fix invalid snd_free_pages() at error path
Maximilian Heyne mheyne@amazon.de fs: fix lost error code in dio_complete
Jiri Olsa jolsa@kernel.org perf/x86/intel: Add generic branch tracing check to intel_pmu_has_bts()
Jiri Olsa jolsa@kernel.org perf/x86/intel: Move branch tracing setup to the Intel-specific source file
Sebastian Andrzej Siewior bigeasy@linutronix.de x86/fpu: Disable bottom halves while loading FPU registers
Borislav Petkov bp@suse.de x86/MCE/AMD: Fix the thresholding machinery initialization order
Christoph Muellner christoph.muellner@theobroma-systems.com arm64: dts: rockchip: Fix PCIe reset polarity for rk3399-puma-haikou.
Hou Zhiqiang Zhiqiang.Hou@nxp.com PCI: layerscape: Fix wrong invocation of outbound window disable accessor
Pan Bian bianpan2016@163.com btrfs: relocation: set trans to be NULL after ending transaction
Filipe Manana fdmanana@suse.com Btrfs: ensure path name is null terminated at btrfs_control_ioctl
Max Filippov jcmvbkbc@gmail.com xtensa: fix coprocessor part of ptrace_{get,set}xregs
Max Filippov jcmvbkbc@gmail.com xtensa: fix coprocessor context offset definitions
Max Filippov jcmvbkbc@gmail.com xtensa: enable coprocessors that are being flushed
Wanpeng Li wanpengli@tencent.com KVM: X86: Fix scan ioapic use-before-initialization
Liran Alon liran.alon@oracle.com KVM: x86: Fix kernel info-leak in KVM_HC_CLOCK_PAIRING hypercall
Jim Mattson jmattson@google.com kvm: svm: Ensure an IBPB on all affected CPUs when freeing a vmcb
Junaid Shahid junaids@google.com kvm: mmu: Fix race in emulated page table writes
Thomas Gleixner tglx@linutronix.de x86/speculation: Provide IBPB always command line options
Thomas Gleixner tglx@linutronix.de x86/speculation: Add seccomp Spectre v2 user space protection mode
Thomas Gleixner tglx@linutronix.de x86/speculation: Enable prctl mode for spectre_v2_user
Thomas Gleixner tglx@linutronix.de x86/speculation: Add prctl() control for indirect branch speculation
Thomas Gleixner tglx@linutronix.de x86/speculation: Prepare arch_smt_update() for PRCTL mode
Thomas Gleixner tglx@linutronix.de x86/speculation: Prevent stale SPEC_CTRL msr content
Thomas Gleixner tglx@linutronix.de x86/speculation: Split out TIF update
Thomas Gleixner tglx@linutronix.de ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS
Thomas Gleixner tglx@linutronix.de x86/speculation: Prepare for conditional IBPB in switch_mm()
Thomas Gleixner tglx@linutronix.de x86/speculation: Avoid __switch_to_xtra() calls
Thomas Gleixner tglx@linutronix.de x86/process: Consolidate and simplify switch_to_xtra() code
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Prepare for per task indirect branch speculation control
Thomas Gleixner tglx@linutronix.de x86/speculation: Add command line control for indirect branch speculation
Thomas Gleixner tglx@linutronix.de x86/speculation: Unify conditional spectre v2 print functions
Thomas Gleixner tglx@linutronix.de x86/speculataion: Mark command line parser data __initdata
Thomas Gleixner tglx@linutronix.de x86/speculation: Mark string arrays const correctly
Thomas Gleixner tglx@linutronix.de x86/speculation: Reorder the spec_v2 code
Thomas Gleixner tglx@linutronix.de x86/l1tf: Show actual SMT state
Thomas Gleixner tglx@linutronix.de x86/speculation: Rework SMT state change
Thomas Gleixner tglx@linutronix.de sched/smt: Expose sched_smt_present static key
Thomas Gleixner tglx@linutronix.de x86/Kconfig: Select SCHED_SMT if SMP enabled
Peter Zijlstra (Intel) peterz@infradead.org sched/smt: Make sched_smt_present track topology
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Reorganize speculation control MSRs update
Thomas Gleixner tglx@linutronix.de x86/speculation: Rename SSBD update functions
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Disable STIBP when enhanced IBRS is in use
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Remove unnecessary ret variable in cpu_show_common()
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Clean up spectre_v2_parse_cmdline()
Tim Chen tim.c.chen@linux.intel.com x86/speculation: Update the TIF_SSBD comment
Zhenzhong Duan zhenzhong.duan@oracle.com x86/retpoline: Remove minimal retpoline support
Zhenzhong Duan zhenzhong.duan@oracle.com x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
Zhenzhong Duan zhenzhong.duan@oracle.com x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
Jiri Kosina jkosina@suse.cz x86/speculation: Propagate information about RSB filling mitigation to sysfs
Jiri Kosina jkosina@suse.cz x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
Jiri Kosina jkosina@suse.cz x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
Tom Lendacky thomas.lendacky@amd.com x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
Tom Lendacky thomas.lendacky@amd.com x86/bugs: Update when to check for the LS_CFG SSBD mitigation
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/bugs: Switch the selection of mitigation from CPU vendor to CPU features
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/bugs: Add AMD's SPEC_CTRL MSR usage
Konrad Rzeszutek Wilk konrad.wilk@oracle.com x86/bugs: Add AMD's variant of SSB_NO
Peter Zijlstra peterz@infradead.org sched/core: Fix cpu.max vs. cpuhotplug deadlock
Bernd Eckstein 3erndeckstein@gmail.com usbnet: ipheth: fix potential recvmsg bug and recvmsg bug 2
Julian Wiedmann jwi@linux.ibm.com s390/qeth: fix length check in SNMP processing
Pan Bian bianpan2016@163.com rapidio/rionet: do not free skb before reading its length
Willem de Bruijn willemb@google.com packet: copy user buffers before orphan or clone
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
Jason Wang jasowang@redhat.com virtio-net: fail XDP set if guest csum is negotiated
Jason Wang jasowang@redhat.com virtio-net: disable guest csum during XDP set
Lorenzo Bianconi lorenzo.bianconi@redhat.com net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
Petr Machata petrm@mellanox.com net: skb_scrub_packet(): Scrub offload_fwd_mark
Sasha Levin sashal@kernel.org Revert "wlcore: Add missing PM call for wlcore_cmd_wait_for_event_or_timeout()"
Darrick J. Wong darrick.wong@oracle.com xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with cp_pack_start_sum
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with i_extra_isize
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with block address in main area
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with node footer and iblocks
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with user_block_count
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with extra_attr feature
Ben Hutchings ben.hutchings@codethink.co.uk f2fs: Add sanity_check_inode() function
Chao Yu yuchao0@huawei.com f2fs: fix to do sanity check with secs_per_zone
Chao Yu yuchao0@huawei.com f2fs: introduce and spread verify_blkaddr
Chao Yu yuchao0@huawei.com f2fs: clean up with is_valid_blkaddr()
Jaegeuk Kim jaegeuk@kernel.org f2fs: enhance sanity_check_raw_super() to avoid potential overflow
Jaegeuk Kim jaegeuk@kernel.org f2fs: sanity check on sit entry
Yunlei He heyunlei@huawei.com f2fs: check blkaddr more accuratly before issue a bio
Shaokun Zhang zhangshaokun@hisilicon.com btrfs: tree-checker: Fix misleading group system information
Qu Wenruo wqu@suse.com btrfs: tree-checker: Check level for leaves and nodes
Qu Wenruo wqu@suse.com btrfs: Check that each block group has corresponding chunk at mount time
Qu Wenruo wqu@suse.com btrfs: tree-checker: Detect invalid and empty essential trees
Qu Wenruo wqu@suse.com btrfs: tree-checker: Verify block_group_item
David Sterba dsterba@suse.com btrfs: tree-check: reduce stack consumption in check_dir_item
Arnd Bergmann arnd@arndb.de btrfs: tree-checker: use %zu format string for size_t
Qu Wenruo wqu@suse.com btrfs: tree-checker: Add checker for dir item
Qu Wenruo wqu@suse.com btrfs: tree-checker: Fix false panic for sanity test
Qu Wenruo quwenruo.btrfs@gmx.com btrfs: tree-checker: Enhance btrfs_check_node output
Qu Wenruo quwenruo.btrfs@gmx.com btrfs: Move leaf and node validation checker to tree-checker.c
Qu Wenruo quwenruo.btrfs@gmx.com btrfs: Add checker for EXTENT_CSUM
Qu Wenruo quwenruo.btrfs@gmx.com btrfs: Add sanity check for EXTENT_DATA when reading out leaf
Qu Wenruo quwenruo.btrfs@gmx.com btrfs: Check if item pointer overlaps with the item itself
Qu Wenruo quwenruo.btrfs@gmx.com btrfs: Refactor check_leaf function for later expansion
Qu Wenruo wqu@suse.com btrfs: Verify that every chunk has corresponding block group at mount time
Gu Jinxiang gujx@cn.fujitsu.com btrfs: validate type when reading a chunk
Lior David qca_liord@qca.qualcomm.com wil6210: missing length check in wmi_set_ie
Vakul Garg vakul.garg@nxp.com net/tls: Fixed return value when tls_complete_pending_work() fails
Boris Pismenny borisp@mellanox.com tls: Use correct sk->sk_prot for IPV6
Ilya Lesokhin ilyal@mellanox.com tls: don't override sk_write_space if tls_set_sw_offload fails.
Ilya Lesokhin ilyal@mellanox.com tls: Avoid copying crypto_info again after cipher_type check.
Ilya Lesokhin ilyal@mellanox.com tls: Fix TLS ulp context leak, when TLS_TX setsockopt is not used.
Ilya Lesokhin ilyal@mellanox.com tls: Add function to update the TLS socket configuration
Alexei Starovoitov ast@kernel.org bpf: Prevent memory disambiguation attack
Ilya Dryomov idryomov@gmail.com libceph: implement CEPHX_V2 calculation mode
Ilya Dryomov idryomov@gmail.com libceph: add authorizer challenge
Ilya Dryomov idryomov@gmail.com libceph: factor out encrypt_authorizer()
Ilya Dryomov idryomov@gmail.com libceph: factor out __ceph_x_decrypt()
Ilya Dryomov idryomov@gmail.com libceph: factor out __prepare_write_connect()
Ilya Dryomov idryomov@gmail.com libceph: store ceph_auth_handshake pointer in ceph_connection
Richard Weinberger richard@nod.at ubi: Initialize Fastmap checkmapping correctly
Matthias Schwarzott zzam@gentoo.org media: em28xx: Fix use-after-free when disconnecting
Hugh Dickins hughd@google.com mm/khugepaged: collapse_shmem() do not crash on Compound
Hugh Dickins hughd@google.com mm/khugepaged: collapse_shmem() without freezing new_page
Hugh Dickins hughd@google.com mm/khugepaged: minor reorderings in collapse_shmem()
Hugh Dickins hughd@google.com mm/khugepaged: collapse_shmem() remember to clear holes
Hugh Dickins hughd@google.com mm/khugepaged: fix crashes due to misaccounted holes
Hugh Dickins hughd@google.com mm/khugepaged: collapse_shmem() stop if punched or truncated
Hugh Dickins hughd@google.com mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
Hugh Dickins hughd@google.com mm/huge_memory: splitting set mapping+index before unfreeze
Konstantin Khlebnikov khlebnikov@yandex-team.ru mm/huge_memory.c: reorder operations in __split_huge_page_tail()
Hugh Dickins hughd@google.com mm/huge_memory: rename freeze_page() to unmap_page()
-------------
Diffstat:
 Documentation/admin-guide/kernel-parameters.txt | 56 +-
 Documentation/userspace-api/spec_ctrl.rst | 9 +
 Makefile | 4 +-
 arch/arm/boot/dts/rk3288-veyron.dtsi | 6 +-
 .../arm64/boot/dts/rockchip/rk3399-puma-haikou.dts | 2 +-
 arch/x86/Kconfig | 12 +-
 arch/x86/Makefile | 5 +-
 arch/x86/events/core.c | 20 -
 arch/x86/events/intel/core.c | 52 +-
 arch/x86/events/perf_event.h | 13 +-
 arch/x86/include/asm/cpufeatures.h | 2 +
 arch/x86/include/asm/msr-index.h | 5 +-
 arch/x86/include/asm/nospec-branch.h | 44 +-
 arch/x86/include/asm/spec-ctrl.h | 20 +-
 arch/x86/include/asm/switch_to.h | 3 -
 arch/x86/include/asm/thread_info.h | 20 +-
 arch/x86/include/asm/tlbflush.h | 8 +-
 arch/x86/kernel/cpu/amd.c | 4 +-
 arch/x86/kernel/cpu/bugs.c | 510 ++++++++++++----
 arch/x86/kernel/cpu/common.c | 9 +-
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 19 +-
 arch/x86/kernel/fpu/signal.c | 4 +-
 arch/x86/kernel/process.c | 101 +++-
 arch/x86/kernel/process.h | 39 ++
 arch/x86/kernel/process_32.c | 8 +-
 arch/x86/kernel/process_64.c | 10 +-
 arch/x86/kvm/cpuid.c | 10 +-
 arch/x86/kvm/mmu.c | 27 +-
 arch/x86/kvm/svm.c | 28 +-
 arch/x86/kvm/x86.c | 4 +-
 arch/x86/mm/tlb.c | 115 +++-
 arch/xtensa/kernel/asm-offsets.c | 16 +-
 arch/xtensa/kernel/process.c | 5 +-
 arch/xtensa/kernel/ptrace.c | 42 +-
 drivers/android/binder.c | 21 +-
 drivers/android/binder_alloc.c | 14 +-
 drivers/android/binder_alloc.h | 3 +-
 drivers/dma/at_hdmac.c | 10 +-
 drivers/hv/channel.c | 8 +
 drivers/iio/magnetometer/st_magn_buffer.c | 12 +-
 drivers/media/usb/em28xx/em28xx-dvb.c | 3 +-
 drivers/misc/mic/scif/scif_rma.c | 2 +-
 drivers/mtd/ubi/vtbl.c | 20 +-
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 9 +-
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 4 +-
 drivers/net/rionet.c | 2 +-
 drivers/net/usb/ipheth.c | 10 +-
 drivers/net/virtio_net.c | 13 +-
 drivers/net/wireless/ath/wil6210/wmi.c | 8 +-
 drivers/net/wireless/ti/wlcore/cmd.c | 6 -
 drivers/pci/dwc/pci-layerscape.c | 2 +-
 drivers/s390/net/qeth_core_main.c | 27 +-
 drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c | 2 +-
 .../vc04_services/interface/vchiq_arm/vchiq_arm.c | 7 +-
 drivers/usb/core/quirks.c | 3 +
 drivers/usb/dwc3/gadget.c | 5 -
 drivers/usb/storage/unusual_realtek.h | 10 +
 fs/btrfs/Makefile | 2 +-
 fs/btrfs/disk-io.c | 153 +----
 fs/btrfs/extent-tree.c | 86 ++-
 fs/btrfs/relocation.c | 1 +
 fs/btrfs/super.c | 1 +
 fs/btrfs/transaction.c | 6 +-
 fs/btrfs/tree-checker.c | 649 +++++++++++++++++++++
 fs/btrfs/tree-checker.h | 38 ++
 fs/btrfs/volumes.c | 30 +-
 fs/btrfs/volumes.h | 2 +
 fs/ceph/mds_client.c | 11 +
 fs/direct-io.c | 4 +-
 fs/ext2/xattr.c | 2 +-
 fs/f2fs/checkpoint.c | 43 +-
 fs/f2fs/data.c | 52 +-
 fs/f2fs/f2fs.h | 41 +-
 fs/f2fs/file.c | 21 +-
 fs/f2fs/inode.c | 78 ++-
 fs/f2fs/node.c | 9 +-
 fs/f2fs/recovery.c | 6 +-
 fs/f2fs/segment.c | 13 +-
 fs/f2fs/segment.h | 24 +-
 fs/f2fs/super.c | 96 ++-
 fs/xfs/libxfs/xfs_attr.c | 9 +-
 include/linux/bpf_verifier.h | 1 +
 include/linux/ceph/auth.h | 8 +
 include/linux/ceph/ceph_features.h | 7 +-
 include/linux/ceph/messenger.h | 6 +-
 include/linux/ceph/msgr.h | 2 +-
 include/linux/jump_label.h | 7 +
 include/linux/ptrace.h | 4 +-
 include/linux/sched.h | 9 +
 include/linux/sched/smt.h | 20 +
 include/linux/skbuff.h | 18 +-
 include/net/tls.h | 4 +-
 include/uapi/linux/btrfs_tree.h | 1 +
 include/uapi/linux/prctl.h | 1 +
 kernel/bpf/verifier.c | 62 +-
 kernel/cpu.c | 14 +-
 kernel/jump_label.c | 12 +-
 kernel/sched/core.c | 19 +-
 kernel/sched/fair.c | 4 +-
 kernel/sched/sched.h | 4 +-
 lib/test_kmod.c | 1 -
 mm/huge_memory.c | 79 +--
 mm/khugepaged.c | 129 ++--
 mm/shmem.c | 12 +-
 net/ceph/auth.c | 16 +
 net/ceph/auth_x.c | 217 +++++--
 net/ceph/auth_x_protocol.h | 7 +
 net/ceph/messenger.c | 86 +--
 net/ceph/osd_client.c | 11 +
 net/core/skbuff.c | 4 +
 net/packet/af_packet.c | 4 +-
 net/tls/tls_main.c | 124 ++--
 net/tls/tls_sw.c | 13 +-
 scripts/Makefile.build | 2 -
 sound/core/control.c | 80 +--
 sound/isa/wss/wss_lib.c | 2 -
 sound/pci/ac97/ac97_codec.c | 2 +-
 sound/pci/hda/patch_realtek.c | 9 +
 sound/sparc/cs4231.c | 8 +-
 119 files changed, 2912 insertions(+), 907 deletions(-)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 906f9cdfc2a0800f13683f9e4ebdfd08c12ee81b upstream.
The term "freeze" is used in several ways in the kernel, and in mm it has the particular meaning of forcing page refcount temporarily to 0. freeze_page() is just too confusing a name for a function that unmaps a page: rename it unmap_page(), and rename unfreeze_page() remap_page().
Went to change the mention of freeze_page() added later in mm/rmap.c, but found it to be incorrect: ordinary page reclaim reaches there too; but the substance of the comment still seems correct, so edit it down.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261514080.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/huge_memory.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index adacfe66cf3d..09d7e9c4de89 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2280,7 +2280,7 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma, } }
-static void freeze_page(struct page *page) +static void unmap_page(struct page *page) { enum ttu_flags ttu_flags = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS | TTU_RMAP_LOCKED | TTU_SPLIT_HUGE_PMD; @@ -2295,7 +2295,7 @@ static void freeze_page(struct page *page) VM_BUG_ON_PAGE(!unmap_success, page); }
-static void unfreeze_page(struct page *page) +static void remap_page(struct page *page) { int i; if (PageTransHuge(page)) { @@ -2412,7 +2412,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags);
- unfreeze_page(head); + remap_page(head);
for (i = 0; i < HPAGE_PMD_NR; i++) { struct page *subpage = head + i; @@ -2593,7 +2593,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) }
/* - * Racy check if we can split the page, before freeze_page() will + * Racy check if we can split the page, before unmap_page() will * split PMDs */ if (!can_split_huge_page(head, &extra_pins)) { @@ -2602,7 +2602,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) }
mlocked = PageMlocked(page); - freeze_page(head); + unmap_page(head); VM_BUG_ON_PAGE(compound_mapcount(head), head);
/* Make sure the page is not on per-CPU pagevec as it takes pin */ @@ -2659,7 +2659,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) fail: if (mapping) spin_unlock(&mapping->tree_lock); spin_unlock_irqrestore(zone_lru_lock(page_zone(head)), flags); - unfreeze_page(head); + remap_page(head); ret = -EBUSY; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 605ca5ede7643a01f4c4a15913f9714ac297f8a6 upstream.
THP split makes non-atomic change of tail page flags. This is almost ok because tail pages are locked and isolated but this breaks recent changes in page locking: non-atomic operation could clear bit PG_waiters.
As a result concurrent sequence get_page_unless_zero() -> lock_page() might block forever. Especially if this page was truncated later.
Fix is trivial: clone flags before unfreezing page reference counter.
This race has existed since commit 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit"), while the unsafe unfreeze itself was added in commit 8df651c7059e ("thp: cleanup split_huge_page()").
clear_compound_head() also must be called before unfreezing page reference because after successful get_page_unless_zero() might follow put_page() which needs correct compound_head().
And replace page_ref_inc()/page_ref_add() with page_ref_unfreeze() which is made especially for that and has semantic of smp_store_release().
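For reference, a minimal sketch of the freeze/unfreeze primitives this relies on (simplified from include/linux/page_ref.h of this era; illustrative, not an exact quote):

static inline int page_ref_freeze(struct page *page, int count)
{
	/* Succeeds only if the refcount is exactly @count, leaving it at 0
	 * so that get_page_unless_zero() fails until the unfreeze. */
	return likely(atomic_cmpxchg(&page->_refcount, count, 0) == count);
}

static inline void page_ref_unfreeze(struct page *page, int count)
{
	/* Release semantics: everything initialized before this store is
	 * visible to whoever subsequently takes a reference. */
	atomic_set_release(&page->_refcount, count);
}

That release store is what makes the new order - clone flags, set up the tail, then unfreeze - a safe publication.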
Link: http://lkml.kernel.org/r/151844393341.210639.13162088407980624477.stgit@buzz
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/huge_memory.c | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 09d7e9c4de89..8969f5cf3174 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2312,26 +2312,13 @@ static void __split_huge_page_tail(struct page *head, int tail, struct page *page_tail = head + tail;
VM_BUG_ON_PAGE(atomic_read(&page_tail->_mapcount) != -1, page_tail); - VM_BUG_ON_PAGE(page_ref_count(page_tail) != 0, page_tail);
/* - * tail_page->_refcount is zero and not changing from under us. But - * get_page_unless_zero() may be running from under us on the - * tail_page. If we used atomic_set() below instead of atomic_inc() or - * atomic_add(), we would then run atomic_set() concurrently with - * get_page_unless_zero(), and atomic_set() is implemented in C not - * using locked ops. spin_unlock on x86 sometime uses locked ops - * because of PPro errata 66, 92, so unless somebody can guarantee - * atomic_set() here would be safe on all archs (and not only on x86), - * it's safer to use atomic_inc()/atomic_add(). + * Clone page flags before unfreezing refcount. + * + * After successful get_page_unless_zero() might follow flags change, + * for example lock_page(), which sets PG_waiters. */ - if (PageAnon(head) && !PageSwapCache(head)) { - page_ref_inc(page_tail); - } else { - /* Additional pin to radix tree */ - page_ref_add(page_tail, 2); - } - page_tail->flags &= ~PAGE_FLAGS_CHECK_AT_PREP; page_tail->flags |= (head->flags & ((1L << PG_referenced) | @@ -2344,14 +2331,21 @@ static void __split_huge_page_tail(struct page *head, int tail, (1L << PG_unevictable) | (1L << PG_dirty)));
- /* - * After clearing PageTail the gup refcount can be released. - * Page flags also must be visible before we make the page non-compound. - */ + /* Page flags must be visible before we make the page non-compound. */ smp_wmb();
+ /* + * Clear PageTail before unfreezing page refcount. + * + * After successful get_page_unless_zero() might follow put_page() + * which needs correct compound_head(). + */ clear_compound_head(page_tail);
+ /* Finally unfreeze refcount. Additional reference from page cache. */ + page_ref_unfreeze(page_tail, 1 + (!PageAnon(head) || + PageSwapCache(head))); + if (page_is_young(head)) set_page_young(page_tail); if (page_is_idle(head))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 173d9d9fd3ddae84c110fea8aedf1f26af6be9ec upstream.
Huge tmpfs stress testing has occasionally hit shmem_undo_range()'s VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page).
Move the setting of mapping and index up before the page_ref_unfreeze() in __split_huge_page_tail() to fix this: so that a page cache lookup cannot get a reference while the tail's mapping and index are unstable.
In fact, might as well move them up before the smp_wmb(): I don't see an actual need for that, but if I'm missing something, this way round is safer than the other, and no less efficient.
You might argue that VM_BUG_ON_PAGE(page_to_pgoff(page) != index, page) is misplaced, and should be left until after the trylock_page(); but left as is has not crashed since, and gives more stringent assurance.
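The lookup being raced against follows the usual speculative pattern; a condensed sketch (the helper name here is made up, cf. find_get_entry() in mm/filemap.c):

static struct page *lookup_speculative(struct address_space *mapping,
				       pgoff_t index)
{
	struct page *page;

	rcu_read_lock();
repeat:
	page = radix_tree_lookup(&mapping->page_tree, index);
	/* Take a reference without the tree lock; this fails while the
	 * refcount is frozen at zero. */
	if (page && !page_cache_get_speculative(page))
		goto repeat;
	rcu_read_unlock();
	/* The caller may now inspect page->mapping and page->index, so
	 * both must already be correct by the time the refcount can be
	 * taken - hence setting them before page_ref_unfreeze(). */
	return page;
}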
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261516380.2275@eggly.anvils
Fixes: e9b61f19858a5 ("thp: reintroduce split_huge_page()")
Requires: 605ca5ede764 ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/huge_memory.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 8969f5cf3174..69ffeb439b0f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2331,6 +2331,12 @@ static void __split_huge_page_tail(struct page *head, int tail, (1L << PG_unevictable) | (1L << PG_dirty)));
+ /* ->mapping in first tail page is compound_mapcount */ + VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING, + page_tail); + page_tail->mapping = head->mapping; + page_tail->index = head->index + tail; + /* Page flags must be visible before we make the page non-compound. */ smp_wmb();
@@ -2351,12 +2357,6 @@ static void __split_huge_page_tail(struct page *head, int tail, if (page_is_idle(head)) set_page_idle(page_tail);
- /* ->mapping in first tail page is compound_mapcount */ - VM_BUG_ON_PAGE(tail > 2 && page_tail->mapping != TAIL_MAPPING, - page_tail); - page_tail->mapping = head->mapping; - - page_tail->index = head->index + tail; page_cpupid_xchg_last(page_tail, page_cpupid_last(head)); lru_add_page_tail(head, page_tail, lruvec, list); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 006d3ff27e884f80bd7d306b041afc415f63598f upstream.
Huge tmpfs testing, on 32-bit kernel with lockdep enabled, showed that __split_huge_page() was using i_size_read() while holding the irq-safe lru_lock and page tree lock, but the 32-bit i_size_read() uses an irq-unsafe seqlock which should not be nested inside them.
Instead, read the i_size earlier in split_huge_page_to_list(), and pass the end offset down to __split_huge_page(): all while holding head page lock, which is enough to prevent truncation of that extent before the page tree lock has been taken.
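For context, the 32-bit SMP flavour of i_size_read() is a seqcount retry loop, roughly as below (per include/linux/fs.h of this era); the write side bumps i_size_seqcount outside of any irq protection, which is why lockdep objects to nesting the read inside irq-safe locks:

#if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
static inline loff_t i_size_read(const struct inode *inode)
{
	loff_t i_size;
	unsigned int seq;

	do {
		seq = read_seqcount_begin(&inode->i_size_seqcount);
		i_size = inode->i_size;
	} while (read_seqcount_retry(&inode->i_size_seqcount, seq));
	return i_size;
}
#endif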
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261520070.2275@eggly.anvils
Fixes: baa355fd33142 ("thp: file pages support for split_huge_page()")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/huge_memory.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 69ffeb439b0f..930f2aa3bb4d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2362,12 +2362,11 @@ static void __split_huge_page_tail(struct page *head, int tail, }
static void __split_huge_page(struct page *page, struct list_head *list, - unsigned long flags) + pgoff_t end, unsigned long flags) { struct page *head = compound_head(page); struct zone *zone = page_zone(head); struct lruvec *lruvec; - pgoff_t end = -1; int i;
lruvec = mem_cgroup_page_lruvec(head, zone->zone_pgdat); @@ -2375,9 +2374,6 @@ static void __split_huge_page(struct page *page, struct list_head *list, /* complete memcg works before add pages to LRU */ mem_cgroup_split_huge_fixup(head);
- if (!PageAnon(page)) - end = DIV_ROUND_UP(i_size_read(head->mapping->host), PAGE_SIZE); - for (i = HPAGE_PMD_NR - 1; i >= 1; i--) { __split_huge_page_tail(head, i, lruvec, list); /* Some pages can be beyond i_size: drop them from page cache */ @@ -2549,6 +2545,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) int count, mapcount, extra_pins, ret; bool mlocked; unsigned long flags; + pgoff_t end;
VM_BUG_ON_PAGE(is_huge_zero_page(page), page); VM_BUG_ON_PAGE(!PageLocked(page), page); @@ -2571,6 +2568,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) ret = -EBUSY; goto out; } + end = -1; mapping = NULL; anon_vma_lock_write(anon_vma); } else { @@ -2584,6 +2582,15 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
anon_vma = NULL; i_mmap_lock_read(mapping); + + /* + *__split_huge_page() may need to trim off pages beyond EOF: + * but on 32-bit, i_size_read() takes an irq-unsafe seqlock, + * which cannot be nested inside the page tree lock. So note + * end now: i_size itself may be changed at any moment, but + * head page lock is good enough to serialize the trimming. + */ + end = DIV_ROUND_UP(i_size_read(mapping->host), PAGE_SIZE); }
/* @@ -2633,7 +2640,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list) if (mapping) __dec_node_page_state(page, NR_SHMEM_THPS); spin_unlock(&pgdata->split_queue_lock); - __split_huge_page(page, list, flags); + __split_huge_page(page, list, end, flags); if (PageSwapCache(head)) { swp_entry_t entry = { .val = page_private(head) };
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 701270fa193aadf00bdcf607738f64997275d4c7 upstream.
Huge tmpfs testing showed that although collapse_shmem() recognizes a concurrently truncated or hole-punched page correctly, its handling of holes was liable to refill an emptied extent. Add a check to stop that.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261522040.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/khugepaged.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 0a5bb3e8a8a3..d4a06afbeda4 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1352,6 +1352,16 @@ static void collapse_shmem(struct mm_struct *mm, radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { int n = min(iter.index, end) - index;
+ /* + * Stop if extent has been hole-punched, and is now completely + * empty (the more obvious i_size_read() check would take an + * irq-unsafe seqlock on 32-bit). + */ + if (n >= HPAGE_PMD_NR) { + result = SCAN_TRUNCATED; + goto tree_locked; + } + /* * Handle holes in the radix tree: charge it from shmem and * insert relevant subpage of new_page into the radix-tree. @@ -1463,6 +1473,11 @@ static void collapse_shmem(struct mm_struct *mm, if (result == SCAN_SUCCEED && index < end) { int n = end - index;
+ /* Stop if extent has been truncated, and is now empty */ + if (n >= HPAGE_PMD_NR) { + result = SCAN_TRUNCATED; + goto tree_locked; + } if (!shmem_charge(mapping->host, n)) { result = SCAN_FAIL; goto tree_locked;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit aaa52e340073b7f4593b3c4ddafcafa70cf838b5 upstream.
Huge tmpfs testing on a shortish file mapped into a pmd-rounded extent hit shmem_evict_inode()'s WARN_ON(inode->i_blocks) followed by clear_inode()'s BUG_ON(inode->i_data.nrpages) when the file was later closed and unlinked.
khugepaged's collapse_shmem() was forgetting to update mapping->nrpages on the rollback path, after it had added holes that it then needed to undo.
There is indeed an irritating asymmetry between shmem_charge(), whose callers want it to increment nrpages after successfully accounting blocks, and shmem_uncharge(), when __delete_from_page_cache() already decremented nrpages itself: oh well, just add a comment on that to them both.
And shmem_recalc_inode() is supposed to be called when the accounting is expected to be in balance (so it can deduce from imbalance that reclaim discarded some pages): so change shmem_charge() to update nrpages earlier (though it's rare for the difference to matter at all).
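For reference, the balance shmem_recalc_inode() relies on looks roughly like this (simplified from mm/shmem.c; not an exact quote):

static void shmem_recalc_inode(struct inode *inode)
{
	struct shmem_inode_info *info = SHMEM_I(inode);
	long freed;

	/* Pages alloced but neither swapped out nor still in the page
	 * cache must have been discarded by reclaim. */
	freed = info->alloced - info->swapped - inode->i_mapping->nrpages;
	if (freed > 0) {
		info->alloced -= freed;
		inode->i_blocks -= freed * BLOCKS_PER_PAGE;
		shmem_inode_unacct_blocks(inode, freed);
	}
}

Calling it with nrpages not yet adjusted would misread the rollback as reclaim.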
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261523450.2275@eggly.anvils
Fixes: 800d8c63b2e98 ("shmem: add huge pages support")
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/khugepaged.c | 4 +++-
 mm/shmem.c | 6 +++++-
 2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d4a06afbeda4..be7b0863e33c 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1539,8 +1539,10 @@ static void collapse_shmem(struct mm_struct *mm, *hpage = NULL; } else { /* Something went wrong: rollback changes to the radix-tree */ - shmem_uncharge(mapping->host, nr_none); spin_lock_irq(&mapping->tree_lock); + mapping->nrpages -= nr_none; + shmem_uncharge(mapping->host, nr_none); + radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) { if (iter.index >= end) diff --git a/mm/shmem.c b/mm/shmem.c index fa08f56fd5e5..177fef62cbbd 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -296,12 +296,14 @@ bool shmem_charge(struct inode *inode, long pages) if (!shmem_inode_acct_block(inode, pages)) return false;
+ /* nrpages adjustment first, then shmem_recalc_inode() when balanced */ + inode->i_mapping->nrpages += pages; + spin_lock_irqsave(&info->lock, flags); info->alloced += pages; inode->i_blocks += pages * BLOCKS_PER_PAGE; shmem_recalc_inode(inode); spin_unlock_irqrestore(&info->lock, flags); - inode->i_mapping->nrpages += pages;
return true; } @@ -311,6 +313,8 @@ void shmem_uncharge(struct inode *inode, long pages) struct shmem_inode_info *info = SHMEM_I(inode); unsigned long flags;
+ /* nrpages adjustment done by __delete_from_page_cache() or caller */ + spin_lock_irqsave(&info->lock, flags); info->alloced -= pages; inode->i_blocks -= pages * BLOCKS_PER_PAGE;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 2af8ff291848cc4b1cce24b6c943394eb2c761e8 upstream.
Huge tmpfs testing reminds us that there is no __GFP_ZERO in the gfp flags khugepaged uses to allocate a huge page - in all common cases it would just be a waste of effort - so collapse_shmem() must remember to clear out any holes that it instantiates.
The obvious place to do so, where they are put into the page cache tree, is not a good choice, because interrupts are disabled there. Leave it until further down, once success is assured, where the other pages are copied (before setting PageUptodate).
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261525080.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/khugepaged.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index be7b0863e33c..f535dbe3f9f5 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1502,7 +1502,12 @@ static void collapse_shmem(struct mm_struct *mm, * Replacing old pages with new one has succeed, now we need to * copy the content and free old pages. */ + index = start; list_for_each_entry_safe(page, tmp, &pagelist, lru) { + while (index < page->index) { + clear_highpage(new_page + (index % HPAGE_PMD_NR)); + index++; + } copy_highpage(new_page + (page->index % HPAGE_PMD_NR), page); list_del(&page->lru); @@ -1512,6 +1517,11 @@ static void collapse_shmem(struct mm_struct *mm, ClearPageActive(page); ClearPageUnevictable(page); put_page(page); + index++; + } + while (index < end) { + clear_highpage(new_page + (index % HPAGE_PMD_NR)); + index++; }
local_irq_save(flags);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 042a30824871fa3149b0127009074b75cc25863c upstream.
Several cleanups in collapse_shmem(), most of which probably do not really matter beyond doing things in a more familiar and reassuring order. Simplify the failure gotos in the main loop, and on success update stats while interrupts are still disabled from the last iteration.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261526400.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/khugepaged.c | 73 ++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 43 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index f535dbe3f9f5..0eac4344477a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1333,13 +1333,12 @@ static void collapse_shmem(struct mm_struct *mm, goto out; }
+ __SetPageLocked(new_page); + __SetPageSwapBacked(new_page); new_page->index = start; new_page->mapping = mapping; - __SetPageSwapBacked(new_page); - __SetPageLocked(new_page); BUG_ON(!page_ref_freeze(new_page, 1));
- /* * At this point the new_page is 'frozen' (page_count() is zero), locked * and not up-to-date. It's safe to insert it into radix tree, because @@ -1368,13 +1367,13 @@ static void collapse_shmem(struct mm_struct *mm, */ if (n && !shmem_charge(mapping->host, n)) { result = SCAN_FAIL; - break; + goto tree_locked; } - nr_none += n; for (; index < min(iter.index, end); index++) { radix_tree_insert(&mapping->page_tree, index, new_page + (index % HPAGE_PMD_NR)); } + nr_none += n;
/* We are done. */ if (index >= end) @@ -1390,12 +1389,12 @@ static void collapse_shmem(struct mm_struct *mm, result = SCAN_FAIL; goto tree_unlocked; } - spin_lock_irq(&mapping->tree_lock); } else if (trylock_page(page)) { get_page(page); + spin_unlock_irq(&mapping->tree_lock); } else { result = SCAN_PAGE_LOCK; - break; + goto tree_locked; }
/* @@ -1410,11 +1409,10 @@ static void collapse_shmem(struct mm_struct *mm, result = SCAN_TRUNCATED; goto out_unlock; } - spin_unlock_irq(&mapping->tree_lock);
if (isolate_lru_page(page)) { result = SCAN_DEL_PAGE_LRU; - goto out_isolate_failed; + goto out_unlock; }
if (page_mapped(page)) @@ -1436,7 +1434,9 @@ static void collapse_shmem(struct mm_struct *mm, */ if (!page_ref_freeze(page, 3)) { result = SCAN_PAGE_COUNT; - goto out_lru; + spin_unlock_irq(&mapping->tree_lock); + putback_lru_page(page); + goto out_unlock; }
/* @@ -1452,17 +1452,10 @@ static void collapse_shmem(struct mm_struct *mm, slot = radix_tree_iter_resume(slot, &iter); index++; continue; -out_lru: - spin_unlock_irq(&mapping->tree_lock); - putback_lru_page(page); -out_isolate_failed: - unlock_page(page); - put_page(page); - goto tree_unlocked; out_unlock: unlock_page(page); put_page(page); - break; + goto tree_unlocked; }
/* @@ -1470,7 +1463,7 @@ static void collapse_shmem(struct mm_struct *mm, * This code only triggers if there's nothing in radix tree * beyond 'end'. */ - if (result == SCAN_SUCCEED && index < end) { + if (index < end) { int n = end - index;
/* Stop if extent has been truncated, and is now empty */ @@ -1482,7 +1475,6 @@ static void collapse_shmem(struct mm_struct *mm, result = SCAN_FAIL; goto tree_locked; } - for (; index < end; index++) { radix_tree_insert(&mapping->page_tree, index, new_page + (index % HPAGE_PMD_NR)); @@ -1490,14 +1482,19 @@ static void collapse_shmem(struct mm_struct *mm, nr_none += n; }
+ __inc_node_page_state(new_page, NR_SHMEM_THPS); + if (nr_none) { + struct zone *zone = page_zone(new_page); + + __mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none); + __mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none); + } + tree_locked: spin_unlock_irq(&mapping->tree_lock); tree_unlocked:
if (result == SCAN_SUCCEED) { - unsigned long flags; - struct zone *zone = page_zone(new_page); - /* * Replacing old pages with new one has succeed, now we need to * copy the content and free old pages. @@ -1511,11 +1508,11 @@ static void collapse_shmem(struct mm_struct *mm, copy_highpage(new_page + (page->index % HPAGE_PMD_NR), page); list_del(&page->lru); - unlock_page(page); - page_ref_unfreeze(page, 1); page->mapping = NULL; + page_ref_unfreeze(page, 1); ClearPageActive(page); ClearPageUnevictable(page); + unlock_page(page); put_page(page); index++; } @@ -1524,28 +1521,17 @@ static void collapse_shmem(struct mm_struct *mm, index++; }
- local_irq_save(flags); - __inc_node_page_state(new_page, NR_SHMEM_THPS); - if (nr_none) { - __mod_node_page_state(zone->zone_pgdat, NR_FILE_PAGES, nr_none); - __mod_node_page_state(zone->zone_pgdat, NR_SHMEM, nr_none); - } - local_irq_restore(flags); - - /* - * Remove pte page tables, so we can re-faulti - * the page as huge. - */ - retract_page_tables(mapping, start); - /* Everything is ready, let's unfreeze the new_page */ - set_page_dirty(new_page); SetPageUptodate(new_page); page_ref_unfreeze(new_page, HPAGE_PMD_NR); + set_page_dirty(new_page); mem_cgroup_commit_charge(new_page, memcg, false, true); lru_cache_add_anon(new_page); - unlock_page(new_page);
+ /* + * Remove pte page tables, so we can re-fault the page as huge. + */ + retract_page_tables(mapping, start); *hpage = NULL; } else { /* Something went wrong: rollback changes to the radix-tree */ @@ -1578,8 +1564,8 @@ static void collapse_shmem(struct mm_struct *mm, slot, page); slot = radix_tree_iter_resume(slot, &iter); spin_unlock_irq(&mapping->tree_lock); - putback_lru_page(page); unlock_page(page); + putback_lru_page(page); spin_lock_irq(&mapping->tree_lock); } VM_BUG_ON(nr_none); @@ -1588,9 +1574,10 @@ static void collapse_shmem(struct mm_struct *mm, /* Unfreeze new_page, caller would take care about freeing it */ page_ref_unfreeze(new_page, 1); mem_cgroup_cancel_charge(new_page, memcg, true); - unlock_page(new_page); new_page->mapping = NULL; } + + unlock_page(new_page); out: VM_BUG_ON(!list_empty(&pagelist)); /* TODO: tracepoints */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 87c460a0bded56195b5eb497d44709777ef7b415 upstream.
khugepaged's collapse_shmem() does almost all of its work, to assemble the huge new_page from 512 scattered old pages, with the new_page's refcount frozen to 0 (and refcounts of all old pages so far also frozen to 0). Including shmem_getpage() to read in any which were out on swap, memory reclaim if necessary to allocate their intermediate pages, and copying over all the data from old to new.
Imagine the frozen refcount as a spinlock held, but without any lock debugging to highlight the abuse: it's not good, and under serious load heads into lockups - speculative getters of the page are not expecting to spin while khugepaged is rescheduled.
One can get a little further under load by hacking around elsewhere; but fortunately, freezing the new_page turns out to have been entirely unnecessary, with no hacks needed elsewhere.
The huge new_page lock is already held throughout, and guards all its subpages as they are brought one by one into the page cache tree; and anything reading the data in that page, without the lock, before it has been marked PageUptodate, would already be in the wrong. So simply eliminate the freezing of the new_page.
Each of the old pages remains frozen with refcount 0 after it has been replaced by a new_page subpage in the page cache tree, until they are all unfrozen on success or failure: just as before. They could be unfrozen sooner, but cause no problem once no longer visible to find_get_entry(), filemap_map_pages() and other speculative lookups.
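The "spinlock" analogy is concrete because the speculative getters all funnel through something like this (simplified from include/linux/mm.h):

/* Fails while the refcount is frozen at 0; lock-free lookups such as
 * find_get_entry() retry on failure, i.e. they spin for as long as the
 * freeze lasts. */
static inline bool get_page_unless_zero(struct page *page)
{
	return page_ref_add_unless(page, 1, 0);
}

After this patch the new_page is published with an ordinary elevated refcount, and the page lock plus the not-yet-set PageUptodate keep readers away from its contents.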
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261527570.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/khugepaged.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 0eac4344477a..45bf0e5775f7 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1288,7 +1288,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * collapse_shmem - collapse small tmpfs/shmem pages into huge one. * * Basic scheme is simple, details are more complex: - * - allocate and freeze a new huge page; + * - allocate and lock a new huge page; * - scan over radix tree replacing old pages the new one * + swap in pages if necessary; * + fill in gaps; @@ -1296,11 +1296,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * - if replacing succeed: * + copy data over; * + free old pages; - * + unfreeze huge page; + * + unlock huge page; * - if replacing failed; * + put all pages back and unfreeze them; * + restore gaps in the radix-tree; - * + free huge page; + * + unlock and free huge page; */ static void collapse_shmem(struct mm_struct *mm, struct address_space *mapping, pgoff_t start, @@ -1337,13 +1337,11 @@ static void collapse_shmem(struct mm_struct *mm, __SetPageSwapBacked(new_page); new_page->index = start; new_page->mapping = mapping; - BUG_ON(!page_ref_freeze(new_page, 1));
/* - * At this point the new_page is 'frozen' (page_count() is zero), locked - * and not up-to-date. It's safe to insert it into radix tree, because - * nobody would be able to map it or use it in other way until we - * unfreeze it. + * At this point the new_page is locked and not up-to-date. + * It's safe to insert it into the page cache, because nobody would + * be able to map it or use it in another way until we unlock it. */
index = start; @@ -1521,9 +1519,8 @@ static void collapse_shmem(struct mm_struct *mm, index++; }
- /* Everything is ready, let's unfreeze the new_page */ SetPageUptodate(new_page); - page_ref_unfreeze(new_page, HPAGE_PMD_NR); + page_ref_add(new_page, HPAGE_PMD_NR - 1); set_page_dirty(new_page); mem_cgroup_commit_charge(new_page, memcg, false, true); lru_cache_add_anon(new_page); @@ -1571,8 +1568,6 @@ static void collapse_shmem(struct mm_struct *mm, VM_BUG_ON(nr_none); spin_unlock_irq(&mapping->tree_lock);
- /* Unfreeze new_page, caller would take care about freeing it */ - page_ref_unfreeze(new_page, 1); mem_cgroup_cancel_charge(new_page, memcg, true); new_page->mapping = NULL; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 06a5e1268a5fb9c2b346a3da6b97e85f2eba0f07 upstream.
collapse_shmem()'s VM_BUG_ON_PAGE(PageTransCompound) was unsafe: before it holds page lock of the first page, racing truncation then extension might conceivably have inserted a hugepage there already. Fail with the SCAN_PAGE_COMPOUND result, instead of crashing (CONFIG_DEBUG_VM=y) or otherwise mishandling the unexpected hugepage - though later we might code up a more constructive way of handling it, with SCAN_SUCCESS.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1811261529310.2275@eggly.anvils
Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: <stable@vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 mm/khugepaged.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 45bf0e5775f7..d27a73737f1a 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1401,7 +1401,15 @@ static void collapse_shmem(struct mm_struct *mm, */ VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(!PageUptodate(page), page); - VM_BUG_ON_PAGE(PageTransCompound(page), page); + + /* + * If file was truncated then extended, or hole-punched, before + * we locked the first page, then a THP might be there already. + */ + if (PageTransCompound(page)) { + result = SCAN_PAGE_COMPOUND; + goto out_unlock; + }
if (page_mapping(page) != mapping) { result = SCAN_TRUNCATED;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
[ Upstream commit 910b0797fa9e8af09c44a3fa36cb310ba7a7218d ]
Fix the bug by moving the i2c_unregister_device calls after the deregistration of the dvb frontend.

The new-style i2c drivers already destroy the frontend object at i2c_unregister_device time. When the dvb frontend is unregistered afterwards, it leads to this oops:
[ 6058.866459] BUG: unable to handle kernel NULL pointer dereference at 00000000000001f8
[ 6058.866578] IP: dvb_frontend_stop+0x30/0xd0 [dvb_core]
[ 6058.866644] PGD 0
[ 6058.866646] P4D 0
[ 6058.866726] Oops: 0000 [#1] SMP
[ 6058.866768] Modules linked in: rc_pinnacle_pctv_hd(O) em28xx_rc(O) si2157(O) si2168(O) em28xx_dvb(O) em28xx(O) si2165(O) a8293(O) tda10071(O) tea5767(O) tuner(O) cx23885(O) tda18271(O) videobuf2_dvb(O) videobuf2_dma_sg(O) m88ds3103(O) tveeprom(O) cx2341x(O) v4l2_common(O) dvb_core(O) rc_core(O) videobuf2_memops(O) videobuf2_v4l2(O) videobuf2_core(O) videodev(O) media(O) bluetooth ecdh_generic ums_realtek uas rtl8192cu rtl_usb rtl8192c_common rtlwifi usb_storage snd_hda_codec_realtek snd_hda_codec_hdmi snd_hda_codec_generic i2c_mux snd_hda_intel snd_hda_codec snd_hwdep x86_pkg_temp_thermal snd_hda_core kvm_intel kvm irqbypass [last unloaded: videobuf2_memops]
[ 6058.867497] CPU: 2 PID: 7349 Comm: kworker/2:0 Tainted: G W O 4.13.9-gentoo #1
[ 6058.867595] Hardware name: MEDION E2050 2391/H81H3-EM2, BIOS H81EM2W08.308 08/25/2014
[ 6058.867692] Workqueue: usb_hub_wq hub_event
[ 6058.867746] task: ffff88011a15e040 task.stack: ffffc90003074000
[ 6058.867825] RIP: 0010:dvb_frontend_stop+0x30/0xd0 [dvb_core]
[ 6058.867896] RSP: 0018:ffffc90003077b58 EFLAGS: 00010293
[ 6058.867964] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 000000010040001f
[ 6058.868056] RDX: ffff88011a15e040 RSI: ffffea000464e400 RDI: ffff88001cbe3028
[ 6058.868150] RBP: ffffc90003077b68 R08: ffff880119390380 R09: 000000010040001f
[ 6058.868241] R10: ffffc90003077b18 R11: 000000000001e200 R12: ffff88001cbe3028
[ 6058.868330] R13: ffff88001cbe68d0 R14: ffff8800cf734000 R15: ffff8800cf734098
[ 6058.868419] FS: 0000000000000000(0000) GS:ffff88011fb00000(0000) knlGS:0000000000000000
[ 6058.868511] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6058.868578] CR2: 00000000000001f8 CR3: 00000001113c5000 CR4: 00000000001406e0
[ 6058.868662] Call Trace:
[ 6058.868705]  dvb_unregister_frontend+0x2a/0x80 [dvb_core]
[ 6058.868774]  em28xx_dvb_fini+0x132/0x220 [em28xx_dvb]
[ 6058.868840]  em28xx_close_extension+0x34/0x90 [em28xx]
[ 6058.868902]  em28xx_usb_disconnect+0x4e/0x70 [em28xx]
[ 6058.868968]  usb_unbind_interface+0x6d/0x260
[ 6058.869025]  device_release_driver_internal+0x150/0x210
[ 6058.869094]  device_release_driver+0xd/0x10
[ 6058.869150]  bus_remove_device+0xe4/0x160
[ 6058.869204]  device_del+0x1ce/0x2f0
[ 6058.869253]  usb_disable_device+0x99/0x270
[ 6058.869306]  usb_disconnect+0x8d/0x260
[ 6058.869359]  hub_event+0x93d/0x1520
[ 6058.869408]  ? dequeue_task_fair+0xae5/0xd20
[ 6058.869467]  process_one_work+0x1d9/0x3e0
[ 6058.869522]  worker_thread+0x43/0x3e0
[ 6058.869576]  kthread+0x104/0x140
[ 6058.869602]  ? trace_event_raw_event_workqueue_work+0x80/0x80
[ 6058.869640]  ? kthread_create_on_node+0x40/0x40
[ 6058.869673]  ret_from_fork+0x22/0x30
[ 6058.869698] Code: 54 49 89 fc 53 48 8b 9f 18 03 00 00 0f 1f 44 00 00 41 83 bc 24 04 05 00 00 02 74 0c 41 c7 84 24 04 05 00 00 01 00 00 00 0f ae f0 <48> 8b bb f8 01 00 00 48 85 ff 74 5c e8 df 40 f0 e0 48 8b 93 f8
[ 6058.869850] RIP: dvb_frontend_stop+0x30/0xd0 [dvb_core] RSP: ffffc90003077b58
[ 6058.869894] CR2: 00000000000001f8
[ 6058.875880] ---[ end trace 717eecf7193b3fc6 ]---
Signed-off-by: Matthias Schwarzott <zzam@gentoo.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/media/usb/em28xx/em28xx-dvb.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/usb/em28xx/em28xx-dvb.c b/drivers/media/usb/em28xx/em28xx-dvb.c index 4a7db623fe29..29cdaaf1ed90 100644 --- a/drivers/media/usb/em28xx/em28xx-dvb.c +++ b/drivers/media/usb/em28xx/em28xx-dvb.c @@ -2105,6 +2105,8 @@ static int em28xx_dvb_fini(struct em28xx *dev) } }
+ em28xx_unregister_dvb(dvb); + /* remove I2C SEC */ client = dvb->i2c_client_sec; if (client) { @@ -2126,7 +2128,6 @@ static int em28xx_dvb_fini(struct em28xx *dev) i2c_unregister_device(client); }
- em28xx_unregister_dvb(dvb); kfree(dvb); dev->dvb = NULL; kref_put(&dev->ref, em28xx_free_device);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 25677478474a91fa1b46f19a4a591a9848bca6fb upstream
We cannot do it last, otherwise it will be skipped for dynamic volumes.
Reported-by: Lachmann, Juergen <juergen.lachmann@harman.com>
Fixes: 34653fd8c46e ("ubi: fastmap: Check each mapping only once")
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/mtd/ubi/vtbl.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/mtd/ubi/vtbl.c b/drivers/mtd/ubi/vtbl.c
index 94d7a865b135..7504f430c011 100644
--- a/drivers/mtd/ubi/vtbl.c
+++ b/drivers/mtd/ubi/vtbl.c
@@ -578,6 +578,16 @@ static int init_volumes(struct ubi_device *ubi,
 		vol->ubi = ubi;
 		reserved_pebs += vol->reserved_pebs;
 
+		/*
+		 * We use ubi->peb_count and not vol->reserved_pebs because
+		 * we want to keep the code simple. Otherwise we'd have to
+		 * resize/check the bitmap upon volume resize too.
+		 * Allocating a few bytes more does not hurt.
+		 */
+		err = ubi_fastmap_init_checkmap(vol, ubi->peb_count);
+		if (err)
+			return err;
+
 		/*
 		 * In case of dynamic volume UBI knows nothing about how many
 		 * data is stored there. So assume the whole volume is used.
@@ -620,16 +630,6 @@ static int init_volumes(struct ubi_device *ubi,
 			(long long)(vol->used_ebs - 1) * vol->usable_leb_size;
 		vol->used_bytes += av->last_data_size;
 		vol->last_eb_bytes = av->last_data_size;
-
-		/*
-		 * We use ubi->peb_count and not vol->reserved_pebs because
-		 * we want to keep the code simple. Otherwise we'd have to
-		 * resize/check the bitmap upon volume resize too.
-		 * Allocating a few bytes more does not hurt.
-		 */
-		err = ubi_fastmap_init_checkmap(vol, ubi->peb_count);
-		if (err)
-			return err;
 	}
/* And add the layout volume */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 262614c4294d33b1f19e0d18c0091d9c329b544a upstream.
We already copy authorizer_reply_buf and authorizer_reply_buf_len into ceph_connection. Factoring out __prepare_write_connect() requires two more: authorizer_buf and authorizer_buf_len. Store the pointer to the handshake in con->auth rather than piling on.
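The shape of the refactor, reduced to a sketch with hypothetical type names (not the kernel structs):

/* Reduced illustration: instead of copying ever more handshake fields
 * into the connection, keep one pointer and read the fields through it.
 */
struct auth_handshake {
	void *authorizer_buf;
	int   authorizer_buf_len;
	void *authorizer_reply_buf;
	int   authorizer_reply_buf_len;
};

struct connection {
	/*
	 * Before: copied fields that must be kept in sync:
	 *	void *auth_reply_buf;
	 *	int auth_reply_buf_len;
	 * After: one pointer, NULL when no authorizer is in use,
	 * which also gives a natural "do we have auth?" test.
	 */
	struct auth_handshake *auth;
};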
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Sage Weil sage@redhat.com
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/ceph/messenger.h |  3 +-
 net/ceph/messenger.c           | 54 ++++++++++++++++------------------
 2 files changed, 27 insertions(+), 30 deletions(-)
diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index ead9d85f1c11..9056077c023f 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -203,9 +203,8 @@ struct ceph_connection { attempt for this connection, client */ u32 peer_global_seq; /* peer's global seq for this connection */
+ struct ceph_auth_handshake *auth; int auth_retry; /* true if we need a newer authorizer */ - void *auth_reply_buf; /* where to put the authorizer reply */ - int auth_reply_buf_len;
struct mutex mutex;
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 5281da82371a..3a82e6d2864b 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1411,24 +1411,26 @@ static void prepare_write_keepalive(struct ceph_connection *con) * Connection negotiation. */
-static struct ceph_auth_handshake *get_connect_authorizer(struct ceph_connection *con, - int *auth_proto) +static int get_connect_authorizer(struct ceph_connection *con) { struct ceph_auth_handshake *auth; + int auth_proto;
if (!con->ops->get_authorizer) { + con->auth = NULL; con->out_connect.authorizer_protocol = CEPH_AUTH_UNKNOWN; con->out_connect.authorizer_len = 0; - return NULL; + return 0; }
- auth = con->ops->get_authorizer(con, auth_proto, con->auth_retry); + auth = con->ops->get_authorizer(con, &auth_proto, con->auth_retry); if (IS_ERR(auth)) - return auth; + return PTR_ERR(auth);
- con->auth_reply_buf = auth->authorizer_reply_buf; - con->auth_reply_buf_len = auth->authorizer_reply_buf_len; - return auth; + con->auth = auth; + con->out_connect.authorizer_protocol = cpu_to_le32(auth_proto); + con->out_connect.authorizer_len = cpu_to_le32(auth->authorizer_buf_len); + return 0; }
/* @@ -1448,8 +1450,7 @@ static int prepare_write_connect(struct ceph_connection *con) { unsigned int global_seq = get_global_seq(con->msgr, 0); int proto; - int auth_proto; - struct ceph_auth_handshake *auth; + int ret;
switch (con->peer_name.type) { case CEPH_ENTITY_TYPE_MON: @@ -1476,20 +1477,15 @@ static int prepare_write_connect(struct ceph_connection *con) con->out_connect.protocol_version = cpu_to_le32(proto); con->out_connect.flags = 0;
- auth_proto = CEPH_AUTH_UNKNOWN; - auth = get_connect_authorizer(con, &auth_proto); - if (IS_ERR(auth)) - return PTR_ERR(auth); - - con->out_connect.authorizer_protocol = cpu_to_le32(auth_proto); - con->out_connect.authorizer_len = auth ? - cpu_to_le32(auth->authorizer_buf_len) : 0; + ret = get_connect_authorizer(con); + if (ret) + return ret;
con_out_kvec_add(con, sizeof (con->out_connect), &con->out_connect); - if (auth && auth->authorizer_buf_len) - con_out_kvec_add(con, auth->authorizer_buf_len, - auth->authorizer_buf); + if (con->auth) + con_out_kvec_add(con, con->auth->authorizer_buf_len, + con->auth->authorizer_buf);
con->out_more = 0; con_flag_set(con, CON_FLAG_WRITE_PENDING); @@ -1753,11 +1749,14 @@ static int read_partial_connect(struct ceph_connection *con) if (ret <= 0) goto out;
- size = le32_to_cpu(con->in_reply.authorizer_len); - end += size; - ret = read_partial(con, end, size, con->auth_reply_buf); - if (ret <= 0) - goto out; + if (con->auth) { + size = le32_to_cpu(con->in_reply.authorizer_len); + end += size; + ret = read_partial(con, end, size, + con->auth->authorizer_reply_buf); + if (ret <= 0) + goto out; + }
dout("read_partial_connect %p tag %d, con_seq = %u, g_seq = %u\n", con, (int)con->in_reply.tag, @@ -1765,7 +1764,6 @@ static int read_partial_connect(struct ceph_connection *con) le32_to_cpu(con->in_reply.global_seq)); out: return ret; - }
/* @@ -2048,7 +2046,7 @@ static int process_connect(struct ceph_connection *con)
dout("process_connect on %p tag %d\n", con, (int)con->in_tag);
- if (con->auth_reply_buf) { + if (con->auth) { /* * Any connection that defines ->get_authorizer() * should also define ->verify_authorizer_reply().
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit c0f56b483aa09c99bfe97409a43ad786f33b8a5a upstream.
Will be used for sending ceph_msg_connect with an updated authorizer, after the server challenges the initial authorizer.
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Sage Weil sage@redhat.com
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ceph/messenger.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index 3a82e6d2864b..0b121327d32f 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -1446,6 +1446,17 @@ static void prepare_write_banner(struct ceph_connection *con)
 	con_flag_set(con, CON_FLAG_WRITE_PENDING);
 }
 
+static void __prepare_write_connect(struct ceph_connection *con)
+{
+	con_out_kvec_add(con, sizeof(con->out_connect), &con->out_connect);
+	if (con->auth)
+		con_out_kvec_add(con, con->auth->authorizer_buf_len,
+				 con->auth->authorizer_buf);
+
+	con->out_more = 0;
+	con_flag_set(con, CON_FLAG_WRITE_PENDING);
+}
+
 static int prepare_write_connect(struct ceph_connection *con)
 {
 	unsigned int global_seq = get_global_seq(con->msgr, 0);
@@ -1481,15 +1492,7 @@ static int prepare_write_connect(struct ceph_connection *con)
 	if (ret)
 		return ret;
 
-	con_out_kvec_add(con, sizeof (con->out_connect),
-					&con->out_connect);
-	if (con->auth)
-		con_out_kvec_add(con, con->auth->authorizer_buf_len,
-				 con->auth->authorizer_buf);
-
-	con->out_more = 0;
-	con_flag_set(con, CON_FLAG_WRITE_PENDING);
-
+	__prepare_write_connect(con);
 	return 0;
 }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit c571fe24d243bfe7017f0e67fe800b3cc2a1d1f7 upstream.
Will be used for decrypting the server challenge which is only preceded by ceph_x_encrypt_header.
Drop struct_v check to allow for extending ceph_x_encrypt_header in the future.
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Sage Weil sage@redhat.com
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ceph/auth_x.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c index 2f4a1baf5f52..9cac05239346 100644 --- a/net/ceph/auth_x.c +++ b/net/ceph/auth_x.c @@ -70,25 +70,40 @@ static int ceph_x_encrypt(struct ceph_crypto_key *secret, void *buf, return sizeof(u32) + ciphertext_len; }
+static int __ceph_x_decrypt(struct ceph_crypto_key *secret, void *p, + int ciphertext_len) +{ + struct ceph_x_encrypt_header *hdr = p; + int plaintext_len; + int ret; + + ret = ceph_crypt(secret, false, p, ciphertext_len, ciphertext_len, + &plaintext_len); + if (ret) + return ret; + + if (le64_to_cpu(hdr->magic) != CEPHX_ENC_MAGIC) { + pr_err("%s bad magic\n", __func__); + return -EINVAL; + } + + return plaintext_len - sizeof(*hdr); +} + static int ceph_x_decrypt(struct ceph_crypto_key *secret, void **p, void *end) { - struct ceph_x_encrypt_header *hdr = *p + sizeof(u32); - int ciphertext_len, plaintext_len; + int ciphertext_len; int ret;
ceph_decode_32_safe(p, end, ciphertext_len, e_inval); ceph_decode_need(p, end, ciphertext_len, e_inval);
- ret = ceph_crypt(secret, false, *p, end - *p, ciphertext_len, - &plaintext_len); - if (ret) + ret = __ceph_x_decrypt(secret, *p, ciphertext_len); + if (ret < 0) return ret;
- if (hdr->struct_v != 1 || le64_to_cpu(hdr->magic) != CEPHX_ENC_MAGIC) - return -EPERM; - *p += ciphertext_len; - return plaintext_len - sizeof(struct ceph_x_encrypt_header); + return ret;
e_inval: return -EINVAL;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 149cac4a50b0b4081b38b2f38de6ef71c27eaa85 upstream.
Will be used for encrypting both the initial and updated authorizers.
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Sage Weil sage@redhat.com
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/ceph/auth_x.c | 49 ++++++++++++++++++++++++++++++++++-------------
 1 file changed, 36 insertions(+), 13 deletions(-)
diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c index 9cac05239346..722791f45b2a 100644 --- a/net/ceph/auth_x.c +++ b/net/ceph/auth_x.c @@ -290,6 +290,38 @@ bad: return -EINVAL; }
+/* + * Encode and encrypt the second part (ceph_x_authorize_b) of the + * authorizer. The first part (ceph_x_authorize_a) should already be + * encoded. + */ +static int encrypt_authorizer(struct ceph_x_authorizer *au) +{ + struct ceph_x_authorize_a *msg_a; + struct ceph_x_authorize_b *msg_b; + void *p, *end; + int ret; + + msg_a = au->buf->vec.iov_base; + WARN_ON(msg_a->ticket_blob.secret_id != cpu_to_le64(au->secret_id)); + p = (void *)(msg_a + 1) + le32_to_cpu(msg_a->ticket_blob.blob_len); + end = au->buf->vec.iov_base + au->buf->vec.iov_len; + + msg_b = p + ceph_x_encrypt_offset(); + msg_b->struct_v = 1; + msg_b->nonce = cpu_to_le64(au->nonce); + + ret = ceph_x_encrypt(&au->session_key, p, end - p, sizeof(*msg_b)); + if (ret < 0) + return ret; + + p += ret; + WARN_ON(p > end); + au->buf->vec.iov_len = p - au->buf->vec.iov_base; + + return 0; +} + static void ceph_x_authorizer_cleanup(struct ceph_x_authorizer *au) { ceph_crypto_key_destroy(&au->session_key); @@ -306,7 +338,6 @@ static int ceph_x_build_authorizer(struct ceph_auth_client *ac, int maxlen; struct ceph_x_authorize_a *msg_a; struct ceph_x_authorize_b *msg_b; - void *p, *end; int ret; int ticket_blob_len = (th->ticket_blob ? th->ticket_blob->vec.iov_len : 0); @@ -350,21 +381,13 @@ static int ceph_x_build_authorizer(struct ceph_auth_client *ac, dout(" th %p secret_id %lld %lld\n", th, th->secret_id, le64_to_cpu(msg_a->ticket_blob.secret_id));
- p = msg_a + 1; - p += ticket_blob_len; - end = au->buf->vec.iov_base + au->buf->vec.iov_len; - - msg_b = p + ceph_x_encrypt_offset(); - msg_b->struct_v = 1; get_random_bytes(&au->nonce, sizeof(au->nonce)); - msg_b->nonce = cpu_to_le64(au->nonce); - ret = ceph_x_encrypt(&au->session_key, p, end - p, sizeof(*msg_b)); - if (ret < 0) + ret = encrypt_authorizer(au); + if (ret) { + pr_err("failed to encrypt authorizer: %d", ret); goto out_au; + }
- p += ret; - WARN_ON(p > end); - au->buf->vec.iov_len = p - au->buf->vec.iov_base; dout(" built authorizer nonce %llx len %d\n", au->nonce, (int)au->buf->vec.iov_len); return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 6daca13d2e72bedaaacfc08f873114c9307d5aea upstream.
When a client authenticates with a service, an authorizer is sent with a nonce to the service (ceph_x_authorize_[ab]) and the service responds with a mutation of that nonce (ceph_x_authorize_reply). This lets the client verify the service is who it says it is but it doesn't protect against a replay: someone can trivially capture the exchange and reuse the same authorizer to authenticate themselves.
Allow the service to reject an initial authorizer with a random challenge (ceph_x_authorize_challenge). The client then has to respond with an updated authorizer proving they are able to decrypt the service's challenge and that the new authorizer was produced for this specific connection instance.
The accepting side requires this challenge and response unconditionally if the client side advertises the CEPHX_V2 feature bit.
This addresses CVE-2018-1128.
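As a rough model of the resulting exchange, here is a toy program (not the kernel implementation; a XOR stands in for encryption under the shared session key, and all names are illustrative):

#include <stdint.h>
#include <stdio.h>

/* Toy model of the cephx v2 challenge/response.  XOR stands in for
 * AES under the shared session key, so the same call both encrypts
 * and decrypts.
 */
static uint64_t toy_crypt(uint64_t session_key, uint64_t v)
{
	return v ^ session_key;
}

int main(void)
{
	uint64_t session_key = 0x1122334455667788ULL;

	/* 1. service rejects the initial authorizer with a random,
	 *    per-connection challenge (sent encrypted)
	 */
	uint64_t server_challenge = 0xdeadbeefcafef00dULL;
	uint64_t wire_challenge = toy_crypt(session_key, server_challenge);

	/* 2. client decrypts the challenge and answers challenge + 1
	 *    in an updated authorizer
	 */
	uint64_t answer = toy_crypt(session_key,
				    toy_crypt(session_key, wire_challenge) + 1);

	/* 3. service verifies; a captured authorizer from an earlier
	 *    connection cannot contain this connection's challenge + 1
	 */
	if (toy_crypt(session_key, answer) == server_challenge + 1)
		printf("client proved possession of the session key\n");
	return 0;
}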
Link: http://tracker.ceph.com/issues/24836
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Sage Weil sage@redhat.com
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/ceph/mds_client.c           | 11 ++++++
 include/linux/ceph/auth.h      |  8 ++++
 include/linux/ceph/messenger.h |  3 ++
 include/linux/ceph/msgr.h      |  2 +-
 net/ceph/auth.c                | 16 ++++++++
 net/ceph/auth_x.c              | 72 +++++++++++++++++++++++++++++++---
 net/ceph/auth_x_protocol.h     |  7 ++++
 net/ceph/messenger.c           | 17 +++++++-
 net/ceph/osd_client.c          | 11 ++++++
 9 files changed, 140 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index bf378ddca4db..a48984dd6426 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -4079,6 +4079,16 @@ static struct ceph_auth_handshake *get_authorizer(struct ceph_connection *con, return auth; }
+static int add_authorizer_challenge(struct ceph_connection *con, + void *challenge_buf, int challenge_buf_len) +{ + struct ceph_mds_session *s = con->private; + struct ceph_mds_client *mdsc = s->s_mdsc; + struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth; + + return ceph_auth_add_authorizer_challenge(ac, s->s_auth.authorizer, + challenge_buf, challenge_buf_len); +}
static int verify_authorizer_reply(struct ceph_connection *con) { @@ -4142,6 +4152,7 @@ static const struct ceph_connection_operations mds_con_ops = { .put = con_put, .dispatch = dispatch, .get_authorizer = get_authorizer, + .add_authorizer_challenge = add_authorizer_challenge, .verify_authorizer_reply = verify_authorizer_reply, .invalidate_authorizer = invalidate_authorizer, .peer_reset = peer_reset, diff --git a/include/linux/ceph/auth.h b/include/linux/ceph/auth.h index e931da8424a4..6728c2ee0205 100644 --- a/include/linux/ceph/auth.h +++ b/include/linux/ceph/auth.h @@ -64,6 +64,10 @@ struct ceph_auth_client_ops { /* ensure that an existing authorizer is up to date */ int (*update_authorizer)(struct ceph_auth_client *ac, int peer_type, struct ceph_auth_handshake *auth); + int (*add_authorizer_challenge)(struct ceph_auth_client *ac, + struct ceph_authorizer *a, + void *challenge_buf, + int challenge_buf_len); int (*verify_authorizer_reply)(struct ceph_auth_client *ac, struct ceph_authorizer *a); void (*invalidate_authorizer)(struct ceph_auth_client *ac, @@ -118,6 +122,10 @@ void ceph_auth_destroy_authorizer(struct ceph_authorizer *a); extern int ceph_auth_update_authorizer(struct ceph_auth_client *ac, int peer_type, struct ceph_auth_handshake *a); +int ceph_auth_add_authorizer_challenge(struct ceph_auth_client *ac, + struct ceph_authorizer *a, + void *challenge_buf, + int challenge_buf_len); extern int ceph_auth_verify_authorizer_reply(struct ceph_auth_client *ac, struct ceph_authorizer *a); extern void ceph_auth_invalidate_authorizer(struct ceph_auth_client *ac, diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 9056077c023f..18fbe910ed55 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -31,6 +31,9 @@ struct ceph_connection_operations { struct ceph_auth_handshake *(*get_authorizer) ( struct ceph_connection *con, int *proto, int force_new); + int (*add_authorizer_challenge)(struct ceph_connection *con, + void *challenge_buf, + int challenge_buf_len); int (*verify_authorizer_reply) (struct ceph_connection *con); int (*invalidate_authorizer)(struct ceph_connection *con);
diff --git a/include/linux/ceph/msgr.h b/include/linux/ceph/msgr.h index 73ae2a926548..9e50aede46c8 100644 --- a/include/linux/ceph/msgr.h +++ b/include/linux/ceph/msgr.h @@ -91,7 +91,7 @@ struct ceph_entity_inst { #define CEPH_MSGR_TAG_SEQ 13 /* 64-bit int follows with seen seq number */ #define CEPH_MSGR_TAG_KEEPALIVE2 14 /* keepalive2 byte + ceph_timespec */ #define CEPH_MSGR_TAG_KEEPALIVE2_ACK 15 /* keepalive2 reply */ - +#define CEPH_MSGR_TAG_CHALLENGE_AUTHORIZER 16 /* cephx v2 doing server challenge */
/* * connection negotiation diff --git a/net/ceph/auth.c b/net/ceph/auth.c index dbde2b3c3c15..fbeee068ea14 100644 --- a/net/ceph/auth.c +++ b/net/ceph/auth.c @@ -315,6 +315,22 @@ int ceph_auth_update_authorizer(struct ceph_auth_client *ac, } EXPORT_SYMBOL(ceph_auth_update_authorizer);
+int ceph_auth_add_authorizer_challenge(struct ceph_auth_client *ac, + struct ceph_authorizer *a, + void *challenge_buf, + int challenge_buf_len) +{ + int ret = 0; + + mutex_lock(&ac->mutex); + if (ac->ops && ac->ops->add_authorizer_challenge) + ret = ac->ops->add_authorizer_challenge(ac, a, challenge_buf, + challenge_buf_len); + mutex_unlock(&ac->mutex); + return ret; +} +EXPORT_SYMBOL(ceph_auth_add_authorizer_challenge); + int ceph_auth_verify_authorizer_reply(struct ceph_auth_client *ac, struct ceph_authorizer *a) { diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c index 722791f45b2a..ce28bb07d8fd 100644 --- a/net/ceph/auth_x.c +++ b/net/ceph/auth_x.c @@ -295,7 +295,8 @@ bad: * authorizer. The first part (ceph_x_authorize_a) should already be * encoded. */ -static int encrypt_authorizer(struct ceph_x_authorizer *au) +static int encrypt_authorizer(struct ceph_x_authorizer *au, + u64 *server_challenge) { struct ceph_x_authorize_a *msg_a; struct ceph_x_authorize_b *msg_b; @@ -308,16 +309,28 @@ static int encrypt_authorizer(struct ceph_x_authorizer *au) end = au->buf->vec.iov_base + au->buf->vec.iov_len;
msg_b = p + ceph_x_encrypt_offset(); - msg_b->struct_v = 1; + msg_b->struct_v = 2; msg_b->nonce = cpu_to_le64(au->nonce); + if (server_challenge) { + msg_b->have_challenge = 1; + msg_b->server_challenge_plus_one = + cpu_to_le64(*server_challenge + 1); + } else { + msg_b->have_challenge = 0; + msg_b->server_challenge_plus_one = 0; + }
ret = ceph_x_encrypt(&au->session_key, p, end - p, sizeof(*msg_b)); if (ret < 0) return ret;
p += ret; - WARN_ON(p > end); - au->buf->vec.iov_len = p - au->buf->vec.iov_base; + if (server_challenge) { + WARN_ON(p != end); + } else { + WARN_ON(p > end); + au->buf->vec.iov_len = p - au->buf->vec.iov_base; + }
return 0; } @@ -382,7 +395,7 @@ static int ceph_x_build_authorizer(struct ceph_auth_client *ac, le64_to_cpu(msg_a->ticket_blob.secret_id));
get_random_bytes(&au->nonce, sizeof(au->nonce)); - ret = encrypt_authorizer(au); + ret = encrypt_authorizer(au, NULL); if (ret) { pr_err("failed to encrypt authorizer: %d", ret); goto out_au; @@ -664,6 +677,54 @@ static int ceph_x_update_authorizer( return 0; }
+static int decrypt_authorize_challenge(struct ceph_x_authorizer *au, + void *challenge_buf, + int challenge_buf_len, + u64 *server_challenge) +{ + struct ceph_x_authorize_challenge *ch = + challenge_buf + sizeof(struct ceph_x_encrypt_header); + int ret; + + /* no leading len */ + ret = __ceph_x_decrypt(&au->session_key, challenge_buf, + challenge_buf_len); + if (ret < 0) + return ret; + if (ret < sizeof(*ch)) { + pr_err("bad size %d for ceph_x_authorize_challenge\n", ret); + return -EINVAL; + } + + *server_challenge = le64_to_cpu(ch->server_challenge); + return 0; +} + +static int ceph_x_add_authorizer_challenge(struct ceph_auth_client *ac, + struct ceph_authorizer *a, + void *challenge_buf, + int challenge_buf_len) +{ + struct ceph_x_authorizer *au = (void *)a; + u64 server_challenge; + int ret; + + ret = decrypt_authorize_challenge(au, challenge_buf, challenge_buf_len, + &server_challenge); + if (ret) { + pr_err("failed to decrypt authorize challenge: %d", ret); + return ret; + } + + ret = encrypt_authorizer(au, &server_challenge); + if (ret) { + pr_err("failed to encrypt authorizer w/ challenge: %d", ret); + return ret; + } + + return 0; +} + static int ceph_x_verify_authorizer_reply(struct ceph_auth_client *ac, struct ceph_authorizer *a) { @@ -816,6 +877,7 @@ static const struct ceph_auth_client_ops ceph_x_ops = { .handle_reply = ceph_x_handle_reply, .create_authorizer = ceph_x_create_authorizer, .update_authorizer = ceph_x_update_authorizer, + .add_authorizer_challenge = ceph_x_add_authorizer_challenge, .verify_authorizer_reply = ceph_x_verify_authorizer_reply, .invalidate_authorizer = ceph_x_invalidate_authorizer, .reset = ceph_x_reset, diff --git a/net/ceph/auth_x_protocol.h b/net/ceph/auth_x_protocol.h index 32c13d763b9a..24b0b74564d0 100644 --- a/net/ceph/auth_x_protocol.h +++ b/net/ceph/auth_x_protocol.h @@ -70,6 +70,13 @@ struct ceph_x_authorize_a { struct ceph_x_authorize_b { __u8 struct_v; __le64 nonce; + __u8 have_challenge; + __le64 server_challenge_plus_one; +} __attribute__ ((packed)); + +struct ceph_x_authorize_challenge { + __u8 struct_v; + __le64 server_challenge; } __attribute__ ((packed));
struct ceph_x_authorize_reply { diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index 0b121327d32f..ad33baa2008d 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -2052,9 +2052,24 @@ static int process_connect(struct ceph_connection *con) if (con->auth) { /* * Any connection that defines ->get_authorizer() - * should also define ->verify_authorizer_reply(). + * should also define ->add_authorizer_challenge() and + * ->verify_authorizer_reply(). + * * See get_connect_authorizer(). */ + if (con->in_reply.tag == CEPH_MSGR_TAG_CHALLENGE_AUTHORIZER) { + ret = con->ops->add_authorizer_challenge( + con, con->auth->authorizer_reply_buf, + le32_to_cpu(con->in_reply.authorizer_len)); + if (ret < 0) + return ret; + + con_out_kvec_reset(con); + __prepare_write_connect(con); + prepare_read_connect(con); + return 0; + } + ret = con->ops->verify_authorizer_reply(con); if (ret < 0) { con->error_msg = "bad authorize reply"; diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c index 2814dba5902d..53ea2d48896c 100644 --- a/net/ceph/osd_client.c +++ b/net/ceph/osd_client.c @@ -5292,6 +5292,16 @@ static struct ceph_auth_handshake *get_authorizer(struct ceph_connection *con, return auth; }
+static int add_authorizer_challenge(struct ceph_connection *con, + void *challenge_buf, int challenge_buf_len) +{ + struct ceph_osd *o = con->private; + struct ceph_osd_client *osdc = o->o_osdc; + struct ceph_auth_client *ac = osdc->client->monc.auth; + + return ceph_auth_add_authorizer_challenge(ac, o->o_auth.authorizer, + challenge_buf, challenge_buf_len); +}
static int verify_authorizer_reply(struct ceph_connection *con) { @@ -5341,6 +5351,7 @@ static const struct ceph_connection_operations osd_con_ops = { .put = put_osd_con, .dispatch = dispatch, .get_authorizer = get_authorizer, + .add_authorizer_challenge = add_authorizer_challenge, .verify_authorizer_reply = verify_authorizer_reply, .invalidate_authorizer = invalidate_authorizer, .alloc_msg = alloc_msg,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit cc255c76c70f7a87d97939621eae04b600d9f4a1 upstream.
Derive the signature from the entire buffer (both AES cipher blocks) instead of using just the first half of the first block, leaving out data_crc entirely.
This addresses CVE-2018-1129.
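The crux of the new scheme is that the 64-bit signature is folded from the entire ciphertext, so every signed field, including data_crc, influences it. A standalone sketch of the fold (a dummy buffer stands in for the two encrypted AES blocks; this mirrors the psig computation in the patch but is not the kernel code):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fold 32 bytes of ciphertext (two 16-byte AES blocks) into one
 * 64-bit signature by XORing the four 64-bit words.  Any change to
 * any signed field changes the ciphertext and hence the signature.
 */
static uint64_t fold_signature(const unsigned char enc_buf[32])
{
	uint64_t w[4];

	memcpy(w, enc_buf, sizeof(w));	/* avoids aliasing issues */
	return w[0] ^ w[1] ^ w[2] ^ w[3];
}

int main(void)
{
	unsigned char enc_buf[32] = { 0x01, 0x02, 0x03 }; /* pretend AES output */

	printf("sig=%016llx\n",
	       (unsigned long long)fold_signature(enc_buf));
	return 0;
}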
Link: http://tracker.ceph.com/issues/24837
Signed-off-by: Ilya Dryomov idryomov@gmail.com
Reviewed-by: Sage Weil sage@redhat.com
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/ceph/ceph_features.h |  7 +--
 net/ceph/auth_x.c                  | 73 +++++++++++++++++++++++-------
 2 files changed, 60 insertions(+), 20 deletions(-)
diff --git a/include/linux/ceph/ceph_features.h b/include/linux/ceph/ceph_features.h index 59042d5ac520..70f42eef813b 100644 --- a/include/linux/ceph/ceph_features.h +++ b/include/linux/ceph/ceph_features.h @@ -165,9 +165,9 @@ DEFINE_CEPH_FEATURE(58, 1, FS_FILE_LAYOUT_V2) // overlap DEFINE_CEPH_FEATURE(59, 1, FS_BTIME) DEFINE_CEPH_FEATURE(59, 1, FS_CHANGE_ATTR) // overlap DEFINE_CEPH_FEATURE(59, 1, MSG_ADDR2) // overlap -DEFINE_CEPH_FEATURE(60, 1, BLKIN_TRACING) // *do not share this bit* +DEFINE_CEPH_FEATURE(60, 1, OSD_RECOVERY_DELETES) // *do not share this bit* +DEFINE_CEPH_FEATURE(61, 1, CEPHX_V2) // *do not share this bit*
-DEFINE_CEPH_FEATURE(61, 1, RESERVED2) // unused, but slow down! DEFINE_CEPH_FEATURE(62, 1, RESERVED) // do not use; used as a sentinal DEFINE_CEPH_FEATURE_DEPRECATED(63, 1, RESERVED_BROKEN, LUMINOUS) // client-facing
@@ -209,7 +209,8 @@ DEFINE_CEPH_FEATURE_DEPRECATED(63, 1, RESERVED_BROKEN, LUMINOUS) // client-facin CEPH_FEATURE_SERVER_JEWEL | \ CEPH_FEATURE_MON_STATEFUL_SUB | \ CEPH_FEATURE_CRUSH_TUNABLES5 | \ - CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING) + CEPH_FEATURE_NEW_OSDOPREPLY_ENCODING | \ + CEPH_FEATURE_CEPHX_V2)
#define CEPH_FEATURES_REQUIRED_DEFAULT \ (CEPH_FEATURE_NOSRCADDR | \ diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c index ce28bb07d8fd..10eb759bbcb4 100644 --- a/net/ceph/auth_x.c +++ b/net/ceph/auth_x.c @@ -9,6 +9,7 @@
#include <linux/ceph/decode.h> #include <linux/ceph/auth.h> +#include <linux/ceph/ceph_features.h> #include <linux/ceph/libceph.h> #include <linux/ceph/messenger.h>
@@ -803,26 +804,64 @@ static int calc_signature(struct ceph_x_authorizer *au, struct ceph_msg *msg, __le64 *psig) { void *enc_buf = au->enc_buf; - struct { - __le32 len; - __le32 header_crc; - __le32 front_crc; - __le32 middle_crc; - __le32 data_crc; - } __packed *sigblock = enc_buf + ceph_x_encrypt_offset(); int ret;
- sigblock->len = cpu_to_le32(4*sizeof(u32)); - sigblock->header_crc = msg->hdr.crc; - sigblock->front_crc = msg->footer.front_crc; - sigblock->middle_crc = msg->footer.middle_crc; - sigblock->data_crc = msg->footer.data_crc; - ret = ceph_x_encrypt(&au->session_key, enc_buf, CEPHX_AU_ENC_BUF_LEN, - sizeof(*sigblock)); - if (ret < 0) - return ret; + if (!CEPH_HAVE_FEATURE(msg->con->peer_features, CEPHX_V2)) { + struct { + __le32 len; + __le32 header_crc; + __le32 front_crc; + __le32 middle_crc; + __le32 data_crc; + } __packed *sigblock = enc_buf + ceph_x_encrypt_offset(); + + sigblock->len = cpu_to_le32(4*sizeof(u32)); + sigblock->header_crc = msg->hdr.crc; + sigblock->front_crc = msg->footer.front_crc; + sigblock->middle_crc = msg->footer.middle_crc; + sigblock->data_crc = msg->footer.data_crc; + + ret = ceph_x_encrypt(&au->session_key, enc_buf, + CEPHX_AU_ENC_BUF_LEN, sizeof(*sigblock)); + if (ret < 0) + return ret; + + *psig = *(__le64 *)(enc_buf + sizeof(u32)); + } else { + struct { + __le32 header_crc; + __le32 front_crc; + __le32 front_len; + __le32 middle_crc; + __le32 middle_len; + __le32 data_crc; + __le32 data_len; + __le32 seq_lower_word; + } __packed *sigblock = enc_buf; + struct { + __le64 a, b, c, d; + } __packed *penc = enc_buf; + int ciphertext_len; + + sigblock->header_crc = msg->hdr.crc; + sigblock->front_crc = msg->footer.front_crc; + sigblock->front_len = msg->hdr.front_len; + sigblock->middle_crc = msg->footer.middle_crc; + sigblock->middle_len = msg->hdr.middle_len; + sigblock->data_crc = msg->footer.data_crc; + sigblock->data_len = msg->hdr.data_len; + sigblock->seq_lower_word = *(__le32 *)&msg->hdr.seq; + + /* no leading len, no ceph_x_encrypt_header */ + ret = ceph_crypt(&au->session_key, true, enc_buf, + CEPHX_AU_ENC_BUF_LEN, sizeof(*sigblock), + &ciphertext_len); + if (ret) + return ret; + + *psig = penc->a ^ penc->b ^ penc->c ^ penc->d; + }
- *psig = *(__le64 *)(enc_buf + sizeof(u32)); return 0; }
On Tue, Dec 4, 2018 at 12:01 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Hi Greg,
I thought this series (patches 13 - 18) was dropped from the 4.14 queue. If it wasn't, you also need to pick up the following:
f1d10e046379 libceph: weaken sizeof check in ceph_x_verify_authorizer_reply() 130f52f2b203 libceph: check authorizer reply/challenge length before reading
See our discussion with Sasha:
https://www.spinics.net/lists/stable/msg272462.html
Thanks,
Ilya
On Tue, Dec 04, 2018 at 01:06:40PM +0100, Ilya Dryomov wrote:
Hi Greg,
I thought this series (patches 13 - 18) was dropped from the 4.14 queue. If it wasn't, you also need to pick up the following:
f1d10e046379 libceph: weaken sizeof check in ceph_x_verify_authorizer_reply() 130f52f2b203 libceph: check authorizer reply/challenge length before reading
See our discussion with Sasha:
Ah, missed that, sorry. I've queued these patches up now, thanks!
greg k-h
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit af86ca4e3088fe5eacf2f7e58c01fa68ca067672 upstream.
Detect code patterns where malicious 'speculative store bypass' can be used and sanitize such patterns.
39: (bf) r3 = r10
40: (07) r3 += -216
41: (79) r8 = *(u64 *)(r7 +0)    // slow read
42: (7a) *(u64 *)(r10 -72) = 0   // verifier inserts this instruction
43: (7b) *(u64 *)(r8 +0) = r3    // this store becomes slow due to r8
44: (79) r1 = *(u64 *)(r6 +0)    // cpu speculatively executes this load
45: (71) r2 = *(u8 *)(r1 +0)     // speculatively arbitrary 'load byte'
                                 // is now sanitized

Above code after x86 JIT becomes:

 e5: mov    %rbp,%rdx
 e8: add    $0xffffffffffffff28,%rdx
 ef: mov    0x0(%r13),%r14
 f3: movq   $0x0,-0x48(%rbp)
 fb: mov    %rdx,0x0(%r14)
 ff: mov    0x0(%rbx),%rdi
103: movzbq 0x0(%rdi),%rsi
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
[bwh: Backported to 4.14:
 - Add bpf_verifier_env parameter to check_stack_write()
 - Look up stack slot_types with state->stack_slot_type[] rather than
   state->stack[].slot_type[]
 - Drop bpf_verifier_env argument to verbose()
 - Adjust context]
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/bpf_verifier.h |  1 +
 kernel/bpf/verifier.c        | 62 +++++++++++++++++++++++++++++++++---
 2 files changed, 59 insertions(+), 4 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index a3333004fd2b..8458cc5fbce5 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -113,6 +113,7 @@ struct bpf_insn_aux_data { struct bpf_map *map_ptr; /* pointer for call insn into lookup_elem */ }; int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ + int sanitize_stack_off; /* stack slot to be cleared */ bool seen; /* this insn was processed by the verifier */ };
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 013b0cd1958e..f6755fd5bae2 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -717,8 +717,9 @@ static bool is_spillable_regtype(enum bpf_reg_type type) /* check_stack_read/write functions track spill/fill of registers, * stack boundary and alignment are checked in check_mem_access() */ -static int check_stack_write(struct bpf_verifier_state *state, int off, - int size, int value_regno) +static int check_stack_write(struct bpf_verifier_env *env, + struct bpf_verifier_state *state, int off, + int size, int value_regno, int insn_idx) { int i, spi = (MAX_BPF_STACK + off) / BPF_REG_SIZE; /* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0, @@ -738,8 +739,32 @@ static int check_stack_write(struct bpf_verifier_state *state, int off, state->spilled_regs[spi] = state->regs[value_regno]; state->spilled_regs[spi].live |= REG_LIVE_WRITTEN;
- for (i = 0; i < BPF_REG_SIZE; i++) + for (i = 0; i < BPF_REG_SIZE; i++) { + if (state->stack_slot_type[MAX_BPF_STACK + off + i] == STACK_MISC && + !env->allow_ptr_leaks) { + int *poff = &env->insn_aux_data[insn_idx].sanitize_stack_off; + int soff = (-spi - 1) * BPF_REG_SIZE; + + /* detected reuse of integer stack slot with a pointer + * which means either llvm is reusing stack slot or + * an attacker is trying to exploit CVE-2018-3639 + * (speculative store bypass) + * Have to sanitize that slot with preemptive + * store of zero. + */ + if (*poff && *poff != soff) { + /* disallow programs where single insn stores + * into two different stack slots, since verifier + * cannot sanitize them + */ + verbose("insn %d cannot access two stack slots fp%d and fp%d", + insn_idx, *poff, soff); + return -EINVAL; + } + *poff = soff; + } state->stack_slot_type[MAX_BPF_STACK + off + i] = STACK_SPILL; + } } else { /* regular write of data into stack */ state->spilled_regs[spi] = (struct bpf_reg_state) {}; @@ -1216,7 +1241,8 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn verbose("attempt to corrupt spilled pointer on stack\n"); return -EACCES; } - err = check_stack_write(state, off, size, value_regno); + err = check_stack_write(env, state, off, size, + value_regno, insn_idx); } else { err = check_stack_read(state, off, size, value_regno); } @@ -4270,6 +4296,34 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env) else continue;
+ if (type == BPF_WRITE && + env->insn_aux_data[i + delta].sanitize_stack_off) { + struct bpf_insn patch[] = { + /* Sanitize suspicious stack slot with zero. + * There are no memory dependencies for this store, + * since it's only using frame pointer and immediate + * constant of zero + */ + BPF_ST_MEM(BPF_DW, BPF_REG_FP, + env->insn_aux_data[i + delta].sanitize_stack_off, + 0), + /* the original STX instruction will immediately + * overwrite the same stack slot with appropriate value + */ + *insn, + }; + + cnt = ARRAY_SIZE(patch); + new_prog = bpf_patch_insn_data(env, i + delta, patch, cnt); + if (!new_prog) + return -ENOMEM; + + delta += cnt - 1; + env->prog = new_prog; + insn = new_prog->insnsi + i + delta; + continue; + } + if (env->insn_aux_data[i + delta].ptr_type != PTR_TO_CTX) continue;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 6d88207fcfddc002afe3e2e4a455e5201089d5d9 upstream.
The tx configuration is now stored in ctx->tx_conf, and sk->sk_prot is updated through a function. This will simplify things when we add rx and support for different possible tx and rx cross configurations.
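A toy rendering of the table-driven pattern (illustrative names; strings stand in for the function pointers so the sketch stays self-contained and runnable):

#include <stdio.h>

/* Reduced model of the tls_prots[] table: one fully-initialized ops
 * struct per tx configuration, selected by ctx->tx_conf, instead of
 * patching individual callbacks on the socket.
 */
enum { TLS_BASE_TX, TLS_SW_TX, TLS_NUM_CONFIG };

struct proto_ops { const char *sendmsg; };

static struct proto_ops tls_protos[TLS_NUM_CONFIG];

static void build_protos(struct proto_ops *prot,
			 const struct proto_ops *base)
{
	prot[TLS_BASE_TX] = *base;
	prot[TLS_SW_TX] = prot[TLS_BASE_TX];
	prot[TLS_SW_TX].sendmsg = "tls_sw_sendmsg";
}

struct sock { const struct proto_ops *ops; };
struct tls_ctx { int tx_conf; };

static void update_sk_prot(struct sock *sk, struct tls_ctx *ctx)
{
	sk->ops = &tls_protos[ctx->tx_conf];	/* switch by index */
}

int main(void)
{
	struct proto_ops tcp_prot = { .sendmsg = "tcp_sendmsg" };
	struct sock sk;
	struct tls_ctx ctx = { .tx_conf = TLS_SW_TX };

	build_protos(tls_protos, &tcp_prot);
	update_sk_prot(&sk, &ctx);
	printf("%s\n", sk.ops->sendmsg);	/* -> tls_sw_sendmsg */
	return 0;
}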
Signed-off-by: Ilya Lesokhin ilyal@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/tls.h  |  2 ++
 net/tls/tls_main.c | 46 ++++++++++++++++++++++++++++++++--------------
 2 files changed, 34 insertions(+), 14 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h index 86ed3dd80fe7..0c3ab2af74d3 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -89,6 +89,8 @@ struct tls_context {
void *priv_ctx;
+ u8 tx_conf:2; + u16 prepend_size; u16 tag_size; u16 overhead_size; diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 4f2971f528db..191a8adee3ea 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -46,8 +46,18 @@ MODULE_DESCRIPTION("Transport Layer Security Support"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_ALIAS_TCP_ULP("tls");
-static struct proto tls_base_prot; -static struct proto tls_sw_prot; +enum { + TLS_BASE_TX, + TLS_SW_TX, + TLS_NUM_CONFIG, +}; + +static struct proto tls_prots[TLS_NUM_CONFIG]; + +static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx) +{ + sk->sk_prot = &tls_prots[ctx->tx_conf]; +}
int wait_on_pending_writer(struct sock *sk, long *timeo) { @@ -364,8 +374,8 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval, { struct tls_crypto_info *crypto_info, tmp_crypto_info; struct tls_context *ctx = tls_get_ctx(sk); - struct proto *prot = NULL; int rc = 0; + int tx_conf;
if (!optval || (optlen < sizeof(*crypto_info))) { rc = -EINVAL; @@ -422,11 +432,12 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
/* currently SW is default, we will have ethtool in future */ rc = tls_set_sw_offload(sk, ctx); - prot = &tls_sw_prot; + tx_conf = TLS_SW_TX; if (rc) goto err_crypto_info;
- sk->sk_prot = prot; + ctx->tx_conf = tx_conf; + update_sk_prot(sk, ctx); goto out;
err_crypto_info: @@ -488,7 +499,9 @@ static int tls_init(struct sock *sk) icsk->icsk_ulp_data = ctx; ctx->setsockopt = sk->sk_prot->setsockopt; ctx->getsockopt = sk->sk_prot->getsockopt; - sk->sk_prot = &tls_base_prot; + + ctx->tx_conf = TLS_BASE_TX; + update_sk_prot(sk, ctx); out: return rc; } @@ -499,16 +512,21 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = { .init = tls_init, };
+static void build_protos(struct proto *prot, struct proto *base) +{ + prot[TLS_BASE_TX] = *base; + prot[TLS_BASE_TX].setsockopt = tls_setsockopt; + prot[TLS_BASE_TX].getsockopt = tls_getsockopt; + + prot[TLS_SW_TX] = prot[TLS_BASE_TX]; + prot[TLS_SW_TX].close = tls_sk_proto_close; + prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg; + prot[TLS_SW_TX].sendpage = tls_sw_sendpage; +} + static int __init tls_register(void) { - tls_base_prot = tcp_prot; - tls_base_prot.setsockopt = tls_setsockopt; - tls_base_prot.getsockopt = tls_getsockopt; - - tls_sw_prot = tls_base_prot; - tls_sw_prot.sendmsg = tls_sw_sendmsg; - tls_sw_prot.sendpage = tls_sw_sendpage; - tls_sw_prot.close = tls_sk_proto_close; + build_protos(tls_prots, &tcp_prot);
tcp_register_ulp(&tcp_tls_ulp_ops);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit ff45d820a2df163957ad8ab459b6eb6976144c18 upstream.
Previously the TLS ulp context would leak if we attached a TLS ulp to a socket but did not use the TLS_TX setsockopt, or did use it but it failed. This patch solves the issue by overriding prot[TLS_BASE_TX].close and fixing tls_sk_proto_close to work properly when it is called with ctx->tx_conf == TLS_BASE_TX. This patch also removes ctx->free_resources, as we can use ctx->tx_conf to obtain the relevant information.
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin ilyal@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
[bwh: Backported to 4.14: Keep using tls_ctx_free() as introduced by
 the earlier backport of "tls: zero the crypto information from
 tls_context before freeing"]
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/net/tls.h  |  2 +-
 net/tls/tls_main.c | 24 ++++++++++++++++--------
 net/tls/tls_sw.c   |  3 +--
 3 files changed, 18 insertions(+), 11 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h index 0c3ab2af74d3..604fd982da19 100644 --- a/include/net/tls.h +++ b/include/net/tls.h @@ -106,7 +106,6 @@ struct tls_context {
u16 pending_open_record_frags; int (*push_pending_record)(struct sock *sk, int flags); - void (*free_resources)(struct sock *sk);
void (*sk_write_space)(struct sock *sk); void (*sk_proto_close)(struct sock *sk, long timeout); @@ -131,6 +130,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size); int tls_sw_sendpage(struct sock *sk, struct page *page, int offset, size_t size, int flags); void tls_sw_close(struct sock *sk, long timeout); +void tls_sw_free_tx_resources(struct sock *sk);
void tls_sk_destruct(struct sock *sk, struct tls_context *ctx); void tls_icsk_clean_acked(struct sock *sk); diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 191a8adee3ea..b5f9c578bcf0 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -249,6 +249,12 @@ static void tls_sk_proto_close(struct sock *sk, long timeout) void (*sk_proto_close)(struct sock *sk, long timeout);
lock_sock(sk); + sk_proto_close = ctx->sk_proto_close; + + if (ctx->tx_conf == TLS_BASE_TX) { + tls_ctx_free(ctx); + goto skip_tx_cleanup; + }
if (!tls_complete_pending_work(sk, ctx, 0, &timeo)) tls_handle_open_record(sk, 0); @@ -265,13 +271,16 @@ static void tls_sk_proto_close(struct sock *sk, long timeout) sg++; } } - ctx->free_resources(sk); + kfree(ctx->rec_seq); kfree(ctx->iv);
- sk_proto_close = ctx->sk_proto_close; - tls_ctx_free(ctx); + if (ctx->tx_conf == TLS_SW_TX) { + tls_sw_free_tx_resources(sk); + tls_ctx_free(ctx); + }
+skip_tx_cleanup: release_sock(sk); sk_proto_close(sk, timeout); } @@ -428,8 +437,6 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval, ctx->sk_write_space = sk->sk_write_space; sk->sk_write_space = tls_write_space;
- ctx->sk_proto_close = sk->sk_prot->close; - /* currently SW is default, we will have ethtool in future */ rc = tls_set_sw_offload(sk, ctx); tx_conf = TLS_SW_TX; @@ -499,6 +506,7 @@ static int tls_init(struct sock *sk) icsk->icsk_ulp_data = ctx; ctx->setsockopt = sk->sk_prot->setsockopt; ctx->getsockopt = sk->sk_prot->getsockopt; + ctx->sk_proto_close = sk->sk_prot->close;
ctx->tx_conf = TLS_BASE_TX; update_sk_prot(sk, ctx); @@ -515,11 +523,11 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = { static void build_protos(struct proto *prot, struct proto *base) { prot[TLS_BASE_TX] = *base; - prot[TLS_BASE_TX].setsockopt = tls_setsockopt; - prot[TLS_BASE_TX].getsockopt = tls_getsockopt; + prot[TLS_BASE_TX].setsockopt = tls_setsockopt; + prot[TLS_BASE_TX].getsockopt = tls_getsockopt; + prot[TLS_BASE_TX].close = tls_sk_proto_close;
prot[TLS_SW_TX] = prot[TLS_BASE_TX]; - prot[TLS_SW_TX].close = tls_sk_proto_close; prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg; prot[TLS_SW_TX].sendpage = tls_sw_sendpage; } diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 6ae9ca567d6c..5996fb5756e1 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -646,7 +646,7 @@ sendpage_end: return ret; }
-static void tls_sw_free_resources(struct sock *sk) +void tls_sw_free_tx_resources(struct sock *sk) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx); @@ -685,7 +685,6 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx) }
ctx->priv_ctx = (struct tls_offload_context *)sw_ctx; - ctx->free_resources = tls_sw_free_resources;
crypto_info = &ctx->crypto_send.info; switch (crypto_info->cipher_type) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 196c31b4b54474b31dee3c30352c45c2a93e9226 upstream.
Avoid copying crypto_info again after the cipher_type check to prevent a TOCTOU exploit. The temporary array on the stack is removed as we don't really need it.
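A reduced sketch of the bug class and the fix (hypothetical types; memcpy stands in for copy_from_user(), and the little-endian layout of the test input is an assumption of the sketch):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Validating a header copied from userspace and then re-copying the
 * whole struct lets userspace flip the validated fields between the
 * two reads.  The fix copies the header once and afterwards fetches
 * only the remaining bytes.
 */
struct info {
	uint16_t version;
	uint16_t cipher;
	unsigned char keys[32];
};

#define HDR_LEN (sizeof(uint16_t) * 2)

static int set_info_buggy(struct info *ctx, const unsigned char *user)
{
	struct info tmp;

	memcpy(&tmp, user, HDR_LEN);		/* first read */
	if (tmp.version != 1)
		return -1;
	memcpy(ctx, user, sizeof(*ctx));	/* second read: 'version'
						 * may have changed! */
	return 0;
}

static int set_info_fixed(struct info *ctx, const unsigned char *user)
{
	memcpy(ctx, user, HDR_LEN);		/* single read of header */
	if (ctx->version != 1)
		return -1;
	memcpy((unsigned char *)ctx + HDR_LEN, user + HDR_LEN,
	       sizeof(*ctx) - HDR_LEN);		/* only remaining bytes */
	return 0;
}

int main(void)
{
	unsigned char user[sizeof(struct info)] = { 1, 0 }; /* version=1 (LE) */
	struct info ctx;

	printf("buggy=%d fixed=%d\n",
	       set_info_buggy(&ctx, user), set_info_fixed(&ctx, user));
	return 0;
}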
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin ilyal@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
[bwh: Backported to 4.14: Preserve changes made by earlier backports
 of "tls: return -EBUSY if crypto_info is already set" and "tls: zero
 the crypto information from tls_context before freeing"]
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/tls/tls_main.c | 33 ++++++++++++++-------------------
 1 file changed, 14 insertions(+), 19 deletions(-)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index b5f9c578bcf0..f88df514ad5f 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -381,7 +381,7 @@ static int tls_getsockopt(struct sock *sk, int level, int optname, static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval, unsigned int optlen) { - struct tls_crypto_info *crypto_info, tmp_crypto_info; + struct tls_crypto_info *crypto_info; struct tls_context *ctx = tls_get_ctx(sk); int rc = 0; int tx_conf; @@ -391,38 +391,33 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval, goto out; }
- rc = copy_from_user(&tmp_crypto_info, optval, sizeof(*crypto_info)); + crypto_info = &ctx->crypto_send.info; + /* Currently we don't support set crypto info more than one time */ + if (TLS_CRYPTO_INFO_READY(crypto_info)) { + rc = -EBUSY; + goto out; + } + + rc = copy_from_user(crypto_info, optval, sizeof(*crypto_info)); if (rc) { rc = -EFAULT; goto out; }
/* check version */ - if (tmp_crypto_info.version != TLS_1_2_VERSION) { + if (crypto_info->version != TLS_1_2_VERSION) { rc = -ENOTSUPP; - goto out; - } - - /* get user crypto info */ - crypto_info = &ctx->crypto_send.info; - - /* Currently we don't support set crypto info more than one time */ - if (TLS_CRYPTO_INFO_READY(crypto_info)) { - rc = -EBUSY; - goto out; + goto err_crypto_info; }
- switch (tmp_crypto_info.cipher_type) { + switch (crypto_info->cipher_type) { case TLS_CIPHER_AES_GCM_128: { if (optlen != sizeof(struct tls12_crypto_info_aes_gcm_128)) { rc = -EINVAL; goto err_crypto_info; } - rc = copy_from_user( - crypto_info, - optval, - sizeof(struct tls12_crypto_info_aes_gcm_128)); - + rc = copy_from_user(crypto_info + 1, optval + sizeof(*crypto_info), + optlen - sizeof(*crypto_info)); if (rc) { rc = -EFAULT; goto err_crypto_info;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit ee181e5201e640a4b92b217e9eab2531dab57d2c upstream.
If we fail to enable tls in the kernel, we shouldn't override the sk_write_space callback.
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Ilya Lesokhin ilyal@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/tls/tls_main.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index f88df514ad5f..33187e34599b 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -429,9 +429,6 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 		goto err_crypto_info;
 	}
 
-	ctx->sk_write_space = sk->sk_write_space;
-	sk->sk_write_space = tls_write_space;
-
 	/* currently SW is default, we will have ethtool in future */
 	rc = tls_set_sw_offload(sk, ctx);
 	tx_conf = TLS_SW_TX;
@@ -440,6 +437,8 @@ static int do_tls_setsockopt_tx(struct sock *sk, char __user *optval,
 
 	ctx->tx_conf = tx_conf;
 	update_sk_prot(sk, ctx);
+	ctx->sk_write_space = sk->sk_write_space;
+	sk->sk_write_space = tls_write_space;
 	goto out;
err_crypto_info:
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit c113187d38ff85dc302a1bb55864b203ebb2ba10 upstream.
The tls ulp overrides sk->sk_prot with new tls specific proto structs. The tls specific structs were previously based on the ipv4 specific tcp_prot struct. As a result, attaching the tls ulp to an ipv6 tcp socket replaced some ipv6 callbacks with the ipv4 equivalents.
This patch adds ipv6 tls proto structs and uses them when attached to ipv6 sockets.
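The notable detail is how the IPv6 table is built lazily, once per tcpv6_prot address, and published safely. A reduced model follows (illustrative names; C11 atomics and pthreads stand in for the kernel's smp_load_acquire()/smp_store_release() and mutex):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* Sketch of the once-only, lazily triggered build of the IPv6 proto
 * table: readers take a cheap acquire-load fast path; the first IPv6
 * attach builds the table under a mutex and publishes with release.
 */
struct proto { int dummy; };

static _Atomic(const struct proto *) saved_tcpv6_prot;
static pthread_mutex_t tcpv6_prot_mutex = PTHREAD_MUTEX_INITIALIZER;

static void build_protos(const struct proto *base)
{
	(void)base;	/* stand-in for filling tls_prots[TLSV6][...] */
}

static void maybe_build_v6(const struct proto *sk_prot)
{
	/* fast path: table already built against this base proto */
	if (atomic_load_explicit(&saved_tcpv6_prot,
				 memory_order_acquire) == sk_prot)
		return;

	pthread_mutex_lock(&tcpv6_prot_mutex);
	if (saved_tcpv6_prot != sk_prot) {	/* re-check under the lock */
		build_protos(sk_prot);
		atomic_store_explicit(&saved_tcpv6_prot, sk_prot,
				      memory_order_release);
	}
	pthread_mutex_unlock(&tcpv6_prot_mutex);
}

int main(void)
{
	const struct proto tcpv6 = { 0 };

	maybe_build_v6(&tcpv6);	/* builds once */
	maybe_build_v6(&tcpv6);	/* fast path   */
	printf("built=%d\n", saved_tcpv6_prot == &tcpv6);
	return 0;
}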
Fixes: 3c4d7559159b ('tls: kernel TLS support')
Signed-off-by: Boris Pismenny borisp@mellanox.com
Signed-off-by: Ilya Lesokhin ilyal@mellanox.com
Signed-off-by: David S. Miller davem@davemloft.net
Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/tls/tls_main.c | 52 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 37 insertions(+), 15 deletions(-)
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c index 33187e34599b..e903bdd39b9f 100644 --- a/net/tls/tls_main.c +++ b/net/tls/tls_main.c @@ -46,17 +46,27 @@ MODULE_DESCRIPTION("Transport Layer Security Support"); MODULE_LICENSE("Dual BSD/GPL"); MODULE_ALIAS_TCP_ULP("tls");
+enum { + TLSV4, + TLSV6, + TLS_NUM_PROTS, +}; + enum { TLS_BASE_TX, TLS_SW_TX, TLS_NUM_CONFIG, };
-static struct proto tls_prots[TLS_NUM_CONFIG]; +static struct proto *saved_tcpv6_prot; +static DEFINE_MUTEX(tcpv6_prot_mutex); +static struct proto tls_prots[TLS_NUM_PROTS][TLS_NUM_CONFIG];
static inline void update_sk_prot(struct sock *sk, struct tls_context *ctx) { - sk->sk_prot = &tls_prots[ctx->tx_conf]; + int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4; + + sk->sk_prot = &tls_prots[ip_ver][ctx->tx_conf]; }
int wait_on_pending_writer(struct sock *sk, long *timeo) @@ -476,8 +486,21 @@ static int tls_setsockopt(struct sock *sk, int level, int optname, return do_tls_setsockopt(sk, optname, optval, optlen); }
+static void build_protos(struct proto *prot, struct proto *base) +{ + prot[TLS_BASE_TX] = *base; + prot[TLS_BASE_TX].setsockopt = tls_setsockopt; + prot[TLS_BASE_TX].getsockopt = tls_getsockopt; + prot[TLS_BASE_TX].close = tls_sk_proto_close; + + prot[TLS_SW_TX] = prot[TLS_BASE_TX]; + prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg; + prot[TLS_SW_TX].sendpage = tls_sw_sendpage; +} + static int tls_init(struct sock *sk) { + int ip_ver = sk->sk_family == AF_INET6 ? TLSV6 : TLSV4; struct inet_connection_sock *icsk = inet_csk(sk); struct tls_context *ctx; int rc = 0; @@ -502,6 +525,17 @@ static int tls_init(struct sock *sk) ctx->getsockopt = sk->sk_prot->getsockopt; ctx->sk_proto_close = sk->sk_prot->close;
+ /* Build IPv6 TLS whenever the address of tcpv6_prot changes */ + if (ip_ver == TLSV6 && + unlikely(sk->sk_prot != smp_load_acquire(&saved_tcpv6_prot))) { + mutex_lock(&tcpv6_prot_mutex); + if (likely(sk->sk_prot != saved_tcpv6_prot)) { + build_protos(tls_prots[TLSV6], sk->sk_prot); + smp_store_release(&saved_tcpv6_prot, sk->sk_prot); + } + mutex_unlock(&tcpv6_prot_mutex); + } + ctx->tx_conf = TLS_BASE_TX; update_sk_prot(sk, ctx); out: @@ -514,21 +548,9 @@ static struct tcp_ulp_ops tcp_tls_ulp_ops __read_mostly = { .init = tls_init, };
-static void build_protos(struct proto *prot, struct proto *base) -{ - prot[TLS_BASE_TX] = *base; - prot[TLS_BASE_TX].setsockopt = tls_setsockopt; - prot[TLS_BASE_TX].getsockopt = tls_getsockopt; - prot[TLS_BASE_TX].close = tls_sk_proto_close; - - prot[TLS_SW_TX] = prot[TLS_BASE_TX]; - prot[TLS_SW_TX].sendmsg = tls_sw_sendmsg; - prot[TLS_SW_TX].sendpage = tls_sw_sendpage; -} - static int __init tls_register(void) { - build_protos(tls_prots, &tcp_prot); + build_protos(tls_prots[TLSV4], &tcp_prot);
tcp_register_ulp(&tcp_tls_ulp_ops);
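The interesting mechanism above is the lazy, lock-protected rebuild of the ipv6 proto table, published with smp_store_release() and re-checked with smp_load_acquire(). A rough userspace analogue of that pattern using C11 atomics and a pthread mutex follows; this is a sketch of the idiom only, with invented names (the kernel rebuilds on pointer change because tcpv6_prot can move when ipv6 is a module).

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static _Atomic(const void *) saved_base;	/* last base we built from */
static pthread_mutex_t build_lock = PTHREAD_MUTEX_INITIALIZER;

static void build_table(const void *base)
{
	printf("rebuilding derived table for base %p\n", base);
}

static void ensure_table(const void *base)
{
	/* Fast path: acquire-load pairs with the release-store below. */
	if (atomic_load_explicit(&saved_base, memory_order_acquire) == base)
		return;

	pthread_mutex_lock(&build_lock);
	if (atomic_load_explicit(&saved_base, memory_order_relaxed) != base) {
		build_table(base);
		/* Publish only after the table is fully built. */
		atomic_store_explicit(&saved_base, base, memory_order_release);
	}
	pthread_mutex_unlock(&build_lock);
}

int main(void)
{
	int base;

	ensure_table(&base);
	ensure_table(&base);	/* second call takes the fast path */
	return 0;
}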
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 150085791afb8054e11d2e080d4b9cd755dd7f69 upstream.
In tls_sw_sendmsg() and tls_sw_sendpage(), the variable 'ret' is now set to the return value of tls_complete_pending_work(), so that a proper error code is returned if tls_complete_pending_work() fails.
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Vakul Garg vakul.garg@nxp.com Signed-off-by: David S. Miller davem@davemloft.net [bwh: Backported to 4.14: adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- net/tls/tls_sw.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 5996fb5756e1..d18d4a478e4f 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -388,7 +388,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size) { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx); - int ret = 0; + int ret; int required_size; long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); bool eor = !(msg->msg_flags & MSG_MORE); @@ -403,7 +403,8 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
lock_sock(sk);
- if (tls_complete_pending_work(sk, tls_ctx, msg->msg_flags, &timeo)) + ret = tls_complete_pending_work(sk, tls_ctx, msg->msg_flags, &timeo); + if (ret) goto send_end;
if (unlikely(msg->msg_controllen)) { @@ -539,7 +540,7 @@ int tls_sw_sendpage(struct sock *sk, struct page *page, { struct tls_context *tls_ctx = tls_get_ctx(sk); struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx); - int ret = 0; + int ret; long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT); bool eor; size_t orig_size = size; @@ -559,7 +560,8 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
- if (tls_complete_pending_work(sk, tls_ctx, flags, &timeo)) + ret = tls_complete_pending_work(sk, tls_ctx, flags, &timeo); + if (ret) goto sendpage_end;
/* Call the sk_stream functions to manage the sndbuf mem. */
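The bug pattern above, testing a helper's return value in an if () and then discarding it, is easy to reproduce stand-alone. A hypothetical sketch, with the error value picked arbitrarily:

#include <stdio.h>

static int complete_pending_work(void) { return -11; }	/* e.g. -EAGAIN */

static int sendmsg_stub(void)
{
	/* Capture the helper's status instead of testing and dropping
	 * it, so the caller sees the real error, not a stale 0. */
	int ret = complete_pending_work();

	if (ret)
		return ret;
	return 0;
}

int main(void)
{
	printf("sendmsg_stub() = %d\n", sendmsg_stub());
	return 0;
}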
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit b5a8ffcae4103a9d823ea3aa3a761f65779fbe2a upstream.
Add a length check in wmi_set_ie to detect unsigned integer overflow.
Signed-off-by: Lior David qca_liord@qca.qualcomm.com Signed-off-by: Maya Erez qca_merez@qca.qualcomm.com Signed-off-by: Kalle Valo kvalo@qca.qualcomm.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/wil6210/wmi.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/wil6210/wmi.c b/drivers/net/wireless/ath/wil6210/wmi.c index ffdd2fa401b1..d63d7c326801 100644 --- a/drivers/net/wireless/ath/wil6210/wmi.c +++ b/drivers/net/wireless/ath/wil6210/wmi.c @@ -1380,8 +1380,14 @@ int wmi_set_ie(struct wil6210_priv *wil, u8 type, u16 ie_len, const void *ie) }; int rc; u16 len = sizeof(struct wmi_set_appie_cmd) + ie_len; - struct wmi_set_appie_cmd *cmd = kzalloc(len, GFP_KERNEL); + struct wmi_set_appie_cmd *cmd;
+ if (len < ie_len) { + rc = -EINVAL; + goto out; + } + + cmd = kzalloc(len, GFP_KERNEL); if (!cmd) { rc = -ENOMEM; goto out;
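The guard "if (len < ie_len)" works because len is a u16: if the addition wrapped around, the truncated sum is necessarily smaller than either operand. A small stand-alone demonstration, with values chosen to force the wrap:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint16_t hdr_size = 8;		/* stands in for the cmd header size */
	uint16_t ie_len = 65530;
	uint16_t len = (uint16_t)(hdr_size + ie_len);	/* wraps to 2 */

	if (len < ie_len)
		printf("overflow detected: len=%u ie_len=%u\n",
		       (unsigned)len, (unsigned)ie_len);
	return 0;
}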
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 315409b0098fb2651d86553f0436b70502b29bb2 upstream.
Reported in https://bugzilla.kernel.org/show_bug.cgi?id=199839, with an image that has an invalid chunk type but is not rejected with an error.
Add a chunk type check in btrfs_check_chunk_valid to detect invalid type combinations.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199839 Reported-by: Xu Wen wen.xu@gatech.edu Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Gu Jinxiang gujx@cn.fujitsu.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index a0947f4a3e87..cfd5728e7519 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6353,6 +6353,8 @@ static int btrfs_check_chunk_valid(struct btrfs_fs_info *fs_info, u16 num_stripes; u16 sub_stripes; u64 type; + u64 features; + bool mixed = false;
length = btrfs_chunk_length(leaf, chunk); stripe_len = btrfs_chunk_stripe_len(leaf, chunk); @@ -6391,6 +6393,32 @@ static int btrfs_check_chunk_valid(struct btrfs_fs_info *fs_info, btrfs_chunk_type(leaf, chunk)); return -EIO; } + + if ((type & BTRFS_BLOCK_GROUP_TYPE_MASK) == 0) { + btrfs_err(fs_info, "missing chunk type flag: 0x%llx", type); + return -EIO; + } + + if ((type & BTRFS_BLOCK_GROUP_SYSTEM) && + (type & (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA))) { + btrfs_err(fs_info, + "system chunk with data or metadata type: 0x%llx", type); + return -EIO; + } + + features = btrfs_super_incompat_flags(fs_info->super_copy); + if (features & BTRFS_FEATURE_INCOMPAT_MIXED_GROUPS) + mixed = true; + + if (!mixed) { + if ((type & BTRFS_BLOCK_GROUP_METADATA) && + (type & BTRFS_BLOCK_GROUP_DATA)) { + btrfs_err(fs_info, + "mixed chunk type in non-mixed mode: 0x%llx", type); + return -EIO; + } + } + if ((type & BTRFS_BLOCK_GROUP_RAID10 && sub_stripes != 2) || (type & BTRFS_BLOCK_GROUP_RAID1 && num_stripes < 1) || (type & BTRFS_BLOCK_GROUP_RAID5 && num_stripes < 2) ||
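The new checks are pure bitmask logic over the chunk type. Here is a compiled sketch of the same rules with made-up flag values; the real BTRFS_BLOCK_GROUP_* constants differ.

#include <stdint.h>
#include <stdio.h>

#define BG_DATA      (1ULL << 0)
#define BG_SYSTEM    (1ULL << 1)
#define BG_METADATA  (1ULL << 2)
#define BG_TYPE_MASK (BG_DATA | BG_SYSTEM | BG_METADATA)

static int chunk_type_valid(uint64_t type, int mixed)
{
	if ((type & BG_TYPE_MASK) == 0)
		return 0;	/* no type flag at all */
	if ((type & BG_SYSTEM) && (type & (BG_DATA | BG_METADATA)))
		return 0;	/* SYSTEM may not mix with data/metadata */
	if (!mixed && (type & BG_DATA) && (type & BG_METADATA))
		return 0;	/* DATA|METADATA requires mixed mode */
	return 1;
}

int main(void)
{
	printf("%d %d\n",
	       chunk_type_valid(BG_DATA, 0),			/* 1 */
	       chunk_type_valid(BG_SYSTEM | BG_DATA, 0));	/* 0 */
	return 0;
}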
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 7ef49515fa6727cb4b6f2f5b0ffbc5fc20a9f8c6 upstream.
If a crafted image has missing block group items, it could cause unexpected behavior and break the assumption of a 1:1 chunk<->block group mapping.
Although we have the block group -> chunk mapping check, we still need the reverse chunk -> block group mapping check.
This patch adds an extra check to ensure each chunk has its corresponding block group.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847 Reported-by: Xu Wen wen.xu@gatech.edu Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Gu Jinxiang gujx@cn.fujitsu.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent-tree.c | 58 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 57 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 2cb3569ac548..fdc42eddccc2 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -10092,6 +10092,62 @@ btrfs_create_block_group_cache(struct btrfs_fs_info *fs_info, return cache; }
+ +/* + * Iterate all chunks and verify that each of them has the corresponding block + * group + */ +static int check_chunk_block_group_mappings(struct btrfs_fs_info *fs_info) +{ + struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree; + struct extent_map *em; + struct btrfs_block_group_cache *bg; + u64 start = 0; + int ret = 0; + + while (1) { + read_lock(&map_tree->map_tree.lock); + /* + * lookup_extent_mapping will return the first extent map + * intersecting the range, so setting @len to 1 is enough to + * get the first chunk. + */ + em = lookup_extent_mapping(&map_tree->map_tree, start, 1); + read_unlock(&map_tree->map_tree.lock); + if (!em) + break; + + bg = btrfs_lookup_block_group(fs_info, em->start); + if (!bg) { + btrfs_err(fs_info, + "chunk start=%llu len=%llu doesn't have corresponding block group", + em->start, em->len); + ret = -EUCLEAN; + free_extent_map(em); + break; + } + if (bg->key.objectid != em->start || + bg->key.offset != em->len || + (bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK) != + (em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK)) { + btrfs_err(fs_info, +"chunk start=%llu len=%llu flags=0x%llx doesn't match block group start=%llu len=%llu flags=0x%llx", + em->start, em->len, + em->map_lookup->type & BTRFS_BLOCK_GROUP_TYPE_MASK, + bg->key.objectid, bg->key.offset, + bg->flags & BTRFS_BLOCK_GROUP_TYPE_MASK); + ret = -EUCLEAN; + free_extent_map(em); + btrfs_put_block_group(bg); + break; + } + start = em->start + em->len; + free_extent_map(em); + btrfs_put_block_group(bg); + } + return ret; +} + int btrfs_read_block_groups(struct btrfs_fs_info *info) { struct btrfs_path *path; @@ -10264,7 +10320,7 @@ int btrfs_read_block_groups(struct btrfs_fs_info *info) }
init_global_block_rsv(info); - ret = 0; + ret = check_chunk_block_group_mappings(info); error: btrfs_free_path(path); return ret;
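The loop above visits every chunk exactly once by advancing a cursor past each extent map it finds. A simplified userspace model of that cursor pattern over a sorted range table; the types and data are invented for illustration.

#include <stdint.h>
#include <stdio.h>

struct range { uint64_t start, len; };

static const struct range chunks[] = { {0, 16}, {16, 32}, {48, 8} };

/* Return the first chunk whose range ends past @pos, i.e. the chunk
 * covering @pos or the next one after it; NULL when none is left. */
static const struct range *first_range_at(uint64_t pos)
{
	for (unsigned int i = 0; i < sizeof(chunks) / sizeof(chunks[0]); i++)
		if (chunks[i].start + chunks[i].len > pos)
			return &chunks[i];
	return NULL;
}

int main(void)
{
	const struct range *r;
	uint64_t pos = 0;

	while ((r = first_range_at(pos)) != NULL) {
		/* here the real code cross-checks the block group */
		printf("chunk start=%llu len=%llu\n",
		       (unsigned long long)r->start,
		       (unsigned long long)r->len);
		pos = r->start + r->len;	/* advance past this chunk */
	}
	return 0;
}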
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit c3267bbaa9cae09b62960eafe33ad19196803285 upstream.
Current check_leaf() function does a good job checking key order and item offset/size.
However, it only checks from slot 0 to the last-but-one slot; this is good but makes later expansion hard.
So this refactoring iterates from slot 0 to the last slot. For key comparison, it uses an all-zero key as the initial key, so all valid keys should be larger than it.
For the item size/offset checks, it compares the current item's end with the previous item's offset; for slot 0, the leaf end is used as a special case.
This makes later item/key offset checks and item size checks easier to implement.
Also, make check_leaf() return -EUCLEAN rather than -EIO to indicate an error.
Signed-off-by: Qu Wenruo quwenruo.btrfs@gmx.com Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 50 +++++++++++++++++++++++++--------------------- 1 file changed, 27 insertions(+), 23 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 0e67cee73c53..4a1e63df1183 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -554,8 +554,9 @@ static noinline int check_leaf(struct btrfs_root *root, struct extent_buffer *leaf) { struct btrfs_fs_info *fs_info = root->fs_info; + /* No valid key type is 0, so all key should be larger than this key */ + struct btrfs_key prev_key = {0, 0, 0}; struct btrfs_key key; - struct btrfs_key leaf_key; u32 nritems = btrfs_header_nritems(leaf); int slot;
@@ -588,7 +589,7 @@ static noinline int check_leaf(struct btrfs_root *root, CORRUPT("non-root leaf's nritems is 0", leaf, check_root, 0); free_extent_buffer(eb); - return -EIO; + return -EUCLEAN; } free_extent_buffer(eb); } @@ -598,28 +599,23 @@ static noinline int check_leaf(struct btrfs_root *root, if (nritems == 0) return 0;
- /* Check the 0 item */ - if (btrfs_item_offset_nr(leaf, 0) + btrfs_item_size_nr(leaf, 0) != - BTRFS_LEAF_DATA_SIZE(fs_info)) { - CORRUPT("invalid item offset size pair", leaf, root, 0); - return -EIO; - } - /* - * Check to make sure each items keys are in the correct order and their - * offsets make sense. We only have to loop through nritems-1 because - * we check the current slot against the next slot, which verifies the - * next slot's offset+size makes sense and that the current's slot - * offset is correct. + * Check the following things to make sure this is a good leaf, and + * leaf users won't need to bother with similar sanity checks: + * + * 1) key order + * 2) item offset and size + * No overlap, no hole, all inside the leaf. */ - for (slot = 0; slot < nritems - 1; slot++) { - btrfs_item_key_to_cpu(leaf, &leaf_key, slot); - btrfs_item_key_to_cpu(leaf, &key, slot + 1); + for (slot = 0; slot < nritems; slot++) { + u32 item_end_expected; + + btrfs_item_key_to_cpu(leaf, &key, slot);
/* Make sure the keys are in the right order */ - if (btrfs_comp_cpu_keys(&leaf_key, &key) >= 0) { + if (btrfs_comp_cpu_keys(&prev_key, &key) >= 0) { CORRUPT("bad key order", leaf, root, slot); - return -EIO; + return -EUCLEAN; }
/* @@ -627,10 +623,14 @@ static noinline int check_leaf(struct btrfs_root *root, * item data starts at the end of the leaf and grows towards the * front. */ - if (btrfs_item_offset_nr(leaf, slot) != - btrfs_item_end_nr(leaf, slot + 1)) { + if (slot == 0) + item_end_expected = BTRFS_LEAF_DATA_SIZE(fs_info); + else + item_end_expected = btrfs_item_offset_nr(leaf, + slot - 1); + if (btrfs_item_end_nr(leaf, slot) != item_end_expected) { CORRUPT("slot offset bad", leaf, root, slot); - return -EIO; + return -EUCLEAN; }
/* @@ -641,8 +641,12 @@ static noinline int check_leaf(struct btrfs_root *root, if (btrfs_item_end_nr(leaf, slot) > BTRFS_LEAF_DATA_SIZE(fs_info)) { CORRUPT("slot end outside of leaf", leaf, root, slot); - return -EIO; + return -EUCLEAN; } + + prev_key.objectid = key.objectid; + prev_key.type = key.type; + prev_key.offset = key.offset; }
return 0;
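The refactoring hinges on one idiom: seed the loop with a sentinel "previous" value smaller than anything valid, so slot 0 needs no special-case comparison. A tiny stand-alone version of that idiom with plain integers:

#include <stdio.h>

int main(void)
{
	int keys[] = { 3, 5, 5, 9 };	/* 5,5 violates strict ordering */
	int prev = -1;			/* sentinel below all valid keys */

	for (unsigned int slot = 0; slot < 4; slot++) {
		if (prev >= keys[slot]) {
			printf("bad key order at slot %u\n", slot);
			return 1;
		}
		prev = keys[slot];
	}
	return 0;
}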
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 7f43d4affb2a254d421ab20b0cf65ac2569909fb upstream.
Function check_leaf() checks if any item pointer points outside of the leaf, but it doesn't check if the pointer overlaps with the item itself.
Normally only the last item may be the victim, but adding such a check is never a bad idea anyway.
Signed-off-by: Qu Wenruo quwenruo.btrfs@gmx.com Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 4a1e63df1183..da7b2039e4cb 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -644,6 +644,13 @@ static noinline int check_leaf(struct btrfs_root *root, return -EUCLEAN; }
+ /* Also check if the item pointer overlaps with btrfs item. */ + if (btrfs_item_nr_offset(slot) + sizeof(struct btrfs_item) > + btrfs_item_ptr_offset(leaf, slot)) { + CORRUPT("slot overlap with its data", leaf, root, slot); + return -EUCLEAN; + } + prev_key.objectid = key.objectid; prev_key.type = key.type; prev_key.offset = key.offset;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 40c3c40947324d9f40bf47830c92c59a9bbadf4a upstream.
Add extra checks for items with the EXTENT_DATA type. This checks the following things:
0) Key offset
All key offsets must be aligned to sectorsize. Inline extents must have a key offset of 0.
1) Item size
The uncompressed inline file extent size must match the item size. (Compressed inline file extents carry no information about their on-disk size.) Regular/preallocated file extents must have a fixed item size.
2) Every member of a regular file extent item
Including alignment for bytenr and offset, and the possible values for compression/encryption/type.
3) Type/compression/encode must be one of the valid values.
This should be the most comprehensive and strict check in the context of btrfs_item for EXTENT_DATA.
Signed-off-by: Qu Wenruo quwenruo.btrfs@gmx.com Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com [ switch to BTRFS_FILE_EXTENT_TYPES, similar to what BTRFS_COMPRESS_TYPES does ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 103 ++++++++++++++++++++++++++++++++ include/uapi/linux/btrfs_tree.h | 1 + 2 files changed, 104 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index da7b2039e4cb..ab8925b2efd1 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -550,6 +550,100 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, btrfs_header_level(eb) == 0 ? "leaf" : "node", \ reason, btrfs_header_bytenr(eb), root->objectid, slot)
+static int check_extent_data_item(struct btrfs_root *root, + struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + struct btrfs_file_extent_item *fi; + u32 sectorsize = root->fs_info->sectorsize; + u32 item_size = btrfs_item_size_nr(leaf, slot); + + if (!IS_ALIGNED(key->offset, sectorsize)) { + CORRUPT("unaligned key offset for file extent", + leaf, root, slot); + return -EUCLEAN; + } + + fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); + + if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) { + CORRUPT("invalid file extent type", leaf, root, slot); + return -EUCLEAN; + } + + /* + * Support for new compression/encrption must introduce incompat flag, + * and must be caught in open_ctree(). + */ + if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) { + CORRUPT("invalid file extent compression", leaf, root, slot); + return -EUCLEAN; + } + if (btrfs_file_extent_encryption(leaf, fi)) { + CORRUPT("invalid file extent encryption", leaf, root, slot); + return -EUCLEAN; + } + if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) { + /* Inline extent must have 0 as key offset */ + if (key->offset) { + CORRUPT("inline extent has non-zero key offset", + leaf, root, slot); + return -EUCLEAN; + } + + /* Compressed inline extent has no on-disk size, skip it */ + if (btrfs_file_extent_compression(leaf, fi) != + BTRFS_COMPRESS_NONE) + return 0; + + /* Uncompressed inline extent size must match item size */ + if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START + + btrfs_file_extent_ram_bytes(leaf, fi)) { + CORRUPT("plaintext inline extent has invalid size", + leaf, root, slot); + return -EUCLEAN; + } + return 0; + } + + /* Regular or preallocated extent has fixed item size */ + if (item_size != sizeof(*fi)) { + CORRUPT( + "regluar or preallocated extent data item size is invalid", + leaf, root, slot); + return -EUCLEAN; + } + if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) { + CORRUPT( + "regular or preallocated extent data item has unaligned value", + leaf, root, slot); + return -EUCLEAN; + } + + return 0; +} + +/* + * Common point to switch the item-specific validation. + */ +static int check_leaf_item(struct btrfs_root *root, + struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + int ret = 0; + + switch (key->type) { + case BTRFS_EXTENT_DATA_KEY: + ret = check_extent_data_item(root, leaf, key, slot); + break; + } + return ret; +} + static noinline int check_leaf(struct btrfs_root *root, struct extent_buffer *leaf) { @@ -606,9 +700,13 @@ static noinline int check_leaf(struct btrfs_root *root, * 1) key order * 2) item offset and size * No overlap, no hole, all inside the leaf. + * 3) item content + * If possible, do comprehensive sanity check. + * NOTE: All checks must only rely on the item data itself. */ for (slot = 0; slot < nritems; slot++) { u32 item_end_expected; + int ret;
btrfs_item_key_to_cpu(leaf, &key, slot);
@@ -651,6 +749,11 @@ static noinline int check_leaf(struct btrfs_root *root, return -EUCLEAN; }
+ /* Check if the item size and content meet other criteria */ + ret = check_leaf_item(root, leaf, &key, slot); + if (ret < 0) + return ret; + prev_key.objectid = key.objectid; prev_key.type = key.type; prev_key.offset = key.offset; diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h index 7115838fbf2a..38ab0e06259a 100644 --- a/include/uapi/linux/btrfs_tree.h +++ b/include/uapi/linux/btrfs_tree.h @@ -734,6 +734,7 @@ struct btrfs_balance_item { #define BTRFS_FILE_EXTENT_INLINE 0 #define BTRFS_FILE_EXTENT_REG 1 #define BTRFS_FILE_EXTENT_PREALLOC 2 +#define BTRFS_FILE_EXTENT_TYPES 2
struct btrfs_file_extent_item { /*
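Most of the new alignment tests use the kernel's IS_ALIGNED() helper, which works for any power-of-two alignment by testing the low bits. A stand-alone equivalent; the macro below mimics the one in include/linux/kernel.h and is written out here for illustration.

#include <stdint.h>
#include <stdio.h>

/* x is aligned to power-of-two a iff its low bits are all clear */
#define IS_ALIGNED(x, a) (((x) & ((uint64_t)(a) - 1)) == 0)

int main(void)
{
	uint32_t sectorsize = 4096;

	printf("%d %d\n",
	       (int)IS_ALIGNED(8192, sectorsize),	/* 1: aligned */
	       (int)IS_ALIGNED(4100, sectorsize));	/* 0: not aligned */
	return 0;
}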
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 4b865cab96fe2a30ed512cf667b354bd291b3b0a upstream.
The EXTENT_CSUM checker is a relatively easy one; it only needs to check:
1) Objectid
Fixed to BTRFS_EXTENT_CSUM_OBJECTID.
2) Key offset alignment
Must be aligned to sectorsize.
3) Item size alignment
Must be aligned to the csum size.
Signed-off-by: Qu Wenruo quwenruo.btrfs@gmx.com Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index ab8925b2efd1..53841d773a40 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -627,6 +627,27 @@ static int check_extent_data_item(struct btrfs_root *root, return 0; }
+static int check_csum_item(struct btrfs_root *root, struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + u32 sectorsize = root->fs_info->sectorsize; + u32 csumsize = btrfs_super_csum_size(root->fs_info->super_copy); + + if (key->objectid != BTRFS_EXTENT_CSUM_OBJECTID) { + CORRUPT("invalid objectid for csum item", leaf, root, slot); + return -EUCLEAN; + } + if (!IS_ALIGNED(key->offset, sectorsize)) { + CORRUPT("unaligned key offset for csum item", leaf, root, slot); + return -EUCLEAN; + } + if (!IS_ALIGNED(btrfs_item_size_nr(leaf, slot), csumsize)) { + CORRUPT("unaligned csum item size", leaf, root, slot); + return -EUCLEAN; + } + return 0; +} + /* * Common point to switch the item-specific validation. */ @@ -640,6 +661,9 @@ static int check_leaf_item(struct btrfs_root *root, case BTRFS_EXTENT_DATA_KEY: ret = check_extent_data_item(root, leaf, key, slot); break; + case BTRFS_EXTENT_CSUM_KEY: + ret = check_csum_item(root, leaf, key, slot); + break; } return ret; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 557ea5dd003d371536f6b4e8f7c8209a2b6fd4e3 upstream.
There is no doubt the comprehensive tree block checkers will keep growing, so moving them into their own files is quite reasonable.
Signed-off-by: Qu Wenruo quwenruo.btrfs@gmx.com [ wording adjustments ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/Makefile | 2 +- fs/btrfs/disk-io.c | 285 +----------------------------------- fs/btrfs/tree-checker.c | 309 ++++++++++++++++++++++++++++++++++++++++ fs/btrfs/tree-checker.h | 26 ++++ 4 files changed, 340 insertions(+), 282 deletions(-) create mode 100644 fs/btrfs/tree-checker.c create mode 100644 fs/btrfs/tree-checker.h
diff --git a/fs/btrfs/Makefile b/fs/btrfs/Makefile index f2cd9dedb037..195229df5ba0 100644 --- a/fs/btrfs/Makefile +++ b/fs/btrfs/Makefile @@ -10,7 +10,7 @@ btrfs-y += super.o ctree.o extent-tree.o print-tree.o root-tree.o dir-item.o \ export.o tree-log.o free-space-cache.o zlib.o lzo.o zstd.o \ compression.o delayed-ref.o relocation.o delayed-inode.o scrub.o \ reada.o backref.o ulist.o qgroup.o send.o dev-replace.o raid56.o \ - uuid-tree.o props.o hash.o free-space-tree.o + uuid-tree.o props.o hash.o free-space-tree.o tree-checker.o
btrfs-$(CONFIG_BTRFS_FS_POSIX_ACL) += acl.o btrfs-$(CONFIG_BTRFS_FS_CHECK_INTEGRITY) += check-integrity.o diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 53841d773a40..2df5e906db08 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -50,6 +50,7 @@ #include "sysfs.h" #include "qgroup.h" #include "compression.h" +#include "tree-checker.h"
#ifdef CONFIG_X86 #include <asm/cpufeature.h> @@ -544,284 +545,6 @@ static int check_tree_block_fsid(struct btrfs_fs_info *fs_info, return ret; }
-#define CORRUPT(reason, eb, root, slot) \ - btrfs_crit(root->fs_info, \ - "corrupt %s, %s: block=%llu, root=%llu, slot=%d", \ - btrfs_header_level(eb) == 0 ? "leaf" : "node", \ - reason, btrfs_header_bytenr(eb), root->objectid, slot) - -static int check_extent_data_item(struct btrfs_root *root, - struct extent_buffer *leaf, - struct btrfs_key *key, int slot) -{ - struct btrfs_file_extent_item *fi; - u32 sectorsize = root->fs_info->sectorsize; - u32 item_size = btrfs_item_size_nr(leaf, slot); - - if (!IS_ALIGNED(key->offset, sectorsize)) { - CORRUPT("unaligned key offset for file extent", - leaf, root, slot); - return -EUCLEAN; - } - - fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); - - if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) { - CORRUPT("invalid file extent type", leaf, root, slot); - return -EUCLEAN; - } - - /* - * Support for new compression/encrption must introduce incompat flag, - * and must be caught in open_ctree(). - */ - if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) { - CORRUPT("invalid file extent compression", leaf, root, slot); - return -EUCLEAN; - } - if (btrfs_file_extent_encryption(leaf, fi)) { - CORRUPT("invalid file extent encryption", leaf, root, slot); - return -EUCLEAN; - } - if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) { - /* Inline extent must have 0 as key offset */ - if (key->offset) { - CORRUPT("inline extent has non-zero key offset", - leaf, root, slot); - return -EUCLEAN; - } - - /* Compressed inline extent has no on-disk size, skip it */ - if (btrfs_file_extent_compression(leaf, fi) != - BTRFS_COMPRESS_NONE) - return 0; - - /* Uncompressed inline extent size must match item size */ - if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START + - btrfs_file_extent_ram_bytes(leaf, fi)) { - CORRUPT("plaintext inline extent has invalid size", - leaf, root, slot); - return -EUCLEAN; - } - return 0; - } - - /* Regular or preallocated extent has fixed item size */ - if (item_size != sizeof(*fi)) { - CORRUPT( - "regluar or preallocated extent data item size is invalid", - leaf, root, slot); - return -EUCLEAN; - } - if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) || - !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) || - !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi), sectorsize) || - !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) || - !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) { - CORRUPT( - "regular or preallocated extent data item has unaligned value", - leaf, root, slot); - return -EUCLEAN; - } - - return 0; -} - -static int check_csum_item(struct btrfs_root *root, struct extent_buffer *leaf, - struct btrfs_key *key, int slot) -{ - u32 sectorsize = root->fs_info->sectorsize; - u32 csumsize = btrfs_super_csum_size(root->fs_info->super_copy); - - if (key->objectid != BTRFS_EXTENT_CSUM_OBJECTID) { - CORRUPT("invalid objectid for csum item", leaf, root, slot); - return -EUCLEAN; - } - if (!IS_ALIGNED(key->offset, sectorsize)) { - CORRUPT("unaligned key offset for csum item", leaf, root, slot); - return -EUCLEAN; - } - if (!IS_ALIGNED(btrfs_item_size_nr(leaf, slot), csumsize)) { - CORRUPT("unaligned csum item size", leaf, root, slot); - return -EUCLEAN; - } - return 0; -} - -/* - * Common point to switch the item-specific validation. 
- */ -static int check_leaf_item(struct btrfs_root *root, - struct extent_buffer *leaf, - struct btrfs_key *key, int slot) -{ - int ret = 0; - - switch (key->type) { - case BTRFS_EXTENT_DATA_KEY: - ret = check_extent_data_item(root, leaf, key, slot); - break; - case BTRFS_EXTENT_CSUM_KEY: - ret = check_csum_item(root, leaf, key, slot); - break; - } - return ret; -} - -static noinline int check_leaf(struct btrfs_root *root, - struct extent_buffer *leaf) -{ - struct btrfs_fs_info *fs_info = root->fs_info; - /* No valid key type is 0, so all key should be larger than this key */ - struct btrfs_key prev_key = {0, 0, 0}; - struct btrfs_key key; - u32 nritems = btrfs_header_nritems(leaf); - int slot; - - /* - * Extent buffers from a relocation tree have a owner field that - * corresponds to the subvolume tree they are based on. So just from an - * extent buffer alone we can not find out what is the id of the - * corresponding subvolume tree, so we can not figure out if the extent - * buffer corresponds to the root of the relocation tree or not. So skip - * this check for relocation trees. - */ - if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) { - struct btrfs_root *check_root; - - key.objectid = btrfs_header_owner(leaf); - key.type = BTRFS_ROOT_ITEM_KEY; - key.offset = (u64)-1; - - check_root = btrfs_get_fs_root(fs_info, &key, false); - /* - * The only reason we also check NULL here is that during - * open_ctree() some roots has not yet been set up. - */ - if (!IS_ERR_OR_NULL(check_root)) { - struct extent_buffer *eb; - - eb = btrfs_root_node(check_root); - /* if leaf is the root, then it's fine */ - if (leaf != eb) { - CORRUPT("non-root leaf's nritems is 0", - leaf, check_root, 0); - free_extent_buffer(eb); - return -EUCLEAN; - } - free_extent_buffer(eb); - } - return 0; - } - - if (nritems == 0) - return 0; - - /* - * Check the following things to make sure this is a good leaf, and - * leaf users won't need to bother with similar sanity checks: - * - * 1) key order - * 2) item offset and size - * No overlap, no hole, all inside the leaf. - * 3) item content - * If possible, do comprehensive sanity check. - * NOTE: All checks must only rely on the item data itself. - */ - for (slot = 0; slot < nritems; slot++) { - u32 item_end_expected; - int ret; - - btrfs_item_key_to_cpu(leaf, &key, slot); - - /* Make sure the keys are in the right order */ - if (btrfs_comp_cpu_keys(&prev_key, &key) >= 0) { - CORRUPT("bad key order", leaf, root, slot); - return -EUCLEAN; - } - - /* - * Make sure the offset and ends are right, remember that the - * item data starts at the end of the leaf and grows towards the - * front. - */ - if (slot == 0) - item_end_expected = BTRFS_LEAF_DATA_SIZE(fs_info); - else - item_end_expected = btrfs_item_offset_nr(leaf, - slot - 1); - if (btrfs_item_end_nr(leaf, slot) != item_end_expected) { - CORRUPT("slot offset bad", leaf, root, slot); - return -EUCLEAN; - } - - /* - * Check to make sure that we don't point outside of the leaf, - * just in case all the items are consistent to each other, but - * all point outside of the leaf. - */ - if (btrfs_item_end_nr(leaf, slot) > - BTRFS_LEAF_DATA_SIZE(fs_info)) { - CORRUPT("slot end outside of leaf", leaf, root, slot); - return -EUCLEAN; - } - - /* Also check if the item pointer overlaps with btrfs item. 
*/ - if (btrfs_item_nr_offset(slot) + sizeof(struct btrfs_item) > - btrfs_item_ptr_offset(leaf, slot)) { - CORRUPT("slot overlap with its data", leaf, root, slot); - return -EUCLEAN; - } - - /* Check if the item size and content meet other criteria */ - ret = check_leaf_item(root, leaf, &key, slot); - if (ret < 0) - return ret; - - prev_key.objectid = key.objectid; - prev_key.type = key.type; - prev_key.offset = key.offset; - } - - return 0; -} - -static int check_node(struct btrfs_root *root, struct extent_buffer *node) -{ - unsigned long nr = btrfs_header_nritems(node); - struct btrfs_key key, next_key; - int slot; - u64 bytenr; - int ret = 0; - - if (nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)) { - btrfs_crit(root->fs_info, - "corrupt node: block %llu root %llu nritems %lu", - node->start, root->objectid, nr); - return -EIO; - } - - for (slot = 0; slot < nr - 1; slot++) { - bytenr = btrfs_node_blockptr(node, slot); - btrfs_node_key_to_cpu(node, &key, slot); - btrfs_node_key_to_cpu(node, &next_key, slot + 1); - - if (!bytenr) { - CORRUPT("invalid item slot", node, root, slot); - ret = -EIO; - goto out; - } - - if (btrfs_comp_cpu_keys(&key, &next_key) >= 0) { - CORRUPT("bad key order", node, root, slot); - ret = -EIO; - goto out; - } - } -out: - return ret; -} - static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio, u64 phy_offset, struct page *page, u64 start, u64 end, int mirror) @@ -887,12 +610,12 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio, * that we don't try and read the other copies of this block, just * return -EIO. */ - if (found_level == 0 && check_leaf(root, eb)) { + if (found_level == 0 && btrfs_check_leaf(root, eb)) { set_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags); ret = -EIO; }
- if (found_level > 0 && check_node(root, eb)) + if (found_level > 0 && btrfs_check_node(root, eb)) ret = -EIO;
if (!ret) @@ -4147,7 +3870,7 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf) buf->len, fs_info->dirty_metadata_batch); #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY - if (btrfs_header_level(buf) == 0 && check_leaf(root, buf)) { + if (btrfs_header_level(buf) == 0 && btrfs_check_leaf(root, buf)) { btrfs_print_leaf(buf); ASSERT(0); } diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c new file mode 100644 index 000000000000..56e25a630103 --- /dev/null +++ b/fs/btrfs/tree-checker.c @@ -0,0 +1,309 @@ +/* + * Copyright (C) Qu Wenruo 2017. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program. + */ + +/* + * The module is used to catch unexpected/corrupted tree block data. + * Such behavior can be caused either by a fuzzed image or bugs. + * + * The objective is to do leaf/node validation checks when tree block is read + * from disk, and check *every* possible member, so other code won't + * need to checking them again. + * + * Due to the potential and unwanted damage, every checker needs to be + * carefully reviewed otherwise so it does not prevent mount of valid images. + */ + +#include "ctree.h" +#include "tree-checker.h" +#include "disk-io.h" +#include "compression.h" + +#define CORRUPT(reason, eb, root, slot) \ + btrfs_crit(root->fs_info, \ + "corrupt %s, %s: block=%llu, root=%llu, slot=%d", \ + btrfs_header_level(eb) == 0 ? "leaf" : "node", \ + reason, btrfs_header_bytenr(eb), root->objectid, slot) + +static int check_extent_data_item(struct btrfs_root *root, + struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + struct btrfs_file_extent_item *fi; + u32 sectorsize = root->fs_info->sectorsize; + u32 item_size = btrfs_item_size_nr(leaf, slot); + + if (!IS_ALIGNED(key->offset, sectorsize)) { + CORRUPT("unaligned key offset for file extent", + leaf, root, slot); + return -EUCLEAN; + } + + fi = btrfs_item_ptr(leaf, slot, struct btrfs_file_extent_item); + + if (btrfs_file_extent_type(leaf, fi) > BTRFS_FILE_EXTENT_TYPES) { + CORRUPT("invalid file extent type", leaf, root, slot); + return -EUCLEAN; + } + + /* + * Support for new compression/encrption must introduce incompat flag, + * and must be caught in open_ctree(). 
+ */ + if (btrfs_file_extent_compression(leaf, fi) > BTRFS_COMPRESS_TYPES) { + CORRUPT("invalid file extent compression", leaf, root, slot); + return -EUCLEAN; + } + if (btrfs_file_extent_encryption(leaf, fi)) { + CORRUPT("invalid file extent encryption", leaf, root, slot); + return -EUCLEAN; + } + if (btrfs_file_extent_type(leaf, fi) == BTRFS_FILE_EXTENT_INLINE) { + /* Inline extent must have 0 as key offset */ + if (key->offset) { + CORRUPT("inline extent has non-zero key offset", + leaf, root, slot); + return -EUCLEAN; + } + + /* Compressed inline extent has no on-disk size, skip it */ + if (btrfs_file_extent_compression(leaf, fi) != + BTRFS_COMPRESS_NONE) + return 0; + + /* Uncompressed inline extent size must match item size */ + if (item_size != BTRFS_FILE_EXTENT_INLINE_DATA_START + + btrfs_file_extent_ram_bytes(leaf, fi)) { + CORRUPT("plaintext inline extent has invalid size", + leaf, root, slot); + return -EUCLEAN; + } + return 0; + } + + /* Regular or preallocated extent has fixed item size */ + if (item_size != sizeof(*fi)) { + CORRUPT( + "regluar or preallocated extent data item size is invalid", + leaf, root, slot); + return -EUCLEAN; + } + if (!IS_ALIGNED(btrfs_file_extent_ram_bytes(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_disk_bytenr(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_disk_num_bytes(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_offset(leaf, fi), sectorsize) || + !IS_ALIGNED(btrfs_file_extent_num_bytes(leaf, fi), sectorsize)) { + CORRUPT( + "regular or preallocated extent data item has unaligned value", + leaf, root, slot); + return -EUCLEAN; + } + + return 0; +} + +static int check_csum_item(struct btrfs_root *root, struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + u32 sectorsize = root->fs_info->sectorsize; + u32 csumsize = btrfs_super_csum_size(root->fs_info->super_copy); + + if (key->objectid != BTRFS_EXTENT_CSUM_OBJECTID) { + CORRUPT("invalid objectid for csum item", leaf, root, slot); + return -EUCLEAN; + } + if (!IS_ALIGNED(key->offset, sectorsize)) { + CORRUPT("unaligned key offset for csum item", leaf, root, slot); + return -EUCLEAN; + } + if (!IS_ALIGNED(btrfs_item_size_nr(leaf, slot), csumsize)) { + CORRUPT("unaligned csum item size", leaf, root, slot); + return -EUCLEAN; + } + return 0; +} + +/* + * Common point to switch the item-specific validation. + */ +static int check_leaf_item(struct btrfs_root *root, + struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + int ret = 0; + + switch (key->type) { + case BTRFS_EXTENT_DATA_KEY: + ret = check_extent_data_item(root, leaf, key, slot); + break; + case BTRFS_EXTENT_CSUM_KEY: + ret = check_csum_item(root, leaf, key, slot); + break; + } + return ret; +} + +int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf) +{ + struct btrfs_fs_info *fs_info = root->fs_info; + /* No valid key type is 0, so all key should be larger than this key */ + struct btrfs_key prev_key = {0, 0, 0}; + struct btrfs_key key; + u32 nritems = btrfs_header_nritems(leaf); + int slot; + + /* + * Extent buffers from a relocation tree have a owner field that + * corresponds to the subvolume tree they are based on. So just from an + * extent buffer alone we can not find out what is the id of the + * corresponding subvolume tree, so we can not figure out if the extent + * buffer corresponds to the root of the relocation tree or not. So + * skip this check for relocation trees. 
+ */ + if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) { + struct btrfs_root *check_root; + + key.objectid = btrfs_header_owner(leaf); + key.type = BTRFS_ROOT_ITEM_KEY; + key.offset = (u64)-1; + + check_root = btrfs_get_fs_root(fs_info, &key, false); + /* + * The only reason we also check NULL here is that during + * open_ctree() some roots has not yet been set up. + */ + if (!IS_ERR_OR_NULL(check_root)) { + struct extent_buffer *eb; + + eb = btrfs_root_node(check_root); + /* if leaf is the root, then it's fine */ + if (leaf != eb) { + CORRUPT("non-root leaf's nritems is 0", + leaf, check_root, 0); + free_extent_buffer(eb); + return -EUCLEAN; + } + free_extent_buffer(eb); + } + return 0; + } + + if (nritems == 0) + return 0; + + /* + * Check the following things to make sure this is a good leaf, and + * leaf users won't need to bother with similar sanity checks: + * + * 1) key ordering + * 2) item offset and size + * No overlap, no hole, all inside the leaf. + * 3) item content + * If possible, do comprehensive sanity check. + * NOTE: All checks must only rely on the item data itself. + */ + for (slot = 0; slot < nritems; slot++) { + u32 item_end_expected; + int ret; + + btrfs_item_key_to_cpu(leaf, &key, slot); + + /* Make sure the keys are in the right order */ + if (btrfs_comp_cpu_keys(&prev_key, &key) >= 0) { + CORRUPT("bad key order", leaf, root, slot); + return -EUCLEAN; + } + + /* + * Make sure the offset and ends are right, remember that the + * item data starts at the end of the leaf and grows towards the + * front. + */ + if (slot == 0) + item_end_expected = BTRFS_LEAF_DATA_SIZE(fs_info); + else + item_end_expected = btrfs_item_offset_nr(leaf, + slot - 1); + if (btrfs_item_end_nr(leaf, slot) != item_end_expected) { + CORRUPT("slot offset bad", leaf, root, slot); + return -EUCLEAN; + } + + /* + * Check to make sure that we don't point outside of the leaf, + * just in case all the items are consistent to each other, but + * all point outside of the leaf. + */ + if (btrfs_item_end_nr(leaf, slot) > + BTRFS_LEAF_DATA_SIZE(fs_info)) { + CORRUPT("slot end outside of leaf", leaf, root, slot); + return -EUCLEAN; + } + + /* Also check if the item pointer overlaps with btrfs item. 
*/ + if (btrfs_item_nr_offset(slot) + sizeof(struct btrfs_item) > + btrfs_item_ptr_offset(leaf, slot)) { + CORRUPT("slot overlap with its data", leaf, root, slot); + return -EUCLEAN; + } + + /* Check if the item size and content meet other criteria */ + ret = check_leaf_item(root, leaf, &key, slot); + if (ret < 0) + return ret; + + prev_key.objectid = key.objectid; + prev_key.type = key.type; + prev_key.offset = key.offset; + } + + return 0; +} + +int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node) +{ + unsigned long nr = btrfs_header_nritems(node); + struct btrfs_key key, next_key; + int slot; + u64 bytenr; + int ret = 0; + + if (nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)) { + btrfs_crit(root->fs_info, + "corrupt node: block %llu root %llu nritems %lu", + node->start, root->objectid, nr); + return -EIO; + } + + for (slot = 0; slot < nr - 1; slot++) { + bytenr = btrfs_node_blockptr(node, slot); + btrfs_node_key_to_cpu(node, &key, slot); + btrfs_node_key_to_cpu(node, &next_key, slot + 1); + + if (!bytenr) { + CORRUPT("invalid item slot", node, root, slot); + ret = -EIO; + goto out; + } + + if (btrfs_comp_cpu_keys(&key, &next_key) >= 0) { + CORRUPT("bad key order", node, root, slot); + ret = -EIO; + goto out; + } + } +out: + return ret; +} diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h new file mode 100644 index 000000000000..96c486e95d70 --- /dev/null +++ b/fs/btrfs/tree-checker.h @@ -0,0 +1,26 @@ +/* + * Copyright (C) Qu Wenruo 2017. All rights reserved. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public + * License v2 as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public + * License along with this program. + */ + +#ifndef __BTRFS_TREE_CHECKER__ +#define __BTRFS_TREE_CHECKER__ + +#include "ctree.h" +#include "extent_io.h" + +int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf); +int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node); + +#endif
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit bba4f29896c986c4cec17bc0f19f2ce644fceae1 upstream.
Use an inline function to replace the macro, since we don't need stringification. (The macro still exists until all callers get updated.)
Also add more info about the error, and replace EIO with EUCLEAN.
For an nr_items error, report whether it's too large or too small, and output the valid value range.
For the node block pointer, add a new alignment check.
For key order, also output the next key to make the problem more obvious.
Signed-off-by: Qu Wenruo quwenruo.btrfs@gmx.com [ wording adjustments, unindented long strings ] Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 68 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 61 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 56e25a630103..5acdf3355a3f 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -37,6 +37,46 @@ btrfs_header_level(eb) == 0 ? "leaf" : "node", \ reason, btrfs_header_bytenr(eb), root->objectid, slot)
+/* + * Error message should follow the following format: + * corrupt <type>: <identifier>, <reason>[, <bad_value>] + * + * @type: leaf or node + * @identifier: the necessary info to locate the leaf/node. + * It's recommened to decode key.objecitd/offset if it's + * meaningful. + * @reason: describe the error + * @bad_value: optional, it's recommened to output bad value and its + * expected value (range). + * + * Since comma is used to separate the components, only space is allowed + * inside each component. + */ + +/* + * Append generic "corrupt leaf/node root=%llu block=%llu slot=%d: " to @fmt. + * Allows callers to customize the output. + */ +__printf(4, 5) +static void generic_err(const struct btrfs_root *root, + const struct extent_buffer *eb, int slot, + const char *fmt, ...) +{ + struct va_format vaf; + va_list args; + + va_start(args, fmt); + + vaf.fmt = fmt; + vaf.va = &args; + + btrfs_crit(root->fs_info, + "corrupt %s: root=%llu block=%llu slot=%d, %pV", + btrfs_header_level(eb) == 0 ? "leaf" : "node", + root->objectid, btrfs_header_bytenr(eb), slot, &vaf); + va_end(args); +} + static int check_extent_data_item(struct btrfs_root *root, struct extent_buffer *leaf, struct btrfs_key *key, int slot) @@ -282,9 +322,11 @@ int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node)
if (nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)) { btrfs_crit(root->fs_info, - "corrupt node: block %llu root %llu nritems %lu", - node->start, root->objectid, nr); - return -EIO; +"corrupt node: root=%llu block=%llu, nritems too %s, have %lu expect range [1,%u]", + root->objectid, node->start, + nr == 0 ? "small" : "large", nr, + BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)); + return -EUCLEAN; }
for (slot = 0; slot < nr - 1; slot++) { @@ -293,14 +335,26 @@ int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node) btrfs_node_key_to_cpu(node, &next_key, slot + 1);
if (!bytenr) { - CORRUPT("invalid item slot", node, root, slot); - ret = -EIO; + generic_err(root, node, slot, + "invalid NULL node pointer"); + ret = -EUCLEAN; + goto out; + } + if (!IS_ALIGNED(bytenr, root->fs_info->sectorsize)) { + generic_err(root, node, slot, + "unaligned pointer, have %llu should be aligned to %u", + bytenr, root->fs_info->sectorsize); + ret = -EUCLEAN; goto out; }
if (btrfs_comp_cpu_keys(&key, &next_key) >= 0) { - CORRUPT("bad key order", node, root, slot); - ret = -EIO; + generic_err(root, node, slot, + "bad key order, current (%llu %u %llu) next (%llu %u %llu)", + key.objectid, key.type, key.offset, + next_key.objectid, next_key.type, + next_key.offset); + ret = -EUCLEAN; goto out; } }
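generic_err() leans on two kernel facilities: the __printf() attribute for compile-time format checking, and %pV with struct va_format to forward varargs inside a single printk. Userspace has no %pV, but the same shape can be sketched with two stdio calls; GCC/Clang attribute syntax is assumed here.

#include <stdarg.h>
#include <stdio.h>

/* Prepend a fixed identifier before the caller's format, like the
 * kernel helper does in one pass via %pV. */
__attribute__((format(printf, 2, 3)))
static void generic_err(int slot, const char *fmt, ...)
{
	va_list args;

	fprintf(stderr, "corrupt leaf: slot=%d, ", slot);
	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
	fputc('\n', stderr);
}

int main(void)
{
	generic_err(3, "bad key order, current (%d) next (%d)", 7, 5);
	return 0;
}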
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 69fc6cbbac542c349b3d350d10f6e394c253c81d upstream.
[BUG] If we run btrfs with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y, it will instantly cause a kernel panic like:
------ ... assertion failed: 0, file: fs/btrfs/disk-io.c, line: 3853 ... Call Trace: btrfs_mark_buffer_dirty+0x187/0x1f0 [btrfs] setup_items_for_insert+0x385/0x650 [btrfs] __btrfs_drop_extents+0x129a/0x1870 [btrfs] ... -----
[Cause] Btrfs will call btrfs_check_leaf() in btrfs_mark_buffer_dirty() to check if the leaf is valid with CONFIG_BTRFS_FS_RUN_SANITY_TESTS=y.
However, quite a few btrfs_mark_buffer_dirty() callers(*) only initialize the item pointers, leaving the item data uninitialized.
This makes the tree-checker catch the uninitialized data as an error, causing such a panic.
*: These callers include, but are not limited to:
setup_items_for_insert()
btrfs_split_item()
btrfs_expand_item()
[Fix] Add a new parameter @check_item_data to btrfs_check_leaf(). With @check_item_data set to false, the item data check will be skipped, falling back to the old btrfs_check_leaf() behavior.
So we still get an early warning if we screw up the item pointers, while avoiding a false panic.
Cc: Filipe Manana fdmanana@gmail.com Reported-by: Lakshmipathi.G lakshmipathi.g@gmail.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Liu Bo bo.li.liu@oracle.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 10 ++++++++-- fs/btrfs/tree-checker.c | 27 ++++++++++++++++++++++----- fs/btrfs/tree-checker.h | 14 +++++++++++++- 3 files changed, 43 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2df5e906db08..e42673477c25 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -610,7 +610,7 @@ static int btree_readpage_end_io_hook(struct btrfs_io_bio *io_bio, * that we don't try and read the other copies of this block, just * return -EIO. */ - if (found_level == 0 && btrfs_check_leaf(root, eb)) { + if (found_level == 0 && btrfs_check_leaf_full(root, eb)) { set_bit(EXTENT_BUFFER_CORRUPT, &eb->bflags); ret = -EIO; } @@ -3870,7 +3870,13 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf) buf->len, fs_info->dirty_metadata_batch); #ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY - if (btrfs_header_level(buf) == 0 && btrfs_check_leaf(root, buf)) { + /* + * Since btrfs_mark_buffer_dirty() can be called with item pointer set + * but item data not updated. + * So here we should only check item pointers, not item data. + */ + if (btrfs_header_level(buf) == 0 && + btrfs_check_leaf_relaxed(root, buf)) { btrfs_print_leaf(buf); ASSERT(0); } diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 5acdf3355a3f..ff4fa8f905d3 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -195,7 +195,8 @@ static int check_leaf_item(struct btrfs_root *root, return ret; }
-int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf) +static int check_leaf(struct btrfs_root *root, struct extent_buffer *leaf, + bool check_item_data) { struct btrfs_fs_info *fs_info = root->fs_info; /* No valid key type is 0, so all key should be larger than this key */ @@ -299,10 +300,15 @@ int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf) return -EUCLEAN; }
- /* Check if the item size and content meet other criteria */ - ret = check_leaf_item(root, leaf, &key, slot); - if (ret < 0) - return ret; + if (check_item_data) { + /* + * Check if the item size and content meet other + * criteria + */ + ret = check_leaf_item(root, leaf, &key, slot); + if (ret < 0) + return ret; + }
prev_key.objectid = key.objectid; prev_key.type = key.type; @@ -312,6 +318,17 @@ int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf) return 0; }
+int btrfs_check_leaf_full(struct btrfs_root *root, struct extent_buffer *leaf) +{ + return check_leaf(root, leaf, true); +} + +int btrfs_check_leaf_relaxed(struct btrfs_root *root, + struct extent_buffer *leaf) +{ + return check_leaf(root, leaf, false); +} + int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node) { unsigned long nr = btrfs_header_nritems(node); diff --git a/fs/btrfs/tree-checker.h b/fs/btrfs/tree-checker.h index 96c486e95d70..3d53e8d6fda0 100644 --- a/fs/btrfs/tree-checker.h +++ b/fs/btrfs/tree-checker.h @@ -20,7 +20,19 @@ #include "ctree.h" #include "extent_io.h"
-int btrfs_check_leaf(struct btrfs_root *root, struct extent_buffer *leaf); +/* + * Comprehensive leaf checker. + * Will check not only the item pointers, but also every possible member + * in item data. + */ +int btrfs_check_leaf_full(struct btrfs_root *root, struct extent_buffer *leaf); + +/* + * Less strict leaf checker. + * Will only check item pointers, not reading item data. + */ +int btrfs_check_leaf_relaxed(struct btrfs_root *root, + struct extent_buffer *leaf); int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node);
#endif
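The API split above is a common pattern: keep one static core taking a bool, and export two named wrappers so that call sites read as intent rather than as a bare true/false argument. A minimal sketch:

#include <stdbool.h>
#include <stdio.h>

static int check_leaf(bool check_item_data)
{
	if (check_item_data)
		puts("checking item pointers and item data");
	else
		puts("checking item pointers only");
	return 0;
}

static int check_leaf_full(void)    { return check_leaf(true); }
static int check_leaf_relaxed(void) { return check_leaf(false); }

int main(void)
{
	check_leaf_full();	/* read path: data is expected valid */
	check_leaf_relaxed();	/* write path: data may be pending */
	return 0;
}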
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit ad7b0368f33cffe67fecd302028915926e50ef7e upstream.
Add a checker for dir items, for the key types DIR_ITEM, DIR_INDEX and XATTR_ITEM.
This checker does comprehensive checks for:
1) dir_item header and its data size
Checked against the item boundary and the maximum name/xattr length. This part is mostly the same as the old verify_dir_item().
2) dir_type
Checked against the maximum file type, and against the key type: an XATTR key should only have FT_XATTR dir items, and a normal dir item type should not have an XATTR key.
The check between key->type and dir_type is newly introduced by this patch.
3) name hash
For XATTR and DIR_ITEM keys, key->offset is the name hash (crc32c). Check the hash of the stored name against the key to ensure it's correct.
The name hash check is only found in btrfs-progs before this patch.
Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: Su Yue suy.fnst@cn.fujitsu.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 141 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 141 insertions(+)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index ff4fa8f905d3..b32df86de9bf 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -30,6 +30,7 @@ #include "tree-checker.h" #include "disk-io.h" #include "compression.h" +#include "hash.h"
#define CORRUPT(reason, eb, root, slot) \ btrfs_crit(root->fs_info, \ @@ -175,6 +176,141 @@ static int check_csum_item(struct btrfs_root *root, struct extent_buffer *leaf, return 0; }
+/* + * Customized reported for dir_item, only important new info is key->objectid, + * which represents inode number + */ +__printf(4, 5) +static void dir_item_err(const struct btrfs_root *root, + const struct extent_buffer *eb, int slot, + const char *fmt, ...) +{ + struct btrfs_key key; + struct va_format vaf; + va_list args; + + btrfs_item_key_to_cpu(eb, &key, slot); + va_start(args, fmt); + + vaf.fmt = fmt; + vaf.va = &args; + + btrfs_crit(root->fs_info, + "corrupt %s: root=%llu block=%llu slot=%d ino=%llu, %pV", + btrfs_header_level(eb) == 0 ? "leaf" : "node", root->objectid, + btrfs_header_bytenr(eb), slot, key.objectid, &vaf); + va_end(args); +} + +static int check_dir_item(struct btrfs_root *root, + struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + struct btrfs_dir_item *di; + u32 item_size = btrfs_item_size_nr(leaf, slot); + u32 cur = 0; + + di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item); + while (cur < item_size) { + char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)]; + u32 name_len; + u32 data_len; + u32 max_name_len; + u32 total_size; + u32 name_hash; + u8 dir_type; + + /* header itself should not cross item boundary */ + if (cur + sizeof(*di) > item_size) { + dir_item_err(root, leaf, slot, + "dir item header crosses item boundary, have %lu boundary %u", + cur + sizeof(*di), item_size); + return -EUCLEAN; + } + + /* dir type check */ + dir_type = btrfs_dir_type(leaf, di); + if (dir_type >= BTRFS_FT_MAX) { + dir_item_err(root, leaf, slot, + "invalid dir item type, have %u expect [0, %u)", + dir_type, BTRFS_FT_MAX); + return -EUCLEAN; + } + + if (key->type == BTRFS_XATTR_ITEM_KEY && + dir_type != BTRFS_FT_XATTR) { + dir_item_err(root, leaf, slot, + "invalid dir item type for XATTR key, have %u expect %u", + dir_type, BTRFS_FT_XATTR); + return -EUCLEAN; + } + if (dir_type == BTRFS_FT_XATTR && + key->type != BTRFS_XATTR_ITEM_KEY) { + dir_item_err(root, leaf, slot, + "xattr dir type found for non-XATTR key"); + return -EUCLEAN; + } + if (dir_type == BTRFS_FT_XATTR) + max_name_len = XATTR_NAME_MAX; + else + max_name_len = BTRFS_NAME_LEN; + + /* Name/data length check */ + name_len = btrfs_dir_name_len(leaf, di); + data_len = btrfs_dir_data_len(leaf, di); + if (name_len > max_name_len) { + dir_item_err(root, leaf, slot, + "dir item name len too long, have %u max %u", + name_len, max_name_len); + return -EUCLEAN; + } + if (name_len + data_len > BTRFS_MAX_XATTR_SIZE(root->fs_info)) { + dir_item_err(root, leaf, slot, + "dir item name and data len too long, have %u max %u", + name_len + data_len, + BTRFS_MAX_XATTR_SIZE(root->fs_info)); + return -EUCLEAN; + } + + if (data_len && dir_type != BTRFS_FT_XATTR) { + dir_item_err(root, leaf, slot, + "dir item with invalid data len, have %u expect 0", + data_len); + return -EUCLEAN; + } + + total_size = sizeof(*di) + name_len + data_len; + + /* header and name/data should not cross item boundary */ + if (cur + total_size > item_size) { + dir_item_err(root, leaf, slot, + "dir item data crosses item boundary, have %u boundary %u", + cur + total_size, item_size); + return -EUCLEAN; + } + + /* + * Special check for XATTR/DIR_ITEM, as key->offset is name + * hash, should match its name + */ + if (key->type == BTRFS_DIR_ITEM_KEY || + key->type == BTRFS_XATTR_ITEM_KEY) { + read_extent_buffer(leaf, namebuf, + (unsigned long)(di + 1), name_len); + name_hash = btrfs_name_hash(namebuf, name_len); + if (key->offset != name_hash) { + dir_item_err(root, leaf, slot, + "name hash mismatch with key, have 0x%016x expect 0x%016llx", + 
name_hash, key->offset); + return -EUCLEAN; + } + } + cur += total_size; + di = (struct btrfs_dir_item *)((void *)di + total_size); + } + return 0; +} + /* * Common point to switch the item-specific validation. */ @@ -191,6 +327,11 @@ static int check_leaf_item(struct btrfs_root *root, case BTRFS_EXTENT_CSUM_KEY: ret = check_csum_item(root, leaf, key, slot); break; + case BTRFS_DIR_ITEM_KEY: + case BTRFS_DIR_INDEX_KEY: + case BTRFS_XATTR_ITEM_KEY: + ret = check_dir_item(root, leaf, key, slot); + break; } return ret; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 7cfad65297bfe0aa2996cd72d21c898aa84436d9 upstream.
The return value of sizeof() is of type size_t, so we must print it using the %zu format specifier rather than %lu to avoid this warning on some architectures:
fs/btrfs/tree-checker.c: In function 'check_dir_item': fs/btrfs/tree-checker.c:273:50: error: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'u32' {aka 'unsigned int'} [-Werror=format=]
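For context, a minimal user-space sketch of why %zu is the portable choice here (the struct is a hypothetical stand-in, not the real btrfs type):

	#include <stdio.h>

	struct dir_item_stub { unsigned int a, b; };	/* hypothetical stand-in */

	int main(void)
	{
		unsigned int cur = 8;	/* 'u32' in the kernel code */

		/*
		 * sizeof() yields size_t. On 64-bit targets size_t is
		 * 'unsigned long', so %lu happens to work; on 32-bit targets
		 * the sum stays 'unsigned int' and %lu triggers the warning
		 * above. %zu matches size_t on every architecture.
		 */
		printf("have %zu boundary %u\n",
		       cur + sizeof(struct dir_item_stub), 64u);
		return 0;
	}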
Fixes: 005887f2e3e0 ("btrfs: tree-checker: Add checker for dir item") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index b32df86de9bf..9376739a4cc8 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -223,7 +223,7 @@ static int check_dir_item(struct btrfs_root *root, /* header itself should not cross item boundary */ if (cur + sizeof(*di) > item_size) { dir_item_err(root, leaf, slot, - "dir item header crosses item boundary, have %lu boundary %u", + "dir item header crosses item boundary, have %zu boundary %u", cur + sizeof(*di), item_size); return -EUCLEAN; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit e2683fc9d219430f5b78889b50cde7f40efeba7b upstream.
I've noticed that the updated item checker's stack consumption increased dramatically in 542f5385e20cf97447 ("btrfs: tree-checker: Add checker for dir item"):
tree-checker.c:check_leaf +552 (176 -> 728)
The array is 255 bytes long; dynamic allocation would slow down the sanity checks, so it's more reasonable to keep it on the stack. Moving the variable into the scope of use reduces the stack usage again:
tree-checker.c:check_leaf -264 (728 -> 464)
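As a hedged aside, the pattern in miniature (hypothetical names and sizes, not the btrfs code): declaring the large buffer in the innermost scope lets the compiler overlap its stack slot with other locals instead of reserving it for the whole frame.

	#include <string.h>

	/* Hypothetical checker: namebuf costs stack only inside the branch. */
	static int check_name_hash(const char *name, unsigned int len, int hashed)
	{
		if (hashed) {
			char namebuf[255];	/* live only in this scope */

			if (len > sizeof(namebuf))
				return -1;
			memcpy(namebuf, name, len);
			/* ... hash namebuf and compare with the key ... */
		}
		return 0;
	}

	int main(void)
	{
		return check_name_hash("foo", 3, 1);
	}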
Reviewed-by: Josef Bacik jbacik@fb.com Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 9376739a4cc8..f7e7a455b710 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -212,7 +212,6 @@ static int check_dir_item(struct btrfs_root *root,
di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item); while (cur < item_size) { - char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)]; u32 name_len; u32 data_len; u32 max_name_len; @@ -295,6 +294,8 @@ static int check_dir_item(struct btrfs_root *root, */ if (key->type == BTRFS_DIR_ITEM_KEY || key->type == BTRFS_XATTR_ITEM_KEY) { + char namebuf[max(BTRFS_NAME_LEN, XATTR_NAME_MAX)]; + read_extent_buffer(leaf, namebuf, (unsigned long)(di + 1), name_len); name_hash = btrfs_name_hash(namebuf, name_len);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit fce466eab7ac6baa9d2dcd88abcf945be3d4a089 upstream.
A crafted image with invalid block group items could make the free space cache code panic.
We could detect such an invalid block group item by checking:
1) Item size: known fixed value.
2) Block group size (key.offset): we have an upper limit on block group size (10G).
3) Chunk objectid: known fixed value.
4) Type: only 4 valid type values, DATA, METADATA, SYSTEM and DATA|METADATA; no more than 1 bit set for the profile type.
5) Used space: no more than the block group size.
This should allow btrfs to detect and refuse to mount the crafted image.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199849 Reported-by: Xu Wen wen.xu@gatech.edu Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Gu Jinxiang gujx@cn.fujitsu.com Reviewed-by: Nikolay Borisov nborisov@suse.com Tested-by: Gu Jinxiang gujx@cn.fujitsu.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com [bwh: Backported to 4.14: - In check_leaf_item(), pass root->fs_info to check_block_group_item() - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 100 ++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.c | 2 +- fs/btrfs/volumes.h | 2 + 3 files changed, 103 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index f7e7a455b710..cf9b10a07134 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -31,6 +31,7 @@ #include "disk-io.h" #include "compression.h" #include "hash.h" +#include "volumes.h"
#define CORRUPT(reason, eb, root, slot) \ btrfs_crit(root->fs_info, \ @@ -312,6 +313,102 @@ static int check_dir_item(struct btrfs_root *root, return 0; }
+__printf(4, 5) +__cold +static void block_group_err(const struct btrfs_fs_info *fs_info, + const struct extent_buffer *eb, int slot, + const char *fmt, ...) +{ + struct btrfs_key key; + struct va_format vaf; + va_list args; + + btrfs_item_key_to_cpu(eb, &key, slot); + va_start(args, fmt); + + vaf.fmt = fmt; + vaf.va = &args; + + btrfs_crit(fs_info, + "corrupt %s: root=%llu block=%llu slot=%d bg_start=%llu bg_len=%llu, %pV", + btrfs_header_level(eb) == 0 ? "leaf" : "node", + btrfs_header_owner(eb), btrfs_header_bytenr(eb), slot, + key.objectid, key.offset, &vaf); + va_end(args); +} + +static int check_block_group_item(struct btrfs_fs_info *fs_info, + struct extent_buffer *leaf, + struct btrfs_key *key, int slot) +{ + struct btrfs_block_group_item bgi; + u32 item_size = btrfs_item_size_nr(leaf, slot); + u64 flags; + u64 type; + + /* + * Here we don't really care about alignment since extent allocator can + * handle it. We care more about the size, as if one block group is + * larger than maximum size, it's must be some obvious corruption. + */ + if (key->offset > BTRFS_MAX_DATA_CHUNK_SIZE || key->offset == 0) { + block_group_err(fs_info, leaf, slot, + "invalid block group size, have %llu expect (0, %llu]", + key->offset, BTRFS_MAX_DATA_CHUNK_SIZE); + return -EUCLEAN; + } + + if (item_size != sizeof(bgi)) { + block_group_err(fs_info, leaf, slot, + "invalid item size, have %u expect %zu", + item_size, sizeof(bgi)); + return -EUCLEAN; + } + + read_extent_buffer(leaf, &bgi, btrfs_item_ptr_offset(leaf, slot), + sizeof(bgi)); + if (btrfs_block_group_chunk_objectid(&bgi) != + BTRFS_FIRST_CHUNK_TREE_OBJECTID) { + block_group_err(fs_info, leaf, slot, + "invalid block group chunk objectid, have %llu expect %llu", + btrfs_block_group_chunk_objectid(&bgi), + BTRFS_FIRST_CHUNK_TREE_OBJECTID); + return -EUCLEAN; + } + + if (btrfs_block_group_used(&bgi) > key->offset) { + block_group_err(fs_info, leaf, slot, + "invalid block group used, have %llu expect [0, %llu)", + btrfs_block_group_used(&bgi), key->offset); + return -EUCLEAN; + } + + flags = btrfs_block_group_flags(&bgi); + if (hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK) > 1) { + block_group_err(fs_info, leaf, slot, +"invalid profile flags, have 0x%llx (%lu bits set) expect no more than 1 bit set", + flags & BTRFS_BLOCK_GROUP_PROFILE_MASK, + hweight64(flags & BTRFS_BLOCK_GROUP_PROFILE_MASK)); + return -EUCLEAN; + } + + type = flags & BTRFS_BLOCK_GROUP_TYPE_MASK; + if (type != BTRFS_BLOCK_GROUP_DATA && + type != BTRFS_BLOCK_GROUP_METADATA && + type != BTRFS_BLOCK_GROUP_SYSTEM && + type != (BTRFS_BLOCK_GROUP_METADATA | + BTRFS_BLOCK_GROUP_DATA)) { + block_group_err(fs_info, leaf, slot, +"invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llu or 0x%llx", + type, hweight64(type), + BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA, + BTRFS_BLOCK_GROUP_SYSTEM, + BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA); + return -EUCLEAN; + } + return 0; +} + /* * Common point to switch the item-specific validation. */ @@ -333,6 +430,9 @@ static int check_leaf_item(struct btrfs_root *root, case BTRFS_XATTR_ITEM_KEY: ret = check_dir_item(root, leaf, key, slot); break; + case BTRFS_BLOCK_GROUP_ITEM_KEY: + ret = check_block_group_item(root->fs_info, leaf, key, slot); + break; } return ret; } diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index cfd5728e7519..9663b6aa2a56 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -4647,7 +4647,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
if (type & BTRFS_BLOCK_GROUP_DATA) { max_stripe_size = SZ_1G; - max_chunk_size = 10 * max_stripe_size; + max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE; if (!devs_max) devs_max = BTRFS_MAX_DEVS(info->chunk_root); } else if (type & BTRFS_BLOCK_GROUP_METADATA) { diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h index c5dd48eb7b3d..76fb6e84f201 100644 --- a/fs/btrfs/volumes.h +++ b/fs/btrfs/volumes.h @@ -24,6 +24,8 @@ #include <linux/btrfs.h> #include "async-thread.h"
+#define BTRFS_MAX_DATA_CHUNK_SIZE (10ULL * SZ_1G) + extern struct mutex uuid_mutex;
#define BTRFS_STRIPE_LEN SZ_64K
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit ba480dd4db9f1798541eb2d1c423fc95feee8d36 upstream.
A crafted image has empty root tree block, which will later cause NULL pointer dereference.
The following trees should never be empty:
1) Tree root: must contain at least root items for extent tree, device tree and fs tree.
2) Chunk tree: or we can't even bootstrap, as it contains the mapping.
3) Fs tree: at least the inode item for the top level inode (.).
4) Device tree: dev extents for chunks.
5) Extent tree: must have a corresponding extent for each chunk.
If any of them is empty, we are sure the fs is corrupted and no need to mount it.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199847 Reported-by: Xu Wen wen.xu@gatech.edu Signed-off-by: Qu Wenruo wqu@suse.com Tested-by: Gu Jinxiang gujx@cn.fujitsu.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com [bwh: Backported to 4.14: Pass root instead of fs_info to generic_err()] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index cf9b10a07134..31756bac75b4 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -456,9 +456,22 @@ static int check_leaf(struct btrfs_root *root, struct extent_buffer *leaf, * skip this check for relocation trees. */ if (nritems == 0 && !btrfs_header_flag(leaf, BTRFS_HEADER_FLAG_RELOC)) { + u64 owner = btrfs_header_owner(leaf); struct btrfs_root *check_root;
- key.objectid = btrfs_header_owner(leaf); + /* These trees must never be empty */ + if (owner == BTRFS_ROOT_TREE_OBJECTID || + owner == BTRFS_CHUNK_TREE_OBJECTID || + owner == BTRFS_EXTENT_TREE_OBJECTID || + owner == BTRFS_DEV_TREE_OBJECTID || + owner == BTRFS_FS_TREE_OBJECTID || + owner == BTRFS_DATA_RELOC_TREE_OBJECTID) { + generic_err(root, leaf, 0, + "invalid root, root %llu must never be empty", + owner); + return -EUCLEAN; + } + key.objectid = owner; key.type = BTRFS_ROOT_ITEM_KEY; key.offset = (u64)-1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 514c7dca85a0bf40be984dab0b477403a6db901f upstream.
A crafted btrfs image with incorrect chunk<->block group mapping will trigger a lot of unexpected things as the mapping is essential.
Although the problem can be caught by the block group item checker added in "btrfs: tree-checker: Verify block_group_item", it's still not sufficient: a superficially valid block group item can pass that check but still fail to match the existing chunk.
This patch adds an extra block group -> chunk mapping check, to ensure we have a completely matching (start, len, flags) chunk for each block group at mount time.
Here we reuse the original helper find_first_block_group(), which already does the basic bg -> chunk checks, and add further checks of the start/len and type flags.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=199837 Reported-by: Xu Wen wen.xu@gatech.edu Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Su Yue suy.fnst@cn.fujitsu.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent-tree.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index fdc42eddccc2..83791d13c204 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9828,6 +9828,8 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info, int ret = 0; struct btrfs_key found_key; struct extent_buffer *leaf; + struct btrfs_block_group_item bg; + u64 flags; int slot;
ret = btrfs_search_slot(NULL, root, key, path, 0, 0); @@ -9862,8 +9864,32 @@ static int find_first_block_group(struct btrfs_fs_info *fs_info, "logical %llu len %llu found bg but no related chunk", found_key.objectid, found_key.offset); ret = -ENOENT; + } else if (em->start != found_key.objectid || + em->len != found_key.offset) { + btrfs_err(fs_info, + "block group %llu len %llu mismatch with chunk %llu len %llu", + found_key.objectid, found_key.offset, + em->start, em->len); + ret = -EUCLEAN; } else { - ret = 0; + read_extent_buffer(leaf, &bg, + btrfs_item_ptr_offset(leaf, slot), + sizeof(bg)); + flags = btrfs_block_group_flags(&bg) & + BTRFS_BLOCK_GROUP_TYPE_MASK; + + if (flags != (em->map_lookup->type & + BTRFS_BLOCK_GROUP_TYPE_MASK)) { + btrfs_err(fs_info, +"block group %llu len %llu type flags 0x%llx mismatch with chunk type flags 0x%llx", + found_key.objectid, + found_key.offset, flags, + (BTRFS_BLOCK_GROUP_TYPE_MASK & + em->map_lookup->type)); + ret = -EUCLEAN; + } else { + ret = 0; + } } free_extent_map(em); goto out;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit f556faa46eb4e96d0d0772e74ecf66781e132f72 upstream.
Although we have a tree level check at tree read time, it's completely based on the parent's level. We still need an accurate level check to avoid invalid tree blocks sneaking into kernel space.
The check itself is simple: for a leaf, the level should always be 0; for nodes, the level should be in the range [1, BTRFS_MAX_LEVEL - 1].
Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: Su Yue suy.fnst@cn.fujitsu.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com [bwh: Backported to 4.14: - Pass root instead of fs_info to generic_err() - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index 31756bac75b4..fa8f64119e6f 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -447,6 +447,13 @@ static int check_leaf(struct btrfs_root *root, struct extent_buffer *leaf, u32 nritems = btrfs_header_nritems(leaf); int slot;
+ if (btrfs_header_level(leaf) != 0) { + generic_err(root, leaf, 0, + "invalid level for leaf, have %d expect 0", + btrfs_header_level(leaf)); + return -EUCLEAN; + } + /* * Extent buffers from a relocation tree have a owner field that * corresponds to the subvolume tree they are based on. So just from an @@ -589,9 +596,16 @@ int btrfs_check_node(struct btrfs_root *root, struct extent_buffer *node) unsigned long nr = btrfs_header_nritems(node); struct btrfs_key key, next_key; int slot; + int level = btrfs_header_level(node); u64 bytenr; int ret = 0;
+ if (level <= 0 || level >= BTRFS_MAX_LEVEL) { + generic_err(root, node, 0, + "invalid level for node, have %d expect [1, %d]", + level, BTRFS_MAX_LEVEL - 1); + return -EUCLEAN; + } if (nr == 0 || nr > BTRFS_NODEPTRS_PER_BLOCK(root->fs_info)) { btrfs_crit(root->fs_info, "corrupt node: root=%llu block=%llu, nritems too %s, have %lu expect range [1,%u]",
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 761333f2f50ccc887aa9957ae829300262c0d15b upstream.
block_group_err() shows the SYSTEM group type as a decimal value with a '0x' prefix, which is somewhat misleading.
Fix it to print hexadecimal, as was intended.
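A two-line user-space illustration of the confusion, with a hypothetical flag value:

	#include <stdio.h>

	int main(void)
	{
		unsigned long long flags = 1ULL << 4;	/* hypothetical type flag */

		printf("0x%llu\n", flags);	/* "0x16": decimal 16 behind a hex prefix */
		printf("0x%llx\n", flags);	/* "0x10": the intended hexadecimal form */
		return 0;
	}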
Fixes: fce466eab7ac6 ("btrfs: tree-checker: Verify block_group_item") Reviewed-by: Nikolay Borisov nborisov@suse.com Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Shaokun Zhang zhangshaokun@hisilicon.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c index fa8f64119e6f..f206aec1525d 100644 --- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -399,7 +399,7 @@ static int check_block_group_item(struct btrfs_fs_info *fs_info, type != (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_DATA)) { block_group_err(fs_info, leaf, slot, -"invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llu or 0x%llx", +"invalid type, have 0x%llx (%lu bits set) expect either 0x%llx, 0x%llx, 0x%llx or 0x%llx", type, hweight64(type), BTRFS_BLOCK_GROUP_DATA, BTRFS_BLOCK_GROUP_METADATA, BTRFS_BLOCK_GROUP_SYSTEM,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 0833721ec3658a4e9d5e58b6fa82cf9edc431e59 upstream.
This patch checks blkaddr more accurately before issuing a write or read bio.
Signed-off-by: Yunlei He heyunlei@huawei.com Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/checkpoint.c | 2 ++ fs/f2fs/data.c | 5 +++-- fs/f2fs/f2fs.h | 1 + fs/f2fs/segment.h | 25 +++++++++++++++++++------ 4 files changed, 25 insertions(+), 8 deletions(-)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 41fce930f44c..1bd4cd8c79c6 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -69,6 +69,7 @@ static struct page *__get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index, .old_blkaddr = index, .new_blkaddr = index, .encrypted_page = NULL, + .is_meta = is_meta, };
if (unlikely(!is_meta)) @@ -163,6 +164,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, .op_flags = sync ? (REQ_META | REQ_PRIO) : REQ_RAHEAD, .encrypted_page = NULL, .in_list = false, + .is_meta = (type != META_POR), }; struct blk_plug plug;
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 6fbb6d75318a..5913de3f661d 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -369,6 +369,7 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) struct page *page = fio->encrypted_page ? fio->encrypted_page : fio->page;
+ verify_block_addr(fio, fio->new_blkaddr); trace_f2fs_submit_page_bio(page, fio); f2fs_trace_ios(fio, 0);
@@ -413,8 +414,8 @@ next: }
if (fio->old_blkaddr != NEW_ADDR) - verify_block_addr(sbi, fio->old_blkaddr); - verify_block_addr(sbi, fio->new_blkaddr); + verify_block_addr(fio, fio->old_blkaddr); + verify_block_addr(fio, fio->new_blkaddr);
bio_page = fio->encrypted_page ? fio->encrypted_page : fio->page;
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 54f8520ad7a2..aa7b033af1b0 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -910,6 +910,7 @@ struct f2fs_io_info { bool submitted; /* indicate IO submission */ int need_lock; /* indicate we need to lock cp_rwsem */ bool in_list; /* indicate fio is in io_list */ + bool is_meta; /* indicate borrow meta inode mapping or not */ enum iostat_type io_type; /* io type */ };
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 4dfb5080098f..4b635e8c91b0 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -53,13 +53,19 @@ ((secno) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \ (sbi)->segs_per_sec)) \
-#define MAIN_BLKADDR(sbi) (SM_I(sbi)->main_blkaddr) -#define SEG0_BLKADDR(sbi) (SM_I(sbi)->seg0_blkaddr) +#define MAIN_BLKADDR(sbi) \ + (SM_I(sbi) ? SM_I(sbi)->main_blkaddr : \ + le32_to_cpu(F2FS_RAW_SUPER(sbi)->main_blkaddr)) +#define SEG0_BLKADDR(sbi) \ + (SM_I(sbi) ? SM_I(sbi)->seg0_blkaddr : \ + le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment0_blkaddr))
#define MAIN_SEGS(sbi) (SM_I(sbi)->main_segments) #define MAIN_SECS(sbi) ((sbi)->total_sections)
-#define TOTAL_SEGS(sbi) (SM_I(sbi)->segment_count) +#define TOTAL_SEGS(sbi) \ + (SM_I(sbi) ? SM_I(sbi)->segment_count : \ + le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment_count)) #define TOTAL_BLKS(sbi) (TOTAL_SEGS(sbi) << (sbi)->log_blocks_per_seg)
#define MAX_BLKADDR(sbi) (SEG0_BLKADDR(sbi) + TOTAL_BLKS(sbi)) @@ -619,10 +625,17 @@ static inline void check_seg_range(struct f2fs_sb_info *sbi, unsigned int segno) f2fs_bug_on(sbi, segno > TOTAL_SEGS(sbi) - 1); }
-static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr) +static inline void verify_block_addr(struct f2fs_io_info *fio, block_t blk_addr) { - BUG_ON(blk_addr < SEG0_BLKADDR(sbi) - || blk_addr >= MAX_BLKADDR(sbi)); + struct f2fs_sb_info *sbi = fio->sbi; + + if (PAGE_TYPE_OF_BIO(fio->type) == META && + (!is_read_io(fio->op) || fio->is_meta)) + BUG_ON(blk_addr < SEG0_BLKADDR(sbi) || + blk_addr >= MAIN_BLKADDR(sbi)); + else + BUG_ON(blk_addr < MAIN_BLKADDR(sbi) || + blk_addr >= MAX_BLKADDR(sbi)); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit b2ca374f33bd33fd822eb871876e4888cf79dc97 upstream.
syzbot hit the following crash on upstream commit
87ef12027b9b1dd0e0b12cf311fbcb19f9d92539 (Wed Apr 18 19:48:17 2018 +0000)
Merge tag 'ceph-for-4.17-rc2' of git://github.com/ceph/ceph-client

syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=83699adeb2d13579c31e
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5805208181407744
syzkaller reproducer: https://syzkaller.appspot.com/x/repro.syz?id=6005073343676416
Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6555047731134464
Kernel config: https://syzkaller.appspot.com/x/.config?id=1808800213120130118
compiler: gcc (GCC) 8.0.1 20180413 (experimental)
IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com It will help syzbot understand when the bug is fixed. See footer for details. If you forward the report, please keep this part and the footer.
F2FS-fs (loop0): Magic Mismatch, valid(0xf2f52010) - read(0x0) F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock F2FS-fs (loop0): invalid crc value BUG: unable to handle kernel paging request at ffffed006b2a50c0 PGD 21ffee067 P4D 21ffee067 PUD 21fbeb067 PMD 0 Oops: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 4514 Comm: syzkaller989480 Not tainted 4.17.0-rc1+ #8 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:build_sit_entries fs/f2fs/segment.c:3653 [inline] RIP: 0010:build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: 0018:ffff8801b102e5b0 EFLAGS: 00010a06 RAX: 1ffff1006b2a50c0 RBX: 0000000000000004 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801ac74243e RBP: ffff8801b102f410 R08: ffff8801acbd46c0 R09: fffffbfff14d9af8 R10: fffffbfff14d9af8 R11: ffff8801acbd46c0 R12: ffff8801ac742a80 R13: ffff8801d9519100 R14: dffffc0000000000 R15: ffff880359528600 FS: 0000000001e04880(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffed006b2a50c0 CR3: 00000001ac6ac000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: f2fs_fill_super+0x4095/0x7bf0 fs/f2fs/super.c:2803 mount_bdev+0x30c/0x3e0 fs/super.c:1165 f2fs_mount+0x34/0x40 fs/f2fs/super.c:3020 mount_fs+0xae/0x328 fs/super.c:1268 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037 vfs_kern_mount fs/namespace.c:1027 [inline] do_new_mount fs/namespace.c:2517 [inline] do_mount+0x564/0x3070 fs/namespace.c:2847 ksys_mount+0x12d/0x140 fs/namespace.c:3063 __do_sys_mount fs/namespace.c:3077 [inline] __se_sys_mount fs/namespace.c:3074 [inline] __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x443d6a RSP: 002b:00007ffd312813c8 EFLAGS: 00000297 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 0000000020000c00 RCX: 0000000000443d6a RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffd312813d0 RBP: 0000000000000003 R08: 0000000020016a00 R09: 000000000000000a R10: 0000000000000000 R11: 0000000000000297 R12: 0000000000000004 R13: 0000000000402c60 R14: 0000000000000000 R15: 0000000000000000 RIP: build_sit_entries fs/f2fs/segment.c:3653 [inline] RSP: ffff8801b102e5b0 RIP: build_segment_manager+0x7ef7/0xbf70 fs/f2fs/segment.c:3852 RSP: ffff8801b102e5b0 CR2: ffffed006b2a50c0 ---[ end trace a2034989e196ff17 ]---
Reported-and-tested-by: syzbot+83699adeb2d13579c31e@syzkaller.appspotmail.com Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk
Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/segment.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 3c7bbbae0afa..1104a6c80251 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -3304,6 +3304,15 @@ static int build_sit_entries(struct f2fs_sb_info *sbi) unsigned int old_valid_blocks;
start = le32_to_cpu(segno_in_journal(journal, i)); + if (start >= MAIN_SEGS(sbi)) { + f2fs_msg(sbi->sb, KERN_ERR, + "Wrong journal entry on segno %u", + start); + set_sbi_flag(sbi, SBI_NEED_FSCK); + err = -EINVAL; + break; + } + se = &sit_i->sentries[start]; sit = sit_in_journal(journal, i);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 0cfe75c5b011994651a4ca6d74f20aa997bfc69a upstream.
In order to avoid the overflow issues below, we should check the boundaries recorded in the superblock before getting to the allocations. As Linus suggested, the right place is sanity_check_raw_super().
Dr Silvio Cesare of InfoSect reported:
There are integer overflows in the use of the cp_payload superblock field in the f2fs filesystem, potentially leading to memory corruption.
include/linux/f2fs_fs.h
struct f2fs_super_block {
...
	__le32 cp_payload;
fs/f2fs/f2fs.h
typedef u32 block_t;	/*
			 * should not change u32, since it is the on-disk block
			 * address format, __le32.
			 */
...

static inline block_t __cp_payload(struct f2fs_sb_info *sbi)
{
	return le32_to_cpu(F2FS_RAW_SUPER(sbi)->cp_payload);
}
fs/f2fs/checkpoint.c
block_t start_blk, orphan_blocks, i, j;
...
start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi);
+++ integer overflows
...
unsigned int cp_blks = 1 + __cp_payload(sbi);
...
sbi->ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL);
+++ integer overflow leading to incorrect heap allocation.
int cp_payload_blks = __cp_payload(sbi);
...
ckpt->cp_pack_start_sum = cpu_to_le32(1 + cp_payload_blks +
		orphan_blocks);
+++ sign bug and integer overflow
... for (i = 1; i < 1 + cp_payload_blks; i++)
+++ integer overflow
...
sbi->max_orphans = (sbi->blocks_per_seg - F2FS_CP_PACKS -
			NR_CURSEG_TYPE - __cp_payload(sbi)) *
			F2FS_ORPHANS_PER_BLOCK;
+++ integer overflow
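A minimal user-space sketch of the first overflow above (the values are hypothetical, and kzalloc is reduced to a plain size computation):

	#include <stdio.h>

	typedef unsigned int u32;

	int main(void)
	{
		u32 blk_size = 4096;
		u32 cp_payload = 0xffffffffu;	/* attacker-controlled on-disk field */
		u32 cp_blks = 1 + cp_payload;	/* wraps around to 0 */

		/*
		 * kzalloc(cp_blks * blk_size, GFP_KERNEL) would request 0
		 * bytes here, while later code still indexes up to
		 * cp_payload blocks: a classic undersized allocation.
		 */
		printf("cp_blks=%u, allocation size=%u bytes\n",
		       cp_blks, cp_blks * blk_size);
		return 0;
	}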
Reported-by: Greg KH greg@kroah.com Reported-by: Silvio Cesare silvio.cesare@gmail.com Suggested-by: Linus Torvalds torvalds@linux-foundation.org Reviewed-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org [bwh: Backported to 4.14: No hot file extension support] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/super.c | 71 ++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 64 insertions(+), 7 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 7cda685296b2..c3e1090e7c0a 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1807,6 +1807,8 @@ static inline bool sanity_check_area_boundary(struct f2fs_sb_info *sbi, static int sanity_check_raw_super(struct f2fs_sb_info *sbi, struct buffer_head *bh) { + block_t segment_count, segs_per_sec, secs_per_zone; + block_t total_sections, blocks_per_seg; struct f2fs_super_block *raw_super = (struct f2fs_super_block *) (bh->b_data + F2FS_SUPER_OFFSET); struct super_block *sb = sbi->sb; @@ -1863,6 +1865,68 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, return 1; }
+ segment_count = le32_to_cpu(raw_super->segment_count); + segs_per_sec = le32_to_cpu(raw_super->segs_per_sec); + secs_per_zone = le32_to_cpu(raw_super->secs_per_zone); + total_sections = le32_to_cpu(raw_super->section_count); + + /* blocks_per_seg should be 512, given the above check */ + blocks_per_seg = 1 << le32_to_cpu(raw_super->log_blocks_per_seg); + + if (segment_count > F2FS_MAX_SEGMENT || + segment_count < F2FS_MIN_SEGMENTS) { + f2fs_msg(sb, KERN_INFO, + "Invalid segment count (%u)", + segment_count); + return 1; + } + + if (total_sections > segment_count || + total_sections < F2FS_MIN_SEGMENTS || + segs_per_sec > segment_count || !segs_per_sec) { + f2fs_msg(sb, KERN_INFO, + "Invalid segment/section count (%u, %u x %u)", + segment_count, total_sections, segs_per_sec); + return 1; + } + + if ((segment_count / segs_per_sec) < total_sections) { + f2fs_msg(sb, KERN_INFO, + "Small segment_count (%u < %u * %u)", + segment_count, segs_per_sec, total_sections); + return 1; + } + + if (segment_count > (le32_to_cpu(raw_super->block_count) >> 9)) { + f2fs_msg(sb, KERN_INFO, + "Wrong segment_count / block_count (%u > %u)", + segment_count, le32_to_cpu(raw_super->block_count)); + return 1; + } + + if (secs_per_zone > total_sections) { + f2fs_msg(sb, KERN_INFO, + "Wrong secs_per_zone (%u > %u)", + secs_per_zone, total_sections); + return 1; + } + if (le32_to_cpu(raw_super->extension_count) > F2FS_MAX_EXTENSION) { + f2fs_msg(sb, KERN_INFO, + "Corrupted extension count (%u > %u)", + le32_to_cpu(raw_super->extension_count), + F2FS_MAX_EXTENSION); + return 1; + } + + if (le32_to_cpu(raw_super->cp_payload) > + (blocks_per_seg - F2FS_CP_PACKS)) { + f2fs_msg(sb, KERN_INFO, + "Insane cp_payload (%u > %u)", + le32_to_cpu(raw_super->cp_payload), + blocks_per_seg - F2FS_CP_PACKS); + return 1; + } + /* check reserved ino info */ if (le32_to_cpu(raw_super->node_ino) != 1 || le32_to_cpu(raw_super->meta_ino) != 2 || @@ -1875,13 +1939,6 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, return 1; }
- if (le32_to_cpu(raw_super->segment_count) > F2FS_MAX_SEGMENT) { - f2fs_msg(sb, KERN_INFO, - "Invalid segment count (%u)", - le32_to_cpu(raw_super->segment_count)); - return 1; - } - /* check CP/SIT/NAT/SSA/MAIN_AREA area boundary */ if (sanity_check_area_boundary(sbi, bh)) return 1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 7b525dd01365c6764018e374d391c92466be1b7a upstream.
- rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
- introduce is_valid_blkaddr() for cleanup.
No logic change in this patch.
Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/checkpoint.c | 4 ++-- fs/f2fs/data.c | 18 +++++------------- fs/f2fs/f2fs.h | 9 ++++++++- fs/f2fs/file.c | 2 +- fs/f2fs/inode.c | 2 +- fs/f2fs/node.c | 5 ++--- fs/f2fs/recovery.c | 6 +++--- fs/f2fs/segment.c | 4 ++-- fs/f2fs/segment.h | 2 +- 9 files changed, 25 insertions(+), 27 deletions(-)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 1bd4cd8c79c6..2b951046657e 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -118,7 +118,7 @@ struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) return __get_meta_page(sbi, index, false); }
-bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) +bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) { switch (type) { case META_NAT: @@ -174,7 +174,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, blk_start_plug(&plug); for (; nrpages-- > 0; blkno++) {
- if (!is_valid_blkaddr(sbi, blkno, type)) + if (!is_valid_meta_blkaddr(sbi, blkno, type)) goto out;
switch (type) { diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 5913de3f661d..f7b2909e9c5f 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -413,7 +413,7 @@ next: spin_unlock(&io->io_lock); }
- if (fio->old_blkaddr != NEW_ADDR) + if (is_valid_blkaddr(fio->old_blkaddr)) verify_block_addr(fio, fio->old_blkaddr); verify_block_addr(fio, fio->new_blkaddr);
@@ -946,7 +946,7 @@ next_dnode: next_block: blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
- if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) { + if (!is_valid_blkaddr(blkaddr)) { if (create) { if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; @@ -1388,15 +1388,6 @@ static inline bool need_inplace_update(struct f2fs_io_info *fio) return need_inplace_update_policy(inode, fio); }
-static inline bool valid_ipu_blkaddr(struct f2fs_io_info *fio) -{ - if (fio->old_blkaddr == NEW_ADDR) - return false; - if (fio->old_blkaddr == NULL_ADDR) - return false; - return true; -} - int do_write_data_page(struct f2fs_io_info *fio) { struct page *page = fio->page; @@ -1411,7 +1402,7 @@ int do_write_data_page(struct f2fs_io_info *fio) f2fs_lookup_extent_cache(inode, page->index, &ei)) { fio->old_blkaddr = ei.blk + page->index - ei.fofs;
- if (valid_ipu_blkaddr(fio)) { + if (is_valid_blkaddr(fio->old_blkaddr)) { ipu_force = true; fio->need_lock = LOCK_DONE; goto got_it; @@ -1438,7 +1429,8 @@ got_it: * If current allocation needs SSR, * it had better in-place writes for updated data. */ - if (ipu_force || (valid_ipu_blkaddr(fio) && need_inplace_update(fio))) { + if (ipu_force || (is_valid_blkaddr(fio->old_blkaddr) && + need_inplace_update(fio))) { err = encrypt_one_page(fio); if (err) goto out_writepage; diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index aa7b033af1b0..d74c77b51b71 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2355,6 +2355,13 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, spin_unlock(&sbi->iostat_lock); }
+static inline bool is_valid_blkaddr(block_t blkaddr) +{ + if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + return false; + return true; +} + /* * file.c */ @@ -2565,7 +2572,7 @@ void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io); struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index); -bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); +bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type, bool sync); void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index); diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 6f589730782d..d368eda462bb 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -334,7 +334,7 @@ static bool __found_offset(block_t blkaddr, pgoff_t dirty, pgoff_t pgofs, switch (whence) { case SEEK_DATA: if ((blkaddr == NEW_ADDR && dirty == pgofs) || - (blkaddr != NEW_ADDR && blkaddr != NULL_ADDR)) + is_valid_blkaddr(blkaddr)) return true; break; case SEEK_HOLE: diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 259b0aa283f0..2bcbbf566f0c 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -66,7 +66,7 @@ static bool __written_first_block(struct f2fs_inode *ri) { block_t addr = le32_to_cpu(ri->i_addr[offset_in_addr(ri)]);
- if (addr != NEW_ADDR && addr != NULL_ADDR) + if (is_valid_blkaddr(addr)) return true; return false; } diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 712505ec5de4..5f212fb2d62b 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -334,8 +334,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, new_blkaddr == NULL_ADDR); f2fs_bug_on(sbi, nat_get_blkaddr(e) == NEW_ADDR && new_blkaddr == NEW_ADDR); - f2fs_bug_on(sbi, nat_get_blkaddr(e) != NEW_ADDR && - nat_get_blkaddr(e) != NULL_ADDR && + f2fs_bug_on(sbi, is_valid_blkaddr(nat_get_blkaddr(e)) && new_blkaddr == NEW_ADDR);
/* increment version no as node is removed */ @@ -350,7 +349,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
/* change address */ nat_set_blkaddr(e, new_blkaddr); - if (new_blkaddr == NEW_ADDR || new_blkaddr == NULL_ADDR) + if (!is_valid_blkaddr(new_blkaddr)) set_nat_flag(e, IS_CHECKPOINTED, false); __set_nat_cache_dirty(nm_i, e);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index 765fadf954af..53f41ad4cbe1 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -236,7 +236,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, while (1) { struct fsync_inode_entry *entry;
- if (!is_valid_blkaddr(sbi, blkaddr, META_POR)) + if (!is_valid_meta_blkaddr(sbi, blkaddr, META_POR)) return 0;
page = get_tmp_page(sbi, blkaddr); @@ -479,7 +479,7 @@ retry_dn: }
/* dest is valid block, try to recover from src to dest */ - if (is_valid_blkaddr(sbi, dest, META_POR)) { + if (is_valid_meta_blkaddr(sbi, dest, META_POR)) {
if (src == NULL_ADDR) { err = reserve_new_block(&dn); @@ -540,7 +540,7 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list, while (1) { struct fsync_inode_entry *entry;
- if (!is_valid_blkaddr(sbi, blkaddr, META_POR)) + if (!is_valid_meta_blkaddr(sbi, blkaddr, META_POR)) break;
ra_meta_pages_cond(sbi, blkaddr); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 1104a6c80251..483d7a869679 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1758,7 +1758,7 @@ bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr) struct seg_entry *se; bool is_cp = false;
- if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + if (!is_valid_blkaddr(blkaddr)) return true;
mutex_lock(&sit_i->sentry_lock); @@ -2571,7 +2571,7 @@ void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr) { struct page *cpage;
- if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) + if (!is_valid_blkaddr(blkaddr)) return;
cpage = find_lock_page(META_MAPPING(sbi), blkaddr); diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index 4b635e8c91b0..f977774338c3 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -85,7 +85,7 @@ (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & ((sbi)->blocks_per_seg - 1))
#define GET_SEGNO(sbi, blk_addr) \ - ((((blk_addr) == NULL_ADDR) || ((blk_addr) == NEW_ADDR)) ? \ + ((!is_valid_blkaddr(blk_addr)) ? \ NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \ GET_SEGNO_FROM_SEG0(sbi, blk_addr))) #define BLKS_PER_SEC(sbi) \
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit e1da7872f6eda977bd812346bf588c35e4495a1e upstream.
This patch introduces verify_blkaddr() to check that a meta/data block address falls within the valid range, to detect bugs earlier.
In addition, once we encounter an invalid blkaddr, notify the user to run fsck to fix it, and let the kernel panic.
Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org [bwh: Backported to 4.14: I skipped an earlier renaming of is_valid_meta_blkaddr() to f2fs_is_valid_meta_blkaddr()] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/checkpoint.c | 11 +++++++++-- fs/f2fs/data.c | 8 ++++---- fs/f2fs/f2fs.h | 32 +++++++++++++++++++++++++++++--- fs/f2fs/file.c | 9 +++++---- fs/f2fs/inode.c | 7 ++++--- fs/f2fs/node.c | 4 ++-- fs/f2fs/recovery.c | 6 +++--- fs/f2fs/segment.c | 4 ++-- fs/f2fs/segment.h | 8 +++----- 9 files changed, 61 insertions(+), 28 deletions(-)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index 2b951046657e..d7bd9745e883 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -118,7 +118,8 @@ struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index) return __get_meta_page(sbi, index, false); }
-bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) +bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type) { switch (type) { case META_NAT: @@ -138,10 +139,16 @@ bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type) return false; break; case META_POR: + case DATA_GENERIC: if (unlikely(blkaddr >= MAX_BLKADDR(sbi) || blkaddr < MAIN_BLKADDR(sbi))) return false; break; + case META_GENERIC: + if (unlikely(blkaddr < SEG0_BLKADDR(sbi) || + blkaddr >= MAIN_BLKADDR(sbi))) + return false; + break; default: BUG(); } @@ -174,7 +181,7 @@ int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, blk_start_plug(&plug); for (; nrpages-- > 0; blkno++) {
- if (!is_valid_meta_blkaddr(sbi, blkno, type)) + if (!f2fs_is_valid_blkaddr(sbi, blkno, type)) goto out;
switch (type) { diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index f7b2909e9c5f..615878806611 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -413,7 +413,7 @@ next: spin_unlock(&io->io_lock); }
- if (is_valid_blkaddr(fio->old_blkaddr)) + if (__is_valid_data_blkaddr(fio->old_blkaddr)) verify_block_addr(fio, fio->old_blkaddr); verify_block_addr(fio, fio->new_blkaddr);
@@ -946,7 +946,7 @@ next_dnode: next_block: blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
- if (!is_valid_blkaddr(blkaddr)) { + if (!is_valid_data_blkaddr(sbi, blkaddr)) { if (create) { if (unlikely(f2fs_cp_error(sbi))) { err = -EIO; @@ -1402,7 +1402,7 @@ int do_write_data_page(struct f2fs_io_info *fio) f2fs_lookup_extent_cache(inode, page->index, &ei)) { fio->old_blkaddr = ei.blk + page->index - ei.fofs;
- if (is_valid_blkaddr(fio->old_blkaddr)) { + if (is_valid_data_blkaddr(fio->sbi, fio->old_blkaddr)) { ipu_force = true; fio->need_lock = LOCK_DONE; goto got_it; @@ -1429,7 +1429,7 @@ got_it: * If current allocation needs SSR, * it had better in-place writes for updated data. */ - if (ipu_force || (is_valid_blkaddr(fio->old_blkaddr) && + if (ipu_force || (is_valid_data_blkaddr(fio->sbi, fio->old_blkaddr) && need_inplace_update(fio))) { err = encrypt_one_page(fio); if (err) diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d74c77b51b71..d15d79457f5c 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -162,7 +162,7 @@ struct cp_control { };
/* - * For CP/NAT/SIT/SSA readahead + * indicate meta/data type */ enum { META_CP, @@ -170,6 +170,8 @@ enum { META_SIT, META_SSA, META_POR, + DATA_GENERIC, + META_GENERIC, };
/* for the list of ino */ @@ -2355,13 +2357,36 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, spin_unlock(&sbi->iostat_lock); }
-static inline bool is_valid_blkaddr(block_t blkaddr) +bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type); +void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); +static inline void verify_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type) +{ + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, type)) { + f2fs_msg(sbi->sb, KERN_ERR, + "invalid blkaddr: %u, type: %d, run fsck to fix.", + blkaddr, type); + f2fs_bug_on(sbi, 1); + } +} + +static inline bool __is_valid_data_blkaddr(block_t blkaddr) { if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) return false; return true; }
+static inline bool is_valid_data_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr) +{ + if (!__is_valid_data_blkaddr(blkaddr)) + return false; + verify_blkaddr(sbi, blkaddr, DATA_GENERIC); + return true; +} + /* * file.c */ @@ -2572,7 +2597,8 @@ void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io); struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index); struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index); -bool is_valid_meta_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); +bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, + block_t blkaddr, int type); int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages, int type, bool sync); void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index); diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index d368eda462bb..e9c575ef70b5 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -328,13 +328,13 @@ static pgoff_t __get_first_dirty_index(struct address_space *mapping, return pgofs; }
-static bool __found_offset(block_t blkaddr, pgoff_t dirty, pgoff_t pgofs, - int whence) +static bool __found_offset(struct f2fs_sb_info *sbi, block_t blkaddr, + pgoff_t dirty, pgoff_t pgofs, int whence) { switch (whence) { case SEEK_DATA: if ((blkaddr == NEW_ADDR && dirty == pgofs) || - is_valid_blkaddr(blkaddr)) + is_valid_data_blkaddr(sbi, blkaddr)) return true; break; case SEEK_HOLE: @@ -397,7 +397,8 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
- if (__found_offset(blkaddr, dirty, pgofs, whence)) { + if (__found_offset(F2FS_I_SB(inode), blkaddr, dirty, + pgofs, whence)) { f2fs_put_dnode(&dn); goto found; } diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 2bcbbf566f0c..b1c0eb433841 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -62,11 +62,12 @@ static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri) } }
-static bool __written_first_block(struct f2fs_inode *ri) +static bool __written_first_block(struct f2fs_sb_info *sbi, + struct f2fs_inode *ri) { block_t addr = le32_to_cpu(ri->i_addr[offset_in_addr(ri)]);
- if (is_valid_blkaddr(addr)) + if (is_valid_data_blkaddr(sbi, addr)) return true; return false; } @@ -235,7 +236,7 @@ static int do_read_inode(struct inode *inode) /* get rdev by using inline_info */ __get_inode_rdev(inode, ri);
- if (__written_first_block(ri)) + if (__written_first_block(sbi, ri)) set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN);
if (!need_inode_block_update(sbi, inode->i_ino)) diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 5f212fb2d62b..999814c8cbea 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -334,7 +334,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni, new_blkaddr == NULL_ADDR); f2fs_bug_on(sbi, nat_get_blkaddr(e) == NEW_ADDR && new_blkaddr == NEW_ADDR); - f2fs_bug_on(sbi, is_valid_blkaddr(nat_get_blkaddr(e)) && + f2fs_bug_on(sbi, is_valid_data_blkaddr(sbi, nat_get_blkaddr(e)) && new_blkaddr == NEW_ADDR);
/* increment version no as node is removed */ @@ -349,7 +349,7 @@ static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
/* change address */ nat_set_blkaddr(e, new_blkaddr); - if (!is_valid_blkaddr(new_blkaddr)) + if (!is_valid_data_blkaddr(sbi, new_blkaddr)) set_nat_flag(e, IS_CHECKPOINTED, false); __set_nat_cache_dirty(nm_i, e);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c index 53f41ad4cbe1..6ea445377767 100644 --- a/fs/f2fs/recovery.c +++ b/fs/f2fs/recovery.c @@ -236,7 +236,7 @@ static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head, while (1) { struct fsync_inode_entry *entry;
- if (!is_valid_meta_blkaddr(sbi, blkaddr, META_POR)) + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, META_POR)) return 0;
page = get_tmp_page(sbi, blkaddr); @@ -479,7 +479,7 @@ retry_dn: }
/* dest is valid block, try to recover from src to dest */ - if (is_valid_meta_blkaddr(sbi, dest, META_POR)) { + if (f2fs_is_valid_blkaddr(sbi, dest, META_POR)) {
if (src == NULL_ADDR) { err = reserve_new_block(&dn); @@ -540,7 +540,7 @@ static int recover_data(struct f2fs_sb_info *sbi, struct list_head *inode_list, while (1) { struct fsync_inode_entry *entry;
- if (!is_valid_meta_blkaddr(sbi, blkaddr, META_POR)) + if (!f2fs_is_valid_blkaddr(sbi, blkaddr, META_POR)) break;
ra_meta_pages_cond(sbi, blkaddr); diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index 483d7a869679..5c698757e116 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -1758,7 +1758,7 @@ bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr) struct seg_entry *se; bool is_cp = false;
- if (!is_valid_blkaddr(blkaddr)) + if (!is_valid_data_blkaddr(sbi, blkaddr)) return true;
mutex_lock(&sit_i->sentry_lock); @@ -2571,7 +2571,7 @@ void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr) { struct page *cpage;
- if (!is_valid_blkaddr(blkaddr)) + if (!is_valid_data_blkaddr(sbi, blkaddr)) return;
cpage = find_lock_page(META_MAPPING(sbi), blkaddr); diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index f977774338c3..dd8f977fc273 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -85,7 +85,7 @@ (GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & ((sbi)->blocks_per_seg - 1))
#define GET_SEGNO(sbi, blk_addr) \ - ((!is_valid_blkaddr(blk_addr)) ? \ + ((!is_valid_data_blkaddr(sbi, blk_addr)) ? \ NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \ GET_SEGNO_FROM_SEG0(sbi, blk_addr))) #define BLKS_PER_SEC(sbi) \ @@ -631,11 +631,9 @@ static inline void verify_block_addr(struct f2fs_io_info *fio, block_t blk_addr)
if (PAGE_TYPE_OF_BIO(fio->type) == META && (!is_read_io(fio->op) || fio->is_meta)) - BUG_ON(blk_addr < SEG0_BLKADDR(sbi) || - blk_addr >= MAIN_BLKADDR(sbi)); + verify_blkaddr(sbi, blk_addr, META_GENERIC); else - BUG_ON(blk_addr < MAIN_BLKADDR(sbi) || - blk_addr >= MAX_BLKADDR(sbi)); + verify_blkaddr(sbi, blk_addr, DATA_GENERIC); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 42bf546c1fe3f3654bdf914e977acbc2b80a5be5 upstream.
As Wen Xu reported in below link:
https://bugzilla.kernel.org/show_bug.cgi?id=200183
- Overview
Divide-by-zero in reset_curseg() when mounting a crafted f2fs image
- Reproduce
- Kernel message [ 588.281510] divide error: 0000 [#1] SMP KASAN PTI [ 588.282701] CPU: 0 PID: 1293 Comm: mount Not tainted 4.18.0-rc1+ #4 [ 588.284000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 588.286178] RIP: 0010:reset_curseg+0x94/0x1a0 [ 588.298166] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246 [ 588.299360] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b [ 588.300809] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64 [ 588.305272] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000 [ 588.306822] FS: 00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 588.308456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 588.309623] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0 [ 588.311085] Call Trace: [ 588.311637] f2fs_build_segment_manager+0x103f/0x3410 [ 588.316136] ? f2fs_commit_super+0x1b0/0x1b0 [ 588.317031] ? set_blocksize+0x90/0x140 [ 588.319473] f2fs_mount+0x15/0x20 [ 588.320166] mount_fs+0x60/0x1a0 [ 588.320847] ? alloc_vfsmnt+0x309/0x360 [ 588.321647] vfs_kern_mount+0x6b/0x1a0 [ 588.322432] do_mount+0x34a/0x18c0 [ 588.323175] ? strndup_user+0x46/0x70 [ 588.323937] ? copy_mount_string+0x20/0x20 [ 588.324793] ? memcg_kmem_put_cache+0x1b/0xa0 [ 588.325702] ? kasan_check_write+0x14/0x20 [ 588.326562] ? _copy_from_user+0x6a/0x90 [ 588.327375] ? memdup_user+0x42/0x60 [ 588.328118] ksys_mount+0x83/0xd0 [ 588.328808] __x64_sys_mount+0x67/0x80 [ 588.329607] do_syscall_64+0x78/0x170 [ 588.330400] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 588.331461] RIP: 0033:0x7fad848e8b9a [ 588.336022] RSP: 002b:00007ffd7c5b6be8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 588.337547] RAX: ffffffffffffffda RBX: 00000000016f8030 RCX: 00007fad848e8b9a [ 588.338999] RDX: 00000000016f8210 RSI: 00000000016f9f30 RDI: 0000000001700ec0 [ 588.340442] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 588.341887] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001700ec0 [ 588.343341] R13: 00000000016f8210 R14: 0000000000000000 R15: 0000000000000003 [ 588.354891] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 588.355862] RIP: 0010:reset_curseg+0x94/0x1a0 [ 588.360742] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246 [ 588.361812] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b [ 588.363485] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64 [ 588.365213] RBP: ffff8801e88d7968 R08: ffffed003c32266f R09: ffffed003c32266f [ 588.366661] R10: 0000000000000001 R11: ffffed003c32266e R12: ffff8801f0337700 [ 588.368110] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000 [ 588.370057] FS: 00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 588.372099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 588.373291] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0
- Location
https://elixir.bootlin.com/linux/latest/source/fs/f2fs/segment.c#L2147

	curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
If secs_per_zone is corrupted by a fuzzing test, it will cause a divide-by-zero when using the GET_ZONE_FROM_SEG macro, so we should do a further sanity check on secs_per_zone during mount to avoid this issue.
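Schematically, the macro chain reduces to a division by the on-disk field, which is why a zero value must be rejected at mount time (a simplified sketch, not the exact f2fs macros):

	#include <stdio.h>

	/* Simplified: GET_ZONE_FROM_SEG() ultimately divides by secs_per_zone. */
	static unsigned int get_zone_from_sec(unsigned int secno,
					      unsigned int secs_per_zone)
	{
		return secno / secs_per_zone;	/* traps if the field is 0 */
	}

	int main(void)
	{
		printf("%u\n", get_zone_from_sec(20, 4));	/* fine: 5 */
		/* get_zone_from_sec(20, 0) would fault, as in the report. */
		return 0;
	}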
Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index c3e1090e7c0a..084ff9c82a37 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1904,9 +1904,9 @@ static int sanity_check_raw_super(struct f2fs_sb_info *sbi, return 1; }
- if (secs_per_zone > total_sections) { + if (secs_per_zone > total_sections || !secs_per_zone) { f2fs_msg(sb, KERN_INFO, - "Wrong secs_per_zone (%u > %u)", + "Wrong secs_per_zone / total_sections (%u, %u)", secs_per_zone, total_sections); return 1; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
This was done as part of commit 5d64600d4f33 "f2fs: avoid bug_on on corrupted inode" upstream, but the specific check that commit added is not applicable to 4.14.
Cc: Jaegeuk Kim jaegeuk@kernel.org Cc: Chao Yu yuchao0@huawei.com Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/inode.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index b1c0eb433841..917a6c3d5649 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -180,6 +180,13 @@ void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page) ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, page)); }
+static bool sanity_check_inode(struct inode *inode) +{ + struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + + return true; +} + static int do_read_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); @@ -281,6 +288,10 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) ret = do_read_inode(inode); if (ret) goto bad_inode; + if (!sanity_check_inode(inode)) { + ret = -EINVAL; + goto bad_inode; + } make_now: if (ino == F2FS_NODE_INO(sbi)) { inode->i_mapping->a_ops = &f2fs_node_aops;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 76d56d4ab4f2a9e4f085c7d77172194ddaccf7d2 upstream.
If FI_EXTRA_ATTR is set in an inode by fuzzing, inode.i_addr[0] will be parsed as inode.i_extra_isize; then, in __recover_inline_status, the inline data address will be beyond the boundary of the page, resulting in access to invalid memory.
So in this condition, while reading the inode page, let's do a sanity check between the EXTRA_ATTR feature of the fs and the extra_attr bit of the inode; if they're inconsistent, refuse to load the inode.
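Per the upstream commit, the added check amounts to roughly the following sketch (the exact 4.14 backport may differ in detail):

	/*
	 * Inside sanity_check_inode(): reject an inode whose extra_attr
	 * bit disagrees with the superblock feature flag.
	 */
	if (f2fs_has_extra_attr(inode) && !f2fs_sb_has_extra_attr(sbi->sb)) {
		set_sbi_flag(sbi, SBI_NEED_FSCK);
		f2fs_msg(sbi->sb, KERN_WARNING,
			"%s: inode (ino=%lx) is with extra_attr, but extra_attr feature is off",
			__func__, inode->i_ino);
		return false;
	}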
- Overview
Out-of-bounds access in f2fs_iget() when mounting a corrupted f2fs image
- Reproduce
The following message is reported by a KASAN build of the 4.18 upstream kernel:

[ 819.392227] ==================================================================
[ 819.393901] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0x736/0x1530
[ 819.395329] Read of size 4 at addr ffff8801f099c968 by task mount/1292
[ 819.397079] CPU: 1 PID: 1292 Comm: mount Not tainted 4.18.0-rc1+ #4 [ 819.397082] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 819.397088] Call Trace: [ 819.397124] dump_stack+0x7b/0xb5 [ 819.397154] print_address_description+0x70/0x290 [ 819.397159] kasan_report+0x291/0x390 [ 819.397163] ? f2fs_iget+0x736/0x1530 [ 819.397176] check_memory_region+0x139/0x190 [ 819.397182] __asan_loadN+0xf/0x20 [ 819.397185] f2fs_iget+0x736/0x1530 [ 819.397197] f2fs_fill_super+0x1b4f/0x2b40 [ 819.397202] ? f2fs_fill_super+0x1b4f/0x2b40 [ 819.397208] ? f2fs_commit_super+0x1b0/0x1b0 [ 819.397227] ? set_blocksize+0x90/0x140 [ 819.397241] mount_bdev+0x1c5/0x210 [ 819.397245] ? f2fs_commit_super+0x1b0/0x1b0 [ 819.397252] f2fs_mount+0x15/0x20 [ 819.397256] mount_fs+0x60/0x1a0 [ 819.397267] ? alloc_vfsmnt+0x309/0x360 [ 819.397272] vfs_kern_mount+0x6b/0x1a0 [ 819.397282] do_mount+0x34a/0x18c0 [ 819.397300] ? lockref_put_or_lock+0xcf/0x160 [ 819.397306] ? copy_mount_string+0x20/0x20 [ 819.397318] ? memcg_kmem_put_cache+0x1b/0xa0 [ 819.397324] ? kasan_check_write+0x14/0x20 [ 819.397334] ? _copy_from_user+0x6a/0x90 [ 819.397353] ? memdup_user+0x42/0x60 [ 819.397359] ksys_mount+0x83/0xd0 [ 819.397365] __x64_sys_mount+0x67/0x80 [ 819.397388] do_syscall_64+0x78/0x170 [ 819.397403] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 819.397422] RIP: 0033:0x7f54c667cb9a [ 819.397424] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 819.397483] RSP: 002b:00007ffd8f46cd08 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 819.397496] RAX: ffffffffffffffda RBX: 0000000000dfa030 RCX: 00007f54c667cb9a [ 819.397498] RDX: 0000000000dfa210 RSI: 0000000000dfbf30 RDI: 0000000000e02ec0 [ 819.397501] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 819.397503] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000000000e02ec0 [ 819.397505] R13: 0000000000dfa210 R14: 0000000000000000 R15: 0000000000000003
[ 819.397866] Allocated by task 139: [ 819.398702] save_stack+0x46/0xd0 [ 819.398705] kasan_kmalloc+0xad/0xe0 [ 819.398709] kasan_slab_alloc+0x11/0x20 [ 819.398713] kmem_cache_alloc+0xd1/0x1e0 [ 819.398717] dup_fd+0x50/0x4c0 [ 819.398740] copy_process.part.37+0xbed/0x32e0 [ 819.398744] _do_fork+0x16e/0x590 [ 819.398748] __x64_sys_clone+0x69/0x80 [ 819.398752] do_syscall_64+0x78/0x170 [ 819.398756] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 819.399097] Freed by task 159: [ 819.399743] save_stack+0x46/0xd0 [ 819.399747] __kasan_slab_free+0x13c/0x1a0 [ 819.399750] kasan_slab_free+0xe/0x10 [ 819.399754] kmem_cache_free+0x89/0x1e0 [ 819.399757] put_files_struct+0x132/0x150 [ 819.399761] exit_files+0x62/0x70 [ 819.399766] do_exit+0x47b/0x1390 [ 819.399770] do_group_exit+0x86/0x130 [ 819.399774] __x64_sys_exit_group+0x2c/0x30 [ 819.399778] do_syscall_64+0x78/0x170 [ 819.399782] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 819.400115] The buggy address belongs to the object at ffff8801f099c680 which belongs to the cache files_cache of size 704 [ 819.403234] The buggy address is located 40 bytes to the right of 704-byte region [ffff8801f099c680, ffff8801f099c940) [ 819.405689] The buggy address belongs to the page: [ 819.406709] page:ffffea0007c26700 count:1 mapcount:0 mapping:ffff8801f69a3340 index:0xffff8801f099d380 compound_mapcount: 0 [ 819.408984] flags: 0x2ffff0000008100(slab|head) [ 819.409932] raw: 02ffff0000008100 ffffea00077fb600 0000000200000002 ffff8801f69a3340 [ 819.411514] raw: ffff8801f099d380 0000000080130000 00000001ffffffff 0000000000000000 [ 819.413073] page dumped because: kasan: bad access detected
[ 819.414539] Memory state around the buggy address: [ 819.415521] ffff8801f099c800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 819.416981] ffff8801f099c880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 819.418454] >ffff8801f099c900: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 819.419921] ^ [ 819.421265] ffff8801f099c980: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb [ 819.422745] ffff8801f099ca00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 819.424206] ================================================================== [ 819.425668] Disabling lock debugging due to kernel taint [ 819.457463] F2FS-fs (loop0): Mounted with checkpoint version = 3
The kernel still mounts the image. If you run the following program against the mounted directory mnt,
(poc.c)
#define _GNU_SOURCE    /* for asprintf() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void activity(char *mpoint) {
    char *foo_bar_baz;
    int err;

    static int buf[8192];
    memset(buf, 0, sizeof(buf));

    err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
    int fd = open(foo_bar_baz, O_RDONLY, 0);
    if (fd >= 0) {
        read(fd, (char *)buf, 11);
        close(fd);
    }
}

int main(int argc, char *argv[]) {
    activity(argv[1]);
    return 0;
}
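Each PoC in this series is driven the same way: compile it and point it at the mount point of the corrupted image, for example "gcc -o poc poc.c" followed by "./poc ./mnt" (the mnt path matches the report's wording; the binary name is illustrative).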
You can get kernel crash: [ 819.457463] F2FS-fs (loop0): Mounted with checkpoint version = 3 [ 918.028501] BUG: unable to handle kernel paging request at ffffed0048000d82 [ 918.044020] PGD 23ffee067 P4D 23ffee067 PUD 23fbef067 PMD 0 [ 918.045207] Oops: 0000 [#1] SMP KASAN PTI [ 918.046048] CPU: 0 PID: 1309 Comm: poc Tainted: G B 4.18.0-rc1+ #4 [ 918.047573] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 918.049552] RIP: 0010:check_memory_region+0x5e/0x190 [ 918.050565] Code: f8 49 c1 e8 03 49 89 db 49 c1 eb 03 4d 01 cb 4d 01 c1 4d 8d 63 01 4c 89 c8 4d 89 e2 4d 29 ca 49 83 fa 10 7f 3d 4d 85 d2 74 32 <41> 80 39 00 75 23 48 b8 01 00 00 00 00 fc ff df 4d 01 d1 49 01 c0 [ 918.054322] RSP: 0018:ffff8801e3a1f258 EFLAGS: 00010202 [ 918.055400] RAX: ffffed0048000d82 RBX: ffff880240006c11 RCX: ffffffffb8867d14 [ 918.056832] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880240006c10 [ 918.058253] RBP: ffff8801e3a1f268 R08: 1ffff10048000d82 R09: ffffed0048000d82 [ 918.059717] R10: 0000000000000001 R11: ffffed0048000d82 R12: ffffed0048000d83 [ 918.061159] R13: ffff8801e3a1f390 R14: 0000000000000000 R15: ffff880240006c08 [ 918.062614] FS: 00007fac9732c700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 918.064246] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 918.065412] CR2: ffffed0048000d82 CR3: 00000001df77a000 CR4: 00000000000006f0 [ 918.066882] Call Trace: [ 918.067410] __asan_loadN+0xf/0x20 [ 918.068149] f2fs_find_target_dentry+0xf4/0x270 [ 918.069083] ? __get_node_page+0x331/0x5b0 [ 918.069925] f2fs_find_in_inline_dir+0x24b/0x310 [ 918.070881] ? f2fs_recover_inline_data+0x4c0/0x4c0 [ 918.071905] ? unwind_next_frame.part.5+0x34f/0x490 [ 918.072901] ? unwind_dump+0x290/0x290 [ 918.073695] ? is_bpf_text_address+0xe/0x20 [ 918.074566] __f2fs_find_entry+0x599/0x670 [ 918.075408] ? kasan_unpoison_shadow+0x36/0x50 [ 918.076315] ? kasan_kmalloc+0xad/0xe0 [ 918.077100] ? memcg_kmem_put_cache+0x55/0xa0 [ 918.077998] ? f2fs_find_target_dentry+0x270/0x270 [ 918.079006] ? d_set_d_op+0x30/0x100 [ 918.079749] ? __d_lookup_rcu+0x69/0x2e0 [ 918.080556] ? __d_alloc+0x275/0x450 [ 918.081297] ? kasan_check_write+0x14/0x20 [ 918.082135] ? memset+0x31/0x40 [ 918.082820] ? fscrypt_setup_filename+0x1ec/0x4c0 [ 918.083782] ? d_alloc_parallel+0x5bb/0x8c0 [ 918.084640] f2fs_find_entry+0xe9/0x110 [ 918.085432] ? __f2fs_find_entry+0x670/0x670 [ 918.086308] ? kasan_check_write+0x14/0x20 [ 918.087163] f2fs_lookup+0x297/0x590 [ 918.087902] ? f2fs_link+0x2b0/0x2b0 [ 918.088646] ? legitimize_path.isra.29+0x61/0xa0 [ 918.089589] __lookup_slow+0x12e/0x240 [ 918.090371] ? may_delete+0x2b0/0x2b0 [ 918.091123] ? __nd_alloc_stack+0xa0/0xa0 [ 918.091944] lookup_slow+0x44/0x60 [ 918.092642] walk_component+0x3ee/0xa40 [ 918.093428] ? is_bpf_text_address+0xe/0x20 [ 918.094283] ? pick_link+0x3e0/0x3e0 [ 918.095047] ? in_group_p+0xa5/0xe0 [ 918.095771] ? generic_permission+0x53/0x1e0 [ 918.096666] ? security_inode_permission+0x1d/0x70 [ 918.097646] ? inode_permission+0x7a/0x1f0 [ 918.098497] link_path_walk+0x2a2/0x7b0 [ 918.099298] ? apparmor_capget+0x3d0/0x3d0 [ 918.100140] ? walk_component+0xa40/0xa40 [ 918.100958] ? path_init+0x2e6/0x580 [ 918.101695] path_openat+0x1bb/0x2160 [ 918.102471] ? __save_stack_trace+0x92/0x100 [ 918.103352] ? save_stack+0xb5/0xd0 [ 918.104070] ? vfs_unlink+0x250/0x250 [ 918.104822] ? save_stack+0x46/0xd0 [ 918.105538] ? kasan_slab_alloc+0x11/0x20 [ 918.106370] ? kmem_cache_alloc+0xd1/0x1e0 [ 918.107213] ? 
getname_flags+0x76/0x2c0 [ 918.107997] ? getname+0x12/0x20 [ 918.108677] ? do_sys_open+0x14b/0x2c0 [ 918.109450] ? __x64_sys_open+0x4c/0x60 [ 918.110255] ? do_syscall_64+0x78/0x170 [ 918.111083] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 918.112148] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 918.113204] ? f2fs_empty_inline_dir+0x1e0/0x1e0 [ 918.114150] ? timespec64_trunc+0x5c/0x90 [ 918.114993] ? wb_io_lists_depopulated+0x1a/0xc0 [ 918.115937] ? inode_io_list_move_locked+0x102/0x110 [ 918.116949] do_filp_open+0x12b/0x1d0 [ 918.117709] ? may_open_dev+0x50/0x50 [ 918.118475] ? kasan_kmalloc+0xad/0xe0 [ 918.119246] do_sys_open+0x17c/0x2c0 [ 918.119983] ? do_sys_open+0x17c/0x2c0 [ 918.120751] ? filp_open+0x60/0x60 [ 918.121463] ? task_work_run+0x4d/0xf0 [ 918.122237] __x64_sys_open+0x4c/0x60 [ 918.123001] do_syscall_64+0x78/0x170 [ 918.123759] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 918.124802] RIP: 0033:0x7fac96e3e040 [ 918.125537] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 09 27 2d 00 00 75 10 b8 02 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 7e e0 01 00 48 89 04 24 [ 918.129341] RSP: 002b:00007fff1b37f848 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 [ 918.130870] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fac96e3e040 [ 918.132295] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000122d080 [ 918.133748] RBP: 00007fff1b37f9b0 R08: 00007fac9710bbd8 R09: 0000000000000001 [ 918.135209] R10: 000000000000069d R11: 0000000000000246 R12: 0000000000400c20 [ 918.136650] R13: 00007fff1b37fab0 R14: 0000000000000000 R15: 0000000000000000 [ 918.138093] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 918.147924] CR2: ffffed0048000d82 [ 918.148619] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 918.149563] RIP: 0010:check_memory_region+0x5e/0x190 [ 918.150576] Code: f8 49 c1 e8 03 49 89 db 49 c1 eb 03 4d 01 cb 4d 01 c1 4d 8d 63 01 4c 89 c8 4d 89 e2 4d 29 ca 49 83 fa 10 7f 3d 4d 85 d2 74 32 <41> 80 39 00 75 23 48 b8 01 00 00 00 00 fc ff df 4d 01 d1 49 01 c0 [ 918.154360] RSP: 0018:ffff8801e3a1f258 EFLAGS: 00010202 [ 918.155411] RAX: ffffed0048000d82 RBX: ffff880240006c11 RCX: ffffffffb8867d14 [ 918.156833] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff880240006c10 [ 918.158257] RBP: ffff8801e3a1f268 R08: 1ffff10048000d82 R09: ffffed0048000d82 [ 918.159722] R10: 0000000000000001 R11: ffffed0048000d82 R12: ffffed0048000d83 [ 918.161149] R13: ffff8801e3a1f390 R14: 0000000000000000 R15: ffff880240006c08 [ 918.162587] FS: 00007fac9732c700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 918.164203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 918.165356] CR2: ffffed0048000d82 CR3: 00000001df77a000 CR4: 00000000000006f0
Reported-by: Wen Xu wen.xu@gatech.edu Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org [bwh: Backported to 4.14: adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/inode.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 917a6c3d5649..428d502f23e8 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -184,6 +184,15 @@ static bool sanity_check_inode(struct inode *inode) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ if (f2fs_has_extra_attr(inode) && + !f2fs_sb_has_extra_attr(sbi->sb)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) is with extra_attr, " + "but extra_attr feature is off", + __func__, inode->i_ino); + return false; + } return true; }
@@ -233,6 +242,11 @@ static int do_read_inode(struct inode *inode)
get_inline_info(inode, ri);
+ if (!sanity_check_inode(inode)) { + f2fs_put_page(node_page, 1); + return -EINVAL; + } + fi->i_extra_isize = f2fs_has_extra_attr(inode) ? le16_to_cpu(ri->i_extra_isize) : 0;
@@ -288,10 +302,6 @@ struct inode *f2fs_iget(struct super_block *sb, unsigned long ino) ret = do_read_inode(inode); if (ret) goto bad_inode; - if (!sanity_check_inode(inode)) { - ret = -EINVAL; - goto bad_inode; - } make_now: if (ino == F2FS_NODE_INO(sbi)) { inode->i_mapping->a_ops = &f2fs_node_aops;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 9dc956b2c8523aed39d1e6508438be9fea28c8fc upstream.
This patch adds a sanity check on user_block_count.
- Overview Divide-by-zero in utilization() when mounting a corrupted f2fs image
- Reproduce (4.18 upstream kernel)
- Kernel message [ 564.099503] F2FS-fs (loop0): invalid crc value [ 564.101991] divide error: 0000 [#1] SMP KASAN PTI [ 564.103103] CPU: 1 PID: 1298 Comm: f2fs_discard-7: Not tainted 4.18.0-rc1+ #4 [ 564.104584] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 564.106624] RIP: 0010:issue_discard_thread+0x248/0x5c0 [ 564.107692] Code: ff ff 48 8b bd e8 fe ff ff 41 8b 9d 4c 04 00 00 e8 cd b8 ad ff 41 8b 85 50 04 00 00 31 d2 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <48> f7 f3 83 f8 50 7e 16 41 c7 86 7c ff ff ff 01 00 00 00 41 c7 86 [ 564.111686] RSP: 0018:ffff8801f3117dc0 EFLAGS: 00010206 [ 564.112775] RAX: 0000000000000384 RBX: 0000000000000000 RCX: ffffffffb88c1e03 [ 564.114250] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e3aa4850 [ 564.115706] RBP: ffff8801f3117f00 R08: 1ffffffff751a1d0 R09: fffffbfff751a1d0 [ 564.117177] R10: 0000000000000001 R11: fffffbfff751a1d0 R12: 00000000fffffffc [ 564.118634] R13: ffff8801e3aa4400 R14: ffff8801f3117ed8 R15: ffff8801e2050000 [ 564.120094] FS: 0000000000000000(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000 [ 564.121748] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 564.122923] CR2: 000000000202b078 CR3: 00000001f11ac000 CR4: 00000000000006e0 [ 564.124383] Call Trace: [ 564.124924] ? __issue_discard_cmd+0x480/0x480 [ 564.125882] ? __sched_text_start+0x8/0x8 [ 564.126756] ? __kthread_parkme+0xcb/0x100 [ 564.127620] ? kthread_blkcg+0x70/0x70 [ 564.128412] kthread+0x180/0x1d0 [ 564.129105] ? __issue_discard_cmd+0x480/0x480 [ 564.130029] ? kthread_associate_blkcg+0x150/0x150 [ 564.131033] ret_from_fork+0x35/0x40 [ 564.131794] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 564.141798] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 564.142773] RIP: 0010:issue_discard_thread+0x248/0x5c0 [ 564.143885] Code: ff ff 48 8b bd e8 fe ff ff 41 8b 9d 4c 04 00 00 e8 cd b8 ad ff 41 8b 85 50 04 00 00 31 d2 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <48> f7 f3 83 f8 50 7e 16 41 c7 86 7c ff ff ff 01 00 00 00 41 c7 86 [ 564.147776] RSP: 0018:ffff8801f3117dc0 EFLAGS: 00010206 [ 564.148856] RAX: 0000000000000384 RBX: 0000000000000000 RCX: ffffffffb88c1e03 [ 564.150424] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e3aa4850 [ 564.151906] RBP: ffff8801f3117f00 R08: 1ffffffff751a1d0 R09: fffffbfff751a1d0 [ 564.153463] R10: 0000000000000001 R11: fffffbfff751a1d0 R12: 00000000fffffffc [ 564.154915] R13: ffff8801e3aa4400 R14: ffff8801f3117ed8 R15: ffff8801e2050000 [ 564.156405] FS: 0000000000000000(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000 [ 564.158070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 564.159279] CR2: 000000000202b078 CR3: 00000001f11ac000 CR4: 00000000000006e0 [ 564.161043] ================================================================== [ 564.162587] BUG: KASAN: stack-out-of-bounds in from_kuid_munged+0x1d/0x50 [ 564.163994] Read of size 4 at addr ffff8801f3117c84 by task f2fs_discard-7:/1298
[ 564.165852] CPU: 1 PID: 1298 Comm: f2fs_discard-7: Tainted: G D 4.18.0-rc1+ #4 [ 564.167593] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 564.169522] Call Trace: [ 564.170057] dump_stack+0x7b/0xb5 [ 564.170778] print_address_description+0x70/0x290 [ 564.171765] kasan_report+0x291/0x390 [ 564.172540] ? from_kuid_munged+0x1d/0x50 [ 564.173408] __asan_load4+0x78/0x80 [ 564.174148] from_kuid_munged+0x1d/0x50 [ 564.174962] do_notify_parent+0x1f5/0x4f0 [ 564.175808] ? send_sigqueue+0x390/0x390 [ 564.176639] ? css_set_move_task+0x152/0x340 [ 564.184197] do_exit+0x1290/0x1390 [ 564.184950] ? __issue_discard_cmd+0x480/0x480 [ 564.185884] ? mm_update_next_owner+0x380/0x380 [ 564.186829] ? __sched_text_start+0x8/0x8 [ 564.187672] ? __kthread_parkme+0xcb/0x100 [ 564.188528] ? kthread_blkcg+0x70/0x70 [ 564.189333] ? kthread+0x180/0x1d0 [ 564.190052] ? __issue_discard_cmd+0x480/0x480 [ 564.190983] rewind_stack_do_exit+0x17/0x20
[ 564.192190] The buggy address belongs to the page: [ 564.193213] page:ffffea0007cc45c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 564.194856] flags: 0x2ffff0000000000() [ 564.195644] raw: 02ffff0000000000 0000000000000000 dead000000000200 0000000000000000 [ 564.197247] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 564.198826] page dumped because: kasan: bad access detected
[ 564.200299] Memory state around the buggy address: [ 564.201306] ffff8801f3117b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 564.202779] ffff8801f3117c00: 00 00 00 00 00 00 00 00 00 00 00 f3 f3 f3 f3 f3 [ 564.204252] >ffff8801f3117c80: f3 f3 f3 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1 [ 564.205742] ^ [ 564.206424] ffff8801f3117d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 564.207908] ffff8801f3117d80: f3 f3 f3 f3 f3 f3 f3 f3 00 00 00 00 00 00 00 00 [ 564.209389] ================================================================== [ 564.231795] F2FS-fs (loop0): Mounted with checkpoint version = 2
- Location https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L586
    return div_u64((u64)valid_user_blocks(sbi) * 100, sbi->user_block_count);
Missing checks on sbi->user_block_count.
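The check added below bounds the value read from the checkpoint: user_block_count must be non-zero and strictly below segment_count_main << log_blocks_per_seg, i.e. the main area's capacity in blocks. As a worked example with illustrative numbers, log_blocks_per_seg == 9 (512 blocks per 2 MB segment) and segment_count_main == 1024 give a ceiling of 1024 << 9 = 524288, so a fuzzed user_block_count of 0 (the divide-by-zero case above) or of anything >= 524288 now fails sanity_check_ckpt() and the mount is refused.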
Reported-by: Wen Xu wen.xu@gatech.edu Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/super.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 084ff9c82a37..9fafb1404f39 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1956,6 +1956,8 @@ int sanity_check_ckpt(struct f2fs_sb_info *sbi) unsigned int sit_segs, nat_segs; unsigned int sit_bitmap_size, nat_bitmap_size; unsigned int log_blocks_per_seg; + unsigned int segment_count_main; + block_t user_block_count; int i;
total = le32_to_cpu(raw_super->segment_count); @@ -1980,6 +1982,16 @@ int sanity_check_ckpt(struct f2fs_sb_info *sbi) return 1; }
+ user_block_count = le64_to_cpu(ckpt->user_block_count); + segment_count_main = le32_to_cpu(raw_super->segment_count_main); + log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg); + if (!user_block_count || user_block_count >= + segment_count_main << log_blocks_per_seg) { + f2fs_msg(sbi->sb, KERN_ERR, + "Wrong user_block_count: %u", user_block_count); + return 1; + } + main_segs = le32_to_cpu(raw_super->segment_count_main); blocks_per_seg = sbi->blocks_per_seg;
@@ -1996,7 +2008,6 @@ int sanity_check_ckpt(struct f2fs_sb_info *sbi)
sit_bitmap_size = le32_to_cpu(ckpt->sit_ver_bitmap_bytesize); nat_bitmap_size = le32_to_cpu(ckpt->nat_ver_bitmap_bytesize); - log_blocks_per_seg = le32_to_cpu(raw_super->log_blocks_per_seg);
if (sit_bitmap_size != ((sit_segs / 2) << log_blocks_per_seg) / 8 || nat_bitmap_size != ((nat_segs / 2) << log_blocks_per_seg) / 8) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit e34438c903b653daca2b2a7de95aed46226f8ed3 upstream.
This patch adds sanity checks on the below inode fields to avoid the reported panic:
- node footer
- iblocks
https://bugzilla.kernel.org/show_bug.cgi?id=200223
- Overview BUG() triggered in f2fs_truncate_inode_blocks() when unmounting a mounted f2fs image after writing to it
- Reproduce
- POC (poc.c)
#define _GNU_SOURCE    /* for asprintf() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void activity(char *mpoint) {
    char *foo_bar_baz;
    int err;

    static int buf[8192];
    memset(buf, 0, sizeof(buf));

    err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);

    // open / write / read
    int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
    if (fd >= 0) {
        write(fd, (char *)buf, 517);
        write(fd, (char *)buf, sizeof(buf));
        close(fd);
    }
}

int main(int argc, char *argv[]) {
    activity(argv[1]);
    return 0;
}
- Kernel message [ 552.479723] F2FS-fs (loop0): Mounted with checkpoint version = 2 [ 556.451891] ------------[ cut here ]------------ [ 556.451899] kernel BUG at fs/f2fs/node.c:987! [ 556.452920] invalid opcode: 0000 [#1] SMP KASAN PTI [ 556.453936] CPU: 1 PID: 1310 Comm: umount Not tainted 4.18.0-rc1+ #4 [ 556.455213] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 556.457140] RIP: 0010:f2fs_truncate_inode_blocks+0x4a7/0x6f0 [ 556.458280] Code: e8 ae ea ff ff 41 89 c7 c1 e8 1f 84 c0 74 0a 41 83 ff fe 0f 85 35 ff ff ff 81 85 b0 fe ff ff fb 03 00 00 e9 f7 fd ff ff 0f 0b <0f> 0b e8 62 b7 9a 00 48 8b bd a0 fe ff ff e8 56 54 ae ff 48 8b b5 [ 556.462015] RSP: 0018:ffff8801f292f808 EFLAGS: 00010286 [ 556.463068] RAX: ffffed003e73242d RBX: ffff8801f292f958 RCX: ffffffffb88b81bc [ 556.464479] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8801f3992164 [ 556.465901] RBP: ffff8801f292f980 R08: ffffed003e73242d R09: ffffed003e73242d [ 556.467311] R10: 0000000000000001 R11: ffffed003e73242c R12: 00000000fffffc64 [ 556.468706] R13: ffff8801f3992000 R14: 0000000000000058 R15: 00000000ffff8801 [ 556.470117] FS: 00007f8029297840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000 [ 556.471702] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 556.472838] CR2: 000055f5f57305d8 CR3: 00000001f18b0000 CR4: 00000000000006e0 [ 556.474265] Call Trace: [ 556.474782] ? f2fs_alloc_nid_failed+0xf0/0xf0 [ 556.475686] ? truncate_nodes+0x980/0x980 [ 556.476516] ? pagecache_get_page+0x21f/0x2f0 [ 556.477412] ? __asan_loadN+0xf/0x20 [ 556.478153] ? __get_node_page+0x331/0x5b0 [ 556.478992] ? reweight_entity+0x1e6/0x3b0 [ 556.479826] f2fs_truncate_blocks+0x55e/0x740 [ 556.480709] ? f2fs_truncate_data_blocks+0x20/0x20 [ 556.481689] ? __radix_tree_lookup+0x34/0x160 [ 556.482630] ? radix_tree_lookup+0xd/0x10 [ 556.483445] f2fs_truncate+0xd4/0x1a0 [ 556.484206] f2fs_evict_inode+0x5ce/0x630 [ 556.485032] evict+0x16f/0x290 [ 556.485664] iput+0x280/0x300 [ 556.486300] dentry_unlink_inode+0x165/0x1e0 [ 556.487169] __dentry_kill+0x16a/0x260 [ 556.487936] dentry_kill+0x70/0x250 [ 556.488651] shrink_dentry_list+0x125/0x260 [ 556.489504] shrink_dcache_parent+0xc1/0x110 [ 556.490379] ? shrink_dcache_sb+0x200/0x200 [ 556.491231] ?
bit_wait_timeout+0xc0/0xc0 [ 556.492047] do_one_tree+0x12/0x40 [ 556.492743] shrink_dcache_for_umount+0x3f/0xa0 [ 556.493656] generic_shutdown_super+0x43/0x1c0 [ 556.494561] kill_block_super+0x52/0x80 [ 556.495341] kill_f2fs_super+0x62/0x70 [ 556.496105] deactivate_locked_super+0x6f/0xa0 [ 556.497004] deactivate_super+0x5e/0x80 [ 556.497785] cleanup_mnt+0x61/0xa0 [ 556.498492] __cleanup_mnt+0x12/0x20 [ 556.499218] task_work_run+0xc8/0xf0 [ 556.499949] exit_to_usermode_loop+0x125/0x130 [ 556.500846] do_syscall_64+0x138/0x170 [ 556.501609] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 556.502659] RIP: 0033:0x7f8028b77487 [ 556.503384] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48 [ 556.507137] RSP: 002b:00007fff9f2e3598 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 556.508637] RAX: 0000000000000000 RBX: 0000000000ebd030 RCX: 00007f8028b77487 [ 556.510069] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000ec41e0 [ 556.511481] RBP: 0000000000ec41e0 R08: 0000000000000000 R09: 0000000000000014 [ 556.512892] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f802908083c [ 556.514320] R13: 0000000000000000 R14: 0000000000ebd210 R15: 00007fff9f2e3820 [ 556.515745] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 556.529276] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 556.530340] RIP: 0010:f2fs_truncate_inode_blocks+0x4a7/0x6f0 [ 556.531513] Code: e8 ae ea ff ff 41 89 c7 c1 e8 1f 84 c0 74 0a 41 83 ff fe 0f 85 35 ff ff ff 81 85 b0 fe ff ff fb 03 00 00 e9 f7 fd ff ff 0f 0b <0f> 0b e8 62 b7 9a 00 48 8b bd a0 fe ff ff e8 56 54 ae ff 48 8b b5 [ 556.535330] RSP: 0018:ffff8801f292f808 EFLAGS: 00010286 [ 556.536395] RAX: ffffed003e73242d RBX: ffff8801f292f958 RCX: ffffffffb88b81bc [ 556.537824] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8801f3992164 [ 556.539290] RBP: ffff8801f292f980 R08: ffffed003e73242d R09: ffffed003e73242d [ 556.540709] R10: 0000000000000001 R11: ffffed003e73242c R12: 00000000fffffc64 [ 556.542131] R13: ffff8801f3992000 R14: 0000000000000058 R15: 00000000ffff8801 [ 556.543579] FS: 00007f8029297840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000 [ 556.545180] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 556.546338] CR2: 000055f5f57305d8 CR3: 00000001f18b0000 CR4: 00000000000006e0 [ 556.547809] ================================================================== [ 556.549248] BUG: KASAN: stack-out-of-bounds in arch_tlb_gather_mmu+0x52/0x170 [ 556.550672] Write of size 8 at addr ffff8801f292fd10 by task umount/1310
[ 556.552338] CPU: 1 PID: 1310 Comm: umount Tainted: G D 4.18.0-rc1+ #4 [ 556.553886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 556.555756] Call Trace: [ 556.556264] dump_stack+0x7b/0xb5 [ 556.556944] print_address_description+0x70/0x290 [ 556.557903] kasan_report+0x291/0x390 [ 556.558649] ? arch_tlb_gather_mmu+0x52/0x170 [ 556.559537] __asan_store8+0x57/0x90 [ 556.560268] arch_tlb_gather_mmu+0x52/0x170 [ 556.561110] tlb_gather_mmu+0x12/0x40 [ 556.561862] exit_mmap+0x123/0x2a0 [ 556.562555] ? __ia32_sys_munmap+0x50/0x50 [ 556.563384] ? exit_aio+0x98/0x230 [ 556.564079] ? __x32_compat_sys_io_submit+0x260/0x260 [ 556.565099] ? taskstats_exit+0x1f4/0x640 [ 556.565925] ? kasan_check_read+0x11/0x20 [ 556.566739] ? mm_update_next_owner+0x322/0x380 [ 556.567652] mmput+0x8b/0x1d0 [ 556.568260] do_exit+0x43a/0x1390 [ 556.568937] ? mm_update_next_owner+0x380/0x380 [ 556.569855] ? deactivate_super+0x5e/0x80 [ 556.570668] ? cleanup_mnt+0x61/0xa0 [ 556.571395] ? __cleanup_mnt+0x12/0x20 [ 556.572156] ? task_work_run+0xc8/0xf0 [ 556.572917] ? exit_to_usermode_loop+0x125/0x130 [ 556.573861] rewind_stack_do_exit+0x17/0x20 [ 556.574707] RIP: 0033:0x7f8028b77487 [ 556.575428] Code: Bad RIP value. [ 556.576106] RSP: 002b:00007fff9f2e3598 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 556.577599] RAX: 0000000000000000 RBX: 0000000000ebd030 RCX: 00007f8028b77487 [ 556.579020] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000ec41e0 [ 556.580422] RBP: 0000000000ec41e0 R08: 0000000000000000 R09: 0000000000000014 [ 556.581833] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007f802908083c [ 556.583252] R13: 0000000000000000 R14: 0000000000ebd210 R15: 00007fff9f2e3820
[ 556.584983] The buggy address belongs to the page: [ 556.585961] page:ffffea0007ca4bc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 556.587540] flags: 0x2ffff0000000000() [ 556.588296] raw: 02ffff0000000000 0000000000000000 dead000000000200 0000000000000000 [ 556.589822] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 556.591359] page dumped because: kasan: bad access detected
[ 556.592786] Memory state around the buggy address: [ 556.593753] ffff8801f292fc00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 556.595191] ffff8801f292fc80: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 00 [ 556.596613] >ffff8801f292fd00: 00 00 f3 00 00 00 00 f3 f3 00 00 00 00 f4 f4 f4 [ 556.598044] ^ [ 556.598797] ffff8801f292fd80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 [ 556.600225] ffff8801f292fe00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4 [ 556.601647] ==================================================================
- Location https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/node.c#L987
    case NODE_DIND_BLOCK:
        err = truncate_nodes(&dn, nofs, offset[1], 3);
        cont = 0;
        break;
    default:
        BUG();    <---
    }
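As far as the trace shows, the level selecting that switch arm is derived from the truncation offset via get_node_path(), which trusts the inode's own metadata; with a corrupted node footer and i_blocks the computed level can fall outside the handled cases and trip the default: BUG(). The checks added below instead reject such an inode already at f2fs_iget() time, so truncation never sees it.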
Reported-by: Wen Xu wen.xu@gatech.edu Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/inode.c | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index 428d502f23e8..be7d9773d291 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -180,9 +180,30 @@ void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page) ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, page)); }
-static bool sanity_check_inode(struct inode *inode) +static bool sanity_check_inode(struct inode *inode, struct page *node_page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + unsigned long long iblocks; + + iblocks = le64_to_cpu(F2FS_INODE(node_page)->i_blocks); + if (!iblocks) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: corrupted inode i_blocks i_ino=%lx iblocks=%llu, " + "run fsck to fix.", + __func__, inode->i_ino, iblocks); + return false; + } + + if (ino_of_node(node_page) != nid_of_node(node_page)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: corrupted inode footer i_ino=%lx, ino,nid: " + "[%u, %u] run fsck to fix.", + __func__, inode->i_ino, + ino_of_node(node_page), nid_of_node(node_page)); + return false; + }
if (f2fs_has_extra_attr(inode) && !f2fs_sb_has_extra_attr(sbi->sb)) { @@ -242,7 +263,7 @@ static int do_read_inode(struct inode *inode)
get_inline_info(inode, ri);
- if (!sanity_check_inode(inode)) { + if (!sanity_check_inode(inode, node_page)) { f2fs_put_page(node_page, 1); return -EINVAL; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit c9b60788fc760d136211853f10ce73dc152d1f4a upstream.
This patch adds sanity checks on the below fields:
- cp_pack_total_block_count
- blkaddr of data/node
- extent info
- Overview BUG() in verify_block_addr() when writing to a corrupted f2fs image
- Reproduce (4.18 upstream kernel)
- POC (poc.c)
#define _GNU_SOURCE    /* for asprintf() */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void activity(char *mpoint) {
    char *foo_bar_baz;
    int err;

    static int buf[8192];
    memset(buf, 0, sizeof(buf));

    err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);

    int fd = open(foo_bar_baz, O_RDWR | O_TRUNC, 0777);
    if (fd >= 0) {
        write(fd, (char *)buf, sizeof(buf));
        fdatasync(fd);
        close(fd);
    }
}

int main(int argc, char *argv[]) {
    activity(argv[1]);
    return 0;
}
- Kernel message [ 689.349473] F2FS-fs (loop0): Mounted with checkpoint version = 3 [ 699.728662] WARNING: CPU: 0 PID: 1309 at fs/f2fs/segment.c:2860 f2fs_inplace_write_data+0x232/0x240 [ 699.728670] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 699.729056] CPU: 0 PID: 1309 Comm: a.out Not tainted 4.18.0-rc1+ #4 [ 699.729064] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 699.729074] RIP: 0010:f2fs_inplace_write_data+0x232/0x240 [ 699.729076] Code: ff e9 cf fe ff ff 49 8d 7d 10 e8 39 45 ad ff 4d 8b 7d 10 be 04 00 00 00 49 8d 7f 48 e8 07 49 ad ff 45 8b 7f 48 e9 fb fe ff ff <0f> 0b f0 41 80 4d 48 04 e9 65 fe ff ff 90 66 66 66 66 90 55 48 8d [ 699.729130] RSP: 0018:ffff8801f43af568 EFLAGS: 00010202 [ 699.729139] RAX: 000000000000003f RBX: ffff8801f43af7b8 RCX: ffffffffb88c9113 [ 699.729142] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8802024e5540 [ 699.729144] RBP: ffff8801f43af590 R08: 0000000000000009 R09: ffffffffffffffe8 [ 699.729147] R10: 0000000000000001 R11: ffffed0039b0596a R12: ffff8802024e5540 [ 699.729149] R13: ffff8801f0335500 R14: ffff8801e3e7a700 R15: ffff8801e1ee4450 [ 699.729154] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 699.729156] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 699.729159] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0 [ 699.729171] Call Trace: [ 699.729192] f2fs_do_write_data_page+0x2e2/0xe00 [ 699.729203] ? f2fs_should_update_outplace+0xd0/0xd0 [ 699.729238] ? memcg_drain_all_list_lrus+0x280/0x280 [ 699.729269] ? __radix_tree_replace+0xa3/0x120 [ 699.729276] __write_data_page+0x5c7/0xe30 [ 699.729291] ? kasan_check_read+0x11/0x20 [ 699.729310] ? page_mapped+0x8a/0x110 [ 699.729321] ? page_mkclean+0xe9/0x160 [ 699.729327] ? f2fs_do_write_data_page+0xe00/0xe00 [ 699.729331] ? invalid_page_referenced_vma+0x130/0x130 [ 699.729345] ? clear_page_dirty_for_io+0x332/0x450 [ 699.729351] f2fs_write_cache_pages+0x4ca/0x860 [ 699.729358] ? __write_data_page+0xe30/0xe30 [ 699.729374] ? percpu_counter_add_batch+0x22/0xa0 [ 699.729380] ? kasan_check_write+0x14/0x20 [ 699.729391] ? _raw_spin_lock+0x17/0x40 [ 699.729403] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30 [ 699.729413] ? iov_iter_advance+0x113/0x640 [ 699.729418] ? f2fs_write_end+0x133/0x2e0 [ 699.729423] ? balance_dirty_pages_ratelimited+0x239/0x640 [ 699.729428] f2fs_write_data_pages+0x329/0x520 [ 699.729433] ? generic_perform_write+0x250/0x320 [ 699.729438] ? f2fs_write_cache_pages+0x860/0x860 [ 699.729454] ? current_time+0x110/0x110 [ 699.729459] ? f2fs_preallocate_blocks+0x1ef/0x370 [ 699.729464] do_writepages+0x37/0xb0 [ 699.729468] ? f2fs_write_cache_pages+0x860/0x860 [ 699.729472] ? do_writepages+0x37/0xb0 [ 699.729478] __filemap_fdatawrite_range+0x19a/0x1f0 [ 699.729483] ? delete_from_page_cache_batch+0x4e0/0x4e0 [ 699.729496] ? __vfs_write+0x2b2/0x410 [ 699.729501] file_write_and_wait_range+0x66/0xb0 [ 699.729506] f2fs_do_sync_file+0x1f9/0xd90 [ 699.729511] ? truncate_partial_data_page+0x290/0x290 [ 699.729521] ? 
__sb_end_write+0x30/0x50 [ 699.729526] ? vfs_write+0x20f/0x260 [ 699.729530] f2fs_sync_file+0x9a/0xb0 [ 699.729534] ? f2fs_do_sync_file+0xd90/0xd90 [ 699.729548] vfs_fsync_range+0x68/0x100 [ 699.729554] ? __fget_light+0xc9/0xe0 [ 699.729558] do_fsync+0x3d/0x70 [ 699.729562] __x64_sys_fdatasync+0x24/0x30 [ 699.729585] do_syscall_64+0x78/0x170 [ 699.729595] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 699.729613] RIP: 0033:0x7f9bf930d800 [ 699.729615] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24 [ 699.729668] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 699.729673] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800 [ 699.729675] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003 [ 699.729678] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000 [ 699.729680] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610 [ 699.729683] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000 [ 699.729687] ---[ end trace 4ce02f25ff7d3df5 ]--- [ 699.729782] ------------[ cut here ]------------ [ 699.729785] kernel BUG at fs/f2fs/segment.h:654! [ 699.731055] invalid opcode: 0000 [#1] SMP KASAN PTI [ 699.732104] CPU: 0 PID: 1309 Comm: a.out Tainted: G W 4.18.0-rc1+ #4 [ 699.733684] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 699.735611] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730 [ 699.736649] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0 [ 699.740524] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283 [ 699.741573] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef [ 699.743006] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c [ 699.744426] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55 [ 699.745833] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940 [ 699.747256] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001 [ 699.748683] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 699.750293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 699.751462] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0 [ 699.752874] Call Trace: [ 699.753386] ? f2fs_inplace_write_data+0x93/0x240 [ 699.754341] f2fs_inplace_write_data+0xd2/0x240 [ 699.755271] f2fs_do_write_data_page+0x2e2/0xe00 [ 699.756214] ? f2fs_should_update_outplace+0xd0/0xd0 [ 699.757215] ? memcg_drain_all_list_lrus+0x280/0x280 [ 699.758209] ? __radix_tree_replace+0xa3/0x120 [ 699.759164] __write_data_page+0x5c7/0xe30 [ 699.760002] ? kasan_check_read+0x11/0x20 [ 699.760823] ? page_mapped+0x8a/0x110 [ 699.761573] ? page_mkclean+0xe9/0x160 [ 699.762345] ? f2fs_do_write_data_page+0xe00/0xe00 [ 699.763332] ? invalid_page_referenced_vma+0x130/0x130 [ 699.764374] ? clear_page_dirty_for_io+0x332/0x450 [ 699.765347] f2fs_write_cache_pages+0x4ca/0x860 [ 699.766276] ? __write_data_page+0xe30/0xe30 [ 699.767161] ? percpu_counter_add_batch+0x22/0xa0 [ 699.768112] ? kasan_check_write+0x14/0x20 [ 699.768951] ? _raw_spin_lock+0x17/0x40 [ 699.769739] ? f2fs_mark_inode_dirty_sync.part.18+0x16/0x30 [ 699.770885] ? iov_iter_advance+0x113/0x640 [ 699.771743] ? f2fs_write_end+0x133/0x2e0 [ 699.772569] ? 
balance_dirty_pages_ratelimited+0x239/0x640 [ 699.773680] f2fs_write_data_pages+0x329/0x520 [ 699.774603] ? generic_perform_write+0x250/0x320 [ 699.775544] ? f2fs_write_cache_pages+0x860/0x860 [ 699.776510] ? current_time+0x110/0x110 [ 699.777299] ? f2fs_preallocate_blocks+0x1ef/0x370 [ 699.778279] do_writepages+0x37/0xb0 [ 699.779026] ? f2fs_write_cache_pages+0x860/0x860 [ 699.779978] ? do_writepages+0x37/0xb0 [ 699.780755] __filemap_fdatawrite_range+0x19a/0x1f0 [ 699.781746] ? delete_from_page_cache_batch+0x4e0/0x4e0 [ 699.782820] ? __vfs_write+0x2b2/0x410 [ 699.783597] file_write_and_wait_range+0x66/0xb0 [ 699.784540] f2fs_do_sync_file+0x1f9/0xd90 [ 699.785381] ? truncate_partial_data_page+0x290/0x290 [ 699.786415] ? __sb_end_write+0x30/0x50 [ 699.787204] ? vfs_write+0x20f/0x260 [ 699.787941] f2fs_sync_file+0x9a/0xb0 [ 699.788694] ? f2fs_do_sync_file+0xd90/0xd90 [ 699.789572] vfs_fsync_range+0x68/0x100 [ 699.790360] ? __fget_light+0xc9/0xe0 [ 699.791128] do_fsync+0x3d/0x70 [ 699.791779] __x64_sys_fdatasync+0x24/0x30 [ 699.792614] do_syscall_64+0x78/0x170 [ 699.793371] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 699.794406] RIP: 0033:0x7f9bf930d800 [ 699.795134] Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 49 bf 2c 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 be 78 01 00 48 89 04 24 [ 699.798960] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 699.800483] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800 [ 699.801923] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003 [ 699.803373] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000 [ 699.804798] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610 [ 699.806233] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000 [ 699.807667] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy [ 699.817079] ---[ end trace 4ce02f25ff7d3df6 ]--- [ 699.818068] RIP: 0010:f2fs_submit_page_bio+0x29b/0x730 [ 699.819114] Code: 54 49 8d bd 18 04 00 00 e8 b2 59 af ff 41 8b 8d 18 04 00 00 8b 45 b8 41 d3 e6 44 01 f0 4c 8d 73 14 41 39 c7 0f 82 37 fe ff ff <0f> 0b 65 8b 05 2c 04 77 47 89 c0 48 0f a3 05 52 c1 d5 01 0f 92 c0 [ 699.822919] RSP: 0018:ffff8801f43af508 EFLAGS: 00010283 [ 699.823977] RAX: 0000000000000000 RBX: ffff8801f43af7b8 RCX: ffffffffb88a7cef [ 699.825436] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8801e3e7a64c [ 699.826881] RBP: ffff8801f43af558 R08: ffffed003e066b55 R09: ffffed003e066b55 [ 699.828292] R10: 0000000000000001 R11: ffffed003e066b54 R12: ffffea0007876940 [ 699.829750] R13: ffff8801f0335500 R14: ffff8801e3e7a600 R15: 0000000000000001 [ 699.831192] FS: 00007f9bf97f5700(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000 [ 699.832793] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 699.833981] CR2: 00007f9bf925d170 CR3: 00000001f0c34000 CR4: 00000000000006f0 [ 699.835556] ================================================================== [ 699.837029] BUG: KASAN: stack-out-of-bounds in 
update_stack_state+0x38c/0x3e0 [ 699.838462] Read of size 8 at addr ffff8801f43af970 by task a.out/1309
[ 699.840086] CPU: 0 PID: 1309 Comm: a.out Tainted: G D W 4.18.0-rc1+ #4 [ 699.841603] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 699.843475] Call Trace: [ 699.843982] dump_stack+0x7b/0xb5 [ 699.844661] print_address_description+0x70/0x290 [ 699.845607] kasan_report+0x291/0x390 [ 699.846351] ? update_stack_state+0x38c/0x3e0 [ 699.853831] __asan_load8+0x54/0x90 [ 699.854569] update_stack_state+0x38c/0x3e0 [ 699.855428] ? __read_once_size_nocheck.constprop.7+0x20/0x20 [ 699.856601] ? __save_stack_trace+0x5e/0x100 [ 699.857476] unwind_next_frame.part.5+0x18e/0x490 [ 699.858448] ? unwind_dump+0x290/0x290 [ 699.859217] ? clear_page_dirty_for_io+0x332/0x450 [ 699.860185] __unwind_start+0x106/0x190 [ 699.860974] __save_stack_trace+0x5e/0x100 [ 699.861808] ? __save_stack_trace+0x5e/0x100 [ 699.862691] ? unlink_anon_vmas+0xba/0x2c0 [ 699.863525] save_stack_trace+0x1f/0x30 [ 699.864312] save_stack+0x46/0xd0 [ 699.864993] ? __alloc_pages_slowpath+0x1420/0x1420 [ 699.865990] ? flush_tlb_mm_range+0x15e/0x220 [ 699.866889] ? kasan_check_write+0x14/0x20 [ 699.867724] ? __dec_node_state+0x92/0xb0 [ 699.868543] ? lock_page_memcg+0x85/0xf0 [ 699.869350] ? unlock_page_memcg+0x16/0x80 [ 699.870185] ? page_remove_rmap+0x198/0x520 [ 699.871048] ? mark_page_accessed+0x133/0x200 [ 699.871930] ? _cond_resched+0x1a/0x50 [ 699.872700] ? unmap_page_range+0xcd4/0xe50 [ 699.873551] ? rb_next+0x58/0x80 [ 699.874217] ? rb_next+0x58/0x80 [ 699.874895] __kasan_slab_free+0x13c/0x1a0 [ 699.875734] ? unlink_anon_vmas+0xba/0x2c0 [ 699.876563] kasan_slab_free+0xe/0x10 [ 699.877315] kmem_cache_free+0x89/0x1e0 [ 699.878095] unlink_anon_vmas+0xba/0x2c0 [ 699.878913] free_pgtables+0x101/0x1b0 [ 699.879677] exit_mmap+0x146/0x2a0 [ 699.880378] ? __ia32_sys_munmap+0x50/0x50 [ 699.881214] ? kasan_check_read+0x11/0x20 [ 699.882052] ? mm_update_next_owner+0x322/0x380 [ 699.882985] mmput+0x8b/0x1d0 [ 699.883602] do_exit+0x43a/0x1390 [ 699.884288] ? mm_update_next_owner+0x380/0x380 [ 699.885212] ? f2fs_sync_file+0x9a/0xb0 [ 699.885995] ? f2fs_do_sync_file+0xd90/0xd90 [ 699.886877] ? vfs_fsync_range+0x68/0x100 [ 699.887694] ? __fget_light+0xc9/0xe0 [ 699.888442] ? do_fsync+0x3d/0x70 [ 699.889118] ? __x64_sys_fdatasync+0x24/0x30 [ 699.889996] rewind_stack_do_exit+0x17/0x20 [ 699.890860] RIP: 0033:0x7f9bf930d800 [ 699.891585] Code: Bad RIP value. [ 699.892268] RSP: 002b:00007ffee3606c68 EFLAGS: 00000246 ORIG_RAX: 000000000000004b [ 699.893781] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f9bf930d800 [ 699.895220] RDX: 0000000000008000 RSI: 00000000006010a0 RDI: 0000000000000003 [ 699.896643] RBP: 00007ffee3606ca0 R08: 0000000001503010 R09: 0000000000000000 [ 699.898069] R10: 00000000000002e8 R11: 0000000000000246 R12: 0000000000400610 [ 699.899505] R13: 00007ffee3606da0 R14: 0000000000000000 R15: 0000000000000000
[ 699.901241] The buggy address belongs to the page: [ 699.902215] page:ffffea0007d0ebc0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 699.903811] flags: 0x2ffff0000000000() [ 699.904585] raw: 02ffff0000000000 0000000000000000 ffffffff07d00101 0000000000000000 [ 699.906125] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000 [ 699.907673] page dumped because: kasan: bad access detected
[ 699.909108] Memory state around the buggy address: [ 699.910077] ffff8801f43af800: 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 00 [ 699.911528] ffff8801f43af880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 699.912953] >ffff8801f43af900: 00 00 00 00 00 00 00 00 f1 01 f4 f4 f4 f2 f2 f2 [ 699.914392] ^ [ 699.915758] ffff8801f43af980: f2 00 f4 f4 00 00 00 00 f2 00 00 00 00 00 00 00 [ 699.917193] ffff8801f43afa00: 00 00 00 00 00 00 00 00 00 f3 f3 f3 00 00 00 00 [ 699.918634] ==================================================================
- Location https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.h#L644
Reported-by: Wen Xu wen.xu@gatech.edu Signed-off-by: Chao Yu yuchao0@huawei.com Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org [bwh: Backported to 4.14: - Error label is different in validate_checkpoint() due to the earlier backport of "f2fs: fix invalid memory access" - Adjust context] Signed-off-by: Ben Hutchings ben.hutchings@codethink.co.uk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/f2fs/checkpoint.c | 22 +++++++++++++++++++--- fs/f2fs/data.c | 33 +++++++++++++++++++++++++++------ fs/f2fs/f2fs.h | 3 +++ fs/f2fs/file.c | 12 ++++++++++++ fs/f2fs/inode.c | 17 +++++++++++++++++ fs/f2fs/node.c | 4 ++++ fs/f2fs/segment.h | 3 +-- 7 files changed, 83 insertions(+), 11 deletions(-)
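A note on the two helpers used throughout this diff: __is_valid_data_blkaddr() only screens out the in-memory sentinel addresses, while f2fs_is_valid_blkaddr() range-checks a real block address against the device layout. A paraphrased sketch of the former (upstream defines it as a macro with the same meaning):

/* NEW_ADDR and NULL_ADDR are sentinels for "preallocated" / "hole",
 * never real on-disk blocks, so they are exempt from range checking. */
static inline bool __is_valid_data_blkaddr(block_t blkaddr)
{
        return blkaddr != NEW_ADDR && blkaddr != NULL_ADDR;
}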
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index d7bd9745e883..c81cd5057b8e 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -86,8 +86,10 @@ repeat: fio.page = page;
if (f2fs_submit_page_bio(&fio)) { - f2fs_put_page(page, 1); - goto repeat; + memset(page_address(page), 0, PAGE_SIZE); + f2fs_stop_checkpoint(sbi, false); + f2fs_bug_on(sbi, 1); + return page; }
lock_page(page); @@ -141,8 +143,14 @@ bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, case META_POR: case DATA_GENERIC: if (unlikely(blkaddr >= MAX_BLKADDR(sbi) || - blkaddr < MAIN_BLKADDR(sbi))) + blkaddr < MAIN_BLKADDR(sbi))) { + if (type == DATA_GENERIC) { + f2fs_msg(sbi->sb, KERN_WARNING, + "access invalid blkaddr:%u", blkaddr); + WARN_ON(1); + } return false; + } break; case META_GENERIC: if (unlikely(blkaddr < SEG0_BLKADDR(sbi) || @@ -746,6 +754,14 @@ static struct page *validate_checkpoint(struct f2fs_sb_info *sbi, &cp_page_1, version); if (err) return NULL; + + if (le32_to_cpu(cp_block->cp_pack_total_block_count) > + sbi->blocks_per_seg) { + f2fs_msg(sbi->sb, KERN_WARNING, + "invalid cp_pack_total_block_count:%u", + le32_to_cpu(cp_block->cp_pack_total_block_count)); + goto invalid_cp; + } pre_version = *version;
cp_addr += le32_to_cpu(cp_block->cp_pack_total_block_count) - 1; diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c index 615878806611..8f6e7c3a10f8 100644 --- a/fs/f2fs/data.c +++ b/fs/f2fs/data.c @@ -369,7 +369,10 @@ int f2fs_submit_page_bio(struct f2fs_io_info *fio) struct page *page = fio->encrypted_page ? fio->encrypted_page : fio->page;
- verify_block_addr(fio, fio->new_blkaddr); + if (!f2fs_is_valid_blkaddr(fio->sbi, fio->new_blkaddr, + __is_meta_io(fio) ? META_GENERIC : DATA_GENERIC)) + return -EFAULT; + trace_f2fs_submit_page_bio(page, fio); f2fs_trace_ios(fio, 0);
@@ -946,6 +949,12 @@ next_dnode: next_block: blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
+ if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) { + err = -EFAULT; + goto sync_out; + } + if (!is_valid_data_blkaddr(sbi, blkaddr)) { if (create) { if (unlikely(f2fs_cp_error(sbi))) { @@ -1264,6 +1273,10 @@ got_it: SetPageUptodate(page); goto confused; } + + if (!f2fs_is_valid_blkaddr(F2FS_I_SB(inode), block_nr, + DATA_GENERIC)) + goto set_error_page; } else { zero_user_segment(page, 0, PAGE_SIZE); if (!PageUptodate(page)) @@ -1402,11 +1415,13 @@ int do_write_data_page(struct f2fs_io_info *fio) f2fs_lookup_extent_cache(inode, page->index, &ei)) { fio->old_blkaddr = ei.blk + page->index - ei.fofs;
- if (is_valid_data_blkaddr(fio->sbi, fio->old_blkaddr)) { - ipu_force = true; - fio->need_lock = LOCK_DONE; - goto got_it; - } + if (!f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr, + DATA_GENERIC)) + return -EFAULT; + + ipu_force = true; + fio->need_lock = LOCK_DONE; + goto got_it; }
/* Deadlock due to between page->lock and f2fs_lock_op */ @@ -1425,6 +1440,12 @@ int do_write_data_page(struct f2fs_io_info *fio) goto out_writepage; } got_it: + if (__is_valid_data_blkaddr(fio->old_blkaddr) && + !f2fs_is_valid_blkaddr(fio->sbi, fio->old_blkaddr, + DATA_GENERIC)) { + err = -EFAULT; + goto out_writepage; + } /* * If current allocation needs SSR, * it had better in-place writes for updated data. diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d15d79457f5c..3f1a44696036 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2357,6 +2357,9 @@ static inline void f2fs_update_iostat(struct f2fs_sb_info *sbi, spin_unlock(&sbi->iostat_lock); }
+#define __is_meta_io(fio) (PAGE_TYPE_OF_BIO(fio->type) == META && \ + (!is_read_io(fio->op) || fio->is_meta)) + bool f2fs_is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type); void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...); diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index e9c575ef70b5..7d3189f1941c 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -397,6 +397,13 @@ static loff_t f2fs_seek_block(struct file *file, loff_t offset, int whence) blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
+ if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(F2FS_I_SB(inode), + blkaddr, DATA_GENERIC)) { + f2fs_put_dnode(&dn); + goto fail; + } + if (__found_offset(F2FS_I_SB(inode), blkaddr, dirty, pgofs, whence)) { f2fs_put_dnode(&dn); @@ -496,6 +503,11 @@ int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
dn->data_blkaddr = NULL_ADDR; set_data_blkaddr(dn); + + if (__is_valid_data_blkaddr(blkaddr) && + !f2fs_is_valid_blkaddr(sbi, blkaddr, DATA_GENERIC)) + continue; + invalidate_blocks(sbi, blkaddr); if (dn->ofs_in_node == 0 && IS_INODE(dn->node_page)) clear_inode_flag(dn->inode, FI_FIRST_BLOCK_WRITTEN); diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index be7d9773d291..aeed9943836a 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -214,6 +214,23 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page) __func__, inode->i_ino); return false; } + + if (F2FS_I(inode)->extent_tree) { + struct extent_info *ei = &F2FS_I(inode)->extent_tree->largest; + + if (ei->len && + (!f2fs_is_valid_blkaddr(sbi, ei->blk, DATA_GENERIC) || + !f2fs_is_valid_blkaddr(sbi, ei->blk + ei->len - 1, + DATA_GENERIC))) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) extent info [%u, %u, %u] " + "is incorrect, run fsck to fix", + __func__, inode->i_ino, + ei->blk, ei->fofs, ei->len); + return false; + } + } return true; }
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c index 999814c8cbea..6adb6c60f017 100644 --- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1398,6 +1398,10 @@ static int __write_node_page(struct page *page, bool atomic, bool *submitted, return 0; }
+ if (__is_valid_data_blkaddr(ni.blk_addr) && + !f2fs_is_valid_blkaddr(sbi, ni.blk_addr, DATA_GENERIC)) + goto redirty_out; + if (atomic && !test_opt(sbi, NOBARRIER)) fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h index dd8f977fc273..47348d98165b 100644 --- a/fs/f2fs/segment.h +++ b/fs/f2fs/segment.h @@ -629,8 +629,7 @@ static inline void verify_block_addr(struct f2fs_io_info *fio, block_t blk_addr) { struct f2fs_sb_info *sbi = fio->sbi;
- if (PAGE_TYPE_OF_BIO(fio->type) == META && - (!is_read_io(fio->op) || fio->is_meta)) + if (__is_meta_io(fio)) verify_blkaddr(sbi, blk_addr, META_GENERIC); else verify_blkaddr(sbi, blk_addr, DATA_GENERIC);
Hi Greg,
On Tue, Dec 4, 2018 at 11:05 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
commit c9b60788fc760d136211853f10ce73dc152d1f4a upstream.
There is another upstream commit which fixes this: 89d13c38501d ("f2fs: fix missing up_read")
On Tue, Dec 04, 2018 at 08:27:32PM +0000, Sudip Mukherjee wrote:
Hi Greg,
On Tue, Dec 4, 2018 at 11:05 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
4.14-stable review patch. If anyone has any objections, please let me know.
commit c9b60788fc760d136211853f10ce73dc152d1f4a upstream.
There is another upstream commit which fixes this: 89d13c38501d ("f2fs: fix missing up_read")
Thanks for pointing this out, now queued up.
greg k-h
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 18dd6470c2d14d10f5a2dd926925dc80dbd3abfd upstream.
If inode.i_extra_isize is fuzzed to an abnormal value, the result of the inline data size calculation will overflow, resulting in access to an invalid memory area when operating on inline data.
Fix this by doing a sanity check on i_extra_isize during inode loading.
https://bugzilla.kernel.org/show_bug.cgi?id=200421
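A sketch of the arithmetic that goes wrong; the macro below is an approximation of the inline-data size calculation, not the exact 4.14 source:

/*
 * Approximation: the usable inline area shrinks by whatever
 * i_extra_isize claims to occupy (get_extra_isize() returns it in
 * 32-bit words). A fuzzed i_extra_isize larger than the inode block
 * makes the subtraction wrap, so reads/writes of "inline data" run
 * far past the end of the page, as in the KASAN report below.
 */
#define MAX_INLINE_DATA(inode)  (sizeof(__le32) *               \
        (DEF_ADDRS_PER_INODE - get_extra_isize(inode) -         \
         F2FS_INLINE_XATTR_ADDRS - 1))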
- Reproduce
- POC (poc.c)

#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/mount.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/xattr.h>

#include <dirent.h>
#include <errno.h>
#include <error.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <linux/falloc.h>
#include <linux/loop.h>

static void activity(char *mpoint) {
    char *foo_bar_baz;
    char *foo_baz;
    char *xattr;
    int err;

    err = asprintf(&foo_bar_baz, "%s/foo/bar/baz", mpoint);
    err = asprintf(&foo_baz, "%s/foo/baz", mpoint);
    err = asprintf(&xattr, "%s/foo/bar/xattr", mpoint);

    rename(foo_bar_baz, foo_baz);

    char buf2[113];
    memset(buf2, 0, sizeof(buf2));
    listxattr(xattr, buf2, sizeof(buf2));
    removexattr(xattr, "user.mime_type");
}

int main(int argc, char *argv[]) {
    activity(argv[1]);
    return 0;
}
- Kernel message Unmounting the image leaves the following message: [ 2910.995489] F2FS-fs (loop0): Mounted with checkpoint version = 2 [ 2918.416465] ================================================================== [ 2918.416807] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0xcb9/0x1a80 [ 2918.417009] Read of size 4 at addr ffff88018efc2068 by task a.out/1229
[ 2918.417311] CPU: 1 PID: 1229 Comm: a.out Not tainted 4.17.0+ #1 [ 2918.417314] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 2918.417323] Call Trace: [ 2918.417366] dump_stack+0x71/0xab [ 2918.417401] print_address_description+0x6b/0x290 [ 2918.417407] kasan_report+0x28e/0x390 [ 2918.417411] ? f2fs_iget+0xcb9/0x1a80 [ 2918.417415] f2fs_iget+0xcb9/0x1a80 [ 2918.417422] ? f2fs_lookup+0x2e7/0x580 [ 2918.417425] f2fs_lookup+0x2e7/0x580 [ 2918.417433] ? __recover_dot_dentries+0x400/0x400 [ 2918.417447] ? legitimize_path.isra.29+0x5a/0xa0 [ 2918.417453] __lookup_slow+0x11c/0x220 [ 2918.417457] ? may_delete+0x2a0/0x2a0 [ 2918.417475] ? deref_stack_reg+0xe0/0xe0 [ 2918.417479] ? __lookup_hash+0xb0/0xb0 [ 2918.417483] lookup_slow+0x3e/0x60 [ 2918.417488] walk_component+0x3ac/0x990 [ 2918.417492] ? generic_permission+0x51/0x1e0 [ 2918.417495] ? inode_permission+0x51/0x1d0 [ 2918.417499] ? pick_link+0x3e0/0x3e0 [ 2918.417502] ? link_path_walk+0x4b1/0x770 [ 2918.417513] ? _raw_spin_lock_irqsave+0x25/0x50 [ 2918.417518] ? walk_component+0x990/0x990 [ 2918.417522] ? path_init+0x2e6/0x580 [ 2918.417526] path_lookupat+0x13f/0x430 [ 2918.417531] ? trailing_symlink+0x3a0/0x3a0 [ 2918.417534] ? do_renameat2+0x270/0x7b0 [ 2918.417538] ? __kasan_slab_free+0x14c/0x190 [ 2918.417541] ? do_renameat2+0x270/0x7b0 [ 2918.417553] ? kmem_cache_free+0x85/0x1e0 [ 2918.417558] ? do_renameat2+0x270/0x7b0 [ 2918.417563] filename_lookup+0x13c/0x280 [ 2918.417567] ? filename_parentat+0x2b0/0x2b0 [ 2918.417572] ? kasan_unpoison_shadow+0x31/0x40 [ 2918.417575] ? kasan_kmalloc+0xa6/0xd0 [ 2918.417593] ? strncpy_from_user+0xaa/0x1c0 [ 2918.417598] ? getname_flags+0x101/0x2b0 [ 2918.417614] ? path_listxattr+0x87/0x110 [ 2918.417619] path_listxattr+0x87/0x110 [ 2918.417623] ? listxattr+0xc0/0xc0 [ 2918.417637] ? mm_fault_error+0x1b0/0x1b0 [ 2918.417654] do_syscall_64+0x73/0x160 [ 2918.417660] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 2918.417676] RIP: 0033:0x7f2f3a3480d7 [ 2918.417677] Code: f0 ff ff 73 01 c3 48 8b 0d be dd 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 c2 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 91 dd 2b 00 f7 d8 64 89 01 48 [ 2918.417732] RSP: 002b:00007fff4095b7d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000c2 [ 2918.417744] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f2f3a3480d7 [ 2918.417746] RDX: 0000000000000071 RSI: 00007fff4095b810 RDI: 000000000126a0c0 [ 2918.417749] RBP: 00007fff4095b890 R08: 000000000126a010 R09: 0000000000000000 [ 2918.417751] R10: 00000000000001ab R11: 0000000000000206 R12: 00000000004005e0 [ 2918.417753] R13: 00007fff4095b990 R14: 0000000000000000 R15: 0000000000000000
[ 2918.417853] Allocated by task 329: [ 2918.418002] kasan_kmalloc+0xa6/0xd0 [ 2918.418007] kmem_cache_alloc+0xc8/0x1e0 [ 2918.418023] mempool_init_node+0x194/0x230 [ 2918.418027] mempool_init+0x12/0x20 [ 2918.418042] bioset_init+0x2bd/0x380 [ 2918.418052] blk_alloc_queue_node+0xe9/0x540 [ 2918.418075] dm_create+0x2c0/0x800 [ 2918.418080] dev_create+0xd2/0x530 [ 2918.418083] ctl_ioctl+0x2a3/0x5b0 [ 2918.418087] dm_ctl_ioctl+0xa/0x10 [ 2918.418092] do_vfs_ioctl+0x13e/0x8c0 [ 2918.418095] ksys_ioctl+0x66/0x70 [ 2918.418098] __x64_sys_ioctl+0x3d/0x50 [ 2918.418102] do_syscall_64+0x73/0x160 [ 2918.418106] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 2918.418204] Freed by task 0: [ 2918.418301] (stack is not available)
[ 2918.418521] The buggy address belongs to the object at ffff88018efc0000 which belongs to the cache biovec-max of size 8192 [ 2918.418894] The buggy address is located 104 bytes to the right of 8192-byte region [ffff88018efc0000, ffff88018efc2000) [ 2918.419257] The buggy address belongs to the page: [ 2918.419431] page:ffffea00063bf000 count:1 mapcount:0 mapping:ffff8801f2242540 index:0x0 compound_mapcount: 0 [ 2918.419702] flags: 0x17fff8000008100(slab|head) [ 2918.419879] raw: 017fff8000008100 dead000000000100 dead000000000200 ffff8801f2242540 [ 2918.420101] raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000 [ 2918.420322] page dumped because: kasan: bad access detected
[ 2918.420599] Memory state around the buggy address: [ 2918.420764] ffff88018efc1f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2918.420975] ffff88018efc1f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2918.421194] >ffff88018efc2000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2918.421406] ^ [ 2918.421627] ffff88018efc2080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 2918.421838] ffff88018efc2100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 2918.422046] ================================================================== [ 2918.422264] Disabling lock debugging due to kernel taint [ 2923.901641] BUG: unable to handle kernel paging request at ffff88018f0db000 [ 2923.901884] PGD 22226a067 P4D 22226a067 PUD 222273067 PMD 18e642063 PTE 800000018f0db061 [ 2923.902120] Oops: 0003 [#1] SMP KASAN PTI [ 2923.902274] CPU: 1 PID: 1231 Comm: umount Tainted: G B 4.17.0+ #1 [ 2923.902490] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 2923.902761] RIP: 0010:__memset+0x24/0x30 [ 2923.902906] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 [ 2923.903446] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206 [ 2923.903622] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb [ 2923.903833] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc [ 2923.904062] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c [ 2923.904273] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000 [ 2923.904485] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80 [ 2923.904693] FS: 00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 2923.904937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2923.910080] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0 [ 2923.914930] Call Trace: [ 2923.919724] f2fs_truncate_inline_inode+0x114/0x170 [ 2923.924487] f2fs_truncate_blocks+0x11b/0x7c0 [ 2923.929178] ? f2fs_truncate_data_blocks+0x10/0x10 [ 2923.933834] ? dqget+0x670/0x670 [ 2923.938437] ? f2fs_destroy_extent_tree+0xd6/0x270 [ 2923.943107] ? __radix_tree_lookup+0x2f/0x150 [ 2923.947772] f2fs_truncate+0xd4/0x1a0 [ 2923.952491] f2fs_evict_inode+0x5ab/0x610 [ 2923.957204] evict+0x15f/0x280 [ 2923.961898] __dentry_kill+0x161/0x250 [ 2923.966634] shrink_dentry_list+0xf3/0x250 [ 2923.971897] shrink_dcache_parent+0xa9/0x100 [ 2923.976561] ? shrink_dcache_sb+0x1f0/0x1f0 [ 2923.981177] ? wait_for_completion+0x8a/0x210 [ 2923.985781] ? 
migrate_swap_stop+0x2d0/0x2d0 [ 2923.990332] do_one_tree+0xe/0x40 [ 2923.994735] shrink_dcache_for_umount+0x3a/0xa0 [ 2923.999077] generic_shutdown_super+0x3e/0x1c0 [ 2924.003350] kill_block_super+0x4b/0x70 [ 2924.007619] deactivate_locked_super+0x65/0x90 [ 2924.011812] cleanup_mnt+0x5c/0xa0 [ 2924.015995] task_work_run+0xce/0xf0 [ 2924.020174] exit_to_usermode_loop+0x115/0x120 [ 2924.024293] do_syscall_64+0x12f/0x160 [ 2924.028479] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 2924.032709] RIP: 0033:0x7fa8b2868487 [ 2924.036888] Code: 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e1 c9 2b 00 f7 d8 64 89 01 48 [ 2924.045750] RSP: 002b:00007ffc39824d58 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6 [ 2924.050190] RAX: 0000000000000000 RBX: 00000000008ea030 RCX: 00007fa8b2868487 [ 2924.054604] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000000008f4360 [ 2924.058940] RBP: 00000000008f4360 R08: 0000000000000000 R09: 0000000000000014 [ 2924.063186] R10: 00000000000006b2 R11: 0000000000000246 R12: 00007fa8b2d7183c [ 2924.067418] R13: 0000000000000000 R14: 00000000008ea210 R15: 00007ffc39824fe0 [ 2924.071534] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy [ 2924.098044] CR2: ffff88018f0db000 [ 2924.102520] ---[ end trace a8e0d899985faf31 ]--- [ 2924.107012] RIP: 0010:__memset+0x24/0x30 [ 2924.111448] Code: 90 90 90 90 90 90 66 66 90 66 90 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 [ 2924.120724] RSP: 0018:ffff88018ddf7ae0 EFLAGS: 00010206 [ 2924.125312] RAX: 0000000000000000 RBX: ffff8801d549d888 RCX: 1ffffffffffdaffb [ 2924.129931] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88018f0daffc [ 2924.134537] RBP: ffff88018efc206c R08: 1ffff10031df840d R09: ffff88018efc206c [ 2924.139175] R10: ffffffffffffe1ee R11: ffffed0031df65fa R12: 0000000000000000 [ 2924.143825] R13: ffff8801d549dc98 R14: 00000000ffffc3db R15: ffffea00063bec80 [ 2924.148500] FS: 00007fa8b2f8a840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 2924.153247] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2924.158003] CR2: ffff88018f0db000 CR3: 000000018f892000 CR4: 00000000000006e0 [ 2924.164641] BUG: Bad rss-counter state mm:00000000fa04621e idx:0 val:4 [ 2924.170007] BUG: Bad rss-counter tate mm:00000000fa04621e idx:1 val:2
- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/inline.c#L78

	memset(addr + from, 0, MAX_INLINE_DATA(inode) - from);

Here the length can be negative.
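Since memset()'s length parameter is a size_t, a "negative" length is really a huge unsigned value. A minimal sketch of the failure mode, with illustrative names rather than the actual f2fs code:

	#include <string.h>

	static void zero_inline_tail(char *addr, size_t from, size_t max_inline)
	{
		/*
		 * With a corrupted inode, 'from' can exceed 'max_inline';
		 * the unsigned subtraction then wraps around and memset()
		 * writes far past 'addr', which is the wild write behind
		 * the __memset oops in the log above.
		 */
		memset(addr + from, 0, max_inline - from);
	}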
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.14: adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/f2fs/inode.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c index aeed9943836a..9a40724dbaa6 100644 --- a/fs/f2fs/inode.c +++ b/fs/f2fs/inode.c @@ -183,6 +183,7 @@ void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page) static bool sanity_check_inode(struct inode *inode, struct page *node_page) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); + struct f2fs_inode_info *fi = F2FS_I(inode); unsigned long long iblocks;
iblocks = le64_to_cpu(F2FS_INODE(node_page)->i_blocks); @@ -215,6 +216,17 @@ static bool sanity_check_inode(struct inode *inode, struct page *node_page) return false; }
+ if (fi->i_extra_isize > F2FS_TOTAL_EXTRA_ATTR_SIZE || + fi->i_extra_isize % sizeof(__le32)) { + set_sbi_flag(sbi, SBI_NEED_FSCK); + f2fs_msg(sbi->sb, KERN_WARNING, + "%s: inode (ino=%lx) has corrupted i_extra_isize: %d, " + "max: %zu", + __func__, inode->i_ino, fi->i_extra_isize, + F2FS_TOTAL_EXTRA_ATTR_SIZE); + return false; + } + if (F2FS_I(inode)->extent_tree) { struct extent_info *ei = &F2FS_I(inode)->extent_tree->largest;
@@ -280,14 +292,14 @@ static int do_read_inode(struct inode *inode)
get_inline_info(inode, ri);
+ fi->i_extra_isize = f2fs_has_extra_attr(inode) ? + le16_to_cpu(ri->i_extra_isize) : 0; + if (!sanity_check_inode(inode, node_page)) { f2fs_put_page(node_page, 1); return -EINVAL; }
- fi->i_extra_isize = f2fs_has_extra_attr(inode) ? - le16_to_cpu(ri->i_extra_isize) : 0; - /* check data exist */ if (f2fs_has_inline_data(inode) && !f2fs_exist_data(inode)) __recover_inline_status(inode, node_page);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit e494c2f995d6181d6e29c4927d68e0f295ecf75b upstream.
After fuzzing, cp_pack_start_sum could be corrupted, so the current log's summary info will be wrong because an incorrect summary block is loaded. Then, if a segment's type in the current log exceeds NR_CURSEG_TYPE, it can finally lead to accessing an invalid dirty_i->dirty_segmap bitmap.
Add sanity check for cp_pack_start_sum to fix this issue.
https://bugzilla.kernel.org/show_bug.cgi?id=200419
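For orientation, assuming the usual f2fs checkpoint-pack layout (checkpoint header first, then cp_payload blocks, with the NR_CURSEG_TYPE summary blocks and the trailing checkpoint block at the end of the same segment), a sane value has to satisfy the bounds enforced by the check added below:

	cp_payload + 1 <= cp_pack_start_sum
	cp_pack_start_sum <= blocks_per_seg - 1 - NR_CURSEG_TYPE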
- Reproduce
- Kernel message (f2fs-dev w/ KASAN)
[ 3117.578432] F2FS-fs (loop0): Invalid log blocks per segment (8)
[ 3117.578445] F2FS-fs (loop0): Can't find valid F2FS filesystem in 2th superblock [ 3117.581364] F2FS-fs (loop0): invalid crc_offset: 30716 [ 3117.583564] WARNING: CPU: 1 PID: 1225 at fs/f2fs/checkpoint.c:90 __get_meta_page+0x448/0x4b0 [ 3117.583570] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy [ 3117.584014] CPU: 1 PID: 1225 Comm: mount Not tainted 4.17.0+ #1 [ 3117.584017] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3117.584022] RIP: 0010:__get_meta_page+0x448/0x4b0 [ 3117.584023] Code: 00 49 8d bc 24 84 00 00 00 e8 74 54 da ff 41 83 8c 24 84 00 00 00 08 4c 89 f6 4c 89 ef e8 c0 d9 95 00 48 89 ef e8 18 e3 00 00 <0f> 0b f0 80 4d 48 04 e9 0f fe ff ff 0f 0b 48 89 c7 48 89 04 24 e8 [ 3117.584072] RSP: 0018:ffff88018eb678c0 EFLAGS: 00010286 [ 3117.584082] RAX: ffff88018f0a6a78 RBX: ffffea0007a46600 RCX: ffffffff9314d1b2 [ 3117.584085] RDX: ffffffff00000001 RSI: 0000000000000000 RDI: ffff88018f0a6a98 [ 3117.584087] RBP: ffff88018ebe9980 R08: 0000000000000002 R09: 0000000000000001 [ 3117.584090] R10: 0000000000000001 R11: ffffed00326e4450 R12: ffff880193722200 [ 3117.584092] R13: ffff88018ebe9afc R14: 0000000000000206 R15: ffff88018eb67900 [ 3117.584096] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 3117.584098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3117.584101] CR2: 00000000016f21b8 CR3: 0000000191c22000 CR4: 00000000000006e0 [ 3117.584112] Call Trace: [ 3117.584121] ? f2fs_set_meta_page_dirty+0x150/0x150 [ 3117.584127] ? f2fs_build_segment_manager+0xbf9/0x3190 [ 3117.584133] ? f2fs_npages_for_summary_flush+0x75/0x120 [ 3117.584145] f2fs_build_segment_manager+0xda8/0x3190 [ 3117.584151] ? f2fs_get_valid_checkpoint+0x298/0xa00 [ 3117.584156] ? f2fs_flush_sit_entries+0x10e0/0x10e0 [ 3117.584184] ? map_id_range_down+0x17c/0x1b0 [ 3117.584188] ? __put_user_ns+0x30/0x30 [ 3117.584206] ? find_next_bit+0x53/0x90 [ 3117.584237] ? cpumask_next+0x16/0x20 [ 3117.584249] f2fs_fill_super+0x1948/0x2b40 [ 3117.584258] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.584279] ? sget_userns+0x65e/0x690 [ 3117.584296] ? set_blocksize+0x88/0x130 [ 3117.584302] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.584305] mount_bdev+0x1c0/0x200 [ 3117.584310] mount_fs+0x5c/0x190 [ 3117.584320] vfs_kern_mount+0x64/0x190 [ 3117.584330] do_mount+0x2e4/0x1450 [ 3117.584343] ? lockref_put_return+0x130/0x130 [ 3117.584347] ? copy_mount_string+0x20/0x20 [ 3117.584357] ? kasan_unpoison_shadow+0x31/0x40 [ 3117.584362] ? kasan_kmalloc+0xa6/0xd0 [ 3117.584373] ? memcg_kmem_put_cache+0x16/0x90 [ 3117.584377] ? __kmalloc_track_caller+0x196/0x210 [ 3117.584383] ? _copy_from_user+0x61/0x90 [ 3117.584396] ? 
memdup_user+0x3e/0x60 [ 3117.584401] ksys_mount+0x7e/0xd0 [ 3117.584405] __x64_sys_mount+0x62/0x70 [ 3117.584427] do_syscall_64+0x73/0x160 [ 3117.584440] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.584455] RIP: 0033:0x7f5693f14b9a [ 3117.584456] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 3117.584505] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 3117.584510] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a [ 3117.584512] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040 [ 3117.584514] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 3117.584516] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040 [ 3117.584519] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003 [ 3117.584523] ---[ end trace a8e0d899985faf31 ]--- [ 3117.685663] F2FS-fs (loop0): f2fs_check_nid_range: out-of-range nid=2, run fsck to fix. [ 3117.685673] F2FS-fs (loop0): recover_data: ino = 2 (i_size: recover) recovered = 1, err = 0 [ 3117.685707] ================================================================== [ 3117.685955] BUG: KASAN: slab-out-of-bounds in __remove_dirty_segment+0xdd/0x1e0 [ 3117.686175] Read of size 8 at addr ffff88018f0a63d0 by task mount/1225
[ 3117.686477] CPU: 0 PID: 1225 Comm: mount Tainted: G W 4.17.0+ #1 [ 3117.686481] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3117.686483] Call Trace: [ 3117.686494] dump_stack+0x71/0xab [ 3117.686512] print_address_description+0x6b/0x290 [ 3117.686517] kasan_report+0x28e/0x390 [ 3117.686522] ? __remove_dirty_segment+0xdd/0x1e0 [ 3117.686527] __remove_dirty_segment+0xdd/0x1e0 [ 3117.686532] locate_dirty_segment+0x189/0x190 [ 3117.686538] f2fs_allocate_new_segments+0xa9/0xe0 [ 3117.686543] recover_data+0x703/0x2c20 [ 3117.686547] ? f2fs_recover_fsync_data+0x48f/0xd50 [ 3117.686553] ? ksys_mount+0x7e/0xd0 [ 3117.686564] ? policy_nodemask+0x1a/0x90 [ 3117.686567] ? policy_node+0x56/0x70 [ 3117.686571] ? add_fsync_inode+0xf0/0xf0 [ 3117.686592] ? blk_finish_plug+0x44/0x60 [ 3117.686597] ? f2fs_ra_meta_pages+0x38b/0x5e0 [ 3117.686602] ? find_inode_fast+0xac/0xc0 [ 3117.686606] ? f2fs_is_valid_blkaddr+0x320/0x320 [ 3117.686618] ? __radix_tree_lookup+0x150/0x150 [ 3117.686633] ? dqget+0x670/0x670 [ 3117.686648] ? pagecache_get_page+0x29/0x410 [ 3117.686656] ? kmem_cache_alloc+0x176/0x1e0 [ 3117.686660] ? f2fs_is_valid_blkaddr+0x11d/0x320 [ 3117.686664] f2fs_recover_fsync_data+0xc23/0xd50 [ 3117.686670] ? f2fs_space_for_roll_forward+0x60/0x60 [ 3117.686674] ? rb_insert_color+0x323/0x3d0 [ 3117.686678] ? f2fs_recover_orphan_inodes+0xa5/0x700 [ 3117.686683] ? proc_register+0x153/0x1d0 [ 3117.686686] ? f2fs_remove_orphan_inode+0x10/0x10 [ 3117.686695] ? f2fs_attr_store+0x50/0x50 [ 3117.686700] ? proc_create_single_data+0x52/0x60 [ 3117.686707] f2fs_fill_super+0x1d06/0x2b40 [ 3117.686728] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.686735] ? sget_userns+0x65e/0x690 [ 3117.686740] ? set_blocksize+0x88/0x130 [ 3117.686745] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.686748] mount_bdev+0x1c0/0x200 [ 3117.686753] mount_fs+0x5c/0x190 [ 3117.686758] vfs_kern_mount+0x64/0x190 [ 3117.686762] do_mount+0x2e4/0x1450 [ 3117.686769] ? lockref_put_return+0x130/0x130 [ 3117.686773] ? copy_mount_string+0x20/0x20 [ 3117.686777] ? kasan_unpoison_shadow+0x31/0x40 [ 3117.686780] ? kasan_kmalloc+0xa6/0xd0 [ 3117.686786] ? memcg_kmem_put_cache+0x16/0x90 [ 3117.686790] ? __kmalloc_track_caller+0x196/0x210 [ 3117.686795] ? _copy_from_user+0x61/0x90 [ 3117.686801] ? memdup_user+0x3e/0x60 [ 3117.686804] ksys_mount+0x7e/0xd0 [ 3117.686809] __x64_sys_mount+0x62/0x70 [ 3117.686816] do_syscall_64+0x73/0x160 [ 3117.686824] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.686829] RIP: 0033:0x7f5693f14b9a [ 3117.686830] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 3117.686887] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 3117.686892] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a [ 3117.686894] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040 [ 3117.686896] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 3117.686899] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040 [ 3117.686901] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003
[ 3117.687005] Allocated by task 1225: [ 3117.687152] kasan_kmalloc+0xa6/0xd0 [ 3117.687157] kmem_cache_alloc_trace+0xfd/0x200 [ 3117.687161] f2fs_build_segment_manager+0x2d09/0x3190 [ 3117.687165] f2fs_fill_super+0x1948/0x2b40 [ 3117.687168] mount_bdev+0x1c0/0x200 [ 3117.687171] mount_fs+0x5c/0x190 [ 3117.687174] vfs_kern_mount+0x64/0x190 [ 3117.687177] do_mount+0x2e4/0x1450 [ 3117.687180] ksys_mount+0x7e/0xd0 [ 3117.687182] __x64_sys_mount+0x62/0x70 [ 3117.687186] do_syscall_64+0x73/0x160 [ 3117.687190] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 3117.687285] Freed by task 19: [ 3117.687412] __kasan_slab_free+0x137/0x190 [ 3117.687416] kfree+0x8b/0x1b0 [ 3117.687460] ttm_bo_man_put_node+0x61/0x80 [ttm] [ 3117.687476] ttm_bo_cleanup_refs+0x15f/0x250 [ttm] [ 3117.687492] ttm_bo_delayed_delete+0x2f0/0x300 [ttm] [ 3117.687507] ttm_bo_delayed_workqueue+0x17/0x50 [ttm] [ 3117.687528] process_one_work+0x2f9/0x740 [ 3117.687531] worker_thread+0x78/0x6b0 [ 3117.687541] kthread+0x177/0x1c0 [ 3117.687545] ret_from_fork+0x35/0x40
[ 3117.687638] The buggy address belongs to the object at ffff88018f0a6300 which belongs to the cache kmalloc-192 of size 192 [ 3117.688014] The buggy address is located 16 bytes to the right of 192-byte region [ffff88018f0a6300, ffff88018f0a63c0) [ 3117.688382] The buggy address belongs to the page: [ 3117.688554] page:ffffea00063c2980 count:1 mapcount:0 mapping:ffff8801f3403180 index:0x0 [ 3117.688788] flags: 0x17fff8000000100(slab) [ 3117.688944] raw: 017fff8000000100 ffffea00063c2840 0000000e0000000e ffff8801f3403180 [ 3117.689166] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 3117.689386] page dumped because: kasan: bad access detected
[ 3117.689653] Memory state around the buggy address: [ 3117.689816] ffff88018f0a6280: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc [ 3117.690027] ffff88018f0a6300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 3117.690239] >ffff88018f0a6380: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 3117.690448] ^ [ 3117.690644] ffff88018f0a6400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 3117.690868] ffff88018f0a6480: 00 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 3117.691077] ================================================================== [ 3117.691290] Disabling lock debugging due to kernel taint [ 3117.693893] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 3117.694120] PGD 80000001f01bc067 P4D 80000001f01bc067 PUD 1d9638067 PMD 0 [ 3117.694338] Oops: 0002 [#1] SMP KASAN PTI [ 3117.694490] CPU: 1 PID: 1225 Comm: mount Tainted: G B W 4.17.0+ #1 [ 3117.694703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [ 3117.695073] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0 [ 3117.695246] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7 [ 3117.695793] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292 [ 3117.695969] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000 [ 3117.696182] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297 [ 3117.696391] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb [ 3117.696604] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019 [ 3117.696813] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0 [ 3117.697032] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 3117.697280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3117.702357] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0 [ 3117.707235] Call Trace: [ 3117.712077] locate_dirty_segment+0x189/0x190 [ 3117.716891] f2fs_allocate_new_segments+0xa9/0xe0 [ 3117.721617] recover_data+0x703/0x2c20 [ 3117.726316] ? f2fs_recover_fsync_data+0x48f/0xd50 [ 3117.730957] ? ksys_mount+0x7e/0xd0 [ 3117.735573] ? policy_nodemask+0x1a/0x90 [ 3117.740198] ? policy_node+0x56/0x70 [ 3117.744829] ? add_fsync_inode+0xf0/0xf0 [ 3117.749487] ? blk_finish_plug+0x44/0x60 [ 3117.754152] ? f2fs_ra_meta_pages+0x38b/0x5e0 [ 3117.758831] ? find_inode_fast+0xac/0xc0 [ 3117.763448] ? f2fs_is_valid_blkaddr+0x320/0x320 [ 3117.768046] ? __radix_tree_lookup+0x150/0x150 [ 3117.772603] ? dqget+0x670/0x670 [ 3117.777159] ? pagecache_get_page+0x29/0x410 [ 3117.781648] ? kmem_cache_alloc+0x176/0x1e0 [ 3117.786067] ? f2fs_is_valid_blkaddr+0x11d/0x320 [ 3117.790476] f2fs_recover_fsync_data+0xc23/0xd50 [ 3117.794790] ? f2fs_space_for_roll_forward+0x60/0x60 [ 3117.799086] ? rb_insert_color+0x323/0x3d0 [ 3117.803304] ? f2fs_recover_orphan_inodes+0xa5/0x700 [ 3117.807563] ? proc_register+0x153/0x1d0 [ 3117.811766] ? f2fs_remove_orphan_inode+0x10/0x10 [ 3117.815947] ? f2fs_attr_store+0x50/0x50 [ 3117.820087] ? proc_create_single_data+0x52/0x60 [ 3117.824262] f2fs_fill_super+0x1d06/0x2b40 [ 3117.828367] ? f2fs_commit_super+0x1a0/0x1a0 [ 3117.832432] ? sget_userns+0x65e/0x690 [ 3117.836500] ? set_blocksize+0x88/0x130 [ 3117.840501] ? 
f2fs_commit_super+0x1a0/0x1a0 [ 3117.844420] mount_bdev+0x1c0/0x200 [ 3117.848275] mount_fs+0x5c/0x190 [ 3117.852053] vfs_kern_mount+0x64/0x190 [ 3117.855810] do_mount+0x2e4/0x1450 [ 3117.859441] ? lockref_put_return+0x130/0x130 [ 3117.862996] ? copy_mount_string+0x20/0x20 [ 3117.866417] ? kasan_unpoison_shadow+0x31/0x40 [ 3117.869719] ? kasan_kmalloc+0xa6/0xd0 [ 3117.872948] ? memcg_kmem_put_cache+0x16/0x90 [ 3117.876121] ? __kmalloc_track_caller+0x196/0x210 [ 3117.879333] ? _copy_from_user+0x61/0x90 [ 3117.882467] ? memdup_user+0x3e/0x60 [ 3117.885604] ksys_mount+0x7e/0xd0 [ 3117.888700] __x64_sys_mount+0x62/0x70 [ 3117.891742] do_syscall_64+0x73/0x160 [ 3117.894692] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 3117.897669] RIP: 0033:0x7f5693f14b9a [ 3117.900563] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48 [ 3117.906922] RSP: 002b:00007fff27346488 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 3117.910159] RAX: ffffffffffffffda RBX: 00000000016e2030 RCX: 00007f5693f14b9a [ 3117.913469] RDX: 00000000016e2210 RSI: 00000000016e3f30 RDI: 00000000016ee040 [ 3117.916764] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013 [ 3117.920071] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 00000000016ee040 [ 3117.923393] R13: 00000000016e2210 R14: 0000000000000000 R15: 0000000000000003 [ 3117.926680] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm snd_timer joydev input_leds serio_raw snd soundcore mac_hid i2c_piix4 ib_iser rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi btrfs zstd_decompress zstd_compress xxhash raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 8139too qxl ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel psmouse aes_x86_64 8139cp crypto_simd cryptd mii glue_helper pata_acpi floppy [ 3117.949979] CR2: 0000000000000000 [ 3117.954283] ---[ end trace a8e0d899985faf32 ]--- [ 3117.958575] RIP: 0010:__remove_dirty_segment+0xe2/0x1e0 [ 3117.962810] Code: c4 48 89 c7 e8 cf bb d7 ff 45 0f b6 24 24 41 83 e4 3f 44 88 64 24 07 41 83 e4 3f 4a 8d 7c e3 08 e8 b3 bc d7 ff 4a 8b 4c e3 08 <f0> 4c 0f b3 29 0f 82 94 00 00 00 48 8d bd 20 04 00 00 e8 97 bb d7 [ 3117.971789] RSP: 0018:ffff88018eb67638 EFLAGS: 00010292 [ 3117.976333] RAX: 0000000000000000 RBX: ffff88018f0a6300 RCX: 0000000000000000 [ 3117.980926] RDX: 0000000000000000 RSI: 0000000000000297 RDI: 0000000000000297 [ 3117.985497] RBP: ffff88018ebe9980 R08: ffffed003e743ebb R09: ffffed003e743ebb [ 3117.990098] R10: 0000000000000001 R11: ffffed003e743eba R12: 0000000000000019 [ 3117.994761] R13: 0000000000000014 R14: 0000000000000320 R15: ffff88018ebe99e0 [ 3117.999392] FS: 00007f5694636840(0000) GS:ffff8801f3b00000(0000) knlGS:0000000000000000 [ 3118.004096] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3118.008816] CR2: 00007fe89bb1a000 CR3: 0000000191c22000 CR4: 00000000000006e0
- Location
https://elixir.bootlin.com/linux/v4.18-rc3/source/fs/f2fs/segment.c#L775

	if (test_and_clear_bit(segno, dirty_i->dirty_segmap[t]))
		dirty_i->nr_dirty[t]--;

Here dirty_i->dirty_segmap[t] can be NULL, which leads to a crash in test_and_clear_bit().
Reported-by: Wen Xu <wen.xu@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.14: The function is called sanity_check_ckpt()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/f2fs/checkpoint.c |  8 ++++----
 fs/f2fs/super.c      | 12 ++++++++++++
 2 files changed, 16 insertions(+), 4 deletions(-)
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c index c81cd5057b8e..624817eeb25e 100644 --- a/fs/f2fs/checkpoint.c +++ b/fs/f2fs/checkpoint.c @@ -825,15 +825,15 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi) cp_block = (struct f2fs_checkpoint *)page_address(cur_page); memcpy(sbi->ckpt, cp_block, blk_size);
- /* Sanity checking of checkpoint */ - if (sanity_check_ckpt(sbi)) - goto free_fail_no_cp; - if (cur_page == cp1) sbi->cur_cp_pack = 1; else sbi->cur_cp_pack = 2;
+ /* Sanity checking of checkpoint */ + if (sanity_check_ckpt(sbi)) + goto free_fail_no_cp; + if (cp_blks <= 1) goto done;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 9fafb1404f39..de4de4ebe64c 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1957,6 +1957,7 @@ int sanity_check_ckpt(struct f2fs_sb_info *sbi) unsigned int sit_bitmap_size, nat_bitmap_size; unsigned int log_blocks_per_seg; unsigned int segment_count_main; + unsigned int cp_pack_start_sum, cp_payload; block_t user_block_count; int i;
@@ -2017,6 +2018,17 @@ int sanity_check_ckpt(struct f2fs_sb_info *sbi) return 1; }
+ cp_pack_start_sum = __start_sum_addr(sbi); + cp_payload = __cp_payload(sbi); + if (cp_pack_start_sum < cp_payload + 1 || + cp_pack_start_sum > blocks_per_seg - 1 - + NR_CURSEG_TYPE) { + f2fs_msg(sbi->sb, KERN_ERR, + "Wrong cp_pack_start_sum: %u", + cp_pack_start_sum); + return 1; + } + if (unlikely(f2fs_cp_error(sbi))) { f2fs_msg(sbi->sb, KERN_ERR, "A bug case: need to run fsck"); return 1;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
commit 7b38460dc8e4eafba06c78f8e37099d3b34d473c upstream.
Kanda Motohiro reported that expanding a tiny xattr into a large xattr fails on XFS because we remove the tiny xattr from a shortform fork and then try to re-add it after converting the fork to extents format, without having cleared the ATTR_REPLACE flag. This fails because the attr is no longer present, causing a filesystem shutdown.
This is derived from the patch in his bug report, but we really shouldn't ignore a nonzero retval from the remove call.
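The failure can be driven entirely from user space. A hedged sketch of the sequence (file path and sizes are illustrative, not taken from the report): create a tiny attr so the attr fork stays in shortform, then replace it with a value too large for shortform, which forces the shortform-to-leaf conversion in the middle of the replace:

	#include <stdio.h>
	#include <string.h>
	#include <sys/xattr.h>

	int main(void)
	{
		char big[2048];

		memset(big, 'x', sizeof(big));

		/* tiny value: the attr fork stays in shortform */
		if (setxattr("/mnt/xfs/testfile", "user.a", "1", 1, XATTR_CREATE))
			perror("create");

		/* large replacement: forces the shortform-to-leaf conversion;
		 * before this fix the leaf add tripped over ATTR_REPLACE */
		if (setxattr("/mnt/xfs/testfile", "user.a", big, sizeof(big),
			     XATTR_REPLACE))
			perror("replace");
		return 0;
	}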
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199119
Reported-by: kanda.motohiro@gmail.com
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c index 6249c92671de..ea66f04f46f7 100644 --- a/fs/xfs/libxfs/xfs_attr.c +++ b/fs/xfs/libxfs/xfs_attr.c @@ -501,7 +501,14 @@ xfs_attr_shortform_addname(xfs_da_args_t *args) if (args->flags & ATTR_CREATE) return retval; retval = xfs_attr_shortform_remove(args); - ASSERT(retval == 0); + if (retval) + return retval; + /* + * Since we have removed the old attr, clear ATTR_REPLACE so + * that the leaf format add routine won't trip over the attr + * not being around. + */ + args->flags &= ~ATTR_REPLACE; }
if (args->namelen >= XFS_ATTR_SF_ENTSIZE_MAX ||
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
This reverts commit e87efc44dd36ba3db59847c418354711ebad779b which was upstream commit 4ec7cece87b3ed21ffcd407c62fb2f151a366bc1.
From Dietmar May's report on the stable mailing list
(https://www.spinics.net/lists/stable/msg272201.html):
I've run into some problems which appear due to (a) recent patch(es) on the wlcore wifi driver.
4.4.160 - commit 3fdd34643ffc378b5924941fad40352c04610294
4.9.131 - commit afeeecc764436f31d4447575bb9007732333818c
Earlier versions (4.9.130 and 4.4.159 - tested back to 4.4.49) do not exhibit this problem. It is still present in 4.9.141.
master as of 4.20.0-rc4 does not exhibit this problem.
Basically, during client association when in AP mode (running hostapd), handshake may or may not complete following a noticeable delay. If successful, then the driver fails consistently in warn_slowpath_null during disassociation. If unsuccessful, the wifi client attempts multiple times, sometimes failing repeatedly. I've had clients unable to connect for 3-5 minutes during testing, with the syslog filled with dozens of backtraces. syslog details are below.
I'm working on an embedded device with a TI 3352 ARM processor and a murata wl1271 module in sdio mode. We're running a fully patched ubuntu 18.04 ARM build, with a kernel built from kernel.org's stable/linux repo https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-4.9.y&id=afeeecc764436f31d4447575bb9007732333818c. Relevant parts of the kernel config are included below.
The commit message states:
"I've only seen this few times with the runtime PM patches enabled so this one is probably not needed before that. This seems to work currently based on the current PM implementation timer. Let's apply this separately though in case others are hitting this issue."
We're not doing anything explicit with power management. The device is an IoT edge gateway with battery backup, normally running on wall power. The battery is currently used solely to shut down the system cleanly to avoid filesystem corruption.
The device tree is configured to keep power in suspend; but the device should never suspend, so in our case, there is no need to call wl1271_ps_elp_wakeup() or wl1271_ps_elp_sleep(), as occurs in the patch.
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/ti/wlcore/cmd.c | 6 ------
 1 file changed, 6 deletions(-)
diff --git a/drivers/net/wireless/ti/wlcore/cmd.c b/drivers/net/wireless/ti/wlcore/cmd.c index f48c3f62966d..761cf8573a80 100644 --- a/drivers/net/wireless/ti/wlcore/cmd.c +++ b/drivers/net/wireless/ti/wlcore/cmd.c @@ -35,7 +35,6 @@ #include "wl12xx_80211.h" #include "cmd.h" #include "event.h" -#include "ps.h" #include "tx.h" #include "hw_ops.h"
@@ -192,10 +191,6 @@ int wlcore_cmd_wait_for_event_or_timeout(struct wl1271 *wl,
timeout_time = jiffies + msecs_to_jiffies(WL1271_EVENT_TIMEOUT);
- ret = wl1271_ps_elp_wakeup(wl); - if (ret < 0) - return ret; - do { if (time_after(jiffies, timeout_time)) { wl1271_debug(DEBUG_CMD, "timeout waiting for event %d", @@ -227,7 +222,6 @@ int wlcore_cmd_wait_for_event_or_timeout(struct wl1271 *wl, } while (!event);
out: - wl1271_ps_elp_sleep(wl); kfree(events_vector); return ret; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Machata petrm@mellanox.com
[ Upstream commit b5dd186d10ba59e6b5ba60e42b3b083df56df6f3 ]
When a packet is trapped and the corresponding SKB marked as already-forwarded, it retains this marking even after it is forwarded across veth links into another bridge. There, since it ingresses the bridge over veth, which doesn't have offload_fwd_mark, it triggers a warning in nbp_switchdev_frame_mark().
Then nbp_switchdev_allowed_egress() decides not to allow egress from this bridge through another veth, because the SKB is already marked, and the mark (of 0) of course matches. Thus the packet is incorrectly blocked.
Solve by resetting offload_fwd_mark() in skb_scrub_packet(). That function is called from tunnels and also from veth, and thus catches the cases where traffic is forwarded between bridges and transformed in a way that invalidates the marking.
Fixes: 6bc506b4fb06 ("bridge: switchdev: Add forward mark support for stacked devices")
Fixes: abf4bb6b63d0 ("skbuff: Add the offload_mr_fwd_mark field")
Signed-off-by: Petr Machata <petrm@mellanox.com>
Suggested-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/skbuff.c | 4 ++++
 1 file changed, 4 insertions(+)
--- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4882,6 +4882,10 @@ void skb_scrub_packet(struct sk_buff *sk nf_reset(skb); nf_reset_trace(skb);
+#ifdef CONFIG_NET_SWITCHDEV + skb->offload_fwd_mark = 0; +#endif + if (!xnet) return;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit 6d0f60b0f8588fd4380ea5df9601e12fddd55ce2 ]
Set the xdp_prog pointer to NULL if bpf_prog_add fails, since that routine returns an error pointer instead of NULL on failure, and the xdp_prog pointer value is used in the driver to verify whether XDP is currently enabled. Moreover, report the error code to userspace if nicvf_xdp_setup fails.
Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c @@ -1691,6 +1691,7 @@ static int nicvf_xdp_setup(struct nicvf bool if_up = netif_running(nic->netdev); struct bpf_prog *old_prog; bool bpf_attached = false; + int ret = 0;
/* For now just support only the usual MTU sized frames */ if (prog && (dev->mtu > 1500)) { @@ -1724,8 +1725,12 @@ static int nicvf_xdp_setup(struct nicvf if (nic->xdp_prog) { /* Attach BPF program */ nic->xdp_prog = bpf_prog_add(nic->xdp_prog, nic->rx_queues - 1); - if (!IS_ERR(nic->xdp_prog)) + if (!IS_ERR(nic->xdp_prog)) { bpf_attached = true; + } else { + ret = PTR_ERR(nic->xdp_prog); + nic->xdp_prog = NULL; + } }
/* Calculate Tx queues needed for XDP and network stack */ @@ -1737,7 +1742,7 @@ static int nicvf_xdp_setup(struct nicvf netif_trans_update(nic->netdev); }
- return 0; + return ret; }
static int nicvf_xdp(struct net_device *netdev, struct netdev_xdp *xdp)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit e59ff2c49ae16e1d179de679aca81405829aee6c ]
We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we can receive partially csumed packets with metadata kept in the vnet_hdr. This may have several side effects:

- It could be overridden by header adjustment, thus it might not be correct after XDP processing.
- There's no way to pass such metadata information through XDP_REDIRECT to another driver.
- XDP does not support checksum offload right now.

So simply disable guest csum if possible in the case of XDP.
Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/virtio_net.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -61,7 +61,8 @@ static const unsigned long guest_offload VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, VIRTIO_NET_F_GUEST_ECN, - VIRTIO_NET_F_GUEST_UFO + VIRTIO_NET_F_GUEST_UFO, + VIRTIO_NET_F_GUEST_CSUM };
struct virtnet_stats { @@ -1939,9 +1940,6 @@ static int virtnet_clear_guest_offloads( if (!vi->guest_offloads) return 0;
- if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) - offloads = 1ULL << VIRTIO_NET_F_GUEST_CSUM; - return virtnet_set_guest_offloads(vi, offloads); }
@@ -1951,8 +1949,6 @@ static int virtnet_restore_guest_offload
if (!vi->guest_offloads) return 0; - if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) - offloads |= 1ULL << VIRTIO_NET_F_GUEST_CSUM;
return virtnet_set_guest_offloads(vi, offloads); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Wang jasowang@redhat.com
[ Upstream commit 18ba58e1c234ea1a2d9835ac8c1735d965ce4640 ]
We don't support partially csumed packets, since their metadata will be lost or become incorrect during XDP processing. So fail the XDP set if the guest_csum feature is negotiated.
Fixes: f600b6905015 ("virtio_net: Add XDP support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/virtio_net.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1966,8 +1966,9 @@ static int virtnet_xdp_set(struct net_de && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) || virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) || virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) || - virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO))) { - NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO, disable LRO first"); + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) || + virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) { + NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing LRO/CSUM, disable LRO/CSUM first"); return -EOPNOTSUPP; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lorenzo Bianconi lorenzo.bianconi@redhat.com
[ Upstream commit ef2a7cf1d8831535b8991459567b385661eb4a36 ]
Reset the snd_queue tso_hdrs pointer to NULL in the nicvf_free_snd_queue routine, since it is used to check whether the TSO DMA descriptor queue has been previously allocated. The issue can be triggered with the following reproducer:
$ ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
$ ip link set dev enP2p1s0v0 xdpdrv off
[ 341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0 [ 341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018 [ 341.521874] pstate: 60400005 (nZCv daif +PAN -UAO) [ 341.526654] pc : __vunmap+0x98/0xe0 [ 341.530132] lr : __vunmap+0x98/0xe0 [ 341.533609] sp : ffff00001c5db860 [ 341.536913] x29: ffff00001c5db860 x28: 0000000000020000 [ 341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000 [ 341.547515] x25: 0000000000000000 x24: 00000000fbd00000 [ 341.552816] x23: 0000000000000000 x22: ffff810feb5090b0 [ 341.558117] x21: 0000000000000000 x20: 0000000000000000 [ 341.563418] x19: ffff000017e57000 x18: 0000000000000000 [ 341.568719] x17: 0000000000000000 x16: 0000000000000000 [ 341.574020] x15: 0000000000000010 x14: ffffffffffffffff [ 341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f [ 341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510 [ 341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8 [ 341.595224] x7 : 3430303030303030 x6 : 00000000000006ef [ 341.600525] x5 : 00000000003fffff x4 : 0000000000000000 [ 341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff [ 341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038 [ 341.616428] Call trace: [ 341.618866] __vunmap+0x98/0xe0 [ 341.621997] vunmap+0x3c/0x50 [ 341.624961] arch_dma_free+0x68/0xa0 [ 341.628534] dma_direct_free+0x50/0x80 [ 341.632285] nicvf_free_resources+0x160/0x2d8 [nicvf] [ 341.637327] nicvf_config_data_transfer+0x174/0x5e8 [nicvf] [ 341.642890] nicvf_stop+0x298/0x340 [nicvf] [ 341.647066] __dev_close_many+0x9c/0x108 [ 341.650977] dev_close_many+0xa4/0x158 [ 341.654720] rollback_registered_many+0x140/0x530 [ 341.659414] rollback_registered+0x54/0x80 [ 341.663499] unregister_netdevice_queue+0x9c/0xe8 [ 341.668192] unregister_netdev+0x28/0x38 [ 341.672106] nicvf_remove+0xa4/0xa8 [nicvf] [ 341.676280] nicvf_shutdown+0x20/0x30 [nicvf] [ 341.680630] pci_device_shutdown+0x44/0x88 [ 341.684720] device_shutdown+0x144/0x250 [ 341.688640] kernel_restart_prepare+0x44/0x50 [ 341.692986] kernel_restart+0x20/0x68 [ 341.696638] __se_sys_reboot+0x210/0x238 [ 341.700550] __arm64_sys_reboot+0x24/0x30 [ 341.704555] el0_svc_handler+0x94/0x110 [ 341.708382] el0_svc+0x8/0xc [ 341.711252] ---[ end trace 3f4019c8439959c9 ]--- [ 341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4 [ 341.723872] flags: 0x1fffe000000000() [ 341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000 [ 341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000 [ 341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
where xdp_dummy.c is a simple bpf program that forwards the incoming frames to the network stack (available here: https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_...)
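For reference, a BPF program of that shape is essentially a one-liner. A sketch, assuming the usual XDP boilerplate (the real xdp_dummy.c behind the truncated link may differ, and the ELF section name has to match what ip expects):

	#include <linux/bpf.h>

	/* Pass every received frame on to the normal network stack. */
	__attribute__((section("xdp"), used))
	int xdp_dummy(struct xdp_md *ctx)
	{
		return XDP_PASS;
	}

It would be built with something like: clang -O2 -target bpf -c xdp_dummy.c -o xdp_dummy.o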
Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support")
Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -585,10 +585,12 @@ static void nicvf_free_snd_queue(struct if (!sq->dmem.base) return;
- if (sq->tso_hdrs) + if (sq->tso_hdrs) { dma_free_coherent(&nic->pdev->dev, sq->dmem.q_len * TSO_HEADER_SIZE, sq->tso_hdrs, sq->tso_hdrs_phys); + sq->tso_hdrs = NULL; + }
/* Free pending skbs in the queue */ smp_rmb();
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Willem de Bruijn willemb@google.com
[ Upstream commit 5cd8d46ea1562be80063f53c7c6a5f40224de623 ]
tpacket_snd sends packets with user pages linked into skb frags. It notifies that pages can be reused when the skb is released by setting skb->destructor to tpacket_destruct_skb.
This can cause data corruption if the skb is orphaned (e.g., on transmit through veth) or cloned (e.g., on mirror to another psock).
Create a kernel-private copy of data in these cases, same as tun/tap zerocopy transmission. Reuse that infrastructure: mark the skb as SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
Unlike other zerocopy packets, do not set shinfo destructor_arg to struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify when the original skb is released and a timestamp is recorded. Do not change this timestamp behavior. The ubuf_info->callback is not needed anyway, as no zerocopy notification is expected.
Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The resulting value is not a valid ubuf_info pointer, nor a valid tpacket_snd frame address, since both are at least 2-byte aligned and thus always have bit 0 clear. Add skb_zcopy_.._nouarg helpers for this.
The fix relies on features introduced in commit 52267790ef52 ("sock: add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
Tested with `./in_netns.sh ./txring_overwrite` from http://github.com/wdebruij/kerneltools/tests
Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
Reported-by: Anand H. Krishnan <anandhkrishnan@gmail.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 include/linux/skbuff.h | 18 +++++++++++++++++-
 net/packet/af_packet.c |  4 ++--
 2 files changed, 19 insertions(+), 3 deletions(-)
--- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1288,6 +1288,22 @@ static inline void skb_zcopy_set(struct } }
+static inline void skb_zcopy_set_nouarg(struct sk_buff *skb, void *val) +{ + skb_shinfo(skb)->destructor_arg = (void *)((uintptr_t) val | 0x1UL); + skb_shinfo(skb)->tx_flags |= SKBTX_ZEROCOPY_FRAG; +} + +static inline bool skb_zcopy_is_nouarg(struct sk_buff *skb) +{ + return (uintptr_t) skb_shinfo(skb)->destructor_arg & 0x1UL; +} + +static inline void *skb_zcopy_get_nouarg(struct sk_buff *skb) +{ + return (void *)((uintptr_t) skb_shinfo(skb)->destructor_arg & ~0x1UL); +} + /* Release a reference on a zerocopy structure */ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy) { @@ -1297,7 +1313,7 @@ static inline void skb_zcopy_clear(struc if (uarg->callback == sock_zerocopy_callback) { uarg->zerocopy = uarg->zerocopy && zerocopy; sock_zerocopy_put(uarg); - } else { + } else if (!skb_zcopy_is_nouarg(skb)) { uarg->callback(uarg, zerocopy); }
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2433,7 +2433,7 @@ static void tpacket_destruct_skb(struct void *ph; __u32 ts;
- ph = skb_shinfo(skb)->destructor_arg; + ph = skb_zcopy_get_nouarg(skb); packet_dec_pending(&po->tx_ring);
ts = __packet_set_timestamp(po, ph, skb); @@ -2499,7 +2499,7 @@ static int tpacket_fill_skb(struct packe skb->priority = po->sk.sk_priority; skb->mark = po->sk.sk_mark; sock_tx_timestamp(&po->sk, sockc->tsflags, &skb_shinfo(skb)->tx_flags); - skb_shinfo(skb)->destructor_arg = ph.raw; + skb_zcopy_set_nouarg(skb, ph.raw);
skb_reserve(skb, hlen); skb_reset_network_header(skb);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pan Bian bianpan2016@163.com
[ Upstream commit cfc435198f53a6fa1f656d98466b24967ff457d0 ]
The skb is freed via dev_kfree_skb_any(); however, skb->len is read afterwards. This may result in a use-after-free bug.
Fixes: e6161d64263 ("rapidio/rionet: rework driver initialization and removal")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/net/rionet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/net/rionet.c +++ b/drivers/net/rionet.c @@ -216,9 +216,9 @@ static int rionet_start_xmit(struct sk_b * it just report sending a packet to the target * (without actual packet transfer). */ - dev_kfree_skb_any(skb); ndev->stats.tx_packets++; ndev->stats.tx_bytes += skb->len; + dev_kfree_skb_any(skb); } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Julian Wiedmann jwi@linux.ibm.com
[ Upstream commit 9a764c1e59684c0358e16ccaafd870629f2cfe67 ]
The response for a SNMP request can consist of multiple parts, which the cmd callback stages into a kernel buffer until all parts have been received. If the callback detects that the staging buffer provides insufficient space, it bails out with error. This processing is buggy for the first part of the response - while it initially checks for a length of 'data_len', it later copies an additional amount of 'offsetof(struct qeth_snmp_cmd, data)' bytes.
Fix the calculation of 'data_len' for the first part of the response. This also nicely cleans up the memcpy code.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/s390/net/qeth_core_main.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)
--- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -4545,8 +4545,8 @@ static int qeth_snmp_command_cb(struct q { struct qeth_ipa_cmd *cmd; struct qeth_arp_query_info *qinfo; - struct qeth_snmp_cmd *snmp; unsigned char *data; + void *snmp_data; __u16 data_len;
QETH_CARD_TEXT(card, 3, "snpcmdcb"); @@ -4554,7 +4554,6 @@ static int qeth_snmp_command_cb(struct q cmd = (struct qeth_ipa_cmd *) sdata; data = (unsigned char *)((char *)cmd - reply->offset); qinfo = (struct qeth_arp_query_info *) reply->param; - snmp = &cmd->data.setadapterparms.data.snmp;
if (cmd->hdr.return_code) { QETH_CARD_TEXT_(card, 4, "scer1%x", cmd->hdr.return_code); @@ -4567,10 +4566,15 @@ static int qeth_snmp_command_cb(struct q return 0; } data_len = *((__u16 *)QETH_IPA_PDU_LEN_PDU1(data)); - if (cmd->data.setadapterparms.hdr.seq_no == 1) - data_len -= (__u16)((char *)&snmp->data - (char *)cmd); - else - data_len -= (__u16)((char *)&snmp->request - (char *)cmd); + if (cmd->data.setadapterparms.hdr.seq_no == 1) { + snmp_data = &cmd->data.setadapterparms.data.snmp; + data_len -= offsetof(struct qeth_ipa_cmd, + data.setadapterparms.data.snmp); + } else { + snmp_data = &cmd->data.setadapterparms.data.snmp.request; + data_len -= offsetof(struct qeth_ipa_cmd, + data.setadapterparms.data.snmp.request); + }
/* check if there is enough room in userspace */ if ((qinfo->udata_len - qinfo->udata_offset) < data_len) { @@ -4583,16 +4587,9 @@ static int qeth_snmp_command_cb(struct q QETH_CARD_TEXT_(card, 4, "sseqn%i", cmd->data.setadapterparms.hdr.seq_no); /*copy entries to user buffer*/ - if (cmd->data.setadapterparms.hdr.seq_no == 1) { - memcpy(qinfo->udata + qinfo->udata_offset, - (char *)snmp, - data_len + offsetof(struct qeth_snmp_cmd, data)); - qinfo->udata_offset += offsetof(struct qeth_snmp_cmd, data); - } else { - memcpy(qinfo->udata + qinfo->udata_offset, - (char *)&snmp->request, data_len); - } + memcpy(qinfo->udata + qinfo->udata_offset, snmp_data, data_len); qinfo->udata_offset += data_len; + /* check if all replies received ... */ QETH_CARD_TEXT_(card, 4, "srtot%i", cmd->data.setadapterparms.hdr.used_total);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bernd Eckstein 3erndeckstein@gmail.com
[ Upstream commit 45611c61dd503454b2edae00aabe1e429ec49ebe ]
The bug is not easily reproducible, as it may occur very infrequently (we had machines with 20 minutes of heavy downloading before it occurred). However, on a virtual machine (VMware on a Windows 10 host) it occurred pretty frequently (1-2 seconds after a speedtest was started).

dev->tx_skb may be freed via dev_kfree_skb_irq on a callback before it is set.

This causes the following problems:
- double free of the skb or a potential memory leak
- in dmesg: 'recvmsg bug' and 'recvmsg bug 2', and eventually a general protection fault
Example dmesg output:

[ 134.841986] ------------[ cut here ]------------
[ 134.841987] recvmsg bug: copied 9C24A555 seq 9C24B557 rcvnxt 9C25A6B3 fl 0
[ 134.841993] WARNING: CPU: 7 PID: 2629 at /build/linux-hwe-On9fm7/linux-hwe-4.15.0/net/ipv4/tcp.c:1865 tcp_recvmsg+0x44d/0xab0
[ 134.841994] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842046] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842046] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842048] RIP: 0010:tcp_recvmsg+0x44d/0xab0
[ 134.842048] RSP: 0018:ffffa6630422bcc8 EFLAGS: 00010286
[ 134.842049] RAX: 0000000000000000 RBX: ffff997616f4f200 RCX: 0000000000000006
[ 134.842049] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9976257d6490
[ 134.842050] RBP: ffffa6630422bd98 R08: 0000000000000001 R09: 000000000004bba4
[ 134.842050] R10: 0000000001e00c6f R11: 000000000004bba4 R12: ffff99760dee3000
[ 134.842051] R13: 0000000000000000 R14: ffff99760dee3514 R15: 0000000000000000
[ 134.842051] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
[ 134.842052] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 134.842053] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
[ 134.842055] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 134.842055] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 134.842057] Call Trace:
[ 134.842060] ? aa_sk_perm+0x53/0x1a0
[ 134.842064] inet_recvmsg+0x51/0xc0
[ 134.842066] sock_recvmsg+0x43/0x50
[ 134.842070] SYSC_recvfrom+0xe4/0x160
[ 134.842072] ? __schedule+0x3de/0x8b0
[ 134.842075] ? ktime_get_ts64+0x4c/0xf0
[ 134.842079] SyS_recvfrom+0xe/0x10
[ 134.842082] do_syscall_64+0x73/0x130
[ 134.842086] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842086] RIP: 0033:0x7fe331f5a81d
[ 134.842088] RSP: 002b:00007ffe8da98398 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
[ 134.842090] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007fe331f5a81d
[ 134.842094] RDX: 00000000000003fb RSI: 0000000001e00874 RDI: 0000000000000003
[ 134.842095] RBP: 00007fe32f642c70 R08: 0000000000000000 R09: 0000000000000000
[ 134.842097] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe332347698
[ 134.842099] R13: 0000000001b7e0a0 R14: 0000000001e00874 R15: 0000000000000000
[ 134.842103] Code: 24 fd ff ff e9 cc fe ff ff 48 89 d8 41 8b 8c 24 10 05 00 00 44 8b 45 80 48 c7 c7 08 bd 59 8b 48 89 85 68 ff ff ff e8 b3 c4 7d ff <0f> 0b 48 8b 85 68 ff ff ff e9 e9 fe ff ff 41 8b 8c 24 10 05 00
[ 134.842126] ---[ end trace b7138fc08c83147f ]---
[ 134.842144] general protection fault: 0000 [#1] SMP PTI
[ 134.842145] Modules linked in: ipheth(OE) kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd vmw_balloon intel_rapl_perf joydev input_leds serio_raw vmw_vsock_vmci_transport vsock shpchp i2c_piix4 mac_hid binfmt_misc vmw_vmci parport_pc ppdev lp parport autofs4 vmw_pvscsi vmxnet3 hid_generic usbhid hid vmwgfx ttm drm_kms_helper syscopyarea sysfillrect mptspi mptscsih sysimgblt ahci psmouse fb_sys_fops pata_acpi mptbase libahci e1000 drm scsi_transport_spi
[ 134.842161] CPU: 7 PID: 2629 Comm: python Tainted: G W OE 4.15.0-34-generic #37~16.04.1-Ubuntu
[ 134.842162] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[ 134.842164] RIP: 0010:tcp_close+0x2c6/0x440
[ 134.842165] RSP: 0018:ffffa6630422bde8 EFLAGS: 00010202
[ 134.842167] RAX: 0000000000000000 RBX: ffff99760dee3000 RCX: 0000000180400034
[ 134.842168] RDX: 5c4afd407207a6c4 RSI: ffffe868495bd300 RDI: ffff997616f4f200
[ 134.842169] RBP: ffffa6630422be08 R08: 0000000016f4d401 R09: 0000000180400034
[ 134.842169] R10: ffffa6630422bd98 R11: 0000000000000000 R12: 000000000000600c
[ 134.842170] R13: 0000000000000000 R14: ffff99760dee30c8 R15: ffff9975bd44fe00
[ 134.842171] FS: 00007fe332347700(0000) GS:ffff9976257c0000(0000) knlGS:0000000000000000
[ 134.842173] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 134.842174] CR2: 0000000001e41000 CR3: 000000020e9b4006 CR4: 00000000003606e0
[ 134.842177] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 134.842178] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 134.842179] Call Trace:
[ 134.842181] inet_release+0x42/0x70
[ 134.842183] __sock_release+0x42/0xb0
[ 134.842184] sock_close+0x15/0x20
[ 134.842187] __fput+0xea/0x220
[ 134.842189] ____fput+0xe/0x10
[ 134.842191] task_work_run+0x8a/0xb0
[ 134.842193] exit_to_usermode_loop+0xc4/0xd0
[ 134.842195] do_syscall_64+0xf4/0x130
[ 134.842197] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 134.842197] RIP: 0033:0x7fe331f5a560
[ 134.842198] RSP: 002b:00007ffe8da982e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ 134.842200] RAX: 0000000000000000 RBX: 00007fe32f642c70 RCX: 00007fe331f5a560
[ 134.842201] RDX: 00000000008f5320 RSI: 0000000001cd4b50 RDI: 0000000000000003
[ 134.842202] RBP: 00007fe32f6500f8 R08: 000000000000003c R09: 00000000009343c0
[ 134.842203] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe32f6500d0
[ 134.842204] R13: 00000000008f5320 R14: 00000000008f5320 R15: 0000000001cd4770
[ 134.842205] Code: c8 00 00 00 45 31 e4 49 39 fe 75 4d eb 50 83 ab d8 00 00 00 01 48 8b 17 48 8b 47 08 48 c7 07 00 00 00 00 48 c7 47 08 00 00 00 00 <48> 89 42 08 48 89 10 0f b6 57 34 8b 47 2c 2b 47 28 83 e2 01 80
[ 134.842226] RIP: tcp_close+0x2c6/0x440 RSP: ffffa6630422bde8
[ 134.842227] ---[ end trace b7138fc08c831480 ]---
The proposed patch eliminates a potential race condition. Before, usb_submit_urb() was called and only _after_ that was the skb attached (dev->tx_skb). So, on a callback it was possible, however unlikely, that the skb was freed before it was set. Because dev->tx_skb was not set to NULL after it was freed, an skb from an earlier transmission could then be freed a second time (and the skb that should have been freed was not freed at all).
Now the skb is freed directly in ipheth_tx(). It is no longer passed to the callback, eliminating the possibility of a double free of the same skb. Depending on the return value of usb_submit_urb(), either dev_kfree_skb_any() or dev_consume_skb_any() is used to free the skb.
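For reference, a condensed sketch of the resulting transmit path (simplified from the diff below; the URB setup, which copies the skb data into dev->tx_buf, is elided):

  retval = usb_submit_urb(dev->tx_urb, GFP_ATOMIC);
  if (retval) {
          dev->net->stats.tx_errors++;
          dev_kfree_skb_any(skb);         /* dropped: submission failed */
  } else {
          dev->net->stats.tx_packets++;
          dev->net->stats.tx_bytes += skb->len;
          dev_consume_skb_any(skb);       /* consumed: data already in tx_buf */
          netif_stop_queue(net);
  }
  return NETDEV_TX_OK;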
Signed-off-by: Oliver Zweigle Oliver.Zweigle@faro.com Signed-off-by: Bernd Eckstein 3ernd.Eckstein@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/usb/ipheth.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
--- a/drivers/net/usb/ipheth.c +++ b/drivers/net/usb/ipheth.c @@ -140,7 +140,6 @@ struct ipheth_device { struct usb_device *udev; struct usb_interface *intf; struct net_device *net; - struct sk_buff *tx_skb; struct urb *tx_urb; struct urb *rx_urb; unsigned char *tx_buf; @@ -229,6 +228,7 @@ static void ipheth_rcvbulk_callback(stru case -ENOENT: case -ECONNRESET: case -ESHUTDOWN: + case -EPROTO: return; case 0: break; @@ -280,7 +280,6 @@ static void ipheth_sndbulk_callback(stru dev_err(&dev->intf->dev, "%s: urb status: %d\n", __func__, status);
- dev_kfree_skb_irq(dev->tx_skb); netif_wake_queue(dev->net); }
@@ -410,7 +409,7 @@ static int ipheth_tx(struct sk_buff *skb if (skb->len > IPHETH_BUF_SIZE) { WARN(1, "%s: skb too large: %d bytes\n", __func__, skb->len); dev->net->stats.tx_dropped++; - dev_kfree_skb_irq(skb); + dev_kfree_skb_any(skb); return NETDEV_TX_OK; }
@@ -430,12 +429,11 @@ static int ipheth_tx(struct sk_buff *skb dev_err(&dev->intf->dev, "%s: usb_submit_urb: %d\n", __func__, retval); dev->net->stats.tx_errors++; - dev_kfree_skb_irq(skb); + dev_kfree_skb_any(skb); } else { - dev->tx_skb = skb; - dev->net->stats.tx_packets++; dev->net->stats.tx_bytes += skb->len; + dev_consume_skb_any(skb); netif_stop_queue(net); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra peterz@infradead.org
commit ce48c146495a1a50e48cdbfbfaba3e708be7c07c upstream
Tejun reported the following cpu-hotplug lock (percpu-rwsem) read recursion:
tg_set_cfs_bandwidth()
  get_online_cpus()
    cpus_read_lock()

  cfs_bandwidth_usage_inc()
    static_key_slow_inc()
      cpus_read_lock()
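Callers that already hold the hotplug lock can now use the new *_cpuslocked variants instead of recursing on cpus_read_lock(). A minimal sketch of the intended usage (the key here is a hypothetical example, not taken from the patch):

  #include <linux/cpu.h>
  #include <linux/jump_label.h>

  static struct static_key example_key = STATIC_KEY_INIT_FALSE;

  static void example_inc_with_hotplug_held(void)
  {
          cpus_read_lock();
          /* ... work that requires CPU hotplug to be blocked ... */
          static_key_slow_inc_cpuslocked(&example_key); /* no recursive cpus_read_lock() */
          cpus_read_unlock();
  }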
Reported-by: Tejun Heo tj@kernel.org Tested-by: Tejun Heo tj@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20180122215328.GP3397@worktop Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/jump_label.h | 7 +++++++ kernel/jump_label.c | 12 +++++++++--- kernel/sched/fair.c | 4 ++-- 3 files changed, 18 insertions(+), 5 deletions(-)
--- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -160,6 +160,8 @@ extern void arch_jump_label_transform_st extern int jump_label_text_reserved(void *start, void *end); extern void static_key_slow_inc(struct static_key *key); extern void static_key_slow_dec(struct static_key *key); +extern void static_key_slow_inc_cpuslocked(struct static_key *key); +extern void static_key_slow_dec_cpuslocked(struct static_key *key); extern void jump_label_apply_nops(struct module *mod); extern int static_key_count(struct static_key *key); extern void static_key_enable(struct static_key *key); @@ -222,6 +224,9 @@ static inline void static_key_slow_dec(s atomic_dec(&key->enabled); }
+#define static_key_slow_inc_cpuslocked(key) static_key_slow_inc(key) +#define static_key_slow_dec_cpuslocked(key) static_key_slow_dec(key) + static inline int jump_label_text_reserved(void *start, void *end) { return 0; @@ -416,6 +421,8 @@ extern bool ____wrong_branch_error(void)
#define static_branch_inc(x) static_key_slow_inc(&(x)->key) #define static_branch_dec(x) static_key_slow_dec(&(x)->key) +#define static_branch_inc_cpuslocked(x) static_key_slow_inc_cpuslocked(&(x)->key) +#define static_branch_dec_cpuslocked(x) static_key_slow_dec_cpuslocked(&(x)->key)
/* * Normal usage; boolean enable/disable. --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -79,7 +79,7 @@ int static_key_count(struct static_key * } EXPORT_SYMBOL_GPL(static_key_count);
-static void static_key_slow_inc_cpuslocked(struct static_key *key) +void static_key_slow_inc_cpuslocked(struct static_key *key) { int v, v1;
@@ -180,7 +180,7 @@ void static_key_disable(struct static_ke } EXPORT_SYMBOL_GPL(static_key_disable);
-static void static_key_slow_dec_cpuslocked(struct static_key *key, +static void __static_key_slow_dec_cpuslocked(struct static_key *key, unsigned long rate_limit, struct delayed_work *work) { @@ -211,7 +211,7 @@ static void __static_key_slow_dec(struct struct delayed_work *work) { cpus_read_lock(); - static_key_slow_dec_cpuslocked(key, rate_limit, work); + __static_key_slow_dec_cpuslocked(key, rate_limit, work); cpus_read_unlock(); }
@@ -229,6 +229,12 @@ void static_key_slow_dec(struct static_k } EXPORT_SYMBOL_GPL(static_key_slow_dec);
+void static_key_slow_dec_cpuslocked(struct static_key *key) +{ + STATIC_KEY_CHECK_USE(); + __static_key_slow_dec_cpuslocked(key, 0, NULL); +} + void static_key_slow_dec_deferred(struct static_key_deferred *key) { STATIC_KEY_CHECK_USE(); --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -4040,12 +4040,12 @@ static inline bool cfs_bandwidth_used(vo
void cfs_bandwidth_usage_inc(void) { - static_key_slow_inc(&__cfs_bandwidth_used); + static_key_slow_inc_cpuslocked(&__cfs_bandwidth_used); }
void cfs_bandwidth_usage_dec(void) { - static_key_slow_dec(&__cfs_bandwidth_used); + static_key_slow_dec_cpuslocked(&__cfs_bandwidth_used); } #else /* HAVE_JUMP_LABEL */ static bool cfs_bandwidth_used(void)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit 24809860012e0130fbafe536709e08a22b3e959e upstream
The AMD document outlining the SSBD handling, 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf, mentions that CPUID 8000_0008.EBX[26] being set means that the speculative store bypass disable is no longer needed.
A copy of this document is available at: https://bugzilla.kernel.org/show_bug.cgi?id=199889
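For illustration only, the new bit can be probed from user space with the compiler-provided cpuid.h helper (this snippet is not part of the patch):

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID leaf 8000_0008, EBX bit 26 == AMD_SSB_NO */
          if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
                  printf("SSB_NO: %s\n", (ebx >> 26) & 1 ? "yes" : "no");
          return 0;
  }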
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Janakarajan Natarajan Janakarajan.Natarajan@amd.com Cc: kvm@vger.kernel.org Cc: andrew.cooper3@citrix.com Cc: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Link: https://lkml.kernel.org/r/20180601145921.9500-2-konrad.wilk@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/common.c | 3 ++- arch/x86/kvm/cpuid.c | 2 +- 3 files changed, 4 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -285,6 +285,7 @@ #define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */ #define X86_FEATURE_AMD_STIBP (13*32+15) /* "" Single Thread Indirect Branch Predictors */ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ +#define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -958,7 +958,8 @@ static void __init cpu_set_bug_bits(stru rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
if (!x86_match_cpu(cpu_no_spec_store_bypass) && - !(ia32_cap & ARCH_CAP_SSB_NO)) + !(ia32_cap & ARCH_CAP_SSB_NO) && + !cpu_has(c, X86_FEATURE_AMD_SSB_NO)) setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
if (x86_match_cpu(cpu_no_speculation)) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,7 @@ static inline int __do_cpuid_ent(struct
/* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD); + F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD) | F(AMD_SSB_NO);
/* cpuid 0xC0000001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features =
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit 6ac2f49edb1ef5446089c7c660017732886d62d6 upstream
The AMD document outlining the SSBD handling, 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf, mentions that if CPUID 8000_0008.EBX[24] is set, we should be using the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f) for speculative store bypass disable.
This in effect means we should clear the X86_FEATURE_VIRT_SSBD flag so that we would prefer the SPEC_CTRL MSR.
See the document titled: 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf
A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199889
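Purely as an illustration of the selection order described above (the helper is hypothetical, not kernel API; the MSR numbers come from the commit text):

  #define MSR_IA32_SPEC_CTRL       0x00000048
  #define MSR_AMD64_VIRT_SPEC_CTRL 0xc001011f

  static u32 ssbd_control_msr(bool amd_ssbd, bool virt_ssbd)
  {
          if (amd_ssbd)           /* CPUID 8000_0008.EBX[24] set */
                  return MSR_IA32_SPEC_CTRL;
          if (virt_ssbd)          /* CPUID 8000_0008.EBX[25] set */
                  return MSR_AMD64_VIRT_SPEC_CTRL;
          return 0;               /* fall back to family-specific LS_CFG paths */
  }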
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Janakarajan Natarajan Janakarajan.Natarajan@amd.com Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed karahmed@amazon.de Cc: andrew.cooper3@citrix.com Cc: Joerg Roedel joro@8bytes.org Cc: Radim Krčmář rkrcmar@redhat.com Cc: Andy Lutomirski luto@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Borislav Petkov bp@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/bugs.c | 12 +++++++----- arch/x86/kernel/cpu/common.c | 6 ++++++ arch/x86/kvm/cpuid.c | 10 ++++++++-- arch/x86/kvm/svm.c | 8 +++++--- 5 files changed, 27 insertions(+), 10 deletions(-)
--- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -284,6 +284,7 @@ #define X86_FEATURE_AMD_IBPB (13*32+12) /* "" Indirect Branch Prediction Barrier */ #define X86_FEATURE_AMD_IBRS (13*32+14) /* "" Indirect Branch Restricted Speculation */ #define X86_FEATURE_AMD_STIBP (13*32+15) /* "" Single Thread Indirect Branch Predictors */ +#define X86_FEATURE_AMD_SSBD (13*32+24) /* "" Speculative Store Bypass Disable */ #define X86_FEATURE_VIRT_SSBD (13*32+25) /* Virtualized Speculative Store Bypass Disable */ #define X86_FEATURE_AMD_SSB_NO (13*32+26) /* "" Speculative Store Bypass is fixed in hardware. */
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -532,18 +532,20 @@ static enum ssb_mitigation __init __ssb_ if (mode == SPEC_STORE_BYPASS_DISABLE) { setup_force_cpu_cap(X86_FEATURE_SPEC_STORE_BYPASS_DISABLE); /* - * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD uses - * a completely different MSR and bit dependent on family. + * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD may + * use a completely different MSR and bit dependent on family. */ switch (boot_cpu_data.x86_vendor) { case X86_VENDOR_INTEL: + case X86_VENDOR_AMD: + if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) { + x86_amd_ssb_disable(); + break; + } x86_spec_ctrl_base |= SPEC_CTRL_SSBD; x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); break; - case X86_VENDOR_AMD: - x86_amd_ssb_disable(); - break; } }
--- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -760,6 +760,12 @@ static void init_speculation_control(str set_cpu_cap(c, X86_FEATURE_STIBP); set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); } + + if (cpu_has(c, X86_FEATURE_AMD_SSBD)) { + set_cpu_cap(c, X86_FEATURE_SSBD); + set_cpu_cap(c, X86_FEATURE_MSR_SPEC_CTRL); + clear_cpu_cap(c, X86_FEATURE_VIRT_SSBD); + } }
void get_cpu_cap(struct cpuinfo_x86 *c) --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -367,7 +367,8 @@ static inline int __do_cpuid_ent(struct
/* cpuid 0x80000008.ebx */ const u32 kvm_cpuid_8000_0008_ebx_x86_features = - F(AMD_IBPB) | F(AMD_IBRS) | F(VIRT_SSBD) | F(AMD_SSB_NO); + F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) | + F(AMD_SSB_NO);
/* cpuid 0xC0000001.edx */ const u32 kvm_cpuid_C000_0001_edx_x86_features = @@ -649,7 +650,12 @@ static inline int __do_cpuid_ent(struct entry->ebx |= F(VIRT_SSBD); entry->ebx &= kvm_cpuid_8000_0008_ebx_x86_features; cpuid_mask(&entry->ebx, CPUID_8000_0008_EBX); - if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD)) + /* + * The preference is to use SPEC CTRL MSR instead of the + * VIRT_SPEC MSR. + */ + if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) && + !boot_cpu_has(X86_FEATURE_AMD_SSBD)) entry->ebx |= F(VIRT_SSBD); break; } --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -3644,7 +3644,8 @@ static int svm_get_msr(struct kvm_vcpu * break; case MSR_IA32_SPEC_CTRL: if (!msr_info->host_initiated && - !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS)) + !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) && + !guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD)) return 1;
msr_info->data = svm->spec_ctrl; @@ -3749,11 +3750,12 @@ static int svm_set_msr(struct kvm_vcpu * break; case MSR_IA32_SPEC_CTRL: if (!msr->host_initiated && - !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS)) + !guest_cpuid_has(vcpu, X86_FEATURE_AMD_IBRS) && + !guest_cpuid_has(vcpu, X86_FEATURE_AMD_SSBD)) return 1;
/* The STIBP bit doesn't fault even if it's not advertised */ - if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)) + if (data & ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD)) return 1;
svm->spec_ctrl = data;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Konrad Rzeszutek Wilk konrad.wilk@oracle.com
commit 108fab4b5c8f12064ef86e02cb0459992affb30f upstream
Both AMD and Intel can have the SPEC_CTRL MSR for SSBD.
However, AMD also has two other ways of doing it that do not use the SPEC_CTRL MSR.
Signed-off-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Kees Cook keescook@chromium.org Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed karahmed@amazon.de Cc: andrew.cooper3@citrix.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: Borislav Petkov bp@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Link: https://lkml.kernel.org/r/20180601145921.9500-4-konrad.wilk@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -535,17 +535,12 @@ static enum ssb_mitigation __init __ssb_ * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD may * use a completely different MSR and bit dependent on family. */ - switch (boot_cpu_data.x86_vendor) { - case X86_VENDOR_INTEL: - case X86_VENDOR_AMD: - if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) { - x86_amd_ssb_disable(); - break; - } + if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) + x86_amd_ssb_disable(); + else { x86_spec_ctrl_base |= SPEC_CTRL_SSBD; x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); - break; } }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
commit 845d382bb15c6e7dc5026c0ff919c5b13fc7e11b upstream
If either the X86_FEATURE_AMD_SSBD or X86_FEATURE_VIRT_SSBD features are present, then there is no need to perform the check for the LS_CFG SSBD mitigation support.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Cc: Borislav Petkov bpetkov@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Link: http://lkml.kernel.org/r/20180702213553.29202.21089.stgit@tlendack-t1.amdoff... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/amd.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -554,7 +554,9 @@ static void bsp_init_amd(struct cpuinfo_ nodes_per_socket = ((value >> 3) & 7) + 1; }
- if (c->x86 >= 0x15 && c->x86 <= 0x17) { + if (!boot_cpu_has(X86_FEATURE_AMD_SSBD) && + !boot_cpu_has(X86_FEATURE_VIRT_SSBD) && + c->x86 >= 0x15 && c->x86 <= 0x17) { unsigned int bit;
switch (c->x86) {
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Lendacky thomas.lendacky@amd.com
commit 612bc3b3d4be749f73a513a17d9b3ee1330d3487 upstream
On AMD, the presence of the MSR_SPEC_CTRL feature does not imply that the SSBD mitigation support should use the SPEC_CTRL MSR. Other features could have caused the MSR_SPEC_CTRL feature to be set, while a different SSBD mitigation option is in place.
Update the SSBD support to check for the actual SSBD features that will use the SPEC_CTRL MSR.
Signed-off-by: Tom Lendacky thomas.lendacky@amd.com Cc: Borislav Petkov bpetkov@suse.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Fixes: 6ac2f49edb1e ("x86/bugs: Add AMD's SPEC_CTRL MSR usage") Link: http://lkml.kernel.org/r/20180702213602.29202.33151.stgit@tlendack-t1.amdoff... Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -166,7 +166,8 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, guestval |= guest_spec_ctrl & x86_spec_ctrl_mask;
/* SSBD controlled in MSR_SPEC_CTRL */ - if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD)) + if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) || + static_cpu_has(X86_FEATURE_AMD_SSBD)) hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
if (hostval != guestval) { @@ -535,9 +536,10 @@ static enum ssb_mitigation __init __ssb_ * Intel uses the SPEC CTRL MSR Bit(2) for this, while AMD may * use a completely different MSR and bit dependent on family. */ - if (!static_cpu_has(X86_FEATURE_MSR_SPEC_CTRL)) + if (!static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) && + !static_cpu_has(X86_FEATURE_AMD_SSBD)) { x86_amd_ssb_disable(); - else { + } else { x86_spec_ctrl_base |= SPEC_CTRL_SSBD; x86_spec_ctrl_mask |= SPEC_CTRL_SSBD; wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Kosina jkosina@suse.cz
commit 53c613fe6349994f023245519265999eed75957f upstream
STIBP is a feature provided by certain Intel ucodes / CPUs. This feature (once enabled) prevents cross-hyperthread control of decisions made by indirect branch predictors.
Enable this feature if
- the CPU is vulnerable to spectre v2
- the CPU supports SMT and has SMT siblings online
- spectre_v2 mitigation autoselection is enabled (default)
After some previous discussion, this leaves STIBP on all the time, as a wrmsr on every kernel boundary crossing is a no-no. This could perhaps be optimized later (disabling it in NOHZ, experimenting with disabling it in idle, etc.) if needed.
Note that synchronizing the mask manipulation via the newly added spec_ctrl_mutex is currently not strictly needed, as the only updater is already serialized by cpu_add_remove_lock, but let's make this a little more future-proof.
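The mechanism is a weak arch hook: generic hotplug code calls arch_smt_update() whenever the SMT state changes, and x86 overrides it to toggle STIBP in the base SPEC_CTRL value. Condensed from the diff below:

  /* kernel/cpu.c: default no-op, overridable per architecture */
  void __weak arch_smt_update(void) { }

  /* arch/x86/kernel/cpu/bugs.c, condensed */
  void arch_smt_update(void)
  {
          u64 mask;

          if (!stibp_needed())
                  return;

          mutex_lock(&spec_ctrl_mutex);
          mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP;
          if (cpu_smt_control == CPU_SMT_ENABLED)
                  mask |= SPEC_CTRL_STIBP;

          if (mask != x86_spec_ctrl_base) {
                  x86_spec_ctrl_base = mask;
                  on_each_cpu(update_stibp_msr, NULL, 1); /* rewrite MSR everywhere */
          }
          mutex_unlock(&spec_ctrl_mutex);
  }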
Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "WoodhouseDavid" dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: "SchauflerCasey" casey.schaufler@intel.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 57 ++++++++++++++++++++++++++++++++++++++++----- kernel/cpu.c | 11 +++++++- 2 files changed, 61 insertions(+), 7 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -34,12 +34,10 @@ static void __init spectre_v2_select_mit static void __init ssb_select_mitigation(void); static void __init l1tf_select_mitigation(void);
-/* - * Our boot-time value of the SPEC_CTRL MSR. We read it once so that any - * writes to SPEC_CTRL contain whatever reserved bits have been set. - */ -u64 __ro_after_init x86_spec_ctrl_base; +/* The base value of the SPEC_CTRL MSR that always has to be preserved. */ +u64 x86_spec_ctrl_base; EXPORT_SYMBOL_GPL(x86_spec_ctrl_base); +static DEFINE_MUTEX(spec_ctrl_mutex);
/* * The vendor and possibly platform specific bits which can be modified in @@ -324,6 +322,46 @@ static enum spectre_v2_mitigation_cmd __ return cmd; }
+static bool stibp_needed(void) +{ + if (spectre_v2_enabled == SPECTRE_V2_NONE) + return false; + + if (!boot_cpu_has(X86_FEATURE_STIBP)) + return false; + + return true; +} + +static void update_stibp_msr(void *info) +{ + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); +} + +void arch_smt_update(void) +{ + u64 mask; + + if (!stibp_needed()) + return; + + mutex_lock(&spec_ctrl_mutex); + mask = x86_spec_ctrl_base; + if (cpu_smt_control == CPU_SMT_ENABLED) + mask |= SPEC_CTRL_STIBP; + else + mask &= ~SPEC_CTRL_STIBP; + + if (mask != x86_spec_ctrl_base) { + pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", + cpu_smt_control == CPU_SMT_ENABLED ? + "Enabling" : "Disabling"); + x86_spec_ctrl_base = mask; + on_each_cpu(update_stibp_msr, NULL, 1); + } + mutex_unlock(&spec_ctrl_mutex); +} + static void __init spectre_v2_select_mitigation(void) { enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); @@ -423,6 +461,9 @@ specv2_set_mode: setup_force_cpu_cap(X86_FEATURE_USE_IBRS_FW); pr_info("Enabling Restricted Speculation for firmware calls\n"); } + + /* Enable STIBP if appropriate */ + arch_smt_update(); }
#undef pr_fmt @@ -813,6 +854,8 @@ static ssize_t l1tf_show_state(char *buf static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { + int ret; + if (!boot_cpu_has_bug(bug)) return sprintf(buf, "Not affected\n");
@@ -827,10 +870,12 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: __user pointer sanitization\n");
case X86_BUG_SPECTRE_V2: - return sprintf(buf, "%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + ret = sprintf(buf, "%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", + (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", spectre_v2_module_string()); + return ret;
case X86_BUG_SPEC_STORE_BYPASS: return sprintf(buf, "%s\n", ssb_strings[ssb_mode]); --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2045,6 +2045,12 @@ static void cpuhp_online_cpu_device(unsi kobject_uevent(&dev->kobj, KOBJ_ONLINE); }
+/* + * Architectures that need SMT-specific errata handling during SMT hotplug + * should override this. + */ +void __weak arch_smt_update(void) { }; + static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { int cpu, ret = 0; @@ -2071,8 +2077,10 @@ static int cpuhp_smt_disable(enum cpuhp_ */ cpuhp_offline_cpu_device(cpu); } - if (!ret) + if (!ret) { cpu_smt_control = ctrlval; + arch_smt_update(); + } cpu_maps_update_done(); return ret; } @@ -2083,6 +2091,7 @@ static int cpuhp_smt_enable(void)
cpu_maps_update_begin(); cpu_smt_control = CPU_SMT_ENABLED; + arch_smt_update(); for_each_present_cpu(cpu) { /* Skip online CPUs and CPUs on offline nodes */ if (cpu_online(cpu) || !node_online(cpu_to_node(cpu)))
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Kosina jkosina@suse.cz
commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream
Currently, IBPB is only issued when switching into a non-dumpable process, the rationale being to protect such 'important and security sensitive' processes (such as GPG) from data leaking into a different userspace process via spectre v2.
This is, however, completely insufficient to provide proper userspace-to-userspace spectrev2 protection, as any process can poison branch buffers before being scheduled out, and the newly scheduled process immediately becomes a spectrev2 victim.
In order to minimize the performance impact (for use cases that do require spectrev2 protection), issue the barrier only when switching between processes where the victim can't be ptraced by the potential attacker (as in such cases, the attacker doesn't have to bother with branch buffers at all).
[ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably fine-grained ]
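Simplified from the arch/x86/mm/tlb.c hunk below, the new per-switch decision reads:

  /*
   * __ptrace_may_access() returns 0 or -ERRNO, so a denied access
   * makes this expression true and triggers the barrier.
   */
  static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id)
  {
          return tsk && tsk->mm &&
                 tsk->mm->context.ctx_id != last_ctx_id &&
                 ptrace_may_access_sched(tsk, PTRACE_MODE_SPEC_IBPB);
  }

  /* call site, still gated on the IBPB feature flag: */
  if (static_cpu_has(X86_FEATURE_USE_IBPB) && ibpb_needed(tsk, last_ctx_id))
          indirect_branch_prediction_barrier();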
Fixes: 18bf3c3ea8 ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch") Originally-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "WoodhouseDavid" dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: "SchauflerCasey" casey.schaufler@intel.com Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/tlb.c | 31 ++++++++++++++++++++----------- include/linux/ptrace.h | 21 +++++++++++++++++++-- kernel/ptrace.c | 10 ++++++++++ 3 files changed, 49 insertions(+), 13 deletions(-)
--- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -7,6 +7,7 @@ #include <linux/export.h> #include <linux/cpu.h> #include <linux/debugfs.h> +#include <linux/ptrace.h>
#include <asm/tlbflush.h> #include <asm/mmu_context.h> @@ -180,6 +181,19 @@ static void sync_current_stack_to_mm(str } }
+static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id) +{ + /* + * Check if the current (previous) task has access to the memory + * of the @tsk (next) task. If access is denied, make sure to + * issue a IBPB to stop user->user Spectre-v2 attacks. + * + * Note: __ptrace_may_access() returns 0 or -ERRNO. + */ + return (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id && + ptrace_may_access_sched(tsk, PTRACE_MODE_SPEC_IBPB)); +} + void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk) { @@ -256,18 +270,13 @@ void switch_mm_irqs_off(struct mm_struct * one process from doing Spectre-v2 attacks on another. * * As an optimization, flush indirect branches only when - * switching into processes that disable dumping. This - * protects high value processes like gpg, without having - * too high performance overhead. IBPB is *expensive*! - * - * This will not flush branches when switching into kernel - * threads. It will also not flush if we switch to idle - * thread and back to the same process. It will flush if we - * switch to a different non-dumpable process. + * switching into a processes that can't be ptrace by the + * current one (as in such case, attacker has much more + * convenient way how to tamper with the next process than + * branch buffer poisoning). */ - if (tsk && tsk->mm && - tsk->mm->context.ctx_id != last_ctx_id && - get_dumpable(tsk->mm) != SUID_DUMP_USER) + if (static_cpu_has(X86_FEATURE_USE_IBPB) && + ibpb_needed(tsk, last_ctx_id)) indirect_branch_prediction_barrier();
if (IS_ENABLED(CONFIG_VMAP_STACK)) { --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -62,14 +62,17 @@ extern void exit_ptrace(struct task_stru #define PTRACE_MODE_READ 0x01 #define PTRACE_MODE_ATTACH 0x02 #define PTRACE_MODE_NOAUDIT 0x04 -#define PTRACE_MODE_FSCREDS 0x08 -#define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_FSCREDS 0x08 +#define PTRACE_MODE_REALCREDS 0x10 +#define PTRACE_MODE_SCHED 0x20 +#define PTRACE_MODE_IBPB 0x40
/* shorthands for READ/ATTACH and FSCREDS/REALCREDS combinations */ #define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_READ_REALCREDS (PTRACE_MODE_READ | PTRACE_MODE_REALCREDS) #define PTRACE_MODE_ATTACH_FSCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS) +#define PTRACE_MODE_SPEC_IBPB (PTRACE_MODE_ATTACH_REALCREDS | PTRACE_MODE_IBPB)
/** * ptrace_may_access - check whether the caller is permitted to access @@ -87,6 +90,20 @@ extern void exit_ptrace(struct task_stru */ extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
+/** + * ptrace_may_access - check whether the caller is permitted to access + * a target task. + * @task: target task + * @mode: selects type of access and caller credentials + * + * Returns true on success, false on denial. + * + * Similar to ptrace_may_access(). Only to be called from context switch + * code. Does not call into audit and the regular LSM hooks due to locking + * constraints. + */ +extern bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode); + static inline int ptrace_reparented(struct task_struct *child) { return !same_thread_group(child->real_parent, child->parent); --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -261,6 +261,9 @@ static int ptrace_check_attach(struct ta
static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode) { + if (mode & PTRACE_MODE_SCHED) + return false; + if (mode & PTRACE_MODE_NOAUDIT) return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE); else @@ -328,9 +331,16 @@ ok: !ptrace_has_cap(mm->user_ns, mode))) return -EPERM;
+ if (mode & PTRACE_MODE_SCHED) + return 0; return security_ptrace_access_check(task, mode); }
+bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode) +{ + return __ptrace_may_access(task, mode | PTRACE_MODE_SCHED); +} + bool ptrace_may_access(struct task_struct *task, unsigned int mode) { int err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Kosina jkosina@suse.cz
commit bb4b3b7762735cdaba5a40fd94c9303d9ffa147a upstream
If spectrev2 mitigation has been enabled, RSB is filled on context switch in order to protect from various classes of spectrev2 attacks.
If this mitigation is enabled, say so in sysfs for spectrev2.
Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: "WoodhouseDavid" dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Tim Chen tim.c.chen@linux.intel.com Cc: "SchauflerCasey" casey.schaufler@intel.com Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pm Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -870,10 +870,11 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: __user pointer sanitization\n");
case X86_BUG_SPECTRE_V2: - ret = sprintf(buf, "%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + ret = sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", + boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", spectre_v2_module_string()); return ret;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhenzhong Duan zhenzhong.duan@oracle.com
commit 0cbb76d6285794f30953bfa3ab831714b59dd700 upstream
..so that they match their asm counterparts.
Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it.
Signed-off-by: Zhenzhong Duan zhenzhong.duan@oracle.com Signed-off-by: Borislav Petkov bp@suse.de Cc: Daniel Borkmann daniel@iogearbox.net Cc: David Woodhouse dwmw@amazon.co.uk Cc: H. Peter Anvin hpa@zytor.com Cc: Ingo Molnar mingo@redhat.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: Wang YanQing udknight@gmail.com Cc: dhaval.giani@oracle.com Cc: srinivas.eeda@oracle.com Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@default Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -170,11 +170,15 @@ */ # define CALL_NOSPEC \ ANNOTATE_NOSPEC_ALTERNATIVE \ - ALTERNATIVE( \ + ALTERNATIVE_2( \ ANNOTATE_RETPOLINE_SAFE \ "call *%[thunk_target]\n", \ "call __x86_indirect_thunk_%V[thunk_target]\n", \ - X86_FEATURE_RETPOLINE) + X86_FEATURE_RETPOLINE, \ + "lfence;\n" \ + ANNOTATE_RETPOLINE_SAFE \ + "call *%[thunk_target]\n", \ + X86_FEATURE_RETPOLINE_AMD) # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
#elif defined(CONFIG_X86_32) && defined(CONFIG_RETPOLINE) @@ -184,7 +188,8 @@ * here, anyway. */ # define CALL_NOSPEC \ - ALTERNATIVE( \ + ANNOTATE_NOSPEC_ALTERNATIVE \ + ALTERNATIVE_2( \ ANNOTATE_RETPOLINE_SAFE \ "call *%[thunk_target]\n", \ " jmp 904f;\n" \ @@ -199,7 +204,11 @@ " ret;\n" \ " .align 16\n" \ "904: call 901b;\n", \ - X86_FEATURE_RETPOLINE) + X86_FEATURE_RETPOLINE, \ + "lfence;\n" \ + ANNOTATE_RETPOLINE_SAFE \ + "call *%[thunk_target]\n", \ + X86_FEATURE_RETPOLINE_AMD)
# define THUNK_TARGET(addr) [thunk_target] "rm" (addr) #else /* No retpoline for C / inline asm */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhenzhong Duan zhenzhong.duan@oracle.com
commit 4cd24de3a0980bf3100c9dcb08ef65ca7c31af48 upstream
Since retpoline-capable compilers are widely available, make CONFIG_RETPOLINE hard depend on compiler support.
Break the build when CONFIG_RETPOLINE is enabled and the compiler does not support it. Emit an error message in that case:
"arch/x86/Makefile:226: *** You are building kernel with non-retpoline compiler, please update your compiler.. Stop."
[dwmw: Fail the build with non-retpoline compiler]
Suggested-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Zhenzhong Duan zhenzhong.duan@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Borislav Petkov bp@suse.de Cc: Daniel Borkmann daniel@iogearbox.net Cc: H. Peter Anvin hpa@zytor.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: Andy Lutomirski luto@kernel.org Cc: Masahiro Yamada yamada.masahiro@socionext.com Cc: Michal Marek michal.lkml@markovi.net Cc: srinivas.eeda@oracle.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@default Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/Kconfig | 4 ---- arch/x86/Makefile | 5 +++-- arch/x86/include/asm/nospec-branch.h | 10 ++++++---- arch/x86/kernel/cpu/bugs.c | 2 +- scripts/Makefile.build | 2 -- 5 files changed, 10 insertions(+), 13 deletions(-)
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -440,10 +440,6 @@ config RETPOLINE branches. Requires a compiler with -mindirect-branch=thunk-extern support for full protection. The kernel may run slower.
- Without compiler support, at least indirect branches in assembler - code are eliminated. Since this includes the syscall entry path, - it is not entirely pointless. - config INTEL_RDT bool "Intel Resource Director Technology support" default n --- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -241,9 +241,10 @@ KBUILD_CFLAGS += -fno-asynchronous-unwin
# Avoid indirect branches in kernel to deal with Spectre ifdef CONFIG_RETPOLINE -ifneq ($(RETPOLINE_CFLAGS),) - KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) -DRETPOLINE +ifeq ($(RETPOLINE_CFLAGS),) + $(error You are building kernel with non-retpoline compiler, please update your compiler.) endif + KBUILD_CFLAGS += $(RETPOLINE_CFLAGS) endif
archscripts: scripts_basic --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -162,11 +162,12 @@ _ASM_PTR " 999b\n\t" \ ".popsection\n\t"
-#if defined(CONFIG_X86_64) && defined(RETPOLINE) +#ifdef CONFIG_RETPOLINE +#ifdef CONFIG_X86_64
/* - * Since the inline asm uses the %V modifier which is only in newer GCC, - * the 64-bit one is dependent on RETPOLINE not CONFIG_RETPOLINE. + * Inline asm uses the %V modifier which is only in newer GCC + * which is ensured when CONFIG_RETPOLINE is defined. */ # define CALL_NOSPEC \ ANNOTATE_NOSPEC_ALTERNATIVE \ @@ -181,7 +182,7 @@ X86_FEATURE_RETPOLINE_AMD) # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
-#elif defined(CONFIG_X86_32) && defined(CONFIG_RETPOLINE) +#else /* CONFIG_X86_32 */ /* * For i386 we use the original ret-equivalent retpoline, because * otherwise we'll run out of registers. We don't care about CET @@ -211,6 +212,7 @@ X86_FEATURE_RETPOLINE_AMD)
# define THUNK_TARGET(addr) [thunk_target] "rm" (addr) +#endif #else /* No retpoline for C / inline asm */ # define CALL_NOSPEC "call *%[thunk_target]\n" # define THUNK_TARGET(addr) [thunk_target] "rm" (addr) --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -251,7 +251,7 @@ static void __init spec2_print_if_secure
static inline bool retp_compiler(void) { - return __is_defined(RETPOLINE); + return __is_defined(CONFIG_RETPOLINE); }
static inline bool match_option(const char *arg, int arglen, const char *opt) --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -272,10 +272,8 @@ else objtool_args += $(call cc-ifversion, -lt, 0405, --no-unreachable) endif ifdef CONFIG_RETPOLINE -ifneq ($(RETPOLINE_CFLAGS),) objtool_args += --retpoline endif -endif
ifdef CONFIG_MODVERSIONS
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhenzhong Duan zhenzhong.duan@oracle.com
commit ef014aae8f1cd2793e4e014bbb102bed53f852b7 upstream
Now that CONFIG_RETPOLINE hard depends on compiler support, there is no reason to keep the minimal retpoline support around which only provided basic protection in the assembly files.
Suggested-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Zhenzhong Duan zhenzhong.duan@oracle.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: David Woodhouse dwmw@amazon.co.uk Cc: Borislav Petkov bp@suse.de Cc: H. Peter Anvin hpa@zytor.com Cc: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: srinivas.eeda@oracle.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@default Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 3 --- arch/x86/kernel/cpu/bugs.c | 13 ++----------- 2 files changed, 2 insertions(+), 14 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -221,11 +221,8 @@ /* The Spectre V2 mitigation variants */ enum spectre_v2_mitigation { SPECTRE_V2_NONE, - SPECTRE_V2_RETPOLINE_MINIMAL, - SPECTRE_V2_RETPOLINE_MINIMAL_AMD, SPECTRE_V2_RETPOLINE_GENERIC, SPECTRE_V2_RETPOLINE_AMD, - SPECTRE_V2_IBRS, SPECTRE_V2_IBRS_ENHANCED, };
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -134,8 +134,6 @@ enum spectre_v2_mitigation_cmd {
static const char *spectre_v2_strings[] = { [SPECTRE_V2_NONE] = "Vulnerable", - [SPECTRE_V2_RETPOLINE_MINIMAL] = "Vulnerable: Minimal generic ASM retpoline", - [SPECTRE_V2_RETPOLINE_MINIMAL_AMD] = "Vulnerable: Minimal AMD ASM retpoline", [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS", @@ -249,11 +247,6 @@ static void __init spec2_print_if_secure pr_info("%s selected on command line.\n", reason); }
-static inline bool retp_compiler(void) -{ - return __is_defined(CONFIG_RETPOLINE); -} - static inline bool match_option(const char *arg, int arglen, const char *opt) { int len = strlen(opt); @@ -414,14 +407,12 @@ retpoline_auto: pr_err("Spectre mitigation: LFENCE not serializing, switching to generic retpoline\n"); goto retpoline_generic; } - mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_AMD : - SPECTRE_V2_RETPOLINE_MINIMAL_AMD; + mode = SPECTRE_V2_RETPOLINE_AMD; setup_force_cpu_cap(X86_FEATURE_RETPOLINE_AMD); setup_force_cpu_cap(X86_FEATURE_RETPOLINE); } else { retpoline_generic: - mode = retp_compiler() ? SPECTRE_V2_RETPOLINE_GENERIC : - SPECTRE_V2_RETPOLINE_MINIMAL; + mode = SPECTRE_V2_RETPOLINE_GENERIC; setup_force_cpu_cap(X86_FEATURE_RETPOLINE); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit 8eb729b77faf83ac4c1f363a9ad68d042415f24c upstream
"Reduced Data Speculation" is an obsolete term. The correct new name is "Speculative store bypass disable" - which is abbreviated into SSBD.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185003.593893901@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/thread_info.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -81,7 +81,7 @@ struct thread_info { #define TIF_SIGPENDING 2 /* signal pending */ #define TIF_NEED_RESCHED 3 /* rescheduling necessary */ #define TIF_SINGLESTEP 4 /* reenable singlestep on user return*/ -#define TIF_SSBD 5 /* Reduced data speculation */ +#define TIF_SSBD 5 /* Speculative store bypass disable */ #define TIF_SYSCALL_EMU 6 /* syscall emulation active */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit 24848509aa55eac39d524b587b051f4e86df3c12 upstream
Remove the unnecessary 'else' statement in spectre_v2_parse_cmdline() to save an indentation level.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185003.688010903@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -275,22 +275,21 @@ static enum spectre_v2_mitigation_cmd __
if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) return SPECTRE_V2_CMD_NONE; - else { - ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg)); - if (ret < 0) - return SPECTRE_V2_CMD_AUTO;
- for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) { - if (!match_option(arg, ret, mitigation_options[i].option)) - continue; - cmd = mitigation_options[i].cmd; - break; - } + ret = cmdline_find_option(boot_command_line, "spectre_v2", arg, sizeof(arg)); + if (ret < 0) + return SPECTRE_V2_CMD_AUTO;
- if (i >= ARRAY_SIZE(mitigation_options)) { - pr_err("unknown option (%s). Switching to AUTO select\n", arg); - return SPECTRE_V2_CMD_AUTO; - } + for (i = 0; i < ARRAY_SIZE(mitigation_options); i++) { + if (!match_option(arg, ret, mitigation_options[i].option)) + continue; + cmd = mitigation_options[i].cmd; + break; + } + + if (i >= ARRAY_SIZE(mitigation_options)) { + pr_err("unknown option (%s). Switching to AUTO select\n", arg); + return SPECTRE_V2_CMD_AUTO; }
if ((cmd == SPECTRE_V2_CMD_RETPOLINE ||
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit b86bda0426853bfe8a3506c7d2a5b332760ae46b upstream
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185003.783903657@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -844,8 +844,6 @@ static ssize_t l1tf_show_state(char *buf static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { - int ret; - if (!boot_cpu_has_bug(bug)) return sprintf(buf, "Not affected\n");
@@ -860,13 +858,12 @@ static ssize_t cpu_show_common(struct de return sprintf(buf, "Mitigation: __user pointer sanitization\n");
case X86_BUG_SPECTRE_V2: - ret = sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], + return sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", spectre_v2_module_string()); - return ret;
case X86_BUG_SPEC_STORE_BYPASS: return sprintf(buf, "%s\n", ssb_strings[ssb_mode]);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit a8f76ae41cd633ac00be1b3019b1eb4741be3828 upstream
The Spectre V2 printout in cpu_show_common() handles conditionals for the various mitigation methods directly in the sprintf() argument list. That's hard to read and will become unreadable if more complex decisions need to be made for a particular method.
Move the conditionals for STIBP and IBPB string selection into helper functions, so they can be extended later on.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185003.874479208@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -841,6 +841,22 @@ static ssize_t l1tf_show_state(char *buf } #endif
+static char *stibp_state(void) +{ + if (x86_spec_ctrl_base & SPEC_CTRL_STIBP) + return ", STIBP"; + else + return ""; +} + +static char *ibpb_state(void) +{ + if (boot_cpu_has(X86_FEATURE_USE_IBPB)) + return ", IBPB"; + else + return ""; +} + static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, char *buf, unsigned int bug) { @@ -859,9 +875,9 @@ static ssize_t cpu_show_common(struct de
case X86_BUG_SPECTRE_V2: return sprintf(buf, "%s%s%s%s%s%s\n", spectre_v2_strings[spectre_v2_enabled], - boot_cpu_has(X86_FEATURE_USE_IBPB) ? ", IBPB" : "", + ibpb_state(), boot_cpu_has(X86_FEATURE_USE_IBRS_FW) ? ", IBRS_FW" : "", - (x86_spec_ctrl_base & SPEC_CTRL_STIBP) ? ", STIBP" : "", + stibp_state(), boot_cpu_has(X86_FEATURE_RSB_CTXSW) ? ", RSB filling" : "", spectre_v2_module_string());
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit 34bce7c9690b1d897686aac89604ba7adc365556 upstream
If enhanced IBRS is active, STIBP is redundant for mitigating Spectre v2 user space exploits from a hyperthread sibling.
Disable STIBP when enhanced IBRS is used.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185003.966801480@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -319,6 +319,10 @@ static bool stibp_needed(void) if (spectre_v2_enabled == SPECTRE_V2_NONE) return false;
+ /* Enhanced IBRS makes using STIBP unnecessary. */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return false; + if (!boot_cpu_has(X86_FEATURE_STIBP)) return false;
@@ -843,6 +847,9 @@ static ssize_t l1tf_show_state(char *buf
static char *stibp_state(void) { + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return ""; + if (x86_spec_ctrl_base & SPEC_CTRL_STIBP) return ", STIBP"; else
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 26c4d75b234040c11728a8acb796b3a85ba7507c upstream
During context switch, the SSBD bit in the SPEC_CTRL MSR is updated according to changes in the TIF_SSBD flag of the current and the next running task.
Currently, only the bit controlling speculative store bypass disable in SPEC_CTRL MSR is updated and the related update functions all have "speculative_store" or "ssb" in their names.
For enhanced mitigation control other bits in SPEC_CTRL MSR need to be updated as well, which makes the SSB names inadequate.
Rename the "speculative_store*" functions to a more generic name. No functional change.
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.058866968@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/spec-ctrl.h | 6 +++--- arch/x86/kernel/cpu/bugs.c | 4 ++-- arch/x86/kernel/process.c | 12 ++++++------ 3 files changed, 11 insertions(+), 11 deletions(-)
--- a/arch/x86/include/asm/spec-ctrl.h +++ b/arch/x86/include/asm/spec-ctrl.h @@ -70,11 +70,11 @@ extern void speculative_store_bypass_ht_ static inline void speculative_store_bypass_ht_init(void) { } #endif
-extern void speculative_store_bypass_update(unsigned long tif); +extern void speculation_ctrl_update(unsigned long tif);
-static inline void speculative_store_bypass_update_current(void) +static inline void speculation_ctrl_update_current(void) { - speculative_store_bypass_update(current_thread_info()->flags); + speculation_ctrl_update(current_thread_info()->flags); }
#endif --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -199,7 +199,7 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, tif = setguest ? ssbd_spec_ctrl_to_tif(guestval) : ssbd_spec_ctrl_to_tif(hostval);
- speculative_store_bypass_update(tif); + speculation_ctrl_update(tif); } } EXPORT_SYMBOL_GPL(x86_virt_spec_ctrl); @@ -629,7 +629,7 @@ static int ssb_prctl_set(struct task_str * mitigation until it is next scheduled. */ if (task == current && update) - speculative_store_bypass_update_current(); + speculation_ctrl_update_current();
return 0; } --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -398,27 +398,27 @@ static __always_inline void amd_set_ssb_ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn)); }
-static __always_inline void intel_set_ssb_state(unsigned long tifn) +static __always_inline void spec_ctrl_update_msr(unsigned long tifn) { u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn);
wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
-static __always_inline void __speculative_store_bypass_update(unsigned long tifn) +static __always_inline void __speculation_ctrl_update(unsigned long tifn) { if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) amd_set_ssb_virt_state(tifn); else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) amd_set_core_ssb_state(tifn); else - intel_set_ssb_state(tifn); + spec_ctrl_update_msr(tifn); }
-void speculative_store_bypass_update(unsigned long tif) +void speculation_ctrl_update(unsigned long tif) { preempt_disable(); - __speculative_store_bypass_update(tif); + __speculation_ctrl_update(tif); preempt_enable(); }
@@ -455,7 +455,7 @@ void __switch_to_xtra(struct task_struct set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
if ((tifp ^ tifn) & _TIF_SSBD) - __speculative_store_bypass_update(tifn); + __speculation_ctrl_update(tifn); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit 01daf56875ee0cd50ed496a09b20eb369b45dfa5 upstream
The logic to detect whether there's a change in the previous and next task's flag relevant to update speculation control MSRs is spread out across multiple functions.
Consolidate all checks needed for updating speculation control MSRs into the new __speculation_ctrl_update() helper function.
This makes it easy to pick the right speculation control MSR and the bits in MSR_IA32_SPEC_CTRL that need updating based on TIF flags changes.
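As a standalone illustration of the pattern (simplified, userspace-compilable C; the EX_* flag values are made up for the example and are not the kernel's), XOR-ing the previous and next task's flags isolates exactly the bits that changed:

#define EX_TIF_SSBD	(1UL << 0)	/* illustrative bit positions only */
#define EX_TIF_SPEC_IB	(1UL << 1)

static void example_speculation_ctrl_update(unsigned long tifp,
					    unsigned long tifn)
{
	unsigned long diff = tifp ^ tifn;	/* bits that differ */

	if (diff & EX_TIF_SSBD) {
		/* SSBD state changed: pick the virt, LS_CFG or MSR path */
	}
	if (diff & EX_TIF_SPEC_IB) {
		/* (added by a later patch) STIBP state changed */
	}
	/* flags that did not change between the tasks cost nothing */
}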
Originally-by: Thomas Lendacky Thomas.Lendacky@amd.com Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.151077005@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/process.c | 42 +++++++++++++++++++++++++++--------------- 1 file changed, 27 insertions(+), 15 deletions(-)
--- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -398,27 +398,40 @@ static __always_inline void amd_set_ssb_ wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, ssbd_tif_to_spec_ctrl(tifn)); }
-static __always_inline void spec_ctrl_update_msr(unsigned long tifn) +/* + * Update the MSRs managing speculation control, during context switch. + * + * tifp: Previous task's thread flags + * tifn: Next task's thread flags + */ +static __always_inline void __speculation_ctrl_update(unsigned long tifp, + unsigned long tifn) { - u64 msr = x86_spec_ctrl_base | ssbd_tif_to_spec_ctrl(tifn); + u64 msr = x86_spec_ctrl_base; + bool updmsr = false;
- wrmsrl(MSR_IA32_SPEC_CTRL, msr); -} + /* If TIF_SSBD is different, select the proper mitigation method */ + if ((tifp ^ tifn) & _TIF_SSBD) { + if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) { + amd_set_ssb_virt_state(tifn); + } else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) { + amd_set_core_ssb_state(tifn); + } else if (static_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) || + static_cpu_has(X86_FEATURE_AMD_SSBD)) { + msr |= ssbd_tif_to_spec_ctrl(tifn); + updmsr = true; + } + }
-static __always_inline void __speculation_ctrl_update(unsigned long tifn) -{ - if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) - amd_set_ssb_virt_state(tifn); - else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) - amd_set_core_ssb_state(tifn); - else - spec_ctrl_update_msr(tifn); + if (updmsr) + wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
void speculation_ctrl_update(unsigned long tif) { + /* Forced update. Make sure all relevant TIF flags are different */ preempt_disable(); - __speculation_ctrl_update(tif); + __speculation_ctrl_update(~tif, tif); preempt_enable(); }
@@ -454,8 +467,7 @@ void __switch_to_xtra(struct task_struct if ((tifp ^ tifn) & _TIF_NOCPUID) set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
- if ((tifp ^ tifn) & _TIF_SSBD) - __speculation_ctrl_update(tifn); + __speculation_ctrl_update(tifp, tifn); }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra (Intel) peterz@infradead.org
commit c5511d03ec090980732e929c318a7a6374b5550e upstream
Currently the 'sched_smt_present' static key is enabled when SMT topology is observed at CPU bringup, but it is never disabled. However, there is demand to also disable the key when the topology changes such that there is no SMT present anymore.
Implement this by making the key count the number of cores that have SMT enabled.
In particular, the SMT topology bits are set before interrupts are enabled and, similarly, are cleared after interrupts are disabled for the last time and the CPU dies.
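A rough model of the counting semantics in plain C (illustrative only; smt_cores stands in for the counter inside the static key):

static int smt_cores;	/* number of cores currently running with SMT */

static void on_cpu_up(unsigned int siblings_online)
{
	if (siblings_online == 2)	/* core just gained its second thread */
		smt_cores++;		/* static_branch_inc_cpuslocked() */
}

static void on_cpu_down(unsigned int siblings_online)
{
	if (siblings_online == 2)	/* core about to lose its second thread */
		smt_cores--;		/* static_branch_dec_cpuslocked() */
}

static int smt_active(void)
{
	return smt_cores > 0;	/* key stays enabled while any core has SMT */
}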
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/core.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5617,15 +5617,10 @@ int sched_cpu_activate(unsigned int cpu)
#ifdef CONFIG_SCHED_SMT /* - * The sched_smt_present static key needs to be evaluated on every - * hotplug event because at boot time SMT might be disabled when - * the number of booted CPUs is limited. - * - * If then later a sibling gets hotplugged, then the key would stay - * off and SMT scheduling would never be functional. + * When going up, increment the number of cores with SMT present. */ - if (cpumask_weight(cpu_smt_mask(cpu)) > 1) - static_branch_enable_cpuslocked(&sched_smt_present); + if (cpumask_weight(cpu_smt_mask(cpu)) == 2) + static_branch_inc_cpuslocked(&sched_smt_present); #endif set_cpu_active(cpu, true);
@@ -5669,6 +5664,14 @@ int sched_cpu_deactivate(unsigned int cp */ synchronize_rcu_mult(call_rcu, call_rcu_sched);
+#ifdef CONFIG_SCHED_SMT + /* + * When going down, decrement the number of cores with SMT present. + */ + if (cpumask_weight(cpu_smt_mask(cpu)) == 2) + static_branch_dec_cpuslocked(&sched_smt_present); +#endif + if (!sched_smp_initialized) return 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit dbe733642e01dd108f71436aaea7b328cb28fd87 upstream
CONFIG_SCHED_SMT is enabled by all distros, so there is no real point in keeping it configurable. The runtime overhead in the core scheduler code is minimal because the actual SMT scheduling parts are conditional on a static key.
This allows the scheduler's SMT state static key to be exposed to the speculation control code. Alternatively, the scheduler's static key could be made always available when CONFIG_SMP is enabled, but that would just add an unused static key to every other architecture for nothing.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.337452245@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/Kconfig | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -955,13 +955,7 @@ config NR_CPUS approximately eight kilobytes to the kernel image.
config SCHED_SMT - bool "SMT (Hyperthreading) scheduler support" - depends on SMP - ---help--- - SMT scheduler support improves the CPU scheduler's decision making - when dealing with Intel Pentium 4 chips with HyperThreading at a - cost of slightly increased overhead in some places. If unsure say - N here. + def_bool y if SMP
config SCHED_MC def_bool y
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 321a874a7ef85655e93b3206d0f36b4a6097f948 upstream
Make the scheduler's 'sched_smt_present' static key globally available, so it can be used in the x86 speculation control code.
Provide a query function and a stub for the CONFIG_SMP=n case.
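A hypothetical caller, just to show the intended shape of the API (the real x86 user arrives later in this series; stibp_worth_enabling is an illustrative name):

#include <linux/sched/smt.h>

static bool stibp_worth_enabling(void)
{
	/*
	 * Compiles to a static jump when CONFIG_SCHED_SMT=y, and to a
	 * constant false on CONFIG_SMP=n builds via the stub above.
	 */
	return sched_smt_active();
}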
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sched/smt.h | 18 ++++++++++++++++++ kernel/sched/sched.h | 4 +--- 2 files changed, 19 insertions(+), 3 deletions(-)
--- /dev/null +++ b/include/linux/sched/smt.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_SCHED_SMT_H +#define _LINUX_SCHED_SMT_H + +#include <linux/static_key.h> + +#ifdef CONFIG_SCHED_SMT +extern struct static_key_false sched_smt_present; + +static __always_inline bool sched_smt_active(void) +{ + return static_branch_likely(&sched_smt_present); +} +#else +static inline bool sched_smt_active(void) { return false; } +#endif + +#endif --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -20,6 +20,7 @@ #include <linux/sched/task_stack.h> #include <linux/sched/cputime.h> #include <linux/sched/init.h> +#include <linux/sched/smt.h>
#include <linux/u64_stats_sync.h> #include <linux/kernel_stat.h> @@ -825,9 +826,6 @@ static inline int cpu_of(struct rq *rq)
#ifdef CONFIG_SCHED_SMT - -extern struct static_key_false sched_smt_present; - extern void __update_idle_core(struct rq *rq);
static inline void update_idle_core(struct rq *rq)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream
arch_smt_update() is only called when the sysfs SMT control knob is changed. This means that when SMT is enabled in the sysfs control knob the system is considered to have SMT active even if all siblings are offline.
To allow fine-grained control of the speculation mitigations, the actual SMT state is more interesting than the fact that siblings could be enabled.
Rework the code, so arch_smt_update() is invoked from each individual CPU hotplug function, and simplify the update function while at it.
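The hook relies on the linker's weak-symbol override; a minimal sketch of the pattern, with the two definitions living in separate files as in the patch:

/* kernel/cpu.c: weak default, does nothing on most architectures */
void __weak arch_smt_update(void) { }

/* arch/x86/kernel/cpu/bugs.c: strong definition, wins at link time */
void arch_smt_update(void)
{
	/* re-evaluate STIBP from the real SMT scheduler state */
}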
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 11 +++++------ include/linux/sched/smt.h | 2 ++ kernel/cpu.c | 15 +++++++++------ 3 files changed, 16 insertions(+), 12 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -14,6 +14,7 @@ #include <linux/module.h> #include <linux/nospec.h> #include <linux/prctl.h> +#include <linux/sched/smt.h>
#include <asm/spec-ctrl.h> #include <asm/cmdline.h> @@ -342,16 +343,14 @@ void arch_smt_update(void) return;
mutex_lock(&spec_ctrl_mutex); - mask = x86_spec_ctrl_base; - if (cpu_smt_control == CPU_SMT_ENABLED) + + mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; + if (sched_smt_active()) mask |= SPEC_CTRL_STIBP; - else - mask &= ~SPEC_CTRL_STIBP;
if (mask != x86_spec_ctrl_base) { pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", - cpu_smt_control == CPU_SMT_ENABLED ? - "Enabling" : "Disabling"); + mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); x86_spec_ctrl_base = mask; on_each_cpu(update_stibp_msr, NULL, 1); } --- a/include/linux/sched/smt.h +++ b/include/linux/sched/smt.h @@ -15,4 +15,6 @@ static __always_inline bool sched_smt_ac static inline bool sched_smt_active(void) { return false; } #endif
+void arch_smt_update(void); + #endif --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -10,6 +10,7 @@ #include <linux/sched/signal.h> #include <linux/sched/hotplug.h> #include <linux/sched/task.h> +#include <linux/sched/smt.h> #include <linux/unistd.h> #include <linux/cpu.h> #include <linux/oom.h> @@ -347,6 +348,12 @@ void cpu_hotplug_enable(void) EXPORT_SYMBOL_GPL(cpu_hotplug_enable); #endif /* CONFIG_HOTPLUG_CPU */
+/* + * Architectures that need SMT-specific errata handling during SMT hotplug + * should override this. + */ +void __weak arch_smt_update(void) { } + #ifdef CONFIG_HOTPLUG_SMT enum cpuhp_smt_control cpu_smt_control __read_mostly = CPU_SMT_ENABLED; EXPORT_SYMBOL_GPL(cpu_smt_control); @@ -998,6 +1005,7 @@ out: * concurrent CPU hotplug via cpu_add_remove_lock. */ lockup_detector_cleanup(); + arch_smt_update(); return ret; }
@@ -1126,6 +1134,7 @@ static int _cpu_up(unsigned int cpu, int ret = cpuhp_up_callbacks(cpu, st, target); out: cpus_write_unlock(); + arch_smt_update(); return ret; }
@@ -2045,12 +2054,6 @@ static void cpuhp_online_cpu_device(unsi kobject_uevent(&dev->kobj, KOBJ_ONLINE); }
-/* - * Architectures that need SMT-specific errata handling during SMT hotplug - * should override this. - */ -void __weak arch_smt_update(void) { }; - static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { int cpu, ret = 0;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 130d6f946f6f2a972ee3ec8540b7243ab99abe97 upstream
Use the now exposed real SMT state, not the SMT sysfs control knob state. This reflects the state of the system when the mitigation status is queried.
This does not change the warning in the VMX launch code. There the dependency on the control knob makes sense because siblings could be brought online anytime after launching the VM.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.613357354@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -829,13 +829,14 @@ static ssize_t l1tf_show_state(char *buf
if (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_EPT_DISABLED || (l1tf_vmx_mitigation == VMENTER_L1D_FLUSH_NEVER && - cpu_smt_control == CPU_SMT_ENABLED)) + sched_smt_active())) { return sprintf(buf, "%s; VMX: %s\n", L1TF_DEFAULT_MSG, l1tf_vmx_states[l1tf_vmx_mitigation]); + }
return sprintf(buf, "%s; VMX: %s, SMT %s\n", L1TF_DEFAULT_MSG, l1tf_vmx_states[l1tf_vmx_mitigation], - cpu_smt_control == CPU_SMT_ENABLED ? "vulnerable" : "disabled"); + sched_smt_active() ? "vulnerable" : "disabled"); } #else static ssize_t l1tf_show_state(char *buf)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 15d6b7aab0793b2de8a05d8a828777dd24db424e upstream
Reorder the code so it is better grouped. No functional change.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.707122879@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 168 ++++++++++++++++++++++----------------------- 1 file changed, 84 insertions(+), 84 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -123,29 +123,6 @@ void __init check_bugs(void) #endif }
-/* The kernel command line selection */ -enum spectre_v2_mitigation_cmd { - SPECTRE_V2_CMD_NONE, - SPECTRE_V2_CMD_AUTO, - SPECTRE_V2_CMD_FORCE, - SPECTRE_V2_CMD_RETPOLINE, - SPECTRE_V2_CMD_RETPOLINE_GENERIC, - SPECTRE_V2_CMD_RETPOLINE_AMD, -}; - -static const char *spectre_v2_strings[] = { - [SPECTRE_V2_NONE] = "Vulnerable", - [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", - [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", - [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS", -}; - -#undef pr_fmt -#define pr_fmt(fmt) "Spectre V2 : " fmt - -static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init = - SPECTRE_V2_NONE; - void x86_virt_spec_ctrl(u64 guest_spec_ctrl, u64 guest_virt_spec_ctrl, bool setguest) { @@ -215,6 +192,12 @@ static void x86_amd_ssb_disable(void) wrmsrl(MSR_AMD64_LS_CFG, msrval); }
+#undef pr_fmt +#define pr_fmt(fmt) "Spectre V2 : " fmt + +static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init = + SPECTRE_V2_NONE; + #ifdef RETPOLINE static bool spectre_v2_bad_module;
@@ -236,18 +219,6 @@ static inline const char *spectre_v2_mod static inline const char *spectre_v2_module_string(void) { return ""; } #endif
-static void __init spec2_print_if_insecure(const char *reason) -{ - if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - pr_info("%s selected on command line.\n", reason); -} - -static void __init spec2_print_if_secure(const char *reason) -{ - if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - pr_info("%s selected on command line.\n", reason); -} - static inline bool match_option(const char *arg, int arglen, const char *opt) { int len = strlen(opt); @@ -255,24 +226,53 @@ static inline bool match_option(const ch return len == arglen && !strncmp(arg, opt, len); }
+/* The kernel command line selection for spectre v2 */ +enum spectre_v2_mitigation_cmd { + SPECTRE_V2_CMD_NONE, + SPECTRE_V2_CMD_AUTO, + SPECTRE_V2_CMD_FORCE, + SPECTRE_V2_CMD_RETPOLINE, + SPECTRE_V2_CMD_RETPOLINE_GENERIC, + SPECTRE_V2_CMD_RETPOLINE_AMD, +}; + +static const char *spectre_v2_strings[] = { + [SPECTRE_V2_NONE] = "Vulnerable", + [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", + [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", + [SPECTRE_V2_IBRS_ENHANCED] = "Mitigation: Enhanced IBRS", +}; + static const struct { const char *option; enum spectre_v2_mitigation_cmd cmd; bool secure; } mitigation_options[] = { - { "off", SPECTRE_V2_CMD_NONE, false }, - { "on", SPECTRE_V2_CMD_FORCE, true }, - { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, - { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false }, - { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false }, - { "auto", SPECTRE_V2_CMD_AUTO, false }, + { "off", SPECTRE_V2_CMD_NONE, false }, + { "on", SPECTRE_V2_CMD_FORCE, true }, + { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, + { "retpoline,amd", SPECTRE_V2_CMD_RETPOLINE_AMD, false }, + { "retpoline,generic", SPECTRE_V2_CMD_RETPOLINE_GENERIC, false }, + { "auto", SPECTRE_V2_CMD_AUTO, false }, };
+static void __init spec2_print_if_insecure(const char *reason) +{ + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + pr_info("%s selected on command line.\n", reason); +} + +static void __init spec2_print_if_secure(const char *reason) +{ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + pr_info("%s selected on command line.\n", reason); +} + static enum spectre_v2_mitigation_cmd __init spectre_v2_parse_cmdline(void) { + enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO; char arg[20]; int ret, i; - enum spectre_v2_mitigation_cmd cmd = SPECTRE_V2_CMD_AUTO;
if (cmdline_find_option_bool(boot_command_line, "nospectre_v2")) return SPECTRE_V2_CMD_NONE; @@ -315,48 +315,6 @@ static enum spectre_v2_mitigation_cmd __ return cmd; }
-static bool stibp_needed(void) -{ - if (spectre_v2_enabled == SPECTRE_V2_NONE) - return false; - - /* Enhanced IBRS makes using STIBP unnecessary. */ - if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) - return false; - - if (!boot_cpu_has(X86_FEATURE_STIBP)) - return false; - - return true; -} - -static void update_stibp_msr(void *info) -{ - wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); -} - -void arch_smt_update(void) -{ - u64 mask; - - if (!stibp_needed()) - return; - - mutex_lock(&spec_ctrl_mutex); - - mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; - if (sched_smt_active()) - mask |= SPEC_CTRL_STIBP; - - if (mask != x86_spec_ctrl_base) { - pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", - mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); - x86_spec_ctrl_base = mask; - on_each_cpu(update_stibp_msr, NULL, 1); - } - mutex_unlock(&spec_ctrl_mutex); -} - static void __init spectre_v2_select_mitigation(void) { enum spectre_v2_mitigation_cmd cmd = spectre_v2_parse_cmdline(); @@ -459,6 +417,48 @@ specv2_set_mode: arch_smt_update(); }
+static bool stibp_needed(void) +{ + if (spectre_v2_enabled == SPECTRE_V2_NONE) + return false; + + /* Enhanced IBRS makes using STIBP unnecessary. */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return false; + + if (!boot_cpu_has(X86_FEATURE_STIBP)) + return false; + + return true; +} + +static void update_stibp_msr(void *info) +{ + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); +} + +void arch_smt_update(void) +{ + u64 mask; + + if (!stibp_needed()) + return; + + mutex_lock(&spec_ctrl_mutex); + + mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; + if (sched_smt_active()) + mask |= SPEC_CTRL_STIBP; + + if (mask != x86_spec_ctrl_base) { + pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", + mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); + x86_spec_ctrl_base = mask; + on_each_cpu(update_stibp_msr, NULL, 1); + } + mutex_unlock(&spec_ctrl_mutex); +} + #undef pr_fmt #define pr_fmt(fmt) "Speculative Store Bypass: " fmt
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 8770709f411763884535662744a3786a1806afd3 upstream
checkpatch.pl muttered when reshuffling the code:

  WARNING: static const char * array should probably be static const char * const
Fix up all the string arrays.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.800018931@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -236,7 +236,7 @@ enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_RETPOLINE_AMD, };
-static const char *spectre_v2_strings[] = { +static const char * const spectre_v2_strings[] = { [SPECTRE_V2_NONE] = "Vulnerable", [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", [SPECTRE_V2_RETPOLINE_AMD] = "Mitigation: Full AMD retpoline", @@ -473,7 +473,7 @@ enum ssb_mitigation_cmd { SPEC_STORE_BYPASS_CMD_SECCOMP, };
-static const char *ssb_strings[] = { +static const char * const ssb_strings[] = { [SPEC_STORE_BYPASS_NONE] = "Vulnerable", [SPEC_STORE_BYPASS_DISABLE] = "Mitigation: Speculative Store Bypass disabled", [SPEC_STORE_BYPASS_PRCTL] = "Mitigation: Speculative Store Bypass disabled via prctl", @@ -813,7 +813,7 @@ early_param("l1tf", l1tf_cmdline); #define L1TF_DEFAULT_MSG "Mitigation: PTE Inversion"
#if IS_ENABLED(CONFIG_KVM_INTEL) -static const char *l1tf_vmx_states[] = { +static const char * const l1tf_vmx_states[] = { [VMENTER_L1D_FLUSH_AUTO] = "auto", [VMENTER_L1D_FLUSH_NEVER] = "vulnerable", [VMENTER_L1D_FLUSH_COND] = "conditional cache flushes",
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 30ba72a990f5096ae08f284de17986461efcc408 upstream
No point to keep that around.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.893886356@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -247,7 +247,7 @@ static const struct { const char *option; enum spectre_v2_mitigation_cmd cmd; bool secure; -} mitigation_options[] = { +} mitigation_options[] __initdata = { { "off", SPECTRE_V2_CMD_NONE, false }, { "on", SPECTRE_V2_CMD_FORCE, true }, { "retpoline", SPECTRE_V2_CMD_RETPOLINE, false }, @@ -483,7 +483,7 @@ static const char * const ssb_strings[] static const struct { const char *option; enum ssb_mitigation_cmd cmd; -} ssb_mitigation_options[] = { +} ssb_mitigation_options[] __initdata = { { "auto", SPEC_STORE_BYPASS_CMD_AUTO }, /* Platform decides */ { "on", SPEC_STORE_BYPASS_CMD_ON }, /* Disable Speculative Store Bypass */ { "off", SPEC_STORE_BYPASS_CMD_NONE }, /* Don't touch Speculative Store Bypass */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 495d470e9828500e0155027f230449ac5e29c025 upstream
There is no point in having two functions and a conditional at the call site.
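The two helpers differ only in the sense of the test, so comparing the predicate against the 'secure' flag folds them into one. A generic sketch of the trick in plain C (print_cond is an illustrative name, not kernel code):

#include <stdbool.h>
#include <stdio.h>

/* prints only when the predicate disagrees with the expectation */
static void print_cond(const char *reason, bool predicate, bool expected)
{
	if (predicate != expected)
		printf("%s selected on command line.\n", reason);
}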
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185004.986890749@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 17 ++++------------- 1 file changed, 4 insertions(+), 13 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -256,15 +256,9 @@ static const struct { { "auto", SPECTRE_V2_CMD_AUTO, false }, };
-static void __init spec2_print_if_insecure(const char *reason) +static void __init spec_v2_print_cond(const char *reason, bool secure) { - if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) - pr_info("%s selected on command line.\n", reason); -} - -static void __init spec2_print_if_secure(const char *reason) -{ - if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2) != secure) pr_info("%s selected on command line.\n", reason); }
@@ -307,11 +301,8 @@ static enum spectre_v2_mitigation_cmd __ return SPECTRE_V2_CMD_AUTO; }
- if (mitigation_options[i].secure) - spec2_print_if_secure(mitigation_options[i].option); - else - spec2_print_if_insecure(mitigation_options[i].option); - + spec_v2_print_cond(mitigation_options[i].option, + mitigation_options[i].secure); return cmd; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream
Add command line control for user space indirect branch speculation mitigations. The new option is: spectre_v2_user=
The initial options are:
- on:   Unconditionally enabled
- off:  Unconditionally disabled
- auto: Kernel selects mitigation (default off for now)
When the spectre_v2= command line argument is either 'on' or 'off' this implies that the application to application control follows that state even if a contradicting spectre_v2_user= argument is supplied.
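Putting those rules together, some example boot command lines (illustrative):

	spectre_v2=on                        # implies spectre_v2_user=on
	spectre_v2=off                       # implies spectre_v2_user=off
	spectre_v2=auto spectre_v2_user=on   # kernel picks the v2 mode,
	                                     # user space mitigation forced on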
Originally-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 32 +++++ arch/x86/include/asm/nospec-branch.h | 10 + arch/x86/kernel/cpu/bugs.c | 133 ++++++++++++++++++++---- 3 files changed, 156 insertions(+), 19 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3994,9 +3994,13 @@
spectre_v2= [X86] Control mitigation of Spectre variant 2 (indirect branch speculation) vulnerability. + The default operation protects the kernel from + user space attacks.
- on - unconditionally enable - off - unconditionally disable + on - unconditionally enable, implies + spectre_v2_user=on + off - unconditionally disable, implies + spectre_v2_user=off auto - kernel detects whether your CPU model is vulnerable
@@ -4006,6 +4010,12 @@ CONFIG_RETPOLINE configuration option, and the compiler with which the kernel was built.
+ Selecting 'on' will also enable the mitigation + against user space to user space task attacks. + + Selecting 'off' will disable both the kernel and + the user space protections. + Specific mitigations can also be selected manually:
retpoline - replace indirect branches @@ -4015,6 +4025,24 @@ Not specifying this option is equivalent to spectre_v2=auto.
+ spectre_v2_user= + [X86] Control mitigation of Spectre variant 2 + (indirect branch speculation) vulnerability between + user space tasks + + on - Unconditionally enable mitigations. Is + enforced by spectre_v2=on + + off - Unconditionally disable mitigations. Is + enforced by spectre_v2=off + + auto - Kernel selects the mitigation depending on + the available CPU features and vulnerability. + Default is off. + + Not specifying this option is equivalent to + spectre_v2_user=auto. + spec_store_bypass_disable= [HW] Control Speculative Store Bypass (SSB) Disable mitigation (Speculative Store Bypass vulnerability) --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -3,6 +3,8 @@ #ifndef _ASM_X86_NOSPEC_BRANCH_H_ #define _ASM_X86_NOSPEC_BRANCH_H_
+#include <linux/static_key.h> + #include <asm/alternative.h> #include <asm/alternative-asm.h> #include <asm/cpufeatures.h> @@ -226,6 +228,12 @@ enum spectre_v2_mitigation { SPECTRE_V2_IBRS_ENHANCED, };
+/* The indirect branch speculation control variants */ +enum spectre_v2_user_mitigation { + SPECTRE_V2_USER_NONE, + SPECTRE_V2_USER_STRICT, +}; + /* The Speculative Store Bypass disable variants */ enum ssb_mitigation { SPEC_STORE_BYPASS_NONE, @@ -303,6 +311,8 @@ do { \ preempt_enable(); \ } while (0)
+DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp); + #endif /* __ASSEMBLY__ */
/* --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -53,6 +53,9 @@ static u64 __ro_after_init x86_spec_ctrl u64 __ro_after_init x86_amd_ls_cfg_base; u64 __ro_after_init x86_amd_ls_cfg_ssbd_mask;
+/* Control conditional STIPB in switch_to() */ +DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp); + void __init check_bugs(void) { identify_boot_cpu(); @@ -198,6 +201,9 @@ static void x86_amd_ssb_disable(void) static enum spectre_v2_mitigation spectre_v2_enabled __ro_after_init = SPECTRE_V2_NONE;
+static enum spectre_v2_user_mitigation spectre_v2_user __ro_after_init = + SPECTRE_V2_USER_NONE; + #ifdef RETPOLINE static bool spectre_v2_bad_module;
@@ -236,6 +242,104 @@ enum spectre_v2_mitigation_cmd { SPECTRE_V2_CMD_RETPOLINE_AMD, };
+enum spectre_v2_user_cmd { + SPECTRE_V2_USER_CMD_NONE, + SPECTRE_V2_USER_CMD_AUTO, + SPECTRE_V2_USER_CMD_FORCE, +}; + +static const char * const spectre_v2_user_strings[] = { + [SPECTRE_V2_USER_NONE] = "User space: Vulnerable", + [SPECTRE_V2_USER_STRICT] = "User space: Mitigation: STIBP protection", +}; + +static const struct { + const char *option; + enum spectre_v2_user_cmd cmd; + bool secure; +} v2_user_options[] __initdata = { + { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, + { "off", SPECTRE_V2_USER_CMD_NONE, false }, + { "on", SPECTRE_V2_USER_CMD_FORCE, true }, +}; + +static void __init spec_v2_user_print_cond(const char *reason, bool secure) +{ + if (boot_cpu_has_bug(X86_BUG_SPECTRE_V2) != secure) + pr_info("spectre_v2_user=%s forced on command line.\n", reason); +} + +static enum spectre_v2_user_cmd __init +spectre_v2_parse_user_cmdline(enum spectre_v2_mitigation_cmd v2_cmd) +{ + char arg[20]; + int ret, i; + + switch (v2_cmd) { + case SPECTRE_V2_CMD_NONE: + return SPECTRE_V2_USER_CMD_NONE; + case SPECTRE_V2_CMD_FORCE: + return SPECTRE_V2_USER_CMD_FORCE; + default: + break; + } + + ret = cmdline_find_option(boot_command_line, "spectre_v2_user", + arg, sizeof(arg)); + if (ret < 0) + return SPECTRE_V2_USER_CMD_AUTO; + + for (i = 0; i < ARRAY_SIZE(v2_user_options); i++) { + if (match_option(arg, ret, v2_user_options[i].option)) { + spec_v2_user_print_cond(v2_user_options[i].option, + v2_user_options[i].secure); + return v2_user_options[i].cmd; + } + } + + pr_err("Unknown user space protection option (%s). Switching to AUTO select\n", arg); + return SPECTRE_V2_USER_CMD_AUTO; +} + +static void __init +spectre_v2_user_select_mitigation(enum spectre_v2_mitigation_cmd v2_cmd) +{ + enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE; + bool smt_possible = IS_ENABLED(CONFIG_SMP); + + if (!boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_STIBP)) + return; + + if (cpu_smt_control == CPU_SMT_FORCE_DISABLED || + cpu_smt_control == CPU_SMT_NOT_SUPPORTED) + smt_possible = false; + + switch (spectre_v2_parse_user_cmdline(v2_cmd)) { + case SPECTRE_V2_USER_CMD_AUTO: + case SPECTRE_V2_USER_CMD_NONE: + goto set_mode; + case SPECTRE_V2_USER_CMD_FORCE: + mode = SPECTRE_V2_USER_STRICT; + break; + } + + /* Initialize Indirect Branch Prediction Barrier */ + if (boot_cpu_has(X86_FEATURE_IBPB)) { + setup_force_cpu_cap(X86_FEATURE_USE_IBPB); + pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); + } + + /* If enhanced IBRS is enabled no STIPB required */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) + return; + +set_mode: + spectre_v2_user = mode; + /* Only print the STIBP mode when SMT possible */ + if (smt_possible) + pr_info("%s\n", spectre_v2_user_strings[mode]); +} + static const char * const spectre_v2_strings[] = { [SPECTRE_V2_NONE] = "Vulnerable", [SPECTRE_V2_RETPOLINE_GENERIC] = "Mitigation: Full generic retpoline", @@ -382,12 +486,6 @@ specv2_set_mode: setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW); pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
- /* Initialize Indirect Branch Prediction Barrier if supported */ - if (boot_cpu_has(X86_FEATURE_IBPB)) { - setup_force_cpu_cap(X86_FEATURE_USE_IBPB); - pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); - } - /* * Retpoline means the kernel is safe because it has no indirect * branches. Enhanced IBRS protects firmware too, so, enable restricted @@ -404,23 +502,21 @@ specv2_set_mode: pr_info("Enabling Restricted Speculation for firmware calls\n"); }
+ /* Set up IBPB and STIBP depending on the general spectre V2 command */ + spectre_v2_user_select_mitigation(cmd); + /* Enable STIBP if appropriate */ arch_smt_update(); }
static bool stibp_needed(void) { - if (spectre_v2_enabled == SPECTRE_V2_NONE) - return false; - /* Enhanced IBRS makes using STIBP unnecessary. */ if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) return false;
- if (!boot_cpu_has(X86_FEATURE_STIBP)) - return false; - - return true; + /* Check for strict user mitigation mode */ + return spectre_v2_user == SPECTRE_V2_USER_STRICT; }
static void update_stibp_msr(void *info) @@ -841,10 +937,13 @@ static char *stibp_state(void) if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) return "";
- if (x86_spec_ctrl_base & SPEC_CTRL_STIBP) - return ", STIBP"; - else - return ""; + switch (spectre_v2_user) { + case SPECTRE_V2_USER_NONE: + return ", STIBP: disabled"; + case SPECTRE_V2_USER_STRICT: + return ", STIBP: forced"; + } + return ""; }
static char *ibpb_state(void)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit 5bfbe3ad5840d941b89bcac54b821ba14f50a0ba upstream
To avoid the overhead of STIBP being always on, it is necessary to allow per-task control of STIBP.
Add a new task flag TIF_SPEC_IB and evaluate it during context switch if SMT is active and flag evaluation is enabled by the speculation control code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the guest/host switch works properly.
This has no effect because TIF_SPEC_IB cannot be set yet and the static key which controls evaluation is off. Preparatory patch for adding the control code.
[ tglx: Simplify the context switch logic and make the TIF evaluation depend on SMP=y and on the static key controlling the conditional update. Rename it to TIF_SPEC_IB because it controls both STIBP and IBPB ]
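The TIF-to-MSR conversion is plain shift arithmetic; a standalone, userspace-compilable restatement of the new helper, using the bit positions from this patch (TIF_SPEC_IB is bit 9, SPEC_CTRL_STIBP is bit 1):

#include <stdint.h>

#define SPEC_CTRL_STIBP_SHIFT	1
#define TIF_SPEC_IB		9
#define _TIF_SPEC_IB		(1UL << TIF_SPEC_IB)

/* shift the TIF bit down into the STIBP position of SPEC_CTRL */
static uint64_t stibp_tif_to_spec_ctrl(uint64_t tifn)
{
	return (tifn & _TIF_SPEC_IB) >> (TIF_SPEC_IB - SPEC_CTRL_STIBP_SHIFT);
}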
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/msr-index.h | 5 +++-- arch/x86/include/asm/spec-ctrl.h | 12 ++++++++++++ arch/x86/include/asm/thread_info.h | 5 ++++- arch/x86/kernel/cpu/bugs.c | 4 ++++ arch/x86/kernel/process.c | 20 ++++++++++++++++++-- 5 files changed, 41 insertions(+), 5 deletions(-)
--- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -41,9 +41,10 @@
#define MSR_IA32_SPEC_CTRL 0x00000048 /* Speculation Control */ #define SPEC_CTRL_IBRS (1 << 0) /* Indirect Branch Restricted Speculation */ -#define SPEC_CTRL_STIBP (1 << 1) /* Single Thread Indirect Branch Predictors */ +#define SPEC_CTRL_STIBP_SHIFT 1 /* Single Thread Indirect Branch Predictor (STIBP) bit */ +#define SPEC_CTRL_STIBP (1 << SPEC_CTRL_STIBP_SHIFT) /* STIBP mask */ #define SPEC_CTRL_SSBD_SHIFT 2 /* Speculative Store Bypass Disable bit */ -#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */ +#define SPEC_CTRL_SSBD (1 << SPEC_CTRL_SSBD_SHIFT) /* Speculative Store Bypass Disable */
#define MSR_IA32_PRED_CMD 0x00000049 /* Prediction Command */ #define PRED_CMD_IBPB (1 << 0) /* Indirect Branch Prediction Barrier */ --- a/arch/x86/include/asm/spec-ctrl.h +++ b/arch/x86/include/asm/spec-ctrl.h @@ -53,12 +53,24 @@ static inline u64 ssbd_tif_to_spec_ctrl( return (tifn & _TIF_SSBD) >> (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT); }
+static inline u64 stibp_tif_to_spec_ctrl(u64 tifn) +{ + BUILD_BUG_ON(TIF_SPEC_IB < SPEC_CTRL_STIBP_SHIFT); + return (tifn & _TIF_SPEC_IB) >> (TIF_SPEC_IB - SPEC_CTRL_STIBP_SHIFT); +} + static inline unsigned long ssbd_spec_ctrl_to_tif(u64 spec_ctrl) { BUILD_BUG_ON(TIF_SSBD < SPEC_CTRL_SSBD_SHIFT); return (spec_ctrl & SPEC_CTRL_SSBD) << (TIF_SSBD - SPEC_CTRL_SSBD_SHIFT); }
+static inline unsigned long stibp_spec_ctrl_to_tif(u64 spec_ctrl) +{ + BUILD_BUG_ON(TIF_SPEC_IB < SPEC_CTRL_STIBP_SHIFT); + return (spec_ctrl & SPEC_CTRL_STIBP) << (TIF_SPEC_IB - SPEC_CTRL_STIBP_SHIFT); +} + static inline u64 ssbd_tif_to_amd_ls_cfg(u64 tifn) { return (tifn & _TIF_SSBD) ? x86_amd_ls_cfg_ssbd_mask : 0ULL; --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -85,6 +85,7 @@ struct thread_info { #define TIF_SYSCALL_EMU 6 /* syscall emulation active */ #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */ +#define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */ #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */ #define TIF_UPROBE 12 /* breakpointed or singlestepping */ #define TIF_PATCH_PENDING 13 /* pending live patching update */ @@ -112,6 +113,7 @@ struct thread_info { #define _TIF_SYSCALL_EMU (1 << TIF_SYSCALL_EMU) #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) +#define _TIF_SPEC_IB (1 << TIF_SPEC_IB) #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY) #define _TIF_UPROBE (1 << TIF_UPROBE) #define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING) @@ -148,7 +150,8 @@ struct thread_info {
/* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW \ - (_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP|_TIF_SSBD) + (_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP| \ + _TIF_SSBD|_TIF_SPEC_IB)
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -147,6 +147,10 @@ x86_virt_spec_ctrl(u64 guest_spec_ctrl, static_cpu_has(X86_FEATURE_AMD_SSBD)) hostval |= ssbd_tif_to_spec_ctrl(ti->flags);
+ /* Conditional STIBP enabled? */ + if (static_branch_unlikely(&switch_to_cond_stibp)) + hostval |= stibp_tif_to_spec_ctrl(ti->flags); + if (hostval != guestval) { msrval = setguest ? guestval : hostval; wrmsrl(MSR_IA32_SPEC_CTRL, msrval); --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -407,11 +407,17 @@ static __always_inline void amd_set_ssb_ static __always_inline void __speculation_ctrl_update(unsigned long tifp, unsigned long tifn) { + unsigned long tif_diff = tifp ^ tifn; u64 msr = x86_spec_ctrl_base; bool updmsr = false;
- /* If TIF_SSBD is different, select the proper mitigation method */ - if ((tifp ^ tifn) & _TIF_SSBD) { + /* + * If TIF_SSBD is different, select the proper mitigation + * method. Note that if SSBD mitigation is disabled or permanentely + * enabled this branch can't be taken because nothing can set + * TIF_SSBD. + */ + if (tif_diff & _TIF_SSBD) { if (static_cpu_has(X86_FEATURE_VIRT_SSBD)) { amd_set_ssb_virt_state(tifn); } else if (static_cpu_has(X86_FEATURE_LS_CFG_SSBD)) { @@ -423,6 +429,16 @@ static __always_inline void __speculatio } }
+ /* + * Only evaluate TIF_SPEC_IB if conditional STIBP is enabled, + * otherwise avoid the MSR write. + */ + if (IS_ENABLED(CONFIG_SMP) && + static_branch_unlikely(&switch_to_cond_stibp)) { + updmsr |= !!(tif_diff & _TIF_SPEC_IB); + msr |= stibp_tif_to_spec_ctrl(tifn); + } + if (updmsr) wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit ff16701a29cba3aafa0bd1656d766813b2d0a811 upstream
Move the conditional invocation of __switch_to_xtra() into an inline function so the logic can be shared between 32 and 64 bit.
Stop handing the TSS pointer through and instead retrieve it directly in the bitmap handling function. Use this_cpu_ptr() instead of the per_cpu() indirection.
This is a preparatory change so integration of conditional indirect branch speculation optimization happens only in one place.
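The per-CPU access change is worth spelling out; both lines below are from the patch and resolve to the current CPU's TSS, but the second needs no explicit cpu argument handed through the call chain:

	/* before: caller had to look up and pass the pointer */
	struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);

	/* after: resolved inside the bitmap helper itself */
	struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw);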
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.280855518@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/switch_to.h | 3 --- arch/x86/kernel/process.c | 12 +++++++----- arch/x86/kernel/process.h | 24 ++++++++++++++++++++++++ arch/x86/kernel/process_32.c | 8 +------- arch/x86/kernel/process_64.c | 10 +++------- 5 files changed, 35 insertions(+), 22 deletions(-)
--- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -11,9 +11,6 @@ struct task_struct *__switch_to_asm(stru
__visible struct task_struct *__switch_to(struct task_struct *prev, struct task_struct *next); -struct tss_struct; -void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, - struct tss_struct *tss);
/* This runs runs on the previous thread's stack. */ static inline void prepare_switch_to(struct task_struct *prev, --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -41,6 +41,8 @@ #include <asm/prctl.h> #include <asm/spec-ctrl.h>
+#include "process.h" + /* * per-CPU TSS segments. Threads are completely 'soft' on Linux, * no more per-task TSS's. The TSS size is kept cacheline-aligned @@ -255,11 +257,12 @@ void arch_setup_new_exec(void) enable_cpuid(); }
-static inline void switch_to_bitmap(struct tss_struct *tss, - struct thread_struct *prev, +static inline void switch_to_bitmap(struct thread_struct *prev, struct thread_struct *next, unsigned long tifp, unsigned long tifn) { + struct tss_struct *tss = this_cpu_ptr(&cpu_tss_rw); + if (tifn & _TIF_IO_BITMAP) { /* * Copy the relevant range of the IO bitmap. @@ -451,8 +454,7 @@ void speculation_ctrl_update(unsigned lo preempt_enable(); }
-void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p, - struct tss_struct *tss) +void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p) { struct thread_struct *prev, *next; unsigned long tifp, tifn; @@ -462,7 +464,7 @@ void __switch_to_xtra(struct task_struct
tifn = READ_ONCE(task_thread_info(next_p)->flags); tifp = READ_ONCE(task_thread_info(prev_p)->flags); - switch_to_bitmap(tss, prev, next, tifp, tifn); + switch_to_bitmap(prev, next, tifp, tifn);
propagate_user_return_notify(prev_p, next_p);
--- /dev/null +++ b/arch/x86/kernel/process.h @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +// +// Code shared between 32 and 64 bit + +void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p); + +/* + * This needs to be inline to optimize for the common case where no extra + * work needs to be done. + */ +static inline void switch_to_extra(struct task_struct *prev, + struct task_struct *next) +{ + unsigned long next_tif = task_thread_info(next)->flags; + unsigned long prev_tif = task_thread_info(prev)->flags; + + /* + * __switch_to_xtra() handles debug registers, i/o bitmaps, + * speculation mitigations etc. + */ + if (unlikely(next_tif & _TIF_WORK_CTXSW_NEXT || + prev_tif & _TIF_WORK_CTXSW_PREV)) + __switch_to_xtra(prev, next); +} --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -234,7 +234,6 @@ __switch_to(struct task_struct *prev_p, struct fpu *prev_fpu = &prev->fpu; struct fpu *next_fpu = &next->fpu; int cpu = smp_processor_id(); - struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);
/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
@@ -266,12 +265,7 @@ __switch_to(struct task_struct *prev_p, if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl)) set_iopl_mask(next->iopl);
- /* - * Now maybe handle debug registers and/or IO bitmaps - */ - if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV || - task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT)) - __switch_to_xtra(prev_p, next_p, tss); + switch_to_extra(prev_p, next_p);
/* * Leave lazy mode, flushing any hypercalls made here. --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -59,6 +59,8 @@ #include <asm/unistd_32_ia32.h> #endif
+#include "process.h" + __visible DEFINE_PER_CPU(unsigned long, rsp_scratch);
/* Prints also some state that isn't saved in the pt_regs */ @@ -400,7 +402,6 @@ __switch_to(struct task_struct *prev_p, struct fpu *prev_fpu = &prev->fpu; struct fpu *next_fpu = &next->fpu; int cpu = smp_processor_id(); - struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);
WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count) != -1); @@ -467,12 +468,7 @@ __switch_to(struct task_struct *prev_p, /* Reload sp0. */ update_sp0(next_p);
- /* - * Now maybe reload the debug registers and handle I/O bitmaps - */ - if (unlikely(task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT || - task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV)) - __switch_to_xtra(prev_p, next_p, tss); + __switch_to_xtra(prev_p, next_p);
#ifdef CONFIG_XEN_PV /*
On Tue, 4 Dec 2018, Greg Kroah-Hartman wrote:
> --- a/arch/x86/kernel/process_32.c
> +++ b/arch/x86/kernel/process_32.c
> @@ -234,7 +234,6 @@ __switch_to(struct task_struct *prev_p,
>  	struct fpu *prev_fpu = &prev->fpu;
>  	struct fpu *next_fpu = &next->fpu;
>  	int cpu = smp_processor_id();
> -	struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu);
>  
>  	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
>  
> @@ -266,12 +265,7 @@ __switch_to(struct task_struct *prev_p,
>  	if (get_kernel_rpl() && unlikely(prev->iopl != next->iopl))
>  		set_iopl_mask(next->iopl);
>  
> -	/*
> -	 * Now maybe handle debug registers and/or IO bitmaps
> -	 */
> -	if (unlikely(task_thread_info(prev_p)->flags & _TIF_WORK_CTXSW_PREV ||
> -		     task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
> -		__switch_to_xtra(prev_p, next_p, tss);
> +	switch_to_extra(prev_p, next_p);
This is missing the hunk below.
Thanks,
tglx
8<------------------
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 67cecc9a2b6f..c2df91eab573 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -59,6 +59,8 @@
 #include <asm/intel_rdt_sched.h>
 #include <asm/proto.h>
 
+#include "process.h"
+
 void __show_regs(struct pt_regs *regs, int all)
 {
 	unsigned long cr0 = 0L, cr2 = 0L, cr3 = 0L, cr4 = 0L;
On Tue, Dec 04, 2018 at 12:14:13PM +0100, Thomas Gleixner wrote:
This is missing the hunk below.
Thanks, now updated.
greg k-h
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 5635d99953f04b550738f6f4c1c532667c3fd872 upstream
The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke __switch_to_xtra() when:
- CONFIG_SMP is disabled
- The conditional STIBP mode is disabled
The TIF_SPEC_IB bit still controls IBPB in both cases so the TIF work mask checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the only set bit in the work masks.
Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at run time when the static key controlling the conditional STIBP mode is disabled.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/thread_info.h | 13 +++++++++++-- arch/x86/kernel/process.h | 15 +++++++++++++++ 2 files changed, 26 insertions(+), 2 deletions(-)
--- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -149,9 +149,18 @@ struct thread_info { _TIF_FSCHECK)
/* flags to check in __switch_to() */ -#define _TIF_WORK_CTXSW \ +#define _TIF_WORK_CTXSW_BASE \ (_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP| \ - _TIF_SSBD|_TIF_SPEC_IB) + _TIF_SSBD) + +/* + * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated. + */ +#ifdef CONFIG_SMP +# define _TIF_WORK_CTXSW (_TIF_WORK_CTXSW_BASE | _TIF_SPEC_IB) +#else +# define _TIF_WORK_CTXSW (_TIF_WORK_CTXSW_BASE) +#endif
#define _TIF_WORK_CTXSW_PREV (_TIF_WORK_CTXSW|_TIF_USER_RETURN_NOTIFY) #define _TIF_WORK_CTXSW_NEXT (_TIF_WORK_CTXSW) --- a/arch/x86/kernel/process.h +++ b/arch/x86/kernel/process.h @@ -2,6 +2,8 @@ // // Code shared between 32 and 64 bit
+#include <asm/spec-ctrl.h> + void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p);
/* @@ -14,6 +16,19 @@ static inline void switch_to_extra(struc unsigned long next_tif = task_thread_info(next)->flags; unsigned long prev_tif = task_thread_info(prev)->flags;
+ if (IS_ENABLED(CONFIG_SMP)) { + /* + * Avoid __switch_to_xtra() invocation when conditional + * STIPB is disabled and the only different bit is + * TIF_SPEC_IB. For CONFIG_SMP=n TIF_SPEC_IB is not + * in the TIF_WORK_CTXSW masks. + */ + if (!static_branch_likely(&switch_to_cond_stibp)) { + prev_tif &= ~_TIF_SPEC_IB; + next_tif &= ~_TIF_SPEC_IB; + } + } + /* * __switch_to_xtra() handles debug registers, i/o bitmaps, * speculation mitigations etc.
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 4c71a2b6fd7e42814aa68a6dec88abf3b42ea573 upstream
The IBPB speculation barrier is issued from switch_mm() when the kernel switches to a user space task with a different mm than the user space task which ran last on the same CPU.
An additional optimization is to avoid IBPB when the incoming task can be ptraced by the outgoing task. This optimization only works when switching directly between two user space tasks. When switching from a kernel task to a user space task the optimization fails because the previous task cannot be accessed anymore. So in quite a few scenarios the optimization merely adds overhead.
The upcoming conditional IBPB support will issue IBPB only for user space tasks which have the TIF_SPEC_IB bit set. This requires to handle the following cases:
1) Switch from a user space task (potential attacker) which has TIF_SPEC_IB set to a user space task (potential victim) which has TIF_SPEC_IB not set.
2) Switch from a user space task (potential attacker) which has TIF_SPEC_IB not set to a user space task (potential victim) which has TIF_SPEC_IB set.
This needs to be optimized for the case where the IBPB can be avoided when only kernel threads ran in between user space tasks which belong to the same process.
The current check whether two tasks belong to the same context uses the task's context id. While correct, it's simpler to use the mm pointer because it allows mangling the TIF_SPEC_IB bit into it. The context id based mechanism requires extra storage, which creates worse code.
When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into the per CPU storage which is used to track the last user space mm which was running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of the incoming task to make the decision whether IBPB needs to be issued or not to cover the two cases above.
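To make this concrete, here is a minimal stand-alone sketch of the mangle-and-compare logic described above. It is plain C with illustrative names and fake addresses, not the kernel code; it only shows the decision rule in isolation:

#include <stdio.h>

/* mm pointers are at least word aligned, so bit 0 is free for the flag */
#define LAST_USER_MM_IBPB 0x1UL

static unsigned long mangle(unsigned long mm, int tif_spec_ib)
{
	return mm | (tif_spec_ib ? LAST_USER_MM_IBPB : 0);
}

static int ibpb_needed(unsigned long prev_mm, unsigned long next_mm)
{
	/* Barrier only if the mm differs and at least one side has the bit set */
	return next_mm != prev_mm && ((next_mm | prev_mm) & LAST_USER_MM_IBPB);
}

int main(void)
{
	unsigned long a = mangle(0x1000, 1);	/* task with TIF_SPEC_IB set */
	unsigned long b = mangle(0x2000, 0);	/* task with the bit clear   */

	printf("IBPB on a->b switch: %d\n", ibpb_needed(a, b));	/* prints 1 */
	printf("IBPB on a->a switch: %d\n", ibpb_needed(a, a));	/* prints 0 */
	return 0;
}

This also illustrates why kernel threads (mm pointer unchanged) and threads of the same process with the same TIF state never trigger the barrier.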
As conditional IBPB is going to be the default, remove the dubious ptrace check for the IBPB always case and simply issue IBPB always when the process changes.
Move the storage to a different place in the struct as the original one created a hole.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/nospec-branch.h | 2 arch/x86/include/asm/tlbflush.h | 8 +- arch/x86/kernel/cpu/bugs.c | 29 +++++++- arch/x86/mm/tlb.c | 114 ++++++++++++++++++++++++++--------- 4 files changed, 118 insertions(+), 35 deletions(-)
--- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -312,6 +312,8 @@ do { \ } while (0)
DECLARE_STATIC_KEY_FALSE(switch_to_cond_stibp); +DECLARE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); +DECLARE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
#endif /* __ASSEMBLY__ */
--- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -185,10 +185,14 @@ struct tlb_state {
#define LOADED_MM_SWITCHING ((struct mm_struct *)1)
+ /* Last user mm for optimizing IBPB */ + union { + struct mm_struct *last_user_mm; + unsigned long last_user_mm_ibpb; + }; + u16 loaded_mm_asid; u16 next_asid; - /* last user mm's ctx id */ - u64 last_ctx_id;
/* * We can be in one of several states: --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -55,6 +55,10 @@ u64 __ro_after_init x86_amd_ls_cfg_ssbd_
/* Control conditional STIPB in switch_to() */ DEFINE_STATIC_KEY_FALSE(switch_to_cond_stibp); +/* Control conditional IBPB in switch_mm() */ +DEFINE_STATIC_KEY_FALSE(switch_mm_cond_ibpb); +/* Control unconditional IBPB in switch_mm() */ +DEFINE_STATIC_KEY_FALSE(switch_mm_always_ibpb);
void __init check_bugs(void) { @@ -330,7 +334,17 @@ spectre_v2_user_select_mitigation(enum s /* Initialize Indirect Branch Prediction Barrier */ if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB); - pr_info("Spectre v2 mitigation: Enabling Indirect Branch Prediction Barrier\n"); + + switch (mode) { + case SPECTRE_V2_USER_STRICT: + static_branch_enable(&switch_mm_always_ibpb); + break; + default: + break; + } + + pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", + mode == SPECTRE_V2_USER_STRICT ? "always-on" : "conditional"); }
/* If enhanced IBRS is enabled no STIPB required */ @@ -952,10 +966,15 @@ static char *stibp_state(void)
static char *ibpb_state(void) { - if (boot_cpu_has(X86_FEATURE_USE_IBPB)) - return ", IBPB"; - else - return ""; + if (boot_cpu_has(X86_FEATURE_IBPB)) { + switch (spectre_v2_user) { + case SPECTRE_V2_USER_NONE: + return ", IBPB: disabled"; + case SPECTRE_V2_USER_STRICT: + return ", IBPB: always-on"; + } + } + return ""; }
static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr, --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -7,7 +7,6 @@ #include <linux/export.h> #include <linux/cpu.h> #include <linux/debugfs.h> -#include <linux/ptrace.h>
#include <asm/tlbflush.h> #include <asm/mmu_context.h> @@ -31,6 +30,12 @@ */
/* + * Use bit 0 to mangle the TIF_SPEC_IB state into the mm pointer which is + * stored in cpu_tlb_state.last_user_mm_ibpb. + */ +#define LAST_USER_MM_IBPB 0x1UL + +/* * We get here when we do something requiring a TLB invalidation * but could not go invalidate all of the contexts. We do the * necessary invalidation by clearing out the 'ctx_id' which @@ -181,17 +186,87 @@ static void sync_current_stack_to_mm(str } }
-static bool ibpb_needed(struct task_struct *tsk, u64 last_ctx_id) +static inline unsigned long mm_mangle_tif_spec_ib(struct task_struct *next) +{ + unsigned long next_tif = task_thread_info(next)->flags; + unsigned long ibpb = (next_tif >> TIF_SPEC_IB) & LAST_USER_MM_IBPB; + + return (unsigned long)next->mm | ibpb; +} + +static void cond_ibpb(struct task_struct *next) { + if (!next || !next->mm) + return; + /* - * Check if the current (previous) task has access to the memory - * of the @tsk (next) task. If access is denied, make sure to - * issue a IBPB to stop user->user Spectre-v2 attacks. - * - * Note: __ptrace_may_access() returns 0 or -ERRNO. + * Both, the conditional and the always IBPB mode use the mm + * pointer to avoid the IBPB when switching between tasks of the + * same process. Using the mm pointer instead of mm->context.ctx_id + * opens a hypothetical hole vs. mm_struct reuse, which is more or + * less impossible to control by an attacker. Aside of that it + * would only affect the first schedule so the theoretically + * exposed data is not really interesting. */ - return (tsk && tsk->mm && tsk->mm->context.ctx_id != last_ctx_id && - ptrace_may_access_sched(tsk, PTRACE_MODE_SPEC_IBPB)); + if (static_branch_likely(&switch_mm_cond_ibpb)) { + unsigned long prev_mm, next_mm; + + /* + * This is a bit more complex than the always mode because + * it has to handle two cases: + * + * 1) Switch from a user space task (potential attacker) + * which has TIF_SPEC_IB set to a user space task + * (potential victim) which has TIF_SPEC_IB not set. + * + * 2) Switch from a user space task (potential attacker) + * which has TIF_SPEC_IB not set to a user space task + * (potential victim) which has TIF_SPEC_IB set. + * + * This could be done by unconditionally issuing IBPB when + * a task which has TIF_SPEC_IB set is either scheduled in + * or out. Though that results in two flushes when: + * + * - the same user space task is scheduled out and later + * scheduled in again and only a kernel thread ran in + * between. + * + * - a user space task belonging to the same process is + * scheduled in after a kernel thread ran in between + * + * - a user space task belonging to the same process is + * scheduled in immediately. + * + * Optimize this with reasonably small overhead for the + * above cases. Mangle the TIF_SPEC_IB bit into the mm + * pointer of the incoming task which is stored in + * cpu_tlbstate.last_user_mm_ibpb for comparison. + */ + next_mm = mm_mangle_tif_spec_ib(next); + prev_mm = this_cpu_read(cpu_tlbstate.last_user_mm_ibpb); + + /* + * Issue IBPB only if the mm's are different and one or + * both have the IBPB bit set. + */ + if (next_mm != prev_mm && + (next_mm | prev_mm) & LAST_USER_MM_IBPB) + indirect_branch_prediction_barrier(); + + this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, next_mm); + } + + if (static_branch_unlikely(&switch_mm_always_ibpb)) { + /* + * Only flush when switching to a user space task with a + * different context than the user space task which ran + * last on this CPU. + */ + if (this_cpu_read(cpu_tlbstate.last_user_mm) != next->mm) { + indirect_branch_prediction_barrier(); + this_cpu_write(cpu_tlbstate.last_user_mm, next->mm); + } + } }
void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, @@ -262,22 +337,13 @@ void switch_mm_irqs_off(struct mm_struct } else { u16 new_asid; bool need_flush; - u64 last_ctx_id = this_cpu_read(cpu_tlbstate.last_ctx_id);
/* * Avoid user/user BTB poisoning by flushing the branch * predictor when switching between processes. This stops * one process from doing Spectre-v2 attacks on another. - * - * As an optimization, flush indirect branches only when - * switching into a processes that can't be ptrace by the - * current one (as in such case, attacker has much more - * convenient way how to tamper with the next process than - * branch buffer poisoning). */ - if (static_cpu_has(X86_FEATURE_USE_IBPB) && - ibpb_needed(tsk, last_ctx_id)) - indirect_branch_prediction_barrier(); + cond_ibpb(tsk);
if (IS_ENABLED(CONFIG_VMAP_STACK)) { /* @@ -327,14 +393,6 @@ void switch_mm_irqs_off(struct mm_struct trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0); }
- /* - * Record last user mm's context id, so we can avoid - * flushing branch buffer with IBPB if we switch back - * to the same user. - */ - if (next != &init_mm) - this_cpu_write(cpu_tlbstate.last_ctx_id, next->context.ctx_id); - /* Make sure we write CR3 before loaded_mm. */ barrier();
@@ -415,7 +473,7 @@ void initialize_tlbstate_and_flush(void) write_cr3(build_cr3(mm->pgd, 0));
/* Reinitialize tlbstate. */ - this_cpu_write(cpu_tlbstate.last_ctx_id, mm->context.ctx_id); + this_cpu_write(cpu_tlbstate.last_user_mm_ibpb, LAST_USER_MM_IBPB); this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0); this_cpu_write(cpu_tlbstate.next_asid, 1); this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 46f7ecb1e7359f183f5bbd1e08b90e10e52164f9 upstream
The IBPB control code in x86 no longer uses ptrace_may_access_sched(). Remove the functionality which was introduced for it.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/ptrace.h | 17 ----------------- kernel/ptrace.c | 10 ---------- 2 files changed, 27 deletions(-)
--- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -64,15 +64,12 @@ extern void exit_ptrace(struct task_stru #define PTRACE_MODE_NOAUDIT 0x04 #define PTRACE_MODE_FSCREDS 0x08 #define PTRACE_MODE_REALCREDS 0x10 -#define PTRACE_MODE_SCHED 0x20 -#define PTRACE_MODE_IBPB 0x40
/* shorthands for READ/ATTACH and FSCREDS/REALCREDS combinations */ #define PTRACE_MODE_READ_FSCREDS (PTRACE_MODE_READ | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_READ_REALCREDS (PTRACE_MODE_READ | PTRACE_MODE_REALCREDS) #define PTRACE_MODE_ATTACH_FSCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_FSCREDS) #define PTRACE_MODE_ATTACH_REALCREDS (PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS) -#define PTRACE_MODE_SPEC_IBPB (PTRACE_MODE_ATTACH_REALCREDS | PTRACE_MODE_IBPB)
/** * ptrace_may_access - check whether the caller is permitted to access @@ -90,20 +87,6 @@ extern void exit_ptrace(struct task_stru */ extern bool ptrace_may_access(struct task_struct *task, unsigned int mode);
-/** - * ptrace_may_access - check whether the caller is permitted to access - * a target task. - * @task: target task - * @mode: selects type of access and caller credentials - * - * Returns true on success, false on denial. - * - * Similar to ptrace_may_access(). Only to be called from context switch - * code. Does not call into audit and the regular LSM hooks due to locking - * constraints. - */ -extern bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode); - static inline int ptrace_reparented(struct task_struct *child) { return !same_thread_group(child->real_parent, child->parent); --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -261,9 +261,6 @@ static int ptrace_check_attach(struct ta
static int ptrace_has_cap(struct user_namespace *ns, unsigned int mode) { - if (mode & PTRACE_MODE_SCHED) - return false; - if (mode & PTRACE_MODE_NOAUDIT) return has_ns_capability_noaudit(current, ns, CAP_SYS_PTRACE); else @@ -331,16 +328,9 @@ ok: !ptrace_has_cap(mm->user_ns, mode))) return -EPERM;
- if (mode & PTRACE_MODE_SCHED) - return 0; return security_ptrace_access_check(task, mode); }
-bool ptrace_may_access_sched(struct task_struct *task, unsigned int mode) -{ - return __ptrace_may_access(task, mode | PTRACE_MODE_SCHED); -} - bool ptrace_may_access(struct task_struct *task, unsigned int mode) { int err;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit e6da8bb6f9abb2628381904b24163c770e630bac upstream
The update of the TIF_SSBD flag and the conditional speculation control MSR update is done in the ssb_prctl_set() function directly. The upcoming prctl support for controlling indirect branch speculation via STIBP needs the same mechanism.
Split the code out and make it reusable. Reword the comment about updates for other tasks.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.652305076@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -699,10 +699,29 @@ static void ssb_select_mitigation(void) #undef pr_fmt #define pr_fmt(fmt) "Speculation prctl: " fmt
-static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) +static void task_update_spec_tif(struct task_struct *tsk, int tifbit, bool on) { bool update;
+ if (on) + update = !test_and_set_tsk_thread_flag(tsk, tifbit); + else + update = test_and_clear_tsk_thread_flag(tsk, tifbit); + + /* + * Immediately update the speculation control MSRs for the current + * task, but for a non-current task delay setting the CPU + * mitigation until it is scheduled next. + * + * This can only happen for SECCOMP mitigation. For PRCTL it's + * always the current task. + */ + if (tsk == current && update) + speculation_ctrl_update_current(); +} + +static int ssb_prctl_set(struct task_struct *task, unsigned long ctrl) +{ if (ssb_mode != SPEC_STORE_BYPASS_PRCTL && ssb_mode != SPEC_STORE_BYPASS_SECCOMP) return -ENXIO; @@ -713,28 +732,20 @@ static int ssb_prctl_set(struct task_str if (task_spec_ssb_force_disable(task)) return -EPERM; task_clear_spec_ssb_disable(task); - update = test_and_clear_tsk_thread_flag(task, TIF_SSBD); + task_update_spec_tif(task, TIF_SSBD, false); break; case PR_SPEC_DISABLE: task_set_spec_ssb_disable(task); - update = !test_and_set_tsk_thread_flag(task, TIF_SSBD); + task_update_spec_tif(task, TIF_SSBD, true); break; case PR_SPEC_FORCE_DISABLE: task_set_spec_ssb_disable(task); task_set_spec_ssb_force_disable(task); - update = !test_and_set_tsk_thread_flag(task, TIF_SSBD); + task_update_spec_tif(task, TIF_SSBD, true); break; default: return -ERANGE; } - - /* - * If being set on non-current task, delay setting the CPU - * mitigation until it is next scheduled. - */ - if (task == current && update) - speculation_ctrl_update_current(); - return 0; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 6d991ba509ebcfcc908e009d1db51972a4f7a064 upstream
The seccomp speculation control operates on all tasks of a process, but only the current task of a process can update the MSR immediately. For the other threads the update is deferred to the next context switch.
This creates the following situation with Process A and B:
Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2 does not have the speculation control TIF bit set. Process B task 1 has the speculation control TIF bit set.
    CPU0                        CPU1

                                MSR bit is set
                                ProcB.T1 schedules out
                                ProcA.T2 schedules in
                                MSR bit is cleared

    ProcA.T1
      seccomp_update()
      set TIF bit on ProcA.T2

                                ProcB.T1 schedules in
                                MSR is not updated  <-- FAIL
This happens because the context switch code tries to avoid the MSR update if the speculation control TIF bits of the incoming and the outgoing task are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks scheduling back and forth on CPU1, which keeps the MSR stale forever.
In theory this could be remedied by IPIs, but chasing the remote task which could be migrated is complex and full of races.
The straightforward solution is to avoid the asynchronous update of the TIF bit and defer it to the next context switch. The speculation control state is already stored in task_struct::atomic_flags by the prctl and seccomp updates.
Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the atomic_flags. Check the bit on context switch and force a synchronous update of the speculation control if set. Use the same mechanism for updating the current task.
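As a toy model, the deferred update can be sketched like this; the flag values and the struct are made up for illustration and only capture the force-update idea, not the kernel implementation:

#include <stdio.h>

#define TIF_SSBD		(1u << 0)
#define TIF_SPEC_FORCE_UPDATE	(1u << 1)

struct task { unsigned int tif; int ssb_disable; };

/* seccomp/prctl side: never touch TIF_SSBD of a remote task directly,
 * just request an update */
static void task_update_spec_tif(struct task *t)
{
	t->tif |= TIF_SPEC_FORCE_UPDATE;
}

/* context switch side: fold the atomic state into the real TIF bits */
static unsigned int speculation_ctrl_update_tif(struct task *t)
{
	if (t->tif & TIF_SPEC_FORCE_UPDATE) {
		t->tif &= ~TIF_SPEC_FORCE_UPDATE;
		if (t->ssb_disable)
			t->tif |= TIF_SSBD;
		else
			t->tif &= ~TIF_SSBD;
	}
	return t->tif;
}

int main(void)
{
	struct task t = { 0, 0 };

	t.ssb_disable = 1;		/* remote seccomp update ...       */
	task_update_spec_tif(&t);	/* ... only sets the force bit     */
	printf("tif before switch: %#x\n", t.tif);
	printf("tif after switch:  %#x\n", speculation_ctrl_update_tif(&t));
	return 0;
}

Because TIF_SPEC_FORCE_UPDATE is part of the _TIF_WORK_CTXSW masks, the switch path is guaranteed to see it and run the slow path, so the MSR can never stay stale.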
Reported-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linut... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/spec-ctrl.h | 6 +----- arch/x86/include/asm/thread_info.h | 4 +++- arch/x86/kernel/cpu/bugs.c | 18 +++++++----------- arch/x86/kernel/process.c | 30 +++++++++++++++++++++++++++++- 4 files changed, 40 insertions(+), 18 deletions(-)
--- a/arch/x86/include/asm/spec-ctrl.h +++ b/arch/x86/include/asm/spec-ctrl.h @@ -83,10 +83,6 @@ static inline void speculative_store_byp #endif
extern void speculation_ctrl_update(unsigned long tif); - -static inline void speculation_ctrl_update_current(void) -{ - speculation_ctrl_update(current_thread_info()->flags); -} +extern void speculation_ctrl_update_current(void);
#endif --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -86,6 +86,7 @@ struct thread_info { #define TIF_SYSCALL_AUDIT 7 /* syscall auditing active */ #define TIF_SECCOMP 8 /* secure computing */ #define TIF_SPEC_IB 9 /* Indirect branch speculation mitigation */ +#define TIF_SPEC_FORCE_UPDATE 10 /* Force speculation MSR update in context switch */ #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */ #define TIF_UPROBE 12 /* breakpointed or singlestepping */ #define TIF_PATCH_PENDING 13 /* pending live patching update */ @@ -114,6 +115,7 @@ struct thread_info { #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT) #define _TIF_SECCOMP (1 << TIF_SECCOMP) #define _TIF_SPEC_IB (1 << TIF_SPEC_IB) +#define _TIF_SPEC_FORCE_UPDATE (1 << TIF_SPEC_FORCE_UPDATE) #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY) #define _TIF_UPROBE (1 << TIF_UPROBE) #define _TIF_PATCH_PENDING (1 << TIF_PATCH_PENDING) @@ -151,7 +153,7 @@ struct thread_info { /* flags to check in __switch_to() */ #define _TIF_WORK_CTXSW_BASE \ (_TIF_IO_BITMAP|_TIF_NOCPUID|_TIF_NOTSC|_TIF_BLOCKSTEP| \ - _TIF_SSBD) + _TIF_SSBD | _TIF_SPEC_FORCE_UPDATE)
/* * Avoid calls to __switch_to_xtra() on UP as STIBP is not evaluated. --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -699,14 +699,10 @@ static void ssb_select_mitigation(void) #undef pr_fmt #define pr_fmt(fmt) "Speculation prctl: " fmt
-static void task_update_spec_tif(struct task_struct *tsk, int tifbit, bool on) +static void task_update_spec_tif(struct task_struct *tsk) { - bool update; - - if (on) - update = !test_and_set_tsk_thread_flag(tsk, tifbit); - else - update = test_and_clear_tsk_thread_flag(tsk, tifbit); + /* Force the update of the real TIF bits */ + set_tsk_thread_flag(tsk, TIF_SPEC_FORCE_UPDATE);
/* * Immediately update the speculation control MSRs for the current @@ -716,7 +712,7 @@ static void task_update_spec_tif(struct * This can only happen for SECCOMP mitigation. For PRCTL it's * always the current task. */ - if (tsk == current && update) + if (tsk == current) speculation_ctrl_update_current(); }
@@ -732,16 +728,16 @@ static int ssb_prctl_set(struct task_str if (task_spec_ssb_force_disable(task)) return -EPERM; task_clear_spec_ssb_disable(task); - task_update_spec_tif(task, TIF_SSBD, false); + task_update_spec_tif(task); break; case PR_SPEC_DISABLE: task_set_spec_ssb_disable(task); - task_update_spec_tif(task, TIF_SSBD, true); + task_update_spec_tif(task); break; case PR_SPEC_FORCE_DISABLE: task_set_spec_ssb_disable(task); task_set_spec_ssb_force_disable(task); - task_update_spec_tif(task, TIF_SSBD, true); + task_update_spec_tif(task); break; default: return -ERANGE; --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -446,6 +446,18 @@ static __always_inline void __speculatio wrmsrl(MSR_IA32_SPEC_CTRL, msr); }
+static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk) +{ + if (test_and_clear_tsk_thread_flag(tsk, TIF_SPEC_FORCE_UPDATE)) { + if (task_spec_ssb_disable(tsk)) + set_tsk_thread_flag(tsk, TIF_SSBD); + else + clear_tsk_thread_flag(tsk, TIF_SSBD); + } + /* Return the updated threadinfo flags*/ + return task_thread_info(tsk)->flags; +} + void speculation_ctrl_update(unsigned long tif) { /* Forced update. Make sure all relevant TIF flags are different */ @@ -454,6 +466,14 @@ void speculation_ctrl_update(unsigned lo preempt_enable(); }
+/* Called from seccomp/prctl update */ +void speculation_ctrl_update_current(void) +{ + preempt_disable(); + speculation_ctrl_update(speculation_ctrl_update_tif(current)); + preempt_enable(); +} + void __switch_to_xtra(struct task_struct *prev_p, struct task_struct *next_p) { struct thread_struct *prev, *next; @@ -485,7 +505,15 @@ void __switch_to_xtra(struct task_struct if ((tifp ^ tifn) & _TIF_NOCPUID) set_cpuid_faulting(!!(tifn & _TIF_NOCPUID));
- __speculation_ctrl_update(tifp, tifn); + if (likely(!((tifp | tifn) & _TIF_SPEC_FORCE_UPDATE))) { + __speculation_ctrl_update(tifp, tifn); + } else { + speculation_ctrl_update_tif(prev_p); + tifn = speculation_ctrl_update_tif(next_p); + + /* Enforce MSR update to ensure consistent state */ + __speculation_ctrl_update(~tifn, tifn); + } }
/*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 6893a959d7fdebbab5f5aa112c277d5a44435ba1 upstream
The upcoming fine-grained per-task STIBP control needs to be updated on CPU hotplug as well.
Split out the code which controls the strict mode so the prctl control code can be added later. Mark the SMP function call argument __unused while at it.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.759457117@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 46 ++++++++++++++++++++++++--------------------- 1 file changed, 25 insertions(+), 21 deletions(-)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -527,40 +527,44 @@ specv2_set_mode: arch_smt_update(); }
-static bool stibp_needed(void) +static void update_stibp_msr(void * __unused) { - /* Enhanced IBRS makes using STIBP unnecessary. */ - if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) - return false; - - /* Check for strict user mitigation mode */ - return spectre_v2_user == SPECTRE_V2_USER_STRICT; + wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); }
-static void update_stibp_msr(void *info) +/* Update x86_spec_ctrl_base in case SMT state changed. */ +static void update_stibp_strict(void) { - wrmsrl(MSR_IA32_SPEC_CTRL, x86_spec_ctrl_base); + u64 mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; + + if (sched_smt_active()) + mask |= SPEC_CTRL_STIBP; + + if (mask == x86_spec_ctrl_base) + return; + + pr_info("Update user space SMT mitigation: STIBP %s\n", + mask & SPEC_CTRL_STIBP ? "always-on" : "off"); + x86_spec_ctrl_base = mask; + on_each_cpu(update_stibp_msr, NULL, 1); }
void arch_smt_update(void) { - u64 mask; - - if (!stibp_needed()) + /* Enhanced IBRS implies STIBP. No update required. */ + if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) return;
mutex_lock(&spec_ctrl_mutex);
- mask = x86_spec_ctrl_base & ~SPEC_CTRL_STIBP; - if (sched_smt_active()) - mask |= SPEC_CTRL_STIBP; - - if (mask != x86_spec_ctrl_base) { - pr_info("Spectre v2 cross-process SMT mitigation: %s STIBP\n", - mask & SPEC_CTRL_STIBP ? "Enabling" : "Disabling"); - x86_spec_ctrl_base = mask; - on_each_cpu(update_stibp_msr, NULL, 1); + switch (spectre_v2_user) { + case SPECTRE_V2_USER_NONE: + break; + case SPECTRE_V2_USER_STRICT: + update_stibp_strict(); + break; } + mutex_unlock(&spec_ctrl_mutex); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 9137bb27e60e554dab694eafa4cca241fa3a694f upstream
Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL prctls to allow fine-grained per-task control of indirect branch speculation via STIBP and IBPB.
Invocations: Check indirect branch speculation status with - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
Enable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
Disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
Force disable indirect branch speculation with - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
See Documentation/userspace-api/spec_ctrl.rst.
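As a usage sketch, a small user space program exercising the new interface; the numeric fallback defines match the uapi additions in this series and are only needed when building against older headers:

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_SET_SPECULATION_CTRL
# define PR_GET_SPECULATION_CTRL 52
# define PR_SET_SPECULATION_CTRL 53
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
# define PR_SPEC_INDIRECT_BRANCH 1
#endif
#ifndef PR_SPEC_DISABLE
# define PR_SPEC_DISABLE (1UL << 2)
#endif

int main(void)
{
	/* Restrict indirect branch speculation for this task */
	if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
		  PR_SPEC_DISABLE, 0, 0))
		perror("PR_SET_SPECULATION_CTRL");

	/* Read back the state; the result is a PR_SPEC_* bitmask */
	int state = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
			  0, 0, 0);
	printf("indirect branch speculation state: %#x\n", (unsigned)state);
	return 0;
}

Note that the set operation fails with -EPERM in strict mode (always disabled) and with -ENXIO when no mitigation mode is active, per the ib_prctl_set() logic below.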
Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/userspace-api/spec_ctrl.rst | 9 ++++ arch/x86/include/asm/nospec-branch.h | 1 arch/x86/kernel/cpu/bugs.c | 67 ++++++++++++++++++++++++++++++ arch/x86/kernel/process.c | 5 ++ include/linux/sched.h | 9 ++++ include/uapi/linux/prctl.h | 1 6 files changed, 92 insertions(+)
--- a/Documentation/userspace-api/spec_ctrl.rst +++ b/Documentation/userspace-api/spec_ctrl.rst @@ -92,3 +92,12 @@ Speculation misfeature controls * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_ENABLE, 0, 0); * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_DISABLE, 0, 0); * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_STORE_BYPASS, PR_SPEC_FORCE_DISABLE, 0, 0); + +- PR_SPEC_INDIR_BRANCH: Indirect Branch Speculation in User Processes + (Mitigate Spectre V2 style attacks against user processes) + + Invocations: + * prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0); + * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0); + * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0); + * prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0); --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -232,6 +232,7 @@ enum spectre_v2_mitigation { enum spectre_v2_user_mitigation { SPECTRE_V2_USER_NONE, SPECTRE_V2_USER_STRICT, + SPECTRE_V2_USER_PRCTL, };
/* The Speculative Store Bypass disable variants */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -563,6 +563,8 @@ void arch_smt_update(void) case SPECTRE_V2_USER_STRICT: update_stibp_strict(); break; + case SPECTRE_V2_USER_PRCTL: + break; }
mutex_unlock(&spec_ctrl_mutex); @@ -749,12 +751,50 @@ static int ssb_prctl_set(struct task_str return 0; }
+static int ib_prctl_set(struct task_struct *task, unsigned long ctrl) +{ + switch (ctrl) { + case PR_SPEC_ENABLE: + if (spectre_v2_user == SPECTRE_V2_USER_NONE) + return 0; + /* + * Indirect branch speculation is always disabled in strict + * mode. + */ + if (spectre_v2_user == SPECTRE_V2_USER_STRICT) + return -EPERM; + task_clear_spec_ib_disable(task); + task_update_spec_tif(task); + break; + case PR_SPEC_DISABLE: + case PR_SPEC_FORCE_DISABLE: + /* + * Indirect branch speculation is always allowed when + * mitigation is force disabled. + */ + if (spectre_v2_user == SPECTRE_V2_USER_NONE) + return -EPERM; + if (spectre_v2_user == SPECTRE_V2_USER_STRICT) + return 0; + task_set_spec_ib_disable(task); + if (ctrl == PR_SPEC_FORCE_DISABLE) + task_set_spec_ib_force_disable(task); + task_update_spec_tif(task); + break; + default: + return -ERANGE; + } + return 0; +} + int arch_prctl_spec_ctrl_set(struct task_struct *task, unsigned long which, unsigned long ctrl) { switch (which) { case PR_SPEC_STORE_BYPASS: return ssb_prctl_set(task, ctrl); + case PR_SPEC_INDIRECT_BRANCH: + return ib_prctl_set(task, ctrl); default: return -ENODEV; } @@ -787,11 +827,34 @@ static int ssb_prctl_get(struct task_str } }
+static int ib_prctl_get(struct task_struct *task) +{ + if (!boot_cpu_has_bug(X86_BUG_SPECTRE_V2)) + return PR_SPEC_NOT_AFFECTED; + + switch (spectre_v2_user) { + case SPECTRE_V2_USER_NONE: + return PR_SPEC_ENABLE; + case SPECTRE_V2_USER_PRCTL: + if (task_spec_ib_force_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; + if (task_spec_ib_disable(task)) + return PR_SPEC_PRCTL | PR_SPEC_DISABLE; + return PR_SPEC_PRCTL | PR_SPEC_ENABLE; + case SPECTRE_V2_USER_STRICT: + return PR_SPEC_DISABLE; + default: + return PR_SPEC_NOT_AFFECTED; + } +} + int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which) { switch (which) { case PR_SPEC_STORE_BYPASS: return ssb_prctl_get(task); + case PR_SPEC_INDIRECT_BRANCH: + return ib_prctl_get(task); default: return -ENODEV; } @@ -971,6 +1034,8 @@ static char *stibp_state(void) return ", STIBP: disabled"; case SPECTRE_V2_USER_STRICT: return ", STIBP: forced"; + case SPECTRE_V2_USER_PRCTL: + return ""; } return ""; } @@ -983,6 +1048,8 @@ static char *ibpb_state(void) return ", IBPB: disabled"; case SPECTRE_V2_USER_STRICT: return ", IBPB: always-on"; + case SPECTRE_V2_USER_PRCTL: + return ""; } } return ""; --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -453,6 +453,11 @@ static unsigned long speculation_ctrl_up set_tsk_thread_flag(tsk, TIF_SSBD); else clear_tsk_thread_flag(tsk, TIF_SSBD); + + if (task_spec_ib_disable(tsk)) + set_tsk_thread_flag(tsk, TIF_SPEC_IB); + else + clear_tsk_thread_flag(tsk, TIF_SPEC_IB); } /* Return the updated threadinfo flags*/ return task_thread_info(tsk)->flags; --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1405,6 +1405,8 @@ static inline bool is_percpu_thread(void #define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */ #define PFA_SPEC_SSB_DISABLE 3 /* Speculative Store Bypass disabled */ #define PFA_SPEC_SSB_FORCE_DISABLE 4 /* Speculative Store Bypass force disabled*/ +#define PFA_SPEC_IB_DISABLE 5 /* Indirect branch speculation restricted */ +#define PFA_SPEC_IB_FORCE_DISABLE 6 /* Indirect branch speculation permanently restricted */
#define TASK_PFA_TEST(name, func) \ static inline bool task_##func(struct task_struct *p) \ @@ -1436,6 +1438,13 @@ TASK_PFA_CLEAR(SPEC_SSB_DISABLE, spec_ss TASK_PFA_TEST(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable) TASK_PFA_SET(SPEC_SSB_FORCE_DISABLE, spec_ssb_force_disable)
+TASK_PFA_TEST(SPEC_IB_DISABLE, spec_ib_disable) +TASK_PFA_SET(SPEC_IB_DISABLE, spec_ib_disable) +TASK_PFA_CLEAR(SPEC_IB_DISABLE, spec_ib_disable) + +TASK_PFA_TEST(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable) +TASK_PFA_SET(SPEC_IB_FORCE_DISABLE, spec_ib_force_disable) + static inline void current_restore_flags(unsigned long orig_flags, unsigned long flags) { --- a/include/uapi/linux/prctl.h +++ b/include/uapi/linux/prctl.h @@ -203,6 +203,7 @@ struct prctl_mm_map { #define PR_SET_SPECULATION_CTRL 53 /* Speculation control variants */ # define PR_SPEC_STORE_BYPASS 0 +# define PR_SPEC_INDIRECT_BRANCH 1 /* Return and control values for PR_SET/GET_SPECULATION_CTRL */ # define PR_SPEC_NOT_AFFECTED 0 # define PR_SPEC_PRCTL (1UL << 0)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream
Now that all prerequisites are in place:
- Add the prctl command line option
- Default the 'auto' mode to 'prctl'
- When SMT state changes, update the static key which controls the conditional STIBP evaluation on context switch.
- At init update the static key which controls the conditional IBPB evaluation on context switch.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 7 +++- arch/x86/kernel/cpu/bugs.c | 41 ++++++++++++++++++------ 2 files changed, 38 insertions(+), 10 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4036,9 +4036,14 @@ off - Unconditionally disable mitigations. Is enforced by spectre_v2=off
+ prctl - Indirect branch speculation is enabled, + but mitigation can be enabled via prctl + per thread. The mitigation control state + is inherited on fork. + auto - Kernel selects the mitigation depending on the available CPU features and vulnerability. - Default is off. + Default is prctl.
Not specifying this option is equivalent to spectre_v2_user=auto. --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -254,11 +254,13 @@ enum spectre_v2_user_cmd { SPECTRE_V2_USER_CMD_NONE, SPECTRE_V2_USER_CMD_AUTO, SPECTRE_V2_USER_CMD_FORCE, + SPECTRE_V2_USER_CMD_PRCTL, };
static const char * const spectre_v2_user_strings[] = { [SPECTRE_V2_USER_NONE] = "User space: Vulnerable", [SPECTRE_V2_USER_STRICT] = "User space: Mitigation: STIBP protection", + [SPECTRE_V2_USER_PRCTL] = "User space: Mitigation: STIBP via prctl", };
static const struct { @@ -269,6 +271,7 @@ static const struct { { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, { "off", SPECTRE_V2_USER_CMD_NONE, false }, { "on", SPECTRE_V2_USER_CMD_FORCE, true }, + { "prctl", SPECTRE_V2_USER_CMD_PRCTL, false }, };
static void __init spec_v2_user_print_cond(const char *reason, bool secure) @@ -323,12 +326,15 @@ spectre_v2_user_select_mitigation(enum s smt_possible = false;
switch (spectre_v2_parse_user_cmdline(v2_cmd)) { - case SPECTRE_V2_USER_CMD_AUTO: case SPECTRE_V2_USER_CMD_NONE: goto set_mode; case SPECTRE_V2_USER_CMD_FORCE: mode = SPECTRE_V2_USER_STRICT; break; + case SPECTRE_V2_USER_CMD_AUTO: + case SPECTRE_V2_USER_CMD_PRCTL: + mode = SPECTRE_V2_USER_PRCTL; + break; }
/* Initialize Indirect Branch Prediction Barrier */ @@ -339,6 +345,9 @@ spectre_v2_user_select_mitigation(enum s case SPECTRE_V2_USER_STRICT: static_branch_enable(&switch_mm_always_ibpb); break; + case SPECTRE_V2_USER_PRCTL: + static_branch_enable(&switch_mm_cond_ibpb); + break; default: break; } @@ -351,6 +360,12 @@ spectre_v2_user_select_mitigation(enum s if (spectre_v2_enabled == SPECTRE_V2_IBRS_ENHANCED) return;
+ /* + * If SMT is not possible or STIBP is not available clear the STIPB + * mode. + */ + if (!smt_possible || !boot_cpu_has(X86_FEATURE_STIBP)) + mode = SPECTRE_V2_USER_NONE; set_mode: spectre_v2_user = mode; /* Only print the STIBP mode when SMT possible */ @@ -549,6 +564,15 @@ static void update_stibp_strict(void) on_each_cpu(update_stibp_msr, NULL, 1); }
+/* Update the static key controlling the evaluation of TIF_SPEC_IB */ +static void update_indir_branch_cond(void) +{ + if (sched_smt_active()) + static_branch_enable(&switch_to_cond_stibp); + else + static_branch_disable(&switch_to_cond_stibp); +} + void arch_smt_update(void) { /* Enhanced IBRS implies STIBP. No update required. */ @@ -564,6 +588,7 @@ void arch_smt_update(void) update_stibp_strict(); break; case SPECTRE_V2_USER_PRCTL: + update_indir_branch_cond(); break; }
@@ -1035,7 +1060,8 @@ static char *stibp_state(void) case SPECTRE_V2_USER_STRICT: return ", STIBP: forced"; case SPECTRE_V2_USER_PRCTL: - return ""; + if (static_key_enabled(&switch_to_cond_stibp)) + return ", STIBP: conditional"; } return ""; } @@ -1043,14 +1069,11 @@ static char *stibp_state(void) static char *ibpb_state(void) { if (boot_cpu_has(X86_FEATURE_IBPB)) { - switch (spectre_v2_user) { - case SPECTRE_V2_USER_NONE: - return ", IBPB: disabled"; - case SPECTRE_V2_USER_STRICT: + if (static_key_enabled(&switch_mm_always_ibpb)) return ", IBPB: always-on"; - case SPECTRE_V2_USER_PRCTL: - return ""; - } + if (static_key_enabled(&switch_mm_cond_ibpb)) + return ", IBPB: conditional"; + return ", IBPB: disabled"; } return ""; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream
If 'prctl' mode of user space protection from spectre v2 is selected on the kernel command-line, STIBP and IBPB are applied on tasks which restrict their indirect branch speculation via prctl.
SECCOMP already enables the SSBD mitigation for sandboxed tasks, so it makes sense to prevent spectre v2 user-space-to-user-space attacks for them as well.
The Intel mitigation guide documents how STIBP works:
Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor prevents the predicted targets of indirect branches on any logical processor of that core from being controlled by software that executes (or executed previously) on another logical processor of the same core.
Ergo setting STIBP protects the task itself from being attacked by a task running on a different hyper-thread, and protects the tasks running on different hyper-threads from being attacked.
While the document suggests that the branch predictors are shielded between the logical processors, the observed performance regressions suggest that STIBP simply disables the branch predictor more or less completely. Of course the document wording is vague, but the fact that there is also no requirement for issuing IBPB when STIBP is used points clearly in that direction. The kernel still issues IBPB even when STIBP is used until Intel clarifies the whole mechanism.
IBPB is issued when the task switches out, so malicious sandbox code cannot mistrain the branch predictor for the next user space task on the same logical processor.
Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 9 ++++++++- arch/x86/include/asm/nospec-branch.h | 1 + arch/x86/kernel/cpu/bugs.c | 17 ++++++++++++++++- 3 files changed, 25 insertions(+), 2 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4041,9 +4041,16 @@ per thread. The mitigation control state is inherited on fork.
+ seccomp + - Same as "prctl" above, but all seccomp + threads will enable the mitigation unless + they explicitly opt out. + auto - Kernel selects the mitigation depending on the available CPU features and vulnerability. - Default is prctl. + + Default mitigation: + If CONFIG_SECCOMP=y then "seccomp", otherwise "prctl"
Not specifying this option is equivalent to spectre_v2_user=auto. --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -233,6 +233,7 @@ enum spectre_v2_user_mitigation { SPECTRE_V2_USER_NONE, SPECTRE_V2_USER_STRICT, SPECTRE_V2_USER_PRCTL, + SPECTRE_V2_USER_SECCOMP, };
/* The Speculative Store Bypass disable variants */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -255,12 +255,14 @@ enum spectre_v2_user_cmd { SPECTRE_V2_USER_CMD_AUTO, SPECTRE_V2_USER_CMD_FORCE, SPECTRE_V2_USER_CMD_PRCTL, + SPECTRE_V2_USER_CMD_SECCOMP, };
static const char * const spectre_v2_user_strings[] = { [SPECTRE_V2_USER_NONE] = "User space: Vulnerable", [SPECTRE_V2_USER_STRICT] = "User space: Mitigation: STIBP protection", [SPECTRE_V2_USER_PRCTL] = "User space: Mitigation: STIBP via prctl", + [SPECTRE_V2_USER_SECCOMP] = "User space: Mitigation: STIBP via seccomp and prctl", };
static const struct { @@ -272,6 +274,7 @@ static const struct { { "off", SPECTRE_V2_USER_CMD_NONE, false }, { "on", SPECTRE_V2_USER_CMD_FORCE, true }, { "prctl", SPECTRE_V2_USER_CMD_PRCTL, false }, + { "seccomp", SPECTRE_V2_USER_CMD_SECCOMP, false }, };
static void __init spec_v2_user_print_cond(const char *reason, bool secure) @@ -331,10 +334,16 @@ spectre_v2_user_select_mitigation(enum s case SPECTRE_V2_USER_CMD_FORCE: mode = SPECTRE_V2_USER_STRICT; break; - case SPECTRE_V2_USER_CMD_AUTO: case SPECTRE_V2_USER_CMD_PRCTL: mode = SPECTRE_V2_USER_PRCTL; break; + case SPECTRE_V2_USER_CMD_AUTO: + case SPECTRE_V2_USER_CMD_SECCOMP: + if (IS_ENABLED(CONFIG_SECCOMP)) + mode = SPECTRE_V2_USER_SECCOMP; + else + mode = SPECTRE_V2_USER_PRCTL; + break; }
/* Initialize Indirect Branch Prediction Barrier */ @@ -346,6 +355,7 @@ spectre_v2_user_select_mitigation(enum s static_branch_enable(&switch_mm_always_ibpb); break; case SPECTRE_V2_USER_PRCTL: + case SPECTRE_V2_USER_SECCOMP: static_branch_enable(&switch_mm_cond_ibpb); break; default: @@ -588,6 +598,7 @@ void arch_smt_update(void) update_stibp_strict(); break; case SPECTRE_V2_USER_PRCTL: + case SPECTRE_V2_USER_SECCOMP: update_indir_branch_cond(); break; } @@ -830,6 +841,8 @@ void arch_seccomp_spec_mitigate(struct t { if (ssb_mode == SPEC_STORE_BYPASS_SECCOMP) ssb_prctl_set(task, PR_SPEC_FORCE_DISABLE); + if (spectre_v2_user == SPECTRE_V2_USER_SECCOMP) + ib_prctl_set(task, PR_SPEC_FORCE_DISABLE); } #endif
@@ -861,6 +874,7 @@ static int ib_prctl_get(struct task_stru case SPECTRE_V2_USER_NONE: return PR_SPEC_ENABLE; case SPECTRE_V2_USER_PRCTL: + case SPECTRE_V2_USER_SECCOMP: if (task_spec_ib_force_disable(task)) return PR_SPEC_PRCTL | PR_SPEC_FORCE_DISABLE; if (task_spec_ib_disable(task)) @@ -1060,6 +1074,7 @@ static char *stibp_state(void) case SPECTRE_V2_USER_STRICT: return ", STIBP: forced"; case SPECTRE_V2_USER_PRCTL: + case SPECTRE_V2_USER_SECCOMP: if (static_key_enabled(&switch_to_cond_stibp)) return ", STIBP: conditional"; }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
commit 55a974021ec952ee460dc31ca08722158639de72 upstream
Provide the possibility to enable IBPB always in combination with 'prctl' and 'seccomp'.
Add the extra command line options and rework the IBPB selection to evaluate the command instead of the mode selected by the STIBP switch case.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ingo Molnar mingo@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Andy Lutomirski luto@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Jiri Kosina jkosina@suse.cz Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Andrea Arcangeli aarcange@redhat.com Cc: David Woodhouse dwmw@amazon.co.uk Cc: Tim Chen tim.c.chen@linux.intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Dave Hansen dave.hansen@intel.com Cc: Casey Schaufler casey.schaufler@intel.com Cc: Asit Mallick asit.k.mallick@intel.com Cc: Arjan van de Ven arjan@linux.intel.com Cc: Jon Masters jcm@redhat.com Cc: Waiman Long longman9394@gmail.com Cc: Greg KH gregkh@linuxfoundation.org Cc: Dave Stewart david.c.stewart@intel.com Cc: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20181125185006.144047038@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/kernel-parameters.txt | 12 ++++++++ arch/x86/kernel/cpu/bugs.c | 34 ++++++++++++++++-------- 2 files changed, 35 insertions(+), 11 deletions(-)
--- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -4041,11 +4041,23 @@ per thread. The mitigation control state is inherited on fork.
+ prctl,ibpb + - Like "prctl" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different user + space processes. + seccomp - Same as "prctl" above, but all seccomp threads will enable the mitigation unless they explicitly opt out.
+ seccomp,ibpb + - Like "seccomp" above, but only STIBP is + controlled per thread. IBPB is issued + always when switching between different + user space processes. + auto - Kernel selects the mitigation depending on the available CPU features and vulnerability.
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -255,7 +255,9 @@ enum spectre_v2_user_cmd { SPECTRE_V2_USER_CMD_AUTO, SPECTRE_V2_USER_CMD_FORCE, SPECTRE_V2_USER_CMD_PRCTL, + SPECTRE_V2_USER_CMD_PRCTL_IBPB, SPECTRE_V2_USER_CMD_SECCOMP, + SPECTRE_V2_USER_CMD_SECCOMP_IBPB, };
static const char * const spectre_v2_user_strings[] = { @@ -270,11 +272,13 @@ static const struct { enum spectre_v2_user_cmd cmd; bool secure; } v2_user_options[] __initdata = { - { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, - { "off", SPECTRE_V2_USER_CMD_NONE, false }, - { "on", SPECTRE_V2_USER_CMD_FORCE, true }, - { "prctl", SPECTRE_V2_USER_CMD_PRCTL, false }, - { "seccomp", SPECTRE_V2_USER_CMD_SECCOMP, false }, + { "auto", SPECTRE_V2_USER_CMD_AUTO, false }, + { "off", SPECTRE_V2_USER_CMD_NONE, false }, + { "on", SPECTRE_V2_USER_CMD_FORCE, true }, + { "prctl", SPECTRE_V2_USER_CMD_PRCTL, false }, + { "prctl,ibpb", SPECTRE_V2_USER_CMD_PRCTL_IBPB, false }, + { "seccomp", SPECTRE_V2_USER_CMD_SECCOMP, false }, + { "seccomp,ibpb", SPECTRE_V2_USER_CMD_SECCOMP_IBPB, false }, };
static void __init spec_v2_user_print_cond(const char *reason, bool secure) @@ -320,6 +324,7 @@ spectre_v2_user_select_mitigation(enum s { enum spectre_v2_user_mitigation mode = SPECTRE_V2_USER_NONE; bool smt_possible = IS_ENABLED(CONFIG_SMP); + enum spectre_v2_user_cmd cmd;
if (!boot_cpu_has(X86_FEATURE_IBPB) && !boot_cpu_has(X86_FEATURE_STIBP)) return; @@ -328,17 +333,20 @@ spectre_v2_user_select_mitigation(enum s cpu_smt_control == CPU_SMT_NOT_SUPPORTED) smt_possible = false;
- switch (spectre_v2_parse_user_cmdline(v2_cmd)) { + cmd = spectre_v2_parse_user_cmdline(v2_cmd); + switch (cmd) { case SPECTRE_V2_USER_CMD_NONE: goto set_mode; case SPECTRE_V2_USER_CMD_FORCE: mode = SPECTRE_V2_USER_STRICT; break; case SPECTRE_V2_USER_CMD_PRCTL: + case SPECTRE_V2_USER_CMD_PRCTL_IBPB: mode = SPECTRE_V2_USER_PRCTL; break; case SPECTRE_V2_USER_CMD_AUTO: case SPECTRE_V2_USER_CMD_SECCOMP: + case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: if (IS_ENABLED(CONFIG_SECCOMP)) mode = SPECTRE_V2_USER_SECCOMP; else @@ -350,12 +358,15 @@ spectre_v2_user_select_mitigation(enum s if (boot_cpu_has(X86_FEATURE_IBPB)) { setup_force_cpu_cap(X86_FEATURE_USE_IBPB);
- switch (mode) { - case SPECTRE_V2_USER_STRICT: + switch (cmd) { + case SPECTRE_V2_USER_CMD_FORCE: + case SPECTRE_V2_USER_CMD_PRCTL_IBPB: + case SPECTRE_V2_USER_CMD_SECCOMP_IBPB: static_branch_enable(&switch_mm_always_ibpb); break; - case SPECTRE_V2_USER_PRCTL: - case SPECTRE_V2_USER_SECCOMP: + case SPECTRE_V2_USER_CMD_PRCTL: + case SPECTRE_V2_USER_CMD_AUTO: + case SPECTRE_V2_USER_CMD_SECCOMP: static_branch_enable(&switch_mm_cond_ibpb); break; default: @@ -363,7 +374,8 @@ spectre_v2_user_select_mitigation(enum s }
pr_info("mitigation: Enabling %s Indirect Branch Prediction Barrier\n", - mode == SPECTRE_V2_USER_STRICT ? "always-on" : "conditional"); + static_key_enabled(&switch_mm_always_ibpb) ? + "always-on" : "conditional"); }
/* If enhanced IBRS is enabled no STIPB required */
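[ Aside: the per-task mitigation state that these modes control is also visible from userspace through the speculation-control prctl interface. A minimal sketch of querying it; the fallback #defines cover older headers, and the returned flags depend on the spectre_v2_user= mode chosen above: ]

#include <stdio.h>
#include <sys/prctl.h>

#ifndef PR_GET_SPECULATION_CTRL
#define PR_GET_SPECULATION_CTRL 52
#endif
#ifndef PR_SPEC_INDIRECT_BRANCH
#define PR_SPEC_INDIRECT_BRANCH 1
#endif

int main(void)
{
	/* Returns PR_SPEC_* flags describing how indirect branch
	 * speculation is handled for this task. */
	long ret = prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
			 0, 0, 0);

	if (ret < 0) {
		perror("prctl");
		return 1;
	}
	printf("indirect branch speculation flags: 0x%lx\n",
	       (unsigned long)ret);
	return 0;
}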
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Junaid Shahid junaids@google.com
commit 0e0fee5c539b61fdd098332e0e2cc375d9073706 upstream.
When a guest page table is updated via an emulated write, kvm_mmu_pte_write() is called to update the shadow PTE using the just written guest PTE value. But if two emulated guest PTE writes happened concurrently, it is possible that the guest PTE and the shadow PTE end up being out of sync. Emulated writes do not mark the shadow page as unsync-ed, so this inconsistency will not be resolved even by a guest TLB flush (unless the page was marked as unsync-ed at some other point).
This is fixed by re-reading the current value of the guest PTE after the MMU lock has been acquired instead of just using the value that was written prior to calling kvm_mmu_pte_write().
Signed-off-by: Junaid Shahid junaids@google.com Reviewed-by: Wanpeng Li wanpengli@tencent.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/mmu.c | 27 +++++++++------------------ 1 file changed, 9 insertions(+), 18 deletions(-)
--- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -4734,9 +4734,9 @@ static bool need_remote_flush(u64 old, u }
static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa, - const u8 *new, int *bytes) + int *bytes) { - u64 gentry; + u64 gentry = 0; int r;
/* @@ -4748,22 +4748,12 @@ static u64 mmu_pte_write_fetch_gpte(stru /* Handle a 32-bit guest writing two halves of a 64-bit gpte */ *gpa &= ~(gpa_t)7; *bytes = 8; - r = kvm_vcpu_read_guest(vcpu, *gpa, &gentry, 8); - if (r) - gentry = 0; - new = (const u8 *)&gentry; }
- switch (*bytes) { - case 4: - gentry = *(const u32 *)new; - break; - case 8: - gentry = *(const u64 *)new; - break; - default: - gentry = 0; - break; + if (*bytes == 4 || *bytes == 8) { + r = kvm_vcpu_read_guest_atomic(vcpu, *gpa, &gentry, *bytes); + if (r) + gentry = 0; }
return gentry; @@ -4876,8 +4866,6 @@ static void kvm_mmu_pte_write(struct kvm
pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
- gentry = mmu_pte_write_fetch_gpte(vcpu, &gpa, new, &bytes); - /* * No need to care whether allocation memory is successful * or not since pte prefetch is skiped if it does not have @@ -4886,6 +4874,9 @@ static void kvm_mmu_pte_write(struct kvm mmu_topup_memory_caches(vcpu);
spin_lock(&vcpu->kvm->mmu_lock); + + gentry = mmu_pte_write_fetch_gpte(vcpu, &gpa, &bytes); + ++vcpu->kvm->stat.mmu_pte_write; kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
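[ Aside: stripped of the KVM specifics, the fix is the standard "re-read shared state after taking the lock" pattern: a value sampled before lock acquisition may be stale by the time it is used. A minimal pthreads sketch with hypothetical names: ]

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mmu_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long guest_pte;   /* stands in for the guest PTE */
static unsigned long shadow_pte;  /* stands in for the shadow PTE */

static void pte_write(unsigned long val)
{
	/* Writers update guest_pte under the same lock. */
	pthread_mutex_lock(&mmu_lock);
	guest_pte = val;
	pthread_mutex_unlock(&mmu_lock);
}

static void update_shadow(void)
{
	pthread_mutex_lock(&mmu_lock);
	/* Fetch the *current* value inside the critical section,
	 * not a copy captured before locking. */
	shadow_pte = guest_pte;
	pthread_mutex_unlock(&mmu_lock);
}

int main(void)
{
	pte_write(0x1000);
	update_shadow();
	printf("shadow: %#lx\n", shadow_pte);
	return 0;
}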
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jim Mattson jmattson@google.com
commit fd65d3142f734bc4376053c8d75670041903134d upstream.
Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question.
Fixes: 15d45071523d ("KVM/x86: Add IBPB support") Reported-by: Neel Natu neelnatu@google.com Signed-off-by: Jim Mattson jmattson@google.com Reviewed-by: Konrad Rzeszutek Wilk konrad.wilk@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/svm.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
--- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -1733,21 +1733,31 @@ out: return ERR_PTR(err); }
+static void svm_clear_current_vmcb(struct vmcb *vmcb) +{ + int i; + + for_each_online_cpu(i) + cmpxchg(&per_cpu(svm_data, i)->current_vmcb, vmcb, NULL); +} + static void svm_free_vcpu(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu);
+ /* + * The vmcb page can be recycled, causing a false negative in + * svm_vcpu_load(). So, ensure that no logical CPU has this + * vmcb page recorded as its current vmcb. + */ + svm_clear_current_vmcb(svm->vmcb); + __free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT)); __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER); __free_page(virt_to_page(svm->nested.hsave)); __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER); kvm_vcpu_uninit(vcpu); kmem_cache_free(kvm_vcpu_cache, svm); - /* - * The vmcb page can be recycled, causing a false negative in - * svm_vcpu_load(). So do a full IBPB now. - */ - indirect_branch_prediction_barrier(); }
static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
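[ Aside: the new helper leans on cmpxchg() to clear each per-CPU slot only if it still records the vmcb being freed. A standalone sketch of that compare-and-clear idiom, with C11 atomics standing in for the kernel's cmpxchg() and hypothetical names throughout: ]

#include <stdatomic.h>
#include <stdio.h>

#define NR_CPUS 4

static _Atomic(void *) current_vmcb[NR_CPUS];

static void clear_current_vmcb(void *vmcb)
{
	for (int i = 0; i < NR_CPUS; i++) {
		void *expected = vmcb;

		/* Atomically: if slot i still holds vmcb, set it to
		 * NULL; slots pointing elsewhere are left untouched. */
		atomic_compare_exchange_strong(&current_vmcb[i],
					       &expected, NULL);
	}
}

int main(void)
{
	int a, b;

	atomic_store(&current_vmcb[0], &a);
	atomic_store(&current_vmcb[1], &b);
	clear_current_vmcb(&a);
	printf("cpu0=%p cpu1=%p\n",
	       atomic_load(&current_vmcb[0]),
	       atomic_load(&current_vmcb[1]));
	return 0;
}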
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Liran Alon liran.alon@oracle.com
commit bcbfbd8ec21096027f1ee13ce6c185e8175166f6 upstream.
kvm_pv_clock_pairing() allocates the local variable "struct kvm_clock_pairing clock_pairing" on the stack and initializes all of its fields besides the padding (clock_pairing.pad[]).
Because the clock_pairing variable is written completely (including padding) to guest memory, failure to init the struct padding results in a kernel info-leak.
Fix the issue by making sure to also init the padding with zeroes.
Fixes: 55dd00a73a51 ("KVM: x86: add KVM_HC_CLOCK_PAIRING hypercall") Reported-by: syzbot+a8ef68d71211ba264f56@syzkaller.appspotmail.com Reviewed-by: Mark Kanda mark.kanda@oracle.com Signed-off-by: Liran Alon liran.alon@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6378,6 +6378,7 @@ static int kvm_pv_clock_pairing(struct k clock_pairing.nsec = ts.tv_nsec; clock_pairing.tsc = kvm_read_l1_tsc(vcpu, cycle); clock_pairing.flags = 0; + memset(&clock_pairing.pad, 0, sizeof(clock_pairing.pad));
ret = 0; if (kvm_write_guest(vcpu->kvm, paddr, &clock_pairing,
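[ Aside: the hazard generalizes beyond KVM: C never guarantees zeroed struct padding, so copying a whole struct to a less-privileged recipient can leak stale stack bytes. A small userspace illustration with a hypothetical struct: ]

#include <stdio.h>
#include <string.h>

struct pairing {
	long long sec;
	int flags;
	char pad[9];	/* explicit padding, like clock_pairing.pad[] */
};

int main(void)
{
	struct pairing p;

	/* Field-by-field initialization leaves pad[] holding whatever
	 * was on the stack -- exactly the info-leak in question. */
	p.sec = 1;
	p.flags = 0;

	/* The fix: zero the padding before the struct is copied out
	 * wholesale (kvm_write_guest() in the patch above). */
	memset(&p.pad, 0, sizeof(p.pad));

	printf("pad[0]=%d\n", p.pad[0]);
	return 0;
}

(Compiler-inserted padding between fields would additionally need a full memset(&p, 0, sizeof(p)) to be airtight.)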
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wanpeng Li wanpengli@tencent.com
commit e97f852fd4561e77721bb9a4e0ea9d98305b1e93 upstream.
Reported by syzkaller:
BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8 PGD 80000003ec4da067 P4D 80000003ec4da067 PUD 3f7bfa067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 7 PID: 5059 Comm: debug Tainted: G OE 4.19.0-rc5 #16 RIP: 0010:__lock_acquire+0x1a6/0x1990 Call Trace: lock_acquire+0xdb/0x210 _raw_spin_lock+0x38/0x70 kvm_ioapic_scan_entry+0x3e/0x110 [kvm] vcpu_enter_guest+0x167e/0x1910 [kvm] kvm_arch_vcpu_ioctl_run+0x35c/0x610 [kvm] kvm_vcpu_ioctl+0x3e9/0x6d0 [kvm] do_vfs_ioctl+0xa5/0x690 ksys_ioctl+0x6d/0x80 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x83/0x6e0 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The reason is that the testcase writes the hyperv synic HV_X64_MSR_SINT6 msr and triggers the scan ioapic logic to load synic vectors into the EOI exit bitmap. However, the irqchip is not initialized by this simple testcase, so the ioapic/apic objects should not be accessed. This can be triggered by the following program:
#define _GNU_SOURCE
#include <endian.h> #include <stdint.h> #include <stdio.h> #include <stdlib.h> #include <string.h> #include <sys/syscall.h> #include <sys/types.h> #include <unistd.h>
uint64_t r[3] = {0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff};
int main(void) { syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0); long res = 0; memcpy((void*)0x20000040, "/dev/kvm", 9); res = syscall(__NR_openat, 0xffffffffffffff9c, 0x20000040, 0, 0); if (res != -1) r[0] = res; res = syscall(__NR_ioctl, r[0], 0xae01, 0); if (res != -1) r[1] = res; res = syscall(__NR_ioctl, r[1], 0xae41, 0); if (res != -1) r[2] = res; memcpy( (void*)0x20000080, "\x01\x00\x00\x00\x00\x5b\x61\xbb\x96\x00\x00\x40\x00\x00\x00\x00\x01\x00" "\x08\x00\x00\x00\x00\x00\x0b\x77\xd1\x78\x4d\xd8\x3a\xed\xb1\x5c\x2e\x43" "\xaa\x43\x39\xd6\xff\xf5\xf0\xa8\x98\xf2\x3e\x37\x29\x89\xde\x88\xc6\x33" "\xfc\x2a\xdb\xb7\xe1\x4c\xac\x28\x61\x7b\x9c\xa9\xbc\x0d\xa0\x63\xfe\xfe" "\xe8\x75\xde\xdd\x19\x38\xdc\x34\xf5\xec\x05\xfd\xeb\x5d\xed\x2e\xaf\x22" "\xfa\xab\xb7\xe4\x42\x67\xd0\xaf\x06\x1c\x6a\x35\x67\x10\x55\xcb", 106); syscall(__NR_ioctl, r[2], 0x4008ae89, 0x20000080); syscall(__NR_ioctl, r[2], 0xae80, 0); return 0; }
This patch fixes it by bailing out of the ioapic scan if the ioapic is not initialized in the kernel.
Reported-by: Wei Wu ww9210@gmail.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Radim Krčmář rkrcmar@redhat.com Cc: Wei Wu ww9210@gmail.com Signed-off-by: Wanpeng Li wanpengli@tencent.com Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6885,7 +6885,8 @@ static void vcpu_scan_ioapic(struct kvm_ else { if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active) kvm_x86_ops->sync_pir_to_irr(vcpu); - kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors); + if (ioapic_in_kernel(vcpu->kvm)) + kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors); } bitmap_or((ulong *)eoi_exit_bitmap, vcpu->arch.ioapic_handled_vectors, vcpu_to_synic(vcpu)->vec_bitmap, 256);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Filippov jcmvbkbc@gmail.com
commit 2958b66694e018c552be0b60521fec27e8d12988 upstream.
coprocessor_flush_all may be called from the context of a thread that is different from the thread being flushed. In that case the contents of the cpenable special register may not match ti->cpenable of the target thread, resulting in an unhandled coprocessor exception in the kernel context. Set the cpenable special register to the ti->cpenable of the target thread for the duration of the flush and restore it afterwards. This fixes the following crash caused by coprocessor register inspection in native gdb:
(gdb) p/x $w0 Illegal instruction in kernel: sig: 9 [#1] PREEMPT Call Trace: ___might_sleep+0x184/0x1a4 __might_sleep+0x41/0xac exit_signals+0x14/0x218 do_exit+0xc9/0x8b8 die+0x99/0xa0 do_illegal_instruction+0x18/0x6c common_exception+0x77/0x77 coprocessor_flush+0x16/0x3c arch_ptrace+0x46c/0x674 sys_ptrace+0x2ce/0x3b4 system_call+0x54/0x80 common_exception+0x77/0x77 note: gdb[100] exited with preempt_count 1 Killed
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/xtensa/kernel/process.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/xtensa/kernel/process.c +++ b/arch/xtensa/kernel/process.c @@ -88,18 +88,21 @@ void coprocessor_release_all(struct thre
void coprocessor_flush_all(struct thread_info *ti) { - unsigned long cpenable; + unsigned long cpenable, old_cpenable; int i;
preempt_disable();
+ RSR_CPENABLE(old_cpenable); cpenable = ti->cpenable; + WSR_CPENABLE(cpenable);
for (i = 0; i < XCHAL_CP_MAX; i++) { if ((cpenable & 1) != 0 && coprocessor_owner[i] == ti) coprocessor_flush(ti, i); cpenable >>= 1; } + WSR_CPENABLE(old_cpenable);
preempt_enable(); }
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Filippov jcmvbkbc@gmail.com
commit 03bc996af0cc71c7f30c384d8ce7260172423b34 upstream.
Coprocessor context offsets are used by the assembly code that moves coprocessor context between the individual fields of the thread_info::xtregs_cp structure and coprocessor registers. This fixes coprocessor context clobbering on flushing and reloading during normal user code execution and user process debugging in the presence of more than one coprocessor in the core configuration.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/xtensa/kernel/asm-offsets.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/xtensa/kernel/asm-offsets.c +++ b/arch/xtensa/kernel/asm-offsets.c @@ -91,14 +91,14 @@ int main(void) DEFINE(THREAD_SP, offsetof (struct task_struct, thread.sp)); DEFINE(THREAD_CPENABLE, offsetof (struct thread_info, cpenable)); #if XTENSA_HAVE_COPROCESSORS - DEFINE(THREAD_XTREGS_CP0, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP1, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP2, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP3, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP4, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP5, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP6, offsetof (struct thread_info, xtregs_cp)); - DEFINE(THREAD_XTREGS_CP7, offsetof (struct thread_info, xtregs_cp)); + DEFINE(THREAD_XTREGS_CP0, offsetof(struct thread_info, xtregs_cp.cp0)); + DEFINE(THREAD_XTREGS_CP1, offsetof(struct thread_info, xtregs_cp.cp1)); + DEFINE(THREAD_XTREGS_CP2, offsetof(struct thread_info, xtregs_cp.cp2)); + DEFINE(THREAD_XTREGS_CP3, offsetof(struct thread_info, xtregs_cp.cp3)); + DEFINE(THREAD_XTREGS_CP4, offsetof(struct thread_info, xtregs_cp.cp4)); + DEFINE(THREAD_XTREGS_CP5, offsetof(struct thread_info, xtregs_cp.cp5)); + DEFINE(THREAD_XTREGS_CP6, offsetof(struct thread_info, xtregs_cp.cp6)); + DEFINE(THREAD_XTREGS_CP7, offsetof(struct thread_info, xtregs_cp.cp7)); #endif DEFINE(THREAD_XTREGS_USER, offsetof (struct thread_info, xtregs_user)); DEFINE(XTREGS_USER_SIZE, sizeof(xtregs_user_t));
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Filippov jcmvbkbc@gmail.com
commit 38a35a78c5e270cbe53c4fef6b0d3c2da90dd849 upstream.
The layout of the coprocessor registers in the elf_xtregs_t and xtregs_coprocessor_t may be different due to alignment. Thus it is not always possible to copy data between the xtregs_coprocessor_t structure and the elf_xtregs_t and get correct values for all registers. Use a table of offsets and sizes of the individual coprocessor register groups to do the coprocessor context copying in ptrace_getxregs and ptrace_setxregs. This fixes incorrect coprocessor register values being read from the user process by native gdb on an xtensa core with multiple coprocessors and registers with high alignment requirements.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/xtensa/kernel/ptrace.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-)
--- a/arch/xtensa/kernel/ptrace.c +++ b/arch/xtensa/kernel/ptrace.c @@ -127,12 +127,37 @@ static int ptrace_setregs(struct task_st }
+#if XTENSA_HAVE_COPROCESSORS +#define CP_OFFSETS(cp) \ + { \ + .elf_xtregs_offset = offsetof(elf_xtregs_t, cp), \ + .ti_offset = offsetof(struct thread_info, xtregs_cp.cp), \ + .sz = sizeof(xtregs_ ## cp ## _t), \ + } + +static const struct { + size_t elf_xtregs_offset; + size_t ti_offset; + size_t sz; +} cp_offsets[] = { + CP_OFFSETS(cp0), + CP_OFFSETS(cp1), + CP_OFFSETS(cp2), + CP_OFFSETS(cp3), + CP_OFFSETS(cp4), + CP_OFFSETS(cp5), + CP_OFFSETS(cp6), + CP_OFFSETS(cp7), +}; +#endif + static int ptrace_getxregs(struct task_struct *child, void __user *uregs) { struct pt_regs *regs = task_pt_regs(child); struct thread_info *ti = task_thread_info(child); elf_xtregs_t __user *xtregs = uregs; int ret = 0; + int i __maybe_unused;
if (!access_ok(VERIFY_WRITE, uregs, sizeof(elf_xtregs_t))) return -EIO; @@ -140,8 +165,13 @@ static int ptrace_getxregs(struct task_s #if XTENSA_HAVE_COPROCESSORS /* Flush all coprocessor registers to memory. */ coprocessor_flush_all(ti); - ret |= __copy_to_user(&xtregs->cp0, &ti->xtregs_cp, - sizeof(xtregs_coprocessor_t)); + + for (i = 0; i < ARRAY_SIZE(cp_offsets); ++i) + ret |= __copy_to_user((char __user *)xtregs + + cp_offsets[i].elf_xtregs_offset, + (const char *)ti + + cp_offsets[i].ti_offset, + cp_offsets[i].sz); #endif ret |= __copy_to_user(&xtregs->opt, ®s->xtregs_opt, sizeof(xtregs->opt)); @@ -157,6 +187,7 @@ static int ptrace_setxregs(struct task_s struct pt_regs *regs = task_pt_regs(child); elf_xtregs_t *xtregs = uregs; int ret = 0; + int i __maybe_unused;
if (!access_ok(VERIFY_READ, uregs, sizeof(elf_xtregs_t))) return -EFAULT; @@ -166,8 +197,11 @@ static int ptrace_setxregs(struct task_s coprocessor_flush_all(ti); coprocessor_release_all(ti);
- ret |= __copy_from_user(&ti->xtregs_cp, &xtregs->cp0, - sizeof(xtregs_coprocessor_t)); + for (i = 0; i < ARRAY_SIZE(cp_offsets); ++i) + ret |= __copy_from_user((char *)ti + cp_offsets[i].ti_offset, + (const char __user *)xtregs + + cp_offsets[i].elf_xtregs_offset, + cp_offsets[i].sz); #endif ret |= __copy_from_user(®s->xtregs_opt, &xtregs->opt, sizeof(xtregs->opt));
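[ Aside: the cp_offsets[] table is a general technique: when two structures carry the same logical members at different offsets (here, because of alignment), copy member-by-member through a table of offsets and sizes instead of one flat memcpy(). A compact userspace sketch with hypothetical layouts: ]

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same logical members, different padding/alignment. */
struct elf_layout { char a[4]; char pad[12]; char b[8]; };
struct ti_layout  { char a[4]; char b[8]; };

static const struct {
	size_t elf_off, ti_off, sz;
} offs[] = {
	{ offsetof(struct elf_layout, a), offsetof(struct ti_layout, a), 4 },
	{ offsetof(struct elf_layout, b), offsetof(struct ti_layout, b), 8 },
};

int main(void)
{
	struct ti_layout ti = { "cp0", "cp1data" };
	struct elf_layout elf;
	size_t i;

	memset(&elf, 0, sizeof(elf));
	for (i = 0; i < sizeof(offs) / sizeof(offs[0]); i++)
		memcpy((char *)&elf + offs[i].elf_off,
		       (char *)&ti + offs[i].ti_off,
		       offs[i].sz);

	printf("a=%.4s b=%.8s\n", elf.a, elf.b);
	return 0;
}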
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit f505754fd6599230371cb01b9332754ddc104be1 upstream.
We were using the path name received from user space without checking that it is null terminated. While btrfs-progs is well behaved and does proper validation and null termination, someone could call the ioctl and pass a non-null terminated path, leading to buffer overrun problems in the kernel. The ioctl is protected by CAP_SYS_ADMIN.
So just set the last byte of the path to a null character, similar to what we do in other ioctls (add/remove/resize device, snapshot creation, etc).
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/super.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2176,6 +2176,7 @@ static long btrfs_control_ioctl(struct f vol = memdup_user((void __user *)arg, sizeof(*vol)); if (IS_ERR(vol)) return PTR_ERR(vol); + vol->name[BTRFS_PATH_NAME_MAX] = '\0';
switch (cmd) { case BTRFS_IOC_SCAN_DEV:
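[ Aside: the pattern applies to any fixed-size name buffer copied from an untrusted caller: clobber the last byte with '\0' before any string handling. A minimal sketch with hypothetical sizes: ]

#include <stdio.h>
#include <string.h>

#define PATH_NAME_MAX 7

struct vol_args {
	char name[PATH_NAME_MAX + 1];	/* room for the terminator */
};

int main(void)
{
	struct vol_args vol;

	/* Simulate a misbehaving caller that fills the buffer
	 * completely, with no NUL terminator anywhere. */
	memset(vol.name, 'A', sizeof(vol.name));

	/* The fix: force termination before strlen()/strcpy()/etc. */
	vol.name[PATH_NAME_MAX] = '\0';

	printf("len=%zu name=%s\n", strlen(vol.name), vol.name);
	return 0;
}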
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pan Bian bianpan2016@163.com
commit 42a657f57628402c73237547f0134e083e2f6764 upstream.
The function relocate_block_group calls btrfs_end_transaction to release trans when update_backref_cache returns 1, and then continues the loop body. If btrfs_block_rsv_refill fails this time, it will jump out of the loop and the freed trans will be accessed. This may result in a use-after-free bug. The patch assigns NULL to trans after trans is released so that it will not be accessed.
Fixes: 0647bf564f1 ("Btrfs: improve forever loop when doing balance relocation") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Pan Bian bianpan2016@163.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/relocation.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4048,6 +4048,7 @@ static noinline_for_stack int relocate_b restart: if (update_backref_cache(trans, &rc->backref_cache)) { btrfs_end_transaction(trans); + trans = NULL; continue; }
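[ Aside: assigning NULL right after releasing a resource is cheap insurance: any later path that tests or uses the pointer then sees NULL instead of a dangling value. In sketch form, with a hypothetical resource: ]

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	char *trans = malloc(16);

	if (!trans)
		return 1;

	/* ... use the transaction ... */

	free(trans);
	trans = NULL;	/* the fix: poison the pointer on release */

	/* Later code can now safely test for an active transaction
	 * instead of dereferencing freed memory. */
	if (!trans)
		printf("no transaction in flight\n");
	return 0;
}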
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hou Zhiqiang Zhiqiang.Hou@nxp.com
commit c6fd6fe9dea44732cdcd970f1130b8cc50ad685a upstream.
The order of parameters is not correct when invoking the outbound window disable routine. Fix it.
Fixes: 4a2745d760fa ("PCI: layerscape: Disable outbound windows configured by bootloader") Signed-off-by: Hou Zhiqiang Zhiqiang.Hou@nxp.com [lorenzo.pieralisi@arm.com: commit log] Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/pci/dwc/pci-layerscape.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/pci/dwc/pci-layerscape.c +++ b/drivers/pci/dwc/pci-layerscape.c @@ -89,7 +89,7 @@ static void ls_pcie_disable_outbound_atu int i;
for (i = 0; i < PCIE_IATU_NUM; i++) - dw_pcie_disable_atu(pcie->pci, DW_PCIE_REGION_OUTBOUND, i); + dw_pcie_disable_atu(pcie->pci, i, DW_PCIE_REGION_OUTBOUND); }
static int ls1021_pcie_link_up(struct dw_pcie *pci)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christoph Muellner christoph.muellner@theobroma-systems.com
commit c1d91f86a1b4c9c05854d59c6a0abd5d0f75b849 upstream.
This patch fixes the wrong polarity setting for the PCIe host driver's pre-reset pin on rk3399-puma-haikou. Without this patch, link training will most likely fail.
Fixes: 60fd9f72ce8a ("arm64: dts: rockchip: add Haikou baseboard with RK3399-Q7 SoM") Cc: stable@vger.kernel.org Signed-off-by: Christoph Muellner christoph.muellner@theobroma-systems.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/rockchip/rk3399-puma-haikou.dts | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/boot/dts/rockchip/rk3399-puma-haikou.dts +++ b/arch/arm64/boot/dts/rockchip/rk3399-puma-haikou.dts @@ -130,7 +130,7 @@ };
&pcie0 { - ep-gpios = <&gpio4 RK_PC6 GPIO_ACTIVE_LOW>; + ep-gpios = <&gpio4 RK_PC6 GPIO_ACTIVE_HIGH>; num-lanes = <4>; pinctrl-names = "default"; pinctrl-0 = <&pcie_clkreqn_cpm>;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov bp@suse.de
commit 60c8144afc287ef09ce8c1230c6aa972659ba1bb upstream.
Currently, the code sets up the thresholding interrupt vector and only then goes about initializing the thresholding banks. Which is wrong, because an early thresholding interrupt would cause a NULL pointer dereference when accessing those banks and prevent the machine from booting.
Therefore, set the thresholding interrupt vector only *after* having initialized the banks successfully.
Fixes: 18807ddb7f88 ("x86/mce/AMD: Reset Threshold Limit after logging error") Reported-by: Rafał Miłecki rafal@milecki.pl Reported-by: John Clemens clemej@gmail.com Signed-off-by: Borislav Petkov bp@suse.de Tested-by: Rafał Miłecki rafal@milecki.pl Tested-by: John Clemens john@deater.net Cc: Aravind Gopalakrishnan aravindksg.lkml@gmail.com Cc: linux-edac@vger.kernel.org Cc: stable@vger.kernel.org Cc: Tony Luck tony.luck@intel.com Cc: x86@kernel.org Cc: Yazen Ghannam Yazen.Ghannam@amd.com Link: https://lkml.kernel.org/r/20181127101700.2964-1-zajec5@gmail.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=201291 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/mcheck/mce_amd.c | 19 ++++++------------- 1 file changed, 6 insertions(+), 13 deletions(-)
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -56,7 +56,7 @@ /* Threshold LVT offset is at MSR0xC0000410[15:12] */ #define SMCA_THR_LVT_OFF 0xF000
-static bool thresholding_en; +static bool thresholding_irq_en;
static const char * const th_names[] = { "load_store", @@ -533,9 +533,8 @@ prepare_threshold_block(unsigned int ban
set_offset: offset = setup_APIC_mce_threshold(offset, new); - - if ((offset == new) && (mce_threshold_vector != amd_threshold_interrupt)) - mce_threshold_vector = amd_threshold_interrupt; + if (offset == new) + thresholding_irq_en = true;
done: mce_threshold_block_init(&b, offset); @@ -1356,9 +1355,6 @@ int mce_threshold_remove_device(unsigned { unsigned int bank;
- if (!thresholding_en) - return 0; - for (bank = 0; bank < mca_cfg.banks; ++bank) { if (!(per_cpu(bank_map, cpu) & (1 << bank))) continue; @@ -1376,9 +1372,6 @@ int mce_threshold_create_device(unsigned struct threshold_bank **bp; int err = 0;
- if (!thresholding_en) - return 0; - bp = per_cpu(threshold_banks, cpu); if (bp) return 0; @@ -1407,9 +1400,6 @@ static __init int threshold_init_device( { unsigned lcpu = 0;
- if (mce_threshold_vector == amd_threshold_interrupt) - thresholding_en = true; - /* to hit CPUs online before the notifier is up */ for_each_online_cpu(lcpu) { int err = mce_threshold_create_device(lcpu); @@ -1418,6 +1408,9 @@ static __init int threshold_init_device( return err; }
+ if (thresholding_irq_en) + mce_threshold_vector = amd_threshold_interrupt; + return 0; } /*
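[ Aside: the shape of the fix is "initialize, then publish": the interrupt vector is only pointed at the AMD handler after the banks it dereferences exist. A minimal single-file sketch of that ordering with hypothetical names: ]

#include <stdio.h>
#include <stdlib.h>

static int *banks;			/* data the handler consumes */
static void (*threshold_vector)(void);	/* published entry point */

static void default_handler(void) { puts("spurious, ignored"); }
static void amd_handler(void)     { printf("bank0=%d\n", banks[0]); }

int main(void)
{
	threshold_vector = default_handler;

	/* Buggy order: publishing amd_handler here would let an early
	 * "interrupt" dereference banks while it is still NULL. */

	banks = calloc(4, sizeof(*banks));	/* initialize first ... */
	if (!banks)
		return 1;
	threshold_vector = amd_handler;		/* ... publish second */

	threshold_vector();
	return 0;
}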
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
commit 68239654acafe6aad5a3c1dc7237e60accfebc03 upstream.
The sequence
fpu->initialized = 1; /* step A */ preempt_disable(); /* step B */ fpu__restore(fpu); preempt_enable();
in __fpu__restore_sig() is racy in regard to a context switch.
For 32bit frames, __fpu__restore_sig() prepares the FPU state within fpu->state. To ensure that a context switch (switch_fpu_prepare() in particular) does not modify fpu->state it uses fpu__drop() which sets fpu->initialized to 0.
After fpu->initialized is cleared, the CPU's FPU state is not saved to fpu->state during a context switch. The new state is loaded via fpu__restore(). It gets loaded into fpu->state from userland and ensured it is sane. fpu->initialized is then set to 1 in order to avoid fpu__initialize() doing anything (overwrite the new state) which is part of fpu__restore().
A context switch between step A and B above would save CPU's current FPU registers to fpu->state and overwrite the newly prepared state. This looks like a tiny race window but the Kernel Test Robot reported this back in 2016 while we had lazy FPU support. Borislav Petkov made the link between that report and another patch that has been posted. Since the removal of the lazy FPU support, this race goes unnoticed because the warning has been removed.
Disable bottom halves around the restore sequence to avoid the race. BH need to be disabled because BH is allowed to run (even with preemption disabled) and might invoke kernel_fpu_begin() by doing IPsec.
[ bp: massage commit message a bit. ]
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Ingo Molnar mingo@kernel.org Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Andy Lutomirski luto@kernel.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: "Jason A. Donenfeld" Jason@zx2c4.com Cc: kvm ML kvm@vger.kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Radim Krčmář rkrcmar@redhat.com Cc: Rik van Riel riel@surriel.com Cc: stable@vger.kernel.org Cc: x86-ml x86@kernel.org Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/fpu/signal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -344,10 +344,10 @@ static int __fpu__restore_sig(void __use sanitize_restored_xstate(tsk, &env, xfeatures, fx_only); }
+ local_bh_disable(); fpu->initialized = 1; - preempt_disable(); fpu__restore(fpu); - preempt_enable(); + local_bh_enable();
return err; } else {
Greg Kroah-Hartman wrote:
commit 68239654acafe6aad5a3c1dc7237e60accfebc03 upstream.
The sequence
fpu->initialized = 1; /* step A */ preempt_disable(); /* step B */ fpu__restore(fpu); preempt_enable();
in __fpu__restore_sig() is racy in regard to a context switch.
That same race appears to be present in older kernel branches also. The context is slightly different, so the patch for 4.14 does not apply cleanly to older kernels. For the 4.9 branch, this edit works:
s/fpu->initialized/fpu->fpstate_active/
--- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -342,10 +342,10 @@ static int __fpu__restore_sig(void __use sanitize_restored_xstate(tsk, &env, xfeatures, fx_only); }
+ local_bh_disable(); fpu->fpstate_active = 1; - preempt_disable(); fpu__restore(fpu); - preempt_enable(); + local_bh_enable();
return err; } else {
On Wed, Dec 05, 2018 at 06:26:24PM +0200, Jari Ruusu wrote:
That same race appears to be present in older kernel branches also. The context is slightly different, so the patch for 4.14 does not apply cleanly to older kernels. For the 4.9 branch, this edit works:
You could take the upstream one, amend it with your change, test it and send it to Greg - I believe he'll take the backport gladly.
:-)
On Wed, Dec 05, 2018 at 08:00:20PM +0100, Borislav Petkov wrote:
On Wed, Dec 05, 2018 at 06:26:24PM +0200, Jari Ruusu wrote:
That same race appears to be present in older kernel branches also. The context is slightly different, so the patch for 4.14 does not apply cleanly to older kernels. For the 4.9 branch, this edit works:
You could take the upstream one, amend it with your change, test it and send it to Greg - I believe he'll take the backport gladly.
:-)
Yes, that's the easiest way for me to accept such a patch, otherwise it gets put on the end of the very-long-queue...
thanks,
greg k-h
The sequence
fpu->initialized = 1; /* step A */ preempt_disable(); /* step B */ fpu__restore(fpu); preempt_enable();
in __fpu__restore_sig() is racy in regard to a context switch.
For 32bit frames, __fpu__restore_sig() prepares the FPU state within fpu->state. To ensure that a context switch (switch_fpu_prepare() in particular) does not modify fpu->state it uses fpu__drop() which sets fpu->initialized to 0.
After fpu->initialized is cleared, the CPU's FPU state is not saved to fpu->state during a context switch. The new state is loaded via fpu__restore(). It gets loaded into fpu->state from userland and ensured it is sane. fpu->initialized is then set to 1 in order to avoid fpu__initialize() doing anything (overwrite the new state) which is part of fpu__restore().
A context switch between step A and B above would save CPU's current FPU registers to fpu->state and overwrite the newly prepared state. This looks like a tiny race window but the Kernel Test Robot reported this back in 2016 while we had lazy FPU support. Borislav Petkov made the link between that report and another patch that has been posted. Since the removal of the lazy FPU support, this race goes unnoticed because the warning has been removed.
Disable bottom halves around the restore sequence to avoid the race. BH need to be disabled because BH is allowed to run (even with preemption disabled) and might invoke kernel_fpu_begin() by doing IPsec.
[ bp: massage commit message a bit. ]
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Ingo Molnar mingo@kernel.org Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Andy Lutomirski luto@kernel.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: "Jason A. Donenfeld" Jason@zx2c4.com Cc: kvm ML kvm@vger.kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Radim Krčmář rkrcmar@redhat.com Cc: Rik van Riel riel@surriel.com Cc: stable@vger.kernel.org Cc: x86-ml x86@kernel.org Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de --- arch/x86/kernel/fpu/signal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c index ae52ef05d0981..769831d9fd114 100644 --- a/arch/x86/kernel/fpu/signal.c +++ b/arch/x86/kernel/fpu/signal.c @@ -342,10 +342,10 @@ static int __fpu__restore_sig(void __user *buf, void __user *buf_fx, int size) sanitize_restored_xstate(tsk, &env, xfeatures, fx_only); }
+ local_bh_disable(); fpu->fpstate_active = 1; - preempt_disable(); fpu__restore(fpu); - preempt_enable(); + local_bh_enable();
return err; } else {
On Fri, Dec 21, 2018 at 05:23:38PM +0100, Sebastian Andrzej Siewior wrote:
The sequence
fpu->initialized = 1; /* step A */ preempt_disable(); /* step B */ fpu__restore(fpu); preempt_enable();
in __fpu__restore_sig() is racy in regard to a context switch.
For 32bit frames, __fpu__restore_sig() prepares the FPU state within fpu->state. To ensure that a context switch (switch_fpu_prepare() in particular) does not modify fpu->state it uses fpu__drop() which sets fpu->initialized to 0.
After fpu->initialized is cleared, the CPU's FPU state is not saved to fpu->state during a context switch. The new state is loaded via fpu__restore(). It gets loaded into fpu->state from userland and ensured it is sane. fpu->initialized is then set to 1 in order to avoid fpu__initialize() doing anything (overwrite the new state) which is part of fpu__restore().
A context switch between step A and B above would save CPU's current FPU registers to fpu->state and overwrite the newly prepared state. This looks like a tiny race window but the Kernel Test Robot reported this back in 2016 while we had lazy FPU support. Borislav Petkov made the link between that report and another patch that has been posted. Since the removal of the lazy FPU support, this race goes unnoticed because the warning has been removed.
Disable bottom halves around the restore sequence to avoid the race. BH need to be disabled because BH is allowed to run (even with preemption disabled) and might invoke kernel_fpu_begin() by doing IPsec.
[ bp: massage commit message a bit. ]
Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Ingo Molnar mingo@kernel.org Acked-by: Thomas Gleixner tglx@linutronix.de Cc: Andy Lutomirski luto@kernel.org Cc: Dave Hansen dave.hansen@linux.intel.com Cc: "H. Peter Anvin" hpa@zytor.com Cc: "Jason A. Donenfeld" Jason@zx2c4.com Cc: kvm ML kvm@vger.kernel.org Cc: Paolo Bonzini pbonzini@redhat.com Cc: Radim Krčmář rkrcmar@redhat.com Cc: Rik van Riel riel@surriel.com Cc: stable@vger.kernel.org Cc: x86-ml x86@kernel.org Link: http://lkml.kernel.org/r/20181120102635.ddv3fvavxajjlfqk@linutronix.de Link: https://lkml.kernel.org/r/20160226074940.GA28911@pd.tnic Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de
arch/x86/kernel/fpu/signal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
What is the git commit id of this patch upstream?
thanks,
greg k-h
On 2018-12-21 17:29:05 [+0100], Greg Kroah-Hartman wrote:
What is the git commit id of this patch upstream?
commit 68239654acafe6aad5a3c1dc7237e60accfebc03 upstream.
I'm sorry. I cherry picked the original commit, resolved the conflict and forgot about the original commit id…
thanks,
greg k-h
Sebastian
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
commit ed6101bbf6266ee83e620b19faa7c6ad56bb41ab upstream.
Moving the branch tracing setup into a separate intel_pmu_bts_config() function in the Intel core object, because it's Intel-specific.
Suggested-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Jiri Olsa jolsa@kernel.org Acked-by: Peter Zijlstra a.p.zijlstra@chello.nl Cc: stable@vger.kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: http://lkml.kernel.org/r/20181121101612.16272-1-jolsa@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/core.c | 20 -------------------- arch/x86/events/intel/core.c | 41 ++++++++++++++++++++++++++++++++++++++++- 2 files changed, 40 insertions(+), 21 deletions(-)
--- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -438,26 +438,6 @@ int x86_setup_perfctr(struct perf_event if (config == -1LL) return -EINVAL;
- /* - * Branch tracing: - */ - if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS && - !attr->freq && hwc->sample_period == 1) { - /* BTS is not supported by this architecture. */ - if (!x86_pmu.bts_active) - return -EOPNOTSUPP; - - /* BTS is currently only allowed for user-mode. */ - if (!attr->exclude_kernel) - return -EOPNOTSUPP; - - /* disallow bts if conflicting events are present */ - if (x86_add_exclusive(x86_lbr_exclusive_lbr)) - return -EBUSY; - - event->destroy = hw_perf_lbr_event_destroy; - } - hwc->config |= config;
return 0; --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2973,6 +2973,41 @@ static unsigned long intel_pmu_free_runn return flags; }
+static int intel_pmu_bts_config(struct perf_event *event) +{ + struct perf_event_attr *attr = &event->attr; + struct hw_perf_event *hwc = &event->hw; + + if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS && + !attr->freq && hwc->sample_period == 1) { + /* BTS is not supported by this architecture. */ + if (!x86_pmu.bts_active) + return -EOPNOTSUPP; + + /* BTS is currently only allowed for user-mode. */ + if (!attr->exclude_kernel) + return -EOPNOTSUPP; + + /* disallow bts if conflicting events are present */ + if (x86_add_exclusive(x86_lbr_exclusive_lbr)) + return -EBUSY; + + event->destroy = hw_perf_lbr_event_destroy; + } + + return 0; +} + +static int core_pmu_hw_config(struct perf_event *event) +{ + int ret = x86_pmu_hw_config(event); + + if (ret) + return ret; + + return intel_pmu_bts_config(event); +} + static int intel_pmu_hw_config(struct perf_event *event) { int ret = x86_pmu_hw_config(event); @@ -2980,6 +3015,10 @@ static int intel_pmu_hw_config(struct pe if (ret) return ret;
+ ret = intel_pmu_bts_config(event); + if (ret) + return ret; + if (event->attr.precise_ip) { if (!event->attr.freq) { event->hw.flags |= PERF_X86_EVENT_AUTO_RELOAD; @@ -3462,7 +3501,7 @@ static __initconst const struct x86_pmu .enable_all = core_pmu_enable_all, .enable = core_pmu_enable_event, .disable = x86_pmu_disable_event, - .hw_config = x86_pmu_hw_config, + .hw_config = core_pmu_hw_config, .schedule_events = x86_schedule_events, .eventsel = MSR_ARCH_PERFMON_EVENTSEL0, .perfctr = MSR_ARCH_PERFMON_PERFCTR0,
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
commit 67266c1080ad56c31af72b9c18355fde8ccc124a upstream.
Currently we check the branch tracing only by checking for the PERF_COUNT_HW_BRANCH_INSTRUCTIONS event of PERF_TYPE_HARDWARE type. But we can define the same event with the PERF_TYPE_RAW type.
Changing the intel_pmu_has_bts() code to check the event's final hw config value, so both HW types are covered.
Adding unlikely() to the intel_pmu_has_bts() condition calls, because it was used in the original code in intel_bts_constraints().
Signed-off-by: Jiri Olsa jolsa@kernel.org Acked-by: Peter Zijlstra a.p.zijlstra@chello.nl Cc: stable@vger.kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Arnaldo Carvalho de Melo acme@kernel.org Cc: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Stephane Eranian eranian@google.com Cc: Thomas Gleixner tglx@linutronix.de Cc: Vince Weaver vincent.weaver@maine.edu Link: http://lkml.kernel.org/r/20181121101612.16272-2-jolsa@kernel.org Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/intel/core.c | 17 +++-------------- arch/x86/events/perf_event.h | 13 +++++++++---- 2 files changed, 12 insertions(+), 18 deletions(-)
--- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2345,16 +2345,7 @@ done: static struct event_constraint * intel_bts_constraints(struct perf_event *event) { - struct hw_perf_event *hwc = &event->hw; - unsigned int hw_event, bts_event; - - if (event->attr.freq) - return NULL; - - hw_event = hwc->config & INTEL_ARCH_EVENT_MASK; - bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS); - - if (unlikely(hw_event == bts_event && hwc->sample_period == 1)) + if (unlikely(intel_pmu_has_bts(event))) return &bts_constraint;
return NULL; @@ -2976,10 +2967,8 @@ static unsigned long intel_pmu_free_runn static int intel_pmu_bts_config(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; - struct hw_perf_event *hwc = &event->hw;
- if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS && - !attr->freq && hwc->sample_period == 1) { + if (unlikely(intel_pmu_has_bts(event))) { /* BTS is not supported by this architecture. */ if (!x86_pmu.bts_active) return -EOPNOTSUPP; @@ -3038,7 +3027,7 @@ static int intel_pmu_hw_config(struct pe /* * BTS is set up earlier in this path, so don't account twice */ - if (!intel_pmu_has_bts(event)) { + if (!unlikely(intel_pmu_has_bts(event))) { /* disallow lbr if conflicting events are present */ if (x86_add_exclusive(x86_lbr_exclusive_lbr)) return -EBUSY; --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -850,11 +850,16 @@ static inline int amd_pmu_init(void)
static inline bool intel_pmu_has_bts(struct perf_event *event) { - if (event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS && - !event->attr.freq && event->hw.sample_period == 1) - return true; + struct hw_perf_event *hwc = &event->hw; + unsigned int hw_event, bts_event;
- return false; + if (event->attr.freq) + return false; + + hw_event = hwc->config & INTEL_ARCH_EVENT_MASK; + bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS); + + return hw_event == bts_event && hwc->sample_period == 1; }
int intel_pmu_save_and_restart(struct perf_event *event);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maximilian Heyne mheyne@amazon.de
commit 41e817bca3acd3980efe5dd7d28af0e6f4ab9247 upstream.
commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the generic_write_sync prototype") reworked callers of generic_write_sync(), and ended up dropping the error return for the directio path. Prior to that commit, in dio_complete(), an error would be bubbled up the stack, but after that commit, errors passed on to dio_complete were eaten up.
This was reported on the list earlier, and a fix was proposed in https://lore.kernel.org/lkml/20160921141539.GA17898@infradead.org/, but was never followed up on. We recently hit this bug in our testing, where fencing I/O errors, which were previously erroring out with EIO, were being returned as successful operations after this commit.
The fix proposed on the list earlier was a little short -- it would have still called generic_write_sync() in case `ret` already contained an error. This fix ensures generic_write_sync() is only called when there's no pending error in the write. Additionally, transferred is replaced with ret to bring this code in line with other callers.
Fixes: e259221763a4 ("fs: simplify the generic_write_sync prototype") Reported-by: Ravi Nankani rnankani@amazon.com Signed-off-by: Maximilian Heyne mheyne@amazon.de Reviewed-by: Christoph Hellwig hch@lst.de CC: Torsten Mehlan tomeh@amazon.de CC: Uwe Dannowski uwed@amazon.de CC: Amit Shah aams@amazon.de CC: David Woodhouse dwmw@amazon.co.uk CC: stable@vger.kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/direct-io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -304,8 +304,8 @@ static ssize_t dio_complete(struct dio * */ dio->iocb->ki_pos += transferred;
- if (dio->op == REQ_OP_WRITE) - ret = generic_write_sync(dio->iocb, transferred); + if (ret > 0 && dio->op == REQ_OP_WRITE) + ret = generic_write_sync(dio->iocb, ret); dio->iocb->ki_complete(dio->iocb, ret, 0); }
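[ Aside: reduced to its essence this is an error-code overwrite: an unconditional ret = f() clobbers an earlier failure with f()'s return value. The fix gates the follow-up call on success. A sketch with hypothetical stand-ins: ]

#include <stdio.h>

static long do_write(void)  { return -5; }	/* pretend -EIO */
static long do_sync(long n) { return n; }	/* pretend generic_write_sync() */

int main(void)
{
	long ret = do_write();

	/* Buggy form: ret = do_sync(transferred); would silently turn
	 * the -EIO above into a success value. */
	if (ret > 0)
		ret = do_sync(ret);

	printf("completion result: %ld\n", ret);
	return 0;
}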
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 7b69154171b407844c273ab4c10b5f0ddcd6aa29 upstream.
Some spurious calls of snd_free_pages() have been overlooked and remain in the error paths of the wss driver code. Since runtime->dma_area is managed by the PCM core helper, we shouldn't release it manually.
Drop the superfluous calls.
Reviewed-by: Takashi Sakamoto o-takashi@sakamocchi.jp Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/isa/wss/wss_lib.c | 2 -- 1 file changed, 2 deletions(-)
--- a/sound/isa/wss/wss_lib.c +++ b/sound/isa/wss/wss_lib.c @@ -1531,7 +1531,6 @@ static int snd_wss_playback_open(struct if (err < 0) { if (chip->release_dma) chip->release_dma(chip, chip->dma_private_data, chip->dma1); - snd_free_pages(runtime->dma_area, runtime->dma_bytes); return err; } chip->playback_substream = substream; @@ -1572,7 +1571,6 @@ static int snd_wss_capture_open(struct s if (err < 0) { if (chip->release_dma) chip->release_dma(chip, chip->dma_private_data, chip->dma2); - snd_free_pages(runtime->dma_area, runtime->dma_bytes); return err; } chip->capture_substream = substream;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 7194eda1ba0872d917faf3b322540b4f57f11ba5 upstream.
The function snd_ac97_put_spsa() gets the bit shift value from the associated private_value, but it extracts too much; the current code extracts 8 bit values in bits 8-15, but this is a combination of two nibbles (bits 8-11 and bits 12-15) for left and right shifts. Due to the incorrect bits extraction, the actual shift may go beyond the 32bit value, as spotted recently by UBSAN check: UBSAN: Undefined behaviour in sound/pci/ac97/ac97_codec.c:836:7 shift exponent 68 is too large for 32-bit type 'int'
This patch fixes the shift value extraction by masking the value properly with 0x0f instead of 0xff.
Reported-and-tested-by: Meelis Roos mroos@linux.ee Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/ac97/ac97_codec.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/pci/ac97/ac97_codec.c +++ b/sound/pci/ac97/ac97_codec.c @@ -824,7 +824,7 @@ static int snd_ac97_put_spsa(struct snd_ { struct snd_ac97 *ac97 = snd_kcontrol_chip(kcontrol); int reg = kcontrol->private_value & 0xff; - int shift = (kcontrol->private_value >> 8) & 0xff; + int shift = (kcontrol->private_value >> 8) & 0x0f; int mask = (kcontrol->private_value >> 16) & 0xff; // int invert = (kcontrol->private_value >> 24) & 0xff; unsigned short value, old, new;
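[ Aside: the bug class is extracting a packed field with too wide a mask. private_value keeps two 4-bit shifts at bits 8-11 and 12-15, so masking with 0xff merges both nibbles into one bogus shift. A standalone sketch of the decode; the packed values are illustrative: ]

#include <stdio.h>

int main(void)
{
	/* reg in bits 0-7, left shift in bits 8-11, right shift in
	 * bits 12-15 -- the layout the patch above assumes. */
	unsigned long pv = 0x2c | (4UL << 8) | (6UL << 12);

	int reg         = pv & 0xff;
	int shift_bad   = (pv >> 8) & 0xff;	/* 0x64: both nibbles */
	int shift_left  = (pv >> 8) & 0x0f;	/* 4, as intended */
	int shift_right = (pv >> 12) & 0x0f;	/* 6 */

	printf("reg=%#x bad=%#x left=%d right=%d\n",
	       reg, shift_bad, shift_left, shift_right);
	return 0;
}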
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit e1a7bfe3807974e66f971f2589d4e0197ec0fced upstream.
The procedure for adding a user control element has a window open for a race against the concurrent removal of a user element. This was caught by syzkaller, hitting a KASAN use-after-free error.
This patch addresses the bug by wrapping the whole procedure to add a user control element with the card->controls_rwsem, instead of only around the increment of card->user_ctl_count.
This required a slight code refactoring, too. The function snd_ctl_add() is split into two parts: a core function to add the control element and a wrapper that calls it. The former is called from the function for adding a user control element inside the controls_rwsem.
One change to be noted is that snd_ctl_notify() for adding a control element gets called inside the controls_rwsem as well, whereas previously it was called outside the rwsem. But this should be OK, as snd_ctl_notify() takes another (finer) rwlock instead of the rwsem, and the call of snd_ctl_notify() inside the rwsem is already done in another code path.
Reported-by: syzbot+dc09047bce3820621ba2@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/control.c | 80 ++++++++++++++++++++++++++++----------------------- 1 file changed, 45 insertions(+), 35 deletions(-)
--- a/sound/core/control.c +++ b/sound/core/control.c @@ -347,6 +347,40 @@ static int snd_ctl_find_hole(struct snd_ return 0; }
+/* add a new kcontrol object; call with card->controls_rwsem locked */ +static int __snd_ctl_add(struct snd_card *card, struct snd_kcontrol *kcontrol) +{ + struct snd_ctl_elem_id id; + unsigned int idx; + unsigned int count; + + id = kcontrol->id; + if (id.index > UINT_MAX - kcontrol->count) + return -EINVAL; + + if (snd_ctl_find_id(card, &id)) { + dev_err(card->dev, + "control %i:%i:%i:%s:%i is already present\n", + id.iface, id.device, id.subdevice, id.name, id.index); + return -EBUSY; + } + + if (snd_ctl_find_hole(card, kcontrol->count) < 0) + return -ENOMEM; + + list_add_tail(&kcontrol->list, &card->controls); + card->controls_count += kcontrol->count; + kcontrol->id.numid = card->last_numid + 1; + card->last_numid += kcontrol->count; + + id = kcontrol->id; + count = kcontrol->count; + for (idx = 0; idx < count; idx++, id.index++, id.numid++) + snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_ADD, &id); + + return 0; +} + /** * snd_ctl_add - add the control instance to the card * @card: the card instance @@ -363,45 +397,18 @@ static int snd_ctl_find_hole(struct snd_ */ int snd_ctl_add(struct snd_card *card, struct snd_kcontrol *kcontrol) { - struct snd_ctl_elem_id id; - unsigned int idx; - unsigned int count; int err = -EINVAL;
if (! kcontrol) return err; if (snd_BUG_ON(!card || !kcontrol->info)) goto error; - id = kcontrol->id; - if (id.index > UINT_MAX - kcontrol->count) - goto error;
down_write(&card->controls_rwsem); - if (snd_ctl_find_id(card, &id)) { - up_write(&card->controls_rwsem); - dev_err(card->dev, "control %i:%i:%i:%s:%i is already present\n", - id.iface, - id.device, - id.subdevice, - id.name, - id.index); - err = -EBUSY; - goto error; - } - if (snd_ctl_find_hole(card, kcontrol->count) < 0) { - up_write(&card->controls_rwsem); - err = -ENOMEM; - goto error; - } - list_add_tail(&kcontrol->list, &card->controls); - card->controls_count += kcontrol->count; - kcontrol->id.numid = card->last_numid + 1; - card->last_numid += kcontrol->count; - id = kcontrol->id; - count = kcontrol->count; + err = __snd_ctl_add(card, kcontrol); up_write(&card->controls_rwsem); - for (idx = 0; idx < count; idx++, id.index++, id.numid++) - snd_ctl_notify(card, SNDRV_CTL_EVENT_MASK_ADD, &id); + if (err < 0) + goto error; return 0;
error: @@ -1360,9 +1367,12 @@ static int snd_ctl_elem_add(struct snd_c kctl->tlv.c = snd_ctl_elem_user_tlv;
/* This function manage to free the instance on failure. */ - err = snd_ctl_add(card, kctl); - if (err < 0) - return err; + down_write(&card->controls_rwsem); + err = __snd_ctl_add(card, kctl); + if (err < 0) { + snd_ctl_free_one(kctl); + goto unlock; + } offset = snd_ctl_get_ioff(kctl, &info->id); snd_ctl_build_ioff(&info->id, kctl, offset); /* @@ -1373,10 +1383,10 @@ static int snd_ctl_elem_add(struct snd_c * which locks the element. */
- down_write(&card->controls_rwsem); card->user_ctl_count++; - up_write(&card->controls_rwsem);
+ unlock: + up_write(&card->controls_rwsem); return 0; }
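[ Aside: structurally this is the usual cure for a check-then-act race: both the duplicate lookup and the insertion must sit inside one write-lock section, so no concurrent removal can interleave between them. A pthreads sketch of the same shape with hypothetical names: ]

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t controls_rwsem = PTHREAD_RWLOCK_INITIALIZER;
static int have_elem;	/* stands in for the control list lookup */

static int add_elem(void)
{
	int err = 0;

	/* Check and insert under a single write lock -- the analogue
	 * of __snd_ctl_add() running inside card->controls_rwsem. */
	pthread_rwlock_wrlock(&controls_rwsem);
	if (have_elem)
		err = -16;	/* -EBUSY: already present */
	else
		have_elem = 1;
	pthread_rwlock_unlock(&controls_rwsem);

	return err;
}

int main(void)
{
	int first = add_elem();
	int second = add_elem();

	printf("first=%d second=%d\n", first, second);
	return 0;
}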
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit 9a20332ab373b1f8f947e0a9c923652b32dab031 upstream.
Some spurious calls of snd_free_pages() have been overlooked and remain in the error paths of the sparc cs4231 driver code. Since runtime->dma_area is managed by the PCM core helper, we shouldn't release it manually.
Drop the superfluous calls.
Reviewed-by: Takashi Sakamoto o-takashi@sakamocchi.jp Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/sparc/cs4231.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/sound/sparc/cs4231.c +++ b/sound/sparc/cs4231.c @@ -1146,10 +1146,8 @@ static int snd_cs4231_playback_open(stru runtime->hw = snd_cs4231_playback;
err = snd_cs4231_open(chip, CS4231_MODE_PLAY); - if (err < 0) { - snd_free_pages(runtime->dma_area, runtime->dma_bytes); + if (err < 0) return err; - } chip->playback_substream = substream; chip->p_periods_sent = 0; snd_pcm_set_sync(substream); @@ -1167,10 +1165,8 @@ static int snd_cs4231_capture_open(struc runtime->hw = snd_cs4231_capture;
err = snd_cs4231_open(chip, CS4231_MODE_RECORD); - if (err < 0) { - snd_free_pages(runtime->dma_area, runtime->dma_bytes); + if (err < 0) return err; - } chip->capture_substream = substream; chip->c_periods_sent = 0; snd_pcm_set_sync(substream);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kailang Yang kailang@realtek.com
commit 1078bef0cd9291355a20369b21cd823026ab8eaa upstream.
This patch will enable ALC300.
[ It's almost equivalent to other ALC269-compatible ones, and apparently has no loopback mixer -- tiwai ]
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -343,6 +343,7 @@ static void alc_fill_eapd_coef(struct hd case 0x10ec0285: case 0x10ec0298: case 0x10ec0289: + case 0x10ec0300: alc_update_coef_idx(codec, 0x10, 1<<9, 0); break; case 0x10ec0275: @@ -2758,6 +2759,7 @@ enum { ALC269_TYPE_ALC215, ALC269_TYPE_ALC225, ALC269_TYPE_ALC294, + ALC269_TYPE_ALC300, ALC269_TYPE_ALC700, };
@@ -2792,6 +2794,7 @@ static int alc269_parse_auto_config(stru case ALC269_TYPE_ALC215: case ALC269_TYPE_ALC225: case ALC269_TYPE_ALC294: + case ALC269_TYPE_ALC300: case ALC269_TYPE_ALC700: ssids = alc269_ssids; break; @@ -7089,6 +7092,10 @@ static int patch_alc269(struct hda_codec spec->gen.mixer_nid = 0; /* ALC2x4 does not have any loopback mixer path */ alc_update_coef_idx(codec, 0x6b, 0x0018, (1<<4) | (1<<3)); /* UAJ MIC Vref control by verb */ break; + case 0x10ec0300: + spec->codec_variant = ALC269_TYPE_ALC300; + spec->gen.mixer_nid = 0; /* no loopback on ALC300 */ + break; case 0x10ec0700: case 0x10ec0701: case 0x10ec0703: @@ -8160,6 +8167,7 @@ static const struct hda_device_id snd_hd HDA_CODEC_ENTRY(0x10ec0295, "ALC295", patch_alc269), HDA_CODEC_ENTRY(0x10ec0298, "ALC298", patch_alc269), HDA_CODEC_ENTRY(0x10ec0299, "ALC299", patch_alc269), + HDA_CODEC_ENTRY(0x10ec0300, "ALC300", patch_alc269), HDA_CODEC_REV_ENTRY(0x10ec0861, 0x100340, "ALC660", patch_alc861), HDA_CODEC_ENTRY(0x10ec0660, "ALC660-VD", patch_alc861vd), HDA_CODEC_ENTRY(0x10ec0861, "ALC861", patch_alc861),
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anisse Astier anisse@astier.eu
commit 8cd65271f8e545ddeed10ecc2e417936bdff168e upstream.
MSI Cubi N 8GL (MS-B171) needs the same fixup as its older model, the MS-B120, in order for the headset mic to be properly detected.
They both use a single 3-way jack for both mic and headset with an ALC283 codec, with the same pins used.
Cc: stable@vger.kernel.org Signed-off-by: Anisse Astier anisse@astier.eu Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_realtek.c | 1 +
 1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6411,6 +6411,7 @@ static const struct snd_pci_quirk alc269
     SND_PCI_QUIRK(0x144d, 0xc740, "Samsung Ativ book 8 (NP870Z5G)", ALC269_FIXUP_ATIV_BOOK_8),
     SND_PCI_QUIRK(0x1458, 0xfa53, "Gigabyte BXBT-2807", ALC283_FIXUP_HEADSET_MIC),
     SND_PCI_QUIRK(0x1462, 0xb120, "MSI Cubi MS-B120", ALC283_FIXUP_HEADSET_MIC),
+    SND_PCI_QUIRK(0x1462, 0xb171, "Cubi N 8GL (MS-B171)", ALC283_FIXUP_HEADSET_MIC),
     SND_PCI_QUIRK(0x17aa, 0x1036, "Lenovo P520", ALC233_FIXUP_LENOVO_MULTI_CODECS),
     SND_PCI_QUIRK(0x17aa, 0x20f2, "Thinkpad SL410/510", ALC269_FIXUP_SKU_IGNORE),
     SND_PCI_QUIRK(0x17aa, 0x215e, "Thinkpad L512", ALC269_FIXUP_SKU_IGNORE),
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pan Bian bianpan2016@163.com
commit ecebf55d27a11538ea84aee0be643dd953f830d5 upstream.
The function ext2_xattr_set calls brelse(bh) to drop the reference count of bh. After that, bh may be freed. However, following brelse(bh), it reads bh->b_data via the macro HDR(bh). This may result in a use-after-free bug. This patch moves the brelse(bh) call after the field has been read.
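In miniature, the ordering that matters (a sketch of the pattern; HDR(bh) dereferences bh->b_data):

    /* buggy: bh, and thus bh->b_data, may be gone when HDR(bh) runs */
    brelse(bh);
    if (!(bh && header == HDR(bh)))
        kfree(header);

    /* fixed: finish every read of bh->b_data, then drop the reference */
    if (!(bh && header == HDR(bh)))
        kfree(header);
    brelse(bh);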
CC: stable@vger.kernel.org Signed-off-by: Pan Bian bianpan2016@163.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext2/xattr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/ext2/xattr.c
+++ b/fs/ext2/xattr.c
@@ -612,9 +612,9 @@ skip_replace:
     }
 
 cleanup:
-    brelse(bh);
     if (!(bh && header == HDR(bh)))
         kfree(header);
+    brelse(bh);
     up_write(&EXT2_I(inode)->xattr_sem);
 
     return error;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heiko Stuebner heiko@sntech.de
commit 672e60b72bbe7aace88721db55b380b6a51fb8f9 upstream.
The Coreboot version on veyron ChromeOS devices seems to ignore memory@0 nodes when updating the available memory and instead inserts another memory node without the address.
This leads to 4GB systems only ever using 2GB, as the memory@0 node takes precedence. So remove the @0 for veyron devices.
Fixes: 0b639b815f15 ("ARM: dts: rockchip: Add missing unit name to memory nodes in rk3288 boards") Cc: stable@vger.kernel.org Reported-by: Heikki Lindholm holin@iki.fi Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm/boot/dts/rk3288-veyron.dtsi | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

--- a/arch/arm/boot/dts/rk3288-veyron.dtsi
+++ b/arch/arm/boot/dts/rk3288-veyron.dtsi
@@ -47,7 +47,11 @@
 #include "rk3288.dtsi"
 
 / {
-    memory@0 {
+    /*
+     * The default coreboot on veyron devices ignores memory@0 nodes
+     * and would instead create another memory node.
+     */
+    memory {
         device_type = "memory";
         reg = <0x0 0x0 0x0 0x80000000>;
     };
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Genoud richard.genoud@gmail.com
commit 98f5f932254b88ce828bc8e4d1642d14e5854caa upstream.
The leak was found when opening/closing a serial port a great number of times, increasing kmalloc-32 in slabinfo.
Each time the port was opened, dma_request_slave_channel() was called. Then, in at_dma_xlate(), atslave was allocated with devm_kzalloc() and never freed. (Well, it was freed at module unload, but that's not what we want.) So kzalloc() is better suited for the job here, since the allocation has to be freed in atc_free_chan_resources().
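A sketch of how the leak accumulates (illustrative; the "tx" channel name is made up): devm_kzalloc() ties each allocation to the DMA controller device, which outlives every open/close cycle of the port.

    chan = dma_request_slave_channel(dev, "tx"); /* at_dma_xlate() allocates atslave     */
    dma_release_channel(chan);                   /* with devm_kzalloc() nothing frees it;
                                                  * with kzalloc(), atc_free_chan_resources()
                                                  * can kfree(chan->private) */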
Cc: stable@vger.kernel.org Fixes: bbe89c8e3d59 ("at_hdmac: move to generic DMA binding") Reported-by: Mario Forner m.forner@be4energy.com Suggested-by: Alexandre Belloni alexandre.belloni@bootlin.com Acked-by: Alexandre Belloni alexandre.belloni@bootlin.com Acked-by: Ludovic Desroches ludovic.desroches@microchip.com Signed-off-by: Richard Genoud richard.genoud@gmail.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/dma/at_hdmac.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -1641,6 +1641,12 @@ static void atc_free_chan_resources(stru
     atchan->descs_allocated = 0;
     atchan->status = 0;
 
+    /*
+     * Free atslave allocated in at_dma_xlate()
+     */
+    kfree(chan->private);
+    chan->private = NULL;
+
     dev_vdbg(chan2dev(chan), "free_chan_resources: done\n");
 }
 
@@ -1675,7 +1681,7 @@ static struct dma_chan *at_dma_xlate(str
     dma_cap_zero(mask);
     dma_cap_set(DMA_SLAVE, mask);
 
-    atslave = devm_kzalloc(&dmac_pdev->dev, sizeof(*atslave), GFP_KERNEL);
+    atslave = kzalloc(sizeof(*atslave), GFP_KERNEL);
     if (!atslave)
         return NULL;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Genoud richard.genoud@gmail.com
commit 77e75fda94d2ebb86aa9d35fb1860f6395bf95de upstream.
of_dma_controller_free() was not called on module unloading. This led to a soft lockup:
watchdog: BUG: soft lockup - CPU#0 stuck for 23s!
Modules linked in: at_hdmac [last unloaded: at_hdmac]
when of_dma_request_slave_channel() tried to call ofdma->of_dma_xlate().
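The rule being restored, sketched: a registration made in probe() must be undone in remove(), otherwise the OF DMA core keeps a callback that points into unloaded module text.

    /* probe(): publish the xlate callback for this controller node */
    err = of_dma_controller_register(pdev->dev.of_node, at_dma_xlate, atdma);

    /* remove(): retract it, or a later of_dma_request_slave_channel()
     * jumps through ofdma->of_dma_xlate() into freed memory */
    if (pdev->dev.of_node)
        of_dma_controller_free(pdev->dev.of_node);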
Cc: stable@vger.kernel.org Fixes: bbe89c8e3d59 ("at_hdmac: move to generic DMA binding") Acked-by: Ludovic Desroches ludovic.desroches@microchip.com Signed-off-by: Richard Genoud richard.genoud@gmail.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/dma/at_hdmac.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/drivers/dma/at_hdmac.c
+++ b/drivers/dma/at_hdmac.c
@@ -2006,6 +2006,8 @@ static int at_dma_remove(struct platform
     struct resource *io;
 
     at_dma_off(atdma);
+    if (pdev->dev.of_node)
+        of_dma_controller_free(pdev->dev.of_node);
     dma_async_device_unregister(&atdma->dma_common);
 
     dma_pool_destroy(atdma->memset_pool);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------

From: Josef Bacik josef@toxicpanda.com
We want to release the unused reservation we have since it refills the delayed refs reserve, which will make everything go smoother when running the delayed refs if we're short on our reservation.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Omar Sandoval osandov@fb.com Reviewed-by: Liu Bo bo.liu@linux.alibaba.com Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/btrfs/transaction.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index f74005ca8f08..73c1fbca0c35 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1955,6 +1955,9 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
         return ret;
     }
 
+    btrfs_trans_release_metadata(trans, fs_info);
+    trans->block_rsv = NULL;
+
     /* make a pass through all the delayed refs we have so far
      * any runnings procs may add more while we are here
      */
@@ -1964,9 +1967,6 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
         return ret;
     }
 
-    btrfs_trans_release_metadata(trans, fs_info);
-    trans->block_rsv = NULL;
-
     cur_trans = trans->transaction;
 
     /*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Wolsieffer benwolsieffer@gmail.com
commit 5a96b2d38dc054c0bbcbcd585b116566cbd877fe upstream.
The compatibility ioctl wrapper for VCHIQ_IOC_AWAIT_COMPLETION assumes that the native ioctl always uses a message buffer and decrements msgbufcount. Certain message types do not use a message buffer and in this case msgbufcount is not decremented, and completion->header for the message is NULL. Because the wrapper unconditionally decrements msgbufcount, the calling process may assume that a message buffer has been used even when it has not.
This results in a memory leak in the userspace code that interfaces with this driver. When msgbufcount is decremented, the userspace code assumes that the buffer can be freed through the reference in completion->header, which cannot happen when the reference is NULL.
This patch causes the wrapper to only decrement msgbufcount when the native ioctl decrements it. Note that we cannot simply copy the native ioctl's value of msgbufcount, because the wrapper only retrieves messages from the native ioctl one at a time, while userspace may request multiple messages.
See https://github.com/raspberrypi/linux/pull/2703 for more discussion of this patch.
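The accounting change, sketched (simplified from the wrapper): the native handler signals that it consumed the single buffer it was handed by decrementing its own count to zero, and only then is the 32-bit count decremented.

    /* before: assumed a buffer was consumed on every iteration */
    args32.msgbufcount--;

    /* after: consult the count the native ioctl wrote back */
    if (get_user(msgbufcount_native, &args->msgbufcount))
        return -EFAULT;
    if (!msgbufcount_native)
        args32.msgbufcount--;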
Fixes: 5569a1260933 ("staging: vchiq_arm: Add compatibility wrappers for ioctls") Signed-off-by: Ben Wolsieffer benwolsieffer@gmail.com Acked-by: Stefan Wahren stefan.wahren@i2se.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -1461,6 +1461,7 @@ vchiq_compat_ioctl_await_completion(stru
     struct vchiq_await_completion32 args32;
     struct vchiq_completion_data32 completion32;
     unsigned int *msgbufcount32;
+    unsigned int msgbufcount_native;
     compat_uptr_t msgbuf32;
     void *msgbuf;
     void **msgbufptr;
@@ -1572,7 +1573,11 @@ vchiq_compat_ioctl_await_completion(stru
                 sizeof(completion32)))
         return -EFAULT;
 
-    args32.msgbufcount--;
+    if (get_user(msgbufcount_native, &args->msgbufcount))
+        return -EFAULT;
+
+    if (!msgbufcount_native)
+        args32.msgbufcount--;
 
     msgbufcount32 =
         &((struct vchiq_await_completion32 __user *)arg)->msgbufcount;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larry Finger Larry.Finger@lwfinger.net
commit 8561fb31a1f9594e2807681f5c0721894e367f19 upstream.
With Androidx86 8.1, wificond returns "failed to get nl80211_sta_info_tx_failed" and wificondControl returns "Invalid signal poll result from wificond". The fix is to OR sinfo->filled with BIT_ULL(NL80211_STA_INFO_TX_FAILED).
This missing bit is apparently not needed with NetworkManager, but it does no harm in that case.
Reported-and-Tested-by: youling257 youling257@gmail.com Cc: linux-wireless@vger.kernel.org Cc: youling257 youling257@gmail.com Signed-off-by: Larry Finger Larry.Finger@lwfinger.net Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c
+++ b/drivers/staging/rtl8723bs/os_dep/ioctl_cfg80211.c
@@ -1293,7 +1293,7 @@ static int cfg80211_rtw_get_station(stru
 
         sinfo->filled |= BIT(NL80211_STA_INFO_TX_PACKETS);
         sinfo->tx_packets = psta->sta_stats.tx_pkts;
-
+        sinfo->filled |= BIT_ULL(NL80211_STA_INFO_TX_FAILED);
     }
 
     /* for Ad-Hoc/AP mode */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit a84a1bcc992f0545a51d2e120b8ca2ef20e2ea97 upstream.
There are two new Realtek card readers that require ums-realtek to work correctly.
Add the new IDs to support them.
Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Acked-by: Alan Stern stern@rowland.harvard.edu Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/storage/unusual_realtek.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

--- a/drivers/usb/storage/unusual_realtek.h
+++ b/drivers/usb/storage/unusual_realtek.h
@@ -39,4 +39,14 @@ UNUSUAL_DEV(0x0bda, 0x0159, 0x0000, 0x99
         "USB Card Reader",
         USB_SC_DEVICE, USB_PR_DEVICE, init_realtek_cr, 0),
 
+UNUSUAL_DEV(0x0bda, 0x0177, 0x0000, 0x9999,
+        "Realtek",
+        "USB Card Reader",
+        USB_SC_DEVICE, USB_PR_DEVICE, init_realtek_cr, 0),
+
+UNUSUAL_DEV(0x0bda, 0x0184, 0x0000, 0x9999,
+        "Realtek",
+        "USB Card Reader",
+        USB_SC_DEVICE, USB_PR_DEVICE, init_realtek_cr, 0),
+
 #endif  /* defined(CONFIG_USB_STORAGE_REALTEK) || ... */
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Niewöhner linux@mniewoehner.de
commit effd14f66cc1ef6701a19c5a56e39c35f4d395a5 upstream.
Cherry G230 Stream 2.0 (G85-231) and 3.0 (G85-232) need this quirk to function correctly. This fixes a bug where double-pressing numlock locks up the device completely, requiring a replug of the keyboard.
Signed-off-by: Michael Niewöhner linux@mniewoehner.de Tested-by: Michael Niewöhner linux@mniewoehner.de Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/core/quirks.c | 3 +++
 1 file changed, 3 insertions(+)

--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -64,6 +64,9 @@ static const struct usb_device_id usb_qu
     /* Microsoft LifeCam-VX700 v2.0 */
     { USB_DEVICE(0x045e, 0x0770), .driver_info = USB_QUIRK_RESET_RESUME },
 
+    /* Cherry Stream G230 2.0 (G85-231) and 3.0 (G85-232) */
+    { USB_DEVICE(0x046a, 0x0023), .driver_info = USB_QUIRK_RESET_RESUME },
+
     /* Logitech HD Pro Webcams C920, C920-C, C925e and C930e */
     { USB_DEVICE(0x046d, 0x082d), .driver_info = USB_QUIRK_DELAY_INIT },
     { USB_DEVICE(0x046d, 0x0841), .driver_info = USB_QUIRK_DELAY_INIT },
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felipe Balbi felipe.balbi@linux.intel.com
commit 38317f5c0f2faae5110854f36edad810f841d62f upstream.
This reverts commit ffb80fc672c3a7b6afd0cefcb1524fb99917b2f3.
Turns out that commit is wrong. Host controllers are allowed to use Clear Feature HALT as a means to sync the data toggle between host and peripheral.
Cc: stable@vger.kernel.org Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/usb/dwc3/gadget.c | 5 -----
 1 file changed, 5 deletions(-)

--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1511,9 +1511,6 @@ int __dwc3_gadget_ep_set_halt(struct dwc
     unsigned transfer_in_flight;
     unsigned started;
 
-    if (dep->flags & DWC3_EP_STALL)
-        return 0;
-
     if (dep->number > 1)
         trb = dwc3_ep_prev_trb(dep, dep->trb_enqueue);
     else
@@ -1535,8 +1532,6 @@ int __dwc3_gadget_ep_set_halt(struct dwc
         else
             dep->flags |= DWC3_EP_STALL;
     } else {
-        if (!(dep->flags & DWC3_EP_STALL))
-            return 0;
 
         ret = dwc3_send_clear_stall_ep_cmd(dep);
         if (ret)
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Kelly martin@martingkelly.com
commit fe5192ac81ad0d4dfe1395d11f393f0513c15f7f upstream.
Currently, we enable the device before we enable the device trigger. At high frequencies, this can cause interrupts that don't yet have a poll function associated with them and are thus treated as spurious. At high frequencies with level interrupts, this can even cause an interrupt storm of repeated spurious interrupts (~100,000 on my Beagleboard with the LSM9DS1 magnetometer). If these repeat too much, the interrupt will get disabled and the device will stop functioning.
To prevent these problems, enable the device trigger prior to enabling the device, and disable the device prior to disabling the trigger. This means there's no window of time during which the device creates interrupts but we have no trigger to answer them.
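The resulting bracketing, sketched (fragments mirroring the two callbacks in the patch below): the trigger and its poll function are live for the device's entire powered window.

    /* postenable: buffer and trigger machinery first, device last */
    err = iio_triggered_buffer_postenable(indio_dev);
    if (err < 0)
        goto st_magn_buffer_postenable_error;
    return st_sensors_set_enable(indio_dev, true);

    /* predisable: device first, trigger machinery last */
    err = st_sensors_set_enable(indio_dev, false);
    if (err < 0)
        goto st_magn_buffer_predisable_error;
    err = iio_triggered_buffer_predisable(indio_dev);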
Fixes: 90efe055629 ("iio: st_sensors: harden interrupt handling") Signed-off-by: Martin Kelly martin@martingkelly.com Tested-by: Denis Ciocca denis.ciocca@st.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/iio/magnetometer/st_magn_buffer.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

--- a/drivers/iio/magnetometer/st_magn_buffer.c
+++ b/drivers/iio/magnetometer/st_magn_buffer.c
@@ -30,11 +30,6 @@ int st_magn_trig_set_state(struct iio_tr
     return st_sensors_set_dataready_irq(indio_dev, state);
 }
 
-static int st_magn_buffer_preenable(struct iio_dev *indio_dev)
-{
-    return st_sensors_set_enable(indio_dev, true);
-}
-
 static int st_magn_buffer_postenable(struct iio_dev *indio_dev)
 {
     int err;
@@ -50,7 +45,7 @@ static int st_magn_buffer_postenable(str
     if (err < 0)
         goto st_magn_buffer_postenable_error;
 
-    return err;
+    return st_sensors_set_enable(indio_dev, true);
 
 st_magn_buffer_postenable_error:
     kfree(mdata->buffer_data);
@@ -63,11 +58,11 @@ static int st_magn_buffer_predisable(str
     int err;
     struct st_sensor_data *mdata = iio_priv(indio_dev);
 
-    err = iio_triggered_buffer_predisable(indio_dev);
+    err = st_sensors_set_enable(indio_dev, false);
     if (err < 0)
         goto st_magn_buffer_predisable_error;
 
-    err = st_sensors_set_enable(indio_dev, false);
+    err = iio_triggered_buffer_predisable(indio_dev);
 
 st_magn_buffer_predisable_error:
     kfree(mdata->buffer_data);
@@ -75,7 +70,6 @@ st_magn_buffer_predisable_error:
 }
 
 static const struct iio_buffer_setup_ops st_magn_buffer_setup_ops = {
-    .preenable = &st_magn_buffer_preenable,
     .postenable = &st_magn_buffer_postenable,
     .predisable = &st_magn_buffer_predisable,
 };
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luis Chamberlain mcgrof@kernel.org
commit 5618cf031fecda63847cafd1091e7b8bd626cdb1 upstream.
We free the misc device string twice on rmmod; fix this. Without this we cannot remove the module without crashing.
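The bug, in outline: the rmmod path freed the name twice, and the removed call even passed the address of the pointer member rather than the string it points to.

    misc_deregister(&test_dev->misc_dev);
    kfree(&test_dev->misc_dev.name);    /* removed: note the '&', and the
                                         * name is freed again later in
                                         * the rmmod path */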
Link: http://lkml.kernel.org/r/20181124050500.5257-1-mcgrof@kernel.org Signed-off-by: Luis Chamberlain mcgrof@kernel.org Reported-by: Randy Dunlap rdunlap@infradead.org Reviewed-by: Andrew Morton akpm@linux-foundation.org Cc: stable@vger.kernel.org [4.12+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 lib/test_kmod.c | 1 -
 1 file changed, 1 deletion(-)

--- a/lib/test_kmod.c
+++ b/lib/test_kmod.c
@@ -1221,7 +1221,6 @@ void unregister_test_dev_kmod(struct kmo
 
     dev_info(test_dev->dev, "removing interface\n");
     misc_deregister(&test_dev->misc_dev);
-    kfree(&test_dev->misc_dev.name);
 
     mutex_unlock(&test_dev->config_mutex);
     mutex_unlock(&test_dev->trigger_mutex);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Zhao yuzhao@google.com
commit c1cb20d43728aa9b5393bd8d489bc85c142949b2 upstream.
We changed the key of the swap cache tree from swp_entry_t.val to swp_offset. We need to do the same in shmem_replace_page() as well.
Hugh said: "shmem_replace_page() has been wrong since the day I wrote it: good enough to work on swap "type" 0, which is all most people ever use (especially those few who need shmem_replace_page() at all), but broken once there are any non-0 swp_type bits set in the higher order bits"
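Why swap type 0 masked the bug: the type lives in the high-order bits of swp_entry_t, so for type 0 the packed value and the offset coincide. The fix unpacks the entry explicitly (a sketch mirroring the change below):

    swp_entry_t entry;
    pgoff_t swap_index;

    entry.val = page_private(oldpage);    /* packed value: type + offset */
    swap_index = swp_offset(entry);       /* cache tree key: offset only */
    set_page_private(newpage, entry.val); /* the page keeps the full entry */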
Link: http://lkml.kernel.org/r/20181121215442.138545-1-yuzhao@google.com Fixes: f6ab1f7f6b2d ("mm, swap: use offset of swap entry as key of swap cache") Signed-off-by: Yu Zhao yuzhao@google.com Reviewed-by: Matthew Wilcox willy@infradead.org Acked-by: Hugh Dickins hughd@google.com Cc: stable@vger.kernel.org [4.9+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/shmem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1532,11 +1532,13 @@ static int shmem_replace_page(struct pag
 {
     struct page *oldpage, *newpage;
     struct address_space *swap_mapping;
+    swp_entry_t entry;
     pgoff_t swap_index;
     int error;
 
     oldpage = *pagep;
-    swap_index = page_private(oldpage);
+    entry.val = page_private(oldpage);
+    swap_index = swp_offset(entry);
     swap_mapping = page_mapping(oldpage);
 
     /*
@@ -1555,7 +1557,7 @@ static int shmem_replace_page(struct pag
     __SetPageLocked(newpage);
     __SetPageSwapBacked(newpage);
     SetPageUptodate(newpage);
-    set_page_private(newpage, swap_index);
+    set_page_private(newpage, entry.val);
     SetPageSwapCache(newpage);
 
     /*
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dexuan Cui decui@microsoft.com
commit eceb05965489784f24bbf4d61ba60e475a983016 upstream.
This is a longstanding issue: if the vmbus upper-layer drivers try to consume too many GPADLs, the host may return with an error 0xC0000044 (STATUS_QUOTA_EXCEEDED), but currently we forget to check the creation_status, and hence we can pass an invalid GPADL handle into the OPEN_CHANNEL message, and get an error code 0xc0000225 in open_info->response.open_result.status, and finally we hang in vmbus_open() -> "goto error_free_info" -> vmbus_teardown_gpadl().
With this patch, we can exit gracefully on STATUS_QUOTA_EXCEEDED.
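In outline (the errno mapping is the patch's choice): bail out of vmbus_establish_gpadl() as soon as the host reports a failure, instead of passing a bogus GPADL handle into the subsequent OPEN_CHANNEL message.

    if (msginfo->response.gpadl_created.creation_status != 0) {
        pr_err("Failed to establish GPADL: err = 0x%x\n",
               msginfo->response.gpadl_created.creation_status);
        ret = -EDQUOT;    /* e.g. STATUS_QUOTA_EXCEEDED, 0xC0000044 */
        goto cleanup;
    }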
Cc: Stephen Hemminger sthemmin@microsoft.com Cc: K. Y. Srinivasan kys@microsoft.com Cc: Haiyang Zhang haiyangz@microsoft.com Cc: stable@vger.kernel.org Signed-off-by: Dexuan Cui decui@microsoft.com Signed-off-by: K. Y. Srinivasan kys@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/hv/channel.c | 8 ++++++++
 1 file changed, 8 insertions(+)

--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -454,6 +454,14 @@ int vmbus_establish_gpadl(struct vmbus_c
     }
     wait_for_completion(&msginfo->waitevent);
 
+    if (msginfo->response.gpadl_created.creation_status != 0) {
+        pr_err("Failed to establish GPADL: err = 0x%x\n",
+               msginfo->response.gpadl_created.creation_status);
+
+        ret = -EDQUOT;
+        goto cleanup;
+    }
+
     if (channel->rescind) {
         ret = -ENODEV;
         goto cleanup;
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: YueHaibing yuehaibing@huawei.com
commit 6484a677294aa5d08c0210f2f387ebb9be646115 upstream.
gcc '-Wunused-but-set-variable' warning:
drivers/misc/mic/scif/scif_rma.c: In function 'scif_create_remote_lookup':
drivers/misc/mic/scif/scif_rma.c:373:25: warning: variable 'vmalloc_num_pages' set but not used [-Wunused-but-set-variable]
'vmalloc_num_pages' should be used to determine if the address is within the vmalloc range.
Fixes: ba612aa8b487 ("misc: mic: SCIF memory registration and unregistration") Signed-off-by: YueHaibing yuehaibing@huawei.com Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/misc/mic/scif/scif_rma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/misc/mic/scif/scif_rma.c
+++ b/drivers/misc/mic/scif/scif_rma.c
@@ -417,7 +417,7 @@ static int scif_create_remote_lookup(str
         if (err)
             goto error_window;
         err = scif_map_page(&window->num_pages_lookup.lookup[j],
-                    vmalloc_dma_phys ?
+                    vmalloc_num_pages ?
                     vmalloc_to_page(&window->num_pages[i]) :
                     virt_to_page(&window->num_pages[i]),
                     remote_dev);
4.14-stable review patch. If anyone has any objections, please let me know.
------------------
From: Todd Kjos tkjos@android.com
commit 7bada55ab50697861eee6bb7d60b41e68a961a9c upstream.
Malicious code can attempt to free buffers using the BC_FREE_BUFFER ioctl to binder. There are protections against a user freeing a buffer while in use by the kernel, however there was a window where BC_FREE_BUFFER could be used to free a recently allocated buffer that was not completely initialized. This resulted in a use-after-free detected by KASAN with a malicious test program.
This window is closed by setting the buffer's allow_user_free attribute to 0 when the buffer is allocated or when the user has previously freed it instead of waiting for the caller to set it. The problem was that when the struct buffer was recycled, allow_user_free was stale and set to 1 allowing a free to go through.
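The invariant after the fix, sketched (fragments mirroring the patch below): allow_user_free is cleared the moment a buffer leaves the free list and again when a free is accepted, so a recycled struct binder_buffer can never be freed through a stale flag.

    /* allocation side: a recycled buffer starts out locked down */
    buffer->free = 0;
    buffer->allow_user_free = 0;

    /* BC_FREE_BUFFER lookup: consume the permission exactly once */
    if (!buffer->allow_user_free)
        return ERR_PTR(-EPERM);
    buffer->allow_user_free = 0;
    return buffer;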
Signed-off-by: Todd Kjos tkjos@google.com Acked-by: Arve Hjønnevåg arve@android.com Cc: stable stable@vger.kernel.org # 4.14 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/android/binder.c       | 21 ++++++++++++---------
 drivers/android/binder_alloc.c | 14 ++++++--------
 drivers/android/binder_alloc.h |  3 +--
 3 files changed, 19 insertions(+), 19 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2918,7 +2918,6 @@ static void binder_transaction(struct bi
         t->buffer = NULL;
         goto err_binder_alloc_buf_failed;
     }
-    t->buffer->allow_user_free = 0;
     t->buffer->debug_id = t->debug_id;
     t->buffer->transaction = t;
     t->buffer->target_node = target_node;
@@ -3407,14 +3406,18 @@ static int binder_thread_write(struct bi
 
             buffer = binder_alloc_prepare_to_free(&proc->alloc,
                                                   data_ptr);
-            if (buffer == NULL) {
-                binder_user_error("%d:%d BC_FREE_BUFFER u%016llx no match\n",
-                    proc->pid, thread->pid, (u64)data_ptr);
-                break;
-            }
-            if (!buffer->allow_user_free) {
-                binder_user_error("%d:%d BC_FREE_BUFFER u%016llx matched unreturned buffer\n",
-                    proc->pid, thread->pid, (u64)data_ptr);
+            if (IS_ERR_OR_NULL(buffer)) {
+                if (PTR_ERR(buffer) == -EPERM) {
+                    binder_user_error(
+                        "%d:%d BC_FREE_BUFFER u%016llx matched unreturned or currently freeing buffer\n",
+                        proc->pid, thread->pid,
+                        (u64)data_ptr);
+                } else {
+                    binder_user_error(
+                        "%d:%d BC_FREE_BUFFER u%016llx no match\n",
+                        proc->pid, thread->pid,
+                        (u64)data_ptr);
+                }
                 break;
             }
             binder_debug(BINDER_DEBUG_FREE_BUFFER,
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -149,14 +149,12 @@ static struct binder_buffer *binder_allo
         else {
             /*
              * Guard against user threads attempting to
-             * free the buffer twice
+             * free the buffer when in use by kernel or
+             * after it's already been freed.
              */
-            if (buffer->free_in_progress) {
-                pr_err("%d:%d FREE_BUFFER u%016llx user freed buffer twice\n",
-                       alloc->pid, current->pid, (u64)user_ptr);
-                return NULL;
-            }
-            buffer->free_in_progress = 1;
+            if (!buffer->allow_user_free)
+                return ERR_PTR(-EPERM);
+            buffer->allow_user_free = 0;
             return buffer;
         }
     }
@@ -486,7 +484,7 @@ struct binder_buffer *binder_alloc_new_b
 
     rb_erase(best_fit, &alloc->free_buffers);
     buffer->free = 0;
-    buffer->free_in_progress = 0;
+    buffer->allow_user_free = 0;
     binder_insert_allocated_buffer_locked(alloc, buffer);
     binder_alloc_debug(BINDER_DEBUG_BUFFER_ALLOC,
              "%d: binder_alloc_buf size %zd got %pK\n",
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -50,8 +50,7 @@ struct binder_buffer {
     unsigned free:1;
     unsigned allow_user_free:1;
     unsigned async_transaction:1;
-    unsigned free_in_progress:1;
-    unsigned debug_id:28;
+    unsigned debug_id:29;
 
     struct binder_transaction *transaction;
stable-rc/linux-4.14.y boot: 120 boots: 0 failed, 118 passed with 2 offline (v4.14.85-147-g2bfd08691add)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14...
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.85-147...
Tree: stable-rc
Branch: linux-4.14.y
Git Describe: v4.14.85-147-g2bfd08691add
Git Commit: 2bfd08691add5955f6d58006360c4a78edce64d0
Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 59 unique boards, 23 SoC families, 14 builds out of 197
Offline Platforms:
arm:
multi_v7_defconfig: stih410-b2120: 1 offline lab
arm64:
defconfig: meson-gxl-s905x-p212: 1 offline lab
---
For more info write to info@kernelci.org
On 12/4/18 2:48 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
Build results:
    total: 175 pass: 175 fail: 0
Qemu test results:
    total: 322 pass: 322 fail: 0
Details are available at https://kerneltests.org/builders/.
Guenter
On Tue, 4 Dec 2018 at 16:34, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------
kernel: 4.14.86-rc2
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: a5c883b384be3a583f2875dc44c9ec3941803121
git describe: v4.14.85-149-ga5c883b384be
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.85-14...
No regressions (compared to build v4.14.85)
No fixes (compared to build v4.14.85)
Ran 21468 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
Test Suites
-----------
* boot
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* ltp-fs-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On Wed, Dec 05, 2018 at 10:48:59AM +0530, Naresh Kamboju wrote:
On Tue, 4 Dec 2018 at 16:34, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing two of these and letting me know.
greg k-h
On 04/12/2018 10:48, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.14:
    8 builds:  8 pass, 0 fail
    16 boots:  16 pass, 0 fail
    14 tests:  14 pass, 0 fail

Linux version: 4.14.86-rc2-ga5c883b
Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Cheers Jon
On Wed, Dec 05, 2018 at 09:31:45AM +0000, Jon Hunter wrote:
On 04/12/2018 10:48, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v4.14:
    8 builds:  8 pass, 0 fail
    16 boots:  16 pass, 0 fail
    14 tests:  14 pass, 0 fail

Linux version: 4.14.86-rc2-ga5c883b
Boards tested: tegra124-jetson-tk1, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Great, thanks for testing these and letting me know.
greg k-h
On 12/4/18 3:48 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.14.86 release. There are 146 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu Dec 6 10:36:52 UTC 2018. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.86-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah