From: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Hi Ingo,
Please consider pulling, this is on top of
perf-core-for-mingo-4.19-20180809, that is not yet in tip.
Thanks,
- Arnaldo
Test results at the end of this message, as usual.
The following changes since commit 6a9405b56c274024564f9014bba97b92c91b34d6:
perf map: Optimize maps__fixup_overlappings() (2018-08-08 15:56:00 -0300)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-4.19-20180815
for you to fetch changes up to 6855dc41b24619c3d1de3dbd27dd0546b0e45272:
x86: Add entry trampolines to kcore (2018-08-14 19:13:26 -0300)
----------------------------------------------------------------
perf/core improvements and fixes:
kernel:
. kallsyms, x86: Export addresses of PTI entry trampolines (Alexander Shishkin)
. kallsyms: Simplify update_iter_mod() (Adrian Hunter)
. x86: Add entry trampolines to kcore (Adrian Hunter)
Hardware tracing:
. Fix auxtrace queue resize (Adrian Hunter)
Arch specific:
. Fix uninitialized ARM SPE record error variable (Kim Phillips)
. Fix trace event post-processing in powerpc (Sandipan Das)
Build:
. Fix check-headers.sh AND list path of execution (Alexander Kapshuk)
. Remove -mcet and -fcf-protection when building the python binding
with older clang versions (Arnaldo Carvalho de Melo)
. Make check-headers.sh check based on kernel dir (Jiri Olsa)
. Move syscall_64.tbl check into check-headers.sh (Jiri Olsa)
Infrastructure:
. Check for null when copying nsinfo. (Benno Evers)
Libraries:
. Rename libtraceevent prefixes, prep work for making it a shared
library generally available (Tzvetomir Stoyanov (VMware))
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
----------------------------------------------------------------
Adrian Hunter (3):
perf auxtrace: Fix queue resize
kallsyms: Simplify update_iter_mod()
x86: Add entry trampolines to kcore
Alexander Kapshuk (1):
perf tools: Fix check-headers.sh AND list path of execution
Alexander Shishkin (1):
kallsyms, x86: Export addresses of PTI entry trampolines
Arnaldo Carvalho de Melo (1):
perf python: Remove -mcet and -fcf-protection when building with clang
Benno Evers (1):
perf tools: Check for null when copying nsinfo.
Jiri Olsa (2):
perf tools: Make check-headers.sh check based on kernel dir
perf tools: Move syscall_64.tbl check into check-headers.sh
Kim Phillips (1):
perf arm spe: Fix uninitialized record error variable
Sandipan Das (1):
perf probe powerpc: Fix trace event post-processing
Tzvetomir Stoyanov (VMware) (24):
tools lib traceevent, perf tools: Rename struct pevent to struct tep_handle
tools lib traceevent, perf tools: Rename 'struct pevent_record' to 'struct tep_record'
tools lib traceevent, perf tools: Rename pevent plugin related APIs
tools lib traceevent, perf tools: Rename pevent alloc / free APIs
tools lib traceevent, perf tools: Rename pevent find APIs
tools lib traceevent, perf tools: Rename pevent parse APIs
tools lib traceevent, perf tools: Rename pevent print APIs
tools lib traceevent, perf tools: Rename pevent_read_number_* APIs
tools lib traceevent, perf tools: Rename pevent_register_* APIs
tools lib traceevent, perf tools: Rename pevent_set_* APIs
tools lib traceevent, perf tools: Rename traceevent_* APIs
tools lib traceevent, perf tools: Rename 'enum pevent_flag' to 'enum tep_flag'
tools lib traceevent, tools lib lockdep: Rename 'enum pevent_errno' to 'enum tep_errno'
tools lib traceevent: Rename pevent_function* APIs
tools lib traceevent, perf tools: Rename traceevent_plugin_* APIs
tools lib traceevent: Rename pevent_filter* APIs
tools lib traceevent: Rename pevent_register / unregister APIs
tools lib traceevent: Rename pevent_data_ APIs
tools lib traceevent: Rename pevent field APIs
tools lib traceevent: Rename pevent_find_* APIs
tools lib traceevent: Rename various pevent get/set/is APIs
tools lib traceevent: Rename internal parser related APIs
tools lib traceevent: Rename various pevent APIs
tools lib traceevent: Rename static variables and functions in event-parse.c
arch/x86/mm/cpu_entry_area.c | 33 +
fs/proc/kcore.c | 7 +-
include/linux/kcore.h | 13 +
kernel/kallsyms.c | 51 +-
tools/lib/lockdep/Makefile | 4 +-
tools/lib/traceevent/Makefile | 4 +-
tools/lib/traceevent/event-parse.c | 696 ++++++++++-----------
tools/lib/traceevent/event-parse.h | 458 +++++++-------
tools/lib/traceevent/event-plugin.c | 70 +--
tools/lib/traceevent/parse-filter.c | 288 ++++-----
tools/lib/traceevent/plugin_cfg80211.c | 20 +-
tools/lib/traceevent/plugin_function.c | 34 +-
tools/lib/traceevent/plugin_hrtimer.c | 56 +-
tools/lib/traceevent/plugin_jbd2.c | 36 +-
tools/lib/traceevent/plugin_kmem.c | 66 +-
tools/lib/traceevent/plugin_kvm.c | 154 ++---
tools/lib/traceevent/plugin_mac80211.c | 28 +-
tools/lib/traceevent/plugin_sched_switch.c | 60 +-
tools/lib/traceevent/plugin_scsi.c | 24 +-
tools/lib/traceevent/plugin_xen.c | 20 +-
tools/perf/arch/arm64/util/arm-spe.c | 1 +
tools/perf/arch/powerpc/util/sym-handling.c | 4 +-
tools/perf/arch/x86/Makefile | 3 -
tools/perf/builtin-kmem.c | 6 +-
tools/perf/builtin-report.c | 6 +-
tools/perf/builtin-script.c | 6 +-
tools/perf/check-headers.sh | 17 +-
tools/perf/util/auxtrace.c | 3 +
tools/perf/util/data-convert-bt.c | 6 +-
tools/perf/util/evsel.c | 2 +-
tools/perf/util/header.c | 6 +-
tools/perf/util/machine.h | 2 +-
tools/perf/util/namespaces.c | 3 +
tools/perf/util/python.c | 10 +-
.../perf/util/scripting-engines/trace-event-perl.c | 2 +-
.../util/scripting-engines/trace-event-python.c | 6 +-
tools/perf/util/setup.py | 10 +-
tools/perf/util/sort.c | 16 +-
tools/perf/util/sort.h | 2 +-
tools/perf/util/trace-event-parse.c | 34 +-
tools/perf/util/trace-event-read.c | 44 +-
tools/perf/util/trace-event-scripting.c | 4 +-
tools/perf/util/trace-event.c | 28 +-
tools/perf/util/trace-event.h | 20 +-
44 files changed, 1230 insertions(+), 1133 deletions(-)
Test results:
The first ones are container (docker) based builds of tools/perf with
and without libelf support. Where clang is available, it is also used
to build perf with/without libelf, and building with LIBCLANGLLVM=1
(built-in clang) with gcc and clang when clang and its devel libraries
are installed.
The objtool and samples/bpf/ builds are disabled now that I'm switching from
using the sources in a local volume to fetching them from a http server to
build it inside the container, to make it easier to build in a container cluster.
Those will come back later.
Several are cross builds, the ones with -x-ARCH and the android one, and those
may not have all the features built, due to the lack of multi-arch devel
packages, which are available and used so far on just a few, like
debian:experimental-x-{arm64,mipsel}.
The 'perf test' one will perform a variety of tests exercising
tools/perf/util/, tools/lib/{bpf,traceevent,etc}, as well as run perf commands
with a variety of command line event specifications to then intercept the
sys_perf_event syscall to check that the perf_event_attr fields are set up as
expected, among a variety of other unit tests.
Then there are the 'make -C tools/perf build-test' ones, which build tools/perf/
with a variety of feature sets, exercising the build with an incomplete set of
features as well as with a complete one. The plan is to run it on each
of the containers mentioned above, using some container orchestration
infrastructure. Get in contact if you are interested in helping to put this in place.
# dm
1 alpine:3.4 : Ok gcc (Alpine 5.3.0) 5.3.0
2 alpine:3.5 : Ok gcc (Alpine 6.2.1) 6.2.1 20160822
3 alpine:3.6 : Ok gcc (Alpine 6.3.0) 6.3.0
4 alpine:3.7 : Ok gcc (Alpine 6.4.0) 6.4.0
5 alpine:3.8 : Ok gcc (Alpine 6.4.0) 6.4.0
6 alpine:edge : Ok gcc (Alpine 6.4.0) 6.4.0
7 amazonlinux:1 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
8 amazonlinux:2 : Ok gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
9 android-ndk:r12b-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
10 android-ndk:r15c-arm : Ok arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
11 centos:5 : Ok gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-55)
12 centos:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23)
13 centos:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28)
14 debian:7 : Ok gcc (Debian 4.7.2-5) 4.7.2
15 debian:8 : Ok gcc (Debian 4.9.2-10+deb8u1) 4.9.2
16 debian:9 : Ok gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
17 debian:experimental : Ok gcc (Debian 8.2.0-1) 8.2.0
18 debian:experimental-x-arm64 : Ok aarch64-linux-gnu-gcc (Debian 8.1.0-12) 8.1.0
19 debian:experimental-x-mips : Ok mips-linux-gnu-gcc (Debian 8.1.0-12) 8.1.0
20 debian:experimental-x-mips64 : Ok mips64-linux-gnuabi64-gcc (Debian 8.1.0-12) 8.1.0
21 debian:experimental-x-mipsel : Ok mipsel-linux-gnu-gcc (Debian 8.1.0-12) 8.1.0
22 fedora:20 : Ok gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7)
23 fedora:21 : Ok gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6)
24 fedora:22 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
25 fedora:23 : Ok gcc (GCC) 5.3.1 20160406 (Red Hat 5.3.1-6)
26 fedora:24 : Ok gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1)
27 fedora:24-x-ARC-uClibc : Ok arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
28 fedora:25 : Ok gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)
29 fedora:26 : Ok gcc (GCC) 7.3.1 20180130 (Red Hat 7.3.1-2)
30 fedora:27 : Ok gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
31 fedora:28 : Ok gcc (GCC) 8.1.1 20180712 (Red Hat 8.1.1-5)
32 fedora:rawhide : Ok gcc (GCC) 8.0.1 20180324 (Red Hat 8.0.1-0.20)
33 gentoo-stage3-amd64:latest : Ok gcc (Gentoo 7.3.0-r3 p1.4) 7.3.0
34 mageia:5 : Ok gcc (GCC) 4.9.2
35 mageia:6 : Ok gcc (Mageia 5.5.0-1.mga6) 5.5.0
36 opensuse:13.2 : Ok gcc (SUSE Linux) 4.8.3 20140627 [gcc-4_8-branch revision 212064]
37 opensuse:42.1 : Ok gcc (SUSE Linux) 4.8.5
38 opensuse:42.2 : Ok gcc (SUSE Linux) 4.8.5
39 opensuse:42.3 : Ok gcc (SUSE Linux) 4.8.5
40 opensuse:tumbleweed : Ok gcc (SUSE Linux) 7.3.1 20180323 [gcc-7-branch revision 258812]
41 oraclelinux:6 : Ok gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23.0.1)
42 oraclelinux:7 : Ok gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28.0.1)
43 ubuntu:12.04.5 : Ok gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
44 ubuntu:14.04.4 : Ok gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
45 ubuntu:14.04.4-x-linaro-arm64 : Ok aarch64-linux-gnu-gcc (Linaro GCC 5.5-2017.10) 5.5.0
46 ubuntu:16.04 : Ok gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
47 ubuntu:16.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
48 ubuntu:16.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
49 ubuntu:16.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
50 ubuntu:16.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
51 ubuntu:16.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
52 ubuntu:16.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
53 ubuntu:16.10 : Ok gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
54 ubuntu:17.10 : Ok gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
55 ubuntu:18.04 : Ok gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
56 ubuntu:18.04-x-arm : Ok arm-linux-gnueabihf-gcc (Ubuntu/Linaro 7.3.0-16ubuntu3) 7.3.0
57 ubuntu:18.04-x-arm64 : Ok aarch64-linux-gnu-gcc (Ubuntu/Linaro 7.3.0-16ubuntu3) 7.3.0
58 ubuntu:18.04-x-m68k : Ok m68k-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
59 ubuntu:18.04-x-powerpc : Ok powerpc-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
60 ubuntu:18.04-x-powerpc64 : Ok powerpc64-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
61 ubuntu:18.04-x-powerpc64el : Ok powerpc64le-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
62 ubuntu:18.04-x-riscv64 : Ok riscv64-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
63 ubuntu:18.04-x-s390 : Ok s390x-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
64 ubuntu:18.04-x-sh4 : Ok sh4-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
65 ubuntu:18.04-x-sparc64 : Ok sparc64-linux-gnu-gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0
66 ubuntu:18.10 : Ok gcc (Ubuntu 8.2.0-1ubuntu2) 8.2.0
#
# uname -a
Linux seventh 4.18.0-rc8-00002-g1236568ee3cb #1 SMP Wed Aug 8 09:39:17 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
# git log --oneline -1
6855dc41b246 (HEAD -> perf/core) x86: Add entry trampolines to kcore
# perf version --build-options
perf version 4.18.rc7.g6855dc
dwarf: [ on ] # HAVE_DWARF_SUPPORT
dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT
glibc: [ on ] # HAVE_GLIBC_SUPPORT
gtk2: [ on ] # HAVE_GTK2_SUPPORT
syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT
libbfd: [ on ] # HAVE_LIBBFD_SUPPORT
libelf: [ on ] # HAVE_LIBELF_SUPPORT
libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT
libperl: [ on ] # HAVE_LIBPERL_SUPPORT
libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT
libslang: [ on ] # HAVE_SLANG_SUPPORT
libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT
libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT
libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT
zlib: [ on ] # HAVE_ZLIB_SUPPORT
lzma: [ on ] # HAVE_LZMA_SUPPORT
get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT
bpf: [ on ] # HAVE_LIBBPF_SUPPORT
# perf test
1: vmlinux symtab matches kallsyms : Ok
2: Detect openat syscall event : Ok
3: Detect openat syscall event on all cpus : Ok
4: Read samples using the mmap interface : Ok
5: Test data source output : Ok
6: Parse event definition strings : Ok
7: Simple expression parser : Ok
8: PERF_RECORD_* events & perf_sample fields : Ok
9: Parse perf pmu format : Ok
10: DSO data read : Ok
11: DSO data cache : Ok
12: DSO data reopen : Ok
13: Roundtrip evsel->name : Ok
14: Parse sched tracepoints fields : Ok
15: syscalls:sys_enter_openat event fields : Ok
16: Setup struct perf_event_attr : Ok
17: Match and link multiple hists : Ok
18: 'import perf' in python : Ok
19: Breakpoint overflow signal handler : Ok
20: Breakpoint overflow sampling : Ok
21: Breakpoint accounting : Ok
22: Number of exit events of a simple workload : Ok
23: Software clock events period values : Ok
24: Object code reading : Ok
25: Sample parsing : Ok
26: Use a dummy software event to keep tracking : Ok
27: Parse with no sample_id_all bit set : Ok
28: Filter hist entries : Ok
29: Lookup mmap thread : Ok
30: Share thread mg : Ok
31: Sort output of hist entries : Ok
32: Cumulate child hist entries : Ok
33: Track with sched_switch : Ok
34: Filter fds with revents mask in a fdarray : Ok
35: Add fd to a fdarray, making it autogrow : Ok
36: kmod_path__parse : Ok
37: Thread map : Ok
38: LLVM search and compile :
38.1: Basic BPF llvm compile : Ok
38.2: kbuild searching : Ok
38.3: Compile source for BPF prologue generation : Ok
38.4: Compile source for BPF relocation : Ok
39: Session topology : Ok
40: BPF filter :
40.1: Basic BPF filtering : Ok
40.2: BPF pinning : Ok
40.3: BPF prologue generation : Ok
40.4: BPF relocation checker : Ok
41: Synthesize thread map : Ok
42: Remove thread map : Ok
43: Synthesize cpu map : Ok
44: Synthesize stat config : Ok
45: Synthesize stat : Ok
46: Synthesize stat round : Ok
47: Synthesize attr update : Ok
48: Event times : Ok
49: Read backward ring buffer : Ok
50: Print cpu map : Ok
51: Probe SDT events : Ok
52: is_printable_array : Ok
53: Print bitmap : Ok
54: perf hooks : Ok
55: builtin clang support : Skip (not compiled in)
56: unit_number__scnprintf : Ok
57: mem2node : Ok
58: x86 rdpmc : Ok
59: Convert perf time to TSC : Ok
60: DWARF unwind : Ok
61: x86 instruction decoder - new instructions : Ok
62: probe libc's inet_pton & backtrace it with ping : Ok
63: Check open filename arg using perf trace + vfs_getname: Ok
64: Use vfs_getname probe to get syscall args filenames : Ok
65: Add vfs_getname probe to get syscall args filenames : Ok
#
$ make -C tools/perf build-test
make: Entering directory '/home/acme/git/perf/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_no_libaudit_O: make NO_LIBAUDIT=1
make_tags_O: make tags
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1 NO_JVMTI=1
make_no_libelf_O: make NO_LIBELF=1
make_no_libbpf_O: make NO_LIBBPF=1
make_util_map_o_O: make util/map.o
make_doc_O: make doc
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_no_demangle_O: make NO_DEMANGLE=1
make_static_O: make LDFLAGS=-static
make_install_O: make install
make_with_clangllvm_O: make LIBCLANGLLVM=1
make_install_bin_O: make install-bin
make_no_libunwind_O: make NO_LIBUNWIND=1
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_no_slang_O: make NO_SLANG=1
make_no_libpython_O: make NO_LIBPYTHON=1
make_no_gtk2_O: make NO_GTK2=1
make_no_libbionic_O: make NO_LIBBIONIC=1
make_install_prefix_O: make install prefix=/tmp/krava
make_debug_O: make DEBUG=1
make_no_libperl_O: make NO_LIBPERL=1
make_pure_O: make
make_no_libnuma_O: make NO_LIBNUMA=1
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_help_O: make help
make_no_newt_O: make NO_NEWT=1
make_no_backtrace_O: make NO_BACKTRACE=1
make_clean_all_O: make clean all
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_perf_o_O: make perf.o
make_with_babeltrace_O: make LIBBABELTRACE=1
OK
make: Leaving directory '/home/acme/git/perf/tools/perf'
$
I'm announcing the release of the 4.4.150 kernel.
All x86 users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
Greg Kroah-Hartman (1):
Linux 4.4.150
Sean Christopherson (1):
x86/speculation/l1tf: Exempt zeroed PTEs from inversion
I'm announcing the release of the 4.9.122 kernel.
All x86 users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
Greg Kroah-Hartman (1):
Linux 4.9.122
Sean Christopherson (1):
x86/speculation/l1tf: Exempt zeroed PTEs from inversion
I'm announcing the release of the 4.14.65 kernel.
All x86 users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
Greg Kroah-Hartman (1):
Linux 4.14.65
Sean Christopherson (1):
x86/speculation/l1tf: Exempt zeroed PTEs from inversion
I'm announcing the release of the 4.17.17 kernel.
All x86 users of the 4.17 kernel series must upgrade.
The updated 4.17.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.17.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
Greg Kroah-Hartman (1):
Linux 4.17.17
Sean Christopherson (1):
x86/speculation/l1tf: Exempt zeroed PTEs from inversion
I'm announcing the release of the 4.18.3 kernel.
All x86 users of the 4.18 kernel series must upgrade.
The updated 4.18.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.18.y
and can be browsed at the normal kernel.org git web browser:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/x86/include/asm/pgtable-invert.h | 11 ++++++++++-
2 files changed, 11 insertions(+), 2 deletions(-)
Greg Kroah-Hartman (1):
Linux 4.18.3
Sean Christopherson (1):
x86/speculation/l1tf: Exempt zeroed PTEs from inversion
Commit 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack
space") messed up STACK_VAR() by 4 bytes presuming it was related
to skb scratch buffer space, but it clearly isn't as this refers
to the top word in stack, therefore restore it. This fixes a NULL
pointer dereference seen during bootup when the JIT is enabled and
systemd-udevd triggers a BPF program run in sk_filter_trim_cap().
JIT rework in 1c35ba122d4a ("ARM: net: bpf: use negative numbers
for stacked registers") and 96cced4e774a ("ARM: net: bpf: access
eBPF scratch space using ARM FP register") removed the affected
parts, so this fix is only needed in 4.18 stable.
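To make the 4-byte shift concrete, here is a small user-space sketch of
the two macro variants. It is not kernel code: the STACK_SIZE value of 64
is an arbitrary stand-in (the real one is derived from _STACK_SIZE and
STACK_ALIGNMENT), and the _OLD/_FIXED names, the extra parentheses around
the argument and the stack_var_delta() helper are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in; the kernel derives this from _STACK_SIZE. */
#define STACK_SIZE 64

/* Variant left behind by commit 38ca93060163. */
#define STACK_VAR_OLD(off)   (STACK_SIZE - (off))
/* Variant restored by this patch. */
#define STACK_VAR_FIXED(off) (STACK_SIZE - (off) - 4)

/*
 * Difference between the two variants for a given offset.
 * It is 4 bytes for every offset, i.e. one 32-bit word.
 */
static int stack_var_delta(int off)
{
	/* Only offsets that still fit in the area make sense here. */
	assert(off >= 0 && off <= STACK_SIZE - 4);
	return STACK_VAR_OLD(off) - STACK_VAR_FIXED(off);
}
```

If offsets are counted from the base of a STACK_SIZE-byte area, then for
off == 0 the old variant yields STACK_SIZE itself, which is just past the
end of the area, while the restored variant yields STACK_SIZE - 4, the
last 4-byte word that still fits inside it.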
Fixes: 38ca93060163 ("bpf, arm32: save 4 bytes of unneeded stack space")
Reported-by: Peter Robinson <pbrobinson(a)gmail.com>
Reported-by: Marc Haber <mh+netdev(a)zugschlus.de>
Tested-by: Stefan Wahren <stefan.wahren(a)i2se.com>
Tested-by: Peter Robinson <pbrobinson(a)gmail.com>
Cc: Russell King <rmk+kernel(a)armlinux.org.uk>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
---
arch/arm/net/bpf_jit_32.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index f6a62ae..c864f6b 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -238,7 +238,7 @@ static void jit_fill_hole(void *area, unsigned int size)
#define STACK_SIZE ALIGN(_STACK_SIZE, STACK_ALIGNMENT)
/* Get the offset of eBPF REGISTERs stored on scratch space. */
-#define STACK_VAR(off) (STACK_SIZE - off)
+#define STACK_VAR(off) (STACK_SIZE - off - 4)
#if __LINUX_ARM_ARCH__ < 7
--
2.9.5
This is the start of the stable review cycle for the 3.18.119 release.
There are 15 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat Aug 18 17:16:20 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.119-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.119-rc1
Mark Salyzyn <salyzyn(a)android.com>
Bluetooth: hidp: buffer overflow in hidp_process_report
Eric Biggers <ebiggers(a)google.com>
crypto: ablkcipher - fix crash flushing dcache in error path
Eric Biggers <ebiggers(a)google.com>
crypto: blkcipher - fix crash flushing dcache in error path
Eric Biggers <ebiggers(a)google.com>
crypto: vmac - separate tfm and request context
Eric Biggers <ebiggers(a)google.com>
crypto: vmac - require a block cipher with 128-bit block size
Randy Dunlap <rdunlap(a)infradead.org>
kbuild: verify that $DEPMOD is installed
Liwei Song <liwei.song(a)windriver.com>
i2c: ismt: fix wrong device address when unmap the data buffer
Andrey Ryabinin <a.ryabinin(a)samsung.com>
mm: slub: fix format mismatches in slab_err() callers
Erick Reyes <erickreyes(a)google.com>
ALSA: info: Check for integer overflow in snd_info_entry_write()
Masami Hiramatsu <mhiramat(a)kernel.org>
kprobes/x86: Fix %p uses in error messages
Oleksij Rempel <o.rempel(a)pengutronix.de>
ARM: dts: imx6sx: fix irq for pcie bridge
Al Viro <viro(a)zeniv.linux.org.uk>
fix __legitimize_mnt()/mntput() race
Al Viro <viro(a)zeniv.linux.org.uk>
fix mntput/mntput race
Al Viro <viro(a)zeniv.linux.org.uk>
root dentries need RCU-delayed freeing
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't cache skb_shinfo()
-------------
Diffstat:
Documentation/Changes | 17 +-
Makefile | 4 +-
arch/arm/boot/dts/imx6sx.dtsi | 2 +-
arch/x86/kernel/kprobes/core.c | 4 +-
crypto/ablkcipher.c | 57 +++---
crypto/blkcipher.c | 54 +++---
crypto/vmac.c | 412 ++++++++++++++++++-----------------------
drivers/i2c/busses/i2c-ismt.c | 2 +-
drivers/net/xen-netfront.c | 8 +-
fs/dcache.c | 6 +-
fs/namespace.c | 28 ++-
include/crypto/vmac.h | 63 -------
mm/slub.c | 6 +-
net/bluetooth/hidp/core.c | 4 +-
scripts/depmod.sh | 8 +-
sound/core/info.c | 4 +-
16 files changed, 297 insertions(+), 382 deletions(-)
Inside of start_xmit() the check for whether the connection is up and
the queueing of packets for later transmission are not atomic. That
leaves a window where cm_rep_handler can run, set the connection up and
dequeue the pending packets, so that packets queued by start_xmit()
after that point sit on neigh->queue until they're dropped when the
connection is torn down. This only applies to connected mode. These
dropped packets can really upset TCP, for example, and cause
multi-minute delays in transmission for open connections.
I've got a reproducer available if it's needed.
Here's the code in start_xmit where we check to see if the connection
is up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet
in start_xmit where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
The patch re-checks ipoib_cm_up with priv->lock held to avoid this
race condition. Since the connection should be up most of the time, we
don't hold the lock for the first check, which avoids a slowdown from
unnecessary locking for the majority of the packets transmitted during
the connection's life.
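The check / re-check idea can be sketched in user space as follows. This
is only an illustration under stated assumptions, not the driver code:
struct conn, conn_xmit(), conn_established() and MAX_QUEUE are invented
names, a C11 atomic_flag spinlock stands in for priv->lock, and counters
stand in for the skb queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IPOIB_MAX_PATH_REC_QUEUE. */
#define MAX_QUEUE 3

enum xmit_result { XMIT_SENT, XMIT_QUEUED, XMIT_DROPPED };

struct conn {
	atomic_flag lock;   /* plays the role of priv->lock */
	atomic_bool up;     /* plays the role of ipoib_cm_up(neigh) */
	int queued;         /* length of the pending queue, protected by lock */
	atomic_int sent;    /* packets handed over for transmission */
};

static void conn_init(struct conn *c)
{
	assert(c != NULL);
	atomic_flag_clear(&c->lock);
	atomic_init(&c->up, false);
	atomic_init(&c->sent, 0);
	c->queued = 0;
}

static void conn_lock(struct conn *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void conn_unlock(struct conn *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Models cm_rep_handler: mark the connection up and drain the queue. */
static int conn_established(struct conn *c)
{
	int drained;

	conn_lock(c);
	atomic_store(&c->up, true);
	drained = c->queued;
	c->queued = 0;
	conn_unlock(c);

	atomic_fetch_add(&c->sent, drained);
	return drained;
}

/* Models the transmit path: unlocked fast path, locked re-check. */
static enum xmit_result conn_xmit(struct conn *c)
{
	enum xmit_result res;

	/* First check without the lock; most packets take this path. */
	if (atomic_load(&c->up)) {
		atomic_fetch_add(&c->sent, 1);
		return XMIT_SENT;
	}

	/* Re-check under the lock so a packet is never queued after the drain. */
	conn_lock(c);
	if (atomic_load(&c->up)) {
		res = XMIT_SENT;
	} else if (c->queued < MAX_QUEUE) {
		c->queued++;
		res = XMIT_QUEUED;
	} else {
		res = XMIT_DROPPED;
	}
	conn_unlock(c);

	if (res == XMIT_SENT)
		atomic_fetch_add(&c->sent, 1);
	return res;
}
```

Because the flag is set and the queue is drained under the same lock that
protects the enqueue, a packet is either queued before the drain (and
picked up by it) or sees the connection as up and is sent directly.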
Tested-by: Ira Weiny <ira.weiny(a)intel.com>
Signed-off-by: Aaron Knister <aaron.s.knister(a)nasa.gov>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 53 +++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 26cde95b..529dbeab 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1093,6 +1093,34 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
spin_unlock_irqrestore(&priv->lock, flags);
}
+static void defer_neigh_skb(struct sk_buff *skb, struct net_device *dev,
+ struct ipoib_neigh *neigh,
+ struct ipoib_pseudo_header *phdr,
+ unsigned long *flags)
+{
+ struct ipoib_dev_priv *priv = ipoib_priv(dev);
+ unsigned long local_flags;
+ int acquire_priv_lock = 0;
+
+ /* Passing in pointer to spin_lock flags indicates spin lock
+ * already acquired so we don't need to acquire the priv lock */
+ if (flags == NULL) {
+ flags = &local_flags;
+ acquire_priv_lock = 1;
+ }
+
+ if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
+ push_pseudo_header(skb, phdr->hwaddr);
+ if (acquire_priv_lock)
+ spin_lock_irqsave(&priv->lock, *flags);
+ __skb_queue_tail(&neigh->queue, skb);
+ spin_unlock_irqrestore(&priv->lock, *flags);
+ } else {
+ ++dev->stats.tx_dropped;
+ dev_kfree_skb_any(skb);
+ }
+}
+
static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -1160,6 +1188,21 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
+ /*
+ * Re-check ipoib_cm_up with priv->lock held to avoid
+ * race condition between start_xmit and skb_dequeue in
+ * cm_rep_handler. Since odds are the conn should be up
+ * most of the time, we don't hold the lock for the
+ * first check above
+ */
+ spin_lock_irqsave(&priv->lock, flags);
+ if (ipoib_cm_up(neigh)) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+ } else {
+ defer_neigh_skb(skb, dev, neigh, phdr, &flags);
+ }
+ goto unref;
} else if (neigh->ah && neigh->ah->valid) {
neigh->ah->last_send = rn->send(dev, skb, neigh->ah->ah,
IPOIB_QPN(phdr->hwaddr));
@@ -1168,15 +1211,7 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
neigh_refresh_path(neigh, phdr->hwaddr, dev);
}
- if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
- push_pseudo_header(skb, phdr->hwaddr);
- spin_lock_irqsave(&priv->lock, flags);
- __skb_queue_tail(&neigh->queue, skb);
- spin_unlock_irqrestore(&priv->lock, flags);
- } else {
- ++dev->stats.tx_dropped;
- dev_kfree_skb_any(skb);
- }
+ defer_neigh_skb(skb, dev, neigh, phdr, NULL);
unref:
ipoib_neigh_put(neigh);
--
2.12.3
On most systems with ACPI hotplugging support, it seems that we always
receive a hotplug event once we re-enable EC interrupts even if the GPU
hasn't even been resumed yet.
This can cause problems since even though we schedule hpd_work to handle
connector reprobing for us, hpd_work synchronizes on
pm_runtime_get_sync() to wait until the device is ready to perform
reprobing. Since runtime suspend/resume callbacks are disabled before
the PM core calls ->suspend(), any calls to pm_runtime_get_sync() during
this period will grab a runtime PM ref and return immediately with
-EACCES. Because we schedule hpd_work from our ACPI HPD handler, and
hpd_work synchronizes on pm_runtime_get_sync(), this causes us to launch
a connector reprobe immediately even if the GPU isn't actually resumed
just yet. This causes various warnings in dmesg and occasionally also
prevents some displays connected to the dedicated GPU from coming back
up after suspend. Example:
usb 1-4: USB disconnect, device number 14
usb 1-4.1: USB disconnect, device number 15
WARNING: CPU: 0 PID: 838 at drivers/gpu/drm/nouveau/include/nvkm/subdev/i2c.h:170 nouveau_dp_detect+0x17e/0x370 [nouveau]
CPU: 0 PID: 838 Comm: kworker/0:6 Not tainted 4.17.14-201.Lyude.bz1477182.V3.fc28.x86_64 #1
Hardware name: LENOVO 20EQS64N00/20EQS64N00, BIOS N1EET77W (1.50 ) 03/28/2018
Workqueue: events nouveau_display_hpd_work [nouveau]
RIP: 0010:nouveau_dp_detect+0x17e/0x370 [nouveau]
RSP: 0018:ffffa15143933cf0 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8cb4f656c400 RCX: 0000000000000000
RDX: ffffa1514500e4e4 RSI: ffffa1514500e4e4 RDI: 0000000001009002
RBP: ffff8cb4f4a8a800 R08: ffffa15143933cfd R09: ffffa15143933cfc
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8cb4fb57a000
R13: ffff8cb4fb57a000 R14: ffff8cb4f4a8f800 R15: ffff8cb4f656c418
FS: 0000000000000000(0000) GS:ffff8cb51f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f78ec938000 CR3: 000000073720a003 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
? _cond_resched+0x15/0x30
nouveau_connector_detect+0x2ce/0x520 [nouveau]
? _cond_resched+0x15/0x30
? ww_mutex_lock+0x12/0x40
drm_helper_probe_detect_ctx+0x8b/0xe0 [drm_kms_helper]
drm_helper_hpd_irq_event+0xa8/0x120 [drm_kms_helper]
nouveau_display_hpd_work+0x2a/0x60 [nouveau]
process_one_work+0x187/0x340
worker_thread+0x2e/0x380
? pwq_unbound_release_workfn+0xd0/0xd0
kthread+0x112/0x130
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x35/0x40
Code: 4c 8d 44 24 0d b9 00 05 00 00 48 89 ef ba 09 00 00 00 be 01 00 00 00 e8 e1 09 f8 ff 85 c0 0f 85 b2 01 00 00 80 7c 24 0c 03 74 02 <0f> 0b 48 89 ef e8 b8 07 f8 ff f6 05 51 1b c8 ff 02 0f 84 72 ff
---[ end trace 55d811b38fc8e71a ]---
So, to fix this we attempt to grab a runtime PM reference in the ACPI
handler itself asynchronously. If the GPU is already awake (it will have
normal hotplugging at this point) or runtime PM callbacks are currently
disabled on the device, we drop our reference without updating the
autosuspend delay. We only schedule connector reprobes when we
successfully managed to queue up a resume request with our asynchronous
PM ref.
This also has the added benefit of preventing redundant connector
reprobes from ACPI while the GPU is runtime resumed!
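The return-value handling described above can be sketched as a small
userspace model. This is only an illustration, assuming the meanings the
patch relies on for pm_runtime_get() (1: device already active, 0: resume
request queued, -EACCES: runtime PM disabled). The enum and function names
are hypothetical and are not part of nouveau.

```c
#include <errno.h>

/* Hypothetical names for the three outcomes the handler distinguishes. */
enum acpi_probe_action {
	ACTION_DROP_REF,         /* GPU awake or runtime PM disabled: only drop the ref */
	ACTION_SCHEDULE_REPROBE, /* resume request queued: schedule the connector reprobe */
	ACTION_WARN,             /* any other error: warn and drop the event */
};

/* Map an assumed pm_runtime_get() return value to the action taken. */
static enum acpi_probe_action classify_pm_get(int ret)
{
	if (ret == 1 || ret == -EACCES)
		return ACTION_DROP_REF;
	if (ret == 0)
		return ACTION_SCHEDULE_REPROBE;
	return ACTION_WARN;
}
```

In this model, classify_pm_get(0) is the only case that leads to a
reprobe, which is why redundant reprobes no longer happen while the GPU
is already awake.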
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
Cc: Karol Herbst <kherbst(a)redhat.com>
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1477182#c41
---
drivers/gpu/drm/nouveau/nouveau_display.c | 26 +++++++++++++++++------
1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 4b873e668b26..2e3226ff9f95 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -377,15 +377,29 @@ nouveau_display_acpi_ntfy(struct notifier_block *nb, unsigned long val,
{
struct nouveau_drm *drm = container_of(nb, typeof(*drm), acpi_nb);
struct acpi_bus_event *info = data;
+ int ret;
if (!strcmp(info->device_class, ACPI_VIDEO_CLASS)) {
if (info->type == ACPI_VIDEO_NOTIFY_PROBE) {
- /*
- * This may be the only indication we receive of a
- * connector hotplug on a runtime suspended GPU,
- * schedule hpd_work to check.
- */
- schedule_work(&drm->hpd_work);
+ ret = pm_runtime_get(drm->dev->dev);
+ if (ret == 1 || ret == -EACCES) {
+ /* If the GPU is already awake, or in a state
+ * where we can't wake it up, it can handle
+ * it's own hotplug events.
+ */
+ pm_runtime_put_autosuspend(drm->dev->dev);
+ } else if (ret == 0) {
+ /* This may be the only indication we receive
+ * of a connector hotplug on a runtime
+ * suspended GPU, schedule hpd_work to check.
+ */
+ NV_DEBUG(drm, "ACPI requested connector reprobe\n");
+ schedule_work(&drm->hpd_work);
+ pm_runtime_put_noidle(drm->dev->dev);
+ } else {
+ NV_WARN(drm, "Dropped ACPI reprobe event due to RPM error: %d\n",
+ ret);
+ }
/* acpi-video should not generate keypresses for this */
return NOTIFY_BAD;
--
2.17.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been performed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the linear addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8088d3dd4d7c6933a65aa169393b5d88d8065672 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 10:54:56 -0700
Subject: [PATCH] crypto: skcipher - fix crash flushing dcache in error path
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
skcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing skcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "cbc(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741(a)huawei.com>
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 835e5d36ad59..0bd8c6caa498 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
return max(start, end_page);
}
-static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
{
u8 *addr;
@@ -103,23 +103,24 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
addr = skcipher_get_spot(addr, bsize);
scatterwalk_copychunks(addr, &walk->out, bsize,
(walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
- return 0;
}
int skcipher_walk_done(struct skcipher_walk *walk, int err)
{
- unsigned int n = walk->nbytes - err;
- unsigned int nbytes;
-
- nbytes = walk->total - n;
-
- if (unlikely(err < 0)) {
- nbytes = 0;
- n = 0;
- } else if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
- SKCIPHER_WALK_SLOW |
- SKCIPHER_WALK_COPY |
- SKCIPHER_WALK_DIFF)))) {
+ unsigned int n; /* bytes processed */
+ bool more;
+
+ if (unlikely(err < 0))
+ goto finish;
+
+ n = walk->nbytes - err;
+ walk->total -= n;
+ more = (walk->total != 0);
+
+ if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
+ SKCIPHER_WALK_SLOW |
+ SKCIPHER_WALK_COPY |
+ SKCIPHER_WALK_DIFF)))) {
unmap_src:
skcipher_unmap_src(walk);
} else if (walk->flags & SKCIPHER_WALK_DIFF) {
@@ -131,28 +132,28 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
skcipher_unmap_dst(walk);
} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
if (WARN_ON(err)) {
+ /* unexpected case; didn't process all bytes */
err = -EINVAL;
- nbytes = 0;
- } else
- n = skcipher_done_slow(walk, n);
+ goto finish;
+ }
+ skcipher_done_slow(walk, n);
+ goto already_advanced;
}
- if (err > 0)
- err = 0;
-
- walk->total = nbytes;
- walk->nbytes = nbytes;
-
scatterwalk_advance(&walk->in, n);
scatterwalk_advance(&walk->out, n);
- scatterwalk_done(&walk->in, 0, nbytes);
- scatterwalk_done(&walk->out, 1, nbytes);
+already_advanced:
+ scatterwalk_done(&walk->in, 0, more);
+ scatterwalk_done(&walk->out, 1, more);
- if (nbytes) {
+ if (more) {
crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
CRYPTO_TFM_REQ_MAY_SLEEP : 0);
return skcipher_walk_next(walk);
}
+ err = 0;
+finish:
+ walk->nbytes = 0;
/* Short-circuit for the common/fast path. */
if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8088d3dd4d7c6933a65aa169393b5d88d8065672 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 10:54:56 -0700
Subject: [PATCH] crypto: skcipher - fix crash flushing dcache in error path
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
skcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing skcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "cbc(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741(a)huawei.com>
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 835e5d36ad59..0bd8c6caa498 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
return max(start, end_page);
}
-static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
{
u8 *addr;
@@ -103,23 +103,24 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
addr = skcipher_get_spot(addr, bsize);
scatterwalk_copychunks(addr, &walk->out, bsize,
(walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
- return 0;
}
int skcipher_walk_done(struct skcipher_walk *walk, int err)
{
- unsigned int n = walk->nbytes - err;
- unsigned int nbytes;
-
- nbytes = walk->total - n;
-
- if (unlikely(err < 0)) {
- nbytes = 0;
- n = 0;
- } else if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
- SKCIPHER_WALK_SLOW |
- SKCIPHER_WALK_COPY |
- SKCIPHER_WALK_DIFF)))) {
+ unsigned int n; /* bytes processed */
+ bool more;
+
+ if (unlikely(err < 0))
+ goto finish;
+
+ n = walk->nbytes - err;
+ walk->total -= n;
+ more = (walk->total != 0);
+
+ if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
+ SKCIPHER_WALK_SLOW |
+ SKCIPHER_WALK_COPY |
+ SKCIPHER_WALK_DIFF)))) {
unmap_src:
skcipher_unmap_src(walk);
} else if (walk->flags & SKCIPHER_WALK_DIFF) {
@@ -131,28 +132,28 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
skcipher_unmap_dst(walk);
} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
if (WARN_ON(err)) {
+ /* unexpected case; didn't process all bytes */
err = -EINVAL;
- nbytes = 0;
- } else
- n = skcipher_done_slow(walk, n);
+ goto finish;
+ }
+ skcipher_done_slow(walk, n);
+ goto already_advanced;
}
- if (err > 0)
- err = 0;
-
- walk->total = nbytes;
- walk->nbytes = nbytes;
-
scatterwalk_advance(&walk->in, n);
scatterwalk_advance(&walk->out, n);
- scatterwalk_done(&walk->in, 0, nbytes);
- scatterwalk_done(&walk->out, 1, nbytes);
+already_advanced:
+ scatterwalk_done(&walk->in, 0, more);
+ scatterwalk_done(&walk->out, 1, more);
- if (nbytes) {
+ if (more) {
crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
CRYPTO_TFM_REQ_MAY_SLEEP : 0);
return skcipher_walk_next(walk);
}
+ err = 0;
+finish:
+ walk->nbytes = 0;
/* Short-circuit for the common/fast path. */
if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8088d3dd4d7c6933a65aa169393b5d88d8065672 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 10:54:56 -0700
Subject: [PATCH] crypto: skcipher - fix crash flushing dcache in error path
scatterwalk_done() is only meant to be called after a nonzero number of
bytes have been processed, since scatterwalk_pagedone() will flush the
dcache of the *previous* page. But in the error case of
skcipher_walk_done(), e.g. if the input wasn't an integer number of
blocks, scatterwalk_done() was actually called after advancing 0 bytes.
This caused a crash ("BUG: unable to handle kernel paging request")
during '!PageSlab(page)' on architectures like arm and arm64 that define
ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE, provided that the input was
page-aligned as in that case walk->offset == 0.
Fix it by reorganizing skcipher_walk_done() to skip the
scatterwalk_advance() and scatterwalk_done() if an error has occurred.
This bug was found by syzkaller fuzzing.
Reproducer, assuming ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE:
#include <linux/if_alg.h>
#include <sys/socket.h>
#include <unistd.h>
int main()
{
struct sockaddr_alg addr = {
.salg_type = "skcipher",
.salg_name = "cbc(aes-generic)",
};
char buffer[4096] __attribute__((aligned(4096))) = { 0 };
int fd;
fd = socket(AF_ALG, SOCK_SEQPACKET, 0);
bind(fd, (void *)&addr, sizeof(addr));
setsockopt(fd, SOL_ALG, ALG_SET_KEY, buffer, 16);
fd = accept(fd, NULL, NULL);
write(fd, buffer, 15);
read(fd, buffer, 15);
}
Reported-by: Liu Chao <liuchao741(a)huawei.com>
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 835e5d36ad59..0bd8c6caa498 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -95,7 +95,7 @@ static inline u8 *skcipher_get_spot(u8 *start, unsigned int len)
return max(start, end_page);
}
-static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
+static void skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
{
u8 *addr;
@@ -103,23 +103,24 @@ static int skcipher_done_slow(struct skcipher_walk *walk, unsigned int bsize)
addr = skcipher_get_spot(addr, bsize);
scatterwalk_copychunks(addr, &walk->out, bsize,
(walk->flags & SKCIPHER_WALK_PHYS) ? 2 : 1);
- return 0;
}
int skcipher_walk_done(struct skcipher_walk *walk, int err)
{
- unsigned int n = walk->nbytes - err;
- unsigned int nbytes;
-
- nbytes = walk->total - n;
-
- if (unlikely(err < 0)) {
- nbytes = 0;
- n = 0;
- } else if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
- SKCIPHER_WALK_SLOW |
- SKCIPHER_WALK_COPY |
- SKCIPHER_WALK_DIFF)))) {
+ unsigned int n; /* bytes processed */
+ bool more;
+
+ if (unlikely(err < 0))
+ goto finish;
+
+ n = walk->nbytes - err;
+ walk->total -= n;
+ more = (walk->total != 0);
+
+ if (likely(!(walk->flags & (SKCIPHER_WALK_PHYS |
+ SKCIPHER_WALK_SLOW |
+ SKCIPHER_WALK_COPY |
+ SKCIPHER_WALK_DIFF)))) {
unmap_src:
skcipher_unmap_src(walk);
} else if (walk->flags & SKCIPHER_WALK_DIFF) {
@@ -131,28 +132,28 @@ int skcipher_walk_done(struct skcipher_walk *walk, int err)
skcipher_unmap_dst(walk);
} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
if (WARN_ON(err)) {
+ /* unexpected case; didn't process all bytes */
err = -EINVAL;
- nbytes = 0;
- } else
- n = skcipher_done_slow(walk, n);
+ goto finish;
+ }
+ skcipher_done_slow(walk, n);
+ goto already_advanced;
}
- if (err > 0)
- err = 0;
-
- walk->total = nbytes;
- walk->nbytes = nbytes;
-
scatterwalk_advance(&walk->in, n);
scatterwalk_advance(&walk->out, n);
- scatterwalk_done(&walk->in, 0, nbytes);
- scatterwalk_done(&walk->out, 1, nbytes);
+already_advanced:
+ scatterwalk_done(&walk->in, 0, more);
+ scatterwalk_done(&walk->out, 1, more);
- if (nbytes) {
+ if (more) {
crypto_yield(walk->flags & SKCIPHER_WALK_SLEEP ?
CRYPTO_TFM_REQ_MAY_SLEEP : 0);
return skcipher_walk_next(walk);
}
+ err = 0;
+finish:
+ walk->nbytes = 0;
/* Short-circuit for the common/fast path. */
if (!((unsigned long)walk->buffer | (unsigned long)walk->page))
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0567fc9e90b9b1c8dbce8a5468758e6206744d4a Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 09:57:50 -0700
Subject: [PATCH] crypto: skcipher - fix aligning block size in
skcipher_copy_iv()
The ALIGN() macro needs to be passed the alignment, not the alignmask
(which is the alignment minus 1).
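The difference between the two calls can be checked with a little
arithmetic. The sketch below uses a simplified stand-in for the kernel's
ALIGN() (round up to a power-of-two multiple); the helper names and the
sample values of bs and alignmask are assumptions chosen for illustration
only.

```c
/*
 * Simplified stand-in for the kernel's ALIGN(): round x up to a multiple
 * of a. Only meaningful when a is a power of two.
 */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	return (x + (a - 1)) & ~(a - 1);
}

/* What the old code computed: the mask is passed where the alignment belongs. */
static unsigned int aligned_bs_old(unsigned int bs, unsigned int alignmask)
{
	return align_up(bs, alignmask);
}

/* What the fixed code computes: the alignment is alignmask + 1. */
static unsigned int aligned_bs_fixed(unsigned int bs, unsigned int alignmask)
{
	return align_up(bs, alignmask + 1);
}
```

With bs = 1 and alignmask = 15, aligned_bs_old() yields 1 while
aligned_bs_fixed() yields 16. With bs = 16 and alignmask = 15 both yield
16, so in this model the old expression only goes wrong for some
combinations of block size and mask.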
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 7d6a49fe3047..4f6b8dadaceb 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -398,7 +398,7 @@ static int skcipher_copy_iv(struct skcipher_walk *walk)
unsigned size;
u8 *iv;
- aligned_bs = ALIGN(bs, alignmask);
+ aligned_bs = ALIGN(bs, alignmask + 1);
/* Minimum size to align buffer by alignmask. */
size = alignmask & ~a;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0567fc9e90b9b1c8dbce8a5468758e6206744d4a Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 09:57:50 -0700
Subject: [PATCH] crypto: skcipher - fix aligning block size in
skcipher_copy_iv()
The ALIGN() macro needs to be passed the alignment, not the alignmask
(which is the alignment minus 1).
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 7d6a49fe3047..4f6b8dadaceb 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -398,7 +398,7 @@ static int skcipher_copy_iv(struct skcipher_walk *walk)
unsigned size;
u8 *iv;
- aligned_bs = ALIGN(bs, alignmask);
+ aligned_bs = ALIGN(bs, alignmask + 1);
/* Minimum size to align buffer by alignmask. */
size = alignmask & ~a;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0567fc9e90b9b1c8dbce8a5468758e6206744d4a Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Mon, 23 Jul 2018 09:57:50 -0700
Subject: [PATCH] crypto: skcipher - fix aligning block size in
skcipher_copy_iv()
The ALIGN() macro needs to be passed the alignment, not the alignmask
(which is the alignment minus 1).
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: <stable(a)vger.kernel.org> # v4.10+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/crypto/skcipher.c b/crypto/skcipher.c
index 7d6a49fe3047..4f6b8dadaceb 100644
--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -398,7 +398,7 @@ static int skcipher_copy_iv(struct skcipher_walk *walk)
unsigned size;
u8 *iv;
- aligned_bs = ALIGN(bs, alignmask);
+ aligned_bs = ALIGN(bs, alignmask + 1);
/* Minimum size to align buffer by alignmask. */
size = alignmask & ~a;
commit b3681dd548d06deb2e1573890829dff4b15abf46 upstream.
This version applies to v4.9.
From Andy Lutomirski, original author:
error_entry and error_exit communicate the user vs kernel status of
the frame using %ebx. This is unnecessary -- the information is in
regs->cs. Just use regs->cs.
This makes error_entry simpler and makes error_exit more robust.
It also fixes a nasty bug. Before all the Spectre nonsense, the
xen_failsafe_callback entry point returned like this:
ALLOC_PT_GPREGS_ON_STACK
SAVE_C_REGS
SAVE_EXTRA_REGS
ENCODE_FRAME_POINTER
jmp error_exit
And it did not go through error_entry. This was bogus: RBX
contained garbage, and error_exit expected a flag in RBX.
Fortunately, it generally contained *nonzero* garbage, so the
correct code path was used. As part of the Spectre fixes, code was
added to clear RBX to mitigate certain speculation attacks. Now,
depending on kernel configuration, RBX gets zeroed and, when running
some Wine workloads, the kernel crashes. This was introduced by:
commit 3ac6d8c787b8 ("x86/entry/64: Clear registers for
exceptions/interrupts, to reduce speculation attack surface")
With this patch applied, RBX is no longer needed as a flag, and the
problem goes away.
I suspect that malicious userspace could use this bug to crash the
kernel even without the offending patch applied, though.
[Historical note: I wrote this patch as a cleanup before I was aware
of the bug it fixed.]
[Note to stable maintainers: this should probably get applied to all
kernels.]
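The regs->cs test that replaces the RBX flag can be sketched in C. This
is a minimal model of the "testb $3, CS(%rsp)" check in the diff below,
assuming the usual x86 convention that the low two bits of a segment
selector carry the privilege level. The macro and function names are
hypothetical.

```c
#include <stdbool.h>

/* Low two bits of a segment selector hold the privilege level. */
#define SEGMENT_RPL_MASK 0x3u

/*
 * Mirrors "testb $3, CS(%rsp); jz retint_kernel": a zero privilege
 * level in the saved CS means the frame came from kernel mode.
 */
static bool frame_from_user_mode(unsigned long cs)
{
	return (cs & SEGMENT_RPL_MASK) != 0;
}
```

For example, a saved CS of 0x10 is classified as kernel mode, while a
selector such as 0x33 (assumed here to be a typical 64-bit user code
selector) is classified as user mode. No register has to be kept live
between error_entry and error_exit for this to work.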
Cc: Brian Gerst <brgerst(a)gmail.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: xen-devel(a)lists.xenproject.org
Cc: x86(a)kernel.org
Cc: stable(a)vger.kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Fixes: 3ac6d8c787b8 ("x86/entry/64: Clear registers for exceptions/interrupts, to reduce speculation attack surface")
Reported-and-tested-by: "M. Vefa Bicakci" <m.v.b(a)runbox.com>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Sarah Newman <srn(a)prgmr.com>
---
arch/x86/entry/entry_64.S | 19 ++++---------------
1 file changed, 4 insertions(+), 15 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index d58d8dc..0dab47a 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -774,7 +774,7 @@ ENTRY(\sym)
call \do_sym
- jmp error_exit /* %ebx: no swapgs flag */
+ jmp error_exit
.endif
END(\sym)
.endm
@@ -1043,7 +1043,6 @@ END(paranoid_exit)
/*
* Save all registers in pt_regs, and switch gs if needed.
- * Return: EBX=0: came from user mode; EBX=1: otherwise
*/
ENTRY(error_entry)
cld
@@ -1087,7 +1086,6 @@ ENTRY(error_entry)
* for these here too.
*/
.Lerror_kernelspace:
- incl %ebx
leaq native_irq_return_iret(%rip), %rcx
cmpq %rcx, RIP+8(%rsp)
je .Lerror_bad_iret
@@ -1119,28 +1117,19 @@ ENTRY(error_entry)
/*
* Pretend that the exception came from user mode: set up pt_regs
- * as if we faulted immediately after IRET and clear EBX so that
- * error_exit knows that we will be returning to user mode.
+ * as if we faulted immediately after IRET.
*/
mov %rsp, %rdi
call fixup_bad_iret
mov %rax, %rsp
- decl %ebx
jmp .Lerror_entry_from_usermode_after_swapgs
END(error_entry)
-
-/*
- * On entry, EBX is a "return to kernel mode" flag:
- * 1: already in kernel mode, don't need SWAPGS
- * 0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode
- */
ENTRY(error_exit)
- movl %ebx, %eax
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
- testl %eax, %eax
- jnz retint_kernel
+ testb $3, CS(%rsp)
+ jz retint_kernel
jmp retint_user
END(error_exit)
--
1.9.1
From: Andrey Konovalov <andreyknvl(a)google.com>
commit 0e410e158e5baa1300bdf678cea4f4e0cf9d8b94 upstream.
With KASAN enabled the kernel has two different memset() functions, one
with KASAN checks (memset) and one without (__memset). KASAN uses some
macro tricks to use the proper version where required. For example
memset() calls in mm/slub.c are without KASAN checks, since they operate
on poisoned slab object metadata.
The issue is that clang emits memset() calls even when there is no
memset() in the source code. These calls get linked against the wrong
memset() implementation, and the kernel fails to boot due to a huge
number of KASAN reports during early boot stages.
The solution is to add the -fno-builtin flag for files with the
KASAN_SANITIZE := n marker.
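As an illustration of the kind of source that can trigger this, the
function below contains no memset() call, yet an optimizing compiler may
choose to lower the loop to one. Whether it does depends on the compiler,
its version and its flags, so treat this as a sketch; the structure and
function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical metadata structure, for illustration only. */
struct slab_meta {
	unsigned long words[16];
};

/*
 * Zero the structure with a plain loop. No memset() appears in the
 * source, but a compiler is allowed to emit a memset() call for it
 * unless built with something like -fno-builtin.
 */
static void clear_meta(struct slab_meta *m)
{
	for (size_t i = 0; i < sizeof(m->words) / sizeof(m->words[0]); i++)
		m->words[i] = 0;
}
```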
Link: http://lkml.kernel.org/r/8ffecfffe04088c52c42b92739c2bd8a0bcb3f5e.151638459…
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Michal Marek <michal.lkml(a)markovi.net>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[ Sami: Backported to 4.9 avoiding c5caf21ab0cf8 and e7c52b84fb ]
Signed-off-by: Sami Tolvanen <samitolvanen(a)google.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
Makefile | 3 ++-
scripts/Makefile.kasan | 3 +++
scripts/Makefile.lib | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 0723bbe1d4a7..1fa448ba243f 100644
--- a/Makefile
+++ b/Makefile
@@ -417,7 +417,8 @@ export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE
export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS
-export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_KASAN CFLAGS_UBSAN
+export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE
+export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE CFLAGS_UBSAN
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 37323b0df374..2624d4bf9a45 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -28,4 +28,7 @@ else
CFLAGS_KASAN := $(CFLAGS_KASAN_MINIMAL)
endif
endif
+
+CFLAGS_KASAN_NOSANITIZE := -fno-builtin
+
endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index ae0f9ab1a70d..c954040c3cf2 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -127,7 +127,7 @@ endif
ifeq ($(CONFIG_KASAN),y)
_c_flags += $(if $(patsubst n%,, \
$(KASAN_SANITIZE_$(basetarget).o)$(KASAN_SANITIZE)y), \
- $(CFLAGS_KASAN))
+ $(CFLAGS_KASAN), $(CFLAGS_KASAN_NOSANITIZE))
endif
ifeq ($(CONFIG_UBSAN),y)
--
2.18.0.865.gffc8e1a3cd6-goog
The 4.4.y stable backport dc6ae4dffd65 for the upstream commit
3d4bf93ac120 ("tcp: detect malicious patterns in
tcp_collapse_ofo_queue()") missed a line that enlarges the
range_truesize value, which broke the whole check.
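The effect of the missing line can be modelled in a few lines of
userspace C. This is a sketch only: the function name, the accumulate
switch and the sample sizes are hypothetical, and the surrounding kernel
logic is not reproduced.

```c
#include <stddef.h>

/*
 * Compute range_truesize for one contiguous range of skbs.
 * accumulate == 0 models the broken backport, where only the first
 * skb's truesize was ever recorded.
 */
static unsigned int range_truesize(const unsigned int *truesize, size_t n,
				   int accumulate)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (i == 0)
			total = truesize[i];   /* a new range starts here */
		else if (accumulate)
			total += truesize[i];  /* the line the backport missed */
	}
	return total;
}
```

For three skbs of truesize 768 each, the model returns 2304 with
accumulation and 768 without it. Without the added line the value never
grows beyond the first skb's truesize, so a check that depends on it
cannot tell a single-skb range from a multi-skb one.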
Fixes: dc6ae4dffd65 ("tcp: detect malicious patterns in tcp_collapse_ofo_queue()")
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
---
Greg, this is a fix-up specific to 4.4.y stable backport that had a
slightly different form from upstream fix. I haven't looked at the
older trees, but 4.9.y and later took the upstream fix as is, so this
patch isn't needed for them.
The patch hasn't been tested with the real test case, though; let me
know if the current code is intended. Thanks!
net/ipv4/tcp_input.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4a261e078082..9c4c6cd0316e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4835,6 +4835,7 @@ static void tcp_collapse_ofo_queue(struct sock *sk)
end = TCP_SKB_CB(skb)->end_seq;
range_truesize = skb->truesize;
} else {
+ range_truesize += skb->truesize;
if (before(TCP_SKB_CB(skb)->seq, start))
start = TCP_SKB_CB(skb)->seq;
if (after(TCP_SKB_CB(skb)->end_seq, end))
--
2.18.0
From: Yannik Sembritzki <yannik(a)sembritzki.me>
The split of .system_keyring into .builtin_trusted_keys and
.secondary_trusted_keys broke kexec, thereby preventing kernels signed by
keys which are now in the secondary keyring from being kexec'd.
Fix this by passing VERIFY_USE_SECONDARY_KEYRING to
verify_pefile_signature().
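For readers unfamiliar with the convention, the following is a simplified userspace model of how a sentinel pointer can select a keyring. It is only a sketch: struct key here is a toy type, pick_keyring() is a hypothetical helper, and the two keyring objects are placeholders. The macro definition mirrors, to my knowledge, the one in the kernel headers, but the selection logic is an assumption about the behaviour and not a copy of the kernel code.

```c
#include <stddef.h>

/* Toy type for illustration; the real struct key is far richer. */
struct key {
	const char *name;
};

/* Sentinel pointer value that is never a valid keyring address. */
#define VERIFY_USE_SECONDARY_KEYRING ((struct key *)1UL)

/* Placeholder keyrings for this sketch. */
struct key builtin_trusted_keys = { ".builtin_trusted_keys" };
struct key secondary_trusted_keys = { ".secondary_trusted_keys" };

/*
 * Hypothetical helper: map the caller's argument to the keyring that
 * should be searched.
 */
struct key *pick_keyring(struct key *trusted_keys)
{
	if (!trusted_keys)
		return &builtin_trusted_keys;   /* NULL: built-in keys only */
	if (trusted_keys == VERIFY_USE_SECONDARY_KEYRING)
		return &secondary_trusted_keys; /* sentinel: secondary keyring */
	return trusted_keys;                    /* caller-supplied keyring */
}
```

Under this model, passing NULL as the old code did restricts verification to the built-in keyring, which matches the failure described above.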
Fixes: d3bfe84129f6 ("certs: Add a secondary system keyring that can be added to dynamically")
Signed-off-by: Yannik Sembritzki <yannik(a)sembritzki.me>
Signed-off-by: David Howells <dhowells(a)redhat.com>
cc: kexec(a)lists.infradead.org
cc: keyrings(a)vger.kernel.org
cc: linux-security-module(a)vger.kernel.org
cc: stable(a)vger.kernel.org
---
arch/x86/kernel/kexec-bzimage64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 7326078eaa7a..278cd07228dd 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -532,7 +532,7 @@ static int bzImage64_cleanup(void *loader_data)
static int bzImage64_verify_sig(const char *kernel, unsigned long kernel_len)
{
return verify_pefile_signature(kernel, kernel_len,
- NULL,
+ VERIFY_USE_SECONDARY_KEYRING,
VERIFYING_KEXEC_PE_SIGNATURE);
}
#endif
Inside start_xmit(), the check for whether the connection is up and the
queueing of packets for later transmission are not atomic. This leaves a
window in which cm_rep_handler can run, set the connection up and
dequeue the pending packets, after which start_xmit() queues further
packets that sit on neigh->queue until they're dropped when the
connection is torn down. This only applies to connected mode. These
dropped packets can really upset TCP, for example, and cause
multi-minute delays in transmission for open connections.
I've got a reproducer available if it's needed.
Here's the code in start_xmit where we check to see if the connection
is up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet
in start_xmit where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
The patch re-checks ipoib_cm_up with priv->lock held to avoid this
race condition. Since the connection should be up most of the time, the
first check is made without the lock, to avoid a slowdown from
unnecessary locking for the majority of the packets transmitted during
the connection's life.
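The pattern can be sketched in userspace as follows. This is an illustration under stated assumptions and not the driver code: struct conn, conn_xmit(), conn_established() and MAX_QUEUE are hypothetical names, a pthread mutex stands in for priv->lock, and an atomic flag stands in for the connection state.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE 3 /* stand-in for IPOIB_MAX_PATH_REC_QUEUE */

enum xmit_result { XMIT_SENT, XMIT_QUEUED, XMIT_DROPPED };

struct conn {
	pthread_mutex_t lock; /* plays the role of priv->lock */
	atomic_bool up;       /* set with the lock held */
	int queue[MAX_QUEUE]; /* packets waiting for the connection */
	size_t queued;
};

/* Transmit path: lock-free fast path, re-check under the lock. */
enum xmit_result conn_xmit(struct conn *c, int pkt)
{
	enum xmit_result res;

	/* First check without the lock; this is the common case. */
	if (atomic_load(&c->up))
		return XMIT_SENT;

	pthread_mutex_lock(&c->lock);
	if (atomic_load(&c->up)) {
		/* The handler ran between the two checks: do not queue. */
		res = XMIT_SENT;
	} else if (c->queued < MAX_QUEUE) {
		c->queue[c->queued++] = pkt;
		res = XMIT_QUEUED;
	} else {
		res = XMIT_DROPPED;
	}
	pthread_mutex_unlock(&c->lock);
	return res;
}

/*
 * Models the reply handler: mark the connection up and drain the queue
 * under the same lock. Returns the number of packets drained.
 */
size_t conn_established(struct conn *c)
{
	size_t drained;

	pthread_mutex_lock(&c->lock);
	atomic_store(&c->up, true);
	drained = c->queued;
	c->queued = 0;
	pthread_mutex_unlock(&c->lock);
	return drained;
}
```

Because the second check and the enqueue happen under the same lock that the handler takes to drain the queue, a packet can no longer be queued after the drain has finished.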
Cc: stable(a)vger.kernel.org
Tested-by: Ira Weiny <ira.weiny(a)intel.com>
Signed-off-by: Aaron Knister <aaron.s.knister(a)nasa.gov>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 53 +++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 26cde95b..529dbeab 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1093,6 +1093,34 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
spin_unlock_irqrestore(&priv->lock, flags);
}
+static void defer_neigh_skb(struct sk_buff *skb, struct net_device *dev,
+ struct ipoib_neigh *neigh,
+ struct ipoib_pseudo_header *phdr,
+ unsigned long *flags)
+{
+ struct ipoib_dev_priv *priv = ipoib_priv(dev);
+ unsigned long local_flags;
+ int acquire_priv_lock = 0;
+
+ /* Passing in pointer to spin_lock flags indicates spin lock
+ * already acquired so we don't need to acquire the priv lock */
+ if (flags == NULL) {
+ flags = &local_flags;
+ acquire_priv_lock = 1;
+ }
+
+ if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
+ push_pseudo_header(skb, phdr->hwaddr);
+ if (acquire_priv_lock)
+ spin_lock_irqsave(&priv->lock, *flags);
+ __skb_queue_tail(&neigh->queue, skb);
+ spin_unlock_irqrestore(&priv->lock, *flags);
+ } else {
+ ++dev->stats.tx_dropped;
+ dev_kfree_skb_any(skb);
+ }
+}
+
static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -1160,6 +1188,21 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
+ /*
+ * Re-check ipoib_cm_up with priv->lock held to avoid
+ * race condition between start_xmit and skb_dequeue in
+ * cm_rep_handler. Since odds are the conn should be up
+ * most of the time, we don't hold the lock for the
+ * first check above
+ */
+ spin_lock_irqsave(&priv->lock, flags);
+ if (ipoib_cm_up(neigh)) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+ } else {
+ defer_neigh_skb(skb, dev, neigh, phdr, &flags);
+ }
+ goto unref;
} else if (neigh->ah && neigh->ah->valid) {
neigh->ah->last_send = rn->send(dev, skb, neigh->ah->ah,
IPOIB_QPN(phdr->hwaddr));
@@ -1168,15 +1211,7 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
neigh_refresh_path(neigh, phdr->hwaddr, dev);
}
- if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
- push_pseudo_header(skb, phdr->hwaddr);
- spin_lock_irqsave(&priv->lock, flags);
- __skb_queue_tail(&neigh->queue, skb);
- spin_unlock_irqrestore(&priv->lock, flags);
- } else {
- ++dev->stats.tx_dropped;
- dev_kfree_skb_any(skb);
- }
+ defer_neigh_skb(skb, dev, neigh, phdr, NULL);
unref:
ipoib_neigh_put(neigh);
--
2.12.3
Hi!
According to 1), disabling EPT offers the same maximum protection against L1TF as disabling SMT but
has a severe performance impact.
FWIW: With EPT disabled (2)), I can *not* confirm any performance-degradation for the VirtualBox
Windows- or Linux-VMs that I use. Those VMs are for desktop-use, though.
So it seems to me that the performance impact depends on the use case, and that in a desktop
setting disabling EPT may offer a simple maximum-protection option with the advantage that
hyperthreading stays enabled.
I have tried this with 4.18.1 and 4.14.63.
Rainer Fiebig
***
1) https://www.kernel.org/doc/html/latest/admin-guide/l1tf.html#mitigation-sel…
2) kvm-intel.ept=0
> tail /sys/devices/system/cpu/vulnerabilities/*
==> /sys/devices/system/cpu/vulnerabilities/l1tf <==
Mitigation: PTE Inversion; VMX: EPT disabled
==> /sys/devices/system/cpu/vulnerabilities/meltdown <==
Mitigation: PTI
==> /sys/devices/system/cpu/vulnerabilities/spec_store_bypass <==
Mitigation: Speculative Store Bypass disabled via prctl and seccomp
==> /sys/devices/system/cpu/vulnerabilities/spectre_v1 <==
Mitigation: __user pointer sanitization
==> /sys/devices/system/cpu/vulnerabilities/spectre_v2 <==
Mitigation: Full generic retpoline, IBPB, IBRS_FW
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been performed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the linear addresses to which they correspond.
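The ordering the patch relies on (save the child pointers, clear the entries, flush, and only then free) can be sketched in userspace. Everything here is a hypothetical stand-in: NSLOTS replaces PTRS_PER_PMD, flush_cached_translations() only counts calls where the kernel would flush, and malloc()/free() replace the page allocator.

```c
#include <stdlib.h>

#define NSLOTS 8 /* stand-in for PTRS_PER_PMD */

int flush_count; /* number of simulated flushes so far */

/* Stand-in for the TLB flush; a real flush has hardware side effects. */
void flush_cached_translations(void)
{
	flush_count++;
}

/*
 * Tear down a table of child pointers.
 * Returns 1 on success, 0 if the scratch copy cannot be allocated,
 * in which case the table is left untouched.
 */
int free_table(void **table)
{
	void **saved = malloc(NSLOTS * sizeof(*saved));
	int i;

	if (!saved)
		return 0;

	/* Step 1: remember every child, then clear the live entries. */
	for (i = 0; i < NSLOTS; i++) {
		saved[i] = table[i];
		table[i] = NULL;
	}

	/* Step 2: drop any cached view of the old entries. */
	flush_cached_translations();

	/* Step 3: only now is it safe to release the children. */
	for (i = 0; i < NSLOTS; i++)
		free(saved[i]); /* free(NULL) is a no-op */

	free(saved);
	return 1;
}
```

The scratch copy is what makes the "Callers must allow a single page allocation" note in the patch necessary: the pointers have to survive the clearing step so the children can be freed after the flush.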
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been performed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the linear addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 4.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been performed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the linear addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 4.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5e0fb5df2ee871b841f96f9cb6a7f2784e96aa4e Mon Sep 17 00:00:00 2001
From: Toshi Kani <toshi.kani(a)hpe.com>
Date: Wed, 27 Jun 2018 08:13:48 -0600
Subject: [PATCH] x86/mm: Add TLB purge to free pmd/pte page interfaces
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
- All pte entries for a target pud/pmd address range have been cleared.
- System-wide TLB purges have been performed for a target pud/pmd address
range.
The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.
Add a system-wide TLB purge (INVLPG) to a single page after clearing
pud/pmd entry's P-bit.
SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:
INVLPG invalidates all paging-structure caches associated with the
current PCID regardless of the linear addresses to which they correspond.
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: mhocko(a)suse.com
Cc: akpm(a)linux-foundation.org
Cc: hpa(a)zytor.com
Cc: cpandya(a)codeaurora.org
Cc: linux-mm(a)kvack.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: stable(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20180627141348.21777-4-toshi.kani@hpe.com
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index fbd14e506758..e3deefb891da 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -725,24 +725,44 @@ int pmd_clear_huge(pmd_t *pmd)
* @pud: Pointer to a PUD.
* @addr: Virtual address associated with pud.
*
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
+ *
+ * NOTE: Callers must allow a single page allocation.
*/
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
- pmd_t *pmd;
+ pmd_t *pmd, *pmd_sv;
+ pte_t *pte;
int i;
if (pud_none(*pud))
return 1;
pmd = (pmd_t *)pud_page_vaddr(*pud);
+ pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
+ if (!pmd_sv)
+ return 0;
- for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
- return 0;
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd_sv[i] = pmd[i];
+ if (!pmd_none(pmd[i]))
+ pmd_clear(&pmd[i]);
+ }
pud_clear(pud);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ if (!pmd_none(pmd_sv[i])) {
+ pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+ free_page((unsigned long)pte);
+ }
+ }
+
+ free_page((unsigned long)pmd_sv);
free_page((unsigned long)pmd);
return 1;
@@ -753,7 +773,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
* @pmd: Pointer to a PMD.
* @addr: Virtual address associated with pmd.
*
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -765,6 +785,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
pte = (pte_t *)pmd_page_vaddr(*pmd);
pmd_clear(pmd);
+
+ /* INVLPG to clear all paging-structure caches */
+ flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
free_page((unsigned long)pte);
return 1;
The patch below does not apply to the 4.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c40a56a7818cfe735fc93a69e1875f8bba834483 Mon Sep 17 00:00:00 2001
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Date: Thu, 2 Aug 2018 15:58:31 -0700
Subject: [PATCH] x86/mm/init: Remove freed kernel image areas from alias
mapping
The kernel image is mapped into two places in the virtual address space
(addresses without KASLR, of course):
1. The kernel direct map (0xffff880000000000)
2. The "high kernel map" (0xffffffff81000000)
We actually execute out of #2. If we get the address of a kernel symbol,
it points to #2, but almost all physical-to-virtual translations point to
#1.
Parts of the "high kernel map" alias are mapped in the userspace page
tables with the Global bit for performance reasons. The parts that we map
to userspace do not (er, should not) have secrets. When PTI is enabled then
the global bit is usually not set in the high mapping and just used to
compensate for poor performance on systems which lack PCID.
This is fine, except that some areas in the kernel image that are adjacent
to the non-secret-containing areas are unused holes. We free these holes
back into the normal page allocator and reuse them as normal kernel memory.
The memory will, of course, get *used* via the normal map, but the alias
mapping is kept.
This otherwise unused alias mapping of the holes will, by default, keep the
Global bit, be mapped out to userspace, and be vulnerable to Meltdown.
Remove the alias mapping of these pages entirely. This is likely to
fracture the 2M page mapping the kernel image near these areas, but this
should affect a minority of the area.
The pageattr code changes *all* aliases mapping the physical pages that it
operates on (by default). We only want to modify a single alias, so we
need to tweak its behavior.
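The tweak amounts to one extra flag test in front of the alias walk. A minimal userspace sketch of that decision follows; should_check_alias() is a hypothetical helper, and the PAGE_PRESENT and PAGE_NX constants are assumed to mirror the x86-64 bit positions of _PAGE_PRESENT and _PAGE_NX.

```c
#include <stdint.h>

#define PAGE_PRESENT       UINT64_C(0x1)       /* bit 0, as _PAGE_PRESENT */
#define PAGE_NX            (UINT64_C(1) << 63) /* bit 63, as _PAGE_NX */
#define CPA_NO_CHECK_ALIAS 8                   /* flag added by the patch */

/*
 * Hypothetical helper: return 1 if other mappings of the same physical
 * pages should be updated too, 0 if only the given range is touched.
 */
int should_check_alias(uint64_t mask_set, uint64_t mask_clr, int in_flag)
{
	/* No alias checking when only the NX bit is being modified. */
	int checkalias = (mask_set | mask_clr) != PAGE_NX;

	/* The caller can switch the alias walk off explicitly. */
	if (in_flag & CPA_NO_CHECK_ALIAS)
		checkalias = 0;

	return checkalias;
}
```

Clearing the present bit would normally take the alias path; with CPA_NO_CHECK_ALIAS set, only the high kernel mapping is changed and the direct map stays usable.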
This unmapping behavior is currently dependent on PTI being in place.
Going forward, we should at least consider doing this for all
configurations. Having an extra read-write alias for memory is not exactly
ideal for debugging things like random memory corruption and this does
undercut features like DEBUG_PAGEALLOC or future work like eXclusive Page
Frame Ownership (XPFO).
Before this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte
current_kernel-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82e00000 2M RW NX pte
current_kernel-0xffffffff82e00000-0xffffffff83200000 4M RW PSE NX pmd
current_kernel-0xffffffff83200000-0xffffffffa0000000 462M pmd
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000 16M pmd
current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K RW NX pte
current_user-0xffffffff82000000-0xffffffff82600000 6M ro PSE GLB NX pmd
current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd
After this patch:
current_kernel:---[ High Kernel Mapping ]---
current_kernel-0xffffffff80000000-0xffffffff81000000 16M pmd
current_kernel-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_kernel-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_kernel-0xffffffff81e11000-0xffffffff82000000 1980K pte
current_kernel-0xffffffff82000000-0xffffffff82400000 4M ro PSE GLB NX pmd
current_kernel-0xffffffff82400000-0xffffffff82488000 544K ro NX pte
current_kernel-0xffffffff82488000-0xffffffff82600000 1504K pte
current_kernel-0xffffffff82600000-0xffffffff82c00000 6M RW PSE NX pmd
current_kernel-0xffffffff82c00000-0xffffffff82c0d000 52K RW NX pte
current_kernel-0xffffffff82c0d000-0xffffffff82dc0000 1740K pte
current_user:---[ High Kernel Mapping ]---
current_user-0xffffffff80000000-0xffffffff81000000 16M pmd
current_user-0xffffffff81000000-0xffffffff81e00000 14M ro PSE GLB x pmd
current_user-0xffffffff81e00000-0xffffffff81e11000 68K ro GLB x pte
current_user-0xffffffff81e11000-0xffffffff82000000 1980K pte
current_user-0xffffffff82000000-0xffffffff82400000 4M ro PSE GLB NX pmd
current_user-0xffffffff82400000-0xffffffff82488000 544K ro NX pte
current_user-0xffffffff82488000-0xffffffff82600000 1504K pte
current_user-0xffffffff82600000-0xffffffffa0000000 474M pmd
[ tglx: Do not unmap on 32bit as there is only one mapping ]
Fixes: 0f561fce4d69 ("x86/pti: Enable global pages for shared areas")
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Kees Cook <keescook(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Link: https://lkml.kernel.org/r/20180802225831.5F6A2BFC@viggo.jf.intel.com
diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index bd090367236c..34cffcef7375 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -46,6 +46,7 @@ int set_memory_np(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
int set_memory_encrypted(unsigned long addr, int numpages);
int set_memory_decrypted(unsigned long addr, int numpages);
+int set_memory_np_noalias(unsigned long addr, int numpages);
int set_memory_array_uc(unsigned long *addr, int addrinarray);
int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index bc11dedffc45..74b157ac078d 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -780,8 +780,30 @@ void free_init_pages(char *what, unsigned long begin, unsigned long end)
*/
void free_kernel_image_pages(void *begin, void *end)
{
- free_init_pages("unused kernel image",
- (unsigned long)begin, (unsigned long)end);
+ unsigned long begin_ul = (unsigned long)begin;
+ unsigned long end_ul = (unsigned long)end;
+ unsigned long len_pages = (end_ul - begin_ul) >> PAGE_SHIFT;
+
+
+ free_init_pages("unused kernel image", begin_ul, end_ul);
+
+ /*
+ * PTI maps some of the kernel into userspace. For performance,
+ * this includes some kernel areas that do not contain secrets.
+ * Those areas might be adjacent to the parts of the kernel image
+ * being freed, which may contain secrets. Remove the "high kernel
+ * image mapping" for these freed areas, ensuring they are not even
+ * potentially vulnerable to Meltdown regardless of the specific
+ * optimizations PTI is currently using.
+ *
+ * The "noalias" prevents unmapping the direct map alias which is
+ * needed to access the freed pages.
+ *
+ * This is only valid for 64bit kernels. 32bit has only one mapping
+ * which can't be treated in this way for obvious reasons.
+ */
+ if (IS_ENABLED(CONFIG_X86_64) && cpu_feature_enabled(X86_FEATURE_PTI))
+ set_memory_np_noalias(begin_ul, len_pages);
}
void __ref free_initmem(void)
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index c04153796f61..0a74996a1149 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -53,6 +53,7 @@ static DEFINE_SPINLOCK(cpa_lock);
#define CPA_FLUSHTLB 1
#define CPA_ARRAY 2
#define CPA_PAGES_ARRAY 4
+#define CPA_NO_CHECK_ALIAS 8 /* Do not search for aliases */
#ifdef CONFIG_PROC_FS
static unsigned long direct_pages_count[PG_LEVEL_NUM];
@@ -1486,6 +1487,9 @@ static int change_page_attr_set_clr(unsigned long *addr, int numpages,
/* No alias checking for _NX bit modifications */
checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
+ /* Has caller explicitly disabled alias checking? */
+ if (in_flag & CPA_NO_CHECK_ALIAS)
+ checkalias = 0;
ret = __change_page_attr_set_clr(&cpa, checkalias);
@@ -1772,6 +1776,15 @@ int set_memory_np(unsigned long addr, int numpages)
return change_page_attr_clear(&addr, numpages, __pgprot(_PAGE_PRESENT), 0);
}
+int set_memory_np_noalias(unsigned long addr, int numpages)
+{
+ int cpa_flags = CPA_NO_CHECK_ALIAS;
+
+ return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
+ __pgprot(_PAGE_PRESENT), 0,
+ cpa_flags, NULL);
+}
+
int set_memory_4k(unsigned long addr, int numpages)
{
return change_page_attr_set_clr(&addr, numpages, __pgprot(0),
Hi!
With 4.18.1 and l1tf=full no problems so far and no noticeable performance-degradation in normal
desktop-usage including a VirtualBox-VM. Compiling a kernel seems to take a bit longer, though.
4.9.120 and 4.14.63 also seem OK in cursory testing.
CPU: Core i3.
Thanks and so long!
Rainer Fiebig
--
The truth always turns out to be simpler than you thought.
Richard Feynman
This is the start of the stable review cycle for the 4.9.120 release.
There are 107 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Aug 16 17:14:53 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.120-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.120-rc1
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/microcode: Allow late microcode loading with SMT disabled
Ashok Raj <ashok.raj(a)intel.com>
x86/microcode: Do not upload microcode if CPUs are offline
David Woodhouse <dwmw(a)amazon.co.uk>
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
Andi Kleen <ak(a)linux.intel.com>
x86/mm/kmmio: Make the tracer robust against L1TF
Andi Kleen <ak(a)linux.intel.com>
x86/mm/pat: Make set_memory_np() L1TF safe
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Invert all not present mappings
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Fix SMT supported evaluation
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
Paolo Bonzini <pbonzini(a)redhat.com>
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
Paolo Bonzini <pbonzini(a)redhat.com>
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: VMX: support MSR_IA32_ARCH_CAPABILITIES as a feature MSR
Wanpeng Li <wanpengli(a)tencent.com>
KVM: X86: Allow userspace to define the microcode version
Wanpeng Li <wanpengli(a)tencent.com>
KVM: X86: Introduce kvm_get_msr_feature()
Tom Lendacky <thomas.lendacky(a)amd.com>
KVM: SVM: Add MSR-based feature support for serializing LFENCE
Tom Lendacky <thomas.lendacky(a)amd.com>
KVM: x86: Add a framework for supporting MSR-based features
Thomas Gleixner <tglx(a)linutronix.de>
Documentation/l1tf: Remove Yonah processors from not vulnerable list
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
Nicolai Stange <nstange(a)suse.de>
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
Nicolai Stange <nstange(a)suse.de>
x86: Don't include linux/irq.h from asm/hardirq.h
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
Nicolai Stange <nstange(a)suse.de>
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
Josh Poimboeuf <jpoimboe(a)redhat.com>
cpu/hotplug: detect SMT disabled by BIOS
Tony Luck <tony.luck(a)intel.com>
Documentation/l1tf: Fix typos
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content
Thomas Gleixner <tglx(a)linutronix.de>
Documentation: Add section about CPU vulnerabilities
Jiri Kosina <jkosina(a)suse.cz>
x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
Jiri Kosina <jkosina(a)suse.cz>
cpu/hotplug: Expose SMT control init function
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Allow runtime control of L1D flush
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Serialize L1D flush parameter setter
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Add static key for flush always
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Move l1tf setup function
Thomas Gleixner <tglx(a)linutronix.de>
x86/l1tf: Handle EPT disabled state proper
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Drop L1TF MSR list approach
Thomas Gleixner <tglx(a)linutronix.de>
x86/litf: Introduce vmx status variable
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Online siblings when SMT control is turned on
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Add find_msr() helper function
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers
Jim Mattson <jmattson(a)google.com>
kvm: nVMX: Update MSR load counts on a VMCS switch
Paolo Bonzini <pbonzini(a)redhat.com>
x86/KVM/VMX: Add L1D flush logic
Paolo Bonzini <pbonzini(a)redhat.com>
x86/KVM/VMX: Add L1D MSR based flush
Paolo Bonzini <pbonzini(a)redhat.com>
x86/KVM/VMX: Add L1D flush algorithm
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Add module argument for L1TF mitigation
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Boot HT siblings at least once
Thomas Gleixner <tglx(a)linutronix.de>
Revert "x86/apic: Ignore secondary threads if nosmt=force"
Michal Hocko <mhocko(a)suse.cz>
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
Vlastimil Babka <vbabka(a)suse.cz>
x86/speculation/l1tf: Protect PAE swap entries against L1TF
Borislav Petkov <bp(a)suse.de>
x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/cpufeatures: Add detection of L1D cache flush support.
Vlastimil Babka <vbabka(a)suse.cz>
x86/speculation/l1tf: Extend 64bit swap file size limit
Thomas Gleixner <tglx(a)linutronix.de>
x86/apic: Ignore secondary threads if nosmt=force
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/AMD: Evaluate smp_num_siblings early
Borislav Petkov <bp(a)suse.de>
x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/intel: Evaluate smp_num_siblings early
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/topology: Provide detect_extended_topology_early()
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/common: Provide detect_ht_early()
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/AMD: Remove the pointless detect_ht() call
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu: Remove the pointless CPU printout
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Provide knobs to control SMT
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Split do_cpu_down()
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Make bringup/teardown of smp threads symmetric
Thomas Gleixner <tglx(a)linutronix.de>
x86/topology: Provide topology_smt_supported()
Thomas Gleixner <tglx(a)linutronix.de>
x86/smp: Provide topology_is_primary_thread()
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/bugs: Move the l1tf function and define pr_fmt properly
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Limit swap file size to MAX_PA/2
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Add sysfs reporting for l1tf
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Make sure the first page is always reserved
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
Linus Torvalds <torvalds(a)linux-foundation.org>
x86/speculation/l1tf: Protect swap entries against L1TF
Linus Torvalds <torvalds(a)linux-foundation.org>
x86/speculation/l1tf: Change order of offset/type in swap entry
Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
Nick Desaulniers <ndesaulniers(a)google.com>
x86/irqflags: Provide a declaration for native_save_fl
Masami Hiramatsu <mhiramat(a)kernel.org>
kprobes/x86: Fix %p uses in error messages
Jiri Kosina <jkosina(a)suse.cz>
x86/speculation: Protect against userspace-userspace spectreRSB
Peter Zijlstra <peterz(a)infradead.org>
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
Oleksij Rempel <o.rempel(a)pengutronix.de>
ARM: dts: imx6sx: fix irq for pcie bridge
Michael Mera <dev(a)michaelmera.com>
IB/ocrdma: fix out of bounds access to local buffer
Fabio Estevam <fabio.estevam(a)nxp.com>
mtd: nand: qcom: Add a NULL check for devm_kasprintf()
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
IB/mlx4: Mark user MR as writable if actual virtual memory is writable
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
IB/core: Make testing MR flags for writability a static inline function
Eric W. Biederman <ebiederm(a)xmission.com>
proc: Fix proc_sys_prune_dcache to hold a sb reference
Eric W. Biederman <ebiederm(a)xmission.com>
proc/sysctl: Don't grab i_lock under sysctl_lock.
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
proc/sysctl: prune stale dentries during unregistering
Al Viro <viro(a)zeniv.linux.org.uk>
fix __legitimize_mnt()/mntput() race
Al Viro <viro(a)zeniv.linux.org.uk>
fix mntput/mntput race
Al Viro <viro(a)zeniv.linux.org.uk>
make sure that __dentry_kill() always invalidates d_seq, unhashed or not
Al Viro <viro(a)zeniv.linux.org.uk>
root dentries need RCU-delayed freeing
Linus Torvalds <torvalds(a)linux-foundation.org>
init: rename and re-order boot_cpu_state_init()
Bart Van Assche <bart.vanassche(a)wdc.com>
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
Hans de Goede <hdegoede(a)redhat.com>
ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't cache skb_shinfo()
Linus Torvalds <torvalds(a)linux-foundation.org>
Mark HI and TASKLET softirq synchronous
Andrey Konovalov <andreyknvl(a)google.com>
kasan: add no_sanitize attribute for clang builds
John David Anglin <dave.anglin(a)bell.net>
parisc: Define mb() and add memory barriers to assembler unlock sequences
Helge Deller <deller(a)gmx.de>
parisc: Enable CONFIG_MLONGCALLS by default
Tadeusz Struk <tadeusz.struk(a)intel.com>
tpm: fix race condition in tpm_common_write()
Theodore Ts'o <tytso(a)mit.edu>
ext4: fix check to prevent initializing reserved inodes
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 24 +
Documentation/index.rst | 1 +
Documentation/kernel-parameters.txt | 78 +++
Documentation/l1tf.rst | 610 +++++++++++++++++++++
Documentation/virtual/kvm/api.txt | 40 +-
Makefile | 4 +-
arch/Kconfig | 3 +
arch/arm/boot/dts/imx6sx.dtsi | 2 +-
arch/parisc/Kconfig | 2 +-
arch/parisc/include/asm/barrier.h | 32 ++
arch/parisc/kernel/entry.S | 2 +
arch/parisc/kernel/pacache.S | 1 +
arch/parisc/kernel/syscall.S | 4 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/apic.h | 10 +
arch/x86/include/asm/cpufeatures.h | 4 +-
arch/x86/include/asm/dmi.h | 2 +-
arch/x86/include/asm/hardirq.h | 26 +-
arch/x86/include/asm/irqflags.h | 2 +
arch/x86/include/asm/kvm_host.h | 9 +
arch/x86/include/asm/msr-index.h | 7 +
arch/x86/include/asm/page_32_types.h | 9 +-
arch/x86/include/asm/pgtable-2level.h | 17 +
arch/x86/include/asm/pgtable-3level.h | 37 +-
arch/x86/include/asm/pgtable-invert.h | 32 ++
arch/x86/include/asm/pgtable.h | 85 ++-
arch/x86/include/asm/pgtable_64.h | 48 +-
arch/x86/include/asm/pgtable_types.h | 10 +-
arch/x86/include/asm/processor.h | 17 +
arch/x86/include/asm/topology.h | 6 +-
arch/x86/include/asm/vmx.h | 11 +
arch/x86/kernel/apic/apic.c | 17 +
arch/x86/kernel/apic/htirq.c | 2 +
arch/x86/kernel/apic/io_apic.c | 1 +
arch/x86/kernel/apic/msi.c | 1 +
arch/x86/kernel/apic/vector.c | 1 +
arch/x86/kernel/cpu/amd.c | 53 +-
arch/x86/kernel/cpu/bugs.c | 171 ++++--
arch/x86/kernel/cpu/common.c | 56 +-
arch/x86/kernel/cpu/cpu.h | 2 +
arch/x86/kernel/cpu/intel.c | 7 +
arch/x86/kernel/cpu/microcode/core.c | 26 +
arch/x86/kernel/cpu/topology.c | 41 +-
arch/x86/kernel/fpu/core.c | 1 +
arch/x86/kernel/ftrace.c | 1 +
arch/x86/kernel/hpet.c | 1 +
arch/x86/kernel/i8259.c | 1 +
arch/x86/kernel/irq.c | 1 +
arch/x86/kernel/irq_32.c | 1 +
arch/x86/kernel/irq_64.c | 1 +
arch/x86/kernel/irqinit.c | 1 +
arch/x86/kernel/kprobes/core.c | 5 +-
arch/x86/kernel/kprobes/opt.c | 1 +
arch/x86/kernel/paravirt.c | 14 +-
arch/x86/kernel/setup.c | 6 +
arch/x86/kernel/smp.c | 1 +
arch/x86/kernel/smpboot.c | 18 +
arch/x86/kernel/time.c | 1 +
arch/x86/kvm/svm.c | 46 +-
arch/x86/kvm/vmx.c | 426 ++++++++++++--
arch/x86/kvm/x86.c | 133 ++++-
arch/x86/mm/fault.c | 1 +
arch/x86/mm/init.c | 23 +
arch/x86/mm/kaiser.c | 1 +
arch/x86/mm/kmmio.c | 25 +-
arch/x86/mm/mmap.c | 21 +
arch/x86/mm/pageattr.c | 8 +-
arch/x86/platform/efi/efi_64.c | 1 +
arch/x86/platform/efi/quirks.c | 1 +
.../intel-mid/device_libs/platform_mrfld_wdt.c | 1 +
arch/x86/platform/uv/tlb_uv.c | 1 +
arch/x86/xen/enlighten.c | 1 +
arch/x86/xen/setup.c | 1 +
drivers/acpi/acpi_lpss.c | 2 +
drivers/base/cpu.c | 8 +
drivers/char/tpm/tpm-dev.c | 43 +-
drivers/infiniband/core/umem.c | 11 +-
drivers/infiniband/hw/mlx4/mr.c | 50 +-
drivers/infiniband/hw/ocrdma/ocrdma_stats.c | 2 +-
drivers/mtd/nand/qcom_nandc.c | 3 +
drivers/net/xen-netfront.c | 8 +-
drivers/pci/host/pci-hyperv.c | 2 +
drivers/scsi/sr.c | 29 +-
fs/dcache.c | 13 +-
fs/ext4/ialloc.c | 5 +-
fs/ext4/super.c | 8 +-
fs/namespace.c | 28 +-
fs/proc/inode.c | 3 +-
fs/proc/internal.h | 7 +-
fs/proc/proc_sysctl.c | 83 ++-
include/asm-generic/pgtable.h | 12 +
include/linux/compiler-clang.h | 3 +
include/linux/cpu.h | 23 +-
include/linux/swapfile.h | 2 +
include/linux/sysctl.h | 1 +
include/rdma/ib_verbs.h | 14 +
include/uapi/linux/kvm.h | 2 +
init/main.c | 2 +-
kernel/cpu.c | 282 +++++++++-
kernel/smp.c | 2 +
kernel/softirq.c | 12 +-
mm/memory.c | 29 +-
mm/mprotect.c | 49 ++
mm/swapfile.c | 46 +-
tools/arch/x86/include/asm/cpufeatures.h | 4 +-
105 files changed, 2680 insertions(+), 366 deletions(-)
From: Aaron Knister <aaron.s.knister(a)nasa.gov>
Inside of start_xmit() the check for whether the connection is up and
the queueing of packets for later transmission are not atomic. This
leaves a window where cm_rep_handler can run, set the connection up and
dequeue the pending packets, so that any packets start_xmit() queues
afterwards sit on neigh->queue until they're dropped when the
connection is torn down. This only applies to connected mode. These
dropped packets can really upset TCP, for example, and cause
multi-minute delays in transmission for open connections.
I've got a reproducer available if it's needed.
Here's the code in start_xmit where we check to see if the connection
is up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet
in start_xmit where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
The patch re-checks ipoib_cm_up with priv->lock held to avoid this
race condition. Since odds are the connection is up most of the time,
the first check is still done without the lock. This avoids a slowdown
from unnecessary locking for the majority of the packets transmitted
during the connection's life.
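
The re-check pattern described above can be sketched in userspace C.
This is only an illustration under stated assumptions: struct conn,
xmit() and conn_established() are hypothetical stand-ins and not driver
code, a C11 atomic_flag spinlock plays the role of priv->lock, and the
handler is assumed to set the "up" flag and drain the queue while
holding that lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the neighbour state; not the IPoIB structures. */
struct conn {
	atomic_flag lock;	/* stands in for priv->lock */
	atomic_bool up;		/* stands in for ipoib_cm_up(neigh) */
	int queued;		/* packets parked until the connection is up */
};

enum xmit_result { XMIT_SENT, XMIT_QUEUED };

static void conn_lock(struct conn *c)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;
}

static void conn_unlock(struct conn *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Models start_xmit(): unlocked fast path, locked re-check before queueing. */
static enum xmit_result xmit(struct conn *c)
{
	/* Fast path: most packets see the connection up and take no lock. */
	if (atomic_load(&c->up))
		return XMIT_SENT;

	/* Slow path: decide again under the lock, so that the decision to
	 * queue cannot interleave with the drain in conn_established(). */
	conn_lock(c);
	if (atomic_load(&c->up)) {
		conn_unlock(c);
		return XMIT_SENT;
	}
	c->queued++;
	conn_unlock(c);
	return XMIT_QUEUED;
}

/* Models cm_rep_handler: mark the connection up and drain the queue
 * under the same lock. Returns the number of packets drained. */
static int conn_established(struct conn *c)
{
	int drained;

	conn_lock(c);
	atomic_store(&c->up, true);
	drained = c->queued;
	c->queued = 0;
	conn_unlock(c);
	return drained;
}
```

Because the queueing decision and the drain both run under the lock in
this model, a packet is either queued before the drain (and drained) or
observes the connection as up and is sent directly.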
Cc: stable(a)vger.kernel.org
Tested-by: Ira Weiny <ira.weiny(a)intel.com>
Signed-off-by: Aaron Knister <aaron.s.knister(a)nasa.gov>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 53 +++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 26cde95b..529dbeab 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1093,6 +1093,34 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
spin_unlock_irqrestore(&priv->lock, flags);
}
+static void defer_neigh_skb(struct sk_buff *skb, struct net_device *dev,
+ struct ipoib_neigh *neigh,
+ struct ipoib_pseudo_header *phdr,
+ unsigned long *flags)
+{
+ struct ipoib_dev_priv *priv = ipoib_priv(dev);
+ unsigned long local_flags;
+ int acquire_priv_lock = 0;
+
+ /* Passing in pointer to spin_lock flags indicates spin lock
+ * already acquired so we don't need to acquire the priv lock */
+ if (flags == NULL) {
+ flags = &local_flags;
+ acquire_priv_lock = 1;
+ }
+
+ if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
+ push_pseudo_header(skb, phdr->hwaddr);
+ if (acquire_priv_lock)
+ spin_lock_irqsave(&priv->lock, *flags);
+ __skb_queue_tail(&neigh->queue, skb);
+ spin_unlock_irqrestore(&priv->lock, *flags);
+ } else {
+ ++dev->stats.tx_dropped;
+ dev_kfree_skb_any(skb);
+ }
+}
+
static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -1160,6 +1188,21 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
+ /*
+ * Re-check ipoib_cm_up with priv->lock held to avoid
+ * race condition between start_xmit and skb_dequeue in
+ * cm_rep_handler. Since odds are the conn should be up
+ * most of the time, we don't hold the lock for the
+ * first check above
+ */
+ spin_lock_irqsave(&priv->lock, flags);
+ if (ipoib_cm_up(neigh)) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+ } else {
+ defer_neigh_skb(skb, dev, neigh, phdr, &flags);
+ }
+ goto unref;
} else if (neigh->ah && neigh->ah->valid) {
neigh->ah->last_send = rn->send(dev, skb, neigh->ah->ah,
IPOIB_QPN(phdr->hwaddr));
@@ -1168,15 +1211,7 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
neigh_refresh_path(neigh, phdr->hwaddr, dev);
}
- if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
- push_pseudo_header(skb, phdr->hwaddr);
- spin_lock_irqsave(&priv->lock, flags);
- __skb_queue_tail(&neigh->queue, skb);
- spin_unlock_irqrestore(&priv->lock, flags);
- } else {
- ++dev->stats.tx_dropped;
- dev_kfree_skb_any(skb);
- }
+ defer_neigh_skb(skb, dev, neigh, phdr, NULL);
unref:
ipoib_neigh_put(neigh);
--
2.12.3
From: Andi Kleen <ak(a)linux.intel.com>
The stable backport of the
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
patch for 4.4 and 4.9 put the new C code for the
!__HAVE_ARCH_PFN_MODIFY_ALLOWED case outside the assembler ifdef. This
breaks the xtensa and ia64 builds, as reported by 0day, because those
architectures somehow include this file from assembler code.
Just add an #ifndef __ASSEMBLY__ around the new code to fix this.
This patch is only needed for 4.9 and 4.4 stable, the newer stables
don't have this problem.
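
As a generic illustration of the guard (a minimal sketch, not the
contents of pgtable.h): constants that assembler sources also need stay
outside the guard, while prototypes and inline functions go inside it.
The names EXAMPLE_PAGE_SHIFT and example_has_pfn_modify_check() are
hypothetical.

```c
/* Plain constant: usable from both C and assembler sources. */
#define EXAMPLE_PAGE_SHIFT 12

#ifndef __ASSEMBLY__
/* C-only section: an assembler cannot parse includes of C headers,
 * prototypes or inline functions, so they are hidden when the build
 * defines __ASSEMBLY__ for assembler sources. */
#include <stdbool.h>

static inline bool example_has_pfn_modify_check(void)
{
	return false;
}
#endif /* !__ASSEMBLY__ */
```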
Fixes: 7c5b42f82c13 ("x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings")
Signed-off-by: Andi Kleen <ak(a)linux.intel.com>
---
include/asm-generic/pgtable.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index a88ea9e37a25..abc2a1b15dd8 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -825,6 +825,7 @@ static inline int pmd_free_pte_page(pmd_t *pmd)
#endif
#endif
+#ifndef __ASSEMBLY__
struct file;
int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t *vma_prot);
@@ -839,6 +840,9 @@ static inline bool arch_has_pfn_modify_check(void)
{
return false;
}
+
+#endif
+
#endif /* !_HAVE_ARCH_PFN_MODIFY_ALLOWED */
#endif /* !__ASSEMBLY__ */
--
2.17.1
This is the start of the stable review cycle for the 4.18.1 release.
There are 79 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Aug 16 17:13:16 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.18.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.18.1-rc1
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/microcode: Allow late microcode loading with SMT disabled
David Woodhouse <dwmw(a)amazon.co.uk>
tools headers: Synchronise x86 cpufeatures.h for L1TF additions
Andi Kleen <ak(a)linux.intel.com>
x86/mm/kmmio: Make the tracer robust against L1TF
Andi Kleen <ak(a)linux.intel.com>
x86/mm/pat: Make set_memory_np() L1TF safe
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Invert all not present mappings
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Fix SMT supported evaluation
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry
Paolo Bonzini <pbonzini(a)redhat.com>
x86/speculation: Use ARCH_CAPABILITIES to skip L1D flush on vmentry
Paolo Bonzini <pbonzini(a)redhat.com>
x86/speculation: Simplify sysfs report of VMX L1TF vulnerability
Thomas Gleixner <tglx(a)linutronix.de>
Documentation/l1tf: Remove Yonah processors from not vulnerable list
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Don't set l1tf_flush_l1d from vmx_handle_external_intr()
Nicolai Stange <nstange(a)suse.de>
x86/irq: Let interrupt handlers set kvm_cpu_l1tf_flush_l1d
Nicolai Stange <nstange(a)suse.de>
x86: Don't include linux/irq.h from asm/hardirq.h
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Introduce per-host-cpu analogue of l1tf_flush_l1d
Nicolai Stange <nstange(a)suse.de>
x86/irq: Demote irq_cpustat_t::__softirq_pending to u16
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Move the l1tf_flush_l1d test to vmx_l1d_flush()
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Replace 'vmx_l1d_flush_always' with 'vmx_l1d_flush_cond'
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Don't set l1tf_flush_l1d to true from vmx_l1d_flush()
Josh Poimboeuf <jpoimboe(a)redhat.com>
cpu/hotplug: detect SMT disabled by BIOS
Tony Luck <tony.luck(a)intel.com>
Documentation/l1tf: Fix typos
Nicolai Stange <nstange(a)suse.de>
x86/KVM/VMX: Initialize the vmx_l1d_flush_pages' content
Jiri Kosina <jkosina(a)suse.cz>
x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
Thomas Gleixner <tglx(a)linutronix.de>
Documentation: Add section about CPU vulnerabilities
Jiri Kosina <jkosina(a)suse.cz>
x86/bugs, kvm: Introduce boot-time control of L1TF mitigations
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Set CPU_SMT_NOT_SUPPORTED early
Jiri Kosina <jkosina(a)suse.cz>
cpu/hotplug: Expose SMT control init function
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Allow runtime control of L1D flush
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Serialize L1D flush parameter setter
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Add static key for flush always
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Move l1tf setup function
Thomas Gleixner <tglx(a)linutronix.de>
x86/l1tf: Handle EPT disabled state proper
Thomas Gleixner <tglx(a)linutronix.de>
x86/kvm: Drop L1TF MSR list approach
Thomas Gleixner <tglx(a)linutronix.de>
x86/litf: Introduce vmx status variable
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Online siblings when SMT control is turned on
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Use MSR save list for IA32_FLUSH_CMD if required
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Extend add_atomic_switch_msr() to allow VMENTER only MSRs
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Separate the VMX AUTOLOAD guest/host number accounting
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Add find_msr() helper function
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Split the VMX MSR LOAD structures to have an host/guest numbers
Paolo Bonzini <pbonzini(a)redhat.com>
x86/KVM/VMX: Add L1D flush logic
Paolo Bonzini <pbonzini(a)redhat.com>
x86/KVM/VMX: Add L1D MSR based flush
Paolo Bonzini <pbonzini(a)redhat.com>
x86/KVM/VMX: Add L1D flush algorithm
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM/VMX: Add module argument for L1TF mitigation
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/KVM: Warn user if KVM is loaded SMT and L1TF CPU bug being present
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Boot HT siblings at least once
Thomas Gleixner <tglx(a)linutronix.de>
Revert "x86/apic: Ignore secondary threads if nosmt=force"
Michal Hocko <mhocko(a)suse.cz>
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
Vlastimil Babka <vbabka(a)suse.cz>
x86/speculation/l1tf: Protect PAE swap entries against L1TF
Borislav Petkov <bp(a)suse.de>
x86/CPU/AMD: Move TOPOEXT reenablement before reading smp_num_siblings
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/cpufeatures: Add detection of L1D cache flush support.
Vlastimil Babka <vbabka(a)suse.cz>
x86/speculation/l1tf: Extend 64bit swap file size limit
Thomas Gleixner <tglx(a)linutronix.de>
x86/apic: Ignore secondary threads if nosmt=force
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/AMD: Evaluate smp_num_siblings early
Borislav Petkov <bp(a)suse.de>
x86/CPU/AMD: Do not check CPUID max ext level before parsing SMP info
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/intel: Evaluate smp_num_siblings early
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/topology: Provide detect_extended_topology_early()
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/common: Provide detect_ht_early()
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu/AMD: Remove the pointless detect_ht() call
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu: Remove the pointless CPU printout
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Provide knobs to control SMT
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Split do_cpu_down()
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Make bringup/teardown of smp threads symmetric
Thomas Gleixner <tglx(a)linutronix.de>
x86/topology: Provide topology_smt_supported()
Thomas Gleixner <tglx(a)linutronix.de>
x86/smp: Provide topology_is_primary_thread()
Peter Zijlstra <peterz(a)infradead.org>
sched/smt: Update sched_smt_present at runtime
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
x86/bugs: Move the l1tf function and define pr_fmt properly
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Limit swap file size to MAX_PA/2
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Add sysfs reporting for l1tf
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Make sure the first page is always reserved
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
Linus Torvalds <torvalds(a)linux-foundation.org>
x86/speculation/l1tf: Protect swap entries against L1TF
Linus Torvalds <torvalds(a)linux-foundation.org>
x86/speculation/l1tf: Change order of offset/type in swap entry
Andi Kleen <ak(a)linux.intel.com>
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
Nick Desaulniers <ndesaulniers(a)google.com>
x86/irqflags: Provide a declaration for native_save_fl
Masami Hiramatsu <mhiramat(a)kernel.org>
kprobes/x86: Fix %p uses in error messages
Jiri Kosina <jkosina(a)suse.cz>
x86/speculation: Protect against userspace-userspace spectreRSB
Peter Zijlstra <peterz(a)infradead.org>
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 24 +
Documentation/admin-guide/index.rst | 9 +
Documentation/admin-guide/kernel-parameters.txt | 78 +++
Documentation/admin-guide/l1tf.rst | 610 +++++++++++++++++++++
Makefile | 4 +-
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/apic.h | 9 +
arch/x86/include/asm/cpufeatures.h | 3 +
arch/x86/include/asm/dmi.h | 2 +-
arch/x86/include/asm/hardirq.h | 26 +-
arch/x86/include/asm/irqflags.h | 2 +
arch/x86/include/asm/kvm_host.h | 6 +
arch/x86/include/asm/msr-index.h | 7 +
arch/x86/include/asm/page_32_types.h | 9 +-
arch/x86/include/asm/pgtable-2level.h | 17 +
arch/x86/include/asm/pgtable-3level.h | 37 +-
arch/x86/include/asm/pgtable-invert.h | 32 ++
arch/x86/include/asm/pgtable.h | 74 ++-
arch/x86/include/asm/pgtable_64.h | 38 +-
arch/x86/include/asm/processor.h | 17 +
arch/x86/include/asm/topology.h | 6 +-
arch/x86/include/asm/vmx.h | 11 +
arch/x86/kernel/apic/apic.c | 16 +
arch/x86/kernel/apic/io_apic.c | 1 +
arch/x86/kernel/apic/msi.c | 1 +
arch/x86/kernel/apic/vector.c | 1 +
arch/x86/kernel/cpu/amd.c | 51 +-
arch/x86/kernel/cpu/bugs.c | 171 ++++--
arch/x86/kernel/cpu/common.c | 56 +-
arch/x86/kernel/cpu/cpu.h | 2 +
arch/x86/kernel/cpu/intel.c | 7 +
arch/x86/kernel/cpu/microcode/core.c | 16 +-
arch/x86/kernel/cpu/topology.c | 41 +-
arch/x86/kernel/fpu/core.c | 1 +
arch/x86/kernel/hpet.c | 1 +
arch/x86/kernel/i8259.c | 1 +
arch/x86/kernel/idt.c | 1 +
arch/x86/kernel/irq.c | 1 +
arch/x86/kernel/irq_32.c | 1 +
arch/x86/kernel/irq_64.c | 1 +
arch/x86/kernel/irqinit.c | 1 +
arch/x86/kernel/kprobes/core.c | 5 +-
arch/x86/kernel/paravirt.c | 14 +-
arch/x86/kernel/setup.c | 6 +
arch/x86/kernel/smp.c | 1 +
arch/x86/kernel/smpboot.c | 18 +
arch/x86/kernel/time.c | 1 +
arch/x86/kvm/mmu.c | 1 +
arch/x86/kvm/vmx.c | 455 ++++++++++++---
arch/x86/kvm/x86.c | 34 +-
arch/x86/mm/init.c | 23 +
arch/x86/mm/kmmio.c | 25 +-
arch/x86/mm/mmap.c | 21 +
arch/x86/mm/pageattr.c | 8 +-
arch/x86/mm/pti.c | 1 +
.../intel-mid/device_libs/platform_mrfld_wdt.c | 1 +
arch/x86/platform/uv/tlb_uv.c | 1 +
arch/x86/xen/enlighten.c | 1 +
drivers/base/cpu.c | 8 +
drivers/gpu/drm/i915/i915_pmu.c | 1 +
drivers/gpu/drm/i915/intel_lpe_audio.c | 1 +
drivers/pci/controller/pci-hyperv.c | 1 +
include/asm-generic/pgtable.h | 12 +
include/linux/cpu.h | 21 +
include/linux/swapfile.h | 2 +
kernel/cpu.c | 280 +++++++++-
kernel/sched/core.c | 30 +-
kernel/sched/fair.c | 1 +
kernel/smp.c | 2 +
mm/memory.c | 37 +-
mm/mprotect.c | 49 ++
mm/swapfile.c | 46 +-
tools/arch/x86/include/asm/cpufeatures.h | 3 +
74 files changed, 2206 insertions(+), 300 deletions(-)
ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input
before seeing if the PFN is valid. This leads to false positives when
some of the upper bits are set, but the lower bits match a valid PFN.
For example, the following userspace code looks up a bogus entry in
/proc/kpageflags:
int pagemap = open("/proc/self/pagemap", O_RDONLY);
int pageflags = open("/proc/kpageflags", O_RDONLY);
uint64_t pfn, val;
lseek64(pagemap, [...], SEEK_SET);
read(pagemap, &pfn, sizeof(pfn));
if (pfn & (1UL << 63)) { /* valid PFN */
pfn &= ((1UL << 55) - 1); /* clear flag bits */
pfn |= (1UL << 55);
lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET);
read(pageflags, &val, sizeof(val));
}
On ARM64 this causes the userspace process to crash with SIGSEGV rather
than reading (1 << KPF_NOPAGE). kpageflags_read() treats the offset as
valid, and stable_page_flags() will try to access an address between the
user and kernel address ranges.
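
The overflow can be reproduced outside the kernel. The following is only a userspace sketch, not kernel code: toy_is_map_memory() and its address range are invented stand-ins for memblock_is_map_memory(), and 4 KiB pages are assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for memblock_is_map_memory(): pretend that
 * physical addresses in [0x80000000, 0xC0000000) are mapped RAM. */
static bool toy_is_map_memory(uint64_t addr)
{
	return addr >= 0x80000000ULL && addr < 0xC0000000ULL;
}

/* Old behaviour: bits shifted out of the top are silently lost. */
static bool pfn_valid_old(uint64_t pfn)
{
	return toy_is_map_memory(pfn << PAGE_SHIFT);
}

/* New behaviour: reject PFNs that do not survive the round trip. */
static bool pfn_valid_new(uint64_t pfn)
{
	uint64_t addr = pfn << PAGE_SHIFT;

	if ((addr >> PAGE_SHIFT) != pfn)
		return false;
	return toy_is_map_memory(addr);
}
```

With these definitions, a PFN of 0x80000 with bit 55 also set passes pfn_valid_old(), because bit 55 is shifted out of the 64-bit value, but is rejected by pfn_valid_new().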
Fixes: c1cc1552616d ("arm64: MMU initialisation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Greg Hackmann <ghackmann(a)google.com>
---
arch/arm64/mm/init.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 9abf8a1e7b25..787e27964ab9 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -287,7 +287,11 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
#ifdef CONFIG_HAVE_ARCH_PFN_VALID
int pfn_valid(unsigned long pfn)
{
- return memblock_is_map_memory(pfn << PAGE_SHIFT);
+ phys_addr_t addr = pfn << PAGE_SHIFT;
+
+ if ((addr >> PAGE_SHIFT) != pfn)
+ return 0;
+ return memblock_is_map_memory(addr);
}
EXPORT_SYMBOL(pfn_valid);
#endif
--
2.18.0.865.gffc8e1a3cd6-goog
It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can, however, prevent the autosuspend
timer from elapsing immediately (if it hasn't already) without risking any
sort of deadlock with the runtime suspend/resume operations. So do that
instead of entirely avoiding grabbing a power reference.
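
The idea of holding a reference without resuming can be pictured with a toy model. This is a simplified sketch and not the kernel's runtime PM implementation: struct toy_dev and the toy_*() helpers are hypothetical, and the real autosuspend timer is not modelled.

```c
#include <stdbool.h>

/* Toy model of a runtime-PM device; not the kernel's struct dev_pm_info. */
struct toy_dev {
	int usage_count; /* outstanding power references */
	bool active;     /* true while the device is powered */
};

/* Like pm_runtime_get_noresume(): take a reference, never resume. */
static void toy_get_noresume(struct toy_dev *d)
{
	d->usage_count++;
}

/* Drop a reference taken earlier. */
static void toy_put(struct toy_dev *d)
{
	d->usage_count--;
}

/* Autosuspend may only power the device down when nobody holds a ref. */
static bool toy_try_autosuspend(struct toy_dev *d)
{
	if (d->usage_count > 0)
		return false;
	d->active = false;
	return true;
}
```

While the poll worker holds its reference, toy_try_autosuspend() refuses to power the device down; once the reference is dropped, suspend can proceed as usual.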
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Karol Herbst <kherbst(a)redhat.com>
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Cc: stable(a)vger.kernel.org
Cc: Lukas Wunner <lukas(a)wunner.de>
---
drivers/gpu/drm/nouveau/nouveau_connector.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 2a45b4c2ceb0..010d6db14cba 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -572,12 +572,16 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
nv_connector->edid = NULL;
}
- /* Outputs are only polled while runtime active, so acquiring a
- * runtime PM ref here is unnecessary (and would deadlock upon
- * runtime suspend because it waits for polling to finish).
+ /* Outputs are only polled while runtime active, so resuming the
+ * device here is unnecessary (and would deadlock upon runtime suspend
+ * because it waits for polling to finish). We do however, want to
+ * prevent the autosuspend timer from elapsing during this operation
+ * if possible.
*/
- if (!drm_kms_helper_is_poll_worker()) {
- ret = pm_runtime_get_sync(connector->dev->dev);
+ if (drm_kms_helper_is_poll_worker()) {
+ pm_runtime_get_noresume(dev->dev);
+ } else {
+ ret = pm_runtime_get_sync(dev->dev);
if (ret < 0 && ret != -EACCES)
return conn_status;
}
@@ -655,10 +659,8 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
out:
- if (!drm_kms_helper_is_poll_worker()) {
- pm_runtime_mark_last_busy(connector->dev->dev);
- pm_runtime_put_autosuspend(connector->dev->dev);
- }
+ pm_runtime_mark_last_busy(dev->dev);
+ pm_runtime_put_autosuspend(dev->dev);
return conn_status;
}
--
2.17.1
Currently, nouveau uses the generic drm_fb_helper_output_poll_changed()
function provided by DRM as its output_poll_changed callback.
Unfortunately, this function doesn't grab runtime PM references
early enough, and even if it did, we can't block waiting for the device to
resume in output_poll_changed() since it's very likely that we'll need
to grab the fb_helper lock at some point during the runtime resume
process. This currently results in deadlocking like so:
[ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.676527] kworker/4:0 D 0 37 2 0x80000000
[ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[ 246.678704] Call Trace:
[ 246.679753] __schedule+0x322/0xaf0
[ 246.680916] schedule+0x33/0x90
[ 246.681924] schedule_preempt_disabled+0x15/0x20
[ 246.683023] __mutex_lock+0x569/0x9a0
[ 246.684035] ? kobject_uevent_env+0x117/0x7b0
[ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.686179] mutex_lock_nested+0x1b/0x20
[ 246.687278] ? mutex_lock_nested+0x1b/0x20
[ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[ 246.692611] process_one_work+0x231/0x620
[ 246.693725] worker_thread+0x214/0x3a0
[ 246.694756] kthread+0x12b/0x150
[ 246.695856] ? wq_pool_ids_show+0x140/0x140
[ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.697998] ret_from_fork+0x3a/0x50
[ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.702278] kworker/0:1 D 0 60 2 0x80000000
[ 246.703293] Workqueue: pm pm_runtime_work
[ 246.704393] Call Trace:
[ 246.705403] __schedule+0x322/0xaf0
[ 246.706439] ? wait_for_completion+0x104/0x190
[ 246.707393] schedule+0x33/0x90
[ 246.708375] schedule_timeout+0x3a5/0x590
[ 246.709289] ? mark_held_locks+0x58/0x80
[ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40
[ 246.711222] ? wait_for_completion+0x104/0x190
[ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190
[ 246.713094] ? wait_for_completion+0x104/0x190
[ 246.713964] wait_for_completion+0x12c/0x190
[ 246.714895] ? wake_up_q+0x80/0x80
[ 246.715727] ? get_work_pool+0x90/0x90
[ 246.716649] flush_work+0x1c9/0x280
[ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[ 246.718442] __cancel_work_timer+0x146/0x1d0
[ 246.719247] cancel_delayed_work_sync+0x13/0x20
[ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[ 246.721897] pci_pm_runtime_suspend+0x6b/0x190
[ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.723737] __rpm_callback+0x7a/0x1d0
[ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.725607] rpm_callback+0x24/0x80
[ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.727376] rpm_suspend+0x142/0x6b0
[ 246.728185] pm_runtime_work+0x97/0xc0
[ 246.728938] process_one_work+0x231/0x620
[ 246.729796] worker_thread+0x44/0x3a0
[ 246.730614] kthread+0x12b/0x150
[ 246.731395] ? wq_pool_ids_show+0x140/0x140
[ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.732878] ret_from_fork+0x3a/0x50
[ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.736113] kworker/4:2 D 0 422 2 0x80000080
[ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[ 246.737665] Call Trace:
[ 246.738490] __schedule+0x322/0xaf0
[ 246.739250] schedule+0x33/0x90
[ 246.739908] rpm_resume+0x19c/0x850
[ 246.740750] ? finish_wait+0x90/0x90
[ 246.741541] __pm_runtime_resume+0x4e/0x90
[ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[ 246.743124] drm_atomic_commit+0x4a/0x50 [drm]
[ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau]
[ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[ 246.752314] process_one_work+0x231/0x620
[ 246.752979] worker_thread+0x44/0x3a0
[ 246.753838] kthread+0x12b/0x150
[ 246.754619] ? wq_pool_ids_show+0x140/0x140
[ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.756162] ret_from_fork+0x3a/0x50
[ 246.756847]
Showing all locks held in the system:
[ 246.758261] 3 locks held by kworker/4:0/37:
[ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.761516] 2 locks held by kworker/0:1/60:
[ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.763890] 1 lock held by khungtaskd/64:
[ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[ 246.765588] 5 locks held by kworker/4:2/422:
[ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[ 246.770839] 1 lock held by dmesg/1038:
[ 246.771739] 2 locks held by zsh/1172:
[ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[ 246.775522] =============================================
After trying dozens of different solutions, I found one very simple one
that should also have the benefit of preventing us from having to fight
locking for the rest of our lives. So, we work around these deadlocks by
deferring all fbcon hotplug events that happen after the runtime suspend
process starts until after the device is resumed again.
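
The deferral scheme can be sketched as a small single-threaded model. This is an illustration only: the hotplug_waiting name mirrors the patch, but struct toy_fbcon, the suspending flag and the handled counter are hypothetical. The real code decides based on the return value of pm_runtime_get() and serializes both paths with hotplug_lock.

```c
#include <stdbool.h>

/* Toy model; hotplug_waiting mirrors the patch, the rest is hypothetical. */
struct toy_fbcon {
	bool suspending;      /* runtime suspend has started */
	bool hotplug_waiting; /* an event was deferred */
	int handled;          /* number of hotplug events processed */
};

/* Called for every output-poll change. */
static void toy_output_poll_changed(struct toy_fbcon *f)
{
	if (f->suspending) {
		/* Do not block on resume here; remember the event instead. */
		f->hotplug_waiting = true;
		return;
	}
	f->handled++;
}

/* Called once the device is running again. */
static void toy_hotplug_resume(struct toy_fbcon *f)
{
	f->suspending = false;
	if (f->hotplug_waiting) {
		f->hotplug_waiting = false;
		f->handled++; /* replay all deferred events as one */
	}
}
```

Any number of events arriving during suspend collapse into a single replayed hotplug event on resume, which is sufficient because the handler re-probes the current connector state.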
Changes since v7:
- Fixup commit message - Daniel Vetter
Changes since v6:
- Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia
Changes since v5:
- Come up with the (hopefully final) solution for solving this dumb
problem, one that is a lot less likely to cause issues with locking in
the future. This should work around all deadlock conditions with fbcon
brought up thus far.
Changes since v4:
- Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
condition that Lukas described
- Just move all of this out of drm_fb_helper. It seems that other DRM
drivers have already figured out other workarounds for this. If other
drivers do end up needing this in the future, we can just move this
back into drm_fb_helper again.
Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
the following conditions are true (as otherwise, calling this in the
wrong spot will cause Bad Things to happen):
- fb_helper hotplug handling was actually inhibited previously
- fb_helper actually has a delayed hotplug pending
- fb_helper is actually bound
- fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
situation where a driver would actually want to use this without
checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
isn't enabled
- Actually grab the toplevel fb_helper lock in
drm_fb_helper_resume_hotplug(), since it's possible other activity
(such as a hotplug) could be going on at the same time the driver
calls drm_fb_helper_resume_hotplug(). We need this to check whether or
not drm_fb_helper_hotplug_event() needs to be called anyway
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Reviewed-by: Karol Herbst <kherbst(a)redhat.com>
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Cc: stable(a)vger.kernel.org
Cc: Lukas Wunner <lukas(a)wunner.de>
---
drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_display.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_fbcon.c | 57 +++++++++++++++++++++++
drivers/gpu/drm/nouveau/nouveau_fbcon.h | 5 ++
4 files changed, 64 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 8b522a9b12f6..a0772389ed90 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2049,7 +2049,7 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev)
static const struct drm_mode_config_funcs
nv50_disp_func = {
.fb_create = nouveau_user_framebuffer_create,
- .output_poll_changed = drm_fb_helper_output_poll_changed,
+ .output_poll_changed = nouveau_fbcon_output_poll_changed,
.atomic_check = nv50_disp_atomic_check,
.atomic_commit = nv50_disp_atomic_commit,
.atomic_state_alloc = nv50_disp_atomic_state_alloc,
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 1d36ab5d4796..4b873e668b26 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -293,7 +293,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev,
static const struct drm_mode_config_funcs nouveau_mode_config_funcs = {
.fb_create = nouveau_user_framebuffer_create,
- .output_poll_changed = drm_fb_helper_output_poll_changed,
+ .output_poll_changed = nouveau_fbcon_output_poll_changed,
};
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 85c1f10bc2b6..8cf966690963 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -466,6 +466,7 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work)
console_unlock();
if (state == FBINFO_STATE_RUNNING) {
+ nouveau_fbcon_hotplug_resume(drm->fbcon);
pm_runtime_mark_last_busy(drm->dev->dev);
pm_runtime_put_sync(drm->dev->dev);
}
@@ -487,6 +488,61 @@ nouveau_fbcon_set_suspend(struct drm_device *dev, int state)
schedule_work(&drm->fbcon_work);
}
+void
+nouveau_fbcon_output_poll_changed(struct drm_device *dev)
+{
+ struct nouveau_drm *drm = nouveau_drm(dev);
+ struct nouveau_fbdev *fbcon = drm->fbcon;
+ int ret;
+
+ if (!fbcon)
+ return;
+
+ mutex_lock(&fbcon->hotplug_lock);
+
+ ret = pm_runtime_get(dev->dev);
+ if (ret == 1 || ret == -EACCES) {
+ drm_fb_helper_hotplug_event(&fbcon->helper);
+
+ pm_runtime_mark_last_busy(dev->dev);
+ pm_runtime_put_autosuspend(dev->dev);
+ } else if (ret == 0) {
+ /* If the GPU was already in the process of suspending before
+ * this event happened, then we can't block here as we'll
+ * deadlock the runtime pmops since they wait for us to
+ * finish. So, just defer this event for when we runtime
+ * resume again. It will be handled by fbcon_work.
+ */
+ NV_DEBUG(drm, "fbcon HPD event deferred until runtime resume\n");
+ fbcon->hotplug_waiting = true;
+ pm_runtime_put_noidle(drm->dev->dev);
+ } else {
+ DRM_WARN("fbcon HPD event lost due to RPM failure: %d\n",
+ ret);
+ }
+
+ mutex_unlock(&fbcon->hotplug_lock);
+}
+
+void
+nouveau_fbcon_hotplug_resume(struct nouveau_fbdev *fbcon)
+{
+ struct nouveau_drm *drm;
+
+ if (!fbcon)
+ return;
+ drm = nouveau_drm(fbcon->helper.dev);
+
+ mutex_lock(&fbcon->hotplug_lock);
+ if (fbcon->hotplug_waiting) {
+ fbcon->hotplug_waiting = false;
+
+ NV_DEBUG(drm, "Handling deferred fbcon HPD events\n");
+ drm_fb_helper_hotplug_event(&fbcon->helper);
+ }
+ mutex_unlock(&fbcon->hotplug_lock);
+}
+
int
nouveau_fbcon_init(struct drm_device *dev)
{
@@ -505,6 +561,7 @@ nouveau_fbcon_init(struct drm_device *dev)
drm->fbcon = fbcon;
INIT_WORK(&drm->fbcon_work, nouveau_fbcon_set_suspend_work);
+ mutex_init(&fbcon->hotplug_lock);
drm_fb_helper_prepare(dev, &fbcon->helper, &nouveau_fbcon_helper_funcs);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.h b/drivers/gpu/drm/nouveau/nouveau_fbcon.h
index a6f192ea3fa6..db9d52047ef8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.h
@@ -41,6 +41,9 @@ struct nouveau_fbdev {
struct nvif_object gdi;
struct nvif_object blit;
struct nvif_object twod;
+
+ struct mutex hotplug_lock;
+ bool hotplug_waiting;
};
void nouveau_fbcon_restore(void);
@@ -68,6 +71,8 @@ void nouveau_fbcon_set_suspend(struct drm_device *dev, int state);
void nouveau_fbcon_accel_save_disable(struct drm_device *dev);
void nouveau_fbcon_accel_restore(struct drm_device *dev);
+void nouveau_fbcon_output_poll_changed(struct drm_device *dev);
+void nouveau_fbcon_hotplug_resume(struct nouveau_fbdev *fbcon);
extern int nouveau_nofbaccel;
#endif /* __NV50_FBCON_H__ */
--
2.17.1
Since actual hotplug notifications don't get disabled until
nouveau_display_fini() is called, all this will do is cause any hotplugs
that happen between this drm_kms_helper_poll_disable() call and the
actual hotplug disablement to potentially be dropped if ACPI isn't
around to help us.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Acked-by: Karol Herbst <kherbst(a)redhat.com>
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Cc: stable(a)vger.kernel.org
Cc: Lukas Wunner <lukas(a)wunner.de>
---
drivers/gpu/drm/nouveau/nouveau_drm.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index c7ec86d6c3c9..5fdc1fbe2ee5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -835,7 +835,6 @@ nouveau_pmops_runtime_suspend(struct device *dev)
return -EBUSY;
}
- drm_kms_helper_poll_disable(drm_dev);
nouveau_switcheroo_optimus_dsm();
ret = nouveau_do_suspend(drm_dev, true);
pci_save_state(pdev);
--
2.17.1
Turns out this part is my fault for not noticing when reviewing
9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling"). Currently
we call drm_kms_helper_poll_enable() from nouveau_display_hpd_work().
This makes basically no sense however, because that means we're calling
drm_kms_helper_poll_enable() every time we schedule the hotplug
detection work. This is also against the advice mentioned in
drm_kms_helper_poll_enable()'s documentation:
Note that calls to enable and disable polling must be strictly ordered,
which is automatically the case when they're only call from
suspend/resume callbacks.
Of course, hotplugs can't really be ordered. They could even happen
immediately after we called drm_kms_helper_poll_disable() in
nouveau_display_fini(), which can lead to all sorts of issues.
Additionally, enabling polling /after/ we call
drm_helper_hpd_irq_event() could also mean that we'd miss a hotplug
event anyway, since drm_helper_hpd_irq_event() wouldn't bother trying to
probe connectors so long as polling is disabled.
So, simply move this back into nouveau_display_init() again. The race
condition that both of these patches attempted to work around has
already been fixed properly in
d61a5c106351 ("drm/nouveau: Fix deadlock on runtime suspend")
Fixes: 9a2eba337cace ("drm/nouveau: Fix drm poll_helper handling")
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Acked-by: Karol Herbst <kherbst(a)redhat.com>
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: Peter Ujfalusi <peter.ujfalusi(a)ti.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/nouveau/nouveau_display.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index ec7861457b84..1d36ab5d4796 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -355,8 +355,6 @@ nouveau_display_hpd_work(struct work_struct *work)
pm_runtime_get_sync(drm->dev->dev);
drm_helper_hpd_irq_event(drm->dev);
- /* enable polling for external displays */
- drm_kms_helper_poll_enable(drm->dev);
pm_runtime_mark_last_busy(drm->dev->dev);
pm_runtime_put_sync(drm->dev->dev);
@@ -411,6 +409,11 @@ nouveau_display_init(struct drm_device *dev)
if (ret)
return ret;
+ /* enable connector detection and polling for connectors without HPD
+ * support
+ */
+ drm_kms_helper_poll_enable(dev);
+
/* enable hotplug interrupts */
drm_connector_list_iter_begin(dev, &conn_iter);
nouveau_for_each_non_mst_connector_iter(connector, &conn_iter) {
--
2.17.1
On Fri, Aug 10, 2018 at 02:09:01AM +0000, Felipe Franciosi wrote:
> Hi Ming (and all),
>
> Your series "scsi: virtio_scsi: fix IO hang caused by irq vector automatic affinity" which forces virtio-scsi to use blk-mq fixes an issue introduced by 84676c1f. We noticed that this bug also exists in 4.14.y (as ef86f3a72adb), but your series was not backported to that stable branch.
>
> Are there any plans to do that? At least CoreOS is using 4.14 and showing issues on AHV (which provides an mq virtio-scsi controller).
>
Hi Felipe,
It looks like the following 4 patches should have been marked as stable,
sorry for missing that.
b5b6e8c8d3b4 scsi: virtio_scsi: fix IO hang caused by automatic irq vector affinity
2f31115e940c scsi: core: introduce force_blk_mq
adbe552349f2 scsi: megaraid_sas: fix selection of reply queue
8b834bff1b73 scsi: hpsa: fix selection of reply queue
Usually this backporting is done by our stable guys, so I will CC stable
and let them handle it, but I am happy to provide any help with
addressing conflicts or that sort of thing.
Thanks,
Ming
> From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Date: Wed, Aug 15, 2018 at 8:15 AM
> Subject: Re: [PATCH 4.14 000/104] 4.14.63-stable review
> To: <linux-kernel(a)vger.kernel.org>
> Cc: <torvalds(a)linux-foundation.org>, <akpm(a)linux-foundation.org>,
> <linux(a)roeck-us.net>, <shuah(a)kernel.org>, <patches(a)kernelci.org>,
> <ben.hutchings(a)codethink.co.uk>, <lkft-triage(a)lists.linaro.org>,
> <stable(a)vger.kernel.org>
>
>
> On Tue, Aug 14, 2018 at 07:16:14PM +0200, Greg Kroah-Hartman wrote:
> > This is the start of the stable review cycle for the 4.14.63 release.
> > There are 104 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Thu Aug 16 17:14:49 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc…
>
> -rc2 is now out:
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.63-rc…
Merged to our tree, tested on my test machines with basic functional
tests, all looks fine.
Thanks,
--
Jack Wang
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin
Tel: +49 30 577 008 042
Fax: +49 30 577 008 299
Email: jinpu.wang(a)profitbricks.com
URL: https://www.profitbricks.de
Registered office: Berlin
Register court: Amtsgericht Charlottenburg, HRB 125506 B
Managing directors: Achim Weiss, Matthias Steinberg, Christoph Steffens
The patch
spi: fix IDR collision on systems with both fixed and dynamic SPI bus numbers
has been applied to the spi tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 1a4327fbf4554d5b78d75b19a13d40d6de220159 Mon Sep 17 00:00:00 2001
From: Kirill Kapranov <kirill.kapranov(a)compulab.co.il>
Date: Mon, 13 Aug 2018 19:48:10 +0300
Subject: [PATCH] spi: fix IDR collision on systems with both fixed and dynamic
SPI bus numbers
On systems where some controllers get a dynamic ID assigned and some have
a fixed number (e.g. from ACPI tables), the current implementation might
run into an IDR collision: a driver takes a fixed bus number that is not
marked busy in the IDR tree, then a driver with a dynamic bus number is
handed the same ID and predictably fails.
Fix this by checking fixed IDs into the IDR, just like dynamic ones, at
the moment of controller registration.
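
The collision and the fix can be modelled with a toy allocator. This is a sketch under stated assumptions, not the kernel IDR API: struct toy_idr, TOY_MAX_IDS and the toy_*() helpers are hypothetical, and bus numbers are assumed to be non-negative.

```c
#include <stdbool.h>

#define TOY_MAX_IDS 8

/* Toy stand-in for an IDR: one "in use" flag per ID. */
struct toy_idr {
	bool used[TOY_MAX_IDS];
};

/* Reserve the lowest free ID in [start, end); return it, or -1 if none. */
static int toy_idr_alloc(struct toy_idr *idr, int start, int end)
{
	for (int id = start; id < end && id < TOY_MAX_IDS; id++) {
		if (!idr->used[id]) {
			idr->used[id] = true;
			return id;
		}
	}
	return -1;
}

/* Fixed bus number: check in exactly that ID, as the patch does. */
static int toy_register_fixed(struct toy_idr *idr, int bus_num)
{
	return toy_idr_alloc(idr, bus_num, bus_num + 1);
}

/* Dynamic bus number: take whatever is free. */
static int toy_register_dynamic(struct toy_idr *idr)
{
	return toy_idr_alloc(idr, 0, TOY_MAX_IDS);
}
```

Because a fixed ID is now recorded in the same table, a later dynamic registration skips it, and a second controller asking for the same fixed number gets an error instead of silently sharing the ID.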
Fixes: 9b61e302210e (spi: Pick spi bus number from Linux idr or spi alias)
Signed-off-by: Kirill Kapranov <kirill.kapranov(a)compulab.co.il>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/spi/spi.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index ec395a6baf9c..a00d006d4c3a 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2170,6 +2170,15 @@ int spi_register_controller(struct spi_controller *ctlr)
if (WARN(id < 0, "couldn't get idr"))
return id;
ctlr->bus_num = id;
+ } else {
+ /* devices with a fixed bus num must check-in with the num */
+ mutex_lock(&board_lock);
+ id = idr_alloc(&spi_master_idr, ctlr, ctlr->bus_num,
+ ctlr->bus_num + 1, GFP_KERNEL);
+ mutex_unlock(&board_lock);
+ if (WARN(id < 0, "couldn't get idr"))
+ return id == -ENOSPC ? -EBUSY : id;
+ ctlr->bus_num = id;
}
INIT_LIST_HEAD(&ctlr->queue);
spin_lock_init(&ctlr->queue_lock);
--
2.18.0
From: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
commit b89b41d0b8414690ec0030c134b8bde209e6d06c upstream
Current cpu_core_id fixup causes downcored F17h configurations to be
incorrect:
NODE: 0
processor 0 core id : 0
processor 1 core id : 1
processor 2 core id : 2
processor 3 core id : 4
processor 4 core id : 5
processor 5 core id : 0
NODE: 1
processor 6 core id : 2
processor 7 core id : 3
processor 8 core id : 4
processor 9 core id : 0
processor 10 core id : 1
processor 11 core id : 2
Code that relies on the cpu_core_id, like match_smt(), for example,
which builds the thread siblings masks used by the scheduler, is
misled.
So, limit the fixup to pre-F17h machines. The new value for cpu_core_id
for F17h and later will represent the CPUID_Fn8000001E_EBX[CoreId],
which is guaranteed to be unique for each core within a socket.
This way we have:
NODE: 0
processor 0 core id : 0
processor 1 core id : 1
processor 2 core id : 2
processor 3 core id : 4
processor 4 core id : 5
processor 5 core id : 6
NODE: 1
processor 6 core id : 8
processor 7 core id : 9
processor 8 core id : 10
processor 9 core id : 12
processor 10 core id : 13
processor 11 core id : 14
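
The two listings are related by the modulo in the old fixup, which a few lines of userspace C can show. This is only a sketch: legacy_core_id() is a hypothetical helper, and the inputs 12 (x86_max_cores) and 2 (nodes_per_socket) are assumptions chosen because they give cus_per_node = 6, which reproduces the first listing.

```c
#include <stdint.h>

/* Pre-patch fixup: fold core IDs into [0 .. cus_per_node - 1]. */
static uint32_t legacy_core_id(uint32_t core_id, uint32_t max_cores,
			       uint32_t nodes_per_socket)
{
	uint32_t cus_per_node = max_cores / nodes_per_socket;

	return core_id % cus_per_node;
}
```

Feeding the hardware IDs from the second listing (0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14) through legacy_core_id(id, 12, 2) yields the duplicated values of the first listing, for example 6 becomes 0 and 8 becomes 2.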
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
[ Heavily massaged. ]
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Yazen Ghannam <Yazen.Ghannam(a)amd.com>
Link: http://lkml.kernel.org/r/20170731085159.9455-2-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
---
arch/x86/kernel/cpu/amd.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 6de596449488..e864ff6cd8bd 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -305,6 +305,22 @@ static void amd_get_topology_early(struct cpuinfo_x86 *c)
}
/*
+ * Fix up cpu_core_id for pre-F17h systems to be in the
+ * [0 .. cores_per_node - 1] range. Not really needed but
+ * kept so as not to break existing setups.
+ */
+static void legacy_fixup_core_id(struct cpuinfo_x86 *c)
+{
+ u32 cus_per_node;
+
+ if (c->x86 >= 0x17)
+ return;
+
+ cus_per_node = c->x86_max_cores / nodes_per_socket;
+ c->cpu_core_id %= cus_per_node;
+}
+
+/*
* Fixup core topology information for
* (1) AMD multi-node processors
* Assumption: Number of cores in each internal node is the same.
@@ -359,15 +375,9 @@ static void amd_get_topology(struct cpuinfo_x86 *c)
} else
return;
- /* fixup multi-node processor information */
if (nodes_per_socket > 1) {
- u32 cus_per_node;
-
set_cpu_cap(c, X86_FEATURE_AMD_DCM);
- cus_per_node = c->x86_max_cores / nodes_per_socket;
-
- /* core id has to be in the [0 .. cores_per_node - 1] range */
- c->cpu_core_id %= cus_per_node;
+ legacy_fixup_core_id(c);
}
}
#endif
--
2.7.4
Commit 8844618d8aa7 ("ext4: only look at the bg_flags field if it
is valid") introduced an issue which we are seeing when running "adb
remount" on Android devices with the affected kernels. This change
appears in 4.4.y and later.
> EXT4-fs error (device vdd): ext4_has_uninit_itable:2882: comm remount svc 50: Inode table for bg 0 marked as needing zeroing
> Kernel panic - not syncing: EXT4-fs (device vdd): panic forced after error
Looks like this fix was already picked up for 4.14.y, 4.17.y but
(AFAIK) it isn't on anybody's radar for 4.4 and 4.9. Thanks!
The patch titled
Subject: drivers/block/zram/zram_drv.c: fix bug storing backing_dev
has been added to the -mm tree. Its filename is
zram-fix-bug-storing-backing_dev.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/zram-fix-bug-storing-backing_dev.p…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/zram-fix-bug-storing-backing_dev.p…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Kalauskas <peskal(a)google.com>
Subject: drivers/block/zram/zram_drv.c: fix bug storing backing_dev
The call to strlcpy in backing_dev_store is incorrect. It should take
the size of the destination buffer instead of the size of the source
buffer. Additionally, ignore the newline character (\n) when reading
the new file_name buffer. This makes it possible to set the backing_dev
as follows:
echo /dev/sdX > /sys/block/zram0/backing_dev
The reason it worked before is that strlcpy() copies 'len - 1'
bytes, which is strlen(buf) - 1 in our case, so it accidentally didn't
copy the trailing newline. This also means that "echo -n
/dev/sdX" was most likely broken.
Signed-off-by: Peter Kalauskas <peskal(a)google.com>
Link: http://lkml.kernel.org/r/20180813061623.GC64836@rodete-desktop-imager.corp.…
Acked-by: Minchan Kim <minchan(a)kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/block/zram/zram_drv.c~zram-fix-bug-storing-backing_dev
+++ a/drivers/block/zram/zram_drv.c
@@ -337,6 +337,7 @@ static ssize_t backing_dev_store(struct
struct device_attribute *attr, const char *buf, size_t len)
{
char *file_name;
+ size_t sz;
struct file *backing_dev = NULL;
struct inode *inode;
struct address_space *mapping;
@@ -357,7 +358,11 @@ static ssize_t backing_dev_store(struct
goto out;
}
- strlcpy(file_name, buf, len);
+ strlcpy(file_name, buf, PATH_MAX);
+ /* ignore trailing newline */
+ sz = strlen(file_name);
+ if (sz > 0 && file_name[sz - 1] == '\n')
+ file_name[sz - 1] = 0x00;
backing_dev = filp_open(file_name, O_RDWR|O_LARGEFILE, 0);
if (IS_ERR(backing_dev)) {
_
Patches currently in -mm which might be from peskal(a)google.com are
zram-fix-bug-storing-backing_dev.patch
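The strlcpy() behaviour described in the zram patch above can be sketched in userspace. This is only an illustration under stated assumptions: sketch_strlcpy(), store_name_old() and store_name_new() are hypothetical names, and sketch_strlcpy() is a local stand-in that copies at most size - 1 bytes and always NUL-terminates, as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for strlcpy(): copies at most
 * size - 1 bytes from src, always NUL-terminates dst, and returns
 * strlen(src).
 */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
	size_t src_len = strlen(src);
	size_t n;

	assert(size > 0);
	n = src_len < size - 1 ? src_len : size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return src_len;
}

/*
 * Old behaviour: the size argument is the length of the source, so
 * exactly one trailing byte is dropped, whatever that byte is.
 */
static void store_name_old(char *dst, const char *buf, size_t len)
{
	sketch_strlcpy(dst, buf, len);
}

/*
 * Patched behaviour: bound the copy by the destination size, then
 * strip a single trailing newline if one is present.
 */
static void store_name_new(char *dst, size_t dst_size, const char *buf)
{
	size_t sz;

	sketch_strlcpy(dst, buf, dst_size);
	sz = strlen(dst);
	if (sz > 0 && dst[sz - 1] == '\n')
		dst[sz - 1] = '\0';
}
```

With the input "/dev/sdX\n" (what a plain echo writes) both variants yield "/dev/sdX". With "/dev/sdX" (what echo -n writes) the old variant truncates the name to "/dev/sd", while the patched variant keeps it intact.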
These patches are needed for kasan+clang support. I confirmed they
apply cleanly in order (top to bottom):
4.9:
commit c5caf21ab0cf8 ("kasan: turn on -fsanitize-address-use-after-scope")
commit 0e410e158e5b ("kasan: don't emit builtin calls when sanitization is off")
4.4:
commit c5caf21ab0cf8 ("kasan: turn on -fsanitize-address-use-after-scope")
===
0e410e158e5b is the one I'm interested in. Looks like it landed in
4.16, and got backported to 4.14-stable.
===
c5caf21ab0cf8 depends on c6d308534aef6 ("UBSAN: run-time undefined
behavior sanity checker"), and I don't want to bring in all of UBSAN
to 4.4. I'll send a patch for 0e410e158e5b.
--
Thanks,
~Nick Desaulniers
The function has an inline "return false;" definition with CONFIG_SMP=n, but the
"real" definition is also visible, leading to a "redefinition of
‘apic_id_is_primary_thread’" compiler error. Guard it with #ifdef CONFIG_SMP.
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: 6a4d2657e048 ("x86/smp: Provide topology_is_primary_thread()")
Cc: stable(a)vger.kernel.org
---
arch/x86/kernel/apic/apic.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 87ff6235bbfe..84132eddb5a8 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2193,6 +2193,7 @@ static int cpuid_to_apicid[] = {
[0 ... NR_CPUS - 1] = -1,
};
+#ifdef CONFIG_SMP
/**
* apic_id_is_primary_thread - Check whether APIC ID belongs to a primary thread
* @id: APIC ID to check
@@ -2207,6 +2208,7 @@ bool apic_id_is_primary_thread(unsigned int apicid)
mask = (1U << (fls(smp_num_siblings) - 1)) - 1;
return !(apicid & mask);
}
+#endif
/*
* Should use this API to allocate logical CPU IDs to keep nr_logical_cpuids
--
2.18.0
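The guard pattern from the apic.c patch above can be sketched as a standalone file. This is a minimal illustration, not the kernel code: sketch_fls(), sketch_is_primary() and sketch_num_siblings are hypothetical stand-ins for fls(), the function body and smp_num_siblings, and CONFIG_SMP would normally come from Kconfig rather than from a -D flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for fls(): 1-based index of the highest set bit, 0 for 0. */
static int sketch_fls(unsigned int x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* The mask computation shown in the patch context, with the sibling count passed in. */
static bool sketch_is_primary(unsigned int apicid, unsigned int siblings)
{
	unsigned int mask;

	assert(siblings > 0);
	/* Isolate the SMT bit(s) of the APIC ID; a primary thread has them all clear. */
	mask = (1U << (sketch_fls(siblings) - 1)) - 1;
	return !(apicid & mask);
}

/* "Header" part: a prototype with SMP, an inline stub without it. */
#ifdef CONFIG_SMP
bool apic_id_is_primary_thread(unsigned int apicid);
#else
static inline bool apic_id_is_primary_thread(unsigned int apicid)
{
	(void)apicid;
	return false;
}
#endif

/*
 * "Source" part: without this #ifdef the definition below would
 * collide with the inline stub when CONFIG_SMP is not defined.
 */
#ifdef CONFIG_SMP
static unsigned int sketch_num_siblings = 2; /* hypothetical value */

bool apic_id_is_primary_thread(unsigned int apicid)
{
	return sketch_is_primary(apicid, sketch_num_siblings);
}
#endif
```

Building this file with and without -DCONFIG_SMP should succeed in both cases; removing the second #ifdef reproduces the redefinition error in the build without CONFIG_SMP.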
From: Andrey Konovalov <andreyknvl(a)google.com>
commit 0e410e158e5baa1300bdf678cea4f4e0cf9d8b94 upstream.
With KASAN enabled the kernel has two different memset() functions, one
with KASAN checks (memset) and one without (__memset). KASAN uses some
macro tricks to use the proper version where required. For example
memset() calls in mm/slub.c are without KASAN checks, since they operate
on poisoned slab object metadata.
The issue is that clang emits memset() calls even when there is no
memset() in the source code. They get linked with improper memset()
implementation and the kernel fails to boot due to a huge amount of KASAN
reports during early boot stages.
The solution is to add -fno-builtin flag for files with KASAN_SANITIZE :=
n marker.
Link: http://lkml.kernel.org/r/8ffecfffe04088c52c42b92739c2bd8a0bcb3f5e.151638459…
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Acked-by: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Michal Marek <michal.lkml(a)markovi.net>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[ Nick : Backported to 4.4 avoiding KUBSAN ]
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
Makefile | 3 ++-
scripts/Makefile.kasan | 3 +++
scripts/Makefile.lib | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index ee92a12e3a4b..4fdd43dd14aa 100644
--- a/Makefile
+++ b/Makefile
@@ -418,7 +418,8 @@ export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE
export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS
export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS
-export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_GCOV CFLAGS_KASAN
+export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_GCOV
+export CFLAGS_KASAN CFLAGS_KASAN_NOSANITIZE
export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE
export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE
export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index 37323b0df374..2624d4bf9a45 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -28,4 +28,7 @@ else
CFLAGS_KASAN := $(CFLAGS_KASAN_MINIMAL)
endif
endif
+
+CFLAGS_KASAN_NOSANITIZE := -fno-builtin
+
endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 24914e7de944..a2d0e6d32659 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -126,7 +126,7 @@ endif
ifeq ($(CONFIG_KASAN),y)
_c_flags += $(if $(patsubst n%,, \
$(KASAN_SANITIZE_$(basetarget).o)$(KASAN_SANITIZE)y), \
- $(CFLAGS_KASAN))
+ $(CFLAGS_KASAN), $(CFLAGS_KASAN_NOSANITIZE))
endif
# If building the kernel in a separate objtree expand all occurrences
--
2.18.0.865.gffc8e1a3cd6-goog
These are already defined higher up in the file.
Cc: stable(a)vger.kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe(a)redhat.com>
---
arch/x86/kvm/vmx.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 58bba7a7572a..e7691e666479 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -9725,9 +9725,6 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
* information but as all relevant affected CPUs have 32KiB L1D cache size
* there is no point in doing so.
*/
-#define L1D_CACHE_ORDER 4
-static void *vmx_l1d_flush_pages;
-
static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
int size = PAGE_SIZE << L1D_CACHE_ORDER;
--
2.17.1
The patch
ASoC: wm9712: fix replace codec to component
has been applied to the asoc tree at
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git
All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.
You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.
If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.
Please add any relevant lists and maintainers to the CCs when replying
to this mail.
Thanks,
Mark
From 5e4cfadaf5b73a0801b2fa7fb007f98400ebfe6e Mon Sep 17 00:00:00 2001
From: Marcel Ziswiler <marcel.ziswiler(a)toradex.com>
Date: Tue, 14 Aug 2018 00:35:56 +0200
Subject: [PATCH] ASoC: wm9712: fix replace codec to component
Since commit 143b44845d87 ("ASoC: wm9712: replace codec to component")
"wm9712-codec" got renamed to "wm9712-component", however, this change
never got propagated down to the actual board/platform drivers. E.g. on
Colibri T20 this led to the following spew upon boot with sound/touch
being broken:
[ 2.214121] tegra-snd-wm9712 sound: ASoC: CODEC DAI wm9712-hifi not registered
[ 2.222137] tegra-snd-wm9712 sound: snd_soc_register_card failed (-517)
...
[ 2.344384] tegra-snd-wm9712 sound: ASoC: CODEC DAI wm9712-hifi not registered
[ 2.351885] tegra-snd-wm9712 sound: snd_soc_register_card failed (-517)
...
[ 2.668339] tegra-snd-wm9712 sound: ASoC: CODEC DAI wm9712-hifi not registered
[ 2.675811] tegra-snd-wm9712 sound: snd_soc_register_card failed (-517)
...
[ 3.208408] tegra-snd-wm9712 sound: ASoC: CODEC DAI wm9712-hifi not registered
[ 3.216312] tegra-snd-wm9712 sound: snd_soc_register_card failed (-517)
...
[ 3.235397] tegra-snd-wm9712 sound: ASoC: CODEC DAI wm9712-hifi not registered
[ 3.248938] tegra-snd-wm9712 sound: snd_soc_register_card failed (-517)
...
[ 14.970443] ALSA device list:
[ 14.996628] No soundcards found.
This commit finally fixes this again.
Signed-off-by: Marcel Ziswiler <marcel.ziswiler(a)toradex.com>
Acked-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/codecs/wm9712.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/codecs/wm9712.c b/sound/soc/codecs/wm9712.c
index 953d94d50586..ade34c26ad2f 100644
--- a/sound/soc/codecs/wm9712.c
+++ b/sound/soc/codecs/wm9712.c
@@ -719,7 +719,7 @@ static int wm9712_probe(struct platform_device *pdev)
static struct platform_driver wm9712_component_driver = {
.driver = {
- .name = "wm9712-component",
+ .name = "wm9712-codec",
},
.probe = wm9712_probe,
--
2.18.0
Since Haswell we have no color range indication either in the pipe or
port registers for DP. Instead, there's a separate register for setting
the DP Main Stream Attributes (MSA) directly. The MSA register
definition makes no references to colorimetry, just a vague reference to
the DP spec. The connection to the color range was lost.
Apparently we've failed to set the proper MSA bit for limited, or CEA,
range ever since the first DDI platforms. We've started setting other
MSA parameters since commit dae847991a43 ("drm/i915: add
intel_ddi_set_pipe_settings").
Without the crucial bit of information, the DP sink has no way of
knowing the source is actually transmitting limited range RGB, leading
to "washed out" colors. With the colorimetry information, compliant
sinks should be able to handle the limited range properly. Native
(i.e. non-LSPCON) HDMI was not affected because we do pass the color
range via AVI infoframes.
Though not the root cause, the problem was made worse for DDI platforms
with commit 55bc60db5988 ("drm/i915: Add "Automatic" mode for the
"Broadcast RGB" property"), which selects limited range RGB
automatically based on the mode, as per the DP, HDMI and CEA specs.
After all these years, the fix boils down to flipping one bit.
[Per testing reports, this fixes DP sinks, but not the LSPCON. My
educated guess is that the LSPCON fails to turn the CEA range MSA into
AVI infoframes for HDMI.]
Reported-by: Michał Kopeć <mkopec12(a)gmail.com>
Reported-by: N. W. <nw9165-3201(a)yahoo.com>
Reported-by: Nicholas Stommel <nicholas.stommel(a)gmail.com>
Reported-by: Tom Yan <tom.ty89(a)gmail.com>
Tested-by: Nicholas Stommel <nicholas.stommel(a)gmail.com>
References: https://bugs.freedesktop.org/show_bug.cgi?id=100023
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107476
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94921
Cc: Paulo Zanoni <paulo.r.zanoni(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v3.9+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_ddi.c | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 17575cfc22b5..0c9f03dda569 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -9246,6 +9246,7 @@ enum skl_power_gate {
#define TRANS_MSA_10_BPC (2 << 5)
#define TRANS_MSA_12_BPC (3 << 5)
#define TRANS_MSA_16_BPC (4 << 5)
+#define TRANS_MSA_CEA_RANGE (1 << 3)
/* LCPLL Control */
#define LCPLL_CTL _MMIO(0x130040)
diff --git a/drivers/gpu/drm/i915/intel_ddi.c b/drivers/gpu/drm/i915/intel_ddi.c
index 0adc043529f2..6f7be066c8f2 100644
--- a/drivers/gpu/drm/i915/intel_ddi.c
+++ b/drivers/gpu/drm/i915/intel_ddi.c
@@ -1685,6 +1685,10 @@ void intel_ddi_set_pipe_settings(const struct intel_crtc_state *crtc_state)
WARN_ON(transcoder_is_dsi(cpu_transcoder));
temp = TRANS_MSA_SYNC_CLK;
+
+ if (crtc_state->limited_color_range)
+ temp |= TRANS_MSA_CEA_RANGE;
+
switch (crtc_state->pipe_bpp) {
case 18:
temp |= TRANS_MSA_6_BPC;
--
2.11.0
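How the MSA value in the i915 patch above is composed can be sketched outside the driver. This is an illustration under assumptions: sketch_msa_misc() is a hypothetical helper, the 10/12 BPC and CEA range values are taken from the patch context, while the TRANS_MSA_SYNC_CLK, 6 BPC and 8 BPC values and the bpp cases other than 18 are assumed here and not shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TRANS_MSA_SYNC_CLK	(1u << 0)	/* assumed value, not in the diff */
#define TRANS_MSA_CEA_RANGE	(1u << 3)	/* added by the patch */
#define TRANS_MSA_6_BPC		(0u << 5)	/* assumed value, not in the diff */
#define TRANS_MSA_8_BPC		(1u << 5)	/* assumed value, not in the diff */
#define TRANS_MSA_10_BPC	(2u << 5)	/* from the patch context */
#define TRANS_MSA_12_BPC	(3u << 5)	/* from the patch context */

/*
 * Hypothetical helper: build the MSA value for a given pipe bpp and
 * color range, following the order of operations in the patched
 * intel_ddi_set_pipe_settings().
 */
static uint32_t sketch_msa_misc(int pipe_bpp, bool limited_color_range)
{
	uint32_t temp = TRANS_MSA_SYNC_CLK;

	/* The one-bit fix: tell the sink that limited (CEA) range is in use. */
	if (limited_color_range)
		temp |= TRANS_MSA_CEA_RANGE;

	switch (pipe_bpp) {
	case 18:
		temp |= TRANS_MSA_6_BPC;
		break;
	case 24:	/* assumed mapping */
		temp |= TRANS_MSA_8_BPC;
		break;
	case 30:	/* assumed mapping */
		temp |= TRANS_MSA_10_BPC;
		break;
	case 36:	/* assumed mapping */
		temp |= TRANS_MSA_12_BPC;
		break;
	default:
		assert(0 && "unsupported pipe_bpp in this sketch");
	}
	return temp;
}
```

Under these assumed values, a 24 bpp pipe yields 0x21 for full range and 0x29 for limited range; the only difference is bit 3.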
Hi Daniel,
A kernel bug report was opened against Ubuntu [0]. It was found the
following patch introduced the regression:
da9970668948 ("usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201")
The bug reporter claims there is a typo in the patch that caused the
regression. I built a test kernel with a change to the suspected typo
and the bug reporter claims it resolved the regression. My test kernel
had the following change:
- pdev->device == 0x0014)
+ pdev->device == 0x0015)
I was hoping to get your feedback, since you are the patch author. Do
you think this is an actual typo, or maybe there really needs to be two
quirks?
Thanks,
Joe
[0] http://pad.lv/1773704
I'm sure I don't need to tell you that fb_helper's locking is a mess.
That being said, fb_helper's locking mess can seriously complicate the
runtime suspend/resume operations of drivers because it can invoke
atomic commits and connector probing from anywhere that calls
drm_fb_helper_hotplug_event(). Since most drivers use
drm_fb_helper_output_poll_changed() as their output_poll_changed
handler, this can happen in every single context that can fire off a
hotplug event. An example:
[ 246.669625] INFO: task kworker/4:0:37 blocked for more than 120 seconds.
[ 246.673398] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.675271] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.676527] kworker/4:0 D 0 37 2 0x80000000
[ 246.677580] Workqueue: events output_poll_execute [drm_kms_helper]
[ 246.678704] Call Trace:
[ 246.679753] __schedule+0x322/0xaf0
[ 246.680916] schedule+0x33/0x90
[ 246.681924] schedule_preempt_disabled+0x15/0x20
[ 246.683023] __mutex_lock+0x569/0x9a0
[ 246.684035] ? kobject_uevent_env+0x117/0x7b0
[ 246.685132] ? drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.686179] mutex_lock_nested+0x1b/0x20
[ 246.687278] ? mutex_lock_nested+0x1b/0x20
[ 246.688307] drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.689420] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[ 246.690462] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[ 246.691570] output_poll_execute+0x198/0x1c0 [drm_kms_helper]
[ 246.692611] process_one_work+0x231/0x620
[ 246.693725] worker_thread+0x214/0x3a0
[ 246.694756] kthread+0x12b/0x150
[ 246.695856] ? wq_pool_ids_show+0x140/0x140
[ 246.696888] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.697998] ret_from_fork+0x3a/0x50
[ 246.699034] INFO: task kworker/0:1:60 blocked for more than 120 seconds.
[ 246.700153] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.701182] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.702278] kworker/0:1 D 0 60 2 0x80000000
[ 246.703293] Workqueue: pm pm_runtime_work
[ 246.704393] Call Trace:
[ 246.705403] __schedule+0x322/0xaf0
[ 246.706439] ? wait_for_completion+0x104/0x190
[ 246.707393] schedule+0x33/0x90
[ 246.708375] schedule_timeout+0x3a5/0x590
[ 246.709289] ? mark_held_locks+0x58/0x80
[ 246.710208] ? _raw_spin_unlock_irq+0x2c/0x40
[ 246.711222] ? wait_for_completion+0x104/0x190
[ 246.712134] ? trace_hardirqs_on_caller+0xf4/0x190
[ 246.713094] ? wait_for_completion+0x104/0x190
[ 246.713964] wait_for_completion+0x12c/0x190
[ 246.714895] ? wake_up_q+0x80/0x80
[ 246.715727] ? get_work_pool+0x90/0x90
[ 246.716649] flush_work+0x1c9/0x280
[ 246.717483] ? flush_workqueue_prep_pwqs+0x1b0/0x1b0
[ 246.718442] __cancel_work_timer+0x146/0x1d0
[ 246.719247] cancel_delayed_work_sync+0x13/0x20
[ 246.720043] drm_kms_helper_poll_disable+0x1f/0x30 [drm_kms_helper]
[ 246.721123] nouveau_pmops_runtime_suspend+0x3d/0xb0 [nouveau]
[ 246.721897] pci_pm_runtime_suspend+0x6b/0x190
[ 246.722825] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.723737] __rpm_callback+0x7a/0x1d0
[ 246.724721] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.725607] rpm_callback+0x24/0x80
[ 246.726553] ? pci_has_legacy_pm_support+0x70/0x70
[ 246.727376] rpm_suspend+0x142/0x6b0
[ 246.728185] pm_runtime_work+0x97/0xc0
[ 246.728938] process_one_work+0x231/0x620
[ 246.729796] worker_thread+0x44/0x3a0
[ 246.730614] kthread+0x12b/0x150
[ 246.731395] ? wq_pool_ids_show+0x140/0x140
[ 246.732202] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.732878] ret_from_fork+0x3a/0x50
[ 246.733768] INFO: task kworker/4:2:422 blocked for more than 120 seconds.
[ 246.734587] Not tainted 4.18.0-rc5Lyude-Test+ #2
[ 246.735393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 246.736113] kworker/4:2 D 0 422 2 0x80000080
[ 246.736789] Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
[ 246.737665] Call Trace:
[ 246.738490] __schedule+0x322/0xaf0
[ 246.739250] schedule+0x33/0x90
[ 246.739908] rpm_resume+0x19c/0x850
[ 246.740750] ? finish_wait+0x90/0x90
[ 246.741541] __pm_runtime_resume+0x4e/0x90
[ 246.742370] nv50_disp_atomic_commit+0x31/0x210 [nouveau]
[ 246.743124] drm_atomic_commit+0x4a/0x50 [drm]
[ 246.743775] restore_fbdev_mode_atomic+0x1c8/0x240 [drm_kms_helper]
[ 246.744603] restore_fbdev_mode+0x31/0x140 [drm_kms_helper]
[ 246.745373] drm_fb_helper_restore_fbdev_mode_unlocked+0x54/0xb0 [drm_kms_helper]
[ 246.746220] drm_fb_helper_set_par+0x2d/0x50 [drm_kms_helper]
[ 246.746884] drm_fb_helper_hotplug_event.part.28+0x96/0xb0 [drm_kms_helper]
[ 246.747675] drm_fb_helper_output_poll_changed+0x23/0x30 [drm_kms_helper]
[ 246.748544] drm_kms_helper_hotplug_event+0x2a/0x30 [drm_kms_helper]
[ 246.749439] nv50_mstm_hotplug+0x15/0x20 [nouveau]
[ 246.750111] drm_dp_send_link_address+0x177/0x1c0 [drm_kms_helper]
[ 246.750764] drm_dp_check_and_send_link_address+0xa8/0xd0 [drm_kms_helper]
[ 246.751602] drm_dp_mst_link_probe_work+0x51/0x90 [drm_kms_helper]
[ 246.752314] process_one_work+0x231/0x620
[ 246.752979] worker_thread+0x44/0x3a0
[ 246.753838] kthread+0x12b/0x150
[ 246.754619] ? wq_pool_ids_show+0x140/0x140
[ 246.755386] ? kthread_create_worker_on_cpu+0x70/0x70
[ 246.756162] ret_from_fork+0x3a/0x50
[ 246.756847]
Showing all locks held in the system:
[ 246.758261] 3 locks held by kworker/4:0/37:
[ 246.759016] #0: 00000000f8df4d2d ((wq_completion)"events"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.759856] #1: 00000000e6065461 ((work_completion)(&(&dev->mode_config.output_poll_work)->work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.760670] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_hotplug_event.part.28+0x20/0xb0 [drm_kms_helper]
[ 246.761516] 2 locks held by kworker/0:1/60:
[ 246.762274] #0: 00000000fff6be0f ((wq_completion)"pm"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.762982] #1: 000000005ab44fb4 ((work_completion)(&dev->power.work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.763890] 1 lock held by khungtaskd/64:
[ 246.764664] #0: 000000008cb8b5c3 (rcu_read_lock){....}, at: debug_show_all_locks+0x23/0x185
[ 246.765588] 5 locks held by kworker/4:2/422:
[ 246.766440] #0: 00000000232f0959 ((wq_completion)"events_long"){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.767390] #1: 00000000bb59b134 ((work_completion)(&mgr->work)){+.+.}, at: process_one_work+0x1b3/0x620
[ 246.768154] #2: 00000000cb66735f (&helper->lock){+.+.}, at: drm_fb_helper_restore_fbdev_mode_unlocked+0x4c/0xb0 [drm_kms_helper]
[ 246.768966] #3: 000000004c8f0b6b (crtc_ww_class_acquire){+.+.}, at: restore_fbdev_mode_atomic+0x4b/0x240 [drm_kms_helper]
[ 246.769921] #4: 000000004c34a296 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_backoff+0x8a/0x1b0 [drm]
[ 246.770839] 1 lock held by dmesg/1038:
[ 246.771739] 2 locks held by zsh/1172:
[ 246.772650] #0: 00000000836d0438 (&tty->ldisc_sem){++++}, at: ldsem_down_read+0x37/0x40
[ 246.773680] #1: 000000001f4f4d48 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xc1/0x870
[ 246.775522] =============================================
After trying dozens of different solutions, I found one very simple one
that should also have the benefit of preventing us from having to fight
locking for the rest of our lives. So, we work around these deadlocks by
deferring all fbcon hotplug events that happen after the runtime suspend
process starts until after the device is resumed again.
Changes since v6:
- Remove unused nouveau_fbcon_hotplugged_in_suspend() - Ilia
Changes since v5:
- Come up with the (hopefully final) solution for solving this dumb
problem, one that is a lot less likely to cause issues with locking in
the future. This should work around all deadlock conditions with fbcon
brought up thus far.
Changes since v4:
- Add nouveau_fbcon_hotplugged_in_suspend() to workaround deadlock
condition that Lukas described
- Just move all of this out of drm_fb_helper. It seems that other DRM
drivers have already figured out other workarounds for this. If other
drivers do end up needing this in the future, we can just move this
back into drm_fb_helper again.
Changes since v3:
- Actually check if fb_helper is NULL in both new helpers
- Actually check drm_fbdev_emulation in both new helpers
- Don't fire off a fb_helper hotplug unconditionally; only do it if
the following conditions are true (as otherwise, calling this in the
wrong spot will cause Bad Things to happen):
- fb_helper hotplug handling was actually inhibited previously
- fb_helper actually has a delayed hotplug pending
- fb_helper is actually bound
- fb_helper is actually initialized
- Add __must_check to drm_fb_helper_suspend_hotplug(). There's no
situation where a driver would actually want to use this without
checking the return value, so enforce that
- Rewrite and clarify the documentation for both helpers.
- Make sure to return true in the drm_fb_helper_suspend_hotplug() stub
that's provided in drm_fb_helper.h when CONFIG_DRM_FBDEV_EMULATION
isn't enabled
- Actually grab the toplevel fb_helper lock in
drm_fb_helper_resume_hotplug(), since it's possible other activity
(such as a hotplug) could be going on at the same time the driver
calls drm_fb_helper_resume_hotplug(). We need this to check whether or
not drm_fb_helper_hotplug_event() needs to be called anyway
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: Karol Herbst <karolherbst(a)gmail.com>
---
drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_display.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_fbcon.c | 57 +++++++++++++++++++++++
drivers/gpu/drm/nouveau/nouveau_fbcon.h | 5 ++
4 files changed, 64 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c
index 8b522a9b12f6..a0772389ed90 100644
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c
+++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c
@@ -2049,7 +2049,7 @@ nv50_disp_atomic_state_alloc(struct drm_device *dev)
static const struct drm_mode_config_funcs
nv50_disp_func = {
.fb_create = nouveau_user_framebuffer_create,
- .output_poll_changed = drm_fb_helper_output_poll_changed,
+ .output_poll_changed = nouveau_fbcon_output_poll_changed,
.atomic_check = nv50_disp_atomic_check,
.atomic_commit = nv50_disp_atomic_commit,
.atomic_state_alloc = nv50_disp_atomic_state_alloc,
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c
index 1d36ab5d4796..4b873e668b26 100644
--- a/drivers/gpu/drm/nouveau/nouveau_display.c
+++ b/drivers/gpu/drm/nouveau/nouveau_display.c
@@ -293,7 +293,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev,
static const struct drm_mode_config_funcs nouveau_mode_config_funcs = {
.fb_create = nouveau_user_framebuffer_create,
- .output_poll_changed = drm_fb_helper_output_poll_changed,
+ .output_poll_changed = nouveau_fbcon_output_poll_changed,
};
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
index 85c1f10bc2b6..8cf966690963 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
@@ -466,6 +466,7 @@ nouveau_fbcon_set_suspend_work(struct work_struct *work)
console_unlock();
if (state == FBINFO_STATE_RUNNING) {
+ nouveau_fbcon_hotplug_resume(drm->fbcon);
pm_runtime_mark_last_busy(drm->dev->dev);
pm_runtime_put_sync(drm->dev->dev);
}
@@ -487,6 +488,61 @@ nouveau_fbcon_set_suspend(struct drm_device *dev, int state)
schedule_work(&drm->fbcon_work);
}
+void
+nouveau_fbcon_output_poll_changed(struct drm_device *dev)
+{
+ struct nouveau_drm *drm = nouveau_drm(dev);
+ struct nouveau_fbdev *fbcon = drm->fbcon;
+ int ret;
+
+ if (!fbcon)
+ return;
+
+ mutex_lock(&fbcon->hotplug_lock);
+
+ ret = pm_runtime_get(dev->dev);
+ if (ret == 1 || ret == -EACCES) {
+ drm_fb_helper_hotplug_event(&fbcon->helper);
+
+ pm_runtime_mark_last_busy(dev->dev);
+ pm_runtime_put_autosuspend(dev->dev);
+ } else if (ret == 0) {
+ /* If the GPU was already in the process of suspending before
+ * this event happened, then we can't block here as we'll
+ * deadlock the runtime pmops since they wait for us to
+ * finish. So, just defer this event for when we runtime
+ * resume again. It will be handled by fbcon_work.
+ */
+ NV_DEBUG(drm, "fbcon HPD event deferred until runtime resume\n");
+ fbcon->hotplug_waiting = true;
+ pm_runtime_put_noidle(drm->dev->dev);
+ } else {
+ DRM_WARN("fbcon HPD event lost due to RPM failure: %d\n",
+ ret);
+ }
+
+ mutex_unlock(&fbcon->hotplug_lock);
+}
+
+void
+nouveau_fbcon_hotplug_resume(struct nouveau_fbdev *fbcon)
+{
+ struct nouveau_drm *drm;
+
+ if (!fbcon)
+ return;
+ drm = nouveau_drm(fbcon->helper.dev);
+
+ mutex_lock(&fbcon->hotplug_lock);
+ if (fbcon->hotplug_waiting) {
+ fbcon->hotplug_waiting = false;
+
+ NV_DEBUG(drm, "Handling deferred fbcon HPD events\n");
+ drm_fb_helper_hotplug_event(&fbcon->helper);
+ }
+ mutex_unlock(&fbcon->hotplug_lock);
+}
+
int
nouveau_fbcon_init(struct drm_device *dev)
{
@@ -505,6 +561,7 @@ nouveau_fbcon_init(struct drm_device *dev)
drm->fbcon = fbcon;
INIT_WORK(&drm->fbcon_work, nouveau_fbcon_set_suspend_work);
+ mutex_init(&fbcon->hotplug_lock);
drm_fb_helper_prepare(dev, &fbcon->helper, &nouveau_fbcon_helper_funcs);
diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.h b/drivers/gpu/drm/nouveau/nouveau_fbcon.h
index a6f192ea3fa6..db9d52047ef8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_fbcon.h
+++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.h
@@ -41,6 +41,9 @@ struct nouveau_fbdev {
struct nvif_object gdi;
struct nvif_object blit;
struct nvif_object twod;
+
+ struct mutex hotplug_lock;
+ bool hotplug_waiting;
};
void nouveau_fbcon_restore(void);
@@ -68,6 +71,8 @@ void nouveau_fbcon_set_suspend(struct drm_device *dev, int state);
void nouveau_fbcon_accel_save_disable(struct drm_device *dev);
void nouveau_fbcon_accel_restore(struct drm_device *dev);
+void nouveau_fbcon_output_poll_changed(struct drm_device *dev);
+void nouveau_fbcon_hotplug_resume(struct nouveau_fbdev *fbcon);
extern int nouveau_nofbaccel;
#endif /* __NV50_FBCON_H__ */
--
2.17.1
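The deferral logic of the nouveau patch above can be modelled as a small single-threaded state machine. This is a sketch under assumptions, not the driver code: struct sketch_fbcon, enum hpd_action and both functions are hypothetical, the pm_runtime_get() return value is passed in as a plain integer, and the hotplug_lock mutex that the real code holds around both paths is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* What happened to one hotplug (HPD) event. */
enum hpd_action { HPD_HANDLED_NOW, HPD_DEFERRED, HPD_LOST };

/* Hypothetical model of the two fields the patch adds, plus a counter for illustration. */
struct sketch_fbcon {
	bool hotplug_waiting;	/* an event arrived while the device was suspending */
	int events_handled;	/* how often the fb_helper hotplug handler would have run */
};

/*
 * Model of the output_poll_changed path. rpm_ret plays the role of
 * the pm_runtime_get() return value as interpreted by the patch:
 * 1 or -EACCES means the event can be handled right away, 0 means a
 * suspend is in flight, anything else is an error.
 */
static enum hpd_action
sketch_output_poll_changed(struct sketch_fbcon *fbcon, int rpm_ret)
{
	assert(fbcon);

	if (rpm_ret == 1 || rpm_ret == -EACCES) {
		fbcon->events_handled++;
		return HPD_HANDLED_NOW;
	}
	if (rpm_ret == 0) {
		/* Do not block here; remember the event for resume time. */
		fbcon->hotplug_waiting = true;
		return HPD_DEFERRED;
	}
	return HPD_LOST;
}

/* Model of the resume path: replay at most one deferred event. */
static bool sketch_hotplug_resume(struct sketch_fbcon *fbcon)
{
	assert(fbcon);

	if (!fbcon->hotplug_waiting)
		return false;
	fbcon->hotplug_waiting = false;
	fbcon->events_handled++;
	return true;
}
```

Note that several events deferred during one suspend collapse into a single replay on resume, since only a boolean is kept.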
The page migration code employs try_to_unmap() to try and unmap the
source page. This is accomplished by using rmap_walk to find all
vmas where the page is mapped. This search stops when page mapcount
is zero. For shared PMD huge pages, the page map count is always 1
no matter the number of mappings. Shared mappings are tracked via
the reference count of the PMD page. Therefore, try_to_unmap stops
prematurely and does not completely unmap all mappings of the source
page.
This problem can result in data corruption as writes to the original
source page can happen after contents of the page are copied to the
target page. Hence, data is lost.
This problem was originally seen as DB corruption of shared global
areas after a huge page was soft offlined due to ECC memory errors.
DB developers noticed they could reproduce the issue by (hotplug)
offlining memory used to back huge pages. A simple testcase can
reproduce the problem by creating a shared PMD mapping (note that
this must be at least PUD_SIZE in size and PUD_SIZE aligned (1GB on
x86)), and using migrate_pages() to migrate process pages between
nodes while continually writing to the huge pages being migrated.
To fix, have the try_to_unmap_one routine check for huge PMD sharing
by calling huge_pmd_unshare for hugetlbfs huge pages. If it is a
shared mapping it will be 'unshared' which removes the page table
entry and drops the reference on the PMD page. After this, flush
caches and TLB.
Fixes: 39dde65c9940 ("shared page table for hugetlb page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
v2: Fixed build issue for !CONFIG_HUGETLB_PAGE and typos in comment
include/linux/hugetlb.h | 6 ++++++
mm/rmap.c | 21 +++++++++++++++++++++
2 files changed, 27 insertions(+)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 36fa6a2a82e3..7524663028ec 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -170,6 +170,12 @@ static inline unsigned long hugetlb_total_pages(void)
return 0;
}
+static inline int huge_pmd_unshare(struct mm_struct *mm, unsigned long *addr,
+ pte_t *ptep)
+{
+ return 0;
+}
+
#define follow_hugetlb_page(m,v,p,vs,a,b,i,w,n) ({ BUG(); 0; })
#define follow_huge_addr(mm, addr, write) ERR_PTR(-EINVAL)
#define copy_hugetlb_page_range(src, dst, vma) ({ BUG(); 0; })
diff --git a/mm/rmap.c b/mm/rmap.c
index 09a799c9aebd..cf2340adad10 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1409,6 +1409,27 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
address = pvmw.address;
+ /*
+ * PMDs for hugetlbfs pages could be shared. In this case,
+ * pages with shared PMDs will have a mapcount of 1 no matter
+ * how many times they are actually mapped. Map counting for
+ * PMD sharing is mostly done via the reference count on the
+ * PMD page itself. If the page we are trying to unmap is a
+ * hugetlbfs page, attempt to 'unshare' at the PMD level.
+ * huge_pmd_unshare clears the PUD and adjusts reference
+ * counting on the PMD page which effectively unmaps the page.
+ * Take care of flushing cache and TLB for page in this
+ * specific mapping here.
+ */
+ if (PageHuge(page) &&
+ huge_pmd_unshare(mm, &address, pvmw.pte)) {
+ unsigned long end_add = address + vma_mmu_pagesize(vma);
+
+ flush_cache_range(vma, address, end_add);
+ flush_tlb_range(vma, address, end_add);
+ mmu_notifier_invalidate_range(mm, address, end_add);
+ continue;
+ }
if (IS_ENABLED(CONFIG_MIGRATION) &&
(flags & TTU_MIGRATION) &&
--
2.17.1
> -----Original Message-----
> From: Rafael J. Wysocki [mailto:rafael@kernel.org]
> Sent: Sunday, August 12, 2018 2:47 AM
> To: Schmauss, Erik <erik.schmauss(a)intel.com>
> Cc: ACPI Devel Maling List <linux-acpi(a)vger.kernel.org>; Rafael J. Wysocki
> <rjw(a)rjwysocki.net>
> Subject: Re: [PATCH 05/11] ACPICA: AML Parser: skip opcodes that open a scope
> upon parse failure
>
> On Fri, Aug 10, 2018 at 11:45 PM Erik Schmauss <erik.schmauss(a)intel.com>
> wrote:
> >
> > This change skips the entire length of opcodes that open a scope
> > (Device, Scope, Processor, etc) if the creation of the op fails. The
> > failure could be caused by various errors including AE_ALREADY_EXISTS
> > and AE_NOT_FOUND.
> >
> > Reported-by: Jeremy Linton <jeremy.linton(a)arm.com>
> > Tested-by: Jeremy Linton <jeremy.linton(a)arm.com>
> > Signed-off-by: Erik Schmauss <erik.schmauss(a)intel.com>
>
> I think that we should propagate this fix to the "stable" kernel series, at least
> 4.17.y and newer. Do you agree?
Yes, I agree.
Hi Greg, please add this to the stable kernel.
>
> > ---
> > drivers/acpi/acpica/psloop.c | 17 +++++++++++------
> > 1 file changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/acpi/acpica/psloop.c
> > b/drivers/acpi/acpica/psloop.c index 20b6142da183..358fcdd1f8a5 100644
> > --- a/drivers/acpi/acpica/psloop.c
> > +++ b/drivers/acpi/acpica/psloop.c
> > @@ -22,6 +22,7 @@
> > #include "acdispat.h"
> > #include "amlcode.h"
> > #include "acconvert.h"
> > +#include "acnamesp.h"
> >
> > #define _COMPONENT ACPI_PARSER
> > ACPI_MODULE_NAME("psloop")
> > @@ -527,12 +528,18 @@ acpi_status acpi_ps_parse_loop(struct
> acpi_walk_state *walk_state)
> > if (ACPI_FAILURE(status)) {
> > return_ACPI_STATUS(status);
> > }
> > - if (walk_state->opcode == AML_SCOPE_OP) {
> > + if (acpi_ns_opens_scope
> > + (acpi_ps_get_opcode_info
> > +
> > + (walk_state->opcode)->object_type)) {
> > /*
> > - * If the scope op fails to parse, skip the body of the
> > - * scope op because the parse failure indicates that the
> > - * device may not exist.
> > + * If the scope/device op fails to parse, skip the body of
> > + * the scope op because the parse failure indicates that
> > + * the device may not exist.
> > */
> > + ACPI_ERROR((AE_INFO,
> > + "Skip parsing opcode %s",
> > + acpi_ps_get_opcode_name
> > +
> > + (walk_state->opcode)));
> > walk_state->parser_state.aml =
> > walk_state->aml + 1;
> > walk_state->parser_state.aml =
> > @@ -540,8 +547,6 @@ acpi_status acpi_ps_parse_loop(struct
> acpi_walk_state *walk_state)
> > (&walk_state->parser_state);
> > walk_state->aml =
> > walk_state->parser_state.aml;
> > - ACPI_ERROR((AE_INFO,
> > - "Skipping Scope block"));
> > }
> >
> > continue;
> > --
> > 2.17.1
> >
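For readers unfamiliar with how the parser finds the end of a skipped
block, here is a minimal user-space sketch of decoding an AML PkgLength
and stepping past the body. It is not the ACPICA implementation: the
sketch_ names are hypothetical, and it assumes a one-byte opcode such as
ScopeOp (0x10) and a well-formed buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the AML PkgLength starting at p. Bits 7-6 of the lead byte give
 * the number of following bytes (0 to 3). With none, bits 5-0 hold the
 * length; otherwise bits 3-0 are the low nibble and each following byte
 * supplies the next 8 bits. The length counts the PkgLength bytes
 * themselves but not the preceding opcode.
 */
static uint32_t sketch_decode_pkg_length(const uint8_t *p)
{
	unsigned int extra = p[0] >> 6;
	uint32_t len;

	if (extra == 0)
		return p[0] & 0x3f;

	len = p[0] & 0x0f;
	for (unsigned int i = 0; i < extra; i++)
		len |= (uint32_t)p[1 + i] << (4 + 8 * i);
	return len;
}

/*
 * Given the offset of a one-byte opcode that opens a scope, return the
 * offset of the first byte past its package.
 */
static size_t sketch_skip_scope_op(const uint8_t *aml, size_t len, size_t op)
{
	size_t pkg = op + 1;	/* PkgLength follows the opcode byte */
	size_t end;

	assert(pkg < len);
	assert(pkg + (aml[pkg] >> 6) < len);	/* PkgLength bytes in bounds */
	end = pkg + sketch_decode_pkg_length(aml + pkg);
	assert(end <= len);
	return end;
}
```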
It's true we can't resume the device from poll workers in
nouveau_connector_detect(). We can, however, prevent the autosuspend
timer from elapsing immediately (if it hasn't already) without risking
any sort of deadlock with the runtime suspend/resume operations. So do
that instead of entirely avoiding grabbing a power reference.
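The effect of holding a reference without resuming can be pictured with
a user-space toy model of the usage counter. This is a sketch only: the
sketch_ names are hypothetical, timestamps are plain integers, and the
real runtime PM core does far more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a device's runtime PM state. */
struct sketch_dev {
	int usage_count;	/* references currently held */
	bool active;		/* true while runtime active */
	long last_busy;		/* fake timestamp of last activity */
};

/* Models pm_runtime_get_noresume(): take a reference, never resume. */
static void sketch_get_noresume(struct sketch_dev *d)
{
	d->usage_count++;
}

/* Models pm_runtime_get_sync(): take a reference and resume. */
static void sketch_get_sync(struct sketch_dev *d)
{
	d->usage_count++;
	d->active = true;
}

/* Models pm_runtime_mark_last_busy(). */
static void sketch_mark_last_busy(struct sketch_dev *d, long now)
{
	d->last_busy = now;
}

/* Models the reference drop done by pm_runtime_put_autosuspend(). */
static void sketch_put_autosuspend(struct sketch_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

/*
 * Models the autosuspend timer firing at time now. The device suspends
 * only if nobody holds a reference and the delay has passed since the
 * last activity. Returns true if the device was suspended.
 */
static bool sketch_timer_fire(struct sketch_dev *d, long now, long delay)
{
	if (d->active && d->usage_count == 0 && now - d->last_busy >= delay) {
		d->active = false;
		return true;
	}
	return false;
}
```

In this model a poll worker that calls sketch_get_noresume() keeps the
timer from suspending the device until the matching put, after which the
delay is measured from the most recent sketch_mark_last_busy() call.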
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
Cc: Lukas Wunner <lukas(a)wunner.de>
Cc: Karol Herbst <karolherbst(a)gmail.com>
---
drivers/gpu/drm/nouveau/nouveau_connector.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index 2a45b4c2ceb0..010d6db14cba 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -572,12 +572,16 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
nv_connector->edid = NULL;
}
- /* Outputs are only polled while runtime active, so acquiring a
- * runtime PM ref here is unnecessary (and would deadlock upon
- * runtime suspend because it waits for polling to finish).
+ /* Outputs are only polled while runtime active, so resuming the
+ * device here is unnecessary (and would deadlock upon runtime suspend
+ * because it waits for polling to finish). We do however, want to
+ * prevent the autosuspend timer from elapsing during this operation
+ * if possible.
*/
- if (!drm_kms_helper_is_poll_worker()) {
- ret = pm_runtime_get_sync(connector->dev->dev);
+ if (drm_kms_helper_is_poll_worker()) {
+ pm_runtime_get_noresume(dev->dev);
+ } else {
+ ret = pm_runtime_get_sync(dev->dev);
if (ret < 0 && ret != -EACCES)
return conn_status;
}
@@ -655,10 +659,8 @@ nouveau_connector_detect(struct drm_connector *connector, bool force)
out:
- if (!drm_kms_helper_is_poll_worker()) {
- pm_runtime_mark_last_busy(connector->dev->dev);
- pm_runtime_put_autosuspend(connector->dev->dev);
- }
+ pm_runtime_mark_last_busy(dev->dev);
+ pm_runtime_put_autosuspend(dev->dev);
return conn_status;
}
--
2.17.1