Hi Greg, Sasha,
While backporting 37640adbefd6 ("MIPS: PCI: remember nasid changed by
set interrupt affinity"), something went wrong and an extra 'n' was added.
So 'data->nasid' became 'data->nnasid' and the MIPS builds started failing.
Since v5.4.78 has already been released, I assume you will need a patch to
fix it. Please consider applying the attached patch; it is only needed
for the 5.4-stable tree.
--
Regards
Sudip
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However, these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used, for example, to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This series flushes the cache on kernel entry and
after kernel user accesses.
Thanks to Nick Piggin, Russell Currey, Christopher M. Riedl, Michael
Ellerman and Spoorthy S for their work in developing, optimising,
testing and backporting these fixes, and to the many others who helped
behind the scenes.
Daniel Axtens (1):
selftests/powerpc: entry flush test
Michael Ellerman (1):
powerpc: Only include kup-radix.h for 64-bit Book3S
Nicholas Piggin (2):
powerpc/64s: flush L1D on kernel entry
powerpc/64s: flush L1D after user accesses
Russell Currey (1):
selftests/powerpc: rfi_flush: disable entry flush if present
.../admin-guide/kernel-parameters.txt | 7 +
.../powerpc/include/asm/book3s/64/kup-radix.h | 66 +++---
arch/powerpc/include/asm/exception-64s.h | 12 +-
arch/powerpc/include/asm/feature-fixups.h | 19 ++
arch/powerpc/include/asm/kup.h | 26 ++-
arch/powerpc/include/asm/security_features.h | 7 +
arch/powerpc/include/asm/setup.h | 4 +
arch/powerpc/kernel/exceptions-64s.S | 80 +++----
arch/powerpc/kernel/setup_64.c | 122 ++++++++++-
arch/powerpc/kernel/syscall_64.c | 2 +-
arch/powerpc/kernel/vmlinux.lds.S | 14 ++
arch/powerpc/lib/feature-fixups.c | 104 +++++++++
arch/powerpc/platforms/powernv/setup.c | 17 ++
arch/powerpc/platforms/pseries/setup.c | 8 +
.../selftests/powerpc/security/.gitignore | 1 +
.../selftests/powerpc/security/Makefile | 2 +-
.../selftests/powerpc/security/entry_flush.c | 198 ++++++++++++++++++
.../selftests/powerpc/security/rfi_flush.c | 35 +++-
18 files changed, 646 insertions(+), 78 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/security/entry_flush.c
--
2.25.1
This adds a crashkernel=auto option for x86 and arm64 that sizes the
memory reserved for vmcore creation according to the total system memory
size.
Cc: stable(a)vger.kernel.org
Signed-off-by: John Donnelly <john.p.donnelly(a)oracle.com>
Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi(a)oracle.com>
---
Documentation/admin-guide/kdump/kdump.rst | 5 +++++
arch/arm64/Kconfig | 26 ++++++++++++++++++++++-
arch/arm64/configs/defconfig | 1 +
arch/x86/Kconfig | 26 ++++++++++++++++++++++-
arch/x86/configs/x86_64_defconfig | 1 +
kernel/crash_core.c | 20 +++++++++++++++--
6 files changed, 75 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 75a9dd98e76e..f95a2af64f59 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,7 +285,12 @@ This would mean:
2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
3) if the RAM size is larger than 2G, then reserve 128M
+Or you can use crashkernel=auto if you have enough memory. The threshold
+is 1G on x86_64 and arm64. If your system memory is less than the threshold,
+crashkernel=auto will not reserve memory. The reserved size scales with
+system memory as follows:
+ x86_64/arm64: 1G-64G:128M,64G-1T:256M,1T-:512M
Boot into System Kernel
=======================
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1515f6f153a0..d359dcffa80e 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1124,7 +1124,7 @@ comment "Support for PE file signature verification disabled"
depends on KEXEC_SIG
depends on !EFI || !SIGNED_PE_FILE_VERIFICATION
-config CRASH_DUMP
+menuconfig CRASH_DUMP
bool "Build kdump crash kernel"
help
Generate crash dump after being started by kexec. This should
@@ -1135,6 +1135,30 @@ config CRASH_DUMP
For more details see Documentation/admin-guide/kdump/kdump.rst
+if CRASH_DUMP
+
+config CRASH_AUTO_STR
+ string "Memory reserved for crash kernel"
+ depends on CRASH_DUMP
+ default "1G-64G:128M,64G-1T:256M,1T-:512M"
+ help
+ This configures the reserved memory dependent
+ on the value of System RAM. The syntax is:
+ crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+ range=start-[end]
+
+ For example:
+ crashkernel=512M-2G:64M,2G-:128M
+
+ This would mean:
+
+ 1) if the RAM is smaller than 512M, then don't reserve anything
+ (this is the "rescue" case)
+ 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+ 3) if the RAM size is larger than 2G, then reserve 128M
+
+endif # CRASH_DUMP
+
config XEN_DOM0
def_bool y
depends on XEN
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 5cfe3cf6f2ac..899ef3b6a78f 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -69,6 +69,7 @@ CONFIG_SECCOMP=y
CONFIG_KEXEC=y
CONFIG_KEXEC_FILE=y
CONFIG_CRASH_DUMP=y
+# CONFIG_CRASH_AUTO_STR is not set
CONFIG_XEN=y
CONFIG_COMPAT=y
CONFIG_RANDOMIZE_BASE=y
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b81f74a..bacd17312bb1 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2035,7 +2035,7 @@ config KEXEC_BZIMAGE_VERIFY_SIG
help
Enable bzImage signature verification support.
-config CRASH_DUMP
+menuconfig CRASH_DUMP
bool "kernel crash dumps"
depends on X86_64 || (X86_32 && HIGHMEM)
help
@@ -2049,6 +2049,30 @@ config CRASH_DUMP
(CONFIG_RELOCATABLE=y).
For more details see Documentation/admin-guide/kdump/kdump.rst
+if CRASH_DUMP
+
+config CRASH_AUTO_STR
+ string "Memory reserved for crash kernel" if X86_64
+ depends on CRASH_DUMP
+ default "1G-64G:128M,64G-1T:256M,1T-:512M"
+ help
+ This configures the reserved memory dependent
+ on the value of System RAM. The syntax is:
+ crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
+ range=start-[end]
+
+ For example:
+ crashkernel=512M-2G:64M,2G-:128M
+
+ This would mean:
+
+ 1) if the RAM is smaller than 512M, then don't reserve anything
+ (this is the "rescue" case)
+ 2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
+ 3) if the RAM size is larger than 2G, then reserve 128M
+
+endif # CRASH_DUMP
+
config KEXEC_JUMP
bool "kexec jump"
depends on KEXEC && HIBERNATION
diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig
index 9936528e1939..7a87fbecf40b 100644
--- a/arch/x86/configs/x86_64_defconfig
+++ b/arch/x86/configs/x86_64_defconfig
@@ -33,6 +33,7 @@ CONFIG_EFI_MIXED=y
CONFIG_HZ_1000=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
+# CONFIG_CRASH_AUTO_STR is not set
CONFIG_HIBERNATION=y
CONFIG_PM_DEBUG=y
CONFIG_PM_TRACE_RTC=y
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index 106e4500fd53..a44cd9cc12c4 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -7,6 +7,7 @@
#include <linux/crash_core.h>
#include <linux/utsname.h>
#include <linux/vmalloc.h>
+#include <linux/kexec.h>
#include <asm/page.h>
#include <asm/sections.h>
@@ -41,6 +42,15 @@ static int __init parse_crashkernel_mem(char *cmdline,
unsigned long long *crash_base)
{
char *cur = cmdline, *tmp;
+ unsigned long long total_mem = system_ram;
+
+ /*
+ * Firmware sometimes reserves some memory regions for its own use,
+ * so we get less than the actual system memory size.
+ * Work around this by rounding up the total size to 128M, which is
+ * enough for most test cases.
+ */
+ total_mem = roundup(total_mem, SZ_128M);
/* for each entry of the comma-separated list */
do {
@@ -85,13 +95,13 @@ static int __init parse_crashkernel_mem(char *cmdline,
return -EINVAL;
}
cur = tmp;
- if (size >= system_ram) {
+ if (size >= total_mem) {
pr_warn("crashkernel: invalid size\n");
return -EINVAL;
}
/* match ? */
- if (system_ram >= start && system_ram < end) {
+ if (total_mem >= start && total_mem < end) {
*crash_size = size;
break;
}
@@ -250,6 +260,12 @@ static int __init __parse_crashkernel(char *cmdline,
if (suffix)
return parse_crashkernel_suffix(ck_cmdline, crash_size,
suffix);
+#ifdef CONFIG_CRASH_AUTO_STR
+ if (strncmp(ck_cmdline, "auto", 4) == 0) {
+ ck_cmdline = CONFIG_CRASH_AUTO_STR;
+ pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+ }
+#endif
/*
* if the commandline contains a ':', then that's the extended
* syntax -- if not, it must be the classic syntax
--
2.18.4
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However, these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used, for example, to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This series flushes the cache on kernel entry and
after kernel user accesses.
Thanks to Nick Piggin, Russell Currey, Christopher M. Riedl, Michael
Ellerman and Spoorthy S for their work in developing, optimising,
testing and backporting these fixes, and to the many others who helped
behind the scenes.
Andrew Donnellan (1):
powerpc: Fix __clear_user() with KUAP enabled
Christophe Leroy (2):
powerpc: Add a framework for user access tracking
powerpc: Implement user_access_begin and friends
Daniel Axtens (2):
powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
powerpc/64s: move some exception handlers out of line
Nicholas Piggin (3):
powerpc/64s: flush L1D on kernel entry
powerpc/uaccess: Evaluate macro arguments once, before user access is
allowed
powerpc/64s: flush L1D after user accesses
Documentation/kernel-parameters.txt | 7 +
.../powerpc/include/asm/book3s/64/kup-radix.h | 23 ++
arch/powerpc/include/asm/exception-64s.h | 15 +-
arch/powerpc/include/asm/feature-fixups.h | 19 ++
arch/powerpc/include/asm/futex.h | 4 +
arch/powerpc/include/asm/kup.h | 40 ++++
arch/powerpc/include/asm/security_features.h | 7 +
arch/powerpc/include/asm/setup.h | 4 +
arch/powerpc/include/asm/uaccess.h | 142 +++++++++---
arch/powerpc/kernel/exceptions-64s.S | 210 +++++++++++-------
arch/powerpc/kernel/ppc_ksyms.c | 10 +
arch/powerpc/kernel/setup_64.c | 138 ++++++++++++
arch/powerpc/kernel/vmlinux.lds.S | 14 ++
arch/powerpc/lib/checksum_wrappers_64.c | 4 +
arch/powerpc/lib/feature-fixups.c | 104 +++++++++
arch/powerpc/lib/string.S | 2 +-
arch/powerpc/lib/string_64.S | 4 +-
arch/powerpc/platforms/powernv/setup.c | 15 ++
arch/powerpc/platforms/pseries/setup.c | 8 +
19 files changed, 653 insertions(+), 117 deletions(-)
create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h
create mode 100644 arch/powerpc/include/asm/kup.h
--
2.25.1
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However, these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used, for example, to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This series flushes the cache on kernel entry and
after kernel user accesses.
Thanks to Nick Piggin, Russell Currey, Christopher M. Riedl, Michael
Ellerman and Spoorthy S for their work in developing, optimising,
testing and backporting these fixes, and to the many others who helped
behind the scenes.
Andrew Donnellan (1):
powerpc: Fix __clear_user() with KUAP enabled
Christophe Leroy (2):
powerpc: Add a framework for user access tracking
powerpc: Implement user_access_begin and friends
Daniel Axtens (2):
powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
powerpc/64s: move some exception handlers out of line
Nicholas Piggin (3):
powerpc/64s: flush L1D on kernel entry
powerpc/uaccess: Evaluate macro arguments once, before user access is
allowed
powerpc/64s: flush L1D after user accesses
Documentation/kernel-parameters.txt | 7 +
.../powerpc/include/asm/book3s/64/kup-radix.h | 22 +++
arch/powerpc/include/asm/exception-64s.h | 13 +-
arch/powerpc/include/asm/feature-fixups.h | 19 +++
arch/powerpc/include/asm/futex.h | 4 +
arch/powerpc/include/asm/kup.h | 40 +++++
arch/powerpc/include/asm/security_features.h | 7 +
arch/powerpc/include/asm/setup.h | 4 +
arch/powerpc/include/asm/uaccess.h | 143 ++++++++++++++----
arch/powerpc/kernel/exceptions-64s.S | 130 ++++++++--------
arch/powerpc/kernel/setup_64.c | 120 +++++++++++++++
arch/powerpc/kernel/vmlinux.lds.S | 14 ++
arch/powerpc/lib/checksum_wrappers.c | 4 +
arch/powerpc/lib/feature-fixups.c | 104 +++++++++++++
arch/powerpc/lib/string.S | 4 +-
arch/powerpc/lib/string_64.S | 6 +-
arch/powerpc/platforms/powernv/setup.c | 15 ++
arch/powerpc/platforms/pseries/setup.c | 8 +
18 files changed, 567 insertions(+), 97 deletions(-)
create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h
create mode 100644 arch/powerpc/include/asm/kup.h
--
2.25.1
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However, these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used, for example, to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This series flushes the cache on kernel entry and
after kernel user accesses.
Thanks to Nick Piggin, Russell Currey, Christopher M. Riedl, Michael
Ellerman and Spoorthy S for their work in developing, optimising,
testing and backporting these fixes, and to the many others who helped
behind the scenes.
Andrew Donnellan (1):
powerpc: Fix __clear_user() with KUAP enabled
Christophe Leroy (2):
powerpc: Add a framework for user access tracking
powerpc: Implement user_access_begin and friends
Daniel Axtens (2):
powerpc/64s: Define MASKABLE_RELON_EXCEPTION_PSERIES_OOL
powerpc/64s: move some exception handlers out of line
Nicholas Piggin (3):
powerpc/64s: flush L1D on kernel entry
powerpc/uaccess: Evaluate macro arguments once, before user access is
allowed
powerpc/64s: flush L1D after user accesses
.../admin-guide/kernel-parameters.txt | 7 +
.../powerpc/include/asm/book3s/64/kup-radix.h | 22 +++
arch/powerpc/include/asm/exception-64s.h | 13 +-
arch/powerpc/include/asm/feature-fixups.h | 19 +++
arch/powerpc/include/asm/futex.h | 4 +
arch/powerpc/include/asm/kup.h | 40 +++++
arch/powerpc/include/asm/security_features.h | 7 +
arch/powerpc/include/asm/setup.h | 4 +
arch/powerpc/include/asm/uaccess.h | 148 ++++++++++++++----
arch/powerpc/kernel/exceptions-64s.S | 96 +++++++-----
arch/powerpc/kernel/setup_64.c | 122 ++++++++++++++-
arch/powerpc/kernel/vmlinux.lds.S | 14 ++
arch/powerpc/lib/checksum_wrappers.c | 4 +
arch/powerpc/lib/feature-fixups.c | 104 ++++++++++++
arch/powerpc/lib/string.S | 4 +-
arch/powerpc/lib/string_64.S | 6 +-
arch/powerpc/platforms/powernv/setup.c | 17 ++
arch/powerpc/platforms/pseries/setup.c | 8 +
18 files changed, 558 insertions(+), 81 deletions(-)
create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h
create mode 100644 arch/powerpc/include/asm/kup.h
--
2.25.1
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However, these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used, for example, to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This series flushes the cache on kernel entry and
after kernel user accesses.
Thanks to Nick Piggin, Russell Currey, Christopher M. Riedl, Michael
Ellerman and Spoorthy S for their work in developing, optimising,
testing and backporting these fixes, and to the many others who helped
behind the scenes.
Andrew Donnellan (1):
powerpc: Fix __clear_user() with KUAP enabled
Christophe Leroy (2):
powerpc: Add a framework for user access tracking
powerpc: Implement user_access_begin and friends
Daniel Axtens (1):
powerpc/64s: move some exception handlers out of line
Nicholas Piggin (3):
powerpc/64s: flush L1D on kernel entry
powerpc/uaccess: Evaluate macro arguments once, before user access is
allowed
powerpc/64s: flush L1D after user accesses
.../admin-guide/kernel-parameters.txt | 7 +
.../powerpc/include/asm/book3s/64/kup-radix.h | 22 +++
arch/powerpc/include/asm/exception-64s.h | 9 +-
arch/powerpc/include/asm/feature-fixups.h | 19 +++
arch/powerpc/include/asm/futex.h | 4 +
arch/powerpc/include/asm/kup.h | 40 +++++
arch/powerpc/include/asm/security_features.h | 7 +
arch/powerpc/include/asm/setup.h | 4 +
arch/powerpc/include/asm/uaccess.h | 147 ++++++++++++++----
arch/powerpc/kernel/exceptions-64s.S | 96 +++++++-----
arch/powerpc/kernel/setup_64.c | 122 ++++++++++++++-
arch/powerpc/kernel/vmlinux.lds.S | 14 ++
arch/powerpc/lib/checksum_wrappers.c | 4 +
arch/powerpc/lib/feature-fixups.c | 104 +++++++++++++
arch/powerpc/lib/string_32.S | 4 +-
arch/powerpc/lib/string_64.S | 6 +-
arch/powerpc/platforms/powernv/setup.c | 17 ++
arch/powerpc/platforms/pseries/setup.c | 8 +
18 files changed, 553 insertions(+), 81 deletions(-)
create mode 100644 arch/powerpc/include/asm/book3s/64/kup-radix.h
create mode 100644 arch/powerpc/include/asm/kup.h
--
2.25.1
IBM Power9 processors can speculatively operate on data in the L1
cache before it has been completely validated, via a way-prediction
mechanism. It is not possible for an attacker to determine the
contents of impermissible memory using this method, since these
systems implement a combination of hardware and software security
measures to prevent scenarios where protected data could be leaked.
However, these measures don't address the scenario where an attacker
induces the operating system to speculatively execute instructions
using data that the attacker controls. This can be used, for example, to
speculatively bypass "kernel user access prevention" techniques, as
discovered by Anthony Steinhauser of Google's Safeside Project. This
is not an attack by itself, but there is a possibility it could be
used in conjunction with side-channels or other weaknesses in the
privileged code to construct an attack.
This issue can be mitigated by flushing the L1 cache between privilege
boundaries of concern. This series flushes the cache on kernel entry and
after kernel user accesses.
Thanks to Nick Piggin, Russell Currey, Christopher M. Riedl, Michael
Ellerman and Spoorthy S for their work in developing, optimising,
testing and backporting these fixes, and to the many others who helped
behind the scenes.
Daniel Axtens (1):
selftests/powerpc: entry flush test
Michael Ellerman (1):
powerpc: Only include kup-radix.h for 64-bit Book3S
Nicholas Piggin (2):
powerpc/64s: flush L1D on kernel entry
powerpc/64s: flush L1D after user accesses
Russell Currey (1):
selftests/powerpc: rfi_flush: disable entry flush if present
.../admin-guide/kernel-parameters.txt | 7 +
.../powerpc/include/asm/book3s/64/kup-radix.h | 29 ++--
arch/powerpc/include/asm/exception-64s.h | 12 +-
arch/powerpc/include/asm/feature-fixups.h | 19 ++
arch/powerpc/include/asm/kup.h | 27 ++-
arch/powerpc/include/asm/security_features.h | 7 +
arch/powerpc/include/asm/setup.h | 4 +
arch/powerpc/kernel/exceptions-64s.S | 88 +++++-----
arch/powerpc/kernel/setup_64.c | 122 ++++++++++++-
arch/powerpc/kernel/vmlinux.lds.S | 14 ++
arch/powerpc/lib/feature-fixups.c | 104 +++++++++++
arch/powerpc/platforms/powernv/setup.c | 17 ++
arch/powerpc/platforms/pseries/setup.c | 8 +
.../selftests/powerpc/security/.gitignore | 1 +
.../selftests/powerpc/security/Makefile | 2 +-
.../selftests/powerpc/security/entry_flush.c | 163 ++++++++++++++++++
.../selftests/powerpc/security/rfi_flush.c | 35 +++-
17 files changed, 592 insertions(+), 67 deletions(-)
create mode 100644 tools/testing/selftests/powerpc/security/entry_flush.c
--
2.25.1
Hi,
Please backport commit f9317ae5523f99999fb54c513ebabbb2bc887ddf ("net:
lantiq: Add locking for TX DMA channel") to kernel 5.4.
https://git.kernel.org/linus/f9317ae5523f99999fb54c513ebabbb2bc887ddf
The fix was merged upstream in kernel 5.9 and fixes a problem
introduced in kernel 4.20 by commit fe1a56420cf2 ("net: lantiq: Add
Lantiq / Intel VRX200 Ethernet driver").
In the OpenWrt pull request that integrates this fix, multiple users
reported that it fixes TX hangs for them.
https://github.com/openwrt/openwrt/pull/3085
Hauke
Hi,
Please backport "i2c: mux: pca954x: Add missing pca9546 definition to
chip_desc" to kernel 4.9.
This is upstream commit id dbe4d69d252e9e65c6c46826980b77b11a142065
https://git.kernel.org/linus/dbe4d69d252e9e65c6c46826980b77b11a142065
commit dbe4d69d252e9e65c6c46826980b77b11a142065
Author: Mike Looijmans <mike.looijmans(a)topic.nl>
Date: Thu Mar 23 10:00:36 2017 +0100
i2c: mux: pca954x: Add missing pca9546 definition to chip_desc
The pca954x_of_match table references the chips array at position
pca_9546, but this entry is not filled before.
When a device tree contains a compatible string with "nxp,pca9546", it
will not load successfully without this patch.
This problem was introduced in commit 8a191a7ad4ca ("i2c: pca954x: add
device tree binding") in v4.9 and was fixed upstream in kernel version
4.11.
The commit f8251f1dfda9 ("i2c: mux: pca954x: Add missing pca9542
definition to chip_desc") fixes a similar problem with the pca9542.
https://git.kernel.org/linus/f8251f1dfda9e1200545bf19270d9df2273bdfa1
The changes to pca954x_acpi_ids should not be backported, as that table
does not exist in 4.9.
Hauke
On Thu, Nov 19, 2020 at 1:44 PM Tao Zhou <ouwen210(a)hotmail.com> wrote:
> [...]
> That time I realized something, but..
> I try to remember something and get some impression.
>
> We need to update the below when do not need to enqueue entity because
> this is added for runnable_avg updating,
>
> update_load_avg(cfs_rq, se, UPDATE_TG);
> se_update_runnable(se);
>
> Earlier version do not introduce the above to only update runnable_avg.
> Use one *for loop* is enough though. Please correct me if I am wrong.
>
Thanks a lot Tao! I'm not sure; I'm definitely not an expert in the
scheduler. I will defer this one to Vincent / Peter / Phil / Ben.
Cheers!
I'm announcing the release of the 4.9.244 kernel.
All users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/kernel-parameters.txt | 8
Makefile | 2
arch/x86/events/intel/pt.c | 4
arch/x86/kernel/cpu/bugs.c | 52 +-
drivers/block/xen-blkback/blkback.c | 22 -
drivers/block/xen-blkback/xenbus.c | 5
drivers/char/random.c | 1
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 -
drivers/gpu/drm/gma500/psb_irq.c | 34 -
drivers/iommu/amd_iommu_types.h | 6
drivers/misc/mei/client.h | 4
drivers/net/can/dev.c | 14
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 32 +
drivers/net/geneve.c | 36 +
drivers/net/wan/cosa.c | 1
drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2
drivers/net/xen-netback/common.h | 15
drivers/net/xen-netback/interface.c | 61 ++
drivers/net/xen-netback/netback.c | 11
drivers/net/xen-netback/rx.c | 13
drivers/of/address.c | 4
drivers/pinctrl/aspeed/pinctrl-aspeed.c | 7
drivers/pinctrl/devicetree.c | 26 -
drivers/pinctrl/pinctrl-amd.c | 6
drivers/regulator/core.c | 2
drivers/scsi/device_handler/scsi_dh_alua.c | 9
drivers/scsi/hpsa.c | 4
drivers/usb/class/cdc-acm.c | 9
drivers/usb/gadget/udc/goku_udc.c | 2
drivers/xen/events/events_2l.c | 9
drivers/xen/events/events_base.c | 422 +++++++++++++++++--
drivers/xen/events/events_fifo.c | 82 +--
drivers/xen/events/events_internal.h | 20
drivers/xen/evtchn.c | 7
drivers/xen/xen-pciback/pci_stub.c | 14
drivers/xen/xen-pciback/pciback.h | 12
drivers/xen/xen-pciback/pciback_ops.c | 48 +-
drivers/xen/xen-pciback/xenbus.c | 2
drivers/xen/xen-scsiback.c | 23 -
fs/btrfs/extent_io.c | 4
fs/btrfs/ioctl.c | 2
fs/cifs/cifs_unicode.c | 8
fs/ext4/inline.c | 1
fs/ext4/super.c | 5
fs/gfs2/glock.c | 3
fs/gfs2/rgrp.c | 5
fs/ocfs2/super.c | 1
fs/xfs/libxfs/xfs_rmap.c | 2
fs/xfs/libxfs/xfs_rmap_btree.c | 16
fs/xfs/xfs_iops.c | 10
fs/xfs/xfs_pnfs.c | 2
include/linux/can/skb.h | 20
include/linux/perf_event.h | 2
include/linux/prandom.h | 36 +
include/linux/time64.h | 4
include/xen/events.h | 29 +
kernel/events/core.c | 42 -
kernel/events/internal.h | 2
kernel/exit.c | 5
kernel/irq/Kconfig | 1
kernel/reboot.c | 28 -
kernel/time/timer.c | 7
kernel/trace/ring_buffer.c | 54 ++
lib/random32.c | 462 ++++++++++++---------
lib/swiotlb.c | 6
mm/mempolicy.c | 6
net/ipv4/syncookies.c | 9
net/ipv6/sit.c | 2
net/ipv6/syncookies.c | 10
net/iucv/af_iucv.c | 3
net/mac80211/tx.c | 35 +
net/wireless/reg.c | 2
net/x25/af_x25.c | 2
net/xfrm/xfrm_state.c | 8
sound/hda/ext/hdac_ext_controller.c | 2
tools/perf/util/session.c | 1
78 files changed, 1446 insertions(+), 548 deletions(-)
Al Viro (1):
don't dump the threads that had been already exiting when zapped.
Alexander Aring (1):
gfs2: Wake up when sd_glock_disposal becomes zero
Alexander Usyskin (1):
mei: protect mei_cl_mtu from null dereference
Anand K Mistry (1):
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
Billy Tsai (1):
pinctrl: aspeed: Fix GPI only function problem.
Bob Peterson (2):
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
gfs2: check for live vs. read-only file system in gfs2_fitrim
Boris Protopopov (1):
Convert trailing spaces and periods in path components
Brian Foster (1):
xfs: flush new eof page on truncate to avoid post-eof corruption
Chris Brandt (1):
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
Christoph Hellwig (1):
xfs: fix a missing unlock on error in xfs_fs_map_blocks
Christophe JAILLET (1):
i40e: Fix a potential NULL pointer dereference
Coiby Xu (2):
pinctrl: amd: use higher precision for 512 RtcClk
pinctrl: amd: fix incorrect way to disable debounce filter
Dan Carpenter (2):
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
can: peak_usb: add range checking in decode operations
Darrick J. Wong (2):
xfs: fix flags argument to rmap lookup when converting shared file rmaps
xfs: fix rmap key and record comparison functions
Eric Biggers (1):
ext4: fix leaking sysfs kobject after failed mount
Evan Nimmo (1):
of/address: Fix of_node memory leak in of_dma_is_coherent
Evan Quan (1):
drm/amdgpu: perform srbm soft reset always on SDMA resume
Evgeny Novikov (1):
usb: gadget: goku_udc: fix potential crashes in probe
Filipe Manana (1):
Btrfs: fix missing error return if writeback for extent buffer never started
George Spelvin (1):
random32: make prandom_u32() output unpredictable
Greg Kroah-Hartman (1):
Linux 4.9.244
Grzegorz Siwik (1):
i40e: Wrong truncation from u16 to u8
Hannes Reinecke (1):
scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
Jiri Olsa (2):
perf tools: Add missing swap for ino_generation
perf/core: Fix race in the perf_mmap_close() function
Johannes Berg (1):
mac80211: fix use of skb payload instead of header
Johannes Thumshirn (1):
btrfs: reschedule when cloning lots of extents
Joseph Qi (1):
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
Juergen Gross (12):
xen/events: avoid removing an event channel while handling it
xen/events: add a proper barrier to 2-level uevent unmasking
xen/events: fix race in evtchn_fifo_unmask()
xen/events: add a new "late EOI" evtchn framework
xen/blkback: use lateeoi irq binding
xen/netback: use lateeoi irq binding
xen/scsiback: use lateeoi irq binding
xen/pciback: use lateeoi irq binding
xen/events: switch user event channels to lateeoi model
xen/events: use a common cpu hotplug hook for event channels
xen/events: defer eoi in case of excessive number of events
xen/events: block rogue events for some time
Kaixu Xia (1):
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
Keita Suzuki (1):
scsi: hpsa: Fix memory leak in hpsa_init_one()
Mao Wenan (1):
net: Update window_clamp if SOCK_RCVBUF is set
Marc Zyngier (1):
genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
Mark Gray (1):
geneve: add transport ports in route lookup for geneve
Martin Schiller (1):
net/x25: Fix null-ptr-deref in x25_connect
Martyna Szapar (2):
i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
i40e: Memory leak in i40e_config_iwarp_qvlist
Masashi Honma (1):
ath9k_htc: Use appropriate rs_datalen type
Mathieu Poirier (1):
perf/core: Fix crash when using HW tracing kernel filters
Matteo Croce (2):
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
reboot: fix overflow parsing reboot cpu number
Michał Mirosław (1):
regulator: defer probe when trying to get voltage from unresolved supply
Oleksij Rempel (1):
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
Oliver Hartkopp (1):
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
Oliver Herms (1):
IPv6: Set SIT tunnel hard_header_len to zero
Peter Zijlstra (1):
perf: Fix get_recursion_context()
Sergey Nemov (1):
i40e: add num_vectors checker in iwarp handler
Shijie Luo (1):
mm: mempolicy: fix potential pte_unmap_unlock pte error
Song Liu (1):
perf/core: Fix bad use of igrab()
Stefano Stabellini (1):
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Stephane Grosjean (1):
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
Steven Rostedt (VMware) (1):
ring-buffer: Fix recursion protection transitions between interrupt context
Suravee Suthikulpanit (1):
iommu/amd: Increase interrupt remapping table limit to 512 entries
Thomas Zimmermann (1):
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
Ursula Braun (1):
net/af_iucv: fix null pointer dereference on shutdown
Vincent Mailhol (1):
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
Wang Hai (1):
cosa: Add missing kfree in error path of cosa_write
Wengang Wang (1):
ocfs2: initialize ip_next_orphan
Will Deacon (1):
pinctrl: devicetree: Avoid taking direct reference to device name string
Ye Bin (1):
cfg80211: regulatory: Fix inconsistent format argument
Zeng Tao (1):
time: Prevent undefined behaviour in timespec64_to_ns()
kiyin(尹亮) (1):
perf/core: Fix a memory leak in perf_event_parse_addr_filter()
zhuoliang zhang (1):
net: xfrm: fix a race condition during allocing spi
I'm announcing the release of the 4.4.244 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/kernel-parameters.txt | 8
Makefile | 2
arch/x86/kernel/cpu/bugs.c | 52 +-
drivers/block/xen-blkback/blkback.c | 22
drivers/block/xen-blkback/xenbus.c | 5
drivers/char/random.c | 2
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 -
drivers/gpu/drm/gma500/psb_irq.c | 34 -
drivers/iommu/amd_iommu_types.h | 6
drivers/misc/mei/client.h | 4
drivers/net/can/dev.c | 14
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4
drivers/net/geneve.c | 36 +
drivers/net/wan/cosa.c | 1
drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2
drivers/net/xen-netback/common.h | 39 +
drivers/net/xen-netback/interface.c | 59 ++
drivers/net/xen-netback/netback.c | 17
drivers/of/address.c | 4
drivers/pinctrl/devicetree.c | 26 -
drivers/pinctrl/pinctrl-amd.c | 6
drivers/usb/class/cdc-acm.c | 9
drivers/usb/gadget/udc/goku_udc.c | 2
drivers/xen/events/events_2l.c | 9
drivers/xen/events/events_base.c | 444 ++++++++++++++++++--
drivers/xen/events/events_fifo.c | 102 +---
drivers/xen/events/events_internal.h | 20
drivers/xen/evtchn.c | 7
drivers/xen/xen-pciback/pci_stub.c | 14
drivers/xen/xen-pciback/pciback.h | 12
drivers/xen/xen-pciback/pciback_ops.c | 48 +-
drivers/xen/xen-pciback/xenbus.c | 2
drivers/xen/xen-scsiback.c | 23 -
fs/btrfs/extent_io.c | 4
fs/btrfs/ioctl.c | 2
fs/cifs/cifs_unicode.c | 8
fs/ext4/inline.c | 1
fs/ext4/super.c | 5
fs/gfs2/glock.c | 3
fs/gfs2/rgrp.c | 5
fs/ocfs2/super.c | 1
fs/xfs/xfs_pnfs.c | 2
include/linux/can/skb.h | 20
include/linux/prandom.h | 36 +
include/linux/time64.h | 4
include/xen/events.h | 29 +
kernel/events/core.c | 7
kernel/events/internal.h | 2
kernel/exit.c | 5
kernel/reboot.c | 28 -
kernel/time/timer.c | 7
kernel/trace/ring_buffer.c | 54 +-
lib/random32.c | 463 ++++++++++++---------
lib/swiotlb.c | 6
mm/mempolicy.c | 6
net/ipv4/syncookies.c | 9
net/ipv6/sit.c | 2
net/ipv6/syncookies.c | 10
net/iucv/af_iucv.c | 3
net/mac80211/tx.c | 35 +
net/wireless/reg.c | 2
net/x25/af_x25.c | 2
net/xfrm/xfrm_state.c | 8
sound/hda/ext/hdac_ext_controller.c | 2
tools/perf/util/session.c | 1
67 files changed, 1412 insertions(+), 521 deletions(-)
Al Viro (1):
don't dump the threads that had been already exiting when zapped.
Alexander Aring (1):
gfs2: Wake up when sd_glock_disposal becomes zero
Alexander Usyskin (1):
mei: protect mei_cl_mtu from null dereference
Anand K Mistry (1):
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
Bob Peterson (2):
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
gfs2: check for live vs. read-only file system in gfs2_fitrim
Boris Protopopov (1):
Convert trailing spaces and periods in path components
Chris Brandt (1):
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
Christoph Hellwig (1):
xfs: fix a missing unlock on error in xfs_fs_map_blocks
Coiby Xu (2):
pinctrl: amd: use higher precision for 512 RtcClk
pinctrl: amd: fix incorrect way to disable debounce filter
Dan Carpenter (2):
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
can: peak_usb: add range checking in decode operations
Eric Biggers (1):
ext4: fix leaking sysfs kobject after failed mount
Evan Nimmo (1):
of/address: Fix of_node memory leak in of_dma_is_coherent
Evan Quan (1):
drm/amdgpu: perform srbm soft reset always on SDMA resume
Evgeny Novikov (1):
usb: gadget: goku_udc: fix potential crashes in probe
Filipe Manana (1):
Btrfs: fix missing error return if writeback for extent buffer never started
George Spelvin (1):
random32: make prandom_u32() output unpredictable
Greg Kroah-Hartman (1):
Linux 4.4.244
Grzegorz Siwik (1):
i40e: Wrong truncation from u16 to u8
Jiri Olsa (2):
perf tools: Add missing swap for ino_generation
perf/core: Fix race in the perf_mmap_close() function
Johannes Berg (1):
mac80211: fix use of skb payload instead of header
Johannes Thumshirn (1):
btrfs: reschedule when cloning lots of extents
Joseph Qi (1):
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
Juergen Gross (12):
xen/events: avoid removing an event channel while handling it
xen/events: add a proper barrier to 2-level uevent unmasking
xen/events: fix race in evtchn_fifo_unmask()
xen/events: add a new "late EOI" evtchn framework
xen/blkback: use lateeoi irq binding
xen/netback: use lateeoi irq binding
xen/scsiback: use lateeoi irq binding
xen/pciback: use lateeoi irq binding
xen/events: switch user event channels to lateeoi model
xen/events: use a common cpu hotplug hook for event channels
xen/events: defer eoi in case of excessive number of events
xen/events: block rogue events for some time
Kaixu Xia (1):
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
Mao Wenan (1):
net: Update window_clamp if SOCK_RCVBUF is set
Mark Gray (1):
geneve: add transport ports in route lookup for geneve
Martin Schiller (1):
net/x25: Fix null-ptr-deref in x25_connect
Martyna Szapar (1):
i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
Masashi Honma (1):
ath9k_htc: Use appropriate rs_datalen type
Matteo Croce (2):
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
reboot: fix overflow parsing reboot cpu number
Oleksij Rempel (1):
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
Oliver Hartkopp (1):
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
Oliver Herms (1):
IPv6: Set SIT tunnel hard_header_len to zero
Peter Zijlstra (1):
perf: Fix get_recursion_context()
Shijie Luo (1):
mm: mempolicy: fix potential pte_unmap_unlock pte error
Stefano Stabellini (1):
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Stephane Grosjean (1):
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
Steven Rostedt (VMware) (1):
ring-buffer: Fix recursion protection transitions between interrupt context
Suravee Suthikulpanit (1):
iommu/amd: Increase interrupt remapping table limit to 512 entries
Thomas Zimmermann (1):
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
Ursula Braun (1):
net/af_iucv: fix null pointer dereference on shutdown
Vincent Mailhol (1):
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
Wang Hai (1):
cosa: Add missing kfree in error path of cosa_write
Wengang Wang (1):
ocfs2: initialize ip_next_orphan
Will Deacon (1):
pinctrl: devicetree: Avoid taking direct reference to device name string
Ye Bin (1):
cfg80211: regulatory: Fix inconsistent format argument
Zeng Tao (1):
time: Prevent undefined behaviour in timespec64_to_ns()
zhuoliang zhang (1):
net: xfrm: fix a race condition during allocing spi
Reshape requests should be blocked while a resync job is ongoing. In a
cluster environment, a node can start a resync job even if the resync
command wasn't executed on it. For example, when the user executes
"mdadm --grow" on node A, node B sometimes starts the resync job.
However, update_raid_disks() currently only checks the local recovery
status, which is incomplete. As a result, "mdadm --grow" succeeds on the
local node, while the remote node refuses to do the reshape because it
is busy resyncing. This inconsistent handling causes the array to enter
an unexpected state. If the user doesn't notice the problem and keeps
running mdadm commands, the array eventually stops working.
Fix this issue by blocking the reshape request. When a node executes
"--grow" and detects an ongoing resync, it should stop and report an
error to the user.
The following script reproduces the issue with ~100% probability.
(Two nodes share three iSCSI LUNs, sdg/sdh/sdi, each 1 GB in size.)
```
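# Run on node1; node2 is the remote node.
# Stop any running arrays on both nodes.
ssh root@node2 "mdadm -S --scan"
mdadm -S --scan
# Zero the first 20 MiB of each shared LUN.
for i in {g,h,i}; do
	dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M count=20
done
# Create a clustered two-disk mirror and assemble it on node2.
mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
sleep 5
# Add sdi to the array and wait for any sync activity to finish.
mdadm --manage --add /dev/md0 /dev/sdi
mdadm --wait /dev/md0
# Grow to three devices; the resulting resync may run on node2.
mdadm --grow --raid-devices=3 /dev/md0
# Drop sdg and shrink back to two devices while that resync may
# still be running on node2.
mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --grow --raid-devices=2 /dev/md0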
```
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhao Heming <heming.zhao(a)suse.com>
---
drivers/md/md.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98bac4f304ae..74280e353b8f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7278,6 +7278,7 @@ static int update_raid_disks(struct mddev *mddev, int raid_disks)
return -EINVAL;
if (mddev->sync_thread ||
test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+ test_bit(MD_RESYNCING_REMOTE, &mddev->recovery) ||
mddev->reshape_position != MaxSector)
return -EBUSY;
@@ -9662,8 +9663,11 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
}
}
- if (mddev->raid_disks != le32_to_cpu(sb->raid_disks))
- update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
+ if (mddev->raid_disks != le32_to_cpu(sb->raid_disks)) {
+ ret = update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
+ if (ret)
+ pr_warn("md: updating array disks failed. %d\n", ret);
+ }
/*
* Since mddev->delta_disks has already updated in update_raid_disks,
--
2.27.0
[This is a backport for 4.9 of 29daf869cbab69088fe1755d9dd224e99ba78b56]
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.
This adds at least 3 instructions to the fast path of the TLB miss
exception handlers. A following patch will reduce this overhead.
Also update the rotate amount of the rlwinm instruction to the correct
number of bits, reflecting all the changes made to the position of
_PAGE_ACCESSED over time.
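The rlwinm/and/rlwimi sequence kept by this patch clears _PAGE_PRESENT in the value the handler continues with unless _PAGE_ACCESSED is also set, which is what forces the minor fault described above. The following is a userspace C model of that sequence, not kernel code. The bit values are assumptions: PAGE_PRESENT is taken as 0x0001 and PAGE_ACCESSED as PRESENT shifted left by 11, which is what the rotate amount 32-11 in this backport implies (the 4.19 backport uses 32-7, so set ACCESSED_SHIFT to 7 to model it). The names rotl32(), tlb_pte() and ACCESSED_SHIFT are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: ACCESSED sits ACCESSED_SHIFT bits above PRESENT, as
 * implied by "rlwinm r11, r10, 32-11, _PAGE_PRESENT". */
#define PAGE_PRESENT   0x0001u
#define ACCESSED_SHIFT 11
#define PAGE_ACCESSED  (PAGE_PRESENT << ACCESSED_SHIFT)

/* 32-bit rotate left, the rotation step of rlwinm/rlwimi. */
static uint32_t rotl32(uint32_t x, unsigned int n)
{
	assert(n > 0 && n < 32);	/* avoid undefined shifts by 0 or 32 */
	return (x << n) | (x >> (32 - n));
}

/*
 * Hypothetical model of the handler sequence
 *   rlwinm r11, r10, 32-ACCESSED_SHIFT, _PAGE_PRESENT
 *   and    r11, r11, r10
 *   rlwimi r10, r11, 0, _PAGE_PRESENT
 * Takes the Linux PTE in r10 and returns the value left in r10.
 */
static uint32_t tlb_pte(uint32_t r10)
{
	/* Rotating left by 32-SHIFT equals rotating right by SHIFT, which
	 * moves the ACCESSED bit into PRESENT's position; keep only that bit. */
	uint32_t r11 = rotl32(r10, 32 - ACCESSED_SHIFT) & PAGE_PRESENT;

	/* Now 1 only if both ACCESSED and PRESENT are set. */
	r11 &= r10;

	/* Insert that bit under the PRESENT mask; other bits of r10 are kept. */
	return (r10 & ~PAGE_PRESENT) | (r11 & PAGE_PRESENT);
}
```

With these assumed values, tlb_pte(PAGE_PRESENT) returns 0, so the page looks not present and the access faults, giving the kernel a chance to set _PAGE_ACCESSED; tlb_pte(PAGE_PRESENT | PAGE_ACCESSED) returns its input unchanged.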
Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.16024928…
---
arch/powerpc/kernel/head_8xx.S | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 2274be535dda..3801b32b1642 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -359,11 +359,9 @@ InstructionTLBMiss:
/* Load the MI_TWC with the attributes for this "segment." */
MTSPR_CPU6(SPRN_MI_TWC, r11, r3) /* Set segment attributes */
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-11, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
li r11, RPN_PATTERN
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 20-23 and 28 must be clear.
@@ -443,11 +441,9 @@ _ENTRY(DTLBMiss_jmp)
* r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
* r10 = (r10 & ~PRESENT) | r11;
*/
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-11, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 22 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
--
2.25.0
[This is a backport for 4.4 of 29daf869cbab69088fe1755d9dd224e99ba78b56]
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.
This adds at least 3 instructions to the fast path of the TLB miss
exception handlers. A following patch will reduce this overhead.
Also update the rotate amount of the rlwinm instruction to the correct
number of bits, reflecting all the changes made to the position of
_PAGE_ACCESSED over time.
Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.16024928…
---
arch/powerpc/kernel/head_8xx.S | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 01e274e6907b..3d7512e72900 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -361,11 +361,9 @@ InstructionTLBMiss:
/* Load the MI_TWC with the attributes for this "segment." */
MTSPR_CPU6(SPRN_MI_TWC, r11, r3) /* Set segment attributes */
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-11, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
li r11, RPN_PATTERN
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 20-23 and 28 must be clear.
@@ -436,11 +434,9 @@ DataStoreTLBMiss:
* r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
* r10 = (r10 & ~PRESENT) | r11;
*/
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-11, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 22 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
--
2.25.0
[This is a backport for 4.19 of 29daf869cbab69088fe1755d9dd224e99ba78b56]
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.
This adds at least 3 instructions to the fast path of the TLB miss
exception handlers. A following patch will reduce this overhead.
Also update the rotate amount of the rlwinm instruction to the correct
number of bits, reflecting all the changes made to the position of
_PAGE_ACCESSED over time.
Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.16024928…
---
arch/powerpc/kernel/head_8xx.S | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 9fd2ff28b8ff..dc99258f2e8c 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -356,11 +356,9 @@ _ENTRY(ITLBMiss_cmp)
/* Load the MI_TWC with the attributes for this "segment." */
mtspr SPRN_MI_TWC, r11 /* Set segment attributes */
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-7, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
li r11, RPN_PATTERN | 0x200
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 20 and 23 must be clear.
@@ -482,11 +480,9 @@ _ENTRY(DTLBMiss_jmp)
* r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
* r10 = (r10 & ~PRESENT) | r11;
*/
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-7, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 24, 25, 26, and 27 must be
* set. All other Linux PTE bits control the behavior
--
2.25.0
[This is a backport for 4.14 of 29daf869cbab69088fe1755d9dd224e99ba78b56]
The kernel expects pte_young() to work regardless of CONFIG_SWAP.
Make sure a minor fault is taken to set _PAGE_ACCESSED when it
is not already set, regardless of the selection of CONFIG_SWAP.
This adds at least 3 instructions to the fast path of the TLB miss
exception handlers. A following patch will reduce this overhead.
Also update the rotate amount of the rlwinm instruction to the correct
number of bits, reflecting all the changes made to the position of
_PAGE_ACCESSED over time.
Fixes: d069cb4373fe ("powerpc/8xx: Don't touch ACCESSED when no SWAP.")
Fixes: 5f356497c384 ("powerpc/8xx: remove unused _PAGE_WRITETHRU")
Fixes: e0a8e0d90a9f ("powerpc/8xx: Handle PAGE_USER via APG bits")
Fixes: 5b2753fc3e8a ("powerpc/8xx: Implementation of PAGE_EXEC")
Fixes: a891c43b97d3 ("powerpc/8xx: Prepare handlers for _PAGE_HUGE for 512k pages.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/af834e8a0f1fa97bfae65664950f0984a70c4750.16024928…
---
arch/powerpc/kernel/head_8xx.S | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/kernel/head_8xx.S b/arch/powerpc/kernel/head_8xx.S
index 2d0d89e2cb9a..43884af0e35c 100644
--- a/arch/powerpc/kernel/head_8xx.S
+++ b/arch/powerpc/kernel/head_8xx.S
@@ -398,11 +398,9 @@ _ENTRY(ITLBMiss_cmp)
#if defined (CONFIG_HUGETLB_PAGE) && defined (CONFIG_PPC_4K_PAGES)
rlwimi r10, r11, 1, MI_SPS16K
#endif
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-11, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
li r11, RPN_PATTERN
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 20-23 and 28 must be clear.
@@ -528,11 +526,9 @@ _ENTRY(DTLBMiss_jmp)
* r11 = ((r10 & PRESENT) & ((r10 & ACCESSED) >> 5));
* r10 = (r10 & ~PRESENT) | r11;
*/
-#ifdef CONFIG_SWAP
- rlwinm r11, r10, 32-5, _PAGE_PRESENT
+ rlwinm r11, r10, 32-11, _PAGE_PRESENT
and r11, r11, r10
rlwimi r10, r11, 0, _PAGE_PRESENT
-#endif
/* The Linux PTE won't go exactly into the MMU TLB.
* Software indicator bits 22 and 28 must be clear.
* Software indicator bits 24, 25, 26, and 27 must be
--
2.25.0
The ethernet driver may allocate the skb (and skb->data) via
napi_alloc_skb(). This ends up in page_frag_alloc(), which allocates
skb->data from page_frag_cache->va.
Under memory pressure, page_frag_cache->va may be allocated as a
pfmemalloc page. As a result, skb->pfmemalloc is always true, because
skb->data comes from page_frag_cache->va. The skb will be dropped if the
receiving socket does not have SOCK_MEMALLOC set. This is the expected
behaviour under memory pressure.
However, once the kernel is no longer under memory pressure (suppose a
large number of pages have just been reclaimed), page_frag_alloc() may
still reuse the prior pfmemalloc page_frag_cache->va to allocate
skb->data. As a result, skb->pfmemalloc stays true until
page_frag_cache->va is re-allocated, even though the kernel is no longer
under memory pressure.
Here is how the kernel runs into the issue.
1. The kernel is under memory pressure and the PAGE_FRAG_CACHE_MAX_ORDER
allocation in __page_frag_cache_refill() fails. Instead, a pfmemalloc
page is allocated for page_frag_cache->va.
2. Every skb->data carved from page_frag_cache->va (pfmemalloc) has
skb->pfmemalloc=true. The skb is always dropped by a socket without
SOCK_MEMALLOC. This is the expected behaviour.
3. Suppose a large number of pages are reclaimed and the kernel is no
longer under memory pressure. We expect the skb->pfmemalloc drops to stop.
4. Unfortunately, page_frag_alloc() does not proactively re-allocate
page_frag_cache->va and always reuses the prior pfmemalloc page. So
skb->pfmemalloc stays true even though the kernel is no longer under
memory pressure.
Fix this by freeing and re-allocating the page instead of recycling it.
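The state transition in steps 1 to 4, and how the fix changes it, can be illustrated with a tiny userspace model. This is only a sketch under simplifying assumptions and not the page allocator's code: struct frag_cache_model, model_refill() and model_recycle() are hypothetical, and the model tracks only whether the cached page is pfmemalloc, not reference counts or the actual freeing of the page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for struct page_frag_cache:
 * only the state relevant to this fix is modelled. */
struct frag_cache_model {
	bool has_page;		/* a backing page is currently cached */
	bool pfmemalloc;	/* that page came from the pfmemalloc reserves */
	unsigned int refills;	/* number of fresh pages obtained */
};

/* Stand-in for __page_frag_cache_refill(): under memory pressure the
 * only page we can get is a pfmemalloc one. */
static void model_refill(struct frag_cache_model *nc, bool under_pressure)
{
	nc->has_page = true;
	nc->pfmemalloc = under_pressure;
	nc->refills++;
}

/*
 * Models the point in page_frag_alloc() where every fragment of the
 * cached page has been released and the page could be recycled.
 * With fix_applied, a pfmemalloc page is dropped and a new page is
 * requested instead of being reused.
 */
static void model_recycle(struct frag_cache_model *nc, bool under_pressure,
			  bool fix_applied)
{
	if (!nc->has_page || (fix_applied && nc->pfmemalloc))
		model_refill(nc, under_pressure);
	/* Otherwise the cached page is reused and keeps its pfmemalloc state. */
	assert(nc->has_page);
}
```

Calling model_recycle() once under pressure and then once without it leaves pfmemalloc set when fix_applied is false (step 4 above), while with fix_applied the second call refills with a normal page and clears it.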
References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
Suggested-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Bert Barbe <bert.barbe(a)oracle.com>
Cc: Rama Nichanamatlu <rama.nichanamatlu(a)oracle.com>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra(a)oracle.com>
Cc: Manjunath Patil <manjunath.b.patil(a)oracle.com>
Cc: Joe Jin <joe.jin(a)oracle.com>
Cc: SRINIVAS <srinivas.eeda(a)oracle.com>
Cc: stable(a)vger.kernel.org
Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
Signed-off-by: Dongli Zhang <dongli.zhang(a)oracle.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
---
Changed since v1:
- change author from Matthew to Dongli
- Add references to all prior discussions
- Add more details to commit message
Changed since v2:
- add unlikely (suggested by Eric Dumazet)
mm/page_alloc.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 23f5066bd4a5..91129ce75ed4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5103,6 +5103,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
goto refill;
+ if (unlikely(nc->pfmemalloc)) {
+ free_the_page(page, compound_order(page));
+ goto refill;
+ }
+
#if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
/* if size can vary use size else just use PAGE_SIZE */
size = nc->size;
--
2.17.1
Hi Greg, Sasha,
This was missing in 4.9-stable. The first patch is only needed so that
the second patch applies cleanly. If it's not accepted, I can backport
the second one manually. Please add them to your queue.
--
Regards
Sudip
Hi Greg, Sasha,
This was missing in 4.14-stable. The first patch is only needed so that
the second patch applies cleanly. If it's not accepted, I can backport
the second one manually. Please add them to your queue.
--
Regards
Sudip
Please CC me in any replies, as I am not subscribed to the list.
This is a legitimate request, as I often need more than two days,
especially on busy work days or weekends.
On Tue, 2020-11-17 at 09:01 +0100, Pavel Machek wrote:
> On Sat 2020-11-14 17:40:36, Hussam Al-Tayeb wrote:
> > Hello. I would like to suggest lengthening the review period for
> > stable releases from 48 hours to 7 days.
> > The rationale is that 48 hours is not enough for people to test
> > those stable releases and make sure there are no regressions for
> > particular workflows.
>
> You should probably cc stable list and Greg with this.
>
> And yes, I believe that would be good idea.
>
> Plus the period is very often shorter than advertised, which might be
> also good to fix.
>
> Best regards,
> pavel
>
This is the start of the stable review cycle for the 4.9.244 release.
There are 78 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 19 Nov 2020 12:20:51 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.244-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.244-rc1
Boris Protopopov <pboris(a)amazon.com>
Convert trailing spaces and periods in path components
Eric Biggers <ebiggers(a)google.com>
ext4: fix leaking sysfs kobject after failed mount
Matteo Croce <mcroce(a)microsoft.com>
reboot: fix overflow parsing reboot cpu number
Matteo Croce <mcroce(a)microsoft.com>
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
Jiri Olsa <jolsa(a)redhat.com>
perf/core: Fix race in the perf_mmap_close() function
Juergen Gross <jgross(a)suse.com>
xen/events: block rogue events for some time
Juergen Gross <jgross(a)suse.com>
xen/events: defer eoi in case of excessive number of events
Juergen Gross <jgross(a)suse.com>
xen/events: use a common cpu hotplug hook for event channels
Juergen Gross <jgross(a)suse.com>
xen/events: switch user event channels to lateeoi model
Juergen Gross <jgross(a)suse.com>
xen/pciback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/scsiback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/netback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/blkback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/events: add a new "late EOI" evtchn framework
Juergen Gross <jgross(a)suse.com>
xen/events: fix race in evtchn_fifo_unmask()
Juergen Gross <jgross(a)suse.com>
xen/events: add a proper barrier to 2-level uevent unmasking
Juergen Gross <jgross(a)suse.com>
xen/events: avoid removing an event channel while handling it
kiyin(尹亮) <kiyin(a)tencent.com>
perf/core: Fix a memory leak in perf_event_parse_addr_filter()
Mathieu Poirier <mathieu.poirier(a)linaro.org>
perf/core: Fix crash when using HW tracing kernel filters
Song Liu <songliubraving(a)fb.com>
perf/core: Fix bad use of igrab()
Anand K Mistry <amistry(a)google.com>
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
George Spelvin <lkml(a)sdf.org>
random32: make prandom_u32() output unpredictable
Mao Wenan <wenan.mao(a)linux.alibaba.com>
net: Update window_clamp if SOCK_RCVBUF is set
Martin Schiller <ms(a)dev.tdt.de>
net/x25: Fix null-ptr-deref in x25_connect
Ursula Braun <ubraun(a)linux.ibm.com>
net/af_iucv: fix null pointer dereference on shutdown
Oliver Herms <oliver.peter.herms(a)gmail.com>
IPv6: Set SIT tunnel hard_header_len to zero
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Coiby Xu <coiby.xu(a)gmail.com>
pinctrl: amd: fix incorrect way to disable debounce filter
Coiby Xu <coiby.xu(a)gmail.com>
pinctrl: amd: use higher precision for 512 RtcClk
Thomas Zimmermann <tzimmermann(a)suse.de>
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
Al Viro <viro(a)zeniv.linux.org.uk>
don't dump the threads that had been already exiting when zapped.
Wengang Wang <wen.gang.wang(a)oracle.com>
ocfs2: initialize ip_next_orphan
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: protect mei_cl_mtu from null dereference
Chris Brandt <chris.brandt(a)renesas.com>
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
Joseph Qi <joseph.qi(a)linux.alibaba.com>
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
Kaixu Xia <kaixuxia(a)tencent.com>
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
Peter Zijlstra <peterz(a)infradead.org>
perf: Fix get_recursion_context()
Wang Hai <wanghai38(a)huawei.com>
cosa: Add missing kfree in error path of cosa_write
Evan Nimmo <evan.nimmo(a)alliedtelesis.co.nz>
of/address: Fix of_node memory leak in of_dma_is_coherent
Christoph Hellwig <hch(a)lst.de>
xfs: fix a missing unlock on error in xfs_fs_map_blocks
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix rmap key and record comparison functions
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix flags argument to rmap lookup when converting shared file rmaps
Billy Tsai <billy_tsai(a)aspeedtech.com>
pinctrl: aspeed: Fix GPI only function problem.
Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
iommu/amd: Increase interrupt remapping table limit to 512 entries
Hannes Reinecke <hare(a)suse.de>
scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
Ye Bin <yebin10(a)huawei.com>
cfg80211: regulatory: Fix inconsistent format argument
Johannes Berg <johannes.berg(a)intel.com>
mac80211: always wind down STA state
Johannes Berg <johannes.berg(a)intel.com>
mac80211: fix use of skb payload instead of header
Evan Quan <evan.quan(a)amd.com>
drm/amdgpu: perform srbm soft reset always on SDMA resume
Keita Suzuki <keitasuzuki.park(a)sslab.ics.keio.ac.jp>
scsi: hpsa: Fix memory leak in hpsa_init_one()
Bob Peterson <rpeterso(a)redhat.com>
gfs2: check for live vs. read-only file system in gfs2_fitrim
Bob Peterson <rpeterso(a)redhat.com>
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
Evgeny Novikov <novikov(a)ispras.ru>
usb: gadget: goku_udc: fix potential crashes in probe
Masashi Honma <masashi.honma(a)gmail.com>
ath9k_htc: Use appropriate rs_datalen type
Mark Gray <mark.d.gray(a)redhat.com>
geneve: add transport ports in route lookup for geneve
Martyna Szapar <martyna.szapar(a)intel.com>
i40e: Memory leak in i40e_config_iwarp_qvlist
Martyna Szapar <martyna.szapar(a)intel.com>
i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
Grzegorz Siwik <grzegorz.siwik(a)intel.com>
i40e: Wrong truncation from u16 to u8
Sergey Nemov <sergey.nemov(a)intel.com>
i40e: add num_vectors checker in iwarp handler
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
i40e: Fix a potential NULL pointer dereference
Will Deacon <will(a)kernel.org>
pinctrl: devicetree: Avoid taking direct reference to device name string
Filipe Manana <fdmanana(a)suse.com>
Btrfs: fix missing error return if writeback for extent buffer never started
Brian Foster <bfoster(a)redhat.com>
xfs: flush new eof page on truncate to avoid post-eof corruption
Stephane Grosjean <s.grosjean(a)peak-system.com>
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
Dan Carpenter <dan.carpenter(a)oracle.com>
can: peak_usb: add range checking in decode operations
Oleksij Rempel <o.rempel(a)pengutronix.de>
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
Oliver Hartkopp <socketcan(a)hartkopp.net>
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
Dan Carpenter <dan.carpenter(a)oracle.com>
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
Jiri Olsa <jolsa(a)kernel.org>
perf tools: Add missing swap for ino_generation
zhuoliang zhang <zhuoliang.zhang(a)mediatek.com>
net: xfrm: fix a race condition during allocing spi
Marc Zyngier <maz(a)kernel.org>
genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
btrfs: reschedule when cloning lots of extents
Zeng Tao <prime.zeng(a)hisilicon.com>
time: Prevent undefined behaviour in timespec64_to_ns()
Shijie Luo <luoshijie1(a)huawei.com>
mm: mempolicy: fix potential pte_unmap_unlock pte error
Alexander Aring <aahringo(a)redhat.com>
gfs2: Wake up when sd_glock_disposal becomes zero
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ring-buffer: Fix recursion protection transitions between interrupt context
Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
regulator: defer probe when trying to get voltage from unresolved supply
-------------
Diffstat:
Documentation/kernel-parameters.txt | 8 +
Makefile | 4 +-
arch/x86/events/intel/pt.c | 4 +-
arch/x86/kernel/cpu/bugs.c | 52 ++-
drivers/block/xen-blkback/blkback.c | 22 +-
drivers/block/xen-blkback/xenbus.c | 5 +-
drivers/char/random.c | 1 -
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 +-
drivers/gpu/drm/gma500/psb_irq.c | 34 +-
drivers/iommu/amd_iommu_types.h | 6 +-
drivers/misc/mei/client.h | 4 +-
drivers/net/can/dev.c | 14 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++-
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 ++-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 32 +-
drivers/net/geneve.c | 36 +-
drivers/net/wan/cosa.c | 1 +
drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2 +-
drivers/net/xen-netback/common.h | 15 +
drivers/net/xen-netback/interface.c | 61 ++-
drivers/net/xen-netback/netback.c | 11 +-
drivers/net/xen-netback/rx.c | 13 +-
drivers/of/address.c | 4 +-
drivers/pinctrl/aspeed/pinctrl-aspeed.c | 7 +-
drivers/pinctrl/devicetree.c | 26 +-
drivers/pinctrl/pinctrl-amd.c | 6 +-
drivers/regulator/core.c | 2 +
drivers/scsi/device_handler/scsi_dh_alua.c | 9 +-
drivers/scsi/hpsa.c | 4 +-
drivers/usb/class/cdc-acm.c | 9 +
drivers/usb/gadget/udc/goku_udc.c | 2 +-
drivers/xen/events/events_2l.c | 9 +-
drivers/xen/events/events_base.c | 422 +++++++++++++++++--
drivers/xen/events/events_fifo.c | 82 ++--
drivers/xen/events/events_internal.h | 20 +-
drivers/xen/evtchn.c | 7 +-
drivers/xen/xen-pciback/pci_stub.c | 14 +-
drivers/xen/xen-pciback/pciback.h | 12 +-
drivers/xen/xen-pciback/pciback_ops.c | 48 ++-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 23 +-
fs/btrfs/extent_io.c | 4 +
fs/btrfs/ioctl.c | 2 +
fs/cifs/cifs_unicode.c | 8 +-
fs/ext4/inline.c | 1 +
fs/ext4/super.c | 5 +-
fs/gfs2/glock.c | 3 +-
fs/gfs2/rgrp.c | 5 +-
fs/ocfs2/super.c | 1 +
fs/xfs/libxfs/xfs_rmap.c | 2 +-
fs/xfs/libxfs/xfs_rmap_btree.c | 16 +-
fs/xfs/xfs_iops.c | 10 +
fs/xfs/xfs_pnfs.c | 2 +-
include/linux/can/skb.h | 20 +-
include/linux/perf_event.h | 2 +-
include/linux/prandom.h | 36 +-
include/linux/time64.h | 4 +
include/xen/events.h | 29 +-
kernel/events/core.c | 42 +-
kernel/events/internal.h | 2 +-
kernel/exit.c | 5 +-
kernel/irq/Kconfig | 1 +
kernel/reboot.c | 28 +-
kernel/time/timer.c | 7 -
kernel/trace/ring_buffer.c | 54 ++-
lib/random32.c | 462 +++++++++++++--------
lib/swiotlb.c | 6 +-
mm/mempolicy.c | 6 +-
net/ipv4/syncookies.c | 9 +-
net/ipv6/sit.c | 2 -
net/ipv6/syncookies.c | 10 +-
net/iucv/af_iucv.c | 3 +-
net/mac80211/sta_info.c | 18 +
net/mac80211/tx.c | 35 +-
net/wireless/reg.c | 2 +-
net/x25/af_x25.c | 2 +-
net/xfrm/xfrm_state.c | 8 +-
sound/hda/ext/hdac_ext_controller.c | 2 +
tools/perf/util/session.c | 1 +
79 files changed, 1465 insertions(+), 549 deletions(-)
This is the start of the stable review cycle for the 4.4.244 release.
There are 64 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 19 Nov 2020 12:20:51 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.244-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.244-rc1
Boris Protopopov <pboris(a)amazon.com>
Convert trailing spaces and periods in path components
Eric Biggers <ebiggers(a)google.com>
ext4: fix leaking sysfs kobject after failed mount
Matteo Croce <mcroce(a)microsoft.com>
reboot: fix overflow parsing reboot cpu number
Matteo Croce <mcroce(a)microsoft.com>
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
Jiri Olsa <jolsa(a)redhat.com>
perf/core: Fix race in the perf_mmap_close() function
Juergen Gross <jgross(a)suse.com>
xen/events: block rogue events for some time
Juergen Gross <jgross(a)suse.com>
xen/events: defer eoi in case of excessive number of events
Juergen Gross <jgross(a)suse.com>
xen/events: use a common cpu hotplug hook for event channels
Juergen Gross <jgross(a)suse.com>
xen/events: switch user event channels to lateeoi model
Juergen Gross <jgross(a)suse.com>
xen/pciback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/scsiback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/netback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/blkback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/events: add a new "late EOI" evtchn framework
Juergen Gross <jgross(a)suse.com>
xen/events: fix race in evtchn_fifo_unmask()
Juergen Gross <jgross(a)suse.com>
xen/events: add a proper barrier to 2-level uevent unmasking
Juergen Gross <jgross(a)suse.com>
xen/events: avoid removing an event channel while handling it
Anand K Mistry <amistry(a)google.com>
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
George Spelvin <lkml(a)sdf.org>
random32: make prandom_u32() output unpredictable
Mao Wenan <wenan.mao(a)linux.alibaba.com>
net: Update window_clamp if SOCK_RCVBUF is set
Martin Schiller <ms(a)dev.tdt.de>
net/x25: Fix null-ptr-deref in x25_connect
Ursula Braun <ubraun(a)linux.ibm.com>
net/af_iucv: fix null pointer dereference on shutdown
Oliver Herms <oliver.peter.herms(a)gmail.com>
IPv6: Set SIT tunnel hard_header_len to zero
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Coiby Xu <coiby.xu(a)gmail.com>
pinctrl: amd: fix incorrect way to disable debounce filter
Coiby Xu <coiby.xu(a)gmail.com>
pinctrl: amd: use higher precision for 512 RtcClk
Thomas Zimmermann <tzimmermann(a)suse.de>
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
Al Viro <viro(a)zeniv.linux.org.uk>
don't dump the threads that had been already exiting when zapped.
Wengang Wang <wen.gang.wang(a)oracle.com>
ocfs2: initialize ip_next_orphan
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: protect mei_cl_mtu from null dereference
Chris Brandt <chris.brandt(a)renesas.com>
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
Joseph Qi <joseph.qi(a)linux.alibaba.com>
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
Kaixu Xia <kaixuxia(a)tencent.com>
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
Peter Zijlstra <peterz(a)infradead.org>
perf: Fix get_recursion_context()
Wang Hai <wanghai38(a)huawei.com>
cosa: Add missing kfree in error path of cosa_write
Evan Nimmo <evan.nimmo(a)alliedtelesis.co.nz>
of/address: Fix of_node memory leak in of_dma_is_coherent
Christoph Hellwig <hch(a)lst.de>
xfs: fix a missing unlock on error in xfs_fs_map_blocks
Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
iommu/amd: Increase interrupt remapping table limit to 512 entries
Ye Bin <yebin10(a)huawei.com>
cfg80211: regulatory: Fix inconsistent format argument
Johannes Berg <johannes.berg(a)intel.com>
mac80211: always wind down STA state
Johannes Berg <johannes.berg(a)intel.com>
mac80211: fix use of skb payload instead of header
Evan Quan <evan.quan(a)amd.com>
drm/amdgpu: perform srbm soft reset always on SDMA resume
Bob Peterson <rpeterso(a)redhat.com>
gfs2: check for live vs. read-only file system in gfs2_fitrim
Bob Peterson <rpeterso(a)redhat.com>
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
Evgeny Novikov <novikov(a)ispras.ru>
usb: gadget: goku_udc: fix potential crashes in probe
Masashi Honma <masashi.honma(a)gmail.com>
ath9k_htc: Use appropriate rs_datalen type
Mark Gray <mark.d.gray(a)redhat.com>
geneve: add transport ports in route lookup for geneve
Martyna Szapar <martyna.szapar(a)intel.com>
i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
Grzegorz Siwik <grzegorz.siwik(a)intel.com>
i40e: Wrong truncation from u16 to u8
Will Deacon <will(a)kernel.org>
pinctrl: devicetree: Avoid taking direct reference to device name string
Filipe Manana <fdmanana(a)suse.com>
Btrfs: fix missing error return if writeback for extent buffer never started
Stephane Grosjean <s.grosjean(a)peak-system.com>
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
Dan Carpenter <dan.carpenter(a)oracle.com>
can: peak_usb: add range checking in decode operations
Oleksij Rempel <o.rempel(a)pengutronix.de>
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
Oliver Hartkopp <socketcan(a)hartkopp.net>
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
Dan Carpenter <dan.carpenter(a)oracle.com>
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
Jiri Olsa <jolsa(a)kernel.org>
perf tools: Add missing swap for ino_generation
zhuoliang zhang <zhuoliang.zhang(a)mediatek.com>
net: xfrm: fix a race condition during allocing spi
Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
btrfs: reschedule when cloning lots of extents
Zeng Tao <prime.zeng(a)hisilicon.com>
time: Prevent undefined behaviour in timespec64_to_ns()
Shijie Luo <luoshijie1(a)huawei.com>
mm: mempolicy: fix potential pte_unmap_unlock pte error
Alexander Aring <aahringo(a)redhat.com>
gfs2: Wake up when sd_glock_disposal becomes zero
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ring-buffer: Fix recursion protection transitions between interrupt context
-------------
Diffstat:
Documentation/kernel-parameters.txt | 8 +
Makefile | 4 +-
arch/x86/kernel/cpu/bugs.c | 52 ++-
drivers/block/xen-blkback/blkback.c | 22 +-
drivers/block/xen-blkback/xenbus.c | 5 +-
drivers/char/random.c | 2 -
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 +-
drivers/gpu/drm/gma500/psb_irq.c | 34 +-
drivers/iommu/amd_iommu_types.h | 6 +-
drivers/misc/mei/client.h | 4 +-
drivers/net/can/dev.c | 14 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++-
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 ++-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 +-
drivers/net/geneve.c | 36 +-
drivers/net/wan/cosa.c | 1 +
drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2 +-
drivers/net/xen-netback/common.h | 39 ++
drivers/net/xen-netback/interface.c | 59 ++-
drivers/net/xen-netback/netback.c | 17 +-
drivers/of/address.c | 4 +-
drivers/pinctrl/devicetree.c | 26 +-
drivers/pinctrl/pinctrl-amd.c | 6 +-
drivers/usb/class/cdc-acm.c | 9 +
drivers/usb/gadget/udc/goku_udc.c | 2 +-
drivers/xen/events/events_2l.c | 9 +-
drivers/xen/events/events_base.c | 444 ++++++++++++++++++--
drivers/xen/events/events_fifo.c | 102 ++---
drivers/xen/events/events_internal.h | 20 +-
drivers/xen/evtchn.c | 7 +-
drivers/xen/xen-pciback/pci_stub.c | 14 +-
drivers/xen/xen-pciback/pciback.h | 12 +-
drivers/xen/xen-pciback/pciback_ops.c | 48 ++-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 23 +-
fs/btrfs/extent_io.c | 4 +
fs/btrfs/ioctl.c | 2 +
fs/cifs/cifs_unicode.c | 8 +-
fs/ext4/inline.c | 1 +
fs/ext4/super.c | 5 +-
fs/gfs2/glock.c | 3 +-
fs/gfs2/rgrp.c | 5 +-
fs/ocfs2/super.c | 1 +
fs/xfs/xfs_pnfs.c | 2 +-
include/linux/can/skb.h | 20 +-
include/linux/prandom.h | 36 +-
include/linux/time64.h | 4 +
include/xen/events.h | 29 +-
kernel/events/core.c | 7 +-
kernel/events/internal.h | 2 +-
kernel/exit.c | 5 +-
kernel/reboot.c | 28 +-
kernel/time/timer.c | 7 -
kernel/trace/ring_buffer.c | 54 ++-
lib/random32.c | 463 +++++++++++++--------
lib/swiotlb.c | 6 +-
mm/mempolicy.c | 6 +-
net/ipv4/syncookies.c | 9 +-
net/ipv6/sit.c | 2 -
net/ipv6/syncookies.c | 10 +-
net/iucv/af_iucv.c | 3 +-
net/mac80211/sta_info.c | 18 +
net/mac80211/tx.c | 35 +-
net/wireless/reg.c | 2 +-
net/x25/af_x25.c | 2 +-
net/xfrm/xfrm_state.c | 8 +-
sound/hda/ext/hdac_ext_controller.c | 2 +
tools/perf/util/session.c | 1 +
68 files changed, 1431 insertions(+), 522 deletions(-)
DIR_INDEX was introduced as a compat ext4 feature. That means that
even kernels / tools that don't understand the feature may modify the
filesystem. This works because, to kernels that do not understand the
indexed dir format, internal htree nodes appear just as empty directory
entries. Kernels aware of indexed dirs then check that the htree structure
is still consistent before using the data. This all worked reasonably well
until metadata checksums were introduced. The problem is that these
effectively made DIR_INDEX only ro-compatible because internal htree
nodes store checksums in a different place than normal directory blocks.
Thus any modification ignorant of DIR_INDEX (or just clearing
EXT4_INDEX_FL from the inode) will effectively cause a checksum mismatch
and trigger kernel errors. So we have to be more careful when dealing
with indexed directories on filesystems with checksumming enabled.
1) We just disallow loading directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it
means someone disabled DIR_INDEX on an existing filesystem and didn't run
e2fsck), e2fsck can fix the problem, and we don't want to answer the
difficult question: "Should we rather corrupt the directory more or
should we ignore that DIR_INDEX feature is not set?"
2) When we find out the htree structure is corrupted (but the filesystem
and the directory should support htrees), we continue just ignoring htree
information for reading but we refuse to add new entries to the
directory to avoid corrupting it more.
CC: stable(a)vger.kernel.org
Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/dir.c | 14 ++++++++------
fs/ext4/ext4.h | 5 ++++-
fs/ext4/inode.c | 13 +++++++++++++
fs/ext4/namei.c | 7 +++++++
4 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 9f00fc0bf21d..cb9ea593b544 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -129,12 +129,14 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
if (err != ERR_BAD_DX_DIR) {
return err;
}
- /*
- * We don't set the inode dirty flag since it's not
- * critical that it get flushed back to the disk.
- */
- ext4_clear_inode_flag(file_inode(file),
- EXT4_INODE_INDEX);
+ /* Can we just clear INDEX flag to ignore htree information? */
+ if (!ext4_has_metadata_csum(sb)) {
+ /*
+ * We don't set the inode dirty flag since it's not
+ * critical that it gets flushed back to the disk.
+ */
+ ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
+ }
}
if (ext4_has_inline_data(inode)) {
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f8578caba40d..1fd6c1e2ce2a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2482,8 +2482,11 @@ void ext4_insert_dentry(struct inode *inode,
struct ext4_filename *fname);
static inline void ext4_update_dx_flag(struct inode *inode)
{
- if (!ext4_has_feature_dir_index(inode->i_sb))
+ if (!ext4_has_feature_dir_index(inode->i_sb)) {
+ /* ext4_iget() should have caught this... */
+ WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb));
ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
+ }
}
static const unsigned char ext4_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 629a25d999f0..d33135308c1b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4615,6 +4615,19 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
ret = -EFSCORRUPTED;
goto bad_inode;
}
+ /*
+ * If dir_index is not enabled but there's dir with INDEX flag set,
+ * we'd normally treat htree data as empty space. But with metadata
+ * checksumming that corrupts checksums so forbid that.
+ */
+ if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) &&
+ ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) {
+ ext4_error_inode(inode, function, line, 0,
+ "iget: Dir with htree data on filesystem "
+ "without dir_index feature.");
+ ret = -EFSCORRUPTED;
+ goto bad_inode;
+ }
ei->i_disksize = inode->i_size;
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1cb42d940784..deb9f7a02976 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2207,6 +2207,13 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
retval = ext4_dx_add_entry(handle, &fname, dir, inode);
if (!retval || (retval != ERR_BAD_DX_DIR))
goto out;
+ /* Can we just ignore htree data? */
+ if (ext4_has_metadata_csum(sb)) {
+ EXT4_ERROR_INODE(dir,
+ "Directory has corrupted htree index.");
+ retval = -EFSCORRUPTED;
+ goto out;
+ }
ext4_clear_inode_flag(dir, EXT4_INODE_INDEX);
dx_fallback++;
ext4_mark_inode_dirty(handle, dir);
--
2.16.4
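For illustration, the policy from points 1) and 2) of the ext4 commit message above can be modeled as a small stand-alone C sketch. It is not the kernel code: struct sketch_fs, enum sketch_ret and the sketch_* functions are hypothetical names, and each filesystem feature is reduced to a boolean. The corresponding real checks are the ones added to __ext4_iget(), ext4_readdir() and ext4_add_entry() in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the policy in this patch. None of
 * these names are ext4 APIs; they only mirror the decisions made in
 * __ext4_iget(), ext4_readdir() and ext4_add_entry().
 */
struct sketch_fs {
	bool dir_index;     /* DIR_INDEX feature enabled */
	bool metadata_csum; /* metadata checksumming enabled */
};

enum sketch_ret { SKETCH_OK = 0, SKETCH_CORRUPTED = -1 };

/* 1) Loading a directory inode: on a checksummed filesystem without
 *    DIR_INDEX, an INDEX-flagged directory is treated as corruption. */
enum sketch_ret sketch_iget(const struct sketch_fs *fs, bool index_flag)
{
	assert(fs != NULL);
	if (!fs->dir_index && fs->metadata_csum && index_flag)
		return SKETCH_CORRUPTED;
	return SKETCH_OK;
}

/* 2) Reading a directory whose htree turned out to be corrupted: the
 *    htree is ignored either way, but the INDEX flag is only cleared
 *    when no checksums depend on the htree layout. */
void sketch_readdir_bad_htree(const struct sketch_fs *fs, bool *index_flag)
{
	assert(fs != NULL && index_flag != NULL);
	if (!fs->metadata_csum)
		*index_flag = false;
}

/* 2) Adding an entry to such a directory: refused on checksummed
 *    filesystems, otherwise fall back to a linear (non-indexed) dir. */
enum sketch_ret sketch_add_entry_bad_htree(const struct sketch_fs *fs,
					   bool *index_flag)
{
	assert(fs != NULL && index_flag != NULL);
	if (fs->metadata_csum)
		return SKETCH_CORRUPTED;
	*index_flag = false;
	return SKETCH_OK;
}
```

Keeping the three decisions separate mirrors the patch: only the read path may silently ignore htree information, and the INDEX flag is only dropped when no checksum depends on it.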
An active ref_node can always be found in ctx->files_data; it's much
safer to get it this way than by poking into files_data->ref_list.
Cc: stable(a)vger.kernel.org # v5.7+
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
fs/io_uring.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index b205c1df3f74..5cb194ca4fce 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -6974,9 +6974,7 @@ static int io_sqe_files_unregister(struct io_ring_ctx *ctx)
return -ENXIO;
spin_lock(&data->lock);
- if (!list_empty(&data->ref_list))
- ref_node = list_first_entry(&data->ref_list,
- struct fixed_file_ref_node, node);
+ ref_node = data->node;
spin_unlock(&data->lock);
if (ref_node)
percpu_ref_kill(&ref_node->refs);
--
2.24.0
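As a rough illustration of why data->node is the safer lookup, the sketch below models a file-data structure that keeps ref nodes in creation order, oldest first, while a separate pointer tracks the active node. That ordering is an assumption about how ref nodes are queued, and every name here (struct sketch_file_data, sketch_switch_node() and so on) is hypothetical, not an io_uring structure or API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: ref nodes are kept in creation order (oldest
 * first) and 'node' always points at the currently active one. */
#define SKETCH_MAX_NODES 8

struct sketch_ref_node {
	int id;
	int killed; /* set once this node's refs have been killed */
};

struct sketch_file_data {
	struct sketch_ref_node *ref_list[SKETCH_MAX_NODES]; /* oldest first */
	size_t nr;
	struct sketch_ref_node *node; /* active node */
};

/* Switch to a new node: the previous active node is killed but stays
 * queued until its references drain; the new node is appended at the
 * tail and becomes the active one. */
void sketch_switch_node(struct sketch_file_data *data,
			struct sketch_ref_node *new_node)
{
	assert(data != NULL && new_node != NULL);
	assert(data->nr < SKETCH_MAX_NODES);
	if (data->node)
		data->node->killed = 1;
	data->ref_list[data->nr++] = new_node;
	data->node = new_node;
}

/* Old-style lookup: the first queued entry, which is the oldest node
 * as soon as more than one node is pending. */
struct sketch_ref_node *sketch_lookup_old(const struct sketch_file_data *data)
{
	assert(data != NULL);
	return data->nr ? data->ref_list[0] : NULL;
}

/* New-style lookup: read the active node directly. */
struct sketch_ref_node *sketch_lookup_new(const struct sketch_file_data *data)
{
	assert(data != NULL);
	return data->node;
}
```

With two pending nodes, the list-head lookup returns the older, already killed node, while reading the active pointer always returns the live one.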
Since commit 086d08725d34 ("remoteproc: create vdev subdevice with
specific dma memory pool"), every remoteproc has a DMA subdevice
("remoteprocX#vdevYbuffer") for each virtio device, which inherits
DMA capabilities from the corresponding platform device. This allowed
different DMA pools to be associated with each vdev, and required
virtio drivers to perform DMA operations with the parent device
(vdev->dev.parent) instead of the grandparent (vdev->dev.parent->parent).
virtio_rpmsg_bus was already changed in the same merge cycle by
commit d999b622fcfb ("rpmsg: virtio: allocate buffer from parent"),
but virtio_console was not. In fact, operations using the grandparent
worked fine while the grandparent was the platform device, but since
commit c774ad010873 ("remoteproc: Fix and restore the parenting
hierarchy for vdev") this has changed, and now the grandparent device
is the remoteproc device without any DMA capabilities.
So, starting with v5.8-rc1, the following warning is observed:
[ 2.483925] ------------[ cut here ]------------
[ 2.489148] WARNING: CPU: 3 PID: 101 at kernel/dma/mapping.c:427 0x80e7eee8
[ 2.489152] Modules linked in: virtio_console(+)
[ 2.503737] virtio_rpmsg_bus rpmsg_core
[ 2.508903]
[ 2.528898] <Other modules, stack and call trace here>
[ 2.913043]
[ 2.914907] ---[ end trace 93ac8746beab612c ]---
[ 2.920102] virtio-ports vport1p0: Error allocating inbufs
kernel/dma/mapping.c:427 is:
WARN_ON_ONCE(!dev->coherent_dma_mask);
This happens because the grandparent is now the remoteproc device,
which has no DMA capabilities:
[ 3.104943] Parent: remoteproc0#vdev1buffer, grandparent: remoteproc0
Fix this the same way as it was for virtio_rpmsg_bus, using just the
parent device (vdev->dev.parent, "remoteprocX#vdevYbuffer") for DMA
operations.
This also now allows DMA pools/buffers for rproc serial to be reserved
via Device Tree.
Fixes: c774ad010873 ("remoteproc: Fix and restore the parenting hierarchy for vdev")
Cc: stable(a)vger.kernel.org # 5.1+
Signed-off-by: Alexander Lobakin <alobakin(a)pm.me>
---
drivers/char/virtio_console.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index a2da8f768b94..1836cc56e357 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -435,12 +435,12 @@ static struct port_buffer *alloc_buf(struct virtio_device *vdev, size_t buf_size
/*
* Allocate DMA memory from ancestor. When a virtio
* device is created by remoteproc, the DMA memory is
- * associated with the grandparent device:
- * vdev => rproc => platform-dev.
+ * associated with the parent device:
+ * virtioY => remoteprocX#vdevYbuffer.
*/
- if (!vdev->dev.parent || !vdev->dev.parent->parent)
+ buf->dev = vdev->dev.parent;
+ if (!buf->dev)
goto free_buf;
- buf->dev = vdev->dev.parent->parent;
/* Increase device refcnt to avoid freeing it */
get_device(buf->dev);
--
2.29.2
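The parent/grandparent problem described in the virtio_console commit message above can be modeled with a stripped-down device hierarchy in C. This is only a sketch: struct sketch_device and the sketch_* helpers are hypothetical, and a zero coherent_dma_mask stands in for "no DMA capabilities", echoing the WARN_ON_ONCE(!dev->coherent_dma_mask) check quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct device, reduced to the fields needed
 * to show why the parent, not the grandparent, must be used for DMA. */
struct sketch_device {
	const char *name;
	struct sketch_device *parent;
	uint64_t coherent_dma_mask; /* 0 models "no DMA capabilities" */
};

/* Nonzero if coherent DMA allocations from this device would be valid. */
int sketch_can_dma(const struct sketch_device *dev)
{
	return dev != NULL && dev->coherent_dma_mask != 0;
}

/* Old choice: the grandparent (vdev->dev.parent->parent). */
struct sketch_device *sketch_dma_dev_old(const struct sketch_device *vdev)
{
	assert(vdev != NULL);
	if (!vdev->parent || !vdev->parent->parent)
		return NULL;
	return vdev->parent->parent;
}

/* New choice: the parent (vdev->dev.parent), i.e. the
 * "remoteprocX#vdevYbuffer" device that inherits the DMA setup. */
struct sketch_device *sketch_dma_dev_new(const struct sketch_device *vdev)
{
	assert(vdev != NULL);
	return vdev->parent;
}
```

With the hierarchy platform-dev => remoteprocX => remoteprocX#vdevYbuffer => virtioY, the grandparent lookup lands on the remoteproc device, which has no DMA mask, while the parent lookup lands on the vdev buffer device that inherited its DMA capabilities from the platform device.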
This is the start of the stable review cycle for the 4.14.207 release.
There are 85 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 19 Nov 2020 12:20:51 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.207-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.207-rc1
Boris Protopopov <pboris(a)amazon.com>
Convert trailing spaces and periods in path components
Matteo Croce <mcroce(a)microsoft.com>
reboot: fix overflow parsing reboot cpu number
Matteo Croce <mcroce(a)microsoft.com>
Revert "kernel/reboot.c: convert simple_strtoul to kstrtoint"
Jiri Olsa <jolsa(a)redhat.com>
perf/core: Fix race in the perf_mmap_close() function
Juergen Gross <jgross(a)suse.com>
xen/events: block rogue events for some time
Juergen Gross <jgross(a)suse.com>
xen/events: defer eoi in case of excessive number of events
Juergen Gross <jgross(a)suse.com>
xen/events: use a common cpu hotplug hook for event channels
Juergen Gross <jgross(a)suse.com>
xen/events: switch user event channels to lateeoi model
Juergen Gross <jgross(a)suse.com>
xen/pciback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/pvcallsback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/scsiback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/netback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/blkback: use lateeoi irq binding
Juergen Gross <jgross(a)suse.com>
xen/events: add a new "late EOI" evtchn framework
Juergen Gross <jgross(a)suse.com>
xen/events: fix race in evtchn_fifo_unmask()
Juergen Gross <jgross(a)suse.com>
xen/events: add a proper barrier to 2-level uevent unmasking
Juergen Gross <jgross(a)suse.com>
xen/events: avoid removing an event channel while handling it
kiyin(尹亮) <kiyin(a)tencent.com>
perf/core: Fix a memory leak in perf_event_parse_addr_filter()
Mathieu Poirier <mathieu.poirier(a)linaro.org>
perf/core: Fix crash when using HW tracing kernel filters
Song Liu <songliubraving(a)fb.com>
perf/core: Fix bad use of igrab()
Anand K Mistry <amistry(a)google.com>
x86/speculation: Allow IBPB to be conditionally enabled on CPUs with always-on STIBP
George Spelvin <lkml(a)sdf.org>
random32: make prandom_u32() output unpredictable
Mao Wenan <wenan.mao(a)linux.alibaba.com>
net: Update window_clamp if SOCK_RCVBUF is set
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix potential skb double free in an error path
Martin Willi <martin(a)strongswan.org>
vrf: Fix fast path output packet handling with async Netfilter rules
Martin Schiller <ms(a)dev.tdt.de>
net/x25: Fix null-ptr-deref in x25_connect
Ursula Braun <ubraun(a)linux.ibm.com>
net/af_iucv: fix null pointer dereference on shutdown
Oliver Herms <oliver.peter.herms(a)gmail.com>
IPv6: Set SIT tunnel hard_header_len to zero
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
swiotlb: fix "x86: Don't panic if can not alloc buffer for swiotlb"
Coiby Xu <coiby.xu(a)gmail.com>
pinctrl: amd: fix incorrect way to disable debounce filter
Coiby Xu <coiby.xu(a)gmail.com>
pinctrl: amd: use higher precision for 512 RtcClk
Thomas Zimmermann <tzimmermann(a)suse.de>
drm/gma500: Fix out-of-bounds access to struct drm_device.vblank[]
Al Viro <viro(a)zeniv.linux.org.uk>
don't dump the threads that had been already exiting when zapped.
Chen Zhou <chenzhou10(a)huawei.com>
selinux: Fix error return code in sel_ib_pkey_sid_slow()
Wengang Wang <wen.gang.wang(a)oracle.com>
ocfs2: initialize ip_next_orphan
Dan Carpenter <dan.carpenter(a)oracle.com>
futex: Don't enable IRQs unconditionally in put_pi_state()
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: protect mei_cl_mtu from null dereference
Chris Brandt <chris.brandt(a)renesas.com>
usb: cdc-acm: Add DISABLE_ECHO for Renesas USB Download mode
Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
uio: Fix use-after-free in uio_unregister_device()
Jing Xiangfeng <jingxiangfeng(a)huawei.com>
thunderbolt: Add the missed ida_simple_remove() in ring_request_msix()
Joseph Qi <joseph.qi(a)linux.alibaba.com>
ext4: unlock xattr_sem properly in ext4_inline_data_truncate()
Kaixu Xia <kaixuxia(a)tencent.com>
ext4: correctly report "not supported" for {usr,grp}jquota when !CONFIG_QUOTA
Peter Zijlstra <peterz(a)infradead.org>
perf: Fix get_recursion_context()
Wang Hai <wanghai38(a)huawei.com>
cosa: Add missing kfree in error path of cosa_write
Evan Nimmo <evan.nimmo(a)alliedtelesis.co.nz>
of/address: Fix of_node memory leak in of_dma_is_coherent
Christoph Hellwig <hch(a)lst.de>
xfs: fix a missing unlock on error in xfs_fs_map_blocks
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix rmap key and record comparison functions
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix flags argument to rmap lookup when converting shared file rmaps
Christoph Hellwig <hch(a)lst.de>
nbd: fix a block_device refcount leak in nbd_release
Billy Tsai <billy_tsai(a)aspeedtech.com>
pinctrl: aspeed: Fix GPI only function problem.
Andrew Jeffery <andrew(a)aj.id.au>
ARM: 9019/1: kprobes: Avoid fortify_panic() when copying optprobe template
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
pinctrl: intel: Set default bias in case no particular value given
Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
iommu/amd: Increase interrupt remapping table limit to 512 entries
Hannes Reinecke <hare(a)suse.de>
scsi: scsi_dh_alua: Avoid crash during alua_bus_detach()
Ye Bin <yebin10(a)huawei.com>
cfg80211: regulatory: Fix inconsistent format argument
Johannes Berg <johannes.berg(a)intel.com>
mac80211: always wind down STA state
Johannes Berg <johannes.berg(a)intel.com>
mac80211: fix use of skb payload instead of header
Evan Quan <evan.quan(a)amd.com>
drm/amdgpu: perform srbm soft reset always on SDMA resume
Keita Suzuki <keitasuzuki.park(a)sslab.ics.keio.ac.jp>
scsi: hpsa: Fix memory leak in hpsa_init_one()
Bob Peterson <rpeterso(a)redhat.com>
gfs2: check for live vs. read-only file system in gfs2_fitrim
Bob Peterson <rpeterso(a)redhat.com>
gfs2: Add missing truncate_inode_pages_final for sd_aspace
Bob Peterson <rpeterso(a)redhat.com>
gfs2: Free rd_bits later in gfs2_clear_rgrpd to fix use-after-free
Evgeny Novikov <novikov(a)ispras.ru>
usb: gadget: goku_udc: fix potential crashes in probe
Masashi Honma <masashi.honma(a)gmail.com>
ath9k_htc: Use appropriate rs_datalen type
Filipe Manana <fdmanana(a)suse.com>
Btrfs: fix missing error return if writeback for extent buffer never started
Brian Foster <bfoster(a)redhat.com>
xfs: flush new eof page on truncate to avoid post-eof corruption
Stephane Grosjean <s.grosjean(a)peak-system.com>
can: peak_canfd: pucan_handle_can_rx(): fix echo management when loopback is on
Stephane Grosjean <s.grosjean(a)peak-system.com>
can: peak_usb: peak_usb_get_ts_time(): fix timestamp wrapping
Dan Carpenter <dan.carpenter(a)oracle.com>
can: peak_usb: add range checking in decode operations
Oleksij Rempel <o.rempel(a)pengutronix.de>
can: can_create_echo_skb(): fix echo skb generation: always use skb_clone()
Oliver Hartkopp <socketcan(a)hartkopp.net>
can: dev: __can_get_echo_skb(): fix real payload length return value for RTR frames
Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
can: dev: can_get_echo_skb(): prevent call to kfree_skb() in hard IRQ context
Marc Kleine-Budde <mkl(a)pengutronix.de>
can: rx-offload: don't call kfree_skb() from IRQ context
Dan Carpenter <dan.carpenter(a)oracle.com>
ALSA: hda: prevent undefined shift in snd_hdac_ext_bus_get_link()
Jiri Olsa <jolsa(a)kernel.org>
perf tools: Add missing swap for ino_generation
zhuoliang zhang <zhuoliang.zhang(a)mediatek.com>
net: xfrm: fix a race condition during allocing spi
Olaf Hering <olaf(a)aepfle.de>
hv_balloon: disable warning when floor reached
Marc Zyngier <maz(a)kernel.org>
genirq: Let GENERIC_IRQ_IPI select IRQ_DOMAIN_HIERARCHY
Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
btrfs: reschedule when cloning lots of extents
Josef Bacik <josef(a)toxicpanda.com>
btrfs: sysfs: init devices outside of the chunk_mutex
Ming Lei <ming.lei(a)redhat.com>
nbd: don't update block size after device is started
Zeng Tao <prime.zeng(a)hisilicon.com>
time: Prevent undefined behaviour in timespec64_to_ns()
Shijie Luo <luoshijie1(a)huawei.com>
mm: mempolicy: fix potential pte_unmap_unlock pte error
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ring-buffer: Fix recursion protection transitions between interrupt context
Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
regulator: defer probe when trying to get voltage from unresolved supply
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 8 +
Makefile | 4 +-
arch/arm/include/asm/kprobes.h | 22 +-
arch/arm/probes/kprobes/opt-arm.c | 18 +-
arch/x86/events/intel/pt.c | 4 +-
arch/x86/kernel/cpu/bugs.c | 52 ++-
drivers/block/nbd.c | 10 +-
drivers/block/xen-blkback/blkback.c | 22 +-
drivers/block/xen-blkback/xenbus.c | 5 +-
drivers/char/random.c | 1 -
drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 27 +-
drivers/gpu/drm/gma500/psb_irq.c | 34 +-
drivers/hv/hv_balloon.c | 2 +-
drivers/iommu/amd_iommu_types.h | 6 +-
drivers/misc/mei/client.h | 4 +-
drivers/net/can/dev.c | 14 +-
drivers/net/can/peak_canfd/peak_canfd.c | 11 +-
drivers/net/can/rx-offload.c | 4 +-
drivers/net/can/usb/peak_usb/pcan_usb_core.c | 51 ++-
drivers/net/can/usb/peak_usb/pcan_usb_fd.c | 48 ++-
drivers/net/ethernet/realtek/r8169.c | 3 +-
drivers/net/vrf.c | 92 +++--
drivers/net/wan/cosa.c | 1 +
drivers/net/wireless/ath/ath9k/htc_drv_txrx.c | 2 +-
drivers/net/xen-netback/common.h | 15 +
drivers/net/xen-netback/interface.c | 61 +++-
drivers/net/xen-netback/netback.c | 11 +-
drivers/net/xen-netback/rx.c | 13 +-
drivers/of/address.c | 4 +-
drivers/pinctrl/aspeed/pinctrl-aspeed.c | 7 +-
drivers/pinctrl/intel/pinctrl-intel.c | 8 +
drivers/pinctrl/pinctrl-amd.c | 6 +-
drivers/regulator/core.c | 2 +
drivers/scsi/device_handler/scsi_dh_alua.c | 9 +-
drivers/scsi/hpsa.c | 4 +-
drivers/thunderbolt/nhi.c | 19 +-
drivers/uio/uio.c | 10 +-
drivers/usb/class/cdc-acm.c | 9 +
drivers/usb/gadget/udc/goku_udc.c | 2 +-
drivers/xen/events/events_2l.c | 9 +-
drivers/xen/events/events_base.c | 422 ++++++++++++++++++++--
drivers/xen/events/events_fifo.c | 83 ++---
drivers/xen/events/events_internal.h | 20 +-
drivers/xen/evtchn.c | 7 +-
drivers/xen/pvcalls-back.c | 76 ++--
drivers/xen/xen-pciback/pci_stub.c | 14 +-
drivers/xen/xen-pciback/pciback.h | 12 +-
drivers/xen/xen-pciback/pciback_ops.c | 48 ++-
drivers/xen/xen-pciback/xenbus.c | 2 +-
drivers/xen/xen-scsiback.c | 23 +-
fs/btrfs/extent_io.c | 4 +
fs/btrfs/ioctl.c | 2 +
fs/btrfs/volumes.c | 7 +-
fs/cifs/cifs_unicode.c | 8 +-
fs/ext4/inline.c | 1 +
fs/ext4/super.c | 4 +-
fs/gfs2/rgrp.c | 5 +-
fs/gfs2/super.c | 1 +
fs/ocfs2/super.c | 1 +
fs/xfs/libxfs/xfs_rmap.c | 2 +-
fs/xfs/libxfs/xfs_rmap_btree.c | 16 +-
fs/xfs/xfs_iops.c | 10 +
fs/xfs/xfs_pnfs.c | 2 +-
include/linux/can/skb.h | 20 +-
include/linux/perf_event.h | 2 +-
include/linux/prandom.h | 36 +-
include/linux/time64.h | 4 +
include/xen/events.h | 29 +-
kernel/events/core.c | 44 +--
kernel/events/internal.h | 2 +-
kernel/exit.c | 5 +-
kernel/futex.c | 5 +-
kernel/irq/Kconfig | 1 +
kernel/reboot.c | 28 +-
kernel/time/itimer.c | 4 -
kernel/time/timer.c | 7 -
kernel/trace/ring_buffer.c | 54 ++-
lib/random32.c | 462 +++++++++++++++---------
lib/swiotlb.c | 6 +-
mm/mempolicy.c | 6 +-
net/ipv4/syncookies.c | 9 +-
net/ipv6/sit.c | 2 -
net/ipv6/syncookies.c | 10 +-
net/iucv/af_iucv.c | 3 +-
net/mac80211/sta_info.c | 18 +
net/mac80211/tx.c | 35 +-
net/wireless/reg.c | 2 +-
net/x25/af_x25.c | 2 +-
net/xfrm/xfrm_state.c | 8 +-
security/selinux/ibpkey.c | 4 +-
sound/hda/ext/hdac_ext_controller.c | 2 +
tools/perf/util/session.c | 1 +
92 files changed, 1585 insertions(+), 630 deletions(-)
From: Eric Biggers <ebiggers(a)google.com>
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on f2fs by rejecting no-key dentries in f2fs_add_link().
Note that the weird check for the current task in f2fs_do_add_link()
seems to make this bug difficult to reproduce on f2fs.
Fixes: 9ea97163c6da ("f2fs crypto: add filename encryption for f2fs_add_link")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/f2fs/f2fs.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index cb700d797296..9a321c52face 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3251,6 +3251,8 @@ bool f2fs_empty_dir(struct inode *dir);
static inline int f2fs_add_link(struct dentry *dentry, struct inode *inode)
{
+ if (fscrypt_is_nokey_name(dentry))
+ return -ENOKEY;
return f2fs_do_add_link(d_inode(dentry->d_parent), &dentry->d_name,
inode, inode->i_ino, inode->i_mode);
}
--
2.29.2
From: Eric Biggers <ebiggers(a)google.com>
As described in "fscrypt: add fscrypt_is_nokey_name()", it's possible to
create a duplicate filename in an encrypted directory by creating a file
concurrently with adding the directory's encryption key.
Fix this bug on ext4 by rejecting no-key dentries in ext4_add_entry().
Note that the duplicate check in ext4_find_dest_de() sometimes prevented
this bug. However, in many cases it didn't, since ext4_find_dest_de()
doesn't examine every dentry.
Fixes: 4461471107b7 ("ext4 crypto: enable filename encryption")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/ext4/namei.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 33509266f5a0..793fc7db9d28 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2195,6 +2195,9 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
if (!dentry->d_name.len)
return -EINVAL;
+ if (fscrypt_is_nokey_name(dentry))
+ return -ENOKEY;
+
#ifdef CONFIG_UNICODE
if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) &&
sb->s_encoding && utf8_validate(sb->s_encoding, &dentry->d_name))
--
2.29.2
From: Eric Biggers <ebiggers(a)google.com>
It's possible to create a duplicate filename in an encrypted directory
by creating a file concurrently with adding the encryption key.
Specifically, sys_open(O_CREAT) (or sys_mkdir(), sys_mknod(), or
sys_symlink()) can look up the target filename while the directory's
encryption key hasn't been added yet, resulting in a negative no-key
dentry. The VFS then calls ->create() (or ->mkdir(), ->mknod(), or
->symlink()) because the dentry is negative. Normally, ->create() would
return -ENOKEY due to the directory's key being unavailable. However,
if the key was added between the dentry lookup and ->create(), then the
filesystem will go ahead and try to create the file.
If the target filename happens to already exist as a normal name (not a
no-key name), a duplicate filename may be added to the directory.
In order to fix this, we need to fix the filesystems to prevent
->create(), ->mkdir(), ->mknod(), and ->symlink() on no-key names.
(->rename() and ->link() need it too, but those are already handled
correctly by fscrypt_prepare_rename() and fscrypt_prepare_link().)
In preparation for this, add a helper function fscrypt_is_nokey_name()
that filesystems can use to do this check. Use this helper function for
the existing checks that fs/crypto/ does for rename and link.
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/crypto/hooks.c | 5 +++--
include/linux/fscrypt.h | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 37 insertions(+), 2 deletions(-)
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
index 20b0df47fe6a..061418be4b08 100644
--- a/fs/crypto/hooks.c
+++ b/fs/crypto/hooks.c
@@ -61,7 +61,7 @@ int __fscrypt_prepare_link(struct inode *inode, struct inode *dir,
return err;
/* ... in case we looked up no-key name before key was added */
- if (dentry->d_flags & DCACHE_NOKEY_NAME)
+ if (fscrypt_is_nokey_name(dentry))
return -ENOKEY;
if (!fscrypt_has_permitted_context(dir, inode))
@@ -86,7 +86,8 @@ int __fscrypt_prepare_rename(struct inode *old_dir, struct dentry *old_dentry,
return err;
/* ... in case we looked up no-key name(s) before key was added */
- if ((old_dentry->d_flags | new_dentry->d_flags) & DCACHE_NOKEY_NAME)
+ if (fscrypt_is_nokey_name(old_dentry) ||
+ fscrypt_is_nokey_name(new_dentry))
return -ENOKEY;
if (old_dir != new_dir) {
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
index a8f7a43f031b..8e1d31c959bf 100644
--- a/include/linux/fscrypt.h
+++ b/include/linux/fscrypt.h
@@ -111,6 +111,35 @@ static inline void fscrypt_handle_d_move(struct dentry *dentry)
dentry->d_flags &= ~DCACHE_NOKEY_NAME;
}
+/**
+ * fscrypt_is_nokey_name() - test whether a dentry is a no-key name
+ * @dentry: the dentry to check
+ *
+ * This returns true if the dentry is a no-key dentry. A no-key dentry is a
+ * dentry that was created in an encrypted directory that hasn't had its
+ * encryption key added yet. Such dentries may be either positive or negative.
+ *
+ * When a filesystem is asked to create a new filename in an encrypted directory
+ * and the new filename's dentry is a no-key dentry, it must fail the operation
+ * with ENOKEY. This includes ->create(), ->mkdir(), ->mknod(), ->symlink(),
+ * ->rename(), and ->link(). (However, ->rename() and ->link() are already
+ * handled by fscrypt_prepare_rename() and fscrypt_prepare_link().)
+ *
+ * This is necessary because creating a filename requires the directory's
+ * encryption key, but just checking for the key on the directory inode during
+ * the final filesystem operation doesn't guarantee that the key was available
+ * during the preceding dentry lookup. And the key must have already been
+ * available during the dentry lookup in order for it to have been checked
+ * whether the filename already exists in the directory and for the new file's
+ * dentry not to be invalidated due to it incorrectly having the no-key flag.
+ *
+ * Return: %true if the dentry is a no-key name
+ */
+static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
+{
+ return dentry->d_flags & DCACHE_NOKEY_NAME;
+}
+
/* crypto.c */
void fscrypt_enqueue_decrypt_work(struct work_struct *);
@@ -244,6 +273,11 @@ static inline void fscrypt_handle_d_move(struct dentry *dentry)
{
}
+static inline bool fscrypt_is_nokey_name(const struct dentry *dentry)
+{
+ return false;
+}
+
/* crypto.c */
static inline void fscrypt_enqueue_decrypt_work(struct work_struct *work)
{
--
2.29.2