A 5-level paging capable machine can have memory above 46-bit in the
physical address space. This memory is only addressable in the 5-level
paging mode: we don't have enough virtual address space to create direct
mapping for such memory in the 4-level paging mode.
Currently, we fail boot completely: NULL pointer dereference in
subsection_map_init().
Skip creating a memblock for such memory instead and notify the user that
some memory is not addressable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # v4.14
---
Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
---
arch/x86/kernel/e820.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..022fe1de8940 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1307,7 +1307,14 @@ void __init e820__memblock_setup(void)
if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
continue;
- memblock_add(entry->addr, entry->size);
+ if (entry->addr >= MAXMEM || end >= MAXMEM)
+ pr_err_once("Some physical memory is not addressable in the paging mode.\n");
+
+ if (entry->addr >= MAXMEM)
+ continue;
+
+ end = min_t(u64, end, MAXMEM - 1);
+ memblock_add(entry->addr, end - entry->addr);
}
/* Throw away partial pages: */
--
2.26.2
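The clamping described in the patch above can be sketched in userspace as follows. This is only an illustrative model, not the kernel code: clamp_region() and max_addr are hypothetical names, with max_addr standing in for MAXMEM, and the "MAXMEM - 1" trimming is copied from the diff as posted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the e820 clamping logic; names are illustrative.
 * Returns true and stores the usable length in *len when at least part of
 * the region lies below max_addr, false when the region must be skipped.
 */
static bool clamp_region(uint64_t addr, uint64_t size, uint64_t max_addr,
			 uint64_t *len)
{
	uint64_t end = addr + size;

	assert(max_addr > 0);

	/* The whole region starts above the addressable limit: skip it. */
	if (addr >= max_addr)
		return false;

	/* Trim the tail, mirroring min_t(u64, end, MAXMEM - 1) in the diff. */
	if (end > max_addr - 1)
		end = max_addr - 1;

	*len = end - addr;
	return true;
}
```

With max_addr set to 1ULL << 46, a region that straddles the limit is shortened rather than dropped, and a region that starts at or above the limit is skipped entirely.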
GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regard to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.
Cc: linux-kbuild(a)vger.kernel.org
Cc: x86(a)kernel.org
Cc: stable(a)vger.kernel.org
Cc: hjl.tools(a)gmail.com
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Jakub Jelinek <jakub(a)redhat.com>
Cc: Oleksandr Natalenko <oleksandr(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: David Laight <David.Laight(a)aculab.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
Changes v1->v2:
- [Oleksandr] Remove O3 dependency on ARC.
init/Kconfig | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..f76ec3ccc883 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG
choice
prompt "Compiler optimization level"
- default CC_OPTIMIZE_FOR_PERFORMANCE
+ default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 100000
+ default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 100000 || CC_IS_CLANG)
config CC_OPTIMIZE_FOR_PERFORMANCE
bool "Optimize for performance (-O2)"
@@ -1256,7 +1257,6 @@ config CC_OPTIMIZE_FOR_PERFORMANCE
config CC_OPTIMIZE_FOR_PERFORMANCE_O3
bool "Optimize more for performance (-O3)"
- depends on ARC
imply CC_DISABLE_WARN_MAYBE_UNINITIALIZED # avoid false positives
help
Choosing this option will pass "-O3" to your compiler to optimize
--
2.26.2
This is the start of the stable review cycle for the 4.9.223 release.
There are 18 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 10 May 2020 12:29:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.223-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.223-rc1
Thomas Pedersen <thomas(a)adapt-ip.com>
mac80211: add ieee80211_is_any_nullfunc()
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Match both PCI ID and SSID for driver blacklist
Jere Leppänen <jere.leppanen(a)nokia.com>
sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
Marcin Nowakowski <marcin.nowakowski(a)imgtec.com>
MIPS: perf: Remove incorrect odd/even counter handling for I6400
Chuck Lever <chuck.lever(a)oracle.com>
xprtrdma: Fix backchannel allocation of extra rpcrdma_reps
Doug Berger <opendmb(a)gmail.com>
net: systemport: suppress warnings on failed Rx SKB allocations
Doug Berger <opendmb(a)gmail.com>
net: bcmgenet: suppress warnings on failed Rx SKB allocations
Nathan Chancellor <natechancellor(a)gmail.com>
lib/mpi: Fix building for powerpc with clang
Florian Fainelli <f.fainelli(a)gmail.com>
net: dsa: b53: Rework ARL bin logic
Jeremie Francois (on alpha) <jeremie.francois(a)gmail.com>
scripts/config: allow colons in option strings for sed
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: protect updating server->dstaddr with a spinlock
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: Fix sub-second increment
Xiyu Yang <xiyuyang19(a)fudan.edu.cn>
wimax/i2400m: Fix potential urb refcnt leak
Sebastian Reichel <sebastian.reichel(a)collabora.com>
ASoC: sgtl5000: Fix VAG power-on handling
Tyler Hicks <tyhicks(a)linux.microsoft.com>
selftests/ipc: Fix test failure seen after initial test run
YueHaibing <yuehaibing(a)huawei.com>
iio:ad7797: Use correct attribute_group
Alexey Kardashevskiy <aik(a)ozlabs.ru>
powerpc/pci/of: Parse unassigned resources
Jia He <justin.he(a)arm.com>
vhost: vsock: kick send_pkt worker once device is started
-------------
Diffstat:
Makefile | 4 +--
arch/mips/kernel/perf_event_mipsxx.c | 6 +++-
arch/powerpc/kernel/pci_of_scan.c | 12 ++++++--
drivers/iio/adc/ad7793.c | 2 +-
drivers/net/dsa/b53/b53_common.c | 30 ++++++++++++++++---
drivers/net/dsa/b53/b53_regs.h | 3 ++
drivers/net/ethernet/broadcom/bcmsysport.c | 3 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 3 +-
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 12 +++++---
drivers/net/wimax/i2400m/usb-fw.c | 1 +
drivers/vhost/vsock.c | 5 ++++
fs/cifs/connect.c | 2 ++
include/linux/ieee80211.h | 9 ++++++
lib/mpi/longlong.h | 34 +++++++++++-----------
net/mac80211/mlme.c | 2 +-
net/mac80211/rx.c | 8 ++---
net/mac80211/status.c | 5 ++--
net/mac80211/tx.c | 2 +-
net/sctp/sm_make_chunk.c | 6 +++-
net/sunrpc/xprtrdma/backchannel.c | 12 ++------
net/sunrpc/xprtrdma/verbs.c | 34 +++++++++++++---------
net/sunrpc/xprtrdma/xprt_rdma.h | 2 +-
scripts/config | 5 +++-
sound/pci/hda/hda_intel.c | 9 +++---
sound/soc/codecs/sgtl5000.c | 34 ++++++++++++++++++++++
sound/soc/codecs/sgtl5000.h | 1 +
tools/testing/selftests/ipc/msgque.c | 2 +-
27 files changed, 173 insertions(+), 75 deletions(-)
This is the start of the stable review cycle for the 4.14.180 release.
There are 22 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 10 May 2020 12:29:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.180-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.180-rc1
Jiri Slaby <jslaby(a)suse.cz>
cgroup, netclassid: remove double cond_resched
Thomas Pedersen <thomas(a)adapt-ip.com>
mac80211: add ieee80211_is_any_nullfunc()
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Match both PCI ID and SSID for driver blacklist
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Reverse the order of trace_types_lock and event_mutex
Jere Leppänen <jere.leppanen(a)nokia.com>
sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
Doug Berger <opendmb(a)gmail.com>
net: systemport: suppress warnings on failed Rx SKB allocations
Doug Berger <opendmb(a)gmail.com>
net: bcmgenet: suppress warnings on failed Rx SKB allocations
Nathan Chancellor <natechancellor(a)gmail.com>
lib/mpi: Fix building for powerpc with clang
Florian Fainelli <f.fainelli(a)gmail.com>
net: dsa: b53: Rework ARL bin logic
Jeremie Francois (on alpha) <jeremie.francois(a)gmail.com>
scripts/config: allow colons in option strings for sed
Philipp Rudo <prudo(a)linux.ibm.com>
s390/ftrace: fix potential crashes when switching tracers
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: protect updating server->dstaddr with a spinlock
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: Fix sub-second increment
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: fix enabling socfpga's ptp_ref_clock
Xiyu Yang <xiyuyang19(a)fudan.edu.cn>
wimax/i2400m: Fix potential urb refcnt leak
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: codecs: hdac_hdmi: Fix incorrect use of list_for_each_entry
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix HDMI channel mapping for multi-SSI mode
Sebastian Reichel <sebastian.reichel(a)collabora.com>
ASoC: sgtl5000: Fix VAG power-on handling
Tyler Hicks <tyhicks(a)linux.microsoft.com>
selftests/ipc: Fix test failure seen after initial test run
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check return value of pcm_new_ver
Alexey Kardashevskiy <aik(a)ozlabs.ru>
powerpc/pci/of: Parse unassigned resources
Jia He <justin.he(a)arm.com>
vhost: vsock: kick send_pkt worker once device is started
-------------
Diffstat:
Makefile | 4 +--
arch/powerpc/kernel/pci_of_scan.c | 12 ++++++--
arch/s390/kernel/diag.c | 2 +-
arch/s390/kernel/smp.c | 4 +--
arch/s390/kernel/trace.c | 2 +-
drivers/net/dsa/b53/b53_common.c | 30 ++++++++++++++++---
drivers/net/dsa/b53/b53_regs.h | 3 ++
drivers/net/ethernet/broadcom/bcmsysport.c | 3 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 3 +-
.../net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 9 ++++--
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 12 +++++---
drivers/net/wimax/i2400m/usb-fw.c | 1 +
drivers/vhost/vsock.c | 5 ++++
fs/cifs/connect.c | 2 ++
include/linux/ieee80211.h | 9 ++++++
kernel/trace/trace.c | 5 ++++
kernel/trace/trace_events.c | 31 ++++++++++----------
lib/mpi/longlong.h | 34 +++++++++++-----------
net/core/netclassid_cgroup.c | 4 +--
net/mac80211/mlme.c | 2 +-
net/mac80211/rx.c | 8 ++---
net/mac80211/status.c | 5 ++--
net/mac80211/tx.c | 2 +-
net/sctp/sm_make_chunk.c | 6 +++-
scripts/config | 5 +++-
sound/pci/hda/hda_intel.c | 9 +++---
sound/soc/codecs/hdac_hdmi.c | 6 ++--
sound/soc/codecs/sgtl5000.c | 34 ++++++++++++++++++++++
sound/soc/codecs/sgtl5000.h | 1 +
sound/soc/sh/rcar/ssiu.c | 2 +-
sound/soc/soc-topology.c | 4 ++-
tools/testing/selftests/ipc/msgque.c | 2 +-
32 files changed, 182 insertions(+), 79 deletions(-)
This is the start of the stable review cycle for the 4.19.122 release.
There are 32 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 10 May 2020 12:29:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.122-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.122-rc1
Daniel Vetter <daniel.vetter(a)ffwll.ch>
drm/atomic: Take the atomic toys away from X
Jiri Slaby <jslaby(a)suse.cz>
cgroup, netclassid: remove double cond_resched
Thomas Pedersen <thomas(a)adapt-ip.com>
mac80211: add ieee80211_is_any_nullfunc()
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: GPD pocket fan: Fix error message when temp-limits are out of range
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Match both PCI ID and SSID for driver blacklist
Nick Desaulniers <ndesaulniers(a)google.com>
hexagon: define ioremap_uc
Christoph Hellwig <hch(a)lst.de>
hexagon: clean up ioremap
Tuowen Zhao <ztuowen(a)gmail.com>
mfd: intel-lpss: Use devm_ioremap_uc for MMIO
Tuowen Zhao <ztuowen(a)gmail.com>
lib: devres: add a helper function for ioremap_uc
Aaron Ma <aaron.ma(a)canonical.com>
drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event
Jere Leppänen <jere.leppanen(a)nokia.com>
sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
Doug Berger <opendmb(a)gmail.com>
net: systemport: suppress warnings on failed Rx SKB allocations
Doug Berger <opendmb(a)gmail.com>
net: bcmgenet: suppress warnings on failed Rx SKB allocations
Nathan Chancellor <natechancellor(a)gmail.com>
lib/mpi: Fix building for powerpc with clang
Jeremie Francois (on alpha) <jeremie.francois(a)gmail.com>
scripts/config: allow colons in option strings for sed
Philipp Rudo <prudo(a)linux.ibm.com>
s390/ftrace: fix potential crashes when switching tracers
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: protect updating server->dstaddr with a spinlock
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix "status check failed" spam for multi-SSI
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Don't treat master SSI in multi SSI setup as parent
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: Fix sub-second increment
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: fix enabling socfpga's ptp_ref_clock
Xiyu Yang <xiyuyang19(a)fudan.edu.cn>
wimax/i2400m: Fix potential urb refcnt leak
Sandeep Raghuraman <sandy.8925(a)gmail.com>
drm/amdgpu: Correctly initialize thermal controller for GPUs with Powerplay table v0 (e.g Hawaii)
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: codecs: hdac_hdmi: Fix incorrect use of list_for_each_entry
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix HDMI channel mapping for multi-SSI mode
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix parent SSI start/stop in multi-SSI mode
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Properly set maxpacket limit
Sebastian Reichel <sebastian.reichel(a)collabora.com>
ASoC: sgtl5000: Fix VAG power-on handling
Tyler Hicks <tyhicks(a)linux.microsoft.com>
selftests/ipc: Fix test failure seen after initial test run
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check return value of pcm_new_ver
Alexey Kardashevskiy <aik(a)ozlabs.ru>
powerpc/pci/of: Parse unassigned resources
Jia He <justin.he(a)arm.com>
vhost: vsock: kick send_pkt worker once device is started
-------------
Diffstat:
Makefile | 4 +-
arch/hexagon/include/asm/io.h | 12 ++---
arch/hexagon/kernel/hexagon_ksyms.c | 2 +-
arch/hexagon/mm/ioremap.c | 2 +-
arch/powerpc/kernel/pci_of_scan.c | 12 ++++-
arch/s390/kernel/diag.c | 2 +-
arch/s390/kernel/smp.c | 4 +-
arch/s390/kernel/trace.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 3 +-
.../gpu/drm/amd/powerplay/hwmgr/processpptables.c | 26 +++++++++++
drivers/gpu/drm/drm_ioctl.c | 7 ++-
drivers/mfd/intel-lpss.c | 2 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 3 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 3 +-
.../net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 9 ++--
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 12 +++--
drivers/net/wimax/i2400m/usb-fw.c | 1 +
drivers/platform/x86/gpd-pocket-fan.c | 2 +-
drivers/usb/dwc3/core.h | 4 ++
drivers/usb/dwc3/gadget.c | 52 +++++++++++++++++-----
drivers/vhost/vsock.c | 5 +++
fs/cifs/connect.c | 2 +
include/linux/ieee80211.h | 9 ++++
include/linux/io.h | 2 +
lib/devres.c | 19 ++++++++
lib/mpi/longlong.h | 34 +++++++-------
net/core/netclassid_cgroup.c | 4 +-
net/mac80211/mlme.c | 2 +-
net/mac80211/rx.c | 8 ++--
net/mac80211/status.c | 5 +--
net/mac80211/tx.c | 2 +-
net/sctp/sm_make_chunk.c | 6 ++-
scripts/config | 5 ++-
sound/pci/hda/hda_intel.c | 9 ++--
sound/soc/codecs/hdac_hdmi.c | 6 +--
sound/soc/codecs/sgtl5000.c | 34 ++++++++++++++
sound/soc/codecs/sgtl5000.h | 1 +
sound/soc/sh/rcar/ssi.c | 11 ++++-
sound/soc/sh/rcar/ssiu.c | 2 +-
sound/soc/soc-topology.c | 4 +-
tools/testing/selftests/ipc/msgque.c | 2 +-
41 files changed, 250 insertions(+), 86 deletions(-)
Changes since v1:
- Rename memcpy_mcsafe() to copy_safe() since the x86-machine-check
specifics have already been de-emphasized in a previous commit and are
further de-emphasized by these changes. (Linus)
- Move copy_safe() out-of-line since it no longer reverts to plain
memcpy (Linus)
- Move copy_safe() to its own stand-alone compilation unit where it no
longer entangles with arch/x86/lib/memcpy_64.S. This also allows perf
to stop tracking ongoing updates to that file due to copy_safe()
updates. (Linus)
- Move the PowerPC implementation over to the new name.
[1]: http://lore.kernel.org/r/158654083112.1572482.8944305411228188871.stgit@dwi…
---
The primary motivation to go touch memcpy_mcsafe() is that the existing
benefit of doing slow and careful copies is obviated on newer CPUs. That
fact solves the problem of needing to detect machine-check recovery
capability. Now the old "mcsafe_key" opt-in to careful copying can be made
an opt-out from the default fast copy implementation.
The discussion with Linus further made clear that this facility had
already lost its x86-machine-check specificity starting with commit
2c89130a56a ("x86/asm/memcpy_mcsafe: Add write-protection-fault
handling"). The new changes to not require a "careful copy" further
de-emphasize the role that x86-MCA plays in the implementation to just
one more source of recoverable trap during the operation.
With the above realizations the name "mcsafe" is no longer accurate and
copy_safe() is proposed as its replacement. x86 grows a copy_safe_fast()
implementation as a default implementation that is independent of
detecting the presence of x86-MCA.
---
Dan Williams (2):
copy_safe: Rename memcpy_mcsafe() to copy_safe()
x86/copy_safe: Introduce copy_safe_fast()
arch/powerpc/Kconfig | 2
arch/powerpc/include/asm/string.h | 2
arch/powerpc/include/asm/uaccess.h | 4
arch/powerpc/lib/Makefile | 2
arch/powerpc/lib/copy_safe.S | 4
arch/x86/Kconfig | 2
arch/x86/Kconfig.debug | 2
arch/x86/include/asm/copy_safe.h | 18 ++
arch/x86/include/asm/copy_safe_test.h | 75 +++++++++
arch/x86/include/asm/mcsafe_test.h | 75 ---------
arch/x86/include/asm/string_64.h | 32 ----
arch/x86/include/asm/uaccess_64.h | 21 ---
arch/x86/kernel/cpu/mce/core.c | 9 -
arch/x86/kernel/quirks.c | 10 -
arch/x86/lib/Makefile | 1
arch/x86/lib/copy_safe.c | 66 ++++++++
arch/x86/lib/copy_safe_64.S | 163 ++++++++++++++++++++
arch/x86/lib/memcpy_64.S | 115 --------------
arch/x86/lib/usercopy_64.c | 21 ---
drivers/md/dm-writecache.c | 12 +
drivers/nvdimm/claim.c | 2
drivers/nvdimm/pmem.c | 6 -
include/linux/string.h | 17 +-
include/linux/uio.h | 10 +
lib/Kconfig | 2
lib/iov_iter.c | 36 ++--
tools/arch/x86/include/asm/copy_safe_test.h | 13 ++
tools/arch/x86/include/asm/mcsafe_test.h | 13 --
tools/arch/x86/lib/memcpy_64.S | 115 --------------
tools/objtool/check.c | 5 -
tools/perf/bench/Build | 1
tools/perf/bench/mem-memcpy-x86-64-lib.c | 24 ---
tools/testing/nvdimm/test/nfit.c | 49 +++---
.../testing/selftests/powerpc/copyloops/.gitignore | 2
tools/testing/selftests/powerpc/copyloops/Makefile | 6 -
.../selftests/powerpc/copyloops/copy_safe.S | 0
36 files changed, 429 insertions(+), 508 deletions(-)
rename arch/powerpc/lib/{memcpy_mcsafe_64.S => copy_safe.S} (98%)
create mode 100644 arch/x86/include/asm/copy_safe.h
create mode 100644 arch/x86/include/asm/copy_safe_test.h
delete mode 100644 arch/x86/include/asm/mcsafe_test.h
create mode 100644 arch/x86/lib/copy_safe.c
create mode 100644 arch/x86/lib/copy_safe_64.S
create mode 100644 tools/arch/x86/include/asm/copy_safe_test.h
delete mode 100644 tools/arch/x86/include/asm/mcsafe_test.h
delete mode 100644 tools/perf/bench/mem-memcpy-x86-64-lib.c
rename tools/testing/selftests/powerpc/copyloops/{memcpy_mcsafe_64.S => copy_safe.S} (100%)
base-commit: b8dcd632c06b8706d22934f9bf9bf16a42b1ecc7
From: Yasunori Goto <y-goto(a)jp.fujitsu.com>
The root cause of the panic is that the num_pm of nfit_test1 is wrong.
Although 1 is specified for num_pm in nfit_test_init(), it must be 2,
because the nfit_test1->spa_set[] array has 2 elements.
Since the array is smaller than expected, the driver corrupts another area
(often the linked list of devres).
As a result, a panic occurs, as in the following example.
CPU: 4 PID: 2233 Comm: lt-libndctl Tainted: G O 4.12.0-rc1+ #12
RIP: 0010:__list_del_entry_valid+0x6c/0xa0
Call Trace:
release_nodes+0x76/0x260
devres_release_all+0x3c/0x50
device_release_driver_internal+0x159/0x200
device_release_driver+0x12/0x20
bus_remove_device+0xfd/0x170
device_del+0x1e8/0x330
platform_device_del+0x28/0x90
platform_device_unregister+0x12/0x30
nfit_test_exit+0x2a/0x93b [nfit_test]
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Yasunori Goto <y-goto(a)jp.fujitsu.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
tools/testing/nvdimm/test/nfit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/nvdimm/test/nfit.c b/tools/testing/nvdimm/test/nfit.c
index 47ab1ab..ffdb7d0 100644
--- a/tools/testing/nvdimm/test/nfit.c
+++ b/tools/testing/nvdimm/test/nfit.c
@@ -2388,7 +2388,7 @@ static __init int nfit_test_init(void)
nfit_test->setup = nfit_test0_setup;
break;
case 1:
- nfit_test->num_pm = 1;
+ nfit_test->num_pm = 2;
nfit_test->dcr_idx = NUM_DCR;
nfit_test->num_dcr = 2;
nfit_test->alloc = nfit_test1_alloc;
--
1.8.3.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Revert back to comparing fb->format->format instead of fb->format for the
page flip ioctl. This check was originally only here to disallow pixel
format changes, but when we changed it to do the pointer comparison
we potentially started to reject some (but definitely not all) modifier
changes as well. In fact the current behaviour depends on whether the
driver overrides the format info for a specific format+modifier combo.
Eg. on i915 this now rejects compression vs. no compression changes but
does not reject any other tiling changes. That's just inconsistent
nonsense.
The main reason we have to go back to the old behaviour is to fix page
flipping with Xorg. At some point Xorg got its atomic rights taken away
and since then we can't page flip between compressed and non-compressed
fbs on i915. Currently we get no page flipping for any games pretty much
since Mesa likes to use compressed buffers. Not sure how compositors are
working around this (don't use one myself). I guess they must be doing
something to get non-compressed buffers instead. Either that or
somehow no one noticed the tearing from the blit fallback.
Looking back at the original discussion on this change we pretty much
just did it in the name of skipping a few extra pointer dereferences.
However, I've decided not to revert the whole thing in case someone
has since started to depend on these changes. None of the other checks
are relevant for i915 anyways.
Cc: stable(a)vger.kernel.org
Cc: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Fixes: dbd4d5761e1f ("drm: Replace 'format->format' comparisons to just 'format' comparisons")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/drm_plane.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
index d6ad60ab0d38..f2ca5315f23b 100644
--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -1153,7 +1153,7 @@ int drm_mode_page_flip_ioctl(struct drm_device *dev,
if (ret)
goto out;
- if (old_fb->format != fb->format) {
+ if (old_fb->format->format != fb->format->format) {
DRM_DEBUG_KMS("Page flip is not allowed to change frame buffer format.\n");
ret = -EINVAL;
goto out;
--
2.24.1
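The difference between the two checks in the patch above can be illustrated with a small standalone sketch. The struct names and fields below are hypothetical stand-ins, not the real DRM types; the point is only that two framebuffers can share a pixel format code while pointing at different format info objects, for example when a driver overrides the info for a format+modifier combination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the format info and framebuffer types. */
struct format_info {
	uint32_t format;	/* pixel format code */
	int num_planes;		/* may differ per format+modifier combination */
};

struct framebuffer {
	const struct format_info *format;
};

/* Pointer comparison: rejects any change of the format info object. */
static bool flip_allowed_by_pointer(const struct framebuffer *old_fb,
				    const struct framebuffer *fb)
{
	return old_fb->format == fb->format;
}

/* Value comparison: rejects only a change of the pixel format code. */
static bool flip_allowed_by_format(const struct framebuffer *old_fb,
				   const struct framebuffer *fb)
{
	assert(old_fb->format && fb->format);
	return old_fb->format->format == fb->format->format;
}
```

Under these assumptions, a flip between an uncompressed and a compressed buffer of the same pixel format passes the value comparison but fails the pointer comparison.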
Dear Masami,
Commit de462e5f10 (bootconfig: Fix to remove bootconfig data from initrd
while boot) causes a cosmetic regression on my x86 system with Debian
Sid/unstable.
Despite having no `bootconfig` parameter on the Linux kernel command line,
the warning below is shown.
'bootconfig' found on command line, but no bootconfig found
Reverting the commit fixes it.
Kind regards,
Paul
During ONFI detection, the CRC derived from the parameter page and the
CRC supposed to be at the end of the parameter page are compared. If
they do not match, the second and then the third copy of the page are
tried.
The current implementation compares the newly derived CRC with the CRC
contained in the first page only. So if this particular CRC area has
been corrupted, then the detection will fail for a wrong reason.
Fix this issue by checking the derived CRC against the right one.
Fixes: 39138c1f4a31 ("mtd: rawnand: use bit-wise majority to recover the ONFI param page")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
drivers/mtd/nand/raw/nand_onfi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/nand_onfi.c b/drivers/mtd/nand/raw/nand_onfi.c
index 0b879bd0a68c..8fe8d7bdd203 100644
--- a/drivers/mtd/nand/raw/nand_onfi.c
+++ b/drivers/mtd/nand/raw/nand_onfi.c
@@ -173,7 +173,7 @@ int nand_onfi_detect(struct nand_chip *chip)
}
if (onfi_crc16(ONFI_CRC_BASE, (u8 *)&p[i], 254) ==
- le16_to_cpu(p->crc)) {
+ le16_to_cpu(p[i].crc)) {
if (i)
memcpy(p, &p[i], sizeof(*p));
break;
--
2.20.1
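A minimal userspace sketch of the corrected check follows, assuming a 256-byte parameter page copy whose last two bytes hold a little-endian CRC over the first 254 bytes. The helper names (crc16_onfi, stored_crc, first_valid_copy) are hypothetical; the CRC routine is a plain bitwise CRC-16 with polynomial 0x8005 and seed 0x4F4E, written here for illustration rather than taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PARAM_PAGE_SIZE	256	/* assumed: 254 payload bytes + 2 CRC bytes */
#define CRC_BASE	0x4F4E	/* CRC seed used for the parameter page */

/* Bitwise CRC-16, polynomial 0x8005, MSB first. */
static uint16_t crc16_onfi(uint16_t crc, const uint8_t *p, size_t len)
{
	while (len--) {
		crc ^= (uint16_t)(*p++ << 8);
		for (int i = 0; i < 8; i++)
			crc = (uint16_t)((crc << 1) ^
					 ((crc & 0x8000) ? 0x8005 : 0));
	}
	return crc;
}

/* Read the little-endian CRC stored in the last two bytes of one copy. */
static uint16_t stored_crc(const uint8_t *page)
{
	return (uint16_t)(page[254] | (page[255] << 8));
}

/*
 * Return the index of the first copy whose computed CRC matches the CRC
 * stored in that same copy, or -1 if no copy is self-consistent.
 */
static int first_valid_copy(uint8_t pages[][PARAM_PAGE_SIZE], int copies)
{
	assert(copies > 0);

	for (int i = 0; i < copies; i++) {
		if (crc16_onfi(CRC_BASE, pages[i], 254) == stored_crc(pages[i]))
			return i;
	}
	return -1;
}
```

If only the CRC field of the first copy is corrupted, this check still accepts the second copy, whereas comparing every copy against the first copy's CRC would reject all of them.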
19.03.2020 00:31, Mikhail Novosyolov wrote:
> Current pre-release version of LibreSSL has enabled CMS support,
> and now sign-file is fully functional with it.
>
> See https://github.com/libressl-portable/openbsd/commits/master
>
> To test buildability with current LibreSSL:
> ~$ git clone https://github.com/libressl-portable/portable.git
> ~$ cd portable && ./autogen.sh
> ~$ ./configure --prefix=/opt/libressl
> ~$ make
> ~# make install
> Go to the kernel source tree and:
> ~$ gcc -I/opt/libressl/include -L /opt/libressl/lib -lcrypto -Wl,-rpath,/opt/libressl/lib scripts/sign-file.c -o scripts/sign-file
>
> Fixes: f8688017 ("sign-file: fix build error in sign-file.c with libressl")
>
> Signed-off-by: Mikhail Novosyolov <m.novosyolov(a)rosalinux.ru>
I would like to remind everyone about this patch.
LibreSSL 3.1.1 has been released, and this patch (https://patchwork.kernel.org/patch/11446123/) is required to sign kernel modules using LibreSSL.
LibreSSL 3.1.1 can sign them with functional parity with OpenSSL.
> ---
> scripts/sign-file.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/scripts/sign-file.c b/scripts/sign-file.c
> index fbd34b8e8f57..fd4d7c31d1bf 100644
> --- a/scripts/sign-file.c
> +++ b/scripts/sign-file.c
> @@ -41,9 +41,10 @@
> * signing with anything other than SHA1 - so we're stuck with that if such is
> * the case.
> */
> -#if defined(LIBRESSL_VERSION_NUMBER) || \
> - OPENSSL_VERSION_NUMBER < 0x10000000L || \
> - defined(OPENSSL_NO_CMS)
> +#if defined(OPENSSL_NO_CMS) || \
> + ( defined(LIBRESSL_VERSION_NUMBER) \
> + && (LIBRESSL_VERSION_NUMBER < 0x3010000fL) ) || \
> + OPENSSL_VERSION_NUMBER < 0x10000000L
> #define USE_PKCS7
> #endif
> #ifndef USE_PKCS7
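The preprocessor condition in the quoted patch can be read as a predicate. The sketch below models it with a hypothetical struct ssl_build whose fields stand in for the macros; it is an illustration of the logic only, and the version numbers used with it are example values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the build-time macros tested by sign-file.c. */
struct ssl_build {
	bool no_cms;			/* models defined(OPENSSL_NO_CMS) */
	bool is_libressl;		/* models defined(LIBRESSL_VERSION_NUMBER) */
	unsigned long libressl_version;	/* models LIBRESSL_VERSION_NUMBER */
	unsigned long openssl_version;	/* models OPENSSL_VERSION_NUMBER */
};

/* True when the patched condition would define USE_PKCS7. */
static bool use_pkcs7(const struct ssl_build *b)
{
	assert(b);
	return b->no_cms ||
	       (b->is_libressl && b->libressl_version < 0x3010000fUL) ||
	       b->openssl_version < 0x10000000UL;
}
```

With this reading, a LibreSSL build at or above the 0x3010000f threshold takes the CMS path unless CMS is explicitly disabled, while older LibreSSL builds keep using PKCS#7.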
In the latter models of the RME Fireface series, the device starts to
transfer packets after several dozen milliseconds. On the other hand, the
ALSA fireface driver starts the IR context 2 milliseconds after the start.
This results in failure to handle incoming packets on the context.
This commit starts the IR context immediately instead of postponing it.
For Fireface 800, this changes nothing because the device transfers
packets 100 milliseconds or so after the start, and this is within the
wait timeout.
Cc: <stable(a)vger.kernel.org>
Fixes: acfedcbe1ce4 ("ALSA: firewire-lib: postpone to start IR context")
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
sound/firewire/fireface/ff-stream.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/sound/firewire/fireface/ff-stream.c b/sound/firewire/fireface/ff-stream.c
index 63b79c4a5405..5452115c0ef9 100644
--- a/sound/firewire/fireface/ff-stream.c
+++ b/sound/firewire/fireface/ff-stream.c
@@ -184,7 +184,6 @@ int snd_ff_stream_start_duplex(struct snd_ff *ff, unsigned int rate)
*/
if (!amdtp_stream_running(&ff->rx_stream)) {
int spd = fw_parent_device(ff->unit)->max_speed;
- unsigned int ir_delay_cycle;
err = ff->spec->protocol->begin_session(ff, rate);
if (err < 0)
@@ -200,14 +199,7 @@ int snd_ff_stream_start_duplex(struct snd_ff *ff, unsigned int rate)
if (err < 0)
goto error;
- // The device postpones start of transmission mostly for several
- // cycles after receiving packets firstly.
- if (ff->spec->protocol == &snd_ff_protocol_ff800)
- ir_delay_cycle = 800; // = 100 msec
- else
- ir_delay_cycle = 16; // = 2 msec
-
- err = amdtp_domain_start(&ff->domain, ir_delay_cycle);
+ err = amdtp_domain_start(&ff->domain, 0);
if (err < 0)
goto error;
--
2.25.1
128000 and 192000 are congruent modulo 32000, thus it's wrong to
distinguish multiples of 32000 from multiples of 48000 by testing
modulo 32000 first.
Additionally, the condition used to detect quadruple speed can miss
the bit flag.
Furthermore, the counter check that confirms the configuration is wrong
and causes false positives.
This commit fixes the above three bugs.
Cc: <stable(a)vger.kernel.org>
Fixes: 60aec494b389 ("ALSA: fireface: support allocate_resources operation in latter protocol")
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
sound/firewire/fireface/ff-protocol-latter.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/sound/firewire/fireface/ff-protocol-latter.c b/sound/firewire/fireface/ff-protocol-latter.c
index 0e4c3a9ed5e4..76ae568489ef 100644
--- a/sound/firewire/fireface/ff-protocol-latter.c
+++ b/sound/firewire/fireface/ff-protocol-latter.c
@@ -107,18 +107,18 @@ static int latter_allocate_resources(struct snd_ff *ff, unsigned int rate)
int err;
// Set the number of data blocks transferred in a second.
- if (rate % 32000 == 0)
- code = 0x00;
+ if (rate % 48000 == 0)
+ code = 0x04;
else if (rate % 44100 == 0)
code = 0x02;
- else if (rate % 48000 == 0)
- code = 0x04;
+ else if (rate % 32000 == 0)
+ code = 0x00;
else
return -EINVAL;
if (rate >= 64000 && rate < 128000)
code |= 0x08;
- else if (rate >= 128000 && rate < 192000)
+ else if (rate >= 128000)
code |= 0x10;
reg = cpu_to_le32(code);
@@ -140,7 +140,7 @@ static int latter_allocate_resources(struct snd_ff *ff, unsigned int rate)
if (curr_rate == rate)
break;
}
- if (count == 10)
+ if (count > 10)
return -ETIMEDOUT;
for (i = 0; i < ARRAY_SIZE(amdtp_rate_table); ++i) {
--
2.25.1
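The corrected rate-to-code selection from the patch above can be exercised on its own. The function below is a hypothetical standalone copy of that branch logic (rate_to_code() is not a name from the driver), kept only to show why the order of the modulo tests and the open-ended quadruple-speed range matter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the code selection after the fix.
 * Returns 0 and stores the code on success, -EINVAL for unsupported rates.
 */
static int rate_to_code(unsigned int rate, uint32_t *code)
{
	assert(code);

	/* Test 48000 before 32000: 96000 and 192000 divide by both. */
	if (rate % 48000 == 0)
		*code = 0x04;
	else if (rate % 44100 == 0)
		*code = 0x02;
	else if (rate % 32000 == 0)
		*code = 0x00;
	else
		return -EINVAL;

	/* Double speed, then quadruple speed with no upper bound. */
	if (rate >= 64000 && rate < 128000)
		*code |= 0x08;
	else if (rate >= 128000)
		*code |= 0x10;

	return 0;
}
```

With the original ordering, 96000 and 192000 would have matched the 32000 branch first, and 192000 would also have missed the quadruple-speed flag because of the "rate < 192000" bound.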
This is the start of the stable review cycle for the 5.6.12 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 10 May 2020 12:29:44 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.6.12-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.6.12-rc1
Will Deacon <will(a)kernel.org>
mm/mremap: Add comment explaining the untagging behaviour of mremap()
Jiri Slaby <jslaby(a)suse.cz>
cgroup, netclassid: remove double cond_resched
Thomas Pedersen <thomas(a)adapt-ip.com>
mac80211: add ieee80211_is_any_nullfunc()
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
ACPI: PM: s2idle: Fix comment in acpi_s2idle_prepare_late()
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: GPD pocket fan: Fix error message when temp-limits are out of range
Qian Cai <cai(a)lca.pw>
x86/kvm: fix a missing-prototypes "vmread_error"
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda: Match both PCI ID and SSID for driver blacklist
Aaron Ma <aaron.ma(a)canonical.com>
drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event
Jere Leppänen <jere.leppanen(a)nokia.com>
sctp: Fix SHUTDOWN CTSN Ack in the peer restart case
Andrii Nakryiko <andriin(a)fb.com>
tools/runqslower: Ensure own vmlinux.h is picked up first
Doug Berger <opendmb(a)gmail.com>
net: systemport: suppress warnings on failed Rx SKB allocations
Doug Berger <opendmb(a)gmail.com>
net: bcmgenet: suppress warnings on failed Rx SKB allocations
Madhuparna Bhowmik <madhuparnabhowmik10(a)gmail.com>
mac80211: sta_info: Add lockdep condition for RCU list usage
Nathan Chancellor <natechancellor(a)gmail.com>
lib/mpi: Fix building for powerpc with clang
Russell King <rmk+kernel(a)armlinux.org.uk>
net: phy: bcm84881: clear settings on link down
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()
Vamshi K Sthambamkadi <vamshi.k.sthambamkadi(a)gmail.com>
tracing: Fix memory leaks in trace_events_hist.c
Paulo Alcantara <pc(a)cjr.nz>
cifs: do not share tcons with DFS
Jeremie Francois (on alpha) <jeremie.francois(a)gmail.com>
scripts/config: allow colons in option strings for sed
Ronnie Sahlberg <lsahlber(a)redhat.com>
cifs: protect updating server->dstaddr with a spinlock
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix "status check failed" spam for multi-SSI
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Don't treat master SSI in multi SSI setup as parent
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: Fix sub-second increment
Julien Beraud <julien.beraud(a)orolia.com>
net: stmmac: fix enabling socfpga's ptp_ref_clock
Xiyu Yang <xiyuyang19(a)fudan.edu.cn>
wimax/i2400m: Fix potential urb refcnt leak
Sandeep Raghuraman <sandy.8925(a)gmail.com>
drm/amdgpu: Correctly initialize thermal controller for GPUs with Powerplay table v0 (e.g Hawaii)
Prike Liang <Prike.Liang(a)amd.com>
drm/amd/powerplay: fix resume failed as smu table initialize early exit
Alex Elder <elder(a)linaro.org>
remoteproc: qcom_q6v5_mss: fix a bug in q6v5_probe()
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: codecs: hdac_hdmi: Fix incorrect use of list_for_each_entry
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix HDMI channel mapping for multi-SSI mode
Matthias Blankertz <matthias.blankertz(a)cetitec.com>
ASoC: rsnd: Fix parent SSI start/stop in multi-SSI mode
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Properly set maxpacket limit
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Fix endianness issue
Sebastian Reichel <sebastian.reichel(a)collabora.com>
ASoC: sgtl5000: Fix VAG power-on handling
Wu Bo <wubo40(a)huawei.com>
scsi: sg: add sg_remove_request in sg_write
Vasily Khoruzhick <anarsoul(a)gmail.com>
drm/bridge: anx6345: set correct BPC for display_info of connector
Tyler Hicks <tyhicks(a)linux.microsoft.com>
selftests/ipc: Fix test failure seen after initial test run
Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Revert "Kernel selftests: tpm2: check for tpm support"
Sandipan Das <sandipan(a)linux.ibm.com>
selftests: vm: Fix 64-bit test builds for powerpc64le
Sandipan Das <sandipan(a)linux.ibm.com>
selftests: vm: Do not override definition of ARCH
Yihao Wu <wuyihao(a)linux.alibaba.com>
SUNRPC/cache: Fix unsafe traverse caused double-free in cache_purge
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check return value of soc_tplg_dai_config
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check return value of pcm_new_ver
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check soc_tplg_add_route return value
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check return value of soc_tplg_*_create
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Check return value of soc_tplg_create_tlv
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ASoC: topology: Add missing memory checks
Marek Szyprowski <m.szyprowski(a)samsung.com>
drm/bridge: analogix_dp: Split bind() into probe() and real bind()
Jia He <justin.he(a)arm.com>
vhost: vsock: kick send_pkt worker once device is started
-------------
Diffstat:
Makefile | 4 +-
arch/x86/kvm/vmx/ops.h | 1 +
drivers/acpi/sleep.c | 5 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 3 +-
.../gpu/drm/amd/powerplay/hwmgr/processpptables.c | 26 +++++
drivers/gpu/drm/amd/powerplay/renoir_ppt.c | 7 +-
drivers/gpu/drm/bridge/analogix/analogix-anx6345.c | 3 +
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 33 ++++--
drivers/gpu/drm/exynos/exynos_dp.c | 29 +++---
drivers/gpu/drm/rockchip/analogix_dp-rockchip.c | 36 ++++---
drivers/net/ethernet/broadcom/bcmsysport.c | 3 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 3 +-
.../net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 9 +-
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 12 ++-
drivers/net/phy/bcm84881.c | 6 +-
drivers/net/wimax/i2400m/usb-fw.c | 1 +
drivers/platform/x86/gpd-pocket-fan.c | 2 +-
drivers/remoteproc/qcom_q6v5_mss.c | 2 +-
drivers/scsi/sg.c | 4 +-
drivers/usb/dwc3/core.h | 4 +
drivers/usb/dwc3/gadget.c | 52 ++++++++--
drivers/vhost/vsock.c | 5 +
fs/cifs/connect.c | 6 ++
include/drm/bridge/analogix_dp.h | 5 +-
include/linux/ieee80211.h | 9 ++
kernel/trace/ftrace.c | 1 +
kernel/trace/trace_events_hist.c | 7 ++
lib/mpi/longlong.h | 34 +++---
mm/mremap.c | 10 ++
net/core/netclassid_cgroup.c | 4 +-
net/mac80211/mlme.c | 2 +-
net/mac80211/rx.c | 8 +-
net/mac80211/sta_info.c | 3 +-
net/mac80211/status.c | 5 +-
net/mac80211/tx.c | 2 +-
net/sctp/sm_make_chunk.c | 6 +-
net/sunrpc/cache.c | 5 +-
scripts/config | 5 +-
sound/pci/hda/hda_intel.c | 9 +-
sound/soc/codecs/hdac_hdmi.c | 6 +-
sound/soc/codecs/sgtl5000.c | 34 ++++++
sound/soc/codecs/sgtl5000.h | 1 +
sound/soc/sh/rcar/ssi.c | 11 +-
sound/soc/sh/rcar/ssiu.c | 2 +-
sound/soc/soc-topology.c | 115 ++++++++++++++++-----
tools/bpf/runqslower/Makefile | 2 +-
tools/testing/selftests/ipc/msgque.c | 2 +-
tools/testing/selftests/tpm2/test_smoke.sh | 13 +--
tools/testing/selftests/tpm2/test_space.sh | 9 +-
tools/testing/selftests/vm/Makefile | 4 +-
tools/testing/selftests/vm/run_vmtests | 2 +-
51 files changed, 402 insertions(+), 170 deletions(-)
Previously, the output format was programmed as part of the ioctl()
handler. However, this has two problems:
1) If there are multiple active streams with different output
formats, the hardware will use whichever format was set last
for both streams. Similarly, an ioctl() done in an inactive
context will wrongly affect other active contexts.
2) The registers are written while the device is not actively
streaming. To enable runtime PM tied to the streaming state,
all hardware access needs to be moved inside cedrus_device_run().
The call to cedrus_dst_format_set() is now placed just before the
codec-specific callback that programs the hardware.
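The idea can be sketched in plain C as follows. This is only a toy model
under stated assumptions: struct fake_dev, struct fake_ctx, fake_s_fmt() and
fake_device_run() are hypothetical stand-ins, not the driver's real types or
functions. The ioctl-time handler only records the format in the context, and
the shared "hardware" state is written from the context that is about to run.

```c
#include <assert.h>

/* Hypothetical stand-ins; none of these are the cedrus driver's real types. */
struct fake_dev {
	int hw_dst_format;	/* models the single shared hardware register */
};

struct fake_ctx {
	struct fake_dev *dev;
	int dst_fmt;		/* per-context format chosen via ioctl() */
};

/* ioctl-time handler: only record the format, do not touch the hardware. */
static void fake_s_fmt(struct fake_ctx *ctx, int fmt)
{
	ctx->dst_fmt = fmt;
}

/* Run time: program the shared hardware from the context about to decode. */
static int fake_device_run(struct fake_ctx *ctx)
{
	ctx->dev->hw_dst_format = ctx->dst_fmt;
	return ctx->dev->hw_dst_format;
}
```

In this model, two contexts sharing one device can each call fake_s_fmt() in
any order, and every fake_device_run() still sees its own format.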
Cc: <stable(a)vger.kernel.org>
Fixes: 50e761516f2b ("media: platform: Add Cedrus VPU decoder driver")
Suggested-by: Jernej Skrabec <jernej.skrabec(a)siol.net>
Suggested-by: Paul Kocialkowski <paul.kocialkowski(a)bootlin.com>
Signed-off-by: Samuel Holland <samuel(a)sholland.org>
Tested-by: Jernej Skrabec <jernej.skrabec(a)siol.net>
Reviewed-by: Jernej Skrabec <jernej.skrabec(a)siol.net>
Reviewed-by: Ezequiel Garcia <ezequiel(a)collabora.com>
---
v2: added patch
v3: collected tags
---
drivers/staging/media/sunxi/cedrus/cedrus_dec.c | 2 ++
drivers/staging/media/sunxi/cedrus/cedrus_video.c | 3 ---
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
index 4a2fc33a1d79..58c48e4fdfe9 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_dec.c
@@ -74,6 +74,8 @@ void cedrus_device_run(void *priv)
v4l2_m2m_buf_copy_metadata(run.src, run.dst, true);
+ cedrus_dst_format_set(dev, &ctx->dst_fmt);
+
dev->dec_ops[ctx->current_codec]->setup(ctx, &run);
/* Complete request(s) controls if needed. */
diff --git a/drivers/staging/media/sunxi/cedrus/cedrus_video.c b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
index 15cf1f10221b..ed3f511f066f 100644
--- a/drivers/staging/media/sunxi/cedrus/cedrus_video.c
+++ b/drivers/staging/media/sunxi/cedrus/cedrus_video.c
@@ -273,7 +273,6 @@ static int cedrus_s_fmt_vid_cap(struct file *file, void *priv,
struct v4l2_format *f)
{
struct cedrus_ctx *ctx = cedrus_file2ctx(file);
- struct cedrus_dev *dev = ctx->dev;
struct vb2_queue *vq;
int ret;
@@ -287,8 +286,6 @@ static int cedrus_s_fmt_vid_cap(struct file *file, void *priv,
ctx->dst_fmt = f->fmt.pix;
- cedrus_dst_format_set(dev, &ctx->dst_fmt);
-
return 0;
}
--
2.24.1
From: Sarthak Garg <sartgarg(a)codeaurora.org>
Consider the following stack trace:
-001|raw_spin_lock_irqsave
-002|mmc_blk_cqe_complete_rq
-003|__blk_mq_complete_request(inline)
-003|blk_mq_complete_request(rq)
-004|mmc_cqe_timed_out(inline)
-004|mmc_mq_timed_out
mmc_mq_timed_out() acquires the queue lock first. The
mmc_blk_cqe_complete_rq() function then tries to acquire the same
queue lock, resulting in recursive locking: the task spins on a lock
that it already holds, which leads to a watchdog bark.
Fix this issue by holding the lock only for the required critical section.
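The shape of the problem and of the fix can be illustrated with a toy,
user-space model. Everything here is hypothetical: struct toy_lock reports a
re-acquire as -EDEADLK instead of spinning, and the toy_* functions only
mimic the call structure described above, not the real mmc code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock: re-acquiring reports the deadlock instead of spinning. */
struct toy_lock {
	bool held;
};

static int toy_lock_acquire(struct toy_lock *l)
{
	if (l->held)
		return -EDEADLK;
	l->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *l)
{
	l->held = false;
}

struct toy_queue {
	struct toy_lock lock;
	bool recovery_needed;
};

/* Stand-in for the completion path, which takes the queue lock itself. */
static int toy_complete(struct toy_queue *q)
{
	int ret = toy_lock_acquire(&q->lock);

	if (ret)
		return ret;
	toy_lock_release(&q->lock);
	return 0;
}

/* Old shape: the callee runs while the caller still holds the lock. */
static int toy_timed_out_old(struct toy_queue *q)
{
	int ret;

	toy_lock_acquire(&q->lock);
	ret = q->recovery_needed ? 1 : toy_complete(q);
	toy_lock_release(&q->lock);
	return ret;
}

/* Patched shape: only the flag check sits inside the critical section. */
static int toy_timed_out_new(struct toy_queue *q)
{
	bool reset;

	toy_lock_acquire(&q->lock);
	reset = q->recovery_needed;
	toy_lock_release(&q->lock);

	return reset ? 1 : toy_complete(q);
}
```

In the toy model the old shape returns -EDEADLK whenever the completion path
is reached, while the new shape completes normally.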
Cc: <stable(a)vger.kernel.org> # v4.19+
Suggested-by: Sahitya Tummala <stummala(a)codeaurora.org>
Signed-off-by: Sarthak Garg <sartgarg(a)codeaurora.org>
---
drivers/mmc/core/queue.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..72bef39 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -107,7 +107,7 @@ static enum blk_eh_timer_return mmc_cqe_timed_out(struct request *req)
case MMC_ISSUE_DCMD:
if (host->cqe_ops->cqe_timeout(host, mrq, &recovery_needed)) {
if (recovery_needed)
- __mmc_cqe_recovery_notifier(mq);
+ mmc_cqe_recovery_notifier(mrq);
return BLK_EH_RESET_TIMER;
}
/* No timeout (XXX: huh? comment doesn't make much sense) */
@@ -131,12 +131,13 @@ static enum blk_eh_timer_return mmc_mq_timed_out(struct request *req,
spin_lock_irqsave(&mq->lock, flags);
- if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled)
+ if (mq->recovery_needed || !mq->use_cqe || host->hsq_enabled) {
ret = BLK_EH_RESET_TIMER;
- else
+ spin_unlock_irqrestore(&mq->lock, flags);
+ } else {
+ spin_unlock_irqrestore(&mq->lock, flags);
ret = mmc_cqe_timed_out(req);
-
- spin_unlock_irqrestore(&mq->lock, flags);
+ }
return ret;
}
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.6.11, v5.4.39, v4.19.121, v4.14.179, v4.9.222, v4.4.222.
v5.6.11: Build OK!
v5.4.39: Build failed! Errors:
fs/btrfs/scrub.c:3291:20: error: dereferencing pointer to incomplete type ‘struct btrfs_block_group’
fs/btrfs/scrub.c:3472:31: error: passing argument 7 of ‘scrub_stripe’ from incompatible pointer type [-Werror=incompatible-pointer-types]
v4.19.121: Build failed! Errors:
fs/btrfs/scrub.c:3289:20: error: dereferencing pointer to incomplete type ‘struct btrfs_block_group’
fs/btrfs/scrub.c:3470:31: error: passing argument 7 of ‘scrub_stripe’ from incompatible pointer type [-Werror=incompatible-pointer-types]
v4.14.179: Failed to apply! Possible dependencies:
32934280967d ("Btrfs: clean up scrub is_dev_replace parameter")
c83488afc5a7 ("btrfs: Remove fs_info from btrfs_inc_block_group_ro")
v4.9.222: Failed to apply! Possible dependencies:
0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
32934280967d ("Btrfs: clean up scrub is_dev_replace parameter")
5e00f1939f6e ("btrfs: convert btrfs_inc_block_group_ro to accept fs_info")
62d1f9fe97dd ("btrfs: remove trivial helper btrfs_find_tree_block")
c83488afc5a7 ("btrfs: Remove fs_info from btrfs_inc_block_group_ro")
cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
da17066c4047 ("btrfs: pull node/sector/stripe sizes out of root and into fs_info")
de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
fb456252d3d9 ("btrfs: root->fs_info cleanup, use fs_info->dev_root everywhere")
v4.4.222: Failed to apply! Possible dependencies:
0132761017e0 ("btrfs: fix string and comment grammatical issues and typos")
09cbfeaf1a5a ("mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros")
0b246afa62b0 ("btrfs: root->fs_info cleanup, add fs_info convenience variables")
0e749e54244e ("dax: increase granularity of dax_clear_blocks() operations")
32934280967d ("Btrfs: clean up scrub is_dev_replace parameter")
4420cfd3f51c ("staging: lustre: format properly all comment blocks for LNet core")
52db400fcd50 ("pmem, dax: clean up clear_pmem()")
5e00f1939f6e ("btrfs: convert btrfs_inc_block_group_ro to accept fs_info")
5fd88337d209 ("staging: lustre: fix all conditional comparison to zero in LNet layer")
b2e0d1625e19 ("dax: fix lifetime of in-kernel dax mappings with dax_map_atomic()")
bb7ab3b92e46 ("btrfs: Fix misspellings in comments.")
c83488afc5a7 ("btrfs: Remove fs_info from btrfs_inc_block_group_ro")
cf8cddd38bab ("btrfs: don't abuse REQ_OP_* flags for btrfs_map_block")
d1a5f2b4d8a1 ("block: use DAX for partition table reads")
de143792253e ("btrfs: struct btrfsic_state->root should be an fs_info")
e10624f8c097 ("pmem: fail io-requests to known bad blocks")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
If an operation's flag `needs_file` is set, the function
io_req_set_file() calls io_file_get() to obtain a `struct file*`.
This fails for `O_PATH` file descriptors, because io_file_get() calls
fget(), which rejects `O_PATH` file descriptors. To support `O_PATH`,
fdget_raw() must be used (like path_init() in `fs/namei.c` does).
This rejection causes io_req_set_file() to return `-EBADF`. This
breaks the operations `openat`, `openat2` and `statx`, where `O_PATH`
file descriptors are commonly used.
This could be solved by adding support for `O_PATH` file descriptors
with another `io_op_def` flag, but since those three operations don't
need the `struct file*` but operate directly on the numeric file
descriptors, the best solution here is to simply remove `needs_file`
(and the accompanying flag `fd_non_neg`).
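For readers unfamiliar with `O_PATH`, the user-space behaviour behind this
can be sketched as below. This is a minimal illustration on Linux, not
io_uring code: the helper names are hypothetical, and "/" is used only
because it always exists. An `O_PATH` descriptor is rejected with `EBADF` by
calls that need a real open file, yet it is perfectly valid as a directory
file descriptor for the `*at()` family.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if read() on an O_PATH descriptor fails with EBADF, else -1. */
static int opath_read_is_ebadf(void)
{
	char c;
	int fd = open("/", O_PATH | O_DIRECTORY);
	int ok;

	if (fd < 0)
		return -1;
	ok = (read(fd, &c, 1) == -1 && errno == EBADF);
	close(fd);
	return ok ? 0 : -1;
}

/* Returns 0 if the same kind of descriptor works as a dirfd for fstatat(). */
static int opath_works_as_dirfd(void)
{
	struct stat st;
	int fd = open("/", O_PATH | O_DIRECTORY);
	int ret;

	if (fd < 0)
		return -1;
	ret = fstatat(fd, ".", &st, 0);
	close(fd);
	return ret;
}
```

Both helpers are expected to return 0 on a Linux system, which matches the
reasoning above: operations that only use the numeric descriptor as a lookup
base do not need a `struct file*` obtained through fget().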
Signed-off-by: Max Kellermann <mk(a)cm4all.com>
Cc: stable(a)vger.kernel.org
---
fs/io_uring.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index a46de2cfc28e..d24f8e33323c 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -693,8 +693,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_file = 1,
},
[IORING_OP_OPENAT] = {
- .needs_file = 1,
- .fd_non_neg = 1,
.file_table = 1,
.needs_fs = 1,
},
@@ -708,8 +706,6 @@ static const struct io_op_def io_op_defs[] = {
},
[IORING_OP_STATX] = {
.needs_mm = 1,
- .needs_file = 1,
- .fd_non_neg = 1,
.needs_fs = 1,
},
[IORING_OP_READ] = {
@@ -739,8 +735,6 @@ static const struct io_op_def io_op_defs[] = {
.unbound_nonreg_file = 1,
},
[IORING_OP_OPENAT2] = {
- .needs_file = 1,
- .fd_non_neg = 1,
.file_table = 1,
.needs_fs = 1,
},
--
2.20.1
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 967dffffcc16 - mm/mremap: Add comment explaining the untagging behaviour of mremap()
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
🚧 ✅ trace: ftrace/tracer
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
ppc64le:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 2:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ❌ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ❌ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
🚧 ✅ trace: ftrace/tracer
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
Probable cause: Problem connecting to Beaker
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ Storage blktests
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
Probable cause: Problem connecting to Beaker
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ Usex - version 1.9-29
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
🚧 ⚡⚡⚡ trace: ftrace/tracer
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
Probable cause: Problem connecting to Beaker
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ Usex - version 1.9-29
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
🚧 ⚡⚡⚡ trace: ftrace/tracer
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
Probable cause: Problem connecting to Beaker
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ Storage blktests
x86_64:
Host 1:
✅ Boot test
✅ Storage SAN device stress - qedf driver
Host 2:
✅ Boot test
✅ Storage SAN device stress - mpt3sas_gen1
Host 3:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ IOMMU boot test
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 4:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ❌ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
🚧 ✅ trace: ftrace/tracer
Host 5:
⏱ Boot test
⏱ Storage SAN device stress - megaraid_sas
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within a reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
From: Marcos Paulo de Souza <mpdesouza(a)suse.com>
[PROBLEM]
Whenever a chown is executed, all capabilities of the file being touched are
lost. When doing incremental send with a file with capabilities, there is a
situation where the capability can be lost on the receiving side. The
sequence of actions below shows the problem:
$ mount /dev/sda fs1
$ mount /dev/sdb fs2
$ touch fs1/foo.bar
$ setcap cap_sys_nice+ep fs1/foo.bar
$ btrfs subvol snap -r fs1 fs1/snap_init
$ btrfs send fs1/snap_init | btrfs receive fs2
$ chgrp adm fs1/foo.bar
$ setcap cap_sys_nice+ep fs1/foo.bar
$ btrfs subvol snap -r fs1 fs1/snap_complete
$ btrfs subvol snap -r fs1 fs1/snap_incremental
$ btrfs send fs1/snap_complete | btrfs receive fs2
$ btrfs send -p fs1/snap_init fs1/snap_incremental | btrfs receive fs2
At this point, only a chown was emitted by "btrfs send" since only the group
was changed. This causes the cap_sys_nice capability to be dropped from
fs2/snap_incremental/foo.bar
[FIX]
Only emit capabilities after chown is emitted. The current code
first checks for xattrs that are new/changed, emits them, and later emits
the chown. Now, __process_new_xattr skips capabilities, letting only
finish_inode_if_needed to emit them, if they exist, for the inode being
processed.
This behavior was being worked around in "btrfs receive"
side by caching the capability and only applying it after chown. Now,
xattrs are only emitted _after_ chown, making that hack not needed
anymore.
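The ordering issue can be modelled with a toy receiver. This is only a sketch
under stated assumptions: struct toy_inode and the toy_* helpers are
hypothetical and do not correspond to any btrfs structure or function. The
only behaviour taken from the description above is that a chown drops the
file's capabilities.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy receive-side inode; not a btrfs structure. */
struct toy_inode {
	unsigned int gid;
	bool has_caps;
};

/* Models the behaviour described above: changing ownership drops capabilities. */
static void toy_chown(struct toy_inode *ino, unsigned int gid)
{
	ino->gid = gid;
	ino->has_caps = false;
}

static void toy_set_caps(struct toy_inode *ino)
{
	ino->has_caps = true;
}

/* Old incremental stream: the xattr is unchanged, so only a chown is emitted. */
static bool toy_apply_old_stream(struct toy_inode *ino, unsigned int gid)
{
	toy_chown(ino, gid);
	return ino->has_caps;
}

/* Patched stream: chown first, then the capability xattr is emitted again. */
static bool toy_apply_new_stream(struct toy_inode *ino, unsigned int gid)
{
	toy_chown(ino, gid);
	toy_set_caps(ino);
	return ino->has_caps;
}
```

Starting from an inode that already carries the capability, the old stream
leaves it without one, while the patched stream restores it after the chown.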
Link: https://github.com/kdave/btrfs-progs/issues/202
Suggested-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Changes from v1:
* Constify name var in send_capabilities function (suggested by Filipe)
* Changed btrfs_alloc_path -> alloc_path_for_send() in send_capabilities
(suggested by Filipe)
* Removed a redundant sentence in the commit message (suggested by Filipe)
* Added the Reviewed-By tag from Filipe
Changes from RFC:
* Explained about chown + drop capabilities problem in the commit message (suggested
by Filipe and David)
* Changed the commit message to describe the fix (suggested by Filipe)
* Skip the xattr in __process_new_xattr if it's a capability, since it'll be
handled in finish_inode_if_needed now (suggested by Filipe).
* Created function send_capabilities to query if the inode has caps, and if
yes, emit them.
* Call send_capabilities in finish_inode_if_needed _after_ the needs_chown
check. (suggested by Filipe)
fs/btrfs/send.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 6b86841315be..2b378e32e7dc 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -23,6 +23,7 @@
#include "btrfs_inode.h"
#include "transaction.h"
#include "compression.h"
+#include "xattr.h"
/*
* Maximum number of references an extent can have in order for us to attempt to
@@ -4545,6 +4546,10 @@ static int __process_new_xattr(int num, struct btrfs_key *di_key,
struct fs_path *p;
struct posix_acl_xattr_header dummy_acl;
+ /* capabilities are emitted by finish_inode_if_needed */
+ if (!strncmp(name, XATTR_NAME_CAPS, name_len))
+ return 0;
+
p = fs_path_alloc();
if (!p)
return -ENOMEM;
@@ -5107,6 +5112,66 @@ static int send_extent_data(struct send_ctx *sctx,
return 0;
}
+/*
+ * Search for a capability xattr related to sctx->cur_ino. If the capability is
+ * found, call send_set_xattr function to emit it.
+ *
+ * Return %0 if there isn't a capability, or when the capability was emitted
+ * successfully, or < %0 if an error occurred.
+ */
+static int send_capabilities(struct send_ctx *sctx)
+{
+ struct fs_path *fspath = NULL;
+ struct btrfs_path *path;
+ struct btrfs_dir_item *di;
+ struct extent_buffer *leaf;
+ unsigned long data_ptr;
+ const char *name = XATTR_NAME_CAPS;
+ char *buf = NULL;
+ int buf_len;
+ int ret = 0;
+
+ path = alloc_path_for_send();
+ if (!path)
+ return -ENOMEM;
+
+ di = btrfs_lookup_xattr(NULL, sctx->send_root, path, sctx->cur_ino,
+ name, strlen(name), 0);
+ if (!di) {
+ /* there is no xattr for this inode */
+ goto out;
+ } else if (IS_ERR(di)) {
+ ret = PTR_ERR(di);
+ goto out;
+ }
+
+ leaf = path->nodes[0];
+ buf_len = btrfs_dir_data_len(leaf, di);
+
+ fspath = fs_path_alloc();
+ buf = kmalloc(buf_len, GFP_KERNEL);
+ if (!fspath || !buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ret = get_cur_path(sctx, sctx->cur_ino, sctx->cur_inode_gen, fspath);
+ if (ret < 0)
+ goto out;
+
+ data_ptr = (unsigned long)((char *)(di + 1) +
+ btrfs_dir_name_len(leaf, di));
+ read_extent_buffer(leaf, buf, data_ptr,
+ btrfs_dir_data_len(leaf, di));
+
+ ret = send_set_xattr(sctx, fspath, name, strlen(name), buf, buf_len);
+out:
+ kfree(buf);
+ fs_path_free(fspath);
+ btrfs_free_path(path);
+ return ret;
+}
+
static int clone_range(struct send_ctx *sctx,
struct clone_root *clone_root,
const u64 disk_byte,
@@ -6010,6 +6075,10 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
goto out;
}
+ ret = send_capabilities(sctx);
+ if (ret < 0)
+ goto out;
+
/*
* If other directory inodes depended on our current directory
* inode's move/rename, now do their move/rename operations.
--
2.25.1
Assume we have kmem configured and loaded:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM
Assume we try to unload kmem. This force-unloading will work, even if
memory cannot get removed from the system.
[root@localhost ~]# rmmod kmem
[ 86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
...
[ 86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot
Now, we can reconfigure the namespace:
[root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
[ 131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]
[ 131.410147] nd_pmem: probe of namespace0.0 failed with error -16
...
This fails as expected due to the busy memory resource, and the memory
cannot be used. However, the dax0.0 device is removed, and its name along with it.
The name of the memory resource now points at freed memory (name of the
device).
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
150000000-33fffffff : System RAM
We have to make sure to duplicate the string. While at it, remove the
superfluous setting of the name and fixup a stale comment.
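The lifetime problem is the usual one of borrowing a string from an object
that can go away first. A minimal user-space sketch of the "duplicate the
string" approach follows. struct toy_resource and the toy_resource_*()
helpers are hypothetical and only stand in for the kernel's resource and
kstrdup()/kfree() handling shown in the diff below.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy resource that may outlive the device whose name it displays. */
struct toy_resource {
	const char *name;
};

/* Give the resource its own copy of the device name instead of borrowing it. */
static int toy_resource_init(struct toy_resource *res, const char *dev_name)
{
	size_t len = strlen(dev_name) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, dev_name, len);
	res->name = copy;
	return 0;
}

/* The owner of the copy is also responsible for freeing it. */
static void toy_resource_destroy(struct toy_resource *res)
{
	free((void *)res->name);
	res->name = NULL;
}
```

Because the resource owns its copy, freeing or overwriting the device's name
afterwards no longer affects what the resource reports.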
Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
Cc: stable(a)vger.kernel.org # v5.3
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
drivers/dax/kmem.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..1e678bdf5aed 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -22,6 +22,7 @@ int dev_dax_kmem_probe(struct device *dev)
resource_size_t kmem_size;
resource_size_t kmem_end;
struct resource *new_res;
+ const char *new_res_name;
int numa_node;
int rc;
@@ -48,11 +49,16 @@ int dev_dax_kmem_probe(struct device *dev)
kmem_size &= ~(memory_block_size_bytes() - 1);
kmem_end = kmem_start + kmem_size;
- /* Region is permanently reserved. Hot-remove not yet implemented. */
- new_res = request_mem_region(kmem_start, kmem_size, dev_name(dev));
+ new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+ if (!new_res_name)
+ return -ENOMEM;
+
+ /* Region is permanently reserved if hotremove fails. */
+ new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
if (!new_res) {
dev_warn(dev, "could not reserve region [%pa-%pa]\n",
&kmem_start, &kmem_end);
+ kfree(new_res_name);
return -EBUSY;
}
@@ -63,12 +69,12 @@ int dev_dax_kmem_probe(struct device *dev)
* unknown to us that will break add_memory() below.
*/
new_res->flags = IORESOURCE_SYSTEM_RAM;
- new_res->name = dev_name(dev);
rc = add_memory(numa_node, new_res->start, resource_size(new_res));
if (rc) {
release_resource(new_res);
kfree(new_res);
+ kfree(new_res_name);
return rc;
}
dev_dax->dax_kmem_res = new_res;
@@ -83,6 +89,7 @@ static int dev_dax_kmem_remove(struct device *dev)
struct resource *res = dev_dax->dax_kmem_res;
resource_size_t kmem_start = res->start;
resource_size_t kmem_size = resource_size(res);
+ const char *res_name = res->name;
int rc;
/*
@@ -102,6 +109,7 @@ static int dev_dax_kmem_remove(struct device *dev)
/* Release and free dax resources */
release_resource(res);
kfree(res);
+ kfree(res_name);
dev_dax->dax_kmem_res = NULL;
return 0;
--
2.25.4
The patch titled
Subject: device-dax: don't leak kernel memory to user space after unloading kmem
has been added to the -mm tree. Its filename is
device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/device-dax-dont-leak-kernel-memory…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/device-dax-dont-leak-kernel-memory…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: device-dax: don't leak kernel memory to user space after unloading kmem
Assume we have kmem configured and loaded:
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : dax0.0
150000000-33fffffff : System RAM
Assume we try to unload kmem. This force-unloading will work, even if
memory cannot get removed from the system.
[root@localhost ~]# rmmod kmem
[ 86.380228] removing memory fails, because memory [0x0000000150000000-0x0000000157ffffff] is onlined
...
[ 86.431225] kmem dax0.0: DAX region [mem 0x150000000-0x33fffffff] cannot be hotremoved until the next reboot
Now, we can reconfigure the namespace:
[root@localhost ~]# ndctl create-namespace --force --reconfig=namespace0.0 --mode=devdax
[ 131.409351] nd_pmem namespace0.0: could not reserve region [mem 0x140000000-0x33fffffff]
[ 131.410147] nd_pmem: probe of namespace0.0 failed with error -16
...
This fails as expected due to the busy memory resource, and the memory
cannot be used. However, the dax0.0 device is removed, and with it its name.
The name of the memory resource now points at freed memory (name of the
device).
[root@localhost ~]# cat /proc/iomem
...
140000000-33fffffff : Persistent Memory
140000000-1481fffff : namespace0.0
150000000-33fffffff : �_�^7_��/_��wR��WQ���^��� ...
150000000-33fffffff : System RAM
We have to make sure to duplicate the string. While at it, remove the
superfluous setting of the name and fixup a stale comment.
Link: http://lkml.kernel.org/r/20200508084217.9160-2-david@redhat.com
Fixes: 9f960da72b25 ("device-dax: "Hotremove" persistent memory that is used like normal RAM")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org> [5.3]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/kmem.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/dax/kmem.c~device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem
+++ a/drivers/dax/kmem.c
@@ -22,6 +22,7 @@ int dev_dax_kmem_probe(struct device *de
resource_size_t kmem_size;
resource_size_t kmem_end;
struct resource *new_res;
+ const char *new_res_name;
int numa_node;
int rc;
@@ -48,11 +49,16 @@ int dev_dax_kmem_probe(struct device *de
kmem_size &= ~(memory_block_size_bytes() - 1);
kmem_end = kmem_start + kmem_size;
- /* Region is permanently reserved. Hot-remove not yet implemented. */
- new_res = request_mem_region(kmem_start, kmem_size, dev_name(dev));
+ new_res_name = kstrdup(dev_name(dev), GFP_KERNEL);
+ if (!new_res_name)
+ return -ENOMEM;
+
+ /* Region is permanently reserved if hotremove fails. */
+ new_res = request_mem_region(kmem_start, kmem_size, new_res_name);
if (!new_res) {
dev_warn(dev, "could not reserve region [%pa-%pa]\n",
&kmem_start, &kmem_end);
+ kfree(new_res_name);
return -EBUSY;
}
@@ -63,12 +69,12 @@ int dev_dax_kmem_probe(struct device *de
* unknown to us that will break add_memory() below.
*/
new_res->flags = IORESOURCE_SYSTEM_RAM;
- new_res->name = dev_name(dev);
rc = add_memory(numa_node, new_res->start, resource_size(new_res));
if (rc) {
release_resource(new_res);
kfree(new_res);
+ kfree(new_res_name);
return rc;
}
dev_dax->dax_kmem_res = new_res;
@@ -83,6 +89,7 @@ static int dev_dax_kmem_remove(struct de
struct resource *res = dev_dax->dax_kmem_res;
resource_size_t kmem_start = res->start;
resource_size_t kmem_size = resource_size(res);
+ const char *res_name = res->name;
int rc;
/*
@@ -102,6 +109,7 @@ static int dev_dax_kmem_remove(struct de
/* Release and free dax resources */
release_resource(res);
kfree(res);
+ kfree(res_name);
dev_dax->dax_kmem_res = NULL;
return 0;
_
Patches currently in -mm which might be from david(a)redhat.com are
device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch
The patch titled
Subject: mm: limit boost_watermark on small zones
has been removed from the -mm tree. Its filename was
mm-limit-boost_watermark-on-small-zones.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Henry Willard <henry.willard(a)oracle.com>
Subject: mm: limit boost_watermark on small zones
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external
fragmentation event occurs") adds a boost_watermark() function which
increases the min watermark in a zone by at least pageblock_nr_pages or
the number of pages in a page block. On Arm64, with 64K pages and 512M
huge pages, this is 8192 pages or 512M. It does this regardless of the
number of pages managed in the zone or the likelihood of success.
This can put the zone immediately under water in terms of allocating pages
from the zone, and can cause a small machine to fail immediately due to
OoM. Unlike set_recommended_min_free_kbytes(), which substantially
increases min_free_kbytes and is tied to THP, boost_watermark() can be
called even if THP is not active. The problem is most likely to appear
on architectures such as Arm64 where pageblock_nr_pages is very large.
It is desirable to run the kdump capture kernel in as small a space as
possible to avoid wasting memory. In some architectures, such as Arm64,
there are restrictions on where the capture kernel can run, and therefore,
the space available. A capture kernel running in 768M can fail due to OoM
immediately after boost_watermark() sets the min in zone DMA32, where
most of the memory is, to 512M. It fails even though there is over 500M of
free memory. With boost_watermark() suppressed, the capture kernel can run
successfully in 448M.
This patch limits boost_watermark() to boosting a zone's min watermark only
when there are enough pages that the boost will produce positive results.
In this case that is estimated to be four times as many pages as
pageblock_nr_pages.
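To make the arithmetic concrete, here is a small userspace sketch of the
size check, using the numbers quoted above (64K pages, 512M page blocks, a
768M capture kernel). The function names are hypothetical, and treating the
whole 768M as the zone size is an assumption that gives an upper bound for
zone DMA32.

```c
#include <assert.h>
#include <stdbool.h>

/* Convert a size in MiB to a page count, for a page size given in KiB. */
static unsigned long mib_to_pages(unsigned long mib, unsigned long page_kib)
{
	assert(page_kib > 0);
	return mib * 1024 / page_kib;
}

/*
 * Mirror of the condition added by the patch: skip the boost when the
 * zone manages fewer than four page blocks' worth of pages.
 */
static bool zone_too_small_for_boost(unsigned long pageblock_nr_pages,
				     unsigned long managed_pages)
{
	assert(pageblock_nr_pages > 0);
	return pageblock_nr_pages * 4 > managed_pages;
}
```

With 64K pages a 512M page block is 8192 pages, so the threshold is 32768
pages (2G). A 768M zone has at most 12288 pages and is therefore skipped.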
Mel said:
: There is no harm in marking it stable. Clearly it does not happen very
: often but it's not impossible. 32-bit x86 is a lot less common now
: which would previously have been vulnerable to triggering this easily.
: ppc64 has a larger base page size but typically only has one zone.
: arm64 is likely the most vulnerable, particularly when CMA is
: configured with a small movable zone.
Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@ora…
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Henry Willard <henry.willard(a)oracle.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/page_alloc.c~mm-limit-boost_watermark-on-small-zones
+++ a/mm/page_alloc.c
@@ -2401,6 +2401,14 @@ static inline void boost_watermark(struc
if (!watermark_boost_factor)
return;
+ /*
+ * Don't bother in zones that are unlikely to produce results.
+ * On small machines, including kdump capture kernels running
+ * in a small area, boosting the watermark can cause an out of
+ * memory situation immediately.
+ */
+ if ((pageblock_nr_pages * 4) > zone_managed_pages(zone))
+ return;
max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
watermark_boost_factor, 10000);
_
Patches currently in -mm which might be from henry.willard(a)oracle.com are
The patch titled
Subject: epoll: atomically remove wait entry on wake up
has been removed from the -mm tree. Its filename was
epoll-atomically-remove-wait-entry-on-wake-up.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Penyaev <rpenyaev(a)suse.de>
Subject: epoll: atomically remove wait entry on wake up
This patch does two things:
1. fixes lost wakeup introduced by:
339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
2. improves performance for events delivery.
The description of the problem is the following: if N (>1) threads are
waiting on ep->wq for new events and M (>1) events come, it is quite
likely that >1 wakeups hit the same wait queue entry, because there is
quite a big window between __add_wait_queue_exclusive() and the following
__remove_wait_queue() calls in ep_poll() function. This can lead to lost
wakeups, because the thread that was woken up may not handle all the events
in ->rdllist. (The problem is described in more detail here:
https://lkml.org/lkml/2019/10/7/905)
The idea of the current patch is to use init_wait() instead of
init_waitqueue_entry(). Internally init_wait() sets
autoremove_wake_function as a callback, which removes the wait entry
atomically (under the wq locks) from the list, thus the next coming wakeup
hits the next wait entry in the wait queue, thus preventing lost wakeups.
The problem is reliably reproduced by the epoll60 test case [1].
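The effect described above can be shown with a deliberately simplified,
single-threaded model in userspace C. It is not the kernel wait-queue code:
the model_* names are hypothetical, task states are ignored, and all wakeups
are assumed to arrive before any woken waiter gets to run. It only shows that
without removal on wakeup every exclusive wakeup hits the same head entry,
while with removal each wakeup reaches a different waiter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WAITERS 8

struct model_waiter {
	bool queued;		/* still linked on the wait queue? */
	unsigned int wakeups;	/* number of wakeups that hit this entry */
};

struct model_wq {
	struct model_waiter waiters[MODEL_MAX_WAITERS];
	size_t nr;
};

/* Queue nr waiters, all of them exclusive. */
static void model_wq_init(struct model_wq *wq, size_t nr)
{
	assert(nr <= MODEL_MAX_WAITERS);
	wq->nr = nr;
	for (size_t i = 0; i < nr; i++) {
		wq->waiters[i].queued = true;
		wq->waiters[i].wakeups = 0;
	}
}

/* Exclusive wakeup: hit only the first entry that is still queued. */
static void model_wake_one(struct model_wq *wq, bool autoremove)
{
	for (size_t i = 0; i < wq->nr; i++) {
		struct model_waiter *w = &wq->waiters[i];

		if (!w->queued)
			continue;
		w->wakeups++;
		if (autoremove)
			w->queued = false;	/* entry leaves the queue on wakeup */
		return;
	}
}

/* Count how many different waiters received at least one wakeup. */
static size_t model_distinct_woken(const struct model_wq *wq)
{
	size_t count = 0;

	for (size_t i = 0; i < wq->nr; i++)
		if (wq->waiters[i].wakeups > 0)
			count++;
	return count;
}
```

In this model, three wakeups on four waiters reach one waiter without
removal and three waiters with it.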
Wait entry removal on wakeup also has performance benefits, because there
is no need to take ep->lock and remove the wait entry from the queue after
a successful wakeup. Here is the timing output of the epoll60 test
case:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior to 339ddb53d373):
real 0m6.970s
user 0m49.786s
sys 0m0.113s
After this patch:
real 0m5.220s
user 0m36.879s
sys 0m0.019s
The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior to 339ddb53d373):
threads  events/ms  run-time ms
      8       5427         1474
     16       6163         2596
     32       6824         4689
     64       7060         9064
    128       6991        18309
After this patch:
threads  events/ms  run-time ms
      8       5598         1429
     16       7073         2262
     32       7502         4265
     64       7640         8376
    128       7634        16767
(number of "events/ms" represents event bandwidth, thus higher is
better; number of "run-time ms" represents overall time spent
doing the benchmark, thus lower is better)
[1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
[2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev(a)suse.de>
Reviewed-by: Jason Baron <jbaron(a)akamai.com>
Cc: Khazhismel Kumykov <khazhy(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Heiher <r(a)hev.cc>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/eventpoll.c | 43 ++++++++++++++++++++++++-------------------
1 file changed, 24 insertions(+), 19 deletions(-)
--- a/fs/eventpoll.c~epoll-atomically-remove-wait-entry-on-wake-up
+++ a/fs/eventpoll.c
@@ -1822,7 +1822,6 @@ static int ep_poll(struct eventpoll *ep,
{
int res = 0, eavail, timed_out = 0;
u64 slack = 0;
- bool waiter = false;
wait_queue_entry_t wait;
ktime_t expires, *to = NULL;
@@ -1867,21 +1866,23 @@ fetch_events:
*/
ep_reset_busy_poll_napi_id(ep);
- /*
- * We don't have any available event to return to the caller. We need
- * to sleep here, and we will be woken by ep_poll_callback() when events
- * become available.
- */
- if (!waiter) {
- waiter = true;
- init_waitqueue_entry(&wait, current);
-
+ do {
+ /*
+ * Internally init_wait() uses autoremove_wake_function(),
+ * thus wait entry is removed from the wait queue on each
+ * wakeup. Why it is important? In case of several waiters
+ * each new wakeup will hit the next waiter, giving it the
+ * chance to harvest new event. Otherwise wakeup can be
+ * lost. This is also good performance-wise, because on
+ * normal wakeup path no need to call __remove_wait_queue()
+ * explicitly, thus ep->lock is not taken, which halts the
+ * event delivery.
+ */
+ init_wait(&wait);
write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
write_unlock_irq(&ep->lock);
- }
- for (;;) {
/*
* We don't want to sleep if the ep_poll_callback() sends us
* a wakeup in between. That's why we set the task state
@@ -1911,10 +1912,20 @@ fetch_events:
timed_out = 1;
break;
}
- }
+
+ /* We were woken up, thus go and try to harvest some events */
+ eavail = 1;
+
+ } while (0);
__set_current_state(TASK_RUNNING);
+ if (!list_empty_careful(&wait.entry)) {
+ write_lock_irq(&ep->lock);
+ __remove_wait_queue(&ep->wq, &wait);
+ write_unlock_irq(&ep->lock);
+ }
+
send_events:
/*
* Try to transfer events to user space. In case we get 0 events and
@@ -1925,12 +1936,6 @@ send_events:
!(res = ep_send_events(ep, events, maxevents)) && !timed_out)
goto fetch_events;
- if (waiter) {
- write_lock_irq(&ep->lock);
- __remove_wait_queue(&ep->wq, &wait);
- write_unlock_irq(&ep->lock);
- }
-
return res;
}
_
Patches currently in -mm which might be from rpenyaev(a)suse.de are
epoll-call-final-ep_events_available-check-under-the-lock.patch
The patch titled
Subject: eventpoll: fix missing wakeup for ovflist in ep_poll_callback
has been removed from the -mm tree. Its filename was
eventpoll-fix-missing-wakeup-for-ovflist-in-ep_poll_callback.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Khazhismel Kumykov <khazhy(a)google.com>
Subject: eventpoll: fix missing wakeup for ovflist in ep_poll_callback
In the event that we add to ovflist, before 339ddb53d373 we would be woken
up by ep_scan_ready_list, and did no wakeup in ep_poll_callback. With
that wakeup removed, if we add to ovflist here, we may never wake up.
Rather than adding back the ep_scan_ready_list wakeup - which was
resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback.
We noticed that one of our workloads was missing wakeups starting with
339ddb53d373 and upon manual inspection, this wakeup seemed missing to me.
With this patch added, we no longer see missing wakeups. I haven't yet
tried to make a small reproducer, but the existing kselftests in
filesystem/epoll passed for me with this patch.
[khazhy(a)google.com: use if/elif instead of goto + cleanup suggested by Roman]
Link: http://lkml.kernel.org/r/20200424190039.192373-1-khazhy@google.com
Link: http://lkml.kernel.org/r/20200424025057.118641-1-khazhy@google.com
Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
Signed-off-by: Khazhismel Kumykov <khazhy(a)google.com>
Reviewed-by: Roman Penyaev <rpenyaev(a)suse.de>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Roman Penyaev <rpenyaev(a)suse.de>
Cc: Heiher <r(a)hev.cc>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/eventpoll.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/fs/eventpoll.c~eventpoll-fix-missing-wakeup-for-ovflist-in-ep_poll_callback
+++ a/fs/eventpoll.c
@@ -1171,6 +1171,10 @@ static inline bool chain_epi_lockless(st
{
struct eventpoll *ep = epi->ep;
+ /* Fast preliminary check */
+ if (epi->next != EP_UNACTIVE_PTR)
+ return false;
+
/* Check that the same epi has not been just chained from another CPU */
if (cmpxchg(&epi->next, EP_UNACTIVE_PTR, NULL) != EP_UNACTIVE_PTR)
return false;
@@ -1237,16 +1241,12 @@ static int ep_poll_callback(wait_queue_e
* chained in ep->ovflist and requeued later on.
*/
if (READ_ONCE(ep->ovflist) != EP_UNACTIVE_PTR) {
- if (epi->next == EP_UNACTIVE_PTR &&
- chain_epi_lockless(epi))
+ if (chain_epi_lockless(epi))
+ ep_pm_stay_awake_rcu(epi);
+ } else if (!ep_is_linked(epi)) {
+ /* In the usual case, add event to ready list. */
+ if (list_add_tail_lockless(&epi->rdllink, &ep->rdllist))
ep_pm_stay_awake_rcu(epi);
- goto out_unlock;
- }
-
- /* If this file is already in the ready list we exit soon */
- if (!ep_is_linked(epi) &&
- list_add_tail_lockless(&epi->rdllink, &ep->rdllist)) {
- ep_pm_stay_awake_rcu(epi);
}
/*
_
Patches currently in -mm which might be from khazhy(a)google.com are
The patch titled
Subject: mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-watchdog-soft-lockups-during-set_zone_contiguous.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected,
e.g., while booting up.
[ 105.608900] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 105.608933] Modules linked in:
[ 105.608933] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4
[ 105.608933] Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014
[ 105.608933] RIP: 0010:__pageblock_pfn_to_page+0x134/0x1c0
[ 105.608933] Code: 85 c0 74 71 4a 8b 04 d0 48 85 c0 74 68 48 01 c1 74 63 f6 01 04 74 5e 48 c1 e7 06 4c 8b 05 cc 991
[ 105.608933] RSP: 0000:ffffb6d94000fe60 EFLAGS: 00010286 ORIG_RAX: ffffffffffffff13
[ 105.608933] RAX: fffff81953250000 RBX: 000000000a4c9600 RCX: ffff8fe9ff7c1990
[ 105.608933] RDX: ffff8fe9ff7dab80 RSI: 000000000a4c95ff RDI: 0000000293250000
[ 105.608933] RBP: ffff8fe9ff7dab80 R08: fffff816c0000000 R09: 0000000000000008
[ 105.608933] R10: 0000000000000014 R11: 0000000000000014 R12: 0000000000000000
[ 105.608933] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 105.608933] FS: 0000000000000000(0000) GS:ffff8fe1ff400000(0000) knlGS:0000000000000000
[ 105.608933] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 105.608933] CR2: 000000000f613000 CR3: 00000088cf20a000 CR4: 00000000000006f0
[ 105.608933] Call Trace:
[ 105.608933] set_zone_contiguous+0x56/0x70
[ 105.608933] page_alloc_init_late+0x166/0x176
[ 105.608933] kernel_init_freeable+0xfa/0x255
[ 105.608933] ? rest_init+0xaa/0xaa
[ 105.608933] kernel_init+0xa/0x106
[ 105.608933] ret_from_fork+0x35/0x40
The issue becomes visible when having a lot of memory (e.g., 4TB) assigned
to a single NUMA node - a system that can easily be created using QEMU.
Inside VMs on a hypervisor with quite some memory overcommit, this is
fairly easy to trigger.
Link: http://lkml.kernel.org/r/20200416073417.5003-1-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Shile Zhang <shile.zhang(a)linux.alibaba.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Shile Zhang <shile.zhang(a)linux.alibaba.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Alexander Duyck <alexander.duyck(a)gmail.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/page_alloc.c~mm-page_alloc-fix-watchdog-soft-lockups-during-set_zone_contiguous
+++ a/mm/page_alloc.c
@@ -1607,6 +1607,7 @@ void set_zone_contiguous(struct zone *zo
if (!__pageblock_pfn_to_page(block_start_pfn,
block_end_pfn, zone))
return;
+ cond_resched();
}
/* We confirm that there is no hole */
_
Patches currently in -mm which might be from david(a)redhat.com are
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
powerpc-pseries-hotplug-memory-stop-checking-is_mem_section_removable.patch
mm-memory_hotplug-remove-is_mem_section_removable.patch
mm-memory_hotplug-set-node_start_pfn-of-hotadded-pgdat-to-0.patch
mm-memory_hotplug-handle-memblocks-only-with-config_arch_keep_memblock.patch
boot_aggregate is the first entry of the IMA measurement list. Its purpose is
to link pre-boot measurements to IMA measurements. As IMA was designed to
work with a TPM 1.2, the SHA1 PCR bank was always selected even if a
TPM 2.0 with support for stronger hash algorithms is available.
This patch first tries to find a PCR bank with the IMA default hash
algorithm. If it does not find it, it selects the SHA256 PCR bank for
TPM 2.0 and SHA1 for TPM 1.2. Ultimately, it selects SHA1 also for TPM 2.0
if the SHA256 PCR bank is not found.
If none of the PCR banks above can be found, boot_aggregate file digest is
filled with zeros, as for TPM bypass, making it impossible to perform a
remote attestation of the system.
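The selection order described above can be sketched as a small userspace C
function. The enum values and the pick_bank() name are hypothetical stand-ins
for the kernel's HASH_ALGO_* identifiers and the loop over allocated_banks
in the diff below. This is a sketch of the preference order, not the kernel
code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm identifiers standing in for HASH_ALGO_*. */
enum model_algo {
	MODEL_SHA1,
	MODEL_SHA256,
	MODEL_SHA384,
	MODEL_SHA512,
};

/*
 * Return the index of the bank to use, or -1 if none is suitable.
 * Preference: the preferred (IMA default) algorithm first, then SHA256,
 * then SHA1 as the last resort.
 */
static int pick_bank(const enum model_algo *banks, size_t nr,
		     enum model_algo preferred)
{
	int idx = -1;

	assert(banks != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (banks[i] == preferred)
			return (int)i;		/* exact match wins immediately */
		if (banks[i] == MODEL_SHA256)
			idx = (int)i;		/* SHA256 overrides a SHA1 fallback */
		else if (idx == -1 && banks[i] == MODEL_SHA1)
			idx = (int)i;		/* SHA1 only if nothing better yet */
	}
	return idx;
}
```

A return value of -1 corresponds to the case where the boot_aggregate digest
is left filled with zeros.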
Changelog
v3:
- Remove option to select the first PCR bank and select SHA1 as fallback
choice also for TPM 2.0 (suggested by Mimi)
v1:
- add Mimi's comments
- if there is no PCR bank for the IMA default hash algorithm use SHA256
(suggested by James Bottomley)
Cc: stable(a)vger.kernel.org # 5.1.x
Fixes: 879b589210a9 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Reported-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
Suggested-by: James Bottomley <James.Bottomley(a)HansenPartnership.com>
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/ima/ima_crypto.c | 47 +++++++++++++++++++++++++----
security/integrity/ima/ima_init.c | 22 +++++++++++---
2 files changed, 58 insertions(+), 11 deletions(-)
diff --git a/security/integrity/ima/ima_crypto.c b/security/integrity/ima/ima_crypto.c
index 423c84f95a14..8e445a671225 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -655,18 +655,29 @@ static void __init ima_pcrread(u32 idx, struct tpm_digest *d)
}
/*
- * Calculate the boot aggregate hash
+ * The boot_aggregate is a cumulative hash over TPM registers 0 - 7. With
+ * TPM 1.2 the boot_aggregate was based on reading the SHA1 PCRs, but with
+ * TPM 2.0 hash agility, TPM chips could support multiple TPM PCR banks,
+ * allowing firmware to configure and enable different banks.
+ *
+ * Knowing which TPM bank is read to calculate the boot_aggregate digest
+ * needs to be conveyed to a verifier. For this reason, use the same
+ * hash algorithm for reading the TPM PCRs as for calculating the boot
+ * aggregate digest as stored in the measurement list.
*/
-static int __init ima_calc_boot_aggregate_tfm(char *digest,
+static int __init ima_calc_boot_aggregate_tfm(char *digest, u16 alg_id,
struct crypto_shash *tfm)
{
- struct tpm_digest d = { .alg_id = TPM_ALG_SHA1, .digest = {0} };
+ struct tpm_digest d = { .alg_id = alg_id, .digest = {0} };
int rc;
u32 i;
SHASH_DESC_ON_STACK(shash, tfm);
shash->tfm = tfm;
+ pr_devel("calculating the boot-aggregate based on TPM bank: %04x\n",
+ d.alg_id);
+
rc = crypto_shash_init(shash);
if (rc != 0)
return rc;
@@ -675,7 +686,8 @@ static int __init ima_calc_boot_aggregate_tfm(char *digest,
for (i = TPM_PCR0; i < TPM_PCR8; i++) {
ima_pcrread(i, &d);
/* now accumulate with current aggregate */
- rc = crypto_shash_update(shash, d.digest, TPM_DIGEST_SIZE);
+ rc = crypto_shash_update(shash, d.digest,
+ crypto_shash_digestsize(tfm));
}
if (!rc)
crypto_shash_final(shash, digest);
@@ -685,14 +697,37 @@ static int __init ima_calc_boot_aggregate_tfm(char *digest,
int __init ima_calc_boot_aggregate(struct ima_digest_data *hash)
{
struct crypto_shash *tfm;
- int rc;
+ u16 crypto_id, alg_id;
+ int rc, i, bank_idx = -1;
+
+ for (i = 0; i < ima_tpm_chip->nr_allocated_banks; i++) {
+ crypto_id = ima_tpm_chip->allocated_banks[i].crypto_id;
+ if (crypto_id == hash->algo) {
+ bank_idx = i;
+ break;
+ }
+
+ if (crypto_id == HASH_ALGO_SHA256)
+ bank_idx = i;
+
+ if (bank_idx == -1 && crypto_id == HASH_ALGO_SHA1)
+ bank_idx = i;
+ }
+
+ if (bank_idx == -1) {
+ pr_err("No suitable TPM algorithm for boot aggregate\n");
+ return 0;
+ }
+
+ hash->algo = ima_tpm_chip->allocated_banks[bank_idx].crypto_id;
tfm = ima_alloc_tfm(hash->algo);
if (IS_ERR(tfm))
return PTR_ERR(tfm);
hash->length = crypto_shash_digestsize(tfm);
- rc = ima_calc_boot_aggregate_tfm(hash->digest, tfm);
+ alg_id = ima_tpm_chip->allocated_banks[bank_idx].alg_id;
+ rc = ima_calc_boot_aggregate_tfm(hash->digest, alg_id, tfm);
ima_free_tfm(tfm);
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 567468188a61..fc1e1002b48d 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -25,7 +25,7 @@ struct tpm_chip *ima_tpm_chip;
/* Add the boot aggregate to the IMA measurement list and extend
* the PCR register.
*
- * Calculate the boot aggregate, a SHA1 over tpm registers 0-7,
+ * Calculate the boot aggregate, a hash over tpm registers 0-7,
* assuming a TPM chip exists, and zeroes if the TPM chip does not
* exist. Add the boot aggregate measurement to the measurement
* list and extend the PCR register.
@@ -49,15 +49,27 @@ static int __init ima_add_boot_aggregate(void)
int violation = 0;
struct {
struct ima_digest_data hdr;
- char digest[TPM_DIGEST_SIZE];
+ char digest[TPM_MAX_DIGEST_SIZE];
} hash;
memset(iint, 0, sizeof(*iint));
memset(&hash, 0, sizeof(hash));
iint->ima_hash = &hash.hdr;
- iint->ima_hash->algo = HASH_ALGO_SHA1;
- iint->ima_hash->length = SHA1_DIGEST_SIZE;
-
+ iint->ima_hash->algo = ima_hash_algo;
+ iint->ima_hash->length = hash_digest_size[ima_hash_algo];
+
+ /*
+ * With TPM 2.0 hash agility, TPM chips could support multiple TPM
+ * PCR banks, allowing firmware to configure and enable different
+ * banks. The SHA1 bank is not necessarily enabled.
+ *
+ * Use the same hash algorithm for reading the TPM PCRs as for
+ * calculating the boot aggregate digest. Preference is given to
+ * the configured IMA default hash algorithm. Otherwise, use the
+ * TCG required banks - SHA256 for TPM 2.0, SHA1 for TPM 1.2.
+ * Ultimately select SHA1 also for TPM 2.0 if the SHA256 PCR bank
+ * is not found.
+ */
if (ima_tpm_chip) {
result = ima_calc_boot_aggregate(&hash.hdr);
if (result < 0) {
--
2.17.1
On 5/8/20 9:29 AM, Hillf Danton wrote:
> Dunno if what's missing makes grumpy.
>
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -3439,6 +3439,11 @@ static void io_close_finish(struct io_wq
> static int io_close(struct io_kiocb *req, bool force_nonblock)
> {
> int ret;
> + struct fd f;
> +
> + f = fdget(req->close.fd);
> + if (!f.file)
> + return -EBADF;
>
> req->close.put_file = NULL;
> ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
Can you expand? With the last patch posted, we don't do that fget/fdget
at all. __close_fd_get_file() will error out if we don't have a file
there. It does change the close error from -EBADF to -ENOENT, so maybe we
just need to improve that?
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 979d9f977409..9fd1257c8404 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -786,7 +786,6 @@ static const struct io_op_def io_op_defs[] = {
.needs_fs = 1,
},
[IORING_OP_CLOSE] = {
- .needs_file = 1,
.file_table = 1,
},
[IORING_OP_FILES_UPDATE] = {
@@ -3399,10 +3398,6 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return -EBADF;
req->close.fd = READ_ONCE(sqe->fd);
- if (req->file->f_op == &io_uring_fops ||
- req->close.fd == req->ctx->ring_fd)
- return -EBADF;
-
return 0;
}
@@ -3434,8 +3429,11 @@ static int io_close(struct io_kiocb *req, bool force_nonblock)
req->close.put_file = NULL;
ret = __close_fd_get_file(req->close.fd, &req->close.put_file);
- if (ret < 0)
+ if (ret < 0) {
+ if (ret == -ENOENT)
+ ret = -EBADF;
return ret;
+ }
/* if the file has a flush method, be safe and punt to async */
if (req->close.put_file->f_op->flush && force_nonblock) {
--
Jens Axboe
After
commit 27d13da8782a ("w1: omap-hdq: Simplify driver with PM runtime autosuspend")
was applied, we saw timeouts and wrong values when
reading a bq27000 connected to the HDQ interface of the omap3. This occurred
mainly after boot and sometimes settled down after several reads,
which suggests that interrupts were being ignored.
root@letux:~# time cat /sys/class/power_supply/bq27000-battery/uevent
POWER_SUPPLY_NAME=bq27000-battery
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_VOLTAGE_NOW=0
POWER_SUPPLY_CURRENT_NOW=0
POWER_SUPPLY_CAPACITY=0
POWER_SUPPLY_CAPACITY_LEVEL=Normal
POWER_SUPPLY_TEMP=-2731
POWER_SUPPLY_TIME_TO_EMPTY_NOW=0
POWER_SUPPLY_TIME_TO_EMPTY_AVG=0
POWER_SUPPLY_TIME_TO_FULL_NOW=0
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CHARGE_FULL=0
POWER_SUPPLY_CHARGE_NOW=0
POWER_SUPPLY_CHARGE_FULL_DESIGN=0
POWER_SUPPLY_CYCLE_COUNT=0
POWER_SUPPLY_ENERGY_NOW=0
POWER_SUPPLY_POWER_AVG=0
POWER_SUPPLY_HEALTH=Good
POWER_SUPPLY_MANUFACTURER=Texas Instruments
real 0m15.761s
user 0m0.001s
sys 0m0.025s
root@letux:~#
Sometimes the effect disappeared after several attempts:
reads became fast and the results became correct.
Enabling debugging revealed that there were tx and
rx timeouts, i.e. the driver does not always respond
properly to interrupts.
This patch improves interrupt handling to avoid races
and loss of interrupt flags.
The ideas are:
* only the hdq_isr() sets bits in hdq_status
* and does wake_up()
* bits are only reset by the read/write/break functions
if they were waited for
* rx/tx/timeout bits are completely decoupled from each
other (and not reset all after waiting for one of them)
* which bits to reset is specified by a new parameter
to hdq_reset_irqstatus()
* hdq_reset_irqstatus() also returns the state before
resetting so that we can encapsulate the spinlock
* this should now handle the case where the write and read
have both already finished before hdq_write_byte()
returns. The old code may have reset all status bits, making
the next hdq_read_byte() time out
* the spinlock protects the reset of bits in function
hdq_reset_irqstatus(), which could be a read-modify-write
problem if the interrupt handler tries to read-modify-write
at exactly the same moment
* add mutex protection also for hdq_write_byte() just to
be safe not to disturb a hdq_read_byte() triggered by
some other thread/process.
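The status-handling rules listed above can be sketched in userspace roughly as follows. This is only an illustration, not the driver code: C11 atomics stand in for the spinlock used in the patch, and the names irq_status, isr_record(), reset_irqstatus() and the INT_* bit values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical bit values, chosen only for this sketch. */
#define INT_TIMEOUT     0x01u
#define INT_RXCOMPLETE  0x02u
#define INT_TXCOMPLETE  0x04u

/* Saved interrupt status; only isr_record() ever sets bits in it. */
static atomic_uint irq_status;

/* Interrupt side: accumulate new flags instead of overwriting old ones. */
static void isr_record(unsigned int hw_status)
{
	atomic_fetch_or(&irq_status, hw_status);
}

/*
 * Waiter side: clear only the bits that were waited for and return the
 * status as it was before clearing, in one atomic step.
 */
static unsigned int reset_irqstatus(unsigned int bits)
{
	assert(bits != 0);
	return atomic_fetch_and(&irq_status, ~bits);
}
```

If both a TX and an RX completion are recorded before the writer looks at the status, reset_irqstatus(INT_TXCOMPLETE) returns both bits but leaves INT_RXCOMPLETE set, so a later reader still finds it.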
This patch was tested on a gta04 and results in
root@letux:~# time cat /sys/class/power_supply/bq27000-battery/uevent
POWER_SUPPLY_NAME=bq27000-battery
POWER_SUPPLY_STATUS=Discharging
POWER_SUPPLY_PRESENT=1
POWER_SUPPLY_VOLTAGE_NOW=3970000
POWER_SUPPLY_CURRENT_NOW=354144
POWER_SUPPLY_CAPACITY=82
POWER_SUPPLY_CAPACITY_LEVEL=Normal
POWER_SUPPLY_TEMP=266
POWER_SUPPLY_TIME_TO_EMPTY_NOW=7680
POWER_SUPPLY_TIME_TO_EMPTY_AVG=7380
POWER_SUPPLY_TECHNOLOGY=Li-ion
POWER_SUPPLY_CHARGE_FULL=934856
POWER_SUPPLY_CHARGE_NOW=763976
POWER_SUPPLY_CHARGE_FULL_DESIGN=1233792
POWER_SUPPLY_CYCLE_COUNT=82
POWER_SUPPLY_ENERGY_NOW=2852840
POWER_SUPPLY_POWER_AVG=1392840
POWER_SUPPLY_HEALTH=Good
POWER_SUPPLY_MANUFACTURER=Texas Instruments
real 0m0.233s
user 0m0.000s
sys 0m0.025s
root@letux:~#
It was also tested with dev_dbg enabled and additional
printk calls to confirm that all operations behave correctly, especially
hdq_write_byte(), hdq_read_byte(), omap_hdq_break().
omap_w1_triplet() was not tested.
Fixes: 27d13da8782a ("w1: omap-hdq: Simplify driver with PM runtime autosuspend")
Cc: stable(a)vger.kernel.org # v5.6+
Signed-off-by: H. Nikolaus Schaller <hns(a)goldelico.com>
---
drivers/w1/masters/omap_hdq.c | 56 ++++++++++++++++++++++++-----------
1 file changed, 38 insertions(+), 18 deletions(-)
diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
index d363e2a89fdfc..384dad0615a26 100644
--- a/drivers/w1/masters/omap_hdq.c
+++ b/drivers/w1/masters/omap_hdq.c
@@ -54,10 +54,10 @@ MODULE_PARM_DESC(w1_id, "1-wire id for the slave detection in HDQ mode");
struct hdq_data {
struct device *dev;
void __iomem *hdq_base;
- /* lock status update */
+ /* lock read/write/status update */
struct mutex hdq_mutex;
+ /* interrupt status and lock */
u8 hdq_irqstatus;
- /* device lock */
spinlock_t hdq_spinlock;
/* mode: 0-HDQ 1-W1 */
int mode;
@@ -120,13 +120,18 @@ static int hdq_wait_for_flag(struct hdq_data *hdq_data, u32 offset,
}
/* Clear saved irqstatus after using an interrupt */
-static void hdq_reset_irqstatus(struct hdq_data *hdq_data)
+static u8 hdq_reset_irqstatus(struct hdq_data *hdq_data, u8 bits)
{
unsigned long irqflags;
+ u8 status;
spin_lock_irqsave(&hdq_data->hdq_spinlock, irqflags);
- hdq_data->hdq_irqstatus = 0;
+ status = hdq_data->hdq_irqstatus;
+ /* this is a read-modify-write */
+ hdq_data->hdq_irqstatus &= ~bits;
spin_unlock_irqrestore(&hdq_data->hdq_spinlock, irqflags);
+
+ return status;
}
/* write out a byte and fill *status with HDQ_INT_STATUS */
@@ -135,6 +140,12 @@ static int hdq_write_byte(struct hdq_data *hdq_data, u8 val, u8 *status)
int ret;
u8 tmp_status;
+ ret = mutex_lock_interruptible(&hdq_data->hdq_mutex);
+ if (ret < 0) {
+ ret = -EINTR;
+ goto rtn;
+ }
+
*status = 0;
hdq_reg_out(hdq_data, OMAP_HDQ_TX_DATA, val);
@@ -144,14 +155,15 @@ static int hdq_write_byte(struct hdq_data *hdq_data, u8 val, u8 *status)
OMAP_HDQ_CTRL_STATUS_DIR | OMAP_HDQ_CTRL_STATUS_GO);
/* wait for the TXCOMPLETE bit */
ret = wait_event_timeout(hdq_wait_queue,
- hdq_data->hdq_irqstatus, OMAP_HDQ_TIMEOUT);
+ (hdq_data->hdq_irqstatus & OMAP_HDQ_INT_STATUS_TXCOMPLETE),
+ OMAP_HDQ_TIMEOUT);
+ *status = hdq_reset_irqstatus(hdq_data, OMAP_HDQ_INT_STATUS_TXCOMPLETE);
if (ret == 0) {
dev_dbg(hdq_data->dev, "TX wait elapsed\n");
ret = -ETIMEDOUT;
goto out;
}
- *status = hdq_data->hdq_irqstatus;
/* check irqstatus */
if (!(*status & OMAP_HDQ_INT_STATUS_TXCOMPLETE)) {
dev_dbg(hdq_data->dev, "timeout waiting for"
@@ -170,7 +182,8 @@ static int hdq_write_byte(struct hdq_data *hdq_data, u8 val, u8 *status)
}
out:
- hdq_reset_irqstatus(hdq_data);
+ mutex_unlock(&hdq_data->hdq_mutex);
+rtn:
return ret;
}
@@ -181,7 +194,7 @@ static irqreturn_t hdq_isr(int irq, void *_hdq)
unsigned long irqflags;
spin_lock_irqsave(&hdq_data->hdq_spinlock, irqflags);
- hdq_data->hdq_irqstatus = hdq_reg_in(hdq_data, OMAP_HDQ_INT_STATUS);
+ hdq_data->hdq_irqstatus |= hdq_reg_in(hdq_data, OMAP_HDQ_INT_STATUS);
spin_unlock_irqrestore(&hdq_data->hdq_spinlock, irqflags);
dev_dbg(hdq_data->dev, "hdq_isr: %x\n", hdq_data->hdq_irqstatus);
@@ -238,14 +251,15 @@ static int omap_hdq_break(struct hdq_data *hdq_data)
/* wait for the TIMEOUT bit */
ret = wait_event_timeout(hdq_wait_queue,
- hdq_data->hdq_irqstatus, OMAP_HDQ_TIMEOUT);
+ (hdq_data->hdq_irqstatus & OMAP_HDQ_INT_STATUS_TIMEOUT),
+ OMAP_HDQ_TIMEOUT);
+ tmp_status = hdq_reset_irqstatus(hdq_data, OMAP_HDQ_INT_STATUS_TIMEOUT);
if (ret == 0) {
dev_dbg(hdq_data->dev, "break wait elapsed\n");
ret = -EINTR;
goto out;
}
- tmp_status = hdq_data->hdq_irqstatus;
/* check irqstatus */
if (!(tmp_status & OMAP_HDQ_INT_STATUS_TIMEOUT)) {
dev_dbg(hdq_data->dev, "timeout waiting for TIMEOUT, %x\n",
@@ -278,7 +292,6 @@ static int omap_hdq_break(struct hdq_data *hdq_data)
" return to zero, %x\n", tmp_status);
out:
- hdq_reset_irqstatus(hdq_data);
mutex_unlock(&hdq_data->hdq_mutex);
rtn:
return ret;
@@ -311,10 +324,11 @@ static int hdq_read_byte(struct hdq_data *hdq_data, u8 *val)
(hdq_data->hdq_irqstatus
& OMAP_HDQ_INT_STATUS_RXCOMPLETE),
OMAP_HDQ_TIMEOUT);
-
+ status = hdq_reset_irqstatus(hdq_data,
+ OMAP_HDQ_INT_STATUS_RXCOMPLETE);
hdq_reg_merge(hdq_data, OMAP_HDQ_CTRL_STATUS, 0,
OMAP_HDQ_CTRL_STATUS_DIR);
- status = hdq_data->hdq_irqstatus;
+
/* check irqstatus */
if (!(status & OMAP_HDQ_INT_STATUS_RXCOMPLETE)) {
dev_dbg(hdq_data->dev, "timeout waiting for"
@@ -322,11 +336,12 @@ static int hdq_read_byte(struct hdq_data *hdq_data, u8 *val)
ret = -ETIMEDOUT;
goto out;
}
+ } else { /* interrupt had occurred before hdq_read_byte was called */
+ hdq_reset_irqstatus(hdq_data, OMAP_HDQ_INT_STATUS_RXCOMPLETE);
}
/* the data is ready. Read it in! */
*val = hdq_reg_in(hdq_data, OMAP_HDQ_RX_DATA);
out:
- hdq_reset_irqstatus(hdq_data);
mutex_unlock(&hdq_data->hdq_mutex);
rtn:
return ret;
@@ -367,15 +382,15 @@ static u8 omap_w1_triplet(void *_hdq, u8 bdir)
(hdq_data->hdq_irqstatus
& OMAP_HDQ_INT_STATUS_RXCOMPLETE),
OMAP_HDQ_TIMEOUT);
+ /* Must clear irqstatus for another RXCOMPLETE interrupt */
+ hdq_reset_irqstatus(hdq_data, OMAP_HDQ_INT_STATUS_RXCOMPLETE);
+
if (err == 0) {
dev_dbg(hdq_data->dev, "RX wait elapsed\n");
goto out;
}
id_bit = (hdq_reg_in(_hdq, OMAP_HDQ_RX_DATA) & 0x01);
- /* Must clear irqstatus for another RXCOMPLETE interrupt */
- hdq_reset_irqstatus(hdq_data);
-
/* read comp_bit */
hdq_reg_merge(_hdq, OMAP_HDQ_CTRL_STATUS,
ctrl | OMAP_HDQ_CTRL_STATUS_DIR, mask);
@@ -383,6 +398,9 @@ static u8 omap_w1_triplet(void *_hdq, u8 bdir)
(hdq_data->hdq_irqstatus
& OMAP_HDQ_INT_STATUS_RXCOMPLETE),
OMAP_HDQ_TIMEOUT);
+ /* Must clear irqstatus for another RXCOMPLETE interrupt */
+ hdq_reset_irqstatus(hdq_data, OMAP_HDQ_INT_STATUS_RXCOMPLETE);
+
if (err == 0) {
dev_dbg(hdq_data->dev, "RX wait elapsed\n");
goto out;
@@ -409,6 +427,9 @@ static u8 omap_w1_triplet(void *_hdq, u8 bdir)
(hdq_data->hdq_irqstatus
& OMAP_HDQ_INT_STATUS_TXCOMPLETE),
OMAP_HDQ_TIMEOUT);
+ /* Must clear irqstatus for another TXCOMPLETE interrupt */
+ hdq_reset_irqstatus(hdq_data, OMAP_HDQ_INT_STATUS_TXCOMPLETE);
+
if (err == 0) {
dev_dbg(hdq_data->dev, "TX wait elapsed\n");
goto out;
@@ -418,7 +439,6 @@ static u8 omap_w1_triplet(void *_hdq, u8 bdir)
OMAP_HDQ_CTRL_STATUS_SINGLE);
out:
- hdq_reset_irqstatus(hdq_data);
mutex_unlock(&hdq_data->hdq_mutex);
rtn:
pm_runtime_mark_last_busy(hdq_data->dev);
--
2.26.2
The -modesetting ddx has a totally broken idea of how atomic works:
- doesn't disable old connectors, assuming they get auto-disable like
with the legacy setcrtc
- assumes ASYNC_FLIP is wired through for the atomic ioctl
- not a single call to TEST_ONLY
Iow the implementation is a 1:1 translation of legacy ioctls to
atomic, which is a) broken b) pointless.
We already have bugs in both i915 and amdgpu-DC where this prevents us
from enabling neat features.
If anyone ever cares about atomic in X we can easily add a new atomic
level (req->value == 2) for X to get back the shiny toys.
Since these broken versions of -modesetting have been shipping,
there's really no other way to get out of this bind.
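For readers unfamiliar with the check this patch adds, here is a small userspace sketch of what a substring test on the task command name accepts. The helper name comm_contains_x() and the sample command names are only illustrative; in the kernel the string comes from current->comm.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for the strstr(current->comm, "X") test in the patch. */
static bool comm_contains_x(const char *comm)
{
	assert(comm != NULL);
	/* strstr() is case-sensitive and matches anywhere in the string. */
	return strstr(comm, "X") != NULL;
}
```

With this helper, "X", "Xorg" and "Xwayland" all match, while names without a capital X, such as "weston", do not.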
References: https://gitlab.freedesktop.org/xorg/xserver/issues/629
References: https://gitlab.freedesktop.org/xorg/xserver/merge_requests/180
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Michel Dänzer <michel(a)daenzer.net>
Cc: Alex Deucher <alexdeucher(a)gmail.com>
Cc: Adam Jackson <ajax(a)redhat.com>
Cc: Sean Paul <sean(a)poorly.run>
Cc: David Airlie <airlied(a)linux.ie>
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/drm_ioctl.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index 2c120c58f72d..1cb7b4c3c87c 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -334,6 +334,9 @@ drm_setclientcap(struct drm_device *dev, void *data, struct drm_file *file_priv)
file_priv->universal_planes = req->value;
break;
case DRM_CLIENT_CAP_ATOMIC:
+ /* The modesetting DDX has a totally broken idea of atomic. */
+ if (strstr(current->comm, "X"))
+ return -EOPNOTSUPP;
if (!drm_core_check_feature(dev, DRIVER_ATOMIC))
return -EOPNOTSUPP;
if (req->value > 1)
--
2.23.0
On Tue, 29 Nov 2016 at 00:00, Johan Hovold <johan(a)kernel.org> wrote:
>
> Make sure to deregister and free any fixed-link PHY registered using
> of_phy_register_fixed_link() on probe errors and on driver unbind.
>
> Fixes: 83895bedeee6 ("net: mvneta: add support for fixed links")
> Signed-off-by: Johan Hovold <johan(a)kernel.org>
> ---
> drivers/net/ethernet/marvell/mvneta.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 0c0a45af950f..707bc4680b9b 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -4191,6 +4191,8 @@ static int mvneta_probe(struct platform_device *pdev)
> clk_disable_unprepare(pp->clk);
> err_put_phy_node:
> of_node_put(phy_node);
> + if (of_phy_is_fixed_link(dn))
> + of_phy_deregister_fixed_link(dn);
While building the kernel Image for the arm architecture on the stable-rc 4.4
branch, the following build error was found.
drivers/net/ethernet/marvell/mvneta.c:3442:3: error: implicit
declaration of function 'of_phy_deregister_fixed_link'; did you mean
'of_phy_register_fixed_link'? [-Werror=implicit-function-declaration]
| of_phy_deregister_fixed_link(dn);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
| of_phy_register_fixed_link
ref:
https://gitlab.com/Linaro/lkft/kernel-runs/-/jobs/541374729
- Naresh
From: Xing Li <lixing(a)loongson.cn>
If a CPU supports more than 32bit vmbits (which is true for 64bit CPUs),
VPN2_MASK set to fixed 0xffffe000 will lead to a wrong EntryHi in some
functions such as _kvm_mips_host_tlb_inv().
The cpu_vmbits definition of 32bit CPU in cpu-features.h is 31, so we
still use the old definition.
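The arithmetic can be checked with a small userspace sketch. The helpers genmask64() and vpn2_mask() are hypothetical stand-ins for the kernel's GENMASK() and the new VPN2_MASK definition, and the value 48 for vmbits is only an example, not taken from any particular CPU.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace equivalent of a 64-bit GENMASK(h, l): bits h..l set, inclusive.
 * Requires l <= h <= 63.
 */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	assert(l <= h && h <= 63);
	return (~UINT64_C(0) << l) & (~UINT64_C(0) >> (63 - h));
}

/* VPN2 mask for a CPU with the given number of virtual address bits. */
static uint64_t vpn2_mask(unsigned int vmbits)
{
	return genmask64(vmbits - 1, 13);
}
```

With vmbits = 48 the mask is 0x0000ffffffffe000, so the fixed 0xffffe000 would drop bits 32..47 of the VPN2. With vmbits = 31 the computed mask would be 0x7fffe000, which differs from the existing 0xffffe000; that is consistent with keeping the old definition for 32bit kernels.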
Cc: stable(a)vger.kernel.org
Signed-off-by: Xing Li <lixing(a)loongson.cn>
[Huacai: Improve commit messages]
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
arch/mips/include/asm/kvm_host.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index a01cee9..caa2b936 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -274,7 +274,11 @@ enum emulation_result {
#define MIPS3_PG_SHIFT 6
#define MIPS3_PG_FRAME 0x3fffffc0
+#if defined(CONFIG_64BIT)
+#define VPN2_MASK GENMASK(cpu_vmbits - 1, 13)
+#else
#define VPN2_MASK 0xffffe000
+#endif
#define KVM_ENTRYHI_ASID cpu_asid_mask(&boot_cpu_data)
#define TLB_IS_GLOBAL(x) ((x).tlb_lo[0] & (x).tlb_lo[1] & ENTRYLO_G)
#define TLB_VPN2(x) ((x).tlb_hi & VPN2_MASK)
--
2.7.0
In the request completion path with CQE, the request type is checked
after the request has been completed. This returns
the wrong request type and leads to an IO hang.
The ASYNC request type is returned for DCMD type requests.
Because of this mismatch, the mq->cqe_busy flag is never cleared
and the driver does not invoke blk_mq_hw_run_queue. So requests are not
dispatched to the LLD from the block layer.
All of this eventually leads to IO hangs.
So, get the request type before completing the request.
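The ordering problem can be illustrated with a simplified userspace model. Everything here is hypothetical (struct request, struct queue, issue_type_of(), complete_request()); in particular, how completion changes the request is an assumption, modelled by resetting the field the type is derived from.

```c
#include <assert.h>
#include <stdbool.h>

enum issue_type { ISSUE_ASYNC, ISSUE_DCMD, ISSUE_MAX };

struct request { bool is_dcmd; };
struct queue { int in_flight[ISSUE_MAX]; };

/* Derive the issue type from the current state of the request. */
static enum issue_type issue_type_of(const struct request *req)
{
	return req->is_dcmd ? ISSUE_DCMD : ISSUE_ASYNC;
}

/* Assumption: after completion the request no longer looks like a DCMD. */
static void complete_request(struct request *req)
{
	req->is_dcmd = false;
}

/* Old order: the type is derived after completion, so it is wrong. */
static void finish_buggy(struct queue *q, struct request *req)
{
	complete_request(req);
	q->in_flight[issue_type_of(req)] -= 1;
}

/* New order: remember the type first, then complete the request. */
static void finish_fixed(struct queue *q, struct request *req)
{
	enum issue_type type = issue_type_of(req);

	complete_request(req);
	q->in_flight[type] -= 1;
}
```

In this model, finish_buggy() decrements the ASYNC counter for a DCMD request and leaves the DCMD counter stuck at 1, while finish_fixed() brings the DCMD counter back to 0.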
Cc: <stable(a)vger.kernel.org> # v4.19+
Signed-off-by: Veerabhadrarao Badiganti <vbadigan(a)codeaurora.org>
---
drivers/mmc/core/block.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8499b56..c5367e2 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1370,6 +1370,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
struct mmc_request *mrq = &mqrq->brq.mrq;
struct request_queue *q = req->q;
struct mmc_host *host = mq->card->host;
+ enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
unsigned long flags;
bool put_card;
int err;
@@ -1399,7 +1400,7 @@ static void mmc_blk_cqe_complete_rq(struct mmc_queue *mq, struct request *req)
spin_lock_irqsave(&mq->lock, flags);
- mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+ mq->in_flight[issue_type] -= 1;
put_card = (mmc_tot_in_flight(mq) == 0);
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc., is a member of Code Aurora Forum, a Linux Foundation Collaborative Project
Hi,
I believe some patches are needed to fix build issues on Hexagon:
ac32292c8552f7e8517be184e65dd09786e991f9 hexagon: clean up ioremap
7312b70699252074d753c5005fc67266c547bbe3 hexagon: define ioremap_uc
The same is for stable v5.4.
Best,
Tuowen
The following race occurs while accessing the dmabuf object exported as
file:
P1 P2
dma_buf_release() dmabuffs_dname()
[say lsof reading /proc/<P1 pid>/fd/<num>]
read dmabuf stored in dentry->d_fsdata
Free the dmabuf object
Start accessing the dmabuf structure
In the above description, the dmabuf object freed in P1 is
accessed from P2, which results in a use-after-free. Below is
the stack dump reported.
We are reading the dmabuf object stored in dentry->d_fsdata, but
there is no binding between the dentry and the dmabuf, which means that
the dmabuf can be freed while it is being read from ->d_fsdata and
in use. Reviews of patch V1 say that protecting the in-use dmabuf
with an extra refcount is not a viable solution, as the exported dmabuf
is already under the file's refcount and keeping multiple refcounts on
the same object coordinated is not possible.
As we read the dmabuf in ->d_fsdata just to get the user-passed
name, we can store the name directly in d_fsdata and thus avoid
reading the dmabuf altogether.
Call Trace:
kasan_report+0x12/0x20
__asan_report_load8_noabort+0x14/0x20
dmabuffs_dname+0x4f4/0x560
tomoyo_realpath_from_path+0x165/0x660
tomoyo_get_realpath
tomoyo_check_open_permission+0x2a3/0x3e0
tomoyo_file_open
tomoyo_file_open+0xa9/0xd0
security_file_open+0x71/0x300
do_dentry_open+0x37a/0x1380
vfs_open+0xa0/0xd0
path_openat+0x12ee/0x3490
do_filp_open+0x192/0x260
do_sys_openat2+0x5eb/0x7e0
do_sys_open+0xf2/0x180
Fixes: bb2bb9030425 ("dma-buf: add DMA_BUF_SET_NAME ioctls")
Reported-by: syzbot+3643a18836bce555bff6(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org> [5.3+]
Signed-off-by: Charan Teja Reddy <charante(a)codeaurora.org>
---
Changes in v2:
- Pass the user passed name in ->d_fsdata instead of dmabuf
- Improve the commit message
Changes in v1: (https://patchwork.kernel.org/patch/11514063/)
drivers/dma-buf/dma-buf.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 01ce125..0071f7d 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -25,6 +25,7 @@
#include <linux/mm.h>
#include <linux/mount.h>
#include <linux/pseudo_fs.h>
+#include <linux/dcache.h>
#include <uapi/linux/dma-buf.h>
#include <uapi/linux/magic.h>
@@ -40,15 +41,13 @@ struct dma_buf_list {
static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen)
{
- struct dma_buf *dmabuf;
char name[DMA_BUF_NAME_LEN];
size_t ret = 0;
- dmabuf = dentry->d_fsdata;
- dma_resv_lock(dmabuf->resv, NULL);
- if (dmabuf->name)
- ret = strlcpy(name, dmabuf->name, DMA_BUF_NAME_LEN);
- dma_resv_unlock(dmabuf->resv);
+ spin_lock(&dentry->d_lock);
+ if (dentry->d_fsdata)
+ ret = strlcpy(name, dentry->d_fsdata, DMA_BUF_NAME_LEN);
+ spin_unlock(&dentry->d_lock);
return dynamic_dname(dentry, buffer, buflen, "/%s:%s",
dentry->d_name.name, ret > 0 ? name : "");
@@ -80,12 +79,16 @@ static int dma_buf_fs_init_context(struct fs_context *fc)
static int dma_buf_release(struct inode *inode, struct file *file)
{
struct dma_buf *dmabuf;
+ struct dentry *dentry = file->f_path.dentry;
if (!is_dma_buf_file(file))
return -EINVAL;
dmabuf = file->private_data;
+ spin_lock(&dentry->d_lock);
+ dentry->d_fsdata = NULL;
+ spin_unlock(&dentry->d_lock);
BUG_ON(dmabuf->vmapping_counter);
/*
@@ -343,6 +346,7 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
}
kfree(dmabuf->name);
dmabuf->name = name;
+ dmabuf->file->f_path.dentry->d_fsdata = name;
out_unlock:
dma_resv_unlock(dmabuf->resv);
@@ -446,7 +450,6 @@ static struct file *dma_buf_getfile(struct dma_buf *dmabuf, int flags)
goto err_alloc_file;
file->f_flags = flags & (O_ACCMODE | O_NONBLOCK);
file->private_data = dmabuf;
- file->f_path.dentry->d_fsdata = dmabuf;
return file;
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
From: Henry Willard <henry.willard(a)oracle.com>
Subject: mm: limit boost_watermark on small zones
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external
fragmentation event occurs") adds a boost_watermark() function which
increases the min watermark in a zone by at least pageblock_nr_pages or
the number of pages in a page block. On Arm64, with 64K pages and 512M
huge pages, this is 8192 pages or 512M. It does this regardless of the
number of pages managed in the zone or the likelihood of success.
This can put the zone immediately under water in terms of allocating pages
from the zone, and can cause a small machine to fail immediately due to
OoM. Unlike set_recommended_min_free_kbytes(), which substantially
increases min_free_kbytes and is tied to THP, boost_watermark() can be
called even if THP is not active. The problem is most likely to appear
on architectures such as Arm64 where pageblock_nr_pages is very large.
It is desirable to run the kdump capture kernel in as small a space as
possible to avoid wasting memory. In some architectures, such as Arm64,
there are restrictions on where the capture kernel can run, and therefore,
the space available. A capture kernel running in 768M can fail due to OoM
immediately after boost_watermark() sets the min in zone DMA32, where
most of the memory is, to 512M. It fails even though there is over 500M of
free memory. With boost_watermark() suppressed, the capture kernel can run
successfully in 448M.
This patch limits boost_watermark() to boosting a zone's min watermark only
when there are enough pages that the boost will produce positive results.
In this case that is estimated to be four times as many pages as
pageblock_nr_pages.
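As a worked example of the threshold, the following userspace sketch mirrors the comparison in the patch using the Arm64 figures quoted above. The helper names zone_large_enough_for_boost() and bytes_to_pages() are hypothetical, and treating the whole 768M as a single zone is a simplifying assumption (an upper bound on the real zone size).

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the check in the patch: skip zones smaller than four pageblocks. */
static bool zone_large_enough_for_boost(unsigned long pageblock_nr_pages,
					unsigned long managed_pages)
{
	return pageblock_nr_pages * 4 <= managed_pages;
}

/* Number of pages of page_size bytes in a region of the given size. */
static unsigned long bytes_to_pages(unsigned long long bytes,
				    unsigned long page_size)
{
	assert(page_size != 0);
	return (unsigned long)(bytes / page_size);
}
```

With 64K pages and 512M pageblocks, pageblock_nr_pages is 8192, so the threshold is 32768 pages, or 2G. A 768M capture kernel has at most 12288 managed pages in any zone, so the boost is skipped.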
Mel said:
: There is no harm in marking it stable. Clearly it does not happen very
: often but it's not impossible. 32-bit x86 is a lot less common now
: which would previously have been vulnerable to triggering this easily.
: ppc64 has a larger base page size but typically only has one zone.
: arm64 is likely the most vulnerable, particularly when CMA is
: configured with a small movable zone.
Link: http://lkml.kernel.org/r/1588294148-6586-1-git-send-email-henry.willard@ora…
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Henry Willard <henry.willard(a)oracle.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/mm/page_alloc.c~mm-limit-boost_watermark-on-small-zones
+++ a/mm/page_alloc.c
@@ -2401,6 +2401,14 @@ static inline void boost_watermark(struc
if (!watermark_boost_factor)
return;
+ /*
+ * Don't bother in zones that are unlikely to produce results.
+ * On small machines, including kdump capture kernels running
+ * in a small area, boosting the watermark can cause an out of
+ * memory situation immediately.
+ */
+ if ((pageblock_nr_pages * 4) > zone_managed_pages(zone))
+ return;
max_boost = mult_frac(zone->_watermark[WMARK_HIGH],
watermark_boost_factor, 10000);
_
From: Roman Penyaev <rpenyaev(a)suse.de>
Subject: epoll: atomically remove wait entry on wake up
This patch does two things:
1. fixes lost wakeup introduced by:
339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll")
2. improves performance for events delivery.
The description of the problem is the following: if N (>1) threads are
waiting on ep->wq for new events and M (>1) events come, it is quite
likely that >1 wakeups hit the same wait queue entry, because there is
quite a big window between __add_wait_queue_exclusive() and the following
__remove_wait_queue() calls in ep_poll() function. This can lead to lost
wakeups, because the thread that was woken up may not handle all the events
in ->rdllist. (The problem is described in more detail here:
https://lkml.org/lkml/2019/10/7/905)
The idea of the current patch is to use init_wait() instead of
init_waitqueue_entry(). Internally init_wait() sets
autoremove_wake_function as a callback, which removes the wait entry
atomically (under the wq locks) from the list, thus the next coming wakeup
hits the next wait entry in the wait queue, thus preventing lost wakeups.
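The effect of removing the entry on wakeup can be seen in a small single-threaded model. This is a simplified sketch, not the kernel wait queue: struct waiter, wake_one() and distinct_waiters_woken() are hypothetical, and the model covers only the window before a woken thread gets to run and unlink itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_WAITERS 3

struct waiter {
	bool queued;	/* still linked into the wait queue? */
	int wakeups;	/* how many wakeups landed on this entry */
};

/*
 * Deliver one exclusive wakeup: it goes to the first queued entry.
 * With autoremove set, the entry is unlinked as part of the wakeup.
 */
static void wake_one(struct waiter *wq, size_t n, bool autoremove)
{
	for (size_t i = 0; i < n; i++) {
		if (!wq[i].queued)
			continue;
		wq[i].wakeups++;
		if (autoremove)
			wq[i].queued = false;
		return;
	}
}

/* Queue all waiters, deliver the wakeups, count distinct waiters woken. */
static int distinct_waiters_woken(int events, bool autoremove)
{
	struct waiter wq[NR_WAITERS];
	int woken = 0;

	for (size_t i = 0; i < NR_WAITERS; i++)
		wq[i] = (struct waiter){ .queued = true, .wakeups = 0 };
	for (int e = 0; e < events; e++)
		wake_one(wq, NR_WAITERS, autoremove);
	for (size_t i = 0; i < NR_WAITERS; i++)
		woken += wq[i].wakeups > 0;
	return woken;
}
```

In this model, two wakeups without removal both land on the first entry, so only one waiter is woken; with removal, each wakeup reaches a different waiter.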
The problem is reliably reproduced by the epoll60 test case [1].
Wait entry removal on wakeup also has performance benefits, because there
is no need to take the ep->lock and remove the wait entry from the queue after
a successful wakeup. Here is the timing output of the epoll60 test
case:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior to 339ddb53d373):
real 0m6.970s
user 0m49.786s
sys 0m0.113s
After this patch:
real 0m5.220s
user 0m36.879s
sys 0m0.019s
The other testcase is the stress-epoll [2], where one thread consumes
all the events and other threads produce many events:
With explicit wakeup from ep_scan_ready_list() (the state of the
code prior to 339ddb53d373):
threads events/ms run-time ms
8 5427 1474
16 6163 2596
32 6824 4689
64 7060 9064
128 6991 18309
After this patch:
threads events/ms run-time ms
8 5598 1429
16 7073 2262
32 7502 4265
64 7640 8376
128 7634 16767
(number of "events/ms" represents event bandwidth, thus higher is
better; number of "run-time ms" represents overall time spent
doing the benchmark, thus lower is better)
[1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
[2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c
Link: http://lkml.kernel.org/r/20200430130326.1368509-2-rpenyaev@suse.de
Signed-off-by: Roman Penyaev <rpenyaev(a)suse.de>
Reviewed-by: Jason Baron <jbaron(a)akamai.com>
Cc: Khazhismel Kumykov <khazhy(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Heiher <r(a)hev.cc>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/eventpoll.c | 43 ++++++++++++++++++++++++-------------------
1 file changed, 24 insertions(+), 19 deletions(-)
--- a/fs/eventpoll.c~epoll-atomically-remove-wait-entry-on-wake-up
+++ a/fs/eventpoll.c
@@ -1822,7 +1822,6 @@ static int ep_poll(struct eventpoll *ep,
{
int res = 0, eavail, timed_out = 0;
u64 slack = 0;
- bool waiter = false;
wait_queue_entry_t wait;
ktime_t expires, *to = NULL;
@@ -1867,21 +1866,23 @@ fetch_events:
*/
ep_reset_busy_poll_napi_id(ep);
- /*
- * We don't have any available event to return to the caller. We need
- * to sleep here, and we will be woken by ep_poll_callback() when events
- * become available.
- */
- if (!waiter) {
- waiter = true;
- init_waitqueue_entry(&wait, current);
-
+ do {
+ /*
+ * Internally init_wait() uses autoremove_wake_function(),
+ * thus wait entry is removed from the wait queue on each
+ * wakeup. Why it is important? In case of several waiters
+ * each new wakeup will hit the next waiter, giving it the
+ * chance to harvest new event. Otherwise wakeup can be
+ * lost. This is also good performance-wise, because on
+ * normal wakeup path no need to call __remove_wait_queue()
+ * explicitly, thus ep->lock is not taken, which halts the
+ * event delivery.
+ */
+ init_wait(&wait);
write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
write_unlock_irq(&ep->lock);
- }
- for (;;) {
/*
* We don't want to sleep if the ep_poll_callback() sends us
* a wakeup in between. That's why we set the task state
@@ -1911,10 +1912,20 @@ fetch_events:
timed_out = 1;
break;
}
- }
+
+ /* We were woken up, thus go and try to harvest some events */
+ eavail = 1;
+
+ } while (0);
__set_current_state(TASK_RUNNING);
+ if (!list_empty_careful(&wait.entry)) {
+ write_lock_irq(&ep->lock);
+ __remove_wait_queue(&ep->wq, &wait);
+ write_unlock_irq(&ep->lock);
+ }
+
send_events:
/*
* Try to transfer events to user space. In case we get 0 events and
@@ -1925,12 +1936,6 @@ send_events:
!(res = ep_send_events(ep, events, maxevents)) && !timed_out)
goto fetch_events;
- if (waiter) {
- write_lock_irq(&ep->lock);
- __remove_wait_queue(&ep->wq, &wait);
- write_unlock_irq(&ep->lock);
- }
-
return res;
}
_