First patch removes old rovers, they are inherently racy
and cause packet drops/delays.
Second patch avoids iterating entire range of ports: It takes way
too long when most tuples are in use.
First patch is slightly mangled: nf_nat_proto_udplite.c doesn't exist
anymore upstream and nf_nat_proto_common.c needs minor adjustment
in context.
Florian Westphal (2):
netfilter: nat: remove l4 protocol port rovers
netfilter: nat: limit port clash resolution attempts
include/net/netfilter/nf_nat_l4proto.h | 2 +-
net/netfilter/nf_nat_proto_common.c | 36 ++++++++++++++++++--------
net/netfilter/nf_nat_proto_dccp.c | 5 +---
net/netfilter/nf_nat_proto_sctp.c | 5 +---
net/netfilter/nf_nat_proto_tcp.c | 5 +---
net/netfilter/nf_nat_proto_udp.c | 5 +---
net/netfilter/nf_nat_proto_udplite.c | 5 +---
7 files changed, 31 insertions(+), 32 deletions(-)
--
2.34.1
Dear Friend,
I am Ms Lisa Hugh accountant and files keeping by profession with the bank.
I need your co-operation for the transferring of
($4,500,000,00,U.S.DOLLARS)to your bank account for both of us
benefit.
Please send the follow below,
1)AGE....
2)TELEPHONE NUMBER,,,,,...
3)COUNTRY.....
4)OCCUPATION..
....
Thanks.
Ms Lisa Hugh
From: Uday Shankar <ushankar(a)purestorage.com>
Controller deletion/reset, immediately followed by or concurrent with
a reconnect, is hard failing the connect attempt resulting in a
complete loss of connectivity to the controller.
In the connect request, fabrics looks for an existing controller with
the same address components and aborts the connect if a controller
already exists and the duplicate connect option isn't set. The match
routine filters out controllers that are dead or dying, so they don't
interfere with the new connect request.
When NVME_CTRL_DELETING_NOIO was added, it missed updating the state
filters in the nvmf_ctlr_matches_baseopts() routine. Thus, when in this
new state, it's seen as a live controller and fails the connect request.
Correct by adding the DELETING_NIO state to the match checks.
Fixes: ecca390e8056 ("nvme: fix deadlock in disconnect during scan_work and/or ana_work")
Cc: <stable(a)vger.kernel.org> # v5.7+
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
Reviewed-by: James Smart <jsmart2021(a)gmail.com>
CC: Sagi Grimberg <sagi(a)grimberg.me>
---
drivers/nvme/host/fabrics.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvme/host/fabrics.h b/drivers/nvme/host/fabrics.h
index c3203ff1c654..1e3a09cad961 100644
--- a/drivers/nvme/host/fabrics.h
+++ b/drivers/nvme/host/fabrics.h
@@ -170,6 +170,7 @@ nvmf_ctlr_matches_baseopts(struct nvme_ctrl *ctrl,
struct nvmf_ctrl_options *opts)
{
if (ctrl->state == NVME_CTRL_DELETING ||
+ ctrl->state == NVME_CTRL_DELETING_NOIO ||
ctrl->state == NVME_CTRL_DEAD ||
strcmp(opts->subsysnqn, ctrl->opts->subsysnqn) ||
strcmp(opts->host->nqn, ctrl->opts->host->nqn) ||
--
2.26.2
-Wunaligned-access is a new warning in clang that is default enabled for
arm and arm64 under certain circumstances within the clang frontend (see
LLVM commit below). Under an ARCH=arm allmodconfig, there are
1284 total/70 unique instances of this warning (most of the instances
are in header files), which is quite noisy.
To keep the build green through CONFIG_WERROR, only allow this warning
with W=2, which seems appropriate according to its description:
"warnings which occur quite often but may still be relevant".
This intentionally does not use the -Wno-... + -W... pattern that the
rest of the Makefile does because this warning is not enabled for
anything other than certain arm and arm64 configurations within clang so
it should only be "not disabled", rather than explicitly enabled, so
that other architectures are not disturbed by the warning.
Cc: stable(a)vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/35737df4dcd28534bd3090157c224c1…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
scripts/Makefile.extrawarn | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index d53825503874..5f75fec4d5ac 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -70,6 +70,20 @@ KBUILD_CFLAGS += $(call cc-option, -Wunused-macros)
KBUILD_CPPFLAGS += -DKBUILD_EXTRA_WARN2
+else
+
+ifdef CONFIG_CC_IS_CLANG
+# While this warning is architecture agnostic, it is only default enabled for
+# 32-bit and 64-bit ARM under certain conditions within clang:
+#
+# https://github.com/llvm/llvm-project/commit/35737df4dcd28534bd3090157c224c1…
+#
+# To allow it to be disabled under a normal build or W=1 but show up under W=2
+# for those configurations, disable it when W=2 is not set, rather than enable
+# -Wunaligned-access in the above block explicitly.
+KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
+endif
+
endif
#
base-commit: 26291c54e111ff6ba87a164d85d4a4e134b7315c
--
2.35.1
Quoting[1] Ariadne Conill:
"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[5]."
While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to that just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.
The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.
Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:
process './argc0' launched './argc0' with NULL argv: empty string added
Additionally WARN() and reject NULL argv usage for kernel threads.
[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.or…
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+…
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%…
[8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/
Reported-by: Ariadne Conill <ariadne(a)dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: linux-fsdevel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
fs/exec.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/fs/exec.c b/fs/exec.c
index 79f2c9483302..bbf3aadf7ce1 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -495,8 +495,14 @@ static int bprm_stack_limits(struct linux_binprm *bprm)
* the stack. They aren't stored until much later when we can't
* signal to the parent that the child has run out of stack space.
* Instead, calculate it here so it's possible to fail gracefully.
+ *
+ * In the case of argc = 0, make sure there is space for adding a
+ * empty string (which will bump argc to 1), to ensure confused
+ * userspace programs don't start processing from argv[1], thinking
+ * argc can never be 0, to keep them from walking envp by accident.
+ * See do_execveat_common().
*/
- ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
+ ptr_size = (min(bprm->argc, 1) + bprm->envc) * sizeof(void *);
if (limit <= ptr_size)
return -E2BIG;
limit -= ptr_size;
@@ -1897,6 +1903,9 @@ static int do_execveat_common(int fd, struct filename *filename,
}
retval = count(argv, MAX_ARG_STRINGS);
+ if (retval == 0)
+ pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n",
+ current->comm, bprm->filename);
if (retval < 0)
goto out_free;
bprm->argc = retval;
@@ -1923,6 +1932,19 @@ static int do_execveat_common(int fd, struct filename *filename,
if (retval < 0)
goto out_free;
+ /*
+ * When argv is empty, add an empty string ("") as argv[0] to
+ * ensure confused userspace programs that start processing
+ * from argv[1] won't end up walking envp. See also
+ * bprm_stack_limits().
+ */
+ if (bprm->argc == 0) {
+ retval = copy_string_kernel("", bprm);
+ if (retval < 0)
+ goto out_free;
+ bprm->argc = 1;
+ }
+
retval = bprm_execve(bprm, fd, filename, flags);
out_free:
free_bprm(bprm);
@@ -1951,6 +1973,8 @@ int kernel_execve(const char *kernel_filename,
}
retval = count_strings_kernel(argv);
+ if (WARN_ON_ONCE(retval == 0))
+ retval = -EINVAL;
if (retval < 0)
goto out_free;
bprm->argc = retval;
--
2.30.2
The upstream commit 541ab2aeb282 ("KVM: x86: work around leak of
uninitialized stack contents") resets `exception` in the function
`kvm_write_guest_virt_system`.
However, its backported version in stable (commit ba7f1c934f2e
("KVM: x86: work around leak of uninitialized stack contents")) applied
the change in `emulator_write_std` instead.
This patch moves the memset instruction back to
`kvm_write_guest_virt_system`.
Fixes: ba7f1c934f2e ("KVM: x86: work around leak of uninitialized stack contents")
Signed-off-by: Guillaume Bertholon <guillaume.bertholon(a)ens.fr>
---
arch/x86/kvm/x86.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8dce61c..9101002 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4417,13 +4417,6 @@ static int emulator_write_std(struct x86_emulate_ctxt *ctxt, gva_t addr, void *v
if (!system && kvm_x86_ops->get_cpl(vcpu) == 3)
access |= PFERR_USER_MASK;
- /*
- * FIXME: this should call handle_emulation_failure if X86EMUL_IO_NEEDED
- * is returned, but our callers are not ready for that and they blindly
- * call kvm_inject_page_fault. Ensure that they at least do not leak
- * uninitialized kernel stack memory into cr2 and error code.
- */
- memset(exception, 0, sizeof(*exception));
return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
access, exception);
}
@@ -4431,6 +4424,13 @@ static int emulator_write_std(struct x86_emulate_ctxt *ctxt, gva_t addr, void *v
int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu, gva_t addr, void *val,
unsigned int bytes, struct x86_exception *exception)
{
+ /*
+ * FIXME: this should call handle_emulation_failure if X86EMUL_IO_NEEDED
+ * is returned, but our callers are not ready for that and they blindly
+ * call kvm_inject_page_fault. Ensure that they at least do not leak
+ * uninitialized kernel stack memory into cr2 and error code.
+ */
+ memset(exception, 0, sizeof(*exception));
return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
PFERR_WRITE_MASK, exception);
}
--
2.7.4
Resent (v1 was mangled).
Note that the "Fixes:" tag is a strong assumption of mine. Speak up if
you think that is wrong.
Jan
Georgi Valkov (1):
ipheth: fix EOVERFLOW in ipheth_rcvbulk_callback
drivers/net/usb/ipheth.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--
2.34.1
Bank Światowy i Światowa Organizacja Zdrowia (WHO) we współpracy z
Organizacją Rodzicielską, Radą Gospodarczą i Społeczną Narodów
Zjednoczonych (ECOSOC), upoważniły to biuro do starannego wypłacania
COvid19. załamanie gospodarcze, które spowodował wirus.
Twoja korzyść z Covid19 w wysokości (550 000 USD) Pięćset pięćdziesiąt
tysięcy dolarów amerykańskich jest gotowa do wypłaty, metoda płatności
to karta debetowa Visa do bankomatu Union Togolaise Bank (UTB), która
zostanie wysłana do Ciebie przez firmę kurierską. Możesz wypłacić
maksymalnie 5000 USD dziennie, aż do całkowitego wypłacenia pełnych
środków. Skontaktuj się z nami na ten e-mail (kodjikokou09(a)gmail.com)
Gorące pozdrowienia,
kodji kokou
Union Togolaise Bank
Back in 06f6c4c6c3e8 ("ata: libata: add missing ata_identify_page_supported() calls")
a read of ATA_LOG_DIRECTORY page was added. This caused the
SATADOM-ML 3ME to lock up.
In 636f6e2af4fb ("libata: add horkage for missing Identify Device log")
a flag was added to cache if a device supports this or not.
This adds a blacklist entry which flags that these devices doesn't
support that call and shouldn't be issued that call.
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Anton Lundin <glance(a)acc.umu.se>
Depends-on: 636f6e2af4fb ("libata: add horkage for missing Identify Device log")
---
drivers/ata/libata-core.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 87d36b29ca5f..e024af9f33d0 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -4070,6 +4070,13 @@ static const struct ata_blacklist_entry ata_device_blacklist [] = {
{ "WDC WD3000JD-*", NULL, ATA_HORKAGE_WD_BROKEN_LPM },
{ "WDC WD3200JD-*", NULL, ATA_HORKAGE_WD_BROKEN_LPM },
+ /*
+ * This sata dom goes on a walkabout when it sees the
+ * ATA_LOG_DIRECTORY read request so ensure we don't issue such a
+ * request to these devices.
+ */
+ { "SATADOM-ML 3ME", NULL, ATA_HORKAGE_NO_ID_DEV_LOG },
+
/* End Marker */
{ }
};
--
2.30.2
NOTE! This is the proposed LAST 4.4.y kernel release to happen under
the rules of the normal stable kernel releases. After this one, it will
be marked End-Of-Life as it has been 6 years and you really should know
better by now and have moved to a newer kernel tree. After this one, no
more security fixes will be backported and you will end up with an
insecure system over time.
--------------------------
This is the start of the stable review cycle for the 4.4.302 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 03 Feb 2022 18:08:10 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.302-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.302-rc1
Guillaume Bertholon <guillaume.bertholon(a)ens.fr>
KVM: x86: Fix misplaced backport of "work around leak of uninitialized stack contents"
Guillaume Bertholon <guillaume.bertholon(a)ens.fr>
Revert "tc358743: fix register i2c_rd/wr function fix"
Guillaume Bertholon <guillaume.bertholon(a)ens.fr>
Revert "drm/radeon/ci: disable mclk switching for high refresh rates (v2)"
Guillaume Bertholon <guillaume.bertholon(a)ens.fr>
Bluetooth: MGMT: Fix misplaced BT_HS check
Eric Dumazet <edumazet(a)google.com>
ipv4: tcp: send zero IPID in SYNACK messages
Eric Dumazet <edumazet(a)google.com>
ipv4: raw: lock the socket in raw_bind()
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (lm90) Reduce maximum conversion rate for G781
Xianting Tian <xianting.tian(a)linux.alibaba.com>
drm/msm: Fix wrong size calculation
Jianguo Wu <wujianguo(a)chinatelecom.cn>
net-procfs: show net devices bound packet types
Eric Dumazet <edumazet(a)google.com>
ipv4: avoid using shared IP generator for connected sockets
Congyu Liu <liu3101(a)purdue.edu>
net: fix information leakage in /proc/net/ptype
Ido Schimmel <idosch(a)nvidia.com>
ipv6_tunnel: Rate limit warning messages
John Meneghini <jmeneghi(a)redhat.com>
scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix hang in usb_kill_urb by adding memory barriers
Alan Stern <stern(a)rowland.harvard.edu>
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
Cameron Williams <cang1(a)live.co.uk>
tty: Add support for Brainboxes UC cards.
daniel.starke(a)siemens.com <daniel.starke(a)siemens.com>
tty: n_gsm: fix SW flow control encoding/handling
Valentin Caron <valentin.caron(a)foss.st.com>
serial: stm32: fix software flow control transfer
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
PM: wakeup: simplify the output logic of pm_show_wakelocks()
Jan Kara <jack(a)suse.cz>
udf: Fix NULL ptr deref when converting from inline format
Jan Kara <jack(a)suse.cz>
udf: Restore i_lenAlloc when inode expansion fails
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
Vasily Gorbik <gor(a)linux.ibm.com>
s390/hypfs: include z/VM guests with access control group set
Brian Gix <brian.gix(a)intel.com>
Bluetooth: refactor malicious adv data check
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: bcm: fix UAF of bcm op
-------------
Diffstat:
Makefile | 4 +-
arch/s390/hypfs/hypfs_vm.c | 6 ++-
arch/x86/kvm/x86.c | 14 +++---
drivers/gpu/drm/msm/msm_drv.c | 2 +-
drivers/gpu/drm/radeon/ci_dpm.c | 6 ---
drivers/hwmon/lm90.c | 2 +-
drivers/media/i2c/tc358743.c | 2 +-
drivers/s390/scsi/zfcp_fc.c | 13 ++++-
drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 20 ++------
drivers/tty/n_gsm.c | 4 +-
drivers/tty/serial/8250/8250_pci.c | 100 ++++++++++++++++++++++++++++++++++++-
drivers/tty/serial/stm32-usart.c | 2 +-
drivers/usb/core/hcd.c | 14 ++++++
drivers/usb/core/urb.c | 12 +++++
drivers/usb/storage/unusual_devs.h | 10 ++++
fs/udf/inode.c | 9 ++--
include/linux/netdevice.h | 1 +
include/net/ip.h | 21 ++++----
kernel/power/wakelock.c | 12 ++---
net/bluetooth/hci_event.c | 10 ++--
net/bluetooth/mgmt.c | 8 +--
net/can/bcm.c | 20 ++++----
net/core/net-procfs.c | 38 ++++++++++++--
net/ipv4/ip_output.c | 11 +++-
net/ipv4/raw.c | 5 +-
net/ipv6/ip6_tunnel.c | 8 +--
net/packet/af_packet.c | 2 +
27 files changed, 262 insertions(+), 94 deletions(-)
This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a
That commit did a refactoring that effectively combined fast and slow
gup paths (again). And that was again incorrect, for two reasons:
a) Fast gup and slow gup get reference counts on pages in different ways
and with different goals: see Linus' writeup in commit cd1adf1b63a1
("Revert "mm/gup: remove try_get_page(), call try_get_compound_head()
directly""), and
b) try_grab_compound_head() also has a specific check for "FOLL_LONGTERM
&& !is_pinned(page)", that assumes that the caller can fall back to slow
gup. This resulted in new failures, as recently report by Will McVicker
[1].
But (a) has problems too, even though they may not have been reported
yet. So just revert this.
[1] https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com
Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
Cc: Christoph Hellwig <hch(a)lst.de>
Tested-by: Will McVicker <willmcvicker(a)google.com>
Cc: Minchan Kim <minchan(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org # 5.15
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Since v1, I've only changed the commit description:
* Added Will's Tested-by
* Added Cc: stable
mm/gup.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index f0af462ac1e2..a9d4d724aef7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -124,8 +124,8 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
* considered failure, and furthermore, a likely bug in the caller, so a warning
* is also emitted.
*/
-struct page *try_grab_compound_head(struct page *page,
- int refs, unsigned int flags)
+__maybe_unused struct page *try_grab_compound_head(struct page *page,
+ int refs, unsigned int flags)
{
if (flags & FOLL_GET)
return try_get_compound_head(page, refs);
@@ -208,10 +208,35 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
*/
bool __must_check try_grab_page(struct page *page, unsigned int flags)
{
- if (!(flags & (FOLL_GET | FOLL_PIN)))
- return true;
+ WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == (FOLL_GET | FOLL_PIN));
- return try_grab_compound_head(page, 1, flags);
+ if (flags & FOLL_GET)
+ return try_get_page(page);
+ else if (flags & FOLL_PIN) {
+ int refs = 1;
+
+ page = compound_head(page);
+
+ if (WARN_ON_ONCE(page_ref_count(page) <= 0))
+ return false;
+
+ if (hpage_pincount_available(page))
+ hpage_pincount_add(page, 1);
+ else
+ refs = GUP_PIN_COUNTING_BIAS;
+
+ /*
+ * Similar to try_grab_compound_head(): even if using the
+ * hpage_pincount_add/_sub() routines, be sure to
+ * *also* increment the normal page refcount field at least
+ * once, so that the page really is pinned.
+ */
+ page_ref_add(page, refs);
+
+ mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED, 1);
+ }
+
+ return true;
}
/**
base-commit: 9f7fb8de5d9bac17b6392a14af40baf555d9129b
--
2.35.1
Greetings,
I sent this mail praying it will find you in a good condition, since I
myself am in a very critical health condition in which I sleep every
night without knowing if I may be alive to see the next day. I am
Mrs.William Ann, a widow suffering from a long time illness. I have
some funds I inherited from my late husband, the sum of
($11,000,000.00, Eleven Million Dollars) my Doctor told me recently
that I have serious sickness which is a cancer problem. What disturbs
me most is my stroke sickness. Having known my condition, I decided to
donate this fund to a good person that will utilize it the way I am
going to instruct herein. I need a very honest God.
fearing a person who can claim this money and use it for Charity
works, for orphanages, widows and also build schools for less
privileges that will be named after my late husband if possible and to
promote the word of God and the effort that the house of God is
maintained. I do not want a situation where this money will be used in
an ungodly manner. That's why I'm making this decision. I'm not afraid
of death so I know where I'm going. I accept this decision because I
do not have any child who will inherit this money after I die. Please
I want your sincere and urgent answer to know if you will be able to
execute this project, and I will give you more information on how the
fund will be transferred to your bank account. I am waiting for your
reply.
May God Bless you,
Mrs.William Ann,
The patch titled
Subject: mm/gup.c: fix invalid page pointer returned with FOLL_PIN gups
has been removed from the -mm tree. Its filename was
mm-fix-invalid-page-pointer-returned-with-foll_pin-gups.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/gup.c: fix invalid page pointer returned with FOLL_PIN gups
Alex reported invalid page pointer returned with pin_user_pages_remote()
from vfio after upstream commit 4b6c33b32296 ("vfio/type1: Prepare for
batched pinning with struct vfio_batch"). This problem breaks NVIDIA vfio
mdev.
It turns out that it's not the fault of the vfio commit; however after
vfio switches to a full page buffer to store the page pointers it starts
to expose the problem easier.
The problem is for VM_PFNMAP vmas we should normally fail with an -EFAULT
then vfio will carry on to handle the MMIO regions. However when the bug
triggered, follow_page_mask() returned -EEXIST for such a page, which will
jump over the current page, leaving that entry in **pages untouched.
However the caller is not aware of it, hence the caller will reference the
page as usual even if the pointer data can be anything.
We had that -EEXIST logic since commit 1027e4436b6a ("mm: make GUP handle
pfn mapping unless FOLL_GET is requested") which seems very reasonable.
It could be that when we reworked GUP with FOLL_PIN we could have
overlooked that special path in commit 3faa52c03f44 ("mm/gup: track
FOLL_PIN pages"), even if that commit rightfully touched up
follow_devmap_pud() on checking FOLL_PIN when it needs to return an
-EEXIST.
Since at it, add another WARN_ON_ONCE() at the -EEXIST handling to make
sure we mustn't have **pages set when reaching there, because otherwise it
means the caller will try to read a garbage right after __get_user_pages()
returns.
Attaching the Fixes to the FOLL_PIN rework commit, as it happened later
than 1027e4436b6a.
Link: https://lkml.kernel.org/r/20220125033700.69705-1-peterx@redhat.com
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Reported-by: Alex Williamson <alex.williamson(a)redhat.com>
Debugged-by: Alex Williamson <alex.williamson(a)redhat.com>
Tested-by: Alex Williamson <alex.williamson(a)redhat.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Jan Kara <jack(a)suse.cz>
Cc: J��r��me Glisse <jglisse(a)redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/gup.c~mm-fix-invalid-page-pointer-returned-with-foll_pin-gups
+++ a/mm/gup.c
@@ -440,7 +440,7 @@ static int follow_pfn_pte(struct vm_area
pte_t *pte, unsigned int flags)
{
/* No page to get reference */
- if (flags & FOLL_GET)
+ if (flags & (FOLL_GET | FOLL_PIN))
return -EFAULT;
if (flags & FOLL_TOUCH) {
@@ -1181,7 +1181,13 @@ retry:
/*
* Proper page table entry exists, but no corresponding
* struct page.
+ *
+ * Warn if we jumped over even with a valid **pages.
+ * It shouldn't trigger in practise, but when there's
+ * buggy returns on -EEXIST we'll warn before returning
+ * an invalid page pointer in the array.
*/
+ WARN_ON_ONCE(pages);
goto next_page;
} else if (IS_ERR(page)) {
ret = PTR_ERR(page);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
On Tue, Feb 1, 2022 at 10:00 AM Will McVicker <willmcvicker(a)google.com> wrote:
>
> On Tue, Feb 1, 2022 at 1:29 AM John Hubbard <jhubbard(a)nvidia.com> wrote:
> >
> > This reverts commit 54d516b1d62ff8f17cee2da06e5e4706a0d00b8a
> >
> > That commit did a refactoring that effectively combined fast and slow
> > gup paths (again). And that was again incorrect, for two reasons:
> >
> > a) Fast gup and slow gup get reference counts on pages in different ways
> > and with different goals: see Linus' writeup in commit cd1adf1b63a1
> > ("Revert "mm/gup: remove try_get_page(), call try_get_compound_head()
> > directly""), and
> >
> > b) try_grab_compound_head() also has a specific check for "FOLL_LONGTERM
> > && !is_pinned(page)", that assumes that the caller can fall back to slow
> > gup. This resulted in new failures, as recently report by Will McVicker
> > [1].
> >
> > But (a) has problems too, even though they may not have been reported
> > yet. So just revert this.
> >
> > [1] https://lore.kernel.org/r/20220131203504.3458775-1-willmcvicker@google.com
> >
> > Fixes: 54d516b1d62f ("mm/gup: small refactoring: simplify try_grab_page()")
> > Cc: Christoph Hellwig <hch(a)lst.de>
> > Cc: Will McVicker <willmcvicker(a)google.com>
> > Cc: Minchan Kim <minchan(a)google.com>
> > Cc: Matthew Wilcox <willy(a)infradead.org>
> > Cc: Christian Borntraeger <borntraeger(a)de.ibm.com>
> > Cc: Heiko Carstens <hca(a)linux.ibm.com>
> > Cc: Vasily Gorbik <gor(a)linux.ibm.com>
> > Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
> > Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
> > ---
> > mm/gup.c | 35 ++++++++++++++++++++++++++++++-----
> > 1 file changed, 30 insertions(+), 5 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index f0af462ac1e2..a9d4d724aef7 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -124,8 +124,8 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
> > * considered failure, and furthermore, a likely bug in the caller, so a warning
> > * is also emitted.
> > */
> > -struct page *try_grab_compound_head(struct page *page,
> > - int refs, unsigned int flags)
> > +__maybe_unused struct page *try_grab_compound_head(struct page *page,
> > + int refs, unsigned int flags)
> > {
> > if (flags & FOLL_GET)
> > return try_get_compound_head(page, refs);
> > @@ -208,10 +208,35 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
> > */
> > bool __must_check try_grab_page(struct page *page, unsigned int flags)
> > {
> > - if (!(flags & (FOLL_GET | FOLL_PIN)))
> > - return true;
> > + WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == (FOLL_GET | FOLL_PIN));
> >
> > - return try_grab_compound_head(page, 1, flags);
> > + if (flags & FOLL_GET)
> > + return try_get_page(page);
> > + else if (flags & FOLL_PIN) {
> > + int refs = 1;
> > +
> > + page = compound_head(page);
> > +
> > + if (WARN_ON_ONCE(page_ref_count(page) <= 0))
> > + return false;
> > +
> > + if (hpage_pincount_available(page))
> > + hpage_pincount_add(page, 1);
> > + else
> > + refs = GUP_PIN_COUNTING_BIAS;
> > +
> > + /*
> > + * Similar to try_grab_compound_head(): even if using the
> > + * hpage_pincount_add/_sub() routines, be sure to
> > + * *also* increment the normal page refcount field at least
> > + * once, so that the page really is pinned.
> > + */
> > + page_ref_add(page, refs);
> > +
> > + mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED, 1);
> > + }
> > +
> > + return true;
> > }
> >
> > /**
> >
> > base-commit: 26291c54e111ff6ba87a164d85d4a4e134b7315c
> > --
> > 2.35.1
> >
>
> Thanks John! I verified this works on the Pixel 6 with the 5.15 kernel
> for my camera use-case. Free free to include:
>
> Tested-by: Will McVicker <willmcvicker(a)google.com>
>
> Thanks,
> Will
And just so we don't miss this, I'd also like to request this be
pulled into the 5.15 stable branch please.
Cc: stable(a)vger.kernel.org # 5.15
Thanks,
Will
This reverts commit 0157e2a8a71978c58a7d6cfb3616ab17d9726631.
The reverted commit was backported and applied twice on the stable branch:
- First as commit 15de2e4c90b7 ("drm/radeon/ci: disable mclk switching for
high refresh rates (v2)")
- Then as commit 0157e2a8a719 ("drm/radeon/ci: disable mclk switching for
high refresh rates (v2)")
Fixes: 0157e2a8a719 ("drm/radeon/ci: disable mclk switching for high refresh rates (v2)")
Signed-off-by: Guillaume Bertholon <guillaume.bertholon(a)ens.fr>
---
drivers/gpu/drm/radeon/ci_dpm.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
index 8e1bf9e..c8baa06 100644
--- a/drivers/gpu/drm/radeon/ci_dpm.c
+++ b/drivers/gpu/drm/radeon/ci_dpm.c
@@ -782,12 +782,6 @@ bool ci_dpm_vblank_too_short(struct radeon_device *rdev)
if (r600_dpm_get_vrefresh(rdev) > 120)
return true;
- /* disable mclk switching if the refresh is >120Hz, even if the
- * blanking period would allow it
- */
- if (r600_dpm_get_vrefresh(rdev) > 120)
- return true;
-
if (vblank_time < switch_limit)
return true;
else
--
2.7.4
I'm working on a tool for (reliably) transferring diffs across
versions of a given C code. To evaluate my tool, I have been comparing
the output of my tool against the manual backports in the stable
branch.
In the process, I have found some oddities in some backported patches
which, I believe, are incorrect. In all cases, the root cause seems to
be that `patch` was able to apply a backported diff after some fuzzy
matching but did so at a wrong location (IMHO). Below is an example. I
can report the others if there is indeed a problem.
On the branch stable/linux-4.4.y, the upstream patch
b560a208cda0297fef6ff85bbfd58a8f0a52a543
Bluetooth: MGMT: Fix not checking if BT_HS is enabled
is backported as commit
5abe9f99f5129bee5492072ff76b91ec4fad485b.
If we look at both commits we have:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Upstream
b560a208cda0297fef6ff85bbfd58a8f0a52a543
%%%%%%%%%
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 0b711ad..12d7b36 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
[...]
@@ -1817,6 +1818,10 @@ static int set_hs(struct sock *sk, struct hci_dev *hdev, void *data, u16 len)
bt_dev_dbg(hdev, "sock %p", sk);
+ if (!IS_ENABLED(CONFIG_BT_HS))
+ return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_HS,
+ MGMT_STATUS_NOT_SUPPORTED);
+
status = mgmt_bredr_support(hdev);
if (status)
return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_HS, status);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Backported
5abe9f99f5129bee5492072ff76b91ec4fad485b
%%%%%%%%%
diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index ecc3da6..ee761fb 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
[...]
@@ -2281,6 +2282,10 @@ static int set_link_security(struct sock *sk, struct hci_dev *hdev, void *data,
BT_DBG("request for %s", hdev->name);
+ if (!IS_ENABLED(CONFIG_BT_HS))
+ return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_HS,
+ MGMT_STATUS_NOT_SUPPORTED);
+
status = mgmt_bredr_support(hdev);
if (status)
return mgmt_cmd_status(sk, hdev->id, MGMT_OP_SET_LINK_SECURITY,
I suspect that this does *not* reflect the intent of the orignal patch
(which was addressing an issue in `set_hs`) whereas, here, the
backported patch is somewhat arbitrarily modifying `set_link_security`
while `set_hs` remains as-is.
Is this indeed an issue? Should I report on the other cases as well?
- Guillaume Bertholon
When writing out a stereo control we discard the change notification from
the first channel, meaning that events are only generated based on changes
to the second channel. Ensure that we report a change if either channel
has changed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/soc-ops.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c
index fefd4f34cbc1..6b922d12afb5 100644
--- a/sound/soc/soc-ops.c
+++ b/sound/soc/soc-ops.c
@@ -874,6 +874,7 @@ int snd_soc_put_xr_sx(struct snd_kcontrol *kcontrol,
unsigned long mask = (1UL<<mc->nbits)-1;
long max = mc->max;
long val = ucontrol->value.integer.value[0];
+ int ret = 0;
unsigned int i;
if (invert)
@@ -886,9 +887,11 @@ int snd_soc_put_xr_sx(struct snd_kcontrol *kcontrol,
regmask, regval);
if (err < 0)
return err;
+ if (err > 0)
+ ret = err;
}
- return 0;
+ return ret;
}
EXPORT_SYMBOL_GPL(snd_soc_put_xr_sx);
--
2.30.2
When writing out a stereo control we discard the change notification from
the first channel, meaning that events are only generated based on changes
to the second channel. Ensure that we report a change if either channel
has changed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/soc-ops.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c
index f0d1aeb38346..fefd4f34cbc1 100644
--- a/sound/soc/soc-ops.c
+++ b/sound/soc/soc-ops.c
@@ -498,7 +498,7 @@ int snd_soc_put_volsw_range(struct snd_kcontrol *kcontrol,
unsigned int mask = (1 << fls(max)) - 1;
unsigned int invert = mc->invert;
unsigned int val, val_mask;
- int ret;
+ int err, ret;
if (invert)
val = (max - ucontrol->value.integer.value[0]) & mask;
@@ -507,9 +507,10 @@ int snd_soc_put_volsw_range(struct snd_kcontrol *kcontrol,
val_mask = mask << shift;
val = val << shift;
- ret = snd_soc_component_update_bits(component, reg, val_mask, val);
- if (ret < 0)
- return ret;
+ err = snd_soc_component_update_bits(component, reg, val_mask, val);
+ if (err < 0)
+ return err;
+ ret = err;
if (snd_soc_volsw_is_stereo(mc)) {
if (invert)
@@ -519,8 +520,12 @@ int snd_soc_put_volsw_range(struct snd_kcontrol *kcontrol,
val_mask = mask << shift;
val = val << shift;
- ret = snd_soc_component_update_bits(component, rreg, val_mask,
+ err = snd_soc_component_update_bits(component, rreg, val_mask,
val);
+ /* Don't discard any error code or drop change flag */
+ if (ret == 0 || err < 0) {
+ ret = err;
+ }
}
return ret;
--
2.30.2
When writing out a stereo control we discard the change notification from
the first channel, meaning that events are only generated based on changes
to the second channel. Ensure that we report a change if either channel
has changed.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
sound/soc/soc-ops.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/sound/soc/soc-ops.c b/sound/soc/soc-ops.c
index 08eaa9ddf191..73c9d53de25b 100644
--- a/sound/soc/soc-ops.c
+++ b/sound/soc/soc-ops.c
@@ -308,7 +308,7 @@ int snd_soc_put_volsw(struct snd_kcontrol *kcontrol,
unsigned int sign_bit = mc->sign_bit;
unsigned int mask = (1 << fls(max)) - 1;
unsigned int invert = mc->invert;
- int err;
+ int err, ret;
bool type_2r = false;
unsigned int val2 = 0;
unsigned int val, val_mask;
@@ -336,12 +336,18 @@ int snd_soc_put_volsw(struct snd_kcontrol *kcontrol,
err = snd_soc_component_update_bits(component, reg, val_mask, val);
if (err < 0)
return err;
+ ret = err;
- if (type_2r)
+ if (type_2r) {
err = snd_soc_component_update_bits(component, reg2, val_mask,
- val2);
+ val2);
+ /* Don't discard any error code or drop change flag */
+ if (ret == 0 || err < 0) {
+ ret = err;
+ }
+ }
- return err;
+ return ret;
}
EXPORT_SYMBOL_GPL(snd_soc_put_volsw);
--
2.30.2
This is the start of the stable review cycle for the 5.4.176 release.
There are 64 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 02 Feb 2022 10:51:59 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.176-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.176-rc1
OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp>
block: Fix wrong offset in bio_truncate()
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: invalidate dcache before IN_DELETE event
Marc Kleine-Budde <mkl(a)pengutronix.de>
dt-bindings: can: tcan4x5x: fix mram-cfg RX FIFO config
Eric Dumazet <edumazet(a)google.com>
ipv4: remove sparse error in ip_neigh_gw4()
Eric Dumazet <edumazet(a)google.com>
ipv4: tcp: send zero IPID in SYNACK messages
Eric Dumazet <edumazet(a)google.com>
ipv4: raw: lock the socket in raw_bind()
Yufeng Mo <moyufeng(a)huawei.com>
net: hns3: handle empty unknown interrupt for VF
Hangyu Hua <hbh25y(a)gmail.com>
yam: fix a memory leak in yam_siocdevprivate()
Miaoqian Lin <linmq006(a)gmail.com>
drm/msm/hdmi: Fix missing put_device() call in msm_hdmi_get_phy
Sukadev Bhattiprolu <sukadev(a)linux.ibm.com>
ibmvnic: don't spin in tasklet
Sukadev Bhattiprolu <sukadev(a)linux.ibm.com>
ibmvnic: init ->running_cap_crqs early
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (lm90) Mark alert as broken for MAX6654
David Howells <dhowells(a)redhat.com>
rxrpc: Adjust retransmission backoff
Marek Behún <kabel(a)kernel.org>
phylib: fix potential use-after-free
Robert Hancock <robert.hancock(a)calian.com>
net: phy: broadcom: hook up soft_reset for BCM54616S
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: don't increment invalid counter on NF_REPEAT
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Ensure the server has an up to date ctime before renaming
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Ensure the server has an up to date ctime before hardlinking
Eric Dumazet <edumazet(a)google.com>
ipv6: annotate accesses to fn->fn_sernum
José Expósito <jose.exposito89(a)gmail.com>
drm/msm/dsi: invalid parameter check in msm_dsi_phy_enable
Miaoqian Lin <linmq006(a)gmail.com>
drm/msm/dsi: Fix missing put_device() call in dsi_get_phy
Xianting Tian <xianting.tian(a)linux.alibaba.com>
drm/msm: Fix wrong size calculation
Jianguo Wu <wujianguo(a)chinatelecom.cn>
net-procfs: show net devices bound packet types
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: nfs_atomic_open() can race when looking up a non-regular file
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4: Handle case where the lookup of a directory fails
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (lm90) Reduce maximum conversion rate for G781
Eric Dumazet <edumazet(a)google.com>
ipv4: avoid using shared IP generator for connected sockets
Xin Long <lucien.xin(a)gmail.com>
ping: fix the sk_bound_dev_if match in ping_lookup
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (lm90) Mark alert as broken for MAX6680
Guenter Roeck <linux(a)roeck-us.net>
hwmon: (lm90) Mark alert as broken for MAX6646/6647/6649
Congyu Liu <liu3101(a)purdue.edu>
net: fix information leakage in /proc/net/ptype
sparkhuang <huangshaobo6(a)huawei.com>
ARM: 9170/1: fix panic when kasan and kprobe are enabled
Ido Schimmel <idosch(a)nvidia.com>
ipv6_tunnel: Rate limit warning messages
John Meneghini <jmeneghi(a)redhat.com>
scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
Matthias Kaehlcke <mka(a)chromium.org>
rpmsg: char: Fix race between the release of rpmsg_eptdev and cdev
Sujit Kautkar <sujitka(a)chromium.org>
rpmsg: char: Fix race between the release of rpmsg_ctrldev and cdev
Joe Damato <jdamato(a)fastly.com>
i40e: fix unsigned stat widths
Sylwester Dziedziuch <sylwesterx.dziedziuch(a)intel.com>
i40e: Fix queues reservation for XDP
Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
i40e: Fix issue when maximum queues is exceeded
Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
i40e: Increase delay to 1 s after global EMP reset
Christophe Leroy <christophe.leroy(a)csgroup.eu>
powerpc/32: Fix boot failure with GCC latent entropy plugin
Marek Behún <kabel(a)kernel.org>
net: sfp: ignore disabled SFP node
Sing-Han Chen <singhanc(a)nvidia.com>
ucsi_ccg: Check DEV_INT bit only when starting CCG4
Badhri Jagan Sridharan <badhri(a)google.com>
usb: typec: tcpm: Do not disconnect while receiving VBUS off
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix hang in usb_kill_urb by adding memory barriers
Pavankumar Kondeti <quic_pkondeti(a)quicinc.com>
usb: gadget: f_sourcesink: Fix isoc transfer for USB_SPEED_SUPER_PLUS
Jon Hunter <jonathanh(a)nvidia.com>
usb: common: ulpi: Fix crash in ulpi_match()
Alan Stern <stern(a)rowland.harvard.edu>
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
Cameron Williams <cang1(a)live.co.uk>
tty: Add support for Brainboxes UC cards.
daniel.starke(a)siemens.com <daniel.starke(a)siemens.com>
tty: n_gsm: fix SW flow control encoding/handling
Valentin Caron <valentin.caron(a)foss.st.com>
serial: stm32: fix software flow control transfer
Robert Hancock <robert.hancock(a)calian.com>
serial: 8250: of: Fix mapped region size when using reg-offset property
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_payload: do not update layer 4 checksum when mangling fragments
D Scott Phillips <scott(a)os.amperecomputing.com>
arm64: errata: Fix exec handling in erratum 1418040 workaround
Lucas Stach <l.stach(a)pengutronix.de>
drm/etnaviv: relax submit size limits
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: fix fsnotify hooks in pseudo filesystems
Tom Zanussi <zanussi(a)kernel.org>
tracing: Don't inc err_log entry count if entry allocation fails
Xiaoke Wang <xkernel.wang(a)foxmail.com>
tracing/histogram: Fix a potential memory leak for kstrdup()
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
PM: wakeup: simplify the output logic of pm_show_wakelocks()
Jan Kara <jack(a)suse.cz>
udf: Fix NULL ptr deref when converting from inline format
Jan Kara <jack(a)suse.cz>
udf: Restore i_lenAlloc when inode expansion fails
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
Vasily Gorbik <gor(a)linux.ibm.com>
s390/hypfs: include z/VM guests with access control group set
Brian Gix <brian.gix(a)intel.com>
Bluetooth: refactor malicious adv data check
-------------
Diffstat:
.../devicetree/bindings/net/can/tcan4x5x.txt | 2 +-
Makefile | 4 +-
arch/arm/probes/kprobes/Makefile | 3 +
arch/arm64/kernel/process.c | 39 +++----
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/lib/Makefile | 3 +
arch/s390/hypfs/hypfs_vm.c | 6 +-
block/bio.c | 3 +-
drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 4 +-
drivers/gpu/drm/msm/dsi/dsi.c | 7 +-
drivers/gpu/drm/msm/dsi/phy/dsi_phy.c | 4 +-
drivers/gpu/drm/msm/hdmi/hdmi.c | 7 +-
drivers/gpu/drm/msm/msm_drv.c | 2 +-
drivers/hwmon/lm90.c | 7 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +-
drivers/net/ethernet/ibm/ibmvnic.c | 112 +++++++++++++--------
drivers/net/ethernet/intel/i40e/i40e.h | 9 +-
drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 44 ++++----
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 59 +++++++++++
drivers/net/hamradio/yam.c | 4 +-
drivers/net/phy/broadcom.c | 1 +
drivers/net/phy/phy_device.c | 6 +-
drivers/net/phy/phylink.c | 5 +
drivers/rpmsg/rpmsg_char.c | 22 +---
drivers/s390/scsi/zfcp_fc.c | 13 ++-
drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 20 +---
drivers/tty/n_gsm.c | 4 +-
drivers/tty/serial/8250/8250_of.c | 11 +-
drivers/tty/serial/8250/8250_pci.c | 100 +++++++++++++++++-
drivers/tty/serial/stm32-usart.c | 2 +-
drivers/usb/common/ulpi.c | 7 +-
drivers/usb/core/hcd.c | 14 +++
drivers/usb/core/urb.c | 12 +++
drivers/usb/gadget/function/f_sourcesink.c | 1 +
drivers/usb/storage/unusual_devs.h | 10 ++
drivers/usb/typec/tcpm/tcpm.c | 3 +-
drivers/usb/typec/ucsi/ucsi_ccg.c | 2 +-
fs/btrfs/ioctl.c | 6 +-
fs/configfs/dir.c | 6 +-
fs/devpts/inode.c | 2 +-
fs/namei.c | 10 +-
fs/nfs/dir.c | 22 ++++
fs/nfsd/nfsctl.c | 5 +-
fs/udf/inode.c | 9 +-
include/linux/fsnotify.h | 48 +++++++--
include/linux/netdevice.h | 1 +
include/net/ip.h | 21 ++--
include/net/ip6_fib.h | 2 +-
include/net/route.h | 2 +-
kernel/power/wakelock.c | 11 +-
kernel/trace/trace.c | 3 +-
kernel/trace/trace_events_hist.c | 1 +
net/bluetooth/hci_event.c | 10 +-
net/core/net-procfs.c | 38 ++++++-
net/ipv4/ip_output.c | 11 +-
net/ipv4/ping.c | 3 +-
net/ipv4/raw.c | 5 +-
net/ipv6/ip6_fib.c | 23 +++--
net/ipv6/ip6_tunnel.c | 8 +-
net/ipv6/route.c | 2 +-
net/netfilter/nf_conntrack_core.c | 8 +-
net/netfilter/nft_payload.c | 3 +
net/packet/af_packet.c | 2 +
net/rxrpc/call_event.c | 8 +-
net/rxrpc/output.c | 2 +-
net/sunrpc/rpc_pipe.c | 4 +-
67 files changed, 589 insertions(+), 245 deletions(-)
In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[0]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
To ensure that execve(2) with argc < 1 is not a useful tool for
shellcode to use, we can validate this in do_execveat_common() and
fail for this scenario, effectively blocking successful exploitation
of CVE-2021-4034 and similar bugs which depend on execve(2) working
with argc < 1.
We use -EINVAL for this case, mirroring recent changes to FreeBSD and
OpenBSD. -EINVAL is also used by QNX for this, while Solaris uses
-EFAULT.
In earlier versions of the patch, it was proposed that we create a
fake argv for applications to use when argc < 1, but it was concluded
that it would be better to just fail the execve(2) in these cases, as
launching a process with an empty or NULL argv[0] was likely to just
cause more problems.
Interestingly, Michael Kerrisk opened an issue about this in 2008[1],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[2]
of this bug in a shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[3].
There are a few[4][5] minor edge cases (primarily in test suites) that
are caught by this, but we plan to work with the projects to fix those
edge cases.
[0]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=8408
[2]: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[3]: https://github.com/KSPP/linux/issues/176
[4]: https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+…
[5]: https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%…
Changes from v2:
- Switch to using -EINVAL as the error code for this.
- Use pr_warn_once() to warn when an execve(2) is rejected due to NULL
argv.
Changes from v1:
- Rework commit message significantly.
- Make the argv[0] check explicit rather than hijacking the error-check
for count().
Reported-by: Michael Kerrisk <mtk.manpages(a)gmail.com>
To: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Ariadne Conill <ariadne(a)dereferenced.org>
---
fs/exec.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/exec.c b/fs/exec.c
index 79f2c9483302..982730cfe3b8 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1897,6 +1897,10 @@ static int do_execveat_common(int fd, struct filename *filename,
}
retval = count(argv, MAX_ARG_STRINGS);
+ if (retval == 0) {
+ pr_warn_once("Attempted to run process '%s' with NULL argv\n", bprm->filename);
+ retval = -EINVAL;
+ }
if (retval < 0)
goto out_free;
bprm->argc = retval;
--
2.34.1
If register_memory() fails, we freed the memory block but already added
the memory block to the group list, not good. Let's defer adding the
block to the memory group to after registering the memory block device.
We do handle it properly during unregister_memory(), but that's not
called when the registration fails.
Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks")
Cc: stable(a)vger.kernel.org # v5.15+
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
drivers/base/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 365cd4a7f239..60c38f9cf1a7 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -663,14 +663,16 @@ static int init_memory_block(unsigned long block_id, unsigned long state,
mem->nr_vmemmap_pages = nr_vmemmap_pages;
INIT_LIST_HEAD(&mem->group_next);
+ ret = register_memory(mem);
+ if (ret)
+ return ret;
+
if (group) {
mem->group = group;
list_add(&mem->group_next, &group->memory_blocks);
}
- ret = register_memory(mem);
-
- return ret;
+ return 0;
}
static int add_memory_block(unsigned long base_section_nr)
--
2.34.1
Hi Greg,
Sorry for the out of thread mail, gmail appears to have some issue
delivering any of the review thread messages to me at the moment.
5.16.5-rc1 build/boot looks good on x86-64/AMD Cezanne; No build or
dmesg regressions and 10 or so s0ix suspend cycles pass without any
obvious regressions as well.
Built and tested with both GCC and Clang/LLVM on an ASUS GA503QR G15
using Arch Linux.
Tested-By: Scott Bruce <smbruce(a)gmail.com>
This device provides both audio and video. The original quirk added in
commit 48827e1d6af5 ("ALSA: usb-audio: Add quirk for VF0770") used
USB_DEVICE to match the vendor and product ID. Depending on module order,
if snd-usb-audio was asked first, it would match the entire device and
uvcvideo wouldn't get to see it. Change the matching to USB_AUDIO_DEVICE
to restore uvcvideo matching in all cases.
Fixes: 48827e1d6af5 ("ALSA: usb-audio: Add quirk for VF0770")
Reported-by: Jukka Heikintalo <heikintalo.jukka(a)gmail.com>
Tested-by: Jukka Heikintalo <heikintalo.jukka(a)gmail.com>
Reported-by: Paweł Susicki <pawel.susicki(a)gmail.com>
Tested-by: Paweł Susicki <pawel.susicki(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 5.4, 5.10, 5.14, 5.15
Signed-off-by: Jonas Hahnfeld <hahnjo(a)hahnjo.de>
---
sound/usb/quirks-table.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/usb/quirks-table.h b/sound/usb/quirks-table.h
index b1522e43173e..0ea39565e623 100644
--- a/sound/usb/quirks-table.h
+++ b/sound/usb/quirks-table.h
@@ -84,7 +84,7 @@
* combination.
*/
{
- USB_DEVICE(0x041e, 0x4095),
+ USB_AUDIO_DEVICE(0x041e, 0x4095),
.driver_info = (unsigned long) &(const struct snd_usb_audio_quirk) {
.ifnum = QUIRK_ANY_INTERFACE,
.type = QUIRK_COMPOSITE,
--
2.35.1
This unbreaks support for USB devices, which do not have a board_type
to create an alt_path out of and thus were running into a NULL
dereference.
Fixes: 5ff013914c62 ("brcmfmac: firmware: Allow per-board firmware binaries")
Reviewed-by: Arend van Spriel <arend.vanspriel(a)broadcom.com>
Reviewed-by: Dmitry Osipenko <digetx(a)gmail.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
index 1001c8888bfe..63821856bbe1 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
@@ -599,6 +599,9 @@ static char *brcm_alt_fw_path(const char *path, const char *board_type)
char alt_path[BRCMF_FW_NAME_LEN];
char suffix[5];
+ if (!board_type)
+ return NULL;
+
strscpy(alt_path, path, BRCMF_FW_NAME_LEN);
/* At least one character + suffix */
if (strlen(alt_path) < 5)
--
2.33.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a06247c6804f1a7c86a2e5398a4c1f1db1471848 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Tue, 11 Jan 2022 15:23:09 -0800
Subject: [PATCH] psi: Fix uaf issue when psi trigger is destroyed while being
polled
With write operation on psi files replacing old trigger with a new one,
the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and pending poll()
will stumble on trigger->event_wait which was destroyed.
Fix this by disallowing to redefine an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with EBUSY error.
Also bypass a check for psi_disabled in the psi_trigger_destroy as the
flag can be flipped after the trigger is created, leading to a memory
leak.
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: syzbot+cdb5dd11c97cc532efad(a)syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Analyzed-by: Eric Biggers <ebiggers(a)kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com
diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
index f2b3439edcc2..860fe651d645 100644
--- a/Documentation/accounting/psi.rst
+++ b/Documentation/accounting/psi.rst
@@ -92,7 +92,8 @@ Triggers can be set on more than one psi metric and more than one trigger
for the same psi metric can be specified. However for each trigger a separate
file descriptor is required to be able to poll it separately from others,
therefore for each trigger a separate open() syscall should be made even
-when opening the same psi interface file.
+when opening the same psi interface file. Write operations to a file descriptor
+with an already existing psi trigger will fail with EBUSY.
Monitors activate only when system enters stall state for the monitored
psi metric and deactivates upon exit from the stall state. While system is
diff --git a/include/linux/psi.h b/include/linux/psi.h
index a70ca833c6d7..f8ce53bfdb2a 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -33,7 +33,7 @@ void cgroup_move_task(struct task_struct *p, struct css_set *to);
struct psi_trigger *psi_trigger_create(struct psi_group *group,
char *buf, size_t nbytes, enum psi_res res);
-void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *t);
+void psi_trigger_destroy(struct psi_trigger *t);
__poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
poll_table *wait);
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 516c0fe836fd..1a3cef26d129 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -141,9 +141,6 @@ struct psi_trigger {
* events to one per window
*/
u64 last_event_time;
-
- /* Refcounting to prevent premature destruction */
- struct kref refcount;
};
struct psi_group {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b31e1465868a..9d05c3ca2d5e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3643,6 +3643,12 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
cgroup_get(cgrp);
cgroup_kn_unlock(of->kn);
+ /* Allow only one trigger per file descriptor */
+ if (ctx->psi.trigger) {
+ cgroup_put(cgrp);
+ return -EBUSY;
+ }
+
psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi;
new = psi_trigger_create(psi, buf, nbytes, res);
if (IS_ERR(new)) {
@@ -3650,8 +3656,7 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
return PTR_ERR(new);
}
- psi_trigger_replace(&ctx->psi.trigger, new);
-
+ smp_store_release(&ctx->psi.trigger, new);
cgroup_put(cgrp);
return nbytes;
@@ -3690,7 +3695,7 @@ static void cgroup_pressure_release(struct kernfs_open_file *of)
{
struct cgroup_file_ctx *ctx = of->priv;
- psi_trigger_replace(&ctx->psi.trigger, NULL);
+ psi_trigger_destroy(ctx->psi.trigger);
}
bool cgroup_psi_enabled(void)
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index a679613a7cb7..c137c4d6983e 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1162,7 +1162,6 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
t->event = 0;
t->last_event_time = 0;
init_waitqueue_head(&t->event_wait);
- kref_init(&t->refcount);
mutex_lock(&group->trigger_lock);
@@ -1191,15 +1190,19 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
return t;
}
-static void psi_trigger_destroy(struct kref *ref)
+void psi_trigger_destroy(struct psi_trigger *t)
{
- struct psi_trigger *t = container_of(ref, struct psi_trigger, refcount);
- struct psi_group *group = t->group;
+ struct psi_group *group;
struct task_struct *task_to_destroy = NULL;
- if (static_branch_likely(&psi_disabled))
+ /*
+ * We do not check psi_disabled since it might have been disabled after
+ * the trigger got created.
+ */
+ if (!t)
return;
+ group = t->group;
/*
* Wakeup waiters to stop polling. Can happen if cgroup is deleted
* from under a polling process.
@@ -1235,9 +1238,9 @@ static void psi_trigger_destroy(struct kref *ref)
mutex_unlock(&group->trigger_lock);
/*
- * Wait for both *trigger_ptr from psi_trigger_replace and
- * poll_task RCUs to complete their read-side critical sections
- * before destroying the trigger and optionally the poll_task
+ * Wait for psi_schedule_poll_work RCU to complete its read-side
+ * critical section before destroying the trigger and optionally the
+ * poll_task.
*/
synchronize_rcu();
/*
@@ -1254,18 +1257,6 @@ static void psi_trigger_destroy(struct kref *ref)
kfree(t);
}
-void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new)
-{
- struct psi_trigger *old = *trigger_ptr;
-
- if (static_branch_likely(&psi_disabled))
- return;
-
- rcu_assign_pointer(*trigger_ptr, new);
- if (old)
- kref_put(&old->refcount, psi_trigger_destroy);
-}
-
__poll_t psi_trigger_poll(void **trigger_ptr,
struct file *file, poll_table *wait)
{
@@ -1275,24 +1266,15 @@ __poll_t psi_trigger_poll(void **trigger_ptr,
if (static_branch_likely(&psi_disabled))
return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
- rcu_read_lock();
-
- t = rcu_dereference(*(void __rcu __force **)trigger_ptr);
- if (!t) {
- rcu_read_unlock();
+ t = smp_load_acquire(trigger_ptr);
+ if (!t)
return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
- }
- kref_get(&t->refcount);
-
- rcu_read_unlock();
poll_wait(file, &t->event_wait, wait);
if (cmpxchg(&t->event, 1, 0) == 1)
ret |= EPOLLPRI;
- kref_put(&t->refcount, psi_trigger_destroy);
-
return ret;
}
@@ -1316,14 +1298,24 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
buf[buf_size - 1] = '\0';
- new = psi_trigger_create(&psi_system, buf, nbytes, res);
- if (IS_ERR(new))
- return PTR_ERR(new);
-
seq = file->private_data;
+
/* Take seq->lock to protect seq->private from concurrent writes */
mutex_lock(&seq->lock);
- psi_trigger_replace(&seq->private, new);
+
+ /* Allow only one trigger per file descriptor */
+ if (seq->private) {
+ mutex_unlock(&seq->lock);
+ return -EBUSY;
+ }
+
+ new = psi_trigger_create(&psi_system, buf, nbytes, res);
+ if (IS_ERR(new)) {
+ mutex_unlock(&seq->lock);
+ return PTR_ERR(new);
+ }
+
+ smp_store_release(&seq->private, new);
mutex_unlock(&seq->lock);
return nbytes;
@@ -1358,7 +1350,7 @@ static int psi_fop_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
- psi_trigger_replace(&seq->private, NULL);
+ psi_trigger_destroy(seq->private);
return single_release(inode, file);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a06247c6804f1a7c86a2e5398a4c1f1db1471848 Mon Sep 17 00:00:00 2001
From: Suren Baghdasaryan <surenb(a)google.com>
Date: Tue, 11 Jan 2022 15:23:09 -0800
Subject: [PATCH] psi: Fix uaf issue when psi trigger is destroyed while being
polled
With write operation on psi files replacing old trigger with a new one,
the lifetime of its waitqueue is totally arbitrary. Overwriting an
existing trigger causes its waitqueue to be freed and pending poll()
will stumble on trigger->event_wait which was destroyed.
Fix this by disallowing to redefine an existing psi trigger. If a write
operation is used on a file descriptor with an already existing psi
trigger, the operation will fail with EBUSY error.
Also bypass a check for psi_disabled in the psi_trigger_destroy as the
flag can be flipped after the trigger is created, leading to a memory
leak.
Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Reported-by: syzbot+cdb5dd11c97cc532efad(a)syzkaller.appspotmail.com
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Analyzed-by: Eric Biggers <ebiggers(a)kernel.org>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20220111232309.1786347-1-surenb@google.com
diff --git a/Documentation/accounting/psi.rst b/Documentation/accounting/psi.rst
index f2b3439edcc2..860fe651d645 100644
--- a/Documentation/accounting/psi.rst
+++ b/Documentation/accounting/psi.rst
@@ -92,7 +92,8 @@ Triggers can be set on more than one psi metric and more than one trigger
for the same psi metric can be specified. However for each trigger a separate
file descriptor is required to be able to poll it separately from others,
therefore for each trigger a separate open() syscall should be made even
-when opening the same psi interface file.
+when opening the same psi interface file. Write operations to a file descriptor
+with an already existing psi trigger will fail with EBUSY.
Monitors activate only when system enters stall state for the monitored
psi metric and deactivates upon exit from the stall state. While system is
diff --git a/include/linux/psi.h b/include/linux/psi.h
index a70ca833c6d7..f8ce53bfdb2a 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -33,7 +33,7 @@ void cgroup_move_task(struct task_struct *p, struct css_set *to);
struct psi_trigger *psi_trigger_create(struct psi_group *group,
char *buf, size_t nbytes, enum psi_res res);
-void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *t);
+void psi_trigger_destroy(struct psi_trigger *t);
__poll_t psi_trigger_poll(void **trigger_ptr, struct file *file,
poll_table *wait);
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 516c0fe836fd..1a3cef26d129 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -141,9 +141,6 @@ struct psi_trigger {
* events to one per window
*/
u64 last_event_time;
-
- /* Refcounting to prevent premature destruction */
- struct kref refcount;
};
struct psi_group {
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b31e1465868a..9d05c3ca2d5e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -3643,6 +3643,12 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
cgroup_get(cgrp);
cgroup_kn_unlock(of->kn);
+ /* Allow only one trigger per file descriptor */
+ if (ctx->psi.trigger) {
+ cgroup_put(cgrp);
+ return -EBUSY;
+ }
+
psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi;
new = psi_trigger_create(psi, buf, nbytes, res);
if (IS_ERR(new)) {
@@ -3650,8 +3656,7 @@ static ssize_t cgroup_pressure_write(struct kernfs_open_file *of, char *buf,
return PTR_ERR(new);
}
- psi_trigger_replace(&ctx->psi.trigger, new);
-
+ smp_store_release(&ctx->psi.trigger, new);
cgroup_put(cgrp);
return nbytes;
@@ -3690,7 +3695,7 @@ static void cgroup_pressure_release(struct kernfs_open_file *of)
{
struct cgroup_file_ctx *ctx = of->priv;
- psi_trigger_replace(&ctx->psi.trigger, NULL);
+ psi_trigger_destroy(ctx->psi.trigger);
}
bool cgroup_psi_enabled(void)
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index a679613a7cb7..c137c4d6983e 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1162,7 +1162,6 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
t->event = 0;
t->last_event_time = 0;
init_waitqueue_head(&t->event_wait);
- kref_init(&t->refcount);
mutex_lock(&group->trigger_lock);
@@ -1191,15 +1190,19 @@ struct psi_trigger *psi_trigger_create(struct psi_group *group,
return t;
}
-static void psi_trigger_destroy(struct kref *ref)
+void psi_trigger_destroy(struct psi_trigger *t)
{
- struct psi_trigger *t = container_of(ref, struct psi_trigger, refcount);
- struct psi_group *group = t->group;
+ struct psi_group *group;
struct task_struct *task_to_destroy = NULL;
- if (static_branch_likely(&psi_disabled))
+ /*
+ * We do not check psi_disabled since it might have been disabled after
+ * the trigger got created.
+ */
+ if (!t)
return;
+ group = t->group;
/*
* Wakeup waiters to stop polling. Can happen if cgroup is deleted
* from under a polling process.
@@ -1235,9 +1238,9 @@ static void psi_trigger_destroy(struct kref *ref)
mutex_unlock(&group->trigger_lock);
/*
- * Wait for both *trigger_ptr from psi_trigger_replace and
- * poll_task RCUs to complete their read-side critical sections
- * before destroying the trigger and optionally the poll_task
+ * Wait for psi_schedule_poll_work RCU to complete its read-side
+ * critical section before destroying the trigger and optionally the
+ * poll_task.
*/
synchronize_rcu();
/*
@@ -1254,18 +1257,6 @@ static void psi_trigger_destroy(struct kref *ref)
kfree(t);
}
-void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new)
-{
- struct psi_trigger *old = *trigger_ptr;
-
- if (static_branch_likely(&psi_disabled))
- return;
-
- rcu_assign_pointer(*trigger_ptr, new);
- if (old)
- kref_put(&old->refcount, psi_trigger_destroy);
-}
-
__poll_t psi_trigger_poll(void **trigger_ptr,
struct file *file, poll_table *wait)
{
@@ -1275,24 +1266,15 @@ __poll_t psi_trigger_poll(void **trigger_ptr,
if (static_branch_likely(&psi_disabled))
return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
- rcu_read_lock();
-
- t = rcu_dereference(*(void __rcu __force **)trigger_ptr);
- if (!t) {
- rcu_read_unlock();
+ t = smp_load_acquire(trigger_ptr);
+ if (!t)
return DEFAULT_POLLMASK | EPOLLERR | EPOLLPRI;
- }
- kref_get(&t->refcount);
-
- rcu_read_unlock();
poll_wait(file, &t->event_wait, wait);
if (cmpxchg(&t->event, 1, 0) == 1)
ret |= EPOLLPRI;
- kref_put(&t->refcount, psi_trigger_destroy);
-
return ret;
}
@@ -1316,14 +1298,24 @@ static ssize_t psi_write(struct file *file, const char __user *user_buf,
buf[buf_size - 1] = '\0';
- new = psi_trigger_create(&psi_system, buf, nbytes, res);
- if (IS_ERR(new))
- return PTR_ERR(new);
-
seq = file->private_data;
+
/* Take seq->lock to protect seq->private from concurrent writes */
mutex_lock(&seq->lock);
- psi_trigger_replace(&seq->private, new);
+
+ /* Allow only one trigger per file descriptor */
+ if (seq->private) {
+ mutex_unlock(&seq->lock);
+ return -EBUSY;
+ }
+
+ new = psi_trigger_create(&psi_system, buf, nbytes, res);
+ if (IS_ERR(new)) {
+ mutex_unlock(&seq->lock);
+ return PTR_ERR(new);
+ }
+
+ smp_store_release(&seq->private, new);
mutex_unlock(&seq->lock);
return nbytes;
@@ -1358,7 +1350,7 @@ static int psi_fop_release(struct inode *inode, struct file *file)
{
struct seq_file *seq = file->private_data;
- psi_trigger_replace(&seq->private, NULL);
+ psi_trigger_destroy(seq->private);
return single_release(inode, file);
}
Hello Dear Partner
It is my pleasure to reach you after our unsuccessful attempt on our
business transaction, Well i just want to use this medium to thank you
very much for your earlier assistance to help me in receiving the
funds without any positive outcome.
Contact Reverend Pastor John /e-mail: ( pastor.johnvincent11(a)gmail.com )
Call and What's App Phone Number: +229-53812455
I have kept the ATM VISA CARD with Him at amount worth of US$ 1.5
million for your compensation and you are expecting to send him your
full address via mail to deliver your card to you. you are required to
send him the following infromation as listed ,so that he will direct
you how you Atm Visa Card can been deliver to your destination.
1. Your full name....................
2. Your Occupation................
3. Your address......................
4. Your phone number...........
5. Age:..................
6. Nationality:.........
7. whatsapp Number.......
Kind Regards
Yours Sincerely,
Mr Johnson Robert
The patch titled
Subject: drivers/base/memory: add memory block to memory group after registration succeeded
has been added to the -mm tree. Its filename is
drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/drivers-base-memory-add-memory-bl…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/drivers-base-memory-add-memory-bl…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: drivers/base/memory: add memory block to memory group after registration succeeded
If register_memory() fails, we freed the memory block but already added
the memory block to the group list, not good. Let's defer adding the
block to the memory group to after registering the memory block device.
We do handle it properly during unregister_memory(), but that's not
called when the registration fails.
Link: https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com
Fixes: 028fc57a1c36 ("drivers/base/memory: introduce "memory groups" to logically group memory blocks")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/base/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded
+++ a/drivers/base/memory.c
@@ -663,14 +663,16 @@ static int init_memory_block(unsigned lo
mem->nr_vmemmap_pages = nr_vmemmap_pages;
INIT_LIST_HEAD(&mem->group_next);
+ ret = register_memory(mem);
+ if (ret)
+ return ret;
+
if (group) {
mem->group = group;
list_add(&mem->group_next, &group->memory_blocks);
}
- ret = register_memory(mem);
-
- return ret;
+ return 0;
}
static int add_memory_block(unsigned long base_section_nr)
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-optimize-do_wp_page-for-exclusive-pages-in-the-swapcache.patch
mm-optimize-do_wp_page-for-fresh-pages-in-local-lru-pagevecs.patch
mm-slightly-clarify-ksm-logic-in-do_swap_page.patch
mm-streamline-cow-logic-in-do_swap_page.patch
mm-huge_memory-streamline-cow-logic-in-do_huge_pmd_wp_page.patch
mm-khugepaged-remove-reuse_swap_page-usage.patch
mm-swapfile-remove-stale-reuse_swap_page.patch
mm-huge_memory-remove-stale-page_trans_huge_mapcount.patch
mm-huge_memory-remove-stale-locking-logic-from-__split_huge_pmd.patch
drivers-base-memory-add-memory-block-to-memory-group-after-registration-succeeded.patch
proc-vmcore-fix-possible-deadlock-on-concurrent-mmap-and-read.patch
The patch titled
Subject: mm/kmemleak: avoid scanning potential huge holes
has been added to the -mm tree. Its filename is
mm-kmemleak-avoid-scanning-potential-huge-holes.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-kmemleak-avoid-scanning-potent…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-kmemleak-avoid-scanning-potent…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Lang Yu <lang.yu(a)amd.com>
Subject: mm/kmemleak: avoid scanning potential huge holes
When using devm_request_free_mem_region() and devm_memremap_pages() to add
ZONE_DEVICE memory, if requested free mem region's end pfn were huge(e.g.,
0x400000000), the node_end_pfn() will be also huge (see
move_pfn_range_to_zone()). Thus it creates a huge hole between
node_start_pfn() and node_end_pfn().
We found on some AMD APUs, amdkfd requested such a free mem region and
created a huge hole. In such a case, following code snippet was just
doing busy test_bit() looping on the huge hole.
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
struct page *page = pfn_to_online_page(pfn);
if (!page)
continue;
...
}
So we got a soft lockup:
watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [bash:1221]
CPU: 6 PID: 1221 Comm: bash Not tainted 5.15.0-custom #1
RIP: 0010:pfn_to_online_page+0x5/0xd0
Call Trace:
? kmemleak_scan+0x16a/0x440
kmemleak_write+0x306/0x3a0
? common_file_perm+0x72/0x170
full_proxy_write+0x5c/0x90
vfs_write+0xb9/0x260
ksys_write+0x67/0xe0
__x64_sys_write+0x1a/0x20
do_syscall_64+0x3b/0xc0
entry_SYSCALL_64_after_hwframe+0x44/0xae
I did some tests with the patch.
(1) amdgpu module unloaded
before the patch:
real 0m0.976s
user 0m0.000s
sys 0m0.968s
after the patch:
real 0m0.981s
user 0m0.000s
sys 0m0.973s
(2) amdgpu module loaded
before the patch:
real 0m35.365s
user 0m0.000s
sys 0m35.354s
after the patch:
real 0m1.049s
user 0m0.000s
sys 0m1.042s
Link: https://lkml.kernel.org/r/20211108140029.721144-1-lang.yu@amd.com
Signed-off-by: Lang Yu <lang.yu(a)amd.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmemleak.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/mm/kmemleak.c~mm-kmemleak-avoid-scanning-potential-huge-holes
+++ a/mm/kmemleak.c
@@ -1410,7 +1410,8 @@ static void kmemleak_scan(void)
{
unsigned long flags;
struct kmemleak_object *object;
- int i;
+ struct zone *zone;
+ int __maybe_unused i;
int new_leaks = 0;
jiffies_last_scan = jiffies;
@@ -1450,9 +1451,9 @@ static void kmemleak_scan(void)
* Struct page scanning for each node.
*/
get_online_mems();
- for_each_online_node(i) {
- unsigned long start_pfn = node_start_pfn(i);
- unsigned long end_pfn = node_end_pfn(i);
+ for_each_populated_zone(zone) {
+ unsigned long start_pfn = zone->zone_start_pfn;
+ unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -1461,8 +1462,8 @@ static void kmemleak_scan(void)
if (!page)
continue;
- /* only scan pages belonging to this node */
- if (page_to_nid(page) != i)
+ /* only scan pages belonging to this zone */
+ if (page_zone(page) != zone)
continue;
/* only scan if page is in use */
if (page_count(page) == 0)
_
Patches currently in -mm which might be from lang.yu(a)amd.com are
mm-kmemleak-avoid-scanning-potential-huge-holes.patch
The patch titled
Subject: exec: force single empty string when argv is empty
has been added to the -mm tree. Its filename is
exec-force-single-empty-string-when-argv-is-empty.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/exec-force-single-empty-string-wh…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/exec-force-single-empty-string-wh…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Kees Cook <keescook(a)chromium.org>
Subject: exec: force single empty string when argv is empty
Quoting[1] Ariadne Conill:
"In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting
a scenario where argc < 1. POSIX 2017 also recommends this behaviour,
but it is not an explicit requirement[2]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
...
Interestingly, Michael Kerrisk opened an issue about this in 2008[3],
but there was no consensus to support fixing this issue then.
Hopefully now that CVE-2021-4034 shows practical exploitative use[4]
of this bug in a shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[5]."
While the initial code searches[6][7] turned up what appeared to be
mostly corner case tests, trying to that just reject argv == NULL
(or an immediately terminated pointer list) quickly started tripping[8]
existing userspace programs.
The next best approach is forcing a single empty string into argv and
adjusting argc to match. The number of programs depending on argc == 0
seems a smaller set than those calling execve with a NULL argv.
Account for the additional stack space in bprm_stack_limits(). Inject an
empty string when argc == 0 (and set argc = 1). Warn about the case so
userspace has some notice about the change:
process './argc0' launched './argc0' with NULL argv: empty string added
Additionally WARN() and reject NULL argv usage for kernel threads.
[1] https://lore.kernel.org/lkml/20220127000724.15106-1-ariadne@dereferenced.or…
[2] https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[3] https://bugzilla.kernel.org/show_bug.cgi?id=8408
[4] https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[5] https://github.com/KSPP/linux/issues/176
[6] https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+…
[7] https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%…
[8] https://lore.kernel.org/lkml/20220131144352.GE16385@xsang-OptiPlex-9020/
Link: https://lkml.kernel.org/r/20220201000947.2453721-1-keescook@chromium.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>Reported-by: Ariadne Conill <ariadne(a)dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
--- a/fs/exec.c~exec-force-single-empty-string-when-argv-is-empty
+++ a/fs/exec.c
@@ -495,8 +495,14 @@ static int bprm_stack_limits(struct linu
* the stack. They aren't stored until much later when we can't
* signal to the parent that the child has run out of stack space.
* Instead, calculate it here so it's possible to fail gracefully.
+ *
+ * In the case of argc = 0, make sure there is space for adding a
+ * empty string (which will bump argc to 1), to ensure confused
+ * userspace programs don't start processing from argv[1], thinking
+ * argc can never be 0, to keep them from walking envp by accident.
+ * See do_execveat_common().
*/
- ptr_size = (bprm->argc + bprm->envc) * sizeof(void *);
+ ptr_size = (min(bprm->argc, 1) + bprm->envc) * sizeof(void *);
if (limit <= ptr_size)
return -E2BIG;
limit -= ptr_size;
@@ -1897,6 +1903,9 @@ static int do_execveat_common(int fd, st
}
retval = count(argv, MAX_ARG_STRINGS);
+ if (retval == 0)
+ pr_warn_once("process '%s' launched '%s' with NULL argv: empty string added\n",
+ current->comm, bprm->filename);
if (retval < 0)
goto out_free;
bprm->argc = retval;
@@ -1923,6 +1932,19 @@ static int do_execveat_common(int fd, st
if (retval < 0)
goto out_free;
+ /*
+ * When argv is empty, add an empty string ("") as argv[0] to
+ * ensure confused userspace programs that start processing
+ * from argv[1] won't end up walking envp. See also
+ * bprm_stack_limits().
+ */
+ if (bprm->argc == 0) {
+ retval = copy_string_kernel("", bprm);
+ if (retval < 0)
+ goto out_free;
+ bprm->argc = 1;
+ }
+
retval = bprm_execve(bprm, fd, filename, flags);
out_free:
free_bprm(bprm);
@@ -1951,6 +1973,8 @@ int kernel_execve(const char *kernel_fil
}
retval = count_strings_kernel(argv);
+ if (WARN_ON_ONCE(retval == 0))
+ retval = -EINVAL;
if (retval < 0)
goto out_free;
bprm->argc = retval;
_
Patches currently in -mm which might be from keescook(a)chromium.org are
kconfigdebug-make-debug_info-selectable-from-a-choice.patch
kconfigdebug-make-debug_info-selectable-from-a-choice-fix.patch
exec-force-single-empty-string-when-argv-is-empty.patch
The patch titled
Subject: fs/exec: require argv[0] presence in do_execveat_common()
has been removed from the -mm tree. Its filename was
fs-exec-require-argv-presence-in-do_execveat_common.patch
This patch was dropped because an alternative patch was merged
------------------------------------------------------
From: Ariadne Conill <ariadne(a)dereferenced.org>
Subject: fs/exec: require argv[0] presence in do_execveat_common()
In several other operating systems, it is a hard requirement that the
second argument to execve(2) be the name of a program, thus prohibiting a
scenario where argc < 1. POSIX 2017 also recommends this behaviour, but
it is not an explicit requirement[0]:
The argument arg0 should point to a filename string that is
associated with the process being started by one of the exec
functions.
To ensure that execve(2) with argc < 1 is not a useful tool for shellcode
to use, we can validate this in do_execveat_common() and fail for this
scenario, effectively blocking successful exploitation of CVE-2021-4034
and similar bugs which depend on execve(2) working with argc < 1.
We use -EINVAL for this case, mirroring recent changes to FreeBSD and
OpenBSD. -EINVAL is also used by QNX for this, while Solaris uses
-EFAULT.
In earlier versions of the patch, it was proposed that we create a fake
argv for applications to use when argc < 1, but it was concluded that it
would be better to just fail the execve(2) in these cases, as launching a
process with an empty or NULL argv[0] was likely to just cause more
problems.
Interestingly, Michael Kerrisk opened an issue about this in 2008[1], but
there was no consensus to support fixing this issue then. Hopefully now
that CVE-2021-4034 shows practical exploitative use[2] of this bug in a
shellcode, we can reconsider.
This issue is being tracked in the KSPP issue tracker[3].
There are a few[4][5] minor edge cases (primarily in test suites) that are
caught by this, but we plan to work with the projects to fix those edge
cases.
[0]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=8408
[2]: https://www.qualys.com/2022/01/25/cve-2021-4034/pwnkit.txt
[3]: https://github.com/KSPP/linux/issues/176
[4]: https://codesearch.debian.net/search?q=execve%5C+*%5C%28%5B%5E%2C%5D%2B%2C+…
[5]: https://codesearch.debian.net/search?q=execlp%3F%5Cs*%5C%28%5B%5E%2C%5D%2B%…
Link: https://lkml.kernel.org/r/20220127000724.15106-1-ariadne@dereferenced.org
Signed-off-by: Ariadne Conill <ariadne(a)dereferenced.org>
Reported-by: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/fs/exec.c~fs-exec-require-argv-presence-in-do_execveat_common
+++ a/fs/exec.c
@@ -1897,6 +1897,10 @@ static int do_execveat_common(int fd, st
}
retval = count(argv, MAX_ARG_STRINGS);
+ if (retval == 0) {
+ pr_warn_once("Attempted to run process '%s' with NULL argv\n", bprm->filename);
+ retval = -EINVAL;
+ }
if (retval < 0)
goto out_free;
bprm->argc = retval;
_
Patches currently in -mm which might be from ariadne(a)dereferenced.org are
The patch titled
Subject: mm: fix race between MADV_FREE reclaim and blkdev direct IO read
has been added to the -mm tree. Its filename is
mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-fix-race-between-madv_free-rec…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-fix-race-between-madv_free-rec…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mauricio Faria de Oliveira <mfo(a)canonical.com>
Subject: mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Problem:
=======
Userspace might read the zero-page instead of actual data from a
direct IO read on a block device if the buffers have been called
madvise(MADV_FREE) on earlier (this is discussed below) due to a
race between page reclaim on MADV_FREE and blkdev direct IO read.
- Race condition:
==============
During page reclaim, the MADV_FREE page check in try_to_unmap_one()
checks if the page is not dirty, then discards its rmap PTE(s) (vs.
remap back if the page is dirty).
However, after try_to_unmap_one() returns to shrink_page_list(), it
might keep the page _anyway_ if page_ref_freeze() fails (it expects
exactly _one_ page reference, from the isolation for page reclaim).
Well, blkdev_direct_IO() gets references for all pages, and on READ
operations it only sets them dirty _later_.
So, if MADV_FREE'd pages (i.e., not dirty) are used as buffers for
direct IO read from block devices, and page reclaim happens during
__blkdev_direct_IO[_simple]() exactly AFTER bio_iov_iter_get_pages()
returns, but BEFORE the pages are set dirty, the situation happens.
The direct IO read eventually completes. Now, when userspace reads
the buffers, the PTE is no longer there and the page fault handler
do_anonymous_page() services that with the zero-page, NOT the data!
A synthetic reproducer is provided.
- Page faults:
===========
If page reclaim happens BEFORE bio_iov_iter_get_pages() the issue
doesn't happen, because that faults-in all pages as writeable, so
do_anonymous_page() sets up a new page/rmap/PTE, and that is used
by direct IO. The userspace reads don't fault as the PTE is there
(thus zero-page is not used/setup).
But if page reclaim happens AFTER it / BEFORE setting pages dirty,
the PTE is no longer there; the subsequent page faults can't help:
The data-read from the block device probably won't generate faults
due to DMA (no MMU) but even in the case it wouldn't use DMA, that
happens on different virtual addresses (not user-mapped addresses)
because `struct bio_vec` stores `struct page` to figure addresses
out (which are different from user-mapped addresses) for the read.
Thus userspace reads (to user-mapped addresses) still fault, then
do_anonymous_page() gets another `struct page` that would address/
map to other memory than the `struct page` used by `struct bio_vec`
for the read. (The original `struct page` is not available, since
it wasn't freed, as page_ref_freeze() failed due to more page refs.
And even if it were available, its data cannot be trusted anymore.)
Solution:
========
One solution is to check for the expected page reference count
in try_to_unmap_one().
There should be one reference from the isolation (that is also
checked in shrink_page_list() with page_ref_freeze()) plus one
or more references from page mapping(s) (put in discard: label).
Further references mean that rmap/PTE cannot be unmapped/nuked.
(Note: there might be more than one reference from mapping due
to fork()/clone() without CLONE_VM, which use the same `struct
page` for references, until the copy-on-write page gets copied.)
So, additional page references (e.g., from direct IO read) now
prevent the rmap/PTE from being unmapped/dropped; similarly to
the page is not freed per shrink_page_list()/page_ref_freeze()).
- Races and Barriers:
==================
The new check in try_to_unmap_one() should be safe in races with
bio_iov_iter_get_pages() in get_user_pages() fast and slow paths,
as it's done under the PTE lock.
The fast path doesn't take the lock, but it checks if the PTE has
changed and if so, it drops the reference and leaves the page for
the slow path (which does take that lock).
The fast path requires synchronization w/ full memory barrier: it
writes the page reference count first then it reads the PTE later,
while try_to_unmap() writes PTE first then it reads page refcount.
And a second barrier is needed, as the page dirty flag should not
be read before the page reference count (as in __remove_mapping()).
(This can be a load memory barrier only; no writes are involved.)
Call stack/comments:
- try_to_unmap_one()
- page_vma_mapped_walk()
- map_pte() # see pte_offset_map_lock():
pte_offset_map()
spin_lock()
- ptep_get_and_clear() # write PTE
- smp_mb() # (new barrier) GUP fast path
- page_ref_count() # (new check) read refcount
- page_vma_mapped_walk_done() # see pte_unmap_unlock():
pte_unmap()
spin_unlock()
- bio_iov_iter_get_pages()
- __bio_iov_iter_get_pages()
- iov_iter_get_pages()
- get_user_pages_fast()
- internal_get_user_pages_fast()
# fast path
- lockless_pages_from_mm()
- gup_{pgd,p4d,pud,pmd,pte}_range()
ptep = pte_offset_map() # not _lock()
pte = ptep_get_lockless(ptep)
page = pte_page(pte)
try_grab_compound_head(page) # inc refcount
# (RMW/barrier
# on success)
if (pte_val(pte) != pte_val(*ptep)) # read PTE
put_compound_head(page) # dec refcount
# go slow path
# slow path
- __gup_longterm_unlocked()
- get_user_pages_unlocked()
- __get_user_pages_locked()
- __get_user_pages()
- follow_{page,p4d,pud,pmd}_mask()
- follow_page_pte()
ptep = pte_offset_map_lock()
pte = *ptep
page = vm_normal_page(pte)
try_grab_page(page) # inc refcount
pte_unmap_unlock()
- Huge Pages:
==========
Regarding transparent hugepages, that logic shouldn't change, as
MADV_FREE (aka lazyfree) pages are PageAnon() && !PageSwapBacked()
(madvise_free_pte_range() -> mark_page_lazyfree() -> lru_lazyfree_fn())
thus should reach shrink_page_list() -> split_huge_page_to_list()
before try_to_unmap[_one](), so it deals with normal pages only.
(And in case unlikely/TTU_SPLIT_HUGE_PMD/split_huge_pmd_address()
happens, which should not or be rare, the page refcount should be
greater than mapcount: the head page is referenced by tail pages.
That also prevents checking the head `page` then incorrectly call
page_remove_rmap(subpage) for a tail page, that isn't even in the
shrink_page_list()'s page_list (an effect of split huge pmd/pmvw),
as it might happen today in this unlikely scenario.)
MADV_FREE'd buffers:
===================
So, back to the "if MADV_FREE pages are used as buffers" note.
The case is arguable, and subject to multiple interpretations.
The madvise(2) manual page on the MADV_FREE advice value says:
1) 'After a successful MADV_FREE ... data will be lost when
the kernel frees the pages.'
2) 'the free operation will be canceled if the caller writes
into the page' / 'subsequent writes ... will succeed and
then [the] kernel cannot free those dirtied pages'
3) 'If there is no subsequent write, the kernel can free the
pages at any time.'
Thoughts, questions, considerations... respectively:
1) Since the kernel didn't actually free the page (page_ref_freeze()
failed), should the data not have been lost? (on userspace read.)
2) Should writes performed by the direct IO read be able to cancel
the free operation?
- Should the direct IO read be considered as 'the caller' too,
as it's been requested by 'the caller'?
- Should the bio technique to dirty pages on return to userspace
(bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
be considered in another/special way here?
3) Should an upcoming write from a previously requested direct IO
read be considered as a subsequent write, so the kernel should
not free the pages? (as it's known at the time of page reclaim.)
At last:
Technically, the last point would seem a reasonable consideration
and balance, as the madvise(2) manual page apparently (and fairly)
seem to assume that 'writes' are memory access from the userspace
process (not explicitly considering writes from the kernel or its
corner cases; again, fairly).. plus the kernel fix implementation
for the corner case of the largely 'non-atomic write' encompassed
by a direct IO read operation, is relatively simple; and it helps.
Reproducer:
==========
@ test.c (simplified, but works)
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
int main() {
int fd, i;
char *buf;
fd = open(DEV, O_RDONLY | O_DIRECT);
buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
buf[i] = 1; // init to non-zero
madvise(buf, BUF_SIZE, MADV_FREE);
read(fd, buf, BUF_SIZE);
for (i = 0; i < BUF_SIZE; i += PAGE_SIZE)
printf("%p: 0x%x
", &buf[i], buf[i]);
return 0;
}
@ block/fops.c (formerly fs/block_dev.c)
+#include <linux/swap.h>
...
... __blkdev_direct_IO[_simple](...)
{
...
+ if (!strcmp(current->comm, "good"))
+ shrink_all_memory(ULONG_MAX);
+
ret = bio_iov_iter_get_pages(...);
+
+ if (!strcmp(current->comm, "bad"))
+ shrink_all_memory(ULONG_MAX);
...
}
@ shell
# NUM_PAGES=4
# PAGE_SIZE=$(getconf PAGE_SIZE)
# yes | dd of=test.img bs=${PAGE_SIZE} count=${NUM_PAGES}
# DEV=$(losetup -f --show test.img)
# gcc -DDEV=\"$DEV\" \
-DBUF_SIZE=$((PAGE_SIZE * NUM_PAGES)) \
-DPAGE_SIZE=${PAGE_SIZE} \
test.c -o test
# od -tx1 $DEV
0000000 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a 79 0a
*
0040000
# mv test good
# ./good
0x7f7c10418000: 0x79
0x7f7c10419000: 0x79
0x7f7c1041a000: 0x79
0x7f7c1041b000: 0x79
# mv good bad
# ./bad
0x7fa1b8050000: 0x0
0x7fa1b8051000: 0x0
0x7fa1b8052000: 0x0
0x7fa1b8053000: 0x0
Ceph/TCMalloc:
=============
For documentation purposes, the use case driving the analysis/fix
is Ceph on Ubuntu 18.04, as the TCMalloc library there still uses
MADV_FREE to release unused memory to the system from the mmap'ed
page heap (might be committed back/used again; it's not munmap'ed.)
- PageHeap::DecommitSpan() -> TCMalloc_SystemRelease() -> madvise()
- PageHeap::CommitSpan() -> TCMalloc_SystemCommit() -> do nothing.
Note: TCMalloc switched back to MADV_DONTNEED a few commits after
the release in Ubuntu 18.04 (google-perftools/gperftools 2.5), so
the issue just 'disappeared' on Ceph on later Ubuntu releases but
is still present in the kernel, and can be hit by other use cases.
The observed issue seems to be the old Ceph bug #22464 [1], where
checksum mismatches are observed (and instrumentation with buffer
dumps shows zero-pages read from mmap'ed/MADV_FREE'd page ranges).
The issue in Ceph was reasonably deemed a kernel bug (comment #50)
and mostly worked around with a retry mechanism, but other parts
of Ceph could still hit that (rocksdb). Anyway, it's less likely
to be hit again as TCMalloc switched out of MADV_FREE by default.
(Some kernel versions/reports from the Ceph bug, and relation with
the MADV_FREE introduction/changes; TCMalloc versions not checked.)
- 4.4 good
- 4.5 (madv_free: introduction)
- 4.9 bad
- 4.10 good? maybe a swapless system
- 4.12 (madv_free: no longer free instantly on swapless systems)
- 4.13 bad
[1] https://tracker.ceph.com/issues/22464
Thanks:
======
Several people contributed to analysis/discussions/tests/reproducers
in the first stages when drilling down on ceph/tcmalloc/linux kernel:
- Dan Hill
- Dan Streetman
- Dongdong Tao
- Gavin Guo
- Gerald Yang
- Heitor Alves de Siqueira
- Ioanna Alifieraki
- Jay Vosburgh
- Matthew Ruffell
- Ponnuvel Palaniyappan
Link: https://lkml.kernel.org/r/20220131230255.789059-1-mfo@canonical.com
Fixes: 802a3a92ad7a ("mm: reclaim MADV_FREE pages")
Signed-off-by: Mauricio Faria de Oliveira <mfo(a)canonical.com>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Dan Hill <daniel.hill(a)canonical.com>
Cc: Dan Streetman <dan.streetman(a)canonical.com>
Cc: Dongdong Tao <dongdong.tao(a)canonical.com>
Cc: Gavin Guo <gavin.guo(a)canonical.com>
Cc: Gerald Yang <gerald.yang(a)canonical.com>
Cc: Heitor Alves de Siqueira <halves(a)canonical.com>
Cc: Ioanna Alifieraki <ioanna-maria.alifieraki(a)canonical.com>
Cc: Jay Vosburgh <jay.vosburgh(a)canonical.com>
Cc: Matthew Ruffell <matthew.ruffell(a)canonical.com>
Cc: Ponnuvel Palaniyappan <ponnuvel.palaniyappan(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/rmap.c | 25 ++++++++++++++++++++++++-
mm/vmscan.c | 2 +-
2 files changed, 25 insertions(+), 2 deletions(-)
--- a/mm/rmap.c~mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read
+++ a/mm/rmap.c
@@ -1599,7 +1599,30 @@ static bool try_to_unmap_one(struct page
/* MADV_FREE page check */
if (!PageSwapBacked(page)) {
- if (!PageDirty(page)) {
+ int ref_count, map_count;
+
+ /*
+ * Synchronize with gup_pte_range():
+ * - clear PTE; barrier; read refcount
+ * - inc refcount; barrier; read PTE
+ */
+ smp_mb();
+
+ ref_count = page_count(page);
+ map_count = page_mapcount(page);
+
+ /*
+ * Order reads for page refcount and dirty flag;
+ * see __remove_mapping().
+ */
+ smp_rmb();
+
+ /*
+ * The only page refs must be from the isolation
+ * plus one or more rmap's (dropped by discard:).
+ */
+ if ((ref_count == 1 + map_count) &&
+ !PageDirty(page)) {
/* Invalidate as we cleared the pte */
mmu_notifier_invalidate_range(mm,
address, address + PAGE_SIZE);
--- a/mm/vmscan.c~mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read
+++ a/mm/vmscan.c
@@ -1714,7 +1714,7 @@ retry:
mapping = page_mapping(page);
}
} else if (unlikely(PageTransHuge(page))) {
- /* Split file THP */
+ /* Split file/lazyfree THP */
if (split_huge_page_to_list(page, page_list))
goto keep_locked;
}
_
Patches currently in -mm which might be from mfo(a)canonical.com are
mm-fix-race-between-madv_free-reclaim-and-blkdev-direct-io-read.patch
The patch titled
Subject: mm/debug_vm_pgtable: remove pte entry from the page table
has been added to the -mm tree. Its filename is
mm-debug_vm_pgtable-remove-pte-entry-from-the-page-table.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-debug_vm_pgtable-remove-pte-en…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-debug_vm_pgtable-remove-pte-en…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Subject: mm/debug_vm_pgtable: remove pte entry from the page table
Patch series "page table check fixes and cleanups", v5.
This patch (of 4):
The pte entry that is used in pte_advanced_tests() is never removed from
the page table at the end of the test.
The issue is detected by page_table_check, to repro compile kernel with
the following configs:
CONFIG_DEBUG_VM_PGTABLE=y
CONFIG_PAGE_TABLE_CHECK=y
CONFIG_PAGE_TABLE_CHECK_ENFORCED=y
During the boot the following BUG is printed:
[ 2.262821] debug_vm_pgtable: [debug_vm_pgtable ]: Validating
architecture page table helpers
[ 2.276826] ------------[ cut here ]------------
[ 2.280426] kernel BUG at mm/page_table_check.c:162!
[ 2.284118] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 2.287787] CPU: 0 PID: 1 Comm: swapper/0 Not tainted
5.16.0-11413-g2c271fe77d52 #3
[ 2.293226] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org
04/01/2014
...
The entry should be properly removed from the page table before the page
is released to the free list.
Link: https://lkml.kernel.org/r/20220131203249.2832273-1-pasha.tatashin@soleen.com
Link: https://lkml.kernel.org/r/20220131203249.2832273-2-pasha.tatashin@soleen.com
Fixes: a5c3b9ffb0f4 ("mm/debug_vm_pgtable: add tests validating advanced arch page table helpers")
Signed-off-by: Pasha Tatashin <pasha.tatashin(a)soleen.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Tested-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Cc: Paul Turner <pjt(a)google.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: Greg Thelen <gthelen(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: H. Peter Anvin <hpa(a)zytor.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: <stable(a)vger.kernel.org> [5.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug_vm_pgtable.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/debug_vm_pgtable.c~mm-debug_vm_pgtable-remove-pte-entry-from-the-page-table
+++ a/mm/debug_vm_pgtable.c
@@ -171,6 +171,8 @@ static void __init pte_advanced_tests(st
ptep_test_and_clear_young(args->vma, args->vaddr, args->ptep);
pte = ptep_get(args->ptep);
WARN_ON(pte_young(pte));
+
+ ptep_get_and_clear_full(args->mm, args->vaddr, args->ptep, 1);
}
static void __init pte_savedwrite_tests(struct pgtable_debug_args *args)
_
Patches currently in -mm which might be from pasha.tatashin(a)soleen.com are
mm-debug_vm_pgtable-remove-pte-entry-from-the-page-table.patch
mm-page_table_check-use-unsigned-long-for-page-counters-and-cleanup.patch
mm-khugepaged-unify-collapse-pmd-clear-flush-and-free.patch
mm-page_table_check-check-entries-at-pmd-levels.patch
hallo Greg
5.16.5-rc1
compiles, boots and runs on my x86_64
(Intel i5-11400, Fedora 35)
Thanks
Tested-by: Ronald Warsow <rwarsow(a)gmx.de>
regards
Ronald
Commit c2426d2ad5027 ("ima: added support for new kernel cmdline parameter
ima_template_fmt") introduced an additional check on the ima_template
variable to avoid multiple template selection.
Unfortunately, ima_template could be also set by the setup function of the
ima_hash= parameter, when it calls ima_template_desc_current(). This causes
attempts to choose a new template with ima_template= or with
ima_template_fmt=, after ima_hash=, to be ignored.
Achieve the goal of the commit mentioned with the new static variable
template_setup_done, so that template selection requests after ima_hash=
are not ignored.
Finally, call ima_init_template_list(), if not already done, to initialize
the list of templates before lookup_template_desc() is called.
Cc: stable(a)vger.kernel.org
Fixes: c2426d2ad5027 ("ima: added support for new kernel cmdline parameter ima_template_fmt")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
security/integrity/ima/ima_template.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c
index 694560396be0..db1ad6d7a57f 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -29,6 +29,7 @@ static struct ima_template_desc builtin_templates[] = {
static LIST_HEAD(defined_templates);
static DEFINE_SPINLOCK(template_list);
+static int template_setup_done;
static const struct ima_template_field supported_fields[] = {
{.field_id = "d", .field_init = ima_eventdigest_init,
@@ -101,10 +102,11 @@ static int __init ima_template_setup(char *str)
struct ima_template_desc *template_desc;
int template_len = strlen(str);
- if (ima_template)
+ if (template_setup_done)
return 1;
- ima_init_template_list();
+ if (!ima_template)
+ ima_init_template_list();
/*
* Verify that a template with the supplied name exists.
@@ -128,6 +130,7 @@ static int __init ima_template_setup(char *str)
}
ima_template = template_desc;
+ template_setup_done = 1;
return 1;
}
__setup("ima_template=", ima_template_setup);
@@ -136,7 +139,7 @@ static int __init ima_template_fmt_setup(char *str)
{
int num_templates = ARRAY_SIZE(builtin_templates);
- if (ima_template)
+ if (template_setup_done)
return 1;
if (template_desc_init_fields(str, NULL, NULL) < 0) {
@@ -147,6 +150,7 @@ static int __init ima_template_fmt_setup(char *str)
builtin_templates[num_templates - 1].fmt = str;
ima_template = builtin_templates + num_templates - 1;
+ template_setup_done = 1;
return 1;
}
--
2.32.0
If boardrev is missing from the NVRAM we add a default one, but this
might need more space in the output buffer than was allocated. Ensure
we have enough padding for this in the buffer.
Fixes: 46f2b38a91b0 ("brcmfmac: insert default boardrev in nvram data if missing")
Reviewed-by: Arend van Spriel <arend.vanspriel(a)broadcom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
index 0eb13e5df517..1001c8888bfe 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
@@ -207,6 +207,8 @@ static int brcmf_init_nvram_parser(struct nvram_parser *nvp,
size = BRCMF_FW_MAX_NVRAM_SIZE;
else
size = data_len;
+ /* Add space for properties we may add */
+ size += strlen(BRCMF_FW_DEFAULT_BOARDREV) + 1;
/* Alloc for extra 0 byte + roundup by 4 + length field */
size += 1 + 3 + sizeof(u32);
nvp->nvram = kzalloc(size, GFP_KERNEL);
--
2.33.0
It was reported that the mmc host structure could be accessed after it
was freed in moxart_remove(), so fix this by saving the base register of
the device and using it instead of the pointer dereference.
Cc: Ulf Hansson <ulf.hansson(a)linaro.org>
Cc: Xiyu Yang <xiyuyang19(a)fudan.edu.cn>
Cc: Xin Xiong <xiongx18(a)fudan.edu.cn>
Cc: Xin Tan <tanxin.ctf(a)gmail.com>
Cc: Tony Lindgren <tony(a)atomide.com>
Cc: Yang Li <yang.lee(a)linux.alibaba.com>
Cc: linux-mmc(a)vger.kernel.org
Cc: stable <stable(a)vger.kernel.org>
Reported-by: whitehat002 <hackyzh002(a)gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
v2: changed to only move mmc_free_host() call as per Ulf's request
drivers/mmc/host/moxart-mmc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/host/moxart-mmc.c b/drivers/mmc/host/moxart-mmc.c
index 16d1c7a43d33..b6eb75f4bbfc 100644
--- a/drivers/mmc/host/moxart-mmc.c
+++ b/drivers/mmc/host/moxart-mmc.c
@@ -705,12 +705,12 @@ static int moxart_remove(struct platform_device *pdev)
if (!IS_ERR_OR_NULL(host->dma_chan_rx))
dma_release_channel(host->dma_chan_rx);
mmc_remove_host(mmc);
- mmc_free_host(mmc);
writel(0, host->base + REG_INTERRUPT_MASK);
writel(0, host->base + REG_POWER_CONTROL);
writel(readl(host->base + REG_CLOCK_CONTROL) | CLK_OFF,
host->base + REG_CLOCK_CONTROL);
+ mmc_free_host(mmc);
return 0;
}
--
2.35.0
Currently tx_params is being re-assigned with a new value and the
previous setting IEEE80211_HT_MCS_TX_RX_DIFF is being overwritten.
The assignment operator is incorrect, the original intent was to
bit-wise or the value in. Fix this by replacing the = operator
with |= instead.
Kudos to Christian Lamparter for suggesting the correct fix.
Fixes: fe8ee9ad80b2 ("carl9170: mac80211 glue and command interface")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
Cc: <Stable(a)vger.kernel.org>
---
V2: change subject line to match the correct fix, add in the
missing | operator
---
drivers/net/wireless/ath/carl9170/main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/carl9170/main.c b/drivers/net/wireless/ath/carl9170/main.c
index 49f7ee1c912b..2208ec800482 100644
--- a/drivers/net/wireless/ath/carl9170/main.c
+++ b/drivers/net/wireless/ath/carl9170/main.c
@@ -1914,7 +1914,7 @@ static int carl9170_parse_eeprom(struct ar9170 *ar)
WARN_ON(!(tx_streams >= 1 && tx_streams <=
IEEE80211_HT_MCS_TX_MAX_STREAMS));
- tx_params = (tx_streams - 1) <<
+ tx_params |= (tx_streams - 1) <<
IEEE80211_HT_MCS_TX_MAX_STREAMS_SHIFT;
carl9170_band_2GHz.ht_cap.mcs.tx_params |= tx_params;
--
2.33.1
This is a note to let you know that I've just added the patch titled
usb: gadget: f_uac2: Define specific wTerminalType
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 5432184107cd0013761bdfa6cb6079527ef87b95 Mon Sep 17 00:00:00 2001
From: Pavel Hofman <pavel.hofman(a)ivitera.com>
Date: Mon, 31 Jan 2022 08:18:13 +0100
Subject: usb: gadget: f_uac2: Define specific wTerminalType
Several users have reported that their Win10 does not enumerate UAC2
gadget with the existing wTerminalType set to
UAC_INPUT_TERMINAL_UNDEFINED/UAC_INPUT_TERMINAL_UNDEFINED, e.g.
https://github.com/raspberrypi/linux/issues/4587#issuecomment-926567213.
While the constant is officially defined by the USB terminal types
document, e.g. XMOS firmware for UAC2 (commonly used for Win10) defines
no undefined output terminal type in its usbaudio20.h header.
Therefore wTerminalType of EP-IN is set to
UAC_INPUT_TERMINAL_MICROPHONE and wTerminalType of EP-OUT to
UAC_OUTPUT_TERMINAL_SPEAKER for the UAC2 gadget.
Signed-off-by: Pavel Hofman <pavel.hofman(a)ivitera.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20220131071813.7433-1-pavel.hofman@ivitera.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_uac2.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/f_uac2.c b/drivers/usb/gadget/function/f_uac2.c
index 36fa6ef0581b..097a709549d6 100644
--- a/drivers/usb/gadget/function/f_uac2.c
+++ b/drivers/usb/gadget/function/f_uac2.c
@@ -203,7 +203,7 @@ static struct uac2_input_terminal_descriptor io_in_it_desc = {
.bDescriptorSubtype = UAC_INPUT_TERMINAL,
/* .bTerminalID = DYNAMIC */
- .wTerminalType = cpu_to_le16(UAC_INPUT_TERMINAL_UNDEFINED),
+ .wTerminalType = cpu_to_le16(UAC_INPUT_TERMINAL_MICROPHONE),
.bAssocTerminal = 0,
/* .bCSourceID = DYNAMIC */
.iChannelNames = 0,
@@ -231,7 +231,7 @@ static struct uac2_output_terminal_descriptor io_out_ot_desc = {
.bDescriptorSubtype = UAC_OUTPUT_TERMINAL,
/* .bTerminalID = DYNAMIC */
- .wTerminalType = cpu_to_le16(UAC_OUTPUT_TERMINAL_UNDEFINED),
+ .wTerminalType = cpu_to_le16(UAC_OUTPUT_TERMINAL_SPEAKER),
.bAssocTerminal = 0,
/* .bSourceID = DYNAMIC */
/* .bCSourceID = DYNAMIC */
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 459702eea6132888b5c5b64c0e9c626da4ec2493 Mon Sep 17 00:00:00 2001
From: Adam Ford <aford173(a)gmail.com>
Date: Fri, 28 Jan 2022 16:36:03 -0600
Subject: usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
The support the external role switch a variety of situations were
addressed, but the transition from USB_ROLE_HOST to USB_ROLE_NONE
leaves the host up which can cause some error messages when
switching from host to none, to gadget, to none, and then back
to host again.
xhci-hcd ee000000.usb: Abort failed to stop command ring: -110
xhci-hcd ee000000.usb: xHCI host controller not responding, assume dead
xhci-hcd ee000000.usb: HC died; cleaning up
usb 4-1: device not accepting address 6, error -108
usb usb4-port1: couldn't allocate usb_device
After this happens it will not act as a host again.
Fix this by releasing the host mode when transitioning to USB_ROLE_NONE.
Fixes: 0604160d8c0b ("usb: gadget: udc: renesas_usb3: Enhance role switch support")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Link: https://lore.kernel.org/r/20220128223603.2362621-1-aford173@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/renesas_usb3.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/gadget/udc/renesas_usb3.c b/drivers/usb/gadget/udc/renesas_usb3.c
index 57d417a7c3e0..601829a6b4ba 100644
--- a/drivers/usb/gadget/udc/renesas_usb3.c
+++ b/drivers/usb/gadget/udc/renesas_usb3.c
@@ -2378,6 +2378,8 @@ static void handle_ext_role_switch_states(struct device *dev,
switch (role) {
case USB_ROLE_NONE:
usb3->connection_state = USB_ROLE_NONE;
+ if (cur_role == USB_ROLE_HOST)
+ device_release_driver(host);
if (usb3->driver)
usb3_disconnect(usb3);
usb3_vbus_out(usb3, false);
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: raw-gadget: fix handling of dual-direction-capable endpoints
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 292d2c82b105d92082c2120a44a58de9767e44f1 Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Wed, 26 Jan 2022 21:52:14 +0100
Subject: usb: raw-gadget: fix handling of dual-direction-capable endpoints
Under dummy_hcd, every available endpoint is *either* IN or OUT capable.
But with some real hardware, there are endpoints that support both IN and
OUT. In particular, the PLX 2380 has four available endpoints that each
support both IN and OUT.
raw-gadget currently gets confused and thinks that any endpoint that is
usable as an IN endpoint can never be used as an OUT endpoint.
Fix it by looking at the direction in the configured endpoint descriptor
instead of looking at the hardware capabilities.
With this change, I can use the PLX 2380 with raw-gadget.
Fixes: f2c2e717642c ("usb: gadget: add raw-gadget interface")
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Link: https://lore.kernel.org/r/20220126205214.2149936-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/legacy/raw_gadget.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
index c5a2c734234a..d86c3a36441e 100644
--- a/drivers/usb/gadget/legacy/raw_gadget.c
+++ b/drivers/usb/gadget/legacy/raw_gadget.c
@@ -1004,7 +1004,7 @@ static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
ret = -EBUSY;
goto out_unlock;
}
- if ((in && !ep->ep->caps.dir_in) || (!in && ep->ep->caps.dir_in)) {
+ if (in != usb_endpoint_dir_in(ep->ep->desc)) {
dev_dbg(&dev->gadget->dev, "fail, wrong direction\n");
ret = -EINVAL;
goto out_unlock;
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: gadget: f_uac2: Define specific wTerminalType
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 521e6baff47d34db0f028ff05e5f9a339c45d87c Mon Sep 17 00:00:00 2001
From: Pavel Hofman <pavel.hofman(a)ivitera.com>
Date: Mon, 31 Jan 2022 08:18:13 +0100
Subject: usb: gadget: f_uac2: Define specific wTerminalType
Several users have reported that their Win10 does not enumerate UAC2
gadget with the existing wTerminalType set to
UAC_INPUT_TERMINAL_UNDEFINED/UAC_INPUT_TERMINAL_UNDEFINED, e.g.
https://github.com/raspberrypi/linux/issues/4587#issuecomment-926567213.
While the constant is officially defined by the USB terminal types
document, e.g. XMOS firmware for UAC2 (commonly used for Win10) defines
no undefined output terminal type in its usbaudio20.h header.
Therefore wTerminalType of EP-IN is set to
UAC_INPUT_TERMINAL_MICROPHONE and wTerminalType of EP-OUT to
UAC_OUTPUT_TERMINAL_SPEAKER for the UAC2 gadget.
Signed-off-by: Pavel Hofman <pavel.hofman(a)ivitera.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20220131071813.7433-1-pavel.hofman@ivitera.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_uac2.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/function/f_uac2.c b/drivers/usb/gadget/function/f_uac2.c
index d874e0d34188..141e4814347e 100644
--- a/drivers/usb/gadget/function/f_uac2.c
+++ b/drivers/usb/gadget/function/f_uac2.c
@@ -202,7 +202,7 @@ static struct uac2_input_terminal_descriptor io_in_it_desc = {
.bDescriptorSubtype = UAC_INPUT_TERMINAL,
/* .bTerminalID = DYNAMIC */
- .wTerminalType = cpu_to_le16(UAC_INPUT_TERMINAL_UNDEFINED),
+ .wTerminalType = cpu_to_le16(UAC_INPUT_TERMINAL_MICROPHONE),
.bAssocTerminal = 0,
/* .bCSourceID = DYNAMIC */
.iChannelNames = 0,
@@ -230,7 +230,7 @@ static struct uac2_output_terminal_descriptor io_out_ot_desc = {
.bDescriptorSubtype = UAC_OUTPUT_TERMINAL,
/* .bTerminalID = DYNAMIC */
- .wTerminalType = cpu_to_le16(UAC_OUTPUT_TERMINAL_UNDEFINED),
+ .wTerminalType = cpu_to_le16(UAC_OUTPUT_TERMINAL_SPEAKER),
.bAssocTerminal = 0,
/* .bSourceID = DYNAMIC */
/* .bCSourceID = DYNAMIC */
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From efdfd34d5749724be0eba4168ef0fa5143ea0639 Mon Sep 17 00:00:00 2001
From: Adam Ford <aford173(a)gmail.com>
Date: Fri, 28 Jan 2022 16:36:03 -0600
Subject: usb: gadget: udc: renesas_usb3: Fix host to USB_ROLE_NONE transition
The support the external role switch a variety of situations were
addressed, but the transition from USB_ROLE_HOST to USB_ROLE_NONE
leaves the host up which can cause some error messages when
switching from host to none, to gadget, to none, and then back
to host again.
xhci-hcd ee000000.usb: Abort failed to stop command ring: -110
xhci-hcd ee000000.usb: xHCI host controller not responding, assume dead
xhci-hcd ee000000.usb: HC died; cleaning up
usb 4-1: device not accepting address 6, error -108
usb usb4-port1: couldn't allocate usb_device
After this happens it will not act as a host again.
Fix this by releasing the host mode when transitioning to USB_ROLE_NONE.
Fixes: 0604160d8c0b ("usb: gadget: udc: renesas_usb3: Enhance role switch support")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Link: https://lore.kernel.org/r/20220128223603.2362621-1-aford173@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/renesas_usb3.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/gadget/udc/renesas_usb3.c b/drivers/usb/gadget/udc/renesas_usb3.c
index 57d417a7c3e0..601829a6b4ba 100644
--- a/drivers/usb/gadget/udc/renesas_usb3.c
+++ b/drivers/usb/gadget/udc/renesas_usb3.c
@@ -2378,6 +2378,8 @@ static void handle_ext_role_switch_states(struct device *dev,
switch (role) {
case USB_ROLE_NONE:
usb3->connection_state = USB_ROLE_NONE;
+ if (cur_role == USB_ROLE_HOST)
+ device_release_driver(host);
if (usb3->driver)
usb3_disconnect(usb3);
usb3_vbus_out(usb3, false);
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: raw-gadget: fix handling of dual-direction-capable endpoints
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From d2d1bfa1e422242c3a794007857b463c081f3d5f Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Wed, 26 Jan 2022 21:52:14 +0100
Subject: usb: raw-gadget: fix handling of dual-direction-capable endpoints
Under dummy_hcd, every available endpoint is *either* IN or OUT capable.
But with some real hardware, there are endpoints that support both IN and
OUT. In particular, the PLX 2380 has four available endpoints that each
support both IN and OUT.
raw-gadget currently gets confused and thinks that any endpoint that is
usable as an IN endpoint can never be used as an OUT endpoint.
Fix it by looking at the direction in the configured endpoint descriptor
instead of looking at the hardware capabilities.
With this change, I can use the PLX 2380 with raw-gadget.
Fixes: f2c2e717642c ("usb: gadget: add raw-gadget interface")
Cc: stable <stable(a)vger.kernel.org>
Tested-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Link: https://lore.kernel.org/r/20220126205214.2149936-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/legacy/raw_gadget.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
index c5a2c734234a..d86c3a36441e 100644
--- a/drivers/usb/gadget/legacy/raw_gadget.c
+++ b/drivers/usb/gadget/legacy/raw_gadget.c
@@ -1004,7 +1004,7 @@ static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
ret = -EBUSY;
goto out_unlock;
}
- if ((in && !ep->ep->caps.dir_in) || (!in && ep->ep->caps.dir_in)) {
+ if (in != usb_endpoint_dir_in(ep->ep->desc)) {
dev_dbg(&dev->gadget->dev, "fail, wrong direction\n");
ret = -EINVAL;
goto out_unlock;
--
2.35.1
On Wed, Jan 26, 2022 at 11:37 PM Andrey Konovalov <andreyknvl(a)gmail.com> wrote:
>
> On Wed, Jan 26, 2022 at 11:31 PM Jann Horn <jannh(a)google.com> wrote:
> >
> > On Wed, Jan 26, 2022 at 11:12 PM Andrey Konovalov <andreyknvl(a)gmail.com> wrote:
> > > On Wed, Jan 26, 2022 at 9:52 PM Jann Horn <jannh(a)google.com> wrote:
> > > >
> > > > Under dummy_hcd, every available endpoint is *either* IN or OUT capable.
> > > > But with some real hardware, there are endpoints that support both IN and
> > > > OUT. In particular, the PLX 2380 has four available endpoints that each
> > > > support both IN and OUT.
> > > >
> > > > raw-gadget currently gets confused and thinks that any endpoint that is
> > > > usable as an IN endpoint can never be used as an OUT endpoint.
> > > >
> > > > Fix it by looking at the direction in the configured endpoint descriptor
> > > > instead of looking at the hardware capabilities.
> > > >
> > > > With this change, I can use the PLX 2380 with raw-gadget.
> > > >
> > > > Fixes: f2c2e717642c ("usb: gadget: add raw-gadget interface")
> > > > Signed-off-by: Jann Horn <jannh(a)google.com>
> > > > ---
> > > > drivers/usb/gadget/legacy/raw_gadget.c | 2 +-
> > > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/usb/gadget/legacy/raw_gadget.c b/drivers/usb/gadget/legacy/raw_gadget.c
> > > > index c5a2c734234a..d86c3a36441e 100644
> > > > --- a/drivers/usb/gadget/legacy/raw_gadget.c
> > > > +++ b/drivers/usb/gadget/legacy/raw_gadget.c
> > > > @@ -1004,7 +1004,7 @@ static int raw_process_ep_io(struct raw_dev *dev, struct usb_raw_ep_io *io,
> > > > ret = -EBUSY;
> > > > goto out_unlock;
> > > > }
> > > > - if ((in && !ep->ep->caps.dir_in) || (!in && ep->ep->caps.dir_in)) {
> > > > + if (in != usb_endpoint_dir_in(ep->ep->desc)) {
> > > > dev_dbg(&dev->gadget->dev, "fail, wrong direction\n");
> > > > ret = -EINVAL;
> > > > goto out_unlock;
> > > >
> > > > base-commit: 0280e3c58f92b2fe0e8fbbdf8d386449168de4a8
> > > > --
> > > > 2.35.0.rc0.227.g00780c9af4-goog
> > > >
>
> Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
> Tested-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Greg, could you PTAL and pick this up?
It also makes sense to include this fix into stable kernels.
Cc: <stable(a)vger.kernel.org>
Thanks!
This is a note to let you know that I've just added the patch titled
usb: ulpi: Call of_node_put correctly
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 0a907ee9d95e3ac35eb023d71f29eae0aaa52d1b Mon Sep 17 00:00:00 2001
From: Sean Anderson <sean.anderson(a)seco.com>
Date: Thu, 27 Jan 2022 14:00:03 -0500
Subject: usb: ulpi: Call of_node_put correctly
of_node_put should always be called on device nodes gotten from
of_get_*. Additionally, it should only be called after there are no
remaining users. To address the first issue, call of_node_put if later
steps in ulpi_register fail. To address the latter, call put_device if
device_register fails, which will call ulpi_dev_release if necessary.
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Sean Anderson <sean.anderson(a)seco.com>
Link: https://lore.kernel.org/r/20220127190004.1446909-3-sean.anderson@seco.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/common/ulpi.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/common/ulpi.c b/drivers/usb/common/ulpi.c
index 09ad569a1a35..5509d3847af4 100644
--- a/drivers/usb/common/ulpi.c
+++ b/drivers/usb/common/ulpi.c
@@ -248,12 +248,16 @@ static int ulpi_register(struct device *dev, struct ulpi *ulpi)
return ret;
ret = ulpi_read_id(ulpi);
- if (ret)
+ if (ret) {
+ of_node_put(ulpi->dev.of_node);
return ret;
+ }
ret = device_register(&ulpi->dev);
- if (ret)
+ if (ret) {
+ put_device(&ulpi->dev);
return ret;
+ }
dev_dbg(&ulpi->dev, "registered ULPI PHY: vendor %04x, product %04x\n",
ulpi->id.vendor, ulpi->id.product);
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: ulpi: Move of_node_put to ulpi_dev_release
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 092f45b13e51666fe8ecbf2d6cd247aa7e6c1f74 Mon Sep 17 00:00:00 2001
From: Sean Anderson <sean.anderson(a)seco.com>
Date: Thu, 27 Jan 2022 14:00:02 -0500
Subject: usb: ulpi: Move of_node_put to ulpi_dev_release
Drivers are not unbound from the device when ulpi_unregister_interface
is called. Move of_node-freeing code to ulpi_dev_release which is called
only after all users are gone.
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Sean Anderson <sean.anderson(a)seco.com>
Link: https://lore.kernel.org/r/20220127190004.1446909-2-sean.anderson@seco.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/common/ulpi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/common/ulpi.c b/drivers/usb/common/ulpi.c
index 8f8405b0d608..09ad569a1a35 100644
--- a/drivers/usb/common/ulpi.c
+++ b/drivers/usb/common/ulpi.c
@@ -130,6 +130,7 @@ static const struct attribute_group *ulpi_dev_attr_groups[] = {
static void ulpi_dev_release(struct device *dev)
{
+ of_node_put(dev->of_node);
kfree(to_ulpi_dev(dev));
}
@@ -299,7 +300,6 @@ EXPORT_SYMBOL_GPL(ulpi_register_interface);
*/
void ulpi_unregister_interface(struct ulpi *ulpi)
{
- of_node_put(ulpi->dev.of_node);
device_unregister(&ulpi->dev);
}
EXPORT_SYMBOL_GPL(ulpi_unregister_interface);
--
2.35.1
Dear Email ID Owner.
The IMF is compensating all the email address that was funds as one of
the ward win Victims and your email address and your name is among the
listed one of approved to pay the sum of $3.6 million U.S Dollars. We
have concluded to effect your own payment through Western Union Money
Transfer for easy pick-up of those funds in good condition,$4000 twice
daily,till the $3.6 million is completely transferred to you.We now
need your information where we will be sending the funds,such
as;Receiver name(Your full Name)address and phone number.Contact
Western Union agent with this Email: ( westerunion995(a)gmail.com ) for
your payment fund.
Ms.Maria Zatto
E-mail:westerunion995@gmail.com
Telephone: +229 682 97 169
Contact Ms.Maria,immediately you get this mail through western union
email address above to enable her speed-up.your payment and release
the $4000 dollars MTCN today for you to pick up the payment OK.
You are expected to provide us with the details as prescribed below to
enable safe and easy release of your funds today.
(1)Your Full name:
(2)Your Phone number:
(3)Your Country:
(4)Your Age:
Thank you,
Dr.Antonia Lloyd.
Contact Dir.Western Union Money Transfer,
Cotonou-Benin Republic.
While in practice vcpu->vcpu_idx == vcpu->vcp_id is often true,
it may not always be, and we must not rely on this.
Currently kvm->arch.idle_mask is indexed by vcpu_id, which implies
that code like
for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) {
vcpu = kvm_get_vcpu(kvm, vcpu_id);
do_stuff(vcpu);
}
is not legit. The trouble is, we do actually use kvm->arch.idle_mask
like this. To fix this problem we have two options. Either use
kvm_get_vcpu_by_id(vcpu_id), which would loop to find the right vcpu_id,
or switch to indexing via vcpu_idx. The latter is preferable for obvious
reasons.
Let us make switch from indexing kvm->arch.idle_mask by vcpu_id to
indexing it by vcpu_idx. To keep gisa_int.kicked_mask indexed by the
same index as idle_mask lets make the same change for it as well.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: 1ee0bc559dc3 ("KVM: s390: get rid of local_int array")
Cc: <stable(a)vger.kernel.org> # 3.15+
---
arch/s390/include/asm/kvm_host.h | 1 +
arch/s390/kvm/interrupt.c | 12 ++++++------
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/kvm-s390.h | 2 +-
4 files changed, 9 insertions(+), 8 deletions(-)
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 161a9e12bfb8..630eab0fa176 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -957,6 +957,7 @@ struct kvm_arch{
atomic64_t cmma_dirty_pages;
/* subset of available cpu features enabled by user space */
DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
+ /* indexed by vcpu_idx */
DECLARE_BITMAP(idle_mask, KVM_MAX_VCPUS);
struct kvm_s390_gisa_interrupt gisa_int;
struct kvm_s390_pv pv;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index d548d60caed2..16256e17a544 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -419,13 +419,13 @@ static unsigned long deliverable_irqs(struct kvm_vcpu *vcpu)
static void __set_cpu_idle(struct kvm_vcpu *vcpu)
{
kvm_s390_set_cpuflags(vcpu, CPUSTAT_WAIT);
- set_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask);
+ set_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask);
}
static void __unset_cpu_idle(struct kvm_vcpu *vcpu)
{
kvm_s390_clear_cpuflags(vcpu, CPUSTAT_WAIT);
- clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask);
+ clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask);
}
static void __reset_intercept_indicators(struct kvm_vcpu *vcpu)
@@ -3050,18 +3050,18 @@ int kvm_s390_get_irq_state(struct kvm_vcpu *vcpu, __u8 __user *buf, int len)
static void __airqs_kick_single_vcpu(struct kvm *kvm, u8 deliverable_mask)
{
- int vcpu_id, online_vcpus = atomic_read(&kvm->online_vcpus);
+ int vcpu_idx, online_vcpus = atomic_read(&kvm->online_vcpus);
struct kvm_s390_gisa_interrupt *gi = &kvm->arch.gisa_int;
struct kvm_vcpu *vcpu;
- for_each_set_bit(vcpu_id, kvm->arch.idle_mask, online_vcpus) {
- vcpu = kvm_get_vcpu(kvm, vcpu_id);
+ for_each_set_bit(vcpu_idx, kvm->arch.idle_mask, online_vcpus) {
+ vcpu = kvm_get_vcpu(kvm, vcpu_idx);
if (psw_ioint_disabled(vcpu))
continue;
deliverable_mask &= (u8)(vcpu->arch.sie_block->gcr[6] >> 24);
if (deliverable_mask) {
/* lately kicked but not yet running */
- if (test_and_set_bit(vcpu_id, gi->kicked_mask))
+ if (test_and_set_bit(vcpu_idx, gi->kicked_mask))
return;
kvm_s390_vcpu_wakeup(vcpu);
return;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 4527ac7b5961..8580543c5bc3 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4044,7 +4044,7 @@ static int vcpu_pre_run(struct kvm_vcpu *vcpu)
kvm_s390_patch_guest_per_regs(vcpu);
}
- clear_bit(vcpu->vcpu_id, vcpu->kvm->arch.gisa_int.kicked_mask);
+ clear_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.gisa_int.kicked_mask);
vcpu->arch.sie_block->icptcode = 0;
cpuflags = atomic_read(&vcpu->arch.sie_block->cpuflags);
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index 9fad25109b0d..ecd741ee3276 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -79,7 +79,7 @@ static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu)
static inline int is_vcpu_idle(struct kvm_vcpu *vcpu)
{
- return test_bit(vcpu->vcpu_id, vcpu->kvm->arch.idle_mask);
+ return test_bit(kvm_vcpu_get_idx(vcpu), vcpu->kvm->arch.idle_mask);
}
static inline int kvm_is_ucontrol(struct kvm *kvm)
base-commit: 77dd11439b86e3f7990e4c0c9e0b67dca82750ba
--
2.25.1
Pozdravy
Jmenuji se paní Daniella Kyleová, je mi 58 let
Filipíny. V současné době jsem hospitalizován na Filipínách, kde jsem
podstupuje léčbu akutního karcinomu jícnu. jsem umírající,
vdova, která se rozhodla darovat část svého majetku spolehlivé osobě
která tyto peníze použije na pomoc chudým a méně privilegovaným. Chci
poskytnout dar ve výši 3 700 000 £ na sirotky nebo charitativní organizace
ve vaší oblasti. Zvládneš to? Pokud jste ochotni tuto nabídku přijmout
a udělejte přesně tak, jak vám říkám, pak se mi vraťte pro další vysvětlení.
pozdravy
Paní Daniella Kyleová
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 322c4293ecc58110227b49d7e47ae37b9b03566f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Mon, 13 Dec 2021 21:55:27 +0900
Subject: [PATCH] loop: make autoclear operation asynchronous
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with disk->open_mutex held.
This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.
Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com>
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ba76319b5544..7f4ea06534c2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static void __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo)
{
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
@@ -1144,8 +1144,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
/* let user-space know about this change */
kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
mapping_set_gfp_mask(filp->f_mapping, gfp);
- /* This is safe: open() is still holding a reference. */
- module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
@@ -1153,44 +1151,52 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
int err;
- /*
- * open_mutex has been held already in release path, so don't
- * acquire it if this function is called in such case.
- *
- * If the reread partition isn't from release path, lo_refcnt
- * must be at least one and it can only become zero when the
- * current holder is released.
- */
- if (!release)
- mutex_lock(&lo->lo_disk->open_mutex);
+ mutex_lock(&lo->lo_disk->open_mutex);
err = bdev_disk_changed(lo->lo_disk, false);
- if (!release)
- mutex_unlock(&lo->lo_disk->open_mutex);
+ mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
__func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
}
- /*
- * lo->lo_state is set to Lo_unbound here after above partscan has
- * finished. There cannot be anybody else entering __loop_clr_fd() as
- * Lo_rundown state protects us from all the other places trying to
- * change the 'lo' device.
- */
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+
+ fput(filp);
+}
+
+static void loop_rundown_completed(struct loop_device *lo)
+{
mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
+ module_put(THIS_MODULE);
+}
- /*
- * Need not hold lo_mutex to fput backing file. Calling fput holding
- * lo_mutex triggers a circular lock dependency possibility warning as
- * fput can take open_mutex which is usually taken before lo_mutex.
- */
- fput(filp);
+static void loop_rundown_workfn(struct work_struct *work)
+{
+ struct loop_device *lo = container_of(work, struct loop_device,
+ rundown_work);
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __loop_clr_fd(lo);
+ kobject_put(&bdev->bd_device.kobj);
+ module_put(disk->fops->owner);
+ loop_rundown_completed(lo);
+}
+
+static void loop_schedule_rundown(struct loop_device *lo)
+{
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __module_get(disk->fops->owner);
+ kobject_get(&bdev->bd_device.kobj);
+ INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
+ queue_work(system_long_wq, &lo->rundown_work);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1222,7 +1228,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo);
+ loop_rundown_completed(lo);
return 0;
}
@@ -1747,7 +1754,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
* In autoclear mode, stop the loop thread
* and remove configuration after last close.
*/
- __loop_clr_fd(lo, true);
+ loop_schedule_rundown(lo);
return;
} else if (lo->lo_state == Lo_bound) {
/*
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 082d4b6bfc6a..918a7a2dc025 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,6 +56,7 @@ struct loop_device {
struct gendisk *lo_disk;
struct mutex lo_mutex;
bool idr_visible;
+ struct work_struct rundown_work;
};
struct loop_cmd {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 322c4293ecc58110227b49d7e47ae37b9b03566f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Mon, 13 Dec 2021 21:55:27 +0900
Subject: [PATCH] loop: make autoclear operation asynchronous
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with disk->open_mutex held.
This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.
Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com>
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ba76319b5544..7f4ea06534c2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static void __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo)
{
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
@@ -1144,8 +1144,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
/* let user-space know about this change */
kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
mapping_set_gfp_mask(filp->f_mapping, gfp);
- /* This is safe: open() is still holding a reference. */
- module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
@@ -1153,44 +1151,52 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
int err;
- /*
- * open_mutex has been held already in release path, so don't
- * acquire it if this function is called in such case.
- *
- * If the reread partition isn't from release path, lo_refcnt
- * must be at least one and it can only become zero when the
- * current holder is released.
- */
- if (!release)
- mutex_lock(&lo->lo_disk->open_mutex);
+ mutex_lock(&lo->lo_disk->open_mutex);
err = bdev_disk_changed(lo->lo_disk, false);
- if (!release)
- mutex_unlock(&lo->lo_disk->open_mutex);
+ mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
__func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
}
- /*
- * lo->lo_state is set to Lo_unbound here after above partscan has
- * finished. There cannot be anybody else entering __loop_clr_fd() as
- * Lo_rundown state protects us from all the other places trying to
- * change the 'lo' device.
- */
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+
+ fput(filp);
+}
+
+static void loop_rundown_completed(struct loop_device *lo)
+{
mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
+ module_put(THIS_MODULE);
+}
- /*
- * Need not hold lo_mutex to fput backing file. Calling fput holding
- * lo_mutex triggers a circular lock dependency possibility warning as
- * fput can take open_mutex which is usually taken before lo_mutex.
- */
- fput(filp);
+static void loop_rundown_workfn(struct work_struct *work)
+{
+ struct loop_device *lo = container_of(work, struct loop_device,
+ rundown_work);
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __loop_clr_fd(lo);
+ kobject_put(&bdev->bd_device.kobj);
+ module_put(disk->fops->owner);
+ loop_rundown_completed(lo);
+}
+
+static void loop_schedule_rundown(struct loop_device *lo)
+{
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __module_get(disk->fops->owner);
+ kobject_get(&bdev->bd_device.kobj);
+ INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
+ queue_work(system_long_wq, &lo->rundown_work);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1222,7 +1228,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo);
+ loop_rundown_completed(lo);
return 0;
}
@@ -1747,7 +1754,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
* In autoclear mode, stop the loop thread
* and remove configuration after last close.
*/
- __loop_clr_fd(lo, true);
+ loop_schedule_rundown(lo);
return;
} else if (lo->lo_state == Lo_bound) {
/*
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 082d4b6bfc6a..918a7a2dc025 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,6 +56,7 @@ struct loop_device {
struct gendisk *lo_disk;
struct mutex lo_mutex;
bool idr_visible;
+ struct work_struct rundown_work;
};
struct loop_cmd {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 322c4293ecc58110227b49d7e47ae37b9b03566f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Mon, 13 Dec 2021 21:55:27 +0900
Subject: [PATCH] loop: make autoclear operation asynchronous
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with disk->open_mutex held.
This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.
Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com>
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ba76319b5544..7f4ea06534c2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static void __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo)
{
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
@@ -1144,8 +1144,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
/* let user-space know about this change */
kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
mapping_set_gfp_mask(filp->f_mapping, gfp);
- /* This is safe: open() is still holding a reference. */
- module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
@@ -1153,44 +1151,52 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
int err;
- /*
- * open_mutex has been held already in release path, so don't
- * acquire it if this function is called in such case.
- *
- * If the reread partition isn't from release path, lo_refcnt
- * must be at least one and it can only become zero when the
- * current holder is released.
- */
- if (!release)
- mutex_lock(&lo->lo_disk->open_mutex);
+ mutex_lock(&lo->lo_disk->open_mutex);
err = bdev_disk_changed(lo->lo_disk, false);
- if (!release)
- mutex_unlock(&lo->lo_disk->open_mutex);
+ mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
__func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
}
- /*
- * lo->lo_state is set to Lo_unbound here after above partscan has
- * finished. There cannot be anybody else entering __loop_clr_fd() as
- * Lo_rundown state protects us from all the other places trying to
- * change the 'lo' device.
- */
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+
+ fput(filp);
+}
+
+static void loop_rundown_completed(struct loop_device *lo)
+{
mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
+ module_put(THIS_MODULE);
+}
- /*
- * Need not hold lo_mutex to fput backing file. Calling fput holding
- * lo_mutex triggers a circular lock dependency possibility warning as
- * fput can take open_mutex which is usually taken before lo_mutex.
- */
- fput(filp);
+static void loop_rundown_workfn(struct work_struct *work)
+{
+ struct loop_device *lo = container_of(work, struct loop_device,
+ rundown_work);
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __loop_clr_fd(lo);
+ kobject_put(&bdev->bd_device.kobj);
+ module_put(disk->fops->owner);
+ loop_rundown_completed(lo);
+}
+
+static void loop_schedule_rundown(struct loop_device *lo)
+{
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __module_get(disk->fops->owner);
+ kobject_get(&bdev->bd_device.kobj);
+ INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
+ queue_work(system_long_wq, &lo->rundown_work);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1222,7 +1228,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo);
+ loop_rundown_completed(lo);
return 0;
}
@@ -1747,7 +1754,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
* In autoclear mode, stop the loop thread
* and remove configuration after last close.
*/
- __loop_clr_fd(lo, true);
+ loop_schedule_rundown(lo);
return;
} else if (lo->lo_state == Lo_bound) {
/*
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 082d4b6bfc6a..918a7a2dc025 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,6 +56,7 @@ struct loop_device {
struct gendisk *lo_disk;
struct mutex lo_mutex;
bool idr_visible;
+ struct work_struct rundown_work;
};
struct loop_cmd {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 322c4293ecc58110227b49d7e47ae37b9b03566f Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Mon, 13 Dec 2021 21:55:27 +0900
Subject: [PATCH] loop: make autoclear operation asynchronous
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with disk->open_mutex held.
This circular dependency cannot be broken unless we call __loop_clr_fd()
without holding disk->open_mutex. Therefore, defer __loop_clr_fd() from
lo_release() to a WQ context.
Link: https://syzkaller.appspot.com/bug?extid=643e4ce4b6ad1347d372 [1]
Reported-by: syzbot <syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com>
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Tested-by: syzbot+643e4ce4b6ad1347d372(a)syzkaller.appspotmail.com
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/1ed7df28-ebd6-71fb-70e5-1c2972e05ddb@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ba76319b5544..7f4ea06534c2 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,7 +1082,7 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static void __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo)
{
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
@@ -1144,8 +1144,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
/* let user-space know about this change */
kobject_uevent(&disk_to_dev(lo->lo_disk)->kobj, KOBJ_CHANGE);
mapping_set_gfp_mask(filp->f_mapping, gfp);
- /* This is safe: open() is still holding a reference. */
- module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
@@ -1153,44 +1151,52 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
int err;
- /*
- * open_mutex has been held already in release path, so don't
- * acquire it if this function is called in such case.
- *
- * If the reread partition isn't from release path, lo_refcnt
- * must be at least one and it can only become zero when the
- * current holder is released.
- */
- if (!release)
- mutex_lock(&lo->lo_disk->open_mutex);
+ mutex_lock(&lo->lo_disk->open_mutex);
err = bdev_disk_changed(lo->lo_disk, false);
- if (!release)
- mutex_unlock(&lo->lo_disk->open_mutex);
+ mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
__func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
}
- /*
- * lo->lo_state is set to Lo_unbound here after above partscan has
- * finished. There cannot be anybody else entering __loop_clr_fd() as
- * Lo_rundown state protects us from all the other places trying to
- * change the 'lo' device.
- */
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+
+ fput(filp);
+}
+
+static void loop_rundown_completed(struct loop_device *lo)
+{
mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
+ module_put(THIS_MODULE);
+}
- /*
- * Need not hold lo_mutex to fput backing file. Calling fput holding
- * lo_mutex triggers a circular lock dependency possibility warning as
- * fput can take open_mutex which is usually taken before lo_mutex.
- */
- fput(filp);
+static void loop_rundown_workfn(struct work_struct *work)
+{
+ struct loop_device *lo = container_of(work, struct loop_device,
+ rundown_work);
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __loop_clr_fd(lo);
+ kobject_put(&bdev->bd_device.kobj);
+ module_put(disk->fops->owner);
+ loop_rundown_completed(lo);
+}
+
+static void loop_schedule_rundown(struct loop_device *lo)
+{
+ struct block_device *bdev = lo->lo_device;
+ struct gendisk *disk = lo->lo_disk;
+
+ __module_get(disk->fops->owner);
+ kobject_get(&bdev->bd_device.kobj);
+ INIT_WORK(&lo->rundown_work, loop_rundown_workfn);
+ queue_work(system_long_wq, &lo->rundown_work);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1222,7 +1228,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo);
+ loop_rundown_completed(lo);
return 0;
}
@@ -1747,7 +1754,7 @@ static void lo_release(struct gendisk *disk, fmode_t mode)
* In autoclear mode, stop the loop thread
* and remove configuration after last close.
*/
- __loop_clr_fd(lo, true);
+ loop_schedule_rundown(lo);
return;
} else if (lo->lo_state == Lo_bound) {
/*
diff --git a/drivers/block/loop.h b/drivers/block/loop.h
index 082d4b6bfc6a..918a7a2dc025 100644
--- a/drivers/block/loop.h
+++ b/drivers/block/loop.h
@@ -56,6 +56,7 @@ struct loop_device {
struct gendisk *lo_disk;
struct mutex lo_mutex;
bool idr_visible;
+ struct work_struct rundown_work;
};
struct loop_cmd {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6050fa4c84cc93ae509f5105f585a429dffc5633 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Wed, 24 Nov 2021 19:47:40 +0900
Subject: [PATCH] loop: don't hold lo_mutex during __loop_clr_fd()
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with lo->lo_mutex held.
Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.
Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.
Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: syzbot <syzbot+63614029dfb79abd4383(a)syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0954ea8cf9e3..ba76319b5544 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,13 +1082,10 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static int __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo, bool release)
{
- struct file *filp = NULL;
+ struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- int err = 0;
- bool partscan = false;
- int lo_number;
struct loop_worker *pos, *worker;
/*
@@ -1103,17 +1100,14 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* became visible.
*/
+ /*
+ * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
+ * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
+ * lo->lo_state has changed while waiting for lo->lo_mutex.
+ */
mutex_lock(&lo->lo_mutex);
- if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
- err = -ENXIO;
- goto out_unlock;
- }
-
- filp = lo->lo_backing_file;
- if (filp == NULL) {
- err = -EINVAL;
- goto out_unlock;
- }
+ BUG_ON(lo->lo_state != Lo_rundown);
+ mutex_unlock(&lo->lo_mutex);
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
@@ -1134,6 +1128,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
del_timer_sync(&lo->timer);
spin_lock_irq(&lo->lo_lock);
+ filp = lo->lo_backing_file;
lo->lo_backing_file = NULL;
spin_unlock_irq(&lo->lo_lock);
@@ -1153,12 +1148,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
- partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
- lo_number = lo->lo_number;
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
-out_unlock:
- mutex_unlock(&lo->lo_mutex);
- if (partscan) {
+
+ if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
+ int err;
+
/*
* open_mutex has been held already in release path, so don't
* acquire it if this function is called in such case.
@@ -1174,24 +1168,20 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
- __func__, lo_number, err);
+ __func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
- err = 0;
}
/*
* lo->lo_state is set to Lo_unbound here after above partscan has
- * finished.
- *
- * There cannot be anybody else entering __loop_clr_fd() as
- * lo->lo_backing_file is already cleared and Lo_rundown state
- * protects us from all the other places trying to change the 'lo'
- * device.
+ * finished. There cannot be anybody else entering __loop_clr_fd() as
+ * Lo_rundown state protects us from all the other places trying to
+ * change the 'lo' device.
*/
- mutex_lock(&lo->lo_mutex);
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+ mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
@@ -1200,9 +1190,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* lo_mutex triggers a circular lock dependency possibility warning as
* fput can take open_mutex which is usually taken before lo_mutex.
*/
- if (filp)
- fput(filp);
- return err;
+ fput(filp);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1234,7 +1222,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- return __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo, false);
+ return 0;
}
static int
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6050fa4c84cc93ae509f5105f585a429dffc5633 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Wed, 24 Nov 2021 19:47:40 +0900
Subject: [PATCH] loop: don't hold lo_mutex during __loop_clr_fd()
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with lo->lo_mutex held.
Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.
Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.
Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: syzbot <syzbot+63614029dfb79abd4383(a)syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0954ea8cf9e3..ba76319b5544 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,13 +1082,10 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static int __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo, bool release)
{
- struct file *filp = NULL;
+ struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- int err = 0;
- bool partscan = false;
- int lo_number;
struct loop_worker *pos, *worker;
/*
@@ -1103,17 +1100,14 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* became visible.
*/
+ /*
+ * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
+ * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
+ * lo->lo_state has changed while waiting for lo->lo_mutex.
+ */
mutex_lock(&lo->lo_mutex);
- if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
- err = -ENXIO;
- goto out_unlock;
- }
-
- filp = lo->lo_backing_file;
- if (filp == NULL) {
- err = -EINVAL;
- goto out_unlock;
- }
+ BUG_ON(lo->lo_state != Lo_rundown);
+ mutex_unlock(&lo->lo_mutex);
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
@@ -1134,6 +1128,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
del_timer_sync(&lo->timer);
spin_lock_irq(&lo->lo_lock);
+ filp = lo->lo_backing_file;
lo->lo_backing_file = NULL;
spin_unlock_irq(&lo->lo_lock);
@@ -1153,12 +1148,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
- partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
- lo_number = lo->lo_number;
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
-out_unlock:
- mutex_unlock(&lo->lo_mutex);
- if (partscan) {
+
+ if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
+ int err;
+
/*
* open_mutex has been held already in release path, so don't
* acquire it if this function is called in such case.
@@ -1174,24 +1168,20 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
- __func__, lo_number, err);
+ __func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
- err = 0;
}
/*
* lo->lo_state is set to Lo_unbound here after above partscan has
- * finished.
- *
- * There cannot be anybody else entering __loop_clr_fd() as
- * lo->lo_backing_file is already cleared and Lo_rundown state
- * protects us from all the other places trying to change the 'lo'
- * device.
+ * finished. There cannot be anybody else entering __loop_clr_fd() as
+ * Lo_rundown state protects us from all the other places trying to
+ * change the 'lo' device.
*/
- mutex_lock(&lo->lo_mutex);
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+ mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
@@ -1200,9 +1190,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* lo_mutex triggers a circular lock dependency possibility warning as
* fput can take open_mutex which is usually taken before lo_mutex.
*/
- if (filp)
- fput(filp);
- return err;
+ fput(filp);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1234,7 +1222,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- return __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo, false);
+ return 0;
}
static int
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6050fa4c84cc93ae509f5105f585a429dffc5633 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Wed, 24 Nov 2021 19:47:40 +0900
Subject: [PATCH] loop: don't hold lo_mutex during __loop_clr_fd()
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with lo->lo_mutex held.
Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.
Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.
Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: syzbot <syzbot+63614029dfb79abd4383(a)syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0954ea8cf9e3..ba76319b5544 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,13 +1082,10 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static int __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo, bool release)
{
- struct file *filp = NULL;
+ struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- int err = 0;
- bool partscan = false;
- int lo_number;
struct loop_worker *pos, *worker;
/*
@@ -1103,17 +1100,14 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* became visible.
*/
+ /*
+ * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
+ * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
+ * lo->lo_state has changed while waiting for lo->lo_mutex.
+ */
mutex_lock(&lo->lo_mutex);
- if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
- err = -ENXIO;
- goto out_unlock;
- }
-
- filp = lo->lo_backing_file;
- if (filp == NULL) {
- err = -EINVAL;
- goto out_unlock;
- }
+ BUG_ON(lo->lo_state != Lo_rundown);
+ mutex_unlock(&lo->lo_mutex);
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
@@ -1134,6 +1128,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
del_timer_sync(&lo->timer);
spin_lock_irq(&lo->lo_lock);
+ filp = lo->lo_backing_file;
lo->lo_backing_file = NULL;
spin_unlock_irq(&lo->lo_lock);
@@ -1153,12 +1148,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
- partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
- lo_number = lo->lo_number;
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
-out_unlock:
- mutex_unlock(&lo->lo_mutex);
- if (partscan) {
+
+ if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
+ int err;
+
/*
* open_mutex has been held already in release path, so don't
* acquire it if this function is called in such case.
@@ -1174,24 +1168,20 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
- __func__, lo_number, err);
+ __func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
- err = 0;
}
/*
* lo->lo_state is set to Lo_unbound here after above partscan has
- * finished.
- *
- * There cannot be anybody else entering __loop_clr_fd() as
- * lo->lo_backing_file is already cleared and Lo_rundown state
- * protects us from all the other places trying to change the 'lo'
- * device.
+ * finished. There cannot be anybody else entering __loop_clr_fd() as
+ * Lo_rundown state protects us from all the other places trying to
+ * change the 'lo' device.
*/
- mutex_lock(&lo->lo_mutex);
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+ mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
@@ -1200,9 +1190,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* lo_mutex triggers a circular lock dependency possibility warning as
* fput can take open_mutex which is usually taken before lo_mutex.
*/
- if (filp)
- fput(filp);
- return err;
+ fput(filp);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1234,7 +1222,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- return __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo, false);
+ return 0;
}
static int
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6050fa4c84cc93ae509f5105f585a429dffc5633 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Date: Wed, 24 Nov 2021 19:47:40 +0900
Subject: [PATCH] loop: don't hold lo_mutex during __loop_clr_fd()
syzbot is reporting circular locking problem at __loop_clr_fd() [1], for
commit 87579e9b7d8dc36e ("loop: use worker per cgroup instead of kworker")
is calling destroy_workqueue() with lo->lo_mutex held.
Since all functions where lo->lo_state matters are already checking
lo->lo_state with lo->lo_mutex held (in order to avoid racing with e.g.
ioctl(LOOP_CTL_REMOVE)), and __loop_clr_fd() can be called from either
ioctl(LOOP_CLR_FD) xor close(), lo->lo_state == Lo_rundown is considered
as an exclusive lock for __loop_clr_fd(). Therefore, hold lo->lo_mutex
inside __loop_clr_fd() only when asserting/updating lo->lo_state.
Since ioctl(LOOP_CLR_FD) depends on lo->lo_state == Lo_bound, a valid
lo->lo_backing_file must have been assigned by ioctl(LOOP_SET_FD) or
ioctl(LOOP_CONFIGURE). Thus, we can remove lo->lo_backing_file test,
and convert __loop_clr_fd() into a void function.
Link: https://syzkaller.appspot.com/bug?extid=63614029dfb79abd4383 [1]
Reported-by: syzbot <syzbot+63614029dfb79abd4383(a)syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/8ebe3b2e-8975-7f26-0620-7144a3b8b8cd@i-love.sakur…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0954ea8cf9e3..ba76319b5544 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1082,13 +1082,10 @@ static int loop_configure(struct loop_device *lo, fmode_t mode,
return error;
}
-static int __loop_clr_fd(struct loop_device *lo, bool release)
+static void __loop_clr_fd(struct loop_device *lo, bool release)
{
- struct file *filp = NULL;
+ struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- int err = 0;
- bool partscan = false;
- int lo_number;
struct loop_worker *pos, *worker;
/*
@@ -1103,17 +1100,14 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* became visible.
*/
+ /*
+ * Since this function is called upon "ioctl(LOOP_CLR_FD)" xor "close()
+ * after ioctl(LOOP_CLR_FD)", it is a sign of something going wrong if
+ * lo->lo_state has changed while waiting for lo->lo_mutex.
+ */
mutex_lock(&lo->lo_mutex);
- if (WARN_ON_ONCE(lo->lo_state != Lo_rundown)) {
- err = -ENXIO;
- goto out_unlock;
- }
-
- filp = lo->lo_backing_file;
- if (filp == NULL) {
- err = -EINVAL;
- goto out_unlock;
- }
+ BUG_ON(lo->lo_state != Lo_rundown);
+ mutex_unlock(&lo->lo_mutex);
if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
blk_queue_write_cache(lo->lo_queue, false, false);
@@ -1134,6 +1128,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
del_timer_sync(&lo->timer);
spin_lock_irq(&lo->lo_lock);
+ filp = lo->lo_backing_file;
lo->lo_backing_file = NULL;
spin_unlock_irq(&lo->lo_lock);
@@ -1153,12 +1148,11 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
module_put(THIS_MODULE);
blk_mq_unfreeze_queue(lo->lo_queue);
- partscan = lo->lo_flags & LO_FLAGS_PARTSCAN;
- lo_number = lo->lo_number;
disk_force_media_change(lo->lo_disk, DISK_EVENT_MEDIA_CHANGE);
-out_unlock:
- mutex_unlock(&lo->lo_mutex);
- if (partscan) {
+
+ if (lo->lo_flags & LO_FLAGS_PARTSCAN) {
+ int err;
+
/*
* open_mutex has been held already in release path, so don't
* acquire it if this function is called in such case.
@@ -1174,24 +1168,20 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
mutex_unlock(&lo->lo_disk->open_mutex);
if (err)
pr_warn("%s: partition scan of loop%d failed (rc=%d)\n",
- __func__, lo_number, err);
+ __func__, lo->lo_number, err);
/* Device is gone, no point in returning error */
- err = 0;
}
/*
* lo->lo_state is set to Lo_unbound here after above partscan has
- * finished.
- *
- * There cannot be anybody else entering __loop_clr_fd() as
- * lo->lo_backing_file is already cleared and Lo_rundown state
- * protects us from all the other places trying to change the 'lo'
- * device.
+ * finished. There cannot be anybody else entering __loop_clr_fd() as
+ * Lo_rundown state protects us from all the other places trying to
+ * change the 'lo' device.
*/
- mutex_lock(&lo->lo_mutex);
lo->lo_flags = 0;
if (!part_shift)
lo->lo_disk->flags |= GENHD_FL_NO_PART;
+ mutex_lock(&lo->lo_mutex);
lo->lo_state = Lo_unbound;
mutex_unlock(&lo->lo_mutex);
@@ -1200,9 +1190,7 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
* lo_mutex triggers a circular lock dependency possibility warning as
* fput can take open_mutex which is usually taken before lo_mutex.
*/
- if (filp)
- fput(filp);
- return err;
+ fput(filp);
}
static int loop_clr_fd(struct loop_device *lo)
@@ -1234,7 +1222,8 @@ static int loop_clr_fd(struct loop_device *lo)
lo->lo_state = Lo_rundown;
mutex_unlock(&lo->lo_mutex);
- return __loop_clr_fd(lo, false);
+ __loop_clr_fd(lo, false);
+ return 0;
}
static int
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 791f3465c4afde02d7f16cf7424ca87070b69396 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 14 Jan 2022 11:59:10 +0000
Subject: [PATCH] io_uring: fix UAF due to missing POLLFREE handling
Fixes a problem described in 50252e4b5e989
("aio: fix use-after-free due to missing POLLFREE handling")
and copies the approach used there.
In short, we have to forcibly eject a poll entry when we meet POLLFREE.
We can't rely on io_poll_get_ownership() as can't wait for potentially
running tw handlers, so we use the fact that wqs are RCU freed. See
Eric's patch and comments for more details.
Reported-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Reported-and-tested-by: syzbot+5426c7ed6868c705ca14(a)syzkaller.appspotmail.com
Fixes: 221c5eb233823 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/4ed56b6f548f7ea337603a82315750449412748a.16421612…
[axboe: drop non-functional change from patch]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fa3277844d2e..422d6de48688 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5462,12 +5462,14 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
{
- struct wait_queue_head *head = poll->head;
+ struct wait_queue_head *head = smp_load_acquire(&poll->head);
- spin_lock_irq(&head->lock);
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- spin_unlock_irq(&head->lock);
+ if (head) {
+ spin_lock_irq(&head->lock);
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ spin_unlock_irq(&head->lock);
+ }
}
static void io_poll_remove_entries(struct io_kiocb *req)
@@ -5475,10 +5477,26 @@ static void io_poll_remove_entries(struct io_kiocb *req)
struct io_poll_iocb *poll = io_poll_get_single(req);
struct io_poll_iocb *poll_double = io_poll_get_double(req);
- if (poll->head)
- io_poll_remove_entry(poll);
- if (poll_double && poll_double->head)
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ io_poll_remove_entry(poll);
+ if (poll_double)
io_poll_remove_entry(poll_double);
+ rcu_read_unlock();
}
/*
@@ -5618,6 +5636,30 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
wait);
__poll_t mask = key_to_poll(key);
+ if (unlikely(mask & POLLFREE)) {
+ io_poll_mark_cancelled(req);
+ /* we have to kick tw in case it's not already */
+ io_poll_execute(req, 0);
+
+ /*
+ * If the waitqueue is being freed early but someone is already
+ * holds ownership over it, we have to tear down the request as
+ * best we can. That means immediately removing the request from
+ * its waitqueue and preventing all further accesses to the
+ * waitqueue via the request.
+ */
+ list_del_init(&poll->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&poll->head, NULL);
+ return 1;
+ }
+
/* for instances that support it check for an event match first */
if (mask && !(mask & poll->events))
return 0;
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 791f3465c4afde02d7f16cf7424ca87070b69396 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 14 Jan 2022 11:59:10 +0000
Subject: [PATCH] io_uring: fix UAF due to missing POLLFREE handling
Fixes a problem described in 50252e4b5e989
("aio: fix use-after-free due to missing POLLFREE handling")
and copies the approach used there.
In short, we have to forcibly eject a poll entry when we meet POLLFREE.
We can't rely on io_poll_get_ownership() as can't wait for potentially
running tw handlers, so we use the fact that wqs are RCU freed. See
Eric's patch and comments for more details.
Reported-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Reported-and-tested-by: syzbot+5426c7ed6868c705ca14(a)syzkaller.appspotmail.com
Fixes: 221c5eb233823 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/4ed56b6f548f7ea337603a82315750449412748a.16421612…
[axboe: drop non-functional change from patch]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fa3277844d2e..422d6de48688 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5462,12 +5462,14 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
{
- struct wait_queue_head *head = poll->head;
+ struct wait_queue_head *head = smp_load_acquire(&poll->head);
- spin_lock_irq(&head->lock);
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- spin_unlock_irq(&head->lock);
+ if (head) {
+ spin_lock_irq(&head->lock);
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ spin_unlock_irq(&head->lock);
+ }
}
static void io_poll_remove_entries(struct io_kiocb *req)
@@ -5475,10 +5477,26 @@ static void io_poll_remove_entries(struct io_kiocb *req)
struct io_poll_iocb *poll = io_poll_get_single(req);
struct io_poll_iocb *poll_double = io_poll_get_double(req);
- if (poll->head)
- io_poll_remove_entry(poll);
- if (poll_double && poll_double->head)
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ io_poll_remove_entry(poll);
+ if (poll_double)
io_poll_remove_entry(poll_double);
+ rcu_read_unlock();
}
/*
@@ -5618,6 +5636,30 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
wait);
__poll_t mask = key_to_poll(key);
+ if (unlikely(mask & POLLFREE)) {
+ io_poll_mark_cancelled(req);
+ /* we have to kick tw in case it's not already */
+ io_poll_execute(req, 0);
+
+ /*
+ * If the waitqueue is being freed early but someone is already
+ * holds ownership over it, we have to tear down the request as
+ * best we can. That means immediately removing the request from
+ * its waitqueue and preventing all further accesses to the
+ * waitqueue via the request.
+ */
+ list_del_init(&poll->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&poll->head, NULL);
+ return 1;
+ }
+
/* for instances that support it check for an event match first */
if (mask && !(mask & poll->events))
return 0;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 791f3465c4afde02d7f16cf7424ca87070b69396 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 14 Jan 2022 11:59:10 +0000
Subject: [PATCH] io_uring: fix UAF due to missing POLLFREE handling
Fixes a problem described in 50252e4b5e989
("aio: fix use-after-free due to missing POLLFREE handling")
and copies the approach used there.
In short, we have to forcibly eject a poll entry when we meet POLLFREE.
We can't rely on io_poll_get_ownership() as can't wait for potentially
running tw handlers, so we use the fact that wqs are RCU freed. See
Eric's patch and comments for more details.
Reported-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Reported-and-tested-by: syzbot+5426c7ed6868c705ca14(a)syzkaller.appspotmail.com
Fixes: 221c5eb233823 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/4ed56b6f548f7ea337603a82315750449412748a.16421612…
[axboe: drop non-functional change from patch]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fa3277844d2e..422d6de48688 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5462,12 +5462,14 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
{
- struct wait_queue_head *head = poll->head;
+ struct wait_queue_head *head = smp_load_acquire(&poll->head);
- spin_lock_irq(&head->lock);
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- spin_unlock_irq(&head->lock);
+ if (head) {
+ spin_lock_irq(&head->lock);
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ spin_unlock_irq(&head->lock);
+ }
}
static void io_poll_remove_entries(struct io_kiocb *req)
@@ -5475,10 +5477,26 @@ static void io_poll_remove_entries(struct io_kiocb *req)
struct io_poll_iocb *poll = io_poll_get_single(req);
struct io_poll_iocb *poll_double = io_poll_get_double(req);
- if (poll->head)
- io_poll_remove_entry(poll);
- if (poll_double && poll_double->head)
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ io_poll_remove_entry(poll);
+ if (poll_double)
io_poll_remove_entry(poll_double);
+ rcu_read_unlock();
}
/*
@@ -5618,6 +5636,30 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
wait);
__poll_t mask = key_to_poll(key);
+ if (unlikely(mask & POLLFREE)) {
+ io_poll_mark_cancelled(req);
+ /* we have to kick tw in case it's not already */
+ io_poll_execute(req, 0);
+
+ /*
+ * If the waitqueue is being freed early but someone is already
+ * holds ownership over it, we have to tear down the request as
+ * best we can. That means immediately removing the request from
+ * its waitqueue and preventing all further accesses to the
+ * waitqueue via the request.
+ */
+ list_del_init(&poll->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&poll->head, NULL);
+ return 1;
+ }
+
/* for instances that support it check for an event match first */
if (mask && !(mask & poll->events))
return 0;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 791f3465c4afde02d7f16cf7424ca87070b69396 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 14 Jan 2022 11:59:10 +0000
Subject: [PATCH] io_uring: fix UAF due to missing POLLFREE handling
Fixes a problem described in 50252e4b5e989
("aio: fix use-after-free due to missing POLLFREE handling")
and copies the approach used there.
In short, we have to forcibly eject a poll entry when we meet POLLFREE.
We can't rely on io_poll_get_ownership() as can't wait for potentially
running tw handlers, so we use the fact that wqs are RCU freed. See
Eric's patch and comments for more details.
Reported-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20211209010455.42744-6-ebiggers@kernel.org
Reported-and-tested-by: syzbot+5426c7ed6868c705ca14(a)syzkaller.appspotmail.com
Fixes: 221c5eb233823 ("io_uring: add support for IORING_OP_POLL")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/4ed56b6f548f7ea337603a82315750449412748a.16421612…
[axboe: drop non-functional change from patch]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fa3277844d2e..422d6de48688 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5462,12 +5462,14 @@ static void io_init_poll_iocb(struct io_poll_iocb *poll, __poll_t events,
static inline void io_poll_remove_entry(struct io_poll_iocb *poll)
{
- struct wait_queue_head *head = poll->head;
+ struct wait_queue_head *head = smp_load_acquire(&poll->head);
- spin_lock_irq(&head->lock);
- list_del_init(&poll->wait.entry);
- poll->head = NULL;
- spin_unlock_irq(&head->lock);
+ if (head) {
+ spin_lock_irq(&head->lock);
+ list_del_init(&poll->wait.entry);
+ poll->head = NULL;
+ spin_unlock_irq(&head->lock);
+ }
}
static void io_poll_remove_entries(struct io_kiocb *req)
@@ -5475,10 +5477,26 @@ static void io_poll_remove_entries(struct io_kiocb *req)
struct io_poll_iocb *poll = io_poll_get_single(req);
struct io_poll_iocb *poll_double = io_poll_get_double(req);
- if (poll->head)
- io_poll_remove_entry(poll);
- if (poll_double && poll_double->head)
+ /*
+ * While we hold the waitqueue lock and the waitqueue is nonempty,
+ * wake_up_pollfree() will wait for us. However, taking the waitqueue
+ * lock in the first place can race with the waitqueue being freed.
+ *
+ * We solve this as eventpoll does: by taking advantage of the fact that
+ * all users of wake_up_pollfree() will RCU-delay the actual free. If
+ * we enter rcu_read_lock() and see that the pointer to the queue is
+ * non-NULL, we can then lock it without the memory being freed out from
+ * under us.
+ *
+ * Keep holding rcu_read_lock() as long as we hold the queue lock, in
+ * case the caller deletes the entry from the queue, leaving it empty.
+ * In that case, only RCU prevents the queue memory from being freed.
+ */
+ rcu_read_lock();
+ io_poll_remove_entry(poll);
+ if (poll_double)
io_poll_remove_entry(poll_double);
+ rcu_read_unlock();
}
/*
@@ -5618,6 +5636,30 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
wait);
__poll_t mask = key_to_poll(key);
+ if (unlikely(mask & POLLFREE)) {
+ io_poll_mark_cancelled(req);
+ /* we have to kick tw in case it's not already */
+ io_poll_execute(req, 0);
+
+ /*
+ * If the waitqueue is being freed early but someone is already
+ * holds ownership over it, we have to tear down the request as
+ * best we can. That means immediately removing the request from
+ * its waitqueue and preventing all further accesses to the
+ * waitqueue via the request.
+ */
+ list_del_init(&poll->wait.entry);
+
+ /*
+ * Careful: this *must* be the last step, since as soon
+ * as req->head is NULL'ed out, the request can be
+ * completed and freed, since aio_poll_complete_work()
+ * will no longer need to take the waitqueue lock.
+ */
+ smp_store_release(&poll->head, NULL);
+ return 1;
+ }
+
/* for instances that support it check for an event match first */
if (mask && !(mask & poll->events))
return 0;
This series is the essential subset of "[PATCH v3 0/5] KVM: nVMX: Fix
Windows 11 + WSL2 + Enlightened VMCS" which got merged upstream recently.
It fixes Windows 11 guests with enabled Hyper-V role on KVM when eVMCS is
in use.
Vitaly Kuznetsov (3):
KVM: nVMX: Rename vmcs_to_field_offset{,_table}
KVM: nVMX: Implement evmcs_field_offset() suitable for handle_vmread()
KVM: nVMX: Allow VMREAD when Enlightened VMCS is in use
arch/x86/kvm/vmx/evmcs.c | 3 +-
arch/x86/kvm/vmx/evmcs.h | 44 +++++++++++++++++++++++------
arch/x86/kvm/vmx/nested.c | 59 +++++++++++++++++++++++++++------------
arch/x86/kvm/vmx/vmcs12.c | 4 +--
arch/x86/kvm/vmx/vmcs12.h | 6 ++--
5 files changed, 83 insertions(+), 33 deletions(-)
--
2.34.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a37d9a17f099072fe4d3a9048b0321978707a918 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il(a)gmail.com>
Date: Thu, 20 Jan 2022 23:53:04 +0200
Subject: [PATCH] fsnotify: invalidate dcache before IN_DELETE event
Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.
Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete() so fsnotify
will have access to a positive dentry.
This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.
To fix the regression, create a new hook fsnotify_delete() that takes
the unlinked inode as an argument and use a helper d_delete_notify() to
pin the inode, so we can pass it to fsnotify_delete() after d_delete().
Backporting hint: this regression is from v5.3. Although patch will
apply with only trivial conflicts to v5.4 and v5.10, it won't build,
because fsnotify_delete() implementation is different in each of those
versions (see fsnotify_link()).
A follow up patch will fix the fsnotify_unlink/rmdir() calls in pseudo
filesystem that do not need to call d_delete().
Link: https://lore.kernel.org/r/20220120215305.282577-1-amir73il@gmail.com
Reported-by: Ivan Delalande <colona(a)arista.com>
Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Fixes: 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of d_delete()")
Cc: stable(a)vger.kernel.org # v5.3+
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index a5bd6926f7ff..7807b28b7892 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -3086,10 +3086,8 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
btrfs_inode_lock(inode, 0);
err = btrfs_delete_subvolume(dir, dentry);
btrfs_inode_unlock(inode, 0);
- if (!err) {
- fsnotify_rmdir(dir, dentry);
- d_delete(dentry);
- }
+ if (!err)
+ d_delete_notify(dir, dentry);
out_dput:
dput(dentry);
diff --git a/fs/namei.c b/fs/namei.c
index d81f04f8d818..4ed0e41feab7 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3974,13 +3974,12 @@ int vfs_rmdir(struct user_namespace *mnt_userns, struct inode *dir,
dentry->d_inode->i_flags |= S_DEAD;
dont_mount(dentry);
detach_mounts(dentry);
- fsnotify_rmdir(dir, dentry);
out:
inode_unlock(dentry->d_inode);
dput(dentry);
if (!error)
- d_delete(dentry);
+ d_delete_notify(dir, dentry);
return error;
}
EXPORT_SYMBOL(vfs_rmdir);
@@ -4102,7 +4101,6 @@ int vfs_unlink(struct user_namespace *mnt_userns, struct inode *dir,
if (!error) {
dont_mount(dentry);
detach_mounts(dentry);
- fsnotify_unlink(dir, dentry);
}
}
}
@@ -4110,9 +4108,11 @@ int vfs_unlink(struct user_namespace *mnt_userns, struct inode *dir,
inode_unlock(target);
/* We don't d_delete() NFS sillyrenamed files--they still exist. */
- if (!error && !(dentry->d_flags & DCACHE_NFSFS_RENAMED)) {
+ if (!error && dentry->d_flags & DCACHE_NFSFS_RENAMED) {
+ fsnotify_unlink(dir, dentry);
+ } else if (!error) {
fsnotify_link_count(target);
- d_delete(dentry);
+ d_delete_notify(dir, dentry);
}
return error;
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index 3a2d7dc3c607..bb8467cd11ae 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -224,6 +224,43 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode,
dir, &new_dentry->d_name, 0);
}
+/*
+ * fsnotify_delete - @dentry was unlinked and unhashed
+ *
+ * Caller must make sure that dentry->d_name is stable.
+ *
+ * Note: unlike fsnotify_unlink(), we have to pass also the unlinked inode
+ * as this may be called after d_delete() and old_dentry may be negative.
+ */
+static inline void fsnotify_delete(struct inode *dir, struct inode *inode,
+ struct dentry *dentry)
+{
+ __u32 mask = FS_DELETE;
+
+ if (S_ISDIR(inode->i_mode))
+ mask |= FS_ISDIR;
+
+ fsnotify_name(mask, inode, FSNOTIFY_EVENT_INODE, dir, &dentry->d_name,
+ 0);
+}
+
+/**
+ * d_delete_notify - delete a dentry and call fsnotify_delete()
+ * @dentry: The dentry to delete
+ *
+ * This helper is used to guaranty that the unlinked inode cannot be found
+ * by lookup of this name after fsnotify_delete() event has been delivered.
+ */
+static inline void d_delete_notify(struct inode *dir, struct dentry *dentry)
+{
+ struct inode *inode = d_inode(dentry);
+
+ ihold(inode);
+ d_delete(dentry);
+ fsnotify_delete(dir, inode, dentry);
+ iput(inode);
+}
+
/*
* fsnotify_unlink - 'name' was unlinked
*
@@ -231,10 +268,10 @@ static inline void fsnotify_link(struct inode *dir, struct inode *inode,
*/
static inline void fsnotify_unlink(struct inode *dir, struct dentry *dentry)
{
- /* Expected to be called before d_delete() */
- WARN_ON_ONCE(d_is_negative(dentry));
+ if (WARN_ON_ONCE(d_is_negative(dentry)))
+ return;
- fsnotify_dirent(dir, dentry, FS_DELETE);
+ fsnotify_delete(dir, d_inode(dentry), dentry);
}
/*
@@ -258,10 +295,10 @@ static inline void fsnotify_mkdir(struct inode *dir, struct dentry *dentry)
*/
static inline void fsnotify_rmdir(struct inode *dir, struct dentry *dentry)
{
- /* Expected to be called before d_delete() */
- WARN_ON_ONCE(d_is_negative(dentry));
+ if (WARN_ON_ONCE(d_is_negative(dentry)))
+ return;
- fsnotify_dirent(dir, dentry, FS_DELETE | FS_ISDIR);
+ fsnotify_delete(dir, d_inode(dentry), dentry);
}
/*
On Sat, 2022-01-29 at 17:24 +0100, gregkh(a)linuxfoundation.org wrote:
> This is a note to let you know that I've just added the patch titled
>
> usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode
>
> to the 5.16-stable tree which can be found at:
>
> https://urldefense.com/v3/__http://www.kernel.org/git/?p=linux*kernel*git*s…
>
>
> The filename of the patch is:
> usb-dwc3-xilinx-skip-resets-and-usb3-register-settings-for-usb2.0-
> mode.patch
> and it can be found in the queue-5.16 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Hi Greg,
This patch should likely only go into stable along with the follow-up patch I
posted (or something equivalent):
https://patchwork.kernel.org/project/linux-usb/patch/20220127221500.177021-…
Otherwise it will cause a regression.
>
>
> From 9678f3361afc27a3124cd2824aec0227739986fb Mon Sep 17 00:00:00 2001
> From: Robert Hancock <robert.hancock(a)calian.com>
> Date: Tue, 25 Jan 2022 18:02:50 -0600
> Subject: usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0
> mode
>
> From: Robert Hancock <robert.hancock(a)calian.com>
>
> commit 9678f3361afc27a3124cd2824aec0227739986fb upstream.
>
> It appears that the PIPE clock should not be selected when only USB 2.0
> is being used in the design and no USB 3.0 reference clock is used.
> Also, the core resets are not required if a USB3 PHY is not in use, and
> will break things if USB3 is actually used but the PHY entry is not
> listed in the device tree.
>
> Skip core resets and register settings that are only required for
> USB3 mode when no USB3 PHY is specified in the device tree.
>
> Fixes: 84770f028fab ("usb: dwc3: Add driver for Xilinx platforms")
> Cc: stable <stable(a)vger.kernel.org>
> Signed-off-by: Robert Hancock <robert.hancock(a)calian.com>
> Link:
> https://urldefense.com/v3/__https://lore.kernel.org/r/20220126000253.158676…
>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> drivers/usb/dwc3/dwc3-xilinx.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> --- a/drivers/usb/dwc3/dwc3-xilinx.c
> +++ b/drivers/usb/dwc3/dwc3-xilinx.c
> @@ -110,6 +110,18 @@ static int dwc3_xlnx_init_zynqmp(struct
> usb3_phy = NULL;
> }
>
> + /*
> + * The following core resets are not required unless a USB3 PHY
> + * is used, and the subsequent register settings are not required
> + * unless a core reset is performed (they should be set properly
> + * by the first-stage boot loader, but may be reverted by a core
> + * reset). They may also break the configuration if USB3 is actually
> + * in use but the usb3-phy entry is missing from the device tree.
> + * Therefore, skip these operations in this case.
> + */
> + if (!usb3_phy)
> + goto skip_usb3_phy;
> +
> crst = devm_reset_control_get_exclusive(dev, "usb_crst");
> if (IS_ERR(crst)) {
> ret = PTR_ERR(crst);
> @@ -188,6 +200,7 @@ static int dwc3_xlnx_init_zynqmp(struct
> goto err;
> }
>
> +skip_usb3_phy:
> /*
> * This routes the USB DMA traffic to go through FPD path instead
> * of reaching DDR directly. This traffic routing is needed to
>
>
> Patches currently in stable-queue which might be from
> robert.hancock(a)calian.com are
>
> queue-5.16/serial-8250-of-fix-mapped-region-size-when-using-reg-offset-
> property.patch
> queue-5.16/usb-dwc3-xilinx-skip-resets-and-usb3-register-settings-for-usb2.0-
> mode.patch
> queue-5.16/usb-dwc3-xilinx-fix-error-handling-when-getting-usb3-phy.patch
This is a note to let you know that I've just added the patch titled
usb: dwc3: xilinx: fix uninitialized return value
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From b470947c3672f7eb7c4c271d510383d896831cc2 Mon Sep 17 00:00:00 2001
From: Robert Hancock <robert.hancock(a)calian.com>
Date: Thu, 27 Jan 2022 16:15:00 -0600
Subject: usb: dwc3: xilinx: fix uninitialized return value
A previous patch to skip part of the initialization when a USB3 PHY was
not present could result in the return value being uninitialized in that
case, causing spurious probe failures. Initialize ret to 0 to avoid this.
Fixes: 9678f3361afc ("usb: dwc3: xilinx: Skip resets and USB3 register settings for USB2.0 mode")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Robert Hancock <robert.hancock(a)calian.com>
Link: https://lore.kernel.org/r/20220127221500.177021-1-robert.hancock@calian.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/dwc3-xilinx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/dwc3-xilinx.c b/drivers/usb/dwc3/dwc3-xilinx.c
index e14ac15e24c3..a6f3a9b38789 100644
--- a/drivers/usb/dwc3/dwc3-xilinx.c
+++ b/drivers/usb/dwc3/dwc3-xilinx.c
@@ -99,7 +99,7 @@ static int dwc3_xlnx_init_zynqmp(struct dwc3_xlnx *priv_data)
struct device *dev = priv_data->dev;
struct reset_control *crst, *hibrst, *apbrst;
struct phy *usb3_phy;
- int ret;
+ int ret = 0;
u32 reg;
usb3_phy = devm_phy_optional_get(dev, "usb3-phy");
--
2.35.1
Greetings my beloved,
I sent this mail praying it will get to you in a good condition of
health, since I myself are in a very critical health condition in
which I sleep every night without knowing if I may be alive to see the
next day. I bring peace and love to you. It is by the grace of God, I
had no choice than to do what is lawful and right in the sight of God
for eternal life and in the sight of man, for witness of God’s mercy
and glory upon my life. I am Mrs.James Jackie,a widow,I am suffering
from a long time brain tumor, It has defiled all forms of medical
treatment, and right now I have about a few months to leave, according
to medical experts.
The situation has gotten complicated recently with my inability to
hear proper, am communicating with you with the help of the chief
nurse herein the hospital, from all indication my conditions is really
deteriorating and it is quite obvious that, according to my doctors
they have advised me that I may not live too long, Because this
illness has gotten to a very bad stage. I plead that you will not
expose or betray this trust and confidence that I am about to repose
on you for the mutual benefit of the orphans and the less privilege. I
have some funds I inherited from my late husband, the sum of ($
12,500,000.00 Dollars).Having known my condition, I decided to donate
this fund to you believing that you will utilize it the way i am going
to instruct herein.
I need you to assist me and reclaim this money and use it for
Charity works, for orphanages and gives justice and help to the poor,
needy and widows says The Lord." Jeremiah 22:15-16.“ and also build
schools for less privilege that will be named after my late husband if
possible and to promote the word of God and the effort that the house
of God is maintained. I do not want a situation where this money will
be used in an ungodly manner. That's why I'm taking this decision. I'm
not afraid of death, so I know where I'm going. I accept this decision
because I do not have any child who will inherit this money after I
die. Please I want your sincerely and urgent answer to know if you
will be able to execute this project for the glory of God, and I will
give you more information on how the fund will be transferred to your
bank account. May the grace, peace, love and the truth in the Word of
God be with you and all those that you love and care for.
I'm waiting for your immediate reply.
Regards.
Mrs.James Jackie
Writting From the hospital.
May God Bless you.
[TLDR: I'm adding this regression to regzbot, the Linux kernel
regression tracking bot; most text you find below is compiled from a few
templates paragraphs some of you might have seen already.]
Hi, this is your Linux kernel regression tracker speaking.
Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
CCing Greg and Stable as well, as it's a stable regression.
On 30.01.22 22:33, Kim Nilsson wrote:
>
> I've recently been doing some suspend/resume testing related to an
> s2idle bug in amdgpu
> (https://gitlab.freedesktop.org/drm/amd/-/issues/1879) and I've come
> across some problems with C-states.
>
> After resuming from s2idle, Pkg(HW) will no longer enter any state C2 or
> deeper. Suspending to s3 would not trigger this issue on 5.16.2, but
> after trying out 5.16.4 I'm facing a similar problem where Pkg(HW) will
> rarely if ever go deeper than C3. Now, the reason I am contacting you is
> because I was playing around today and found that unloading i2c-i801
> seems to fix the issue.
Thanks for the report.
To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:
#regzbot ^introduced v5.16.2..v5.16.4
#regzbot title i2c: Possible ACPI bug/regression in i2c-i801 [plain text]
#regzbot ignore-activity
Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail) using the kernel.org redirector,
as explained in 'Documentation/process/submitting-patches.rst'. Regzbot
then will automatically mark the regression as resolved once the fix
lands in the appropriate tree. For more details about regzbot see footer.
Sending this to everyone that got the initial report, to make all aware
of the tracking. I also hope that messages like this motivate people to
directly get at least the regression mailing list and ideally even
regzbot involved when dealing with regressions, as messages like this
wouldn't be needed then.
Don't worry, I'll send further messages wrt to this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.
> # CPU: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
> # GPU: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620], Advanced
> Micro Devices, Inc. [AMD/ATI] Lexa XT [Radeon PRO WX 3100]
> # 00:15.0 Serial bus controller: Intel Corporation Cannon Point-LP
> Serial IO I2C Controller #0 (rev 30)
> # 00:15.1 Serial bus controller: Intel Corporation Cannon Point-LP
> Serial IO I2C Controller #1 (rev 300:19.0 Serial bus controller: Intel
> Corporation Cannon Point-LP Serial IO I2C Host Controller (rev 30)
>
>
> dmesg output when loading the module after an unload:
>
> [ 1002.961091] i801_smbus 0000:00:1f.4: SPD Write Disable is set
> [ 1002.961171] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
> [ 1002.961321] iTCO_wdt iTCO_wdt: unable to reset NO_REBOOT flag, device
> disabled by hardware/BIOS
> [ 1002.975399] i801_smbus 0000:00:1f.4: Accelerometer lis3lv02d is
> present on SMBus but its address is unknown, skipping registration
> [ 1002.975406] i2c i2c-10: 2/2 memory slots populated (from DMI)
> [ 1002.976713] ee1004 10-0050: 512 byte EE1004-compliant SPD EEPROM,
> read-only
> [ 1002.976808] i2c i2c-10: Successfully instantiated SPD at 0x50
>
>
> This is my first LKML bug report, so I could use some guidance in how to
> provide more information.
Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)
P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.
BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.
---
Additional information about regzbot:
If you want to know more about regzbot, check out its web-interface, the
getting start guide, and/or the references documentation:
https://linux-regtracking.leemhuis.info/regzbot/https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.mdhttps://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
The last two documents will explain how you can interact with regzbot
yourself if your want to.
Hint for reporters: when reporting a regression it's in your interest to
tell #regzbot about it in the report, as that will ensure the regression
gets on the radar of regzbot and the regression tracker. That's in your
interest, as they will make sure the report won't fall through the
cracks unnoticed.
Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include a 'Link:' tag to the report in the commit message, as explained
in Documentation/process/submitting-patches.rst
That aspect was recently was made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c
LOAN IS AVAILABLE, IF INTERESTED KEEP IN TOUCH
Good day to you all, I'm samuel mohamadi, from Burkina Faso the loan
manager, this is to inform everyone that we are now offering loan to
the need or business funding as a result of COVID-19 damages with 3%
rate per month, we offer business loan, car loan, investment loan or
loan to pay off your bills, contact us with your Full,
Name:
Phone:
Number:
Amount Needed:
Duration:
Address:
Occupation:
Gender:
whatsapp us on your number for easy conversation
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2afc3b5a31f9edf3ef0f374f5d70610c79c93a42 Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Sat, 22 Jan 2022 06:40:56 -0500
Subject: [PATCH] ping: fix the sk_bound_dev_if match in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
the selftests 'router_broadcast.sh' will fail, as such command
# ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
can't receive the response skb by the PING socket. It's caused by mismatch
of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket,
as dif is vrf-h1 if dif's master was set to vrf-h1.
This patch is to fix this regression by also checking the sk_bound_dev_if
against sdif so that the packets can stil be received even if the socket
is not bound to the vrf device but to the real iif.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Hangbin Liu <liuhangbin(a)gmail.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 0e56df3a45e2..bcf7bc71cb56 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -220,7 +220,8 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident)
continue;
}
- if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif)
+ if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
+ sk->sk_bound_dev_if != inet_sdif(skb))
continue;
sock_hold(sk);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2afc3b5a31f9edf3ef0f374f5d70610c79c93a42 Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Sat, 22 Jan 2022 06:40:56 -0500
Subject: [PATCH] ping: fix the sk_bound_dev_if match in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
the selftests 'router_broadcast.sh' will fail, as such command
# ip vrf exec vrf-h1 ping -I veth0 198.51.100.255 -b
can't receive the response skb by the PING socket. It's caused by mismatch
of sk_bound_dev_if and dif in ping_rcv() when looking up the PING socket,
as dif is vrf-h1 if dif's master was set to vrf-h1.
This patch is to fix this regression by also checking the sk_bound_dev_if
against sdif so that the packets can stil be received even if the socket
is not bound to the vrf device but to the real iif.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Hangbin Liu <liuhangbin(a)gmail.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 0e56df3a45e2..bcf7bc71cb56 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -220,7 +220,8 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident)
continue;
}
- if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif)
+ if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
+ sk->sk_bound_dev_if != inet_sdif(skb))
continue;
sock_hold(sk);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1751fc1db36f6f411709e143d5393f92d12137a9 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Thu, 6 Jan 2022 18:24:03 -0500
Subject: [PATCH] NFSv4: nfs_atomic_open() can race when looking up a
non-regular file
If the file type changes back to being a regular file on the server
between the failed OPEN and our LOOKUP, then we need to re-run the OPEN.
Fixes: 0dd2b474d0b6 ("nfs: implement i_op->atomic_open()")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 408c3bb549b1..5df75ed09268 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1999,12 +1999,17 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
if ((lookup_flags & LOOKUP_DIRECTORY) && inode &&
!S_ISDIR(inode->i_mode))
res = ERR_PTR(-ENOTDIR);
+ else if (inode && S_ISREG(inode->i_mode))
+ res = ERR_PTR(-EOPENSTALE);
} else if (!IS_ERR(res)) {
inode = d_inode(res);
if ((lookup_flags & LOOKUP_DIRECTORY) && inode &&
!S_ISDIR(inode->i_mode)) {
dput(res);
res = ERR_PTR(-ENOTDIR);
+ } else if (inode && S_ISREG(inode->i_mode)) {
+ dput(res);
+ res = ERR_PTR(-EOPENSTALE);
}
}
if (switched) {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ac795161c93699d600db16c1a8cc23a65a1eceaf Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Date: Thu, 6 Jan 2022 18:24:02 -0500
Subject: [PATCH] NFSv4: Handle case where the lookup of a directory fails
If the application sets the O_DIRECTORY flag, and tries to open a
regular file, nfs_atomic_open() will punt to doing a regular lookup.
If the server then returns a regular file, we will happily return a
file descriptor with uninitialised open state.
The fix is to return the expected ENOTDIR error in these cases.
Reported-by: Lyu Tao <tao.lyu(a)epfl.ch>
Fixes: 0dd2b474d0b6 ("nfs: implement i_op->atomic_open()")
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 2884138848f4..408c3bb549b1 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1994,6 +1994,19 @@ int nfs_atomic_open(struct inode *dir, struct dentry *dentry,
no_open:
res = nfs_lookup(dir, dentry, lookup_flags);
+ if (!res) {
+ inode = d_inode(dentry);
+ if ((lookup_flags & LOOKUP_DIRECTORY) && inode &&
+ !S_ISDIR(inode->i_mode))
+ res = ERR_PTR(-ENOTDIR);
+ } else if (!IS_ERR(res)) {
+ inode = d_inode(res);
+ if ((lookup_flags & LOOKUP_DIRECTORY) && inode &&
+ !S_ISDIR(inode->i_mode)) {
+ dput(res);
+ res = ERR_PTR(-ENOTDIR);
+ }
+ }
if (switched) {
d_lookup_done(dentry);
if (!res)
arch/arm/mach-ep93xx/clock.c:154:2: warning: Use of memory after it is freed [clang-analyzer-unix.Malloc]
arch/arm/mach-ep93xx/clock.c:151:2: note: Taking true branch
if (IS_ERR(clk))
^
arch/arm/mach-ep93xx/clock.c:152:3: note: Memory is released
kfree(psc);
^~~~~~~~~~
arch/arm/mach-ep93xx/clock.c:154:2: note: Use of memory after it is freed
return &psc->hw;
^ ~~~~~~~~
Link: https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/B5YCO2N…
Reported-by: kernel test robot <lkp(a)intel.com>
Fixes: 9645ccc7bd7a ("ep93xx: clock: convert in-place to COMMON_CLK")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)gmail.com>
---
Changelog:
v2:
- Thanks to Nick for the hint to use ERR_CAST()
arch/arm/mach-ep93xx/clock.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-ep93xx/clock.c b/arch/arm/mach-ep93xx/clock.c
index cc75087134d3..4aee14f18123 100644
--- a/arch/arm/mach-ep93xx/clock.c
+++ b/arch/arm/mach-ep93xx/clock.c
@@ -148,8 +148,10 @@ static struct clk_hw *ep93xx_clk_register_gate(const char *name,
psc->lock = &clk_lock;
clk = clk_register(NULL, &psc->hw);
- if (IS_ERR(clk))
+ if (IS_ERR(clk)) {
kfree(psc);
+ return ERR_CAST(clk);
+ }
return &psc->hw;
}
--
2.34.1
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d3d079bde07e1b7deaeb57506dc0b86010121d17 Mon Sep 17 00:00:00 2001
From: Valentin Caron <valentin.caron(a)foss.st.com>
Date: Tue, 11 Jan 2022 17:44:40 +0100
Subject: [PATCH] serial: stm32: prevent TDR register overwrite when sending
x_char
When sending x_char in stm32_usart_transmit_chars(), driver can overwrite
the value of TDR register by the value of x_char. If this happens, the
previous value that was present in TDR register will not be sent through
uart.
This code checks if the previous value in TDR register is sent before
writing the x_char value into register.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Valentin Caron <valentin.caron(a)foss.st.com>
Link: https://lore.kernel.org/r/20220111164441.6178-2-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 1f89ab0e49ac..c1b8828451c8 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -550,11 +550,23 @@ static void stm32_usart_transmit_chars(struct uart_port *port)
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
struct circ_buf *xmit = &port->state->xmit;
+ u32 isr;
+ int ret;
if (port->x_char) {
if (stm32_usart_tx_dma_started(stm32_port) &&
stm32_usart_tx_dma_enabled(stm32_port))
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_DMAT);
+
+ /* Check that TDR is empty before filling FIFO */
+ ret =
+ readl_relaxed_poll_timeout_atomic(port->membase + ofs->isr,
+ isr,
+ (isr & USART_SR_TXE),
+ 10, 1000);
+ if (ret)
+ dev_warn(port->dev, "1 character may be erased\n");
+
writel_relaxed(port->x_char, port->membase + ofs->tdr);
port->x_char = 0;
port->icount.tx++;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d3d079bde07e1b7deaeb57506dc0b86010121d17 Mon Sep 17 00:00:00 2001
From: Valentin Caron <valentin.caron(a)foss.st.com>
Date: Tue, 11 Jan 2022 17:44:40 +0100
Subject: [PATCH] serial: stm32: prevent TDR register overwrite when sending
x_char
When sending x_char in stm32_usart_transmit_chars(), driver can overwrite
the value of TDR register by the value of x_char. If this happens, the
previous value that was present in TDR register will not be sent through
uart.
This code checks if the previous value in TDR register is sent before
writing the x_char value into register.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Valentin Caron <valentin.caron(a)foss.st.com>
Link: https://lore.kernel.org/r/20220111164441.6178-2-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 1f89ab0e49ac..c1b8828451c8 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -550,11 +550,23 @@ static void stm32_usart_transmit_chars(struct uart_port *port)
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
struct circ_buf *xmit = &port->state->xmit;
+ u32 isr;
+ int ret;
if (port->x_char) {
if (stm32_usart_tx_dma_started(stm32_port) &&
stm32_usart_tx_dma_enabled(stm32_port))
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_DMAT);
+
+ /* Check that TDR is empty before filling FIFO */
+ ret =
+ readl_relaxed_poll_timeout_atomic(port->membase + ofs->isr,
+ isr,
+ (isr & USART_SR_TXE),
+ 10, 1000);
+ if (ret)
+ dev_warn(port->dev, "1 character may be erased\n");
+
writel_relaxed(port->x_char, port->membase + ofs->tdr);
port->x_char = 0;
port->icount.tx++;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d3d079bde07e1b7deaeb57506dc0b86010121d17 Mon Sep 17 00:00:00 2001
From: Valentin Caron <valentin.caron(a)foss.st.com>
Date: Tue, 11 Jan 2022 17:44:40 +0100
Subject: [PATCH] serial: stm32: prevent TDR register overwrite when sending
x_char
When sending x_char in stm32_usart_transmit_chars(), driver can overwrite
the value of TDR register by the value of x_char. If this happens, the
previous value that was present in TDR register will not be sent through
uart.
This code checks if the previous value in TDR register is sent before
writing the x_char value into register.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Valentin Caron <valentin.caron(a)foss.st.com>
Link: https://lore.kernel.org/r/20220111164441.6178-2-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 1f89ab0e49ac..c1b8828451c8 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -550,11 +550,23 @@ static void stm32_usart_transmit_chars(struct uart_port *port)
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
struct circ_buf *xmit = &port->state->xmit;
+ u32 isr;
+ int ret;
if (port->x_char) {
if (stm32_usart_tx_dma_started(stm32_port) &&
stm32_usart_tx_dma_enabled(stm32_port))
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_DMAT);
+
+ /* Check that TDR is empty before filling FIFO */
+ ret =
+ readl_relaxed_poll_timeout_atomic(port->membase + ofs->isr,
+ isr,
+ (isr & USART_SR_TXE),
+ 10, 1000);
+ if (ret)
+ dev_warn(port->dev, "1 character may be erased\n");
+
writel_relaxed(port->x_char, port->membase + ofs->tdr);
port->x_char = 0;
port->icount.tx++;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d3d079bde07e1b7deaeb57506dc0b86010121d17 Mon Sep 17 00:00:00 2001
From: Valentin Caron <valentin.caron(a)foss.st.com>
Date: Tue, 11 Jan 2022 17:44:40 +0100
Subject: [PATCH] serial: stm32: prevent TDR register overwrite when sending
x_char
When sending x_char in stm32_usart_transmit_chars(), driver can overwrite
the value of TDR register by the value of x_char. If this happens, the
previous value that was present in TDR register will not be sent through
uart.
This code checks if the previous value in TDR register is sent before
writing the x_char value into register.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Valentin Caron <valentin.caron(a)foss.st.com>
Link: https://lore.kernel.org/r/20220111164441.6178-2-valentin.caron@foss.st.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 1f89ab0e49ac..c1b8828451c8 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -550,11 +550,23 @@ static void stm32_usart_transmit_chars(struct uart_port *port)
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
struct circ_buf *xmit = &port->state->xmit;
+ u32 isr;
+ int ret;
if (port->x_char) {
if (stm32_usart_tx_dma_started(stm32_port) &&
stm32_usart_tx_dma_enabled(stm32_port))
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_DMAT);
+
+ /* Check that TDR is empty before filling FIFO */
+ ret =
+ readl_relaxed_poll_timeout_atomic(port->membase + ofs->isr,
+ isr,
+ (isr & USART_SR_TXE),
+ 10, 1000);
+ if (ret)
+ dev_warn(port->dev, "1 character may be erased\n");
+
writel_relaxed(port->x_char, port->membase + ofs->tdr);
port->x_char = 0;
port->icount.tx++;