This is the start of the stable review cycle for the 6.0.14 release.
There are 16 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 17 Dec 2022 17:28:57 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.14-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.0.14-rc1
Lei Rao <lei.rao(a)intel.com>
nvme-pci: clear the prp2 field when not used
Peter Zijlstra <peterz(a)infradead.org>
perf: Fix perf_pending_task() UaF
Charles Keepax <ckeepax(a)opensource.cirrus.com>
ASoC: cs42l51: Correct PGA Volume minimum value
Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
net: fec: don't reset irq coalesce settings to defaults on "ip link up"
Yasushi SHOJI <yasushi.shoji(a)gmail.com>
can: mcba_usb: Fix termination command argument
Heiko Schocher <hs(a)denx.de>
can: sja1000: fix size of OCR_MODE_MASK define
Ricardo Ribalda <ribalda(a)chromium.org>
pinctrl: mediatek: Startup with the IRQs disabled
Hou Tao <houtao1(a)huawei.com>
libbpf: Use page size as max_entries when probing ring buffer map
Mark Brown <broonie(a)kernel.org>
ASoC: ops: Check bounds for second channel in snd_soc_put_volsw_sx()
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: fsl_micfil: explicitly clear CHnF flags
Shengjiu Wang <shengjiu.wang(a)nxp.com>
ASoC: fsl_micfil: explicitly clear software reset bit
Alexandre Belloni <alexandre.belloni(a)bootlin.com>
rtc: cmos: fix build on non-ACPI platforms
David Michael <fedora.dm0(a)gmail.com>
libbpf: Fix uninitialized warning in btf_dump_dump_type_data
Nathan Chancellor <nathan(a)kernel.org>
x86/vdso: Conditionally export __vdso_sgx_enter_enclave()
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
rtc: cmos: Fix wake alarm breakage
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
rtc: cmos: Fix event handler registration ordering issue
-------------
Diffstat:
Makefile | 4 ++--
arch/x86/entry/vdso/vdso.lds.S | 2 ++
drivers/net/can/usb/mcba_usb.c | 10 ++++++---
drivers/net/ethernet/freescale/fec_main.c | 22 ++++++-------------
drivers/nvme/host/pci.c | 2 ++
drivers/pinctrl/mediatek/mtk-eint.c | 9 +++++---
drivers/rtc/rtc-cmos.c | 35 +++++++++++++++++++++++--------
include/linux/can/platform/sja1000.h | 2 +-
kernel/events/core.c | 17 +++++++++++----
sound/soc/codecs/cs42l51.c | 2 +-
sound/soc/fsl/fsl_micfil.c | 19 +++++++++++++++++
sound/soc/soc-ops.c | 6 ++++++
tools/lib/bpf/btf_dump.c | 2 +-
tools/lib/bpf/libbpf_probes.c | 2 +-
14 files changed, 93 insertions(+), 41 deletions(-)
From: Eric Biggers <ebiggers(a)google.com>
An issue that arises when migrating from builtin signatures to userspace
signatures is that existing files that have builtin signatures cannot be
opened unless either CONFIG_FS_VERITY_BUILTIN_SIGNATURES is disabled or
the signing certificate is left in the .fs-verity keyring.
Since builtin signatures provide no security benefit when
fs.verity.require_signatures=0 anyway, let's just skip the signature
verification in this case.
Fixes: 432434c9f8e1 ("fs-verity: support builtin file signatures")
Cc: <stable(a)vger.kernel.org> # v5.4+
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/verity/signature.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
index 143a530a80088..dc6935701abda 100644
--- a/fs/verity/signature.c
+++ b/fs/verity/signature.c
@@ -13,8 +13,8 @@
#include <linux/verification.h>
/*
- * /proc/sys/fs/verity/require_signatures
- * If 1, all verity files must have a valid builtin signature.
+ * /proc/sys/fs/verity/require_signatures. If 1, then builtin signatures are
+ * verified and all verity files must have a valid builtin signature.
*/
static int fsverity_require_signatures;
@@ -54,6 +54,20 @@ int fsverity_verify_signature(const struct fsverity_info *vi,
return 0;
}
+ /*
+ * If require_signatures=0, don't verify builtin signatures.
+ * Originally, builtin signatures were verified opportunistically in
+ * this case. However, no security property is possible when
+ * require_signatures=0 anyway. Skipping the builtin signature
+ * verification makes it easier to migrate existing files from builtin
+ * signature verification to userspace signature verification.
+ */
+ if (!fsverity_require_signatures) {
+ fsverity_warn(inode,
+ "Not checking builtin signature due to require_signatures=0");
+ return 0;
+ }
+
d = kzalloc(sizeof(*d) + hash_alg->digest_size, GFP_KERNEL);
if (!d)
return -ENOMEM;
base-commit: 479174d402bcf60789106eedc4def3957c060bad
--
2.38.1
On fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if the src
vma has it and has a pte marker installed. The warning, along with the
comment, is improper. The right thing is to inherit the pte marker when
needed, or to keep the dst pte empty.
A vague guess is that this happened by accident when a prior patch
introduced src/dst vma into this helper while the uffd-wp feature was
being developed, and I probably messed it up in the rebase; if we replace
dst_vma with src_vma, the warning and the comment both make sense.
Hugetlb already does exactly the right thing here
(copy_hugetlb_page_range()). Fix the general path.
Reproducer:
https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_pag…
Cc: <stable(a)vger.kernel.org> # 5.19+
Fixes: c56d1b62cce8 ("mm/shmem: handle uffd-wp during fork()")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216808
Reported-by: Pengfei Xu <pengfei.xu(a)intel.com>
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/memory.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index aad226daf41b..032ef700c3e8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -828,12 +828,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
return -EBUSY;
return -ENOENT;
} else if (is_pte_marker_entry(entry)) {
- /*
- * We're copying the pgtable should only because dst_vma has
- * uffd-wp enabled, do sanity check.
- */
- WARN_ON_ONCE(!userfaultfd_wp(dst_vma));
- set_pte_at(dst_mm, addr, dst_pte, pte);
+ if (userfaultfd_wp(dst_vma))
+ set_pte_at(dst_mm, addr, dst_pte, pte);
return 0;
}
if (!userfaultfd_wp(dst_vma))
--
2.37.3
When encountering any vma in the range with a policy other than MPOL_BIND
or MPOL_PREFERRED_MANY, an error is returned without calling mpol_put()
on the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
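A condensed sketch of the fixed loop body, for illustration only (locking
and the surrounding vma iteration from set_mempolicy_home_node() are
elided):

        struct mempolicy *new = mpol_dup(vma_policy(vma));

        if (IS_ERR(new)) {
                err = PTR_ERR(new);
                break;
        }
        /* Only MPOL_BIND and MPOL_PREFERRED_MANY support a home node */
        if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
                mpol_put(new);  /* the fix: drop the reference before bailing */
                err = -EOPNOTSUPP;
                break;
        }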
[ Mathieu: compile-tested only. Tested-by would be welcome. ]
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: linux-api(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org # 5.17+
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 61aa9aedb728..02c8a712282f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
--
2.25.1
The patch titled
Subject: mm, mremap: fix mremap() expanding vma with addr inside vma
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mremap-fix-mremap-expanding-vma-with-addr-inside-vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, mremap: fix mremap() expanding vma with addr inside vma
Date: Fri, 16 Dec 2022 17:32:27 +0100
Since 6.1 we have noticed random rpm install failures that were tracked to
mremap() returning -ENOMEM and to commit ca3d76b0aa80 ("mm: add merging
after mremap resize").
The problem occurs when mremap() expands a VMA in place, but using a
starting address that is not vma->vm_start, but somewhere in the middle.
The extension_pgoff calculation introduced by the commit is wrong in that
case, so vma_merge() fails due to pgoffs not being compatible. Fix the
calculation.
Incidentally, it seems that the situations where rpm now expands a vma
from the middle were also made possible by that commit, thanks to the
improved vma merging. Yet this should work just fine, except for the
buggy calculation.
Link: https://lkml.kernel.org/r/20221216163227.24648-1-vbabka@suse.cz
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359
Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Jakub Matěna <matenajakub(a)gmail.com>
Cc: "Kirill A . Shutemov" <kirill(a)shutemov.name>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/mremap.c~mm-mremap-fix-mremap-expanding-vma-with-addr-inside-vma
+++ a/mm/mremap.c
@@ -1016,7 +1016,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, a
long pages = (new_len - old_len) >> PAGE_SHIFT;
unsigned long extension_start = addr + old_len;
unsigned long extension_end = addr + new_len;
- pgoff_t extension_pgoff = vma->vm_pgoff + (old_len >> PAGE_SHIFT);
+ pgoff_t extension_pgoff = vma->vm_pgoff +
+ ((extension_start - vma->vm_start) >> PAGE_SHIFT);
if (vma->vm_flags & VM_ACCOUNT) {
if (security_vm_enough_memory_mm(mm, pages)) {
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-mremap-fix-mremap-expanding-vma-with-addr-inside-vma.patch
If the sink gets disconnected while receiving a multi-packet DP MST AUX
down-reply/up-request sideband message, the state keeping track of which
packets have already been received is not reset. This results in a failed
sanity check for the subsequent message packet received after a sink is
reconnected (since the pending message was never completed with an
end-of-message-transfer packet), indicated by the
"sideband msg set header failed"
error.
Fix the above by resetting the up/down message reception state after a
disconnect event.
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # v3.17+
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 51a46689cda70..90819fff2c9ba 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -3641,6 +3641,9 @@ int drm_dp_mst_topology_mgr_set_mst(struct drm_dp_mst_topology_mgr *mgr, bool ms
drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL, 0);
ret = 0;
mgr->payload_id_table_cleared = false;
+
+ memset(&mgr->down_rep_recv, 0, sizeof(mgr->down_rep_recv));
+ memset(&mgr->up_req_recv, 0, sizeof(mgr->up_req_recv));
}
out_unlock:
--
2.37.1
Since 6.1 we have noticed random rpm install failures that were tracked
to mremap() returning -ENOMEM and to commit ca3d76b0aa80 ("mm: add
merging after mremap resize").
The problem occurs when mremap() expands a VMA in place, but using a
starting address that is not vma->vm_start, but somewhere in the middle.
The extension_pgoff calculation introduced by the commit is wrong in
that case, so vma_merge() fails due to pgoffs not being compatible.
Fix the calculation.
Incidentally, it seems that the situations where rpm now expands a vma
from the middle were also made possible by that commit, thanks to the
improved vma merging. Yet this should work just fine, except for the
buggy calculation.
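A worked example with illustrative numbers:

        /* vma: vm_start = 0x10000, vm_pgoff = 100, 4K pages.
         * mremap(addr = vm_start + 8 pages, old_len = 4 pages, ...)
         * => extension_start = addr + old_len = vm_start + 12 pages
         *
         * buggy: extension_pgoff = vm_pgoff + (old_len >> PAGE_SHIFT)
         *                        = 100 + 4  = 104
         * fixed: extension_pgoff = vm_pgoff
         *                        + ((extension_start - vm_start) >> PAGE_SHIFT)
         *                        = 100 + 12 = 112
         *
         * Only 112 matches the file offset actually mapped at
         * extension_start, so vma_merge() now sees compatible pgoffs.
         */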
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Link: https://bugzilla.suse.com/show_bug.cgi?id=1206359
Fixes: ca3d76b0aa80 ("mm: add merging after mremap resize")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Jakub Matěna <matenajakub(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Kirill A . Shutemov" <kirill(a)shutemov.name>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
---
Hi, this fixes a regression in 6.1 so please process ASAP so that stable
6.1.y can get the fix.
mm/mremap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index e465ffe279bb..fe587c5d6591 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -1016,7 +1016,8 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
long pages = (new_len - old_len) >> PAGE_SHIFT;
unsigned long extension_start = addr + old_len;
unsigned long extension_end = addr + new_len;
- pgoff_t extension_pgoff = vma->vm_pgoff + (old_len >> PAGE_SHIFT);
+ pgoff_t extension_pgoff = vma->vm_pgoff +
+ ((extension_start - vma->vm_start) >> PAGE_SHIFT);
if (vma->vm_flags & VM_ACCOUNT) {
if (security_vm_enough_memory_mm(mm, pages)) {
--
2.38.1
The assumption that the documentation was right about how this value
works was wrong. It was discovered that the length value of the mgmt
header is in steps of word size.
As an example, to process 4 bytes of data the correct length to set is 2.
To process 8 bytes it is 4, for 12 bytes 6, for 16 bytes 8...
Odd values will always return the next size in the ack packet
(a length of 3 (6 bytes) will always return 8 bytes of data).
This means that a value of 15 (0xf) actually means reading/writing 32 bytes
of data instead of 16 bytes. This behaviour is entirely absent from the
switch documentation.
In fact, per the documentation the max value that mgmt eth can process is
16 bytes of data, while in reality it can process 32 bytes at once.
To handle this, we always round up the length after dividing it by the
word size. We check if the result is odd and round up another time to
align with what the switch will provide in the ack packet.
The workaround for the length limit of 15 is still needed, as the length
reg max value is 0xf (15).
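A minimal standalone sketch of the conversion described above (the helper
name is illustrative; it mirrors the logic the patch adds to the alloc
path):

        static unsigned int qca8k_bytes_to_mgmt_len(unsigned int bytes)
        {
                unsigned int words = DIV_ROUND_UP(bytes, sizeof(u16));

                /* the switch acks odd word counts at the next even size */
                if (words % 2)
                        words++;

                /* the 4-bit field caps at 0xf, which the switch treats
                 * as 32 bytes
                 */
                if (words == 16)
                        words--;

                return words;
        }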
Reported-by: Ronald Wahl <ronald.wahl(a)raritan.com>
Tested-by: Ronald Wahl <ronald.wahl(a)raritan.com>
Fixes: 90386223f44e ("net: dsa: qca8k: add support for larger read/write size with mgmt Ethernet")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Cc: stable(a)vger.kernel.org # v5.18+
---
drivers/net/dsa/qca/qca8k-8xxx.c | 45 +++++++++++++++++++++++++-------
1 file changed, 35 insertions(+), 10 deletions(-)
diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
index c5c3b4e92f28..46151320b2a8 100644
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -146,7 +146,16 @@ static void qca8k_rw_reg_ack_handler(struct dsa_switch *ds, struct sk_buff *skb)
command = get_unaligned_le32(&mgmt_ethhdr->command);
cmd = FIELD_GET(QCA_HDR_MGMT_CMD, command);
+
len = FIELD_GET(QCA_HDR_MGMT_LENGTH, command);
+ /* Special case for len of 15 as this is the max value for len and needs to
+ * be increased before converting it from word to dword.
+ */
+ if (len == 15)
+ len++;
+
+ /* We can ignore odd values, we always round them up in the alloc function. */
+ len *= sizeof(u16);
/* Make sure the seq match the requested packet */
if (get_unaligned_le32(&mgmt_ethhdr->seq) == mgmt_eth_data->seq)
@@ -193,17 +202,33 @@ static struct sk_buff *qca8k_alloc_mdio_header(enum mdio_cmd cmd, u32 reg, u32 *
if (!skb)
return NULL;
- /* Max value for len reg is 15 (0xf) but the switch actually return 16 byte
- * Actually for some reason the steps are:
- * 0: nothing
- * 1-4: first 4 byte
- * 5-6: first 12 byte
- * 7-15: all 16 byte
+ /* Hdr mgmt length value is in step of word size.
+ * As an example to process 4 byte of data the correct length to set is 2.
+ * To process 8 byte 4, 12 byte 6, 16 byte 8...
+ *
+ * Odd values will always return the next size on the ack packet.
+ * (length of 3 (6 byte) will always return 8 bytes of data)
+ *
+ * This means that a value of 15 (0xf) actually means reading/writing 32 bytes
+ * of data.
+ *
+ * To correctly calculate the length we divide the requested len by word and
+ * round up.
+ * On the ack function we can skip the odd check as we already handle the
+ * case here.
+ */
+ real_len = DIV_ROUND_UP(len, sizeof(u16));
+
+ /* We check if the result len is odd and we round up another time to
+ * the next size. (length of 3 will be increased to 4 as switch will always
+ * return 8 bytes)
*/
- if (len == 16)
- real_len = 15;
- else
- real_len = len;
+ if (real_len % sizeof(u16) != 0)
+ real_len++;
+
+ /* Max reg value is 0xf(15) but switch will always return the next size (32 byte) */
+ if (real_len == 16)
+ real_len--;
skb_reset_mac_header(skb);
skb_set_network_header(skb, skb->len);
--
2.37.2
Hi Handa-san,
On Thu, Nov 17, 2022 at 4:32 PM Tetsuo Handa
<penguin-kernel(a)i-love.sakura.ne.jp> wrote:
> A kernel built with syzbot's config file reported that
>
> scr_memcpyw(q, save, array3_size(logo_lines, new_cols, 2))
>
> causes uninitialized "save" to be copied.
> Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Thanks for your patch, which is now commit a6a00d7e8ffd78d1
("fbcon: Use kzalloc() in fbcon_prepare_logo()") in v6.1-rc7,
and which is being backported to stable.
> --- a/drivers/video/fbdev/core/fbcon.c
> +++ b/drivers/video/fbdev/core/fbcon.c
> @@ -577,7 +577,7 @@ static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info,
> if (scr_readw(r) != vc->vc_video_erase_char)
> break;
> if (r != q && new_rows >= rows + logo_lines) {
> - save = kmalloc(array3_size(logo_lines, new_cols, 2),
> + save = kzalloc(array3_size(logo_lines, new_cols, 2),
> GFP_KERNEL);
> if (save) {
> int i = min(cols, new_cols);
The next line is:
scr_memsetw(save, erase,
array3_size(logo_lines, new_cols, 2));
So how can this turn out to be uninitialized later below?
scr_memcpyw(q, save, array3_size(logo_lines, new_cols, 2));
What am I missing?
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
The commit mentioned below causes the ovs_flow_tbl_lookup() function
to be called with the masked key. However, it's supposed to be called
with the unmasked key. This is because the datapath supports
installing wider flows, and OVS relies on this behavior. For example
if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
128.0.0.0) is allowed to be added.
However, if we try to add a wildcard rule, the installation fails:
$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
$ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)
The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed.
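To illustrate the collision, a toy sketch of the (key & mask) matching
described above (plain u32 addresses for brevity; real OVS keys are
struct sw_flow_key miniflows):

        u32 mask  = 0x00000000;            /* 0.0.0.0: wildcard the address */
        u32 key_a = 0x01010101 & mask;     /* 1.1.1.1   -> 0.0.0.0          */
        u32 key_b = 0xc0010101 & mask;     /* 192.1.1.1 -> 0.0.0.0          */
        /* key_a == key_b: the masked key no longer distinguishes the
         * flows, so the lookup wrongly reports the new flow as already
         * present.
         */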
This change reverts the commit below, but rather than placing the key
on the stack, it is now dynamically allocated.
Fixes: 190aa3e77880 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
Signed-off-by: Eelco Chaudron <echaudro(a)redhat.com>
---
Version history:
v3: Updated commit message to explain the problem in more detail.
v2: Fixed ENOMEM error :( Forgot to do a stg refresh.
net/openvswitch/datapath.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
index 861dfb8daf4a..55b697c4d576 100644
--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -948,6 +948,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
struct sw_flow_mask mask;
struct sk_buff *reply;
struct datapath *dp;
+ struct sw_flow_key *key;
struct sw_flow_actions *acts;
struct sw_flow_match match;
u32 ufid_flags = ovs_nla_get_ufid_flags(a[OVS_FLOW_ATTR_UFID_FLAGS]);
@@ -975,24 +976,26 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
}
/* Extract key. */
- ovs_match_init(&match, &new_flow->key, false, &mask);
+ key = kzalloc(sizeof(*key), GFP_KERNEL);
+ if (!key) {
+ error = -ENOMEM;
+ goto err_kfree_key;
+ }
+
+ ovs_match_init(&match, key, false, &mask);
error = ovs_nla_get_match(net, &match, a[OVS_FLOW_ATTR_KEY],
a[OVS_FLOW_ATTR_MASK], log);
if (error)
goto err_kfree_flow;
+ ovs_flow_mask_key(&new_flow->key, key, true, &mask);
+
/* Extract flow identifier. */
error = ovs_nla_get_identifier(&new_flow->id, a[OVS_FLOW_ATTR_UFID],
- &new_flow->key, log);
+ key, log);
if (error)
goto err_kfree_flow;
- /* unmasked key is needed to match when ufid is not used. */
- if (ovs_identifier_is_key(&new_flow->id))
- match.key = new_flow->id.unmasked_key;
-
- ovs_flow_mask_key(&new_flow->key, &new_flow->key, true, &mask);
-
/* Validate actions. */
error = ovs_nla_copy_actions(net, a[OVS_FLOW_ATTR_ACTIONS],
&new_flow->key, &acts, log);
@@ -1019,7 +1022,7 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
if (ovs_identifier_is_ufid(&new_flow->id))
flow = ovs_flow_tbl_lookup_ufid(&dp->table, &new_flow->id);
if (!flow)
- flow = ovs_flow_tbl_lookup(&dp->table, &new_flow->key);
+ flow = ovs_flow_tbl_lookup(&dp->table, key);
if (likely(!flow)) {
rcu_assign_pointer(new_flow->sf_acts, acts);
@@ -1089,6 +1092,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
if (reply)
ovs_notify(&dp_flow_genl_family, reply, info);
+
+ kfree(key);
return 0;
err_unlock_ovs:
@@ -1098,6 +1103,8 @@ static int ovs_flow_cmd_new(struct sk_buff *skb, struct genl_info *info)
ovs_nla_free_flow_actions(acts);
err_kfree_flow:
ovs_flow_free(new_flow, false);
+err_kfree_key:
+ kfree(key);
error:
return error;
}
Sorry, I may not have sent the email correctly.
I will resend it.
On Thu, 27 Oct 2022 20:26:04 +0000 Andrew Morton <akpm(a)linux-foundation.org> wrote:
> On Wed, 26 Oct 2022 20:24:38 +0900 NARIBAYASHI Akira <a.naribayashi(a)fujitsu.com> wrote:
>
> > Depending on the memory configuration, isolate_freepages_block() may
> > scan pages out of the target range and causes panic.
> >
> > The problem is that pfn as argument of fast_isolate_around() could
> > be out of the target range. Therefore we should consider the case
> > where pfn < start_pfn, and also the case where end_pfn < pfn.
> >
> > This problem should have been addressd by the commit 6e2b7044c199
> > ("mm, compaction: make fast_isolate_freepages() stay within zone")
> > but there was an oversight.
> >
> > Case1: pfn < start_pfn
> >
> > <at memory compaction for node Y>
> > | node X's zone | node Y's zone
> > +-----------------+------------------------------...
> > pageblock ^ ^ ^
> > +-----------+-----------+-----------+-----------+...
> > ^ ^ ^
> > ^ ^ end_pfn
> > ^ start_pfn = cc->zone->zone_start_pfn
> > pfn
> > <---------> scanned range by "Scan After"
> >
> > Case2: end_pfn < pfn
> >
> > <at memory compaction for node X>
> > | node X's zone | node Y's zone
> > +-----------------+------------------------------...
> > pageblock ^ ^ ^
> > +-----------+-----------+-----------+-----------+...
> > ^ ^ ^
> > ^ ^ pfn
> > ^ end_pfn
> > start_pfn
> > <---------> scanned range by "Scan Before"
> >
> > It seems that there is no good reason to skip nr_isolated pages
> > just after given pfn. So let perform simple scan from start to end
> > instead of dividing the scan into "Before" and "After".
>
> Under what circumstances will this panic occur? I assume those
> circumstnces are pretty rare, give that 6e2b7044c1992 was nearly two
> years ago.
>
> Did you consider the desirability of backporting this fix into earlier
> kernels?
Panic can occur on systems with multiple zones in a single pageblock.
The reason it is rare is that it only happens in special configurations.
Depending on how many similar systems there are, it may be a good idea to fix this problem for older kernels as well.
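For reference, a minimal sketch of the single-scan idea quoted above
(illustrative only, not the actual mm/compaction.c patch; it assumes the
usual fast_isolate_around() context with a struct compact_control *cc):

        /* clamp the pageblock around pfn to the current zone */
        unsigned long start_pfn = max(pageblock_start_pfn(pfn),
                                      cc->zone->zone_start_pfn);
        unsigned long end_pfn = min(pageblock_end_pfn(pfn),
                                    zone_end_pfn(cc->zone));

        /* one pass over [start_pfn, end_pfn) instead of scanning
         * "before" and "after" pfn separately
         */
        isolate_freepages_block(cc, &start_pfn, end_pfn,
                                &cc->freepages, 1, false);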
From: Xiubo Li <xiubli(a)redhat.com>
POSIX locks all use the same owner, which is the thread id, and
multiple POSIX locks could be merged into a single one, so checking
whether the 'file' has locks may fail.
A file where some openers use locking and others don't is a really
odd usage pattern though. Locks are like stoplights -- they only work
if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
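For illustration, plain POSIX record locks behave like this (hypothetical
userspace sketch; fd1 and fd2 are two opens of the same file by the same
process): the kernel coalesces adjacent locks held by the same owner into
one record on the inode, so a per-fd counter cannot tell which open
"holds" a lock:

        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 10 };

        fcntl(fd1, F_SETLK, &fl);       /* lock bytes [0,10) via fd1 */

        fl.l_start = 10;
        fcntl(fd2, F_SETLK, &fl);       /* lock bytes [10,20) via fd2;
                                         * the kernel merges this with the
                                         * first lock into a single [0,20)
                                         * record on the inode */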
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/caps.c | 2 +-
fs/ceph/locks.c | 4 ----
fs/ceph/super.h | 1 -
3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 065e9311b607..948136f81fc8 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2964,7 +2964,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
while (true) {
flags &= CEPH_FILE_MODE_MASK;
- if (atomic_read(&fi->num_locks))
+ if (vfs_inode_has_locks(inode))
flags |= CHECK_FILELOCK;
_got = 0;
ret = try_get_cap_refs(inode, need, want, endoff,
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 3e2843e86e27..b191426bf880 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -32,18 +32,14 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
- struct ceph_file_info *fi = dst->fl_file->private_data;
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
- atomic_inc(&fi->num_locks);
}
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct ceph_file_info *fi = fl->fl_file->private_data;
struct inode *inode = file_inode(fl->fl_file);
struct ceph_inode_info *ci = ceph_inode(inode);
- atomic_dec(&fi->num_locks);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 14454f464029..e7662ff6f149 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -804,7 +804,6 @@ struct ceph_file_info {
struct list_head rw_contexts;
u32 filp_gen;
- atomic_t num_locks;
};
struct ceph_dir_file_info {
--
2.31.1
The catch-all evict can fail due to object lock contention, since it
only goes as far as trylocking the object, due to us already holding the
vm->mutex. Doing a full object lock here can deadlock, since the
vm->mutex is always our inner lock. Add another execbuf pass which drops
the vm->mutex and then tries to grab the object with the full lock,
before retrying the eviction. This should be good enough for now to
fix the immediate regression with userspace seeing -ENOSPC from execbuf
due to contended object locks during GTT eviction.
Testcase: igt@gem_ppgtt@shrink-vs-evict-*
Fixes: 7e00897be8bf ("drm/i915: Add object locking to i915_gem_evict_for_node and i915_gem_evict_something, v2.")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7627
References: https://gitlab.freedesktop.org/drm/intel/-/issues/7570
References: https://bugzilla.mozilla.org/show_bug.cgi?id=1779558
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: Andrzej Hajda <andrzej.hajda(a)intel.com>
Cc: Mani Milani <mani(a)chromium.org>
Cc: <stable(a)vger.kernel.org> # v5.18+
---
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 25 +++++++++++--
drivers/gpu/drm/i915/gem/i915_gem_mman.c | 2 +-
drivers/gpu/drm/i915/i915_gem_evict.c | 37 ++++++++++++++-----
drivers/gpu/drm/i915/i915_gem_evict.h | 4 +-
drivers/gpu/drm/i915/i915_vma.c | 2 +-
.../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +-
6 files changed, 56 insertions(+), 18 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 86956b902c97..e2ce1e4e9723 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -745,25 +745,44 @@ static int eb_reserve(struct i915_execbuffer *eb)
*
* Defragmenting is skipped if all objects are pinned at a fixed location.
*/
- for (pass = 0; pass <= 2; pass++) {
+ for (pass = 0; pass <= 3; pass++) {
int pin_flags = PIN_USER | PIN_VALIDATE;
if (pass == 0)
pin_flags |= PIN_NONBLOCK;
if (pass >= 1)
- unpinned = eb_unbind(eb, pass == 2);
+ unpinned = eb_unbind(eb, pass >= 2);
if (pass == 2) {
err = mutex_lock_interruptible(&eb->context->vm->mutex);
if (!err) {
- err = i915_gem_evict_vm(eb->context->vm, &eb->ww);
+ err = i915_gem_evict_vm(eb->context->vm, &eb->ww, NULL);
mutex_unlock(&eb->context->vm->mutex);
}
if (err)
return err;
}
+ if (pass == 3) {
+retry:
+ err = mutex_lock_interruptible(&eb->context->vm->mutex);
+ if (!err) {
+ struct drm_i915_gem_object *busy_bo = NULL;
+
+ err = i915_gem_evict_vm(eb->context->vm, &eb->ww, &busy_bo);
+ mutex_unlock(&eb->context->vm->mutex);
+ if (err && busy_bo) {
+ err = i915_gem_object_lock(busy_bo, &eb->ww);
+ i915_gem_object_put(busy_bo);
+ if (!err)
+ goto retry;
+ }
+ }
+ if (err)
+ return err;
+ }
+
list_for_each_entry(ev, &eb->unbound, bind_link) {
err = eb_reserve_vma(eb, ev, pin_flags);
if (err)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index d73ba0f5c4c5..4f69bff63068 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -369,7 +369,7 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
if (vma == ERR_PTR(-ENOSPC)) {
ret = mutex_lock_interruptible(&ggtt->vm.mutex);
if (!ret) {
- ret = i915_gem_evict_vm(&ggtt->vm, &ww);
+ ret = i915_gem_evict_vm(&ggtt->vm, &ww, NULL);
mutex_unlock(&ggtt->vm.mutex);
}
if (ret)
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 4cfe36b0366b..c02ebd6900ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -441,6 +441,11 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
* @vm: Address space to cleanse
* @ww: An optional struct i915_gem_ww_ctx. If not NULL, i915_gem_evict_vm
* will be able to evict vma's locked by the ww as well.
+ * @busy_bo: Optional pointer to struct drm_i915_gem_object. If not NULL, then
+ * in the event i915_gem_evict_vm() is unable to trylock an object for eviction,
+ * then @busy_bo will point to it. -EBUSY is also returned. The caller must drop
+ * the vm->mutex, before trying again to acquire the contended lock. The caller
+ * also owns a reference to the object.
*
* This function evicts all vmas from a vm.
*
@@ -450,7 +455,8 @@ int i915_gem_evict_for_node(struct i915_address_space *vm,
* To clarify: This is for freeing up virtual address space, not for freeing
* memory in e.g. the shrinker.
*/
-int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
+int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww,
+ struct drm_i915_gem_object **busy_bo)
{
int ret = 0;
@@ -482,15 +488,22 @@ int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
* the resv is shared among multiple objects, we still
* need the object ref.
*/
- if (dying_vma(vma) ||
+ if (!i915_gem_object_get_rcu(vma->obj) ||
(ww && (dma_resv_locking_ctx(vma->obj->base.resv) == &ww->ctx))) {
__i915_vma_pin(vma);
list_add(&vma->evict_link, &locked_eviction_list);
continue;
}
- if (!i915_gem_object_trylock(vma->obj, ww))
+ if (!i915_gem_object_trylock(vma->obj, ww)) {
+ if (busy_bo) {
+ *busy_bo = vma->obj; /* holds ref */
+ ret = -EBUSY;
+ break;
+ }
+ i915_gem_object_put(vma->obj);
continue;
+ }
__i915_vma_pin(vma);
list_add(&vma->evict_link, &eviction_list);
@@ -498,25 +511,29 @@ int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
if (list_empty(&eviction_list) && list_empty(&locked_eviction_list))
break;
- ret = 0;
/* Unbind locked objects first, before unlocking the eviction_list */
list_for_each_entry_safe(vma, vn, &locked_eviction_list, evict_link) {
__i915_vma_unpin(vma);
- if (ret == 0)
+ if (ret == 0) {
ret = __i915_vma_unbind(vma);
- if (ret != -EINTR) /* "Get me out of here!" */
- ret = 0;
+ if (ret != -EINTR) /* "Get me out of here!" */
+ ret = 0;
+ }
+ if (!dying_vma(vma))
+ i915_gem_object_put(vma->obj);
}
list_for_each_entry_safe(vma, vn, &eviction_list, evict_link) {
__i915_vma_unpin(vma);
- if (ret == 0)
+ if (ret == 0) {
ret = __i915_vma_unbind(vma);
- if (ret != -EINTR) /* "Get me out of here!" */
- ret = 0;
+ if (ret != -EINTR) /* "Get me out of here!" */
+ ret = 0;
+ }
i915_gem_object_unlock(vma->obj);
+ i915_gem_object_put(vma->obj);
}
} while (ret == 0);
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.h b/drivers/gpu/drm/i915/i915_gem_evict.h
index e593c530f9bd..bf0ee0e4fe60 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.h
+++ b/drivers/gpu/drm/i915/i915_gem_evict.h
@@ -11,6 +11,7 @@
struct drm_mm_node;
struct i915_address_space;
struct i915_gem_ww_ctx;
+struct drm_i915_gem_object;
int __must_check i915_gem_evict_something(struct i915_address_space *vm,
struct i915_gem_ww_ctx *ww,
@@ -23,6 +24,7 @@ int __must_check i915_gem_evict_for_node(struct i915_address_space *vm,
struct drm_mm_node *node,
unsigned int flags);
int i915_gem_evict_vm(struct i915_address_space *vm,
- struct i915_gem_ww_ctx *ww);
+ struct i915_gem_ww_ctx *ww,
+ struct drm_i915_gem_object **busy_bo);
#endif /* __I915_GEM_EVICT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 34f0e6c923c2..7d044888ac33 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -1599,7 +1599,7 @@ static int __i915_ggtt_pin(struct i915_vma *vma, struct i915_gem_ww_ctx *ww,
* locked objects when called from execbuf when pinning
* is removed. This would probably regress badly.
*/
- i915_gem_evict_vm(vm, NULL);
+ i915_gem_evict_vm(vm, NULL, NULL);
mutex_unlock(&vm->mutex);
}
} while (1);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index 8c6517d29b8e..37068542aafe 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -344,7 +344,7 @@ static int igt_evict_vm(void *arg)
/* Everything is pinned, nothing should happen */
mutex_lock(&ggtt->vm.mutex);
- err = i915_gem_evict_vm(&ggtt->vm, NULL);
+ err = i915_gem_evict_vm(&ggtt->vm, NULL, NULL);
mutex_unlock(&ggtt->vm.mutex);
if (err) {
pr_err("i915_gem_evict_vm on a full GGTT returned err=%d]\n",
@@ -356,7 +356,7 @@ static int igt_evict_vm(void *arg)
for_i915_gem_ww(&ww, err, false) {
mutex_lock(&ggtt->vm.mutex);
- err = i915_gem_evict_vm(&ggtt->vm, &ww);
+ err = i915_gem_evict_vm(&ggtt->vm, &ww, NULL);
mutex_unlock(&ggtt->vm.mutex);
}
--
2.38.1
The patch titled
Subject: hfs: fix missing hfs_bnode_get() in __hfs_bnode_create
has been added to the -mm mm-nonmm-unstable branch. Its filename is
hfs-fix-missing-hfs_bnode_get-in-__hfs_bnode_create.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: hfs: fix missing hfs_bnode_get() in __hfs_bnode_create
Date: Mon, 12 Dec 2022 10:16:27 +0800
Syzbot found a kernel BUG in hfs_bnode_put():
kernel BUG at fs/hfs/bnode.c:466!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 3634 Comm: kworker/u4:5 Not tainted 6.1.0-rc7-syzkaller-00190-g97ee9d1c1696 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Workqueue: writeback wb_workfn (flush-7:0)
RIP: 0010:hfs_bnode_put+0x46f/0x480 fs/hfs/bnode.c:466
Code: 8a 80 ff e9 73 fe ff ff 89 d9 80 e1 07 80 c1 03 38 c1 0f 8c a0 fe ff ff 48 89 df e8 db 8a 80 ff e9 93 fe ff ff e8 a1 68 2c ff <0f> 0b e8 9a 68 2c ff 0f 0b 0f 1f 84 00 00 00 00 00 55 41 57 41 56
RSP: 0018:ffffc90003b4f258 EFLAGS: 00010293
RAX: ffffffff825e318f RBX: 0000000000000000 RCX: ffff8880739dd7c0
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffc90003b4f430 R08: ffffffff825e2d9b R09: ffffed10045157d1
R10: ffffed10045157d1 R11: 1ffff110045157d0 R12: ffff8880228abe80
R13: ffff88807016c000 R14: dffffc0000000000 R15: ffff8880228abe00
FS: 0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa6ebe88718 CR3: 000000001e93d000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
hfs_write_inode+0x1bc/0xb40
write_inode fs/fs-writeback.c:1440 [inline]
__writeback_single_inode+0x4d6/0x670 fs/fs-writeback.c:1652
writeback_sb_inodes+0xb3b/0x18f0 fs/fs-writeback.c:1878
__writeback_inodes_wb+0x125/0x420 fs/fs-writeback.c:1949
wb_writeback+0x440/0x7b0 fs/fs-writeback.c:2054
wb_check_start_all fs/fs-writeback.c:2176 [inline]
wb_do_writeback fs/fs-writeback.c:2202 [inline]
wb_workfn+0x827/0xef0 fs/fs-writeback.c:2235
process_one_work+0x877/0xdb0 kernel/workqueue.c:2289
worker_thread+0xb14/0x1330 kernel/workqueue.c:2436
kthread+0x266/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
</TASK>
The BUG_ON() is triggered at here:
/* Dispose of resources used by a node */
void hfs_bnode_put(struct hfs_bnode *node)
{
if (node) {
<skipped>
BUG_ON(!atomic_read(&node->refcnt)); <- we have issue here!!!!
<skipped>
}
}
By tracing the refcnt, I found that the node is created by hfs_bmap_alloc()
with refcnt 1. Then the node is used by hfs_btree_write(), but a
hfs_bnode_get() is missing after finding the node. The issue happens in
the following path:
<alloc>
hfs_bmap_alloc
hfs_bnode_find
__hfs_bnode_create <- allocate a new node with refcnt 1.
hfs_bnode_put <- decrease the refcnt
<write>
hfs_btree_write
hfs_bnode_find
__hfs_bnode_create
hfs_bnode_findhash <- find the node without refcnt increased.
hfs_bnode_put <- trigger the BUG_ON() since refcnt is 0.
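A minimal sketch of the invariant the fix below restores (illustrative,
not the exact fs/hfs code): a hash lookup that hands out a cached node
must take its own reference, because the caller will later drop one with
hfs_bnode_put():

        spin_lock(&tree->hash_lock);
        node2 = hfs_bnode_findhash(tree, cnid);
        if (node2)
                hfs_bnode_get(node2);   /* pairs with the caller's
                                         * hfs_bnode_put() */
        spin_unlock(&tree->hash_lock);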
Link: https://lkml.kernel.org/r/20221212021627.3766829-1-liushixin2@huawei.com
Reported-by: syzbot+5b04b49a7ec7226c7426(a)syzkaller.appspotmail.com
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Cc: Fabio M. De Francesco <fmdefrancesco(a)gmail.com>
Cc: Viacheslav Dubeyko <slava(a)dubeyko.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hfs/bnode.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/hfs/bnode.c~hfs-fix-missing-hfs_bnode_get-in-__hfs_bnode_create
+++ a/fs/hfs/bnode.c
@@ -274,6 +274,7 @@ static struct hfs_bnode *__hfs_bnode_cre
tree->node_hash[hash] = node;
tree->node_hash_cnt++;
} else {
+ hfs_bnode_get(node2);
spin_unlock(&tree->hash_lock);
kfree(node);
wait_event(node2->lock_wq, !test_bit(HFS_BNODE_NEW, &node2->flags));
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
hfs-fix-missing-hfs_bnode_get-in-__hfs_bnode_create.patch
The patch titled
Subject: hugetlb: really allocate vma lock for all sharable vmas
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-really-allocate-vma-lock-for-all-sharable-vmas.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: really allocate vma lock for all sharable vmas
Date: Mon, 12 Dec 2022 15:50:41 -0800
Commit bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
removed the pmd sharable checks in the vma lock helper routines. However,
it left the functional version of helper routines behind #ifdef
CONFIG_ARCH_WANT_HUGE_PMD_SHARE. Therefore, the vma lock is not being
used for sharable vmas on architectures that do not support pmd sharing.
On these architectures, a potential fault/truncation race is exposed that
could leave pages in a hugetlb file past i_size until the file is removed.
Move the functional vma lock helpers outside the ifdef, and remove the
non-functional stubs. Since the vma lock is not just for pmd sharing,
rename the routine __vma_shareable_flags_pmd.
Link: https://lkml.kernel.org/r/20221212235042.178355-1-mike.kravetz@oracle.com
Fixes: bbff39cc6cbc ("hugetlb: allocate vma lock for all sharable vmas")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Mina Almasry <almasrymina(a)google.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 333 +++++++++++++++++++++----------------------------
1 file changed, 148 insertions(+), 185 deletions(-)
--- a/mm/hugetlb.c~hugetlb-really-allocate-vma-lock-for-all-sharable-vmas
+++ a/mm/hugetlb.c
@@ -255,6 +255,152 @@ static inline struct hugepage_subpool *s
return subpool_inode(file_inode(vma->vm_file));
}
+/*
+ * hugetlb vma_lock helper routines
+ */
+static bool __vma_shareable_lock(struct vm_area_struct *vma)
+{
+ return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
+ vma->vm_private_data;
+}
+
+void hugetlb_vma_lock_read(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ down_read(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ up_read(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_lock_write(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ down_write(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ up_write(&vma_lock->rw_sema);
+ }
+}
+
+int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
+{
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ if (!__vma_shareable_lock(vma))
+ return 1;
+
+ return down_write_trylock(&vma_lock->rw_sema);
+}
+
+void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ lockdep_assert_held(&vma_lock->rw_sema);
+ }
+}
+
+void hugetlb_vma_lock_release(struct kref *kref)
+{
+ struct hugetlb_vma_lock *vma_lock = container_of(kref,
+ struct hugetlb_vma_lock, refs);
+
+ kfree(vma_lock);
+}
+
+static void __hugetlb_vma_unlock_write_put(struct hugetlb_vma_lock *vma_lock)
+{
+ struct vm_area_struct *vma = vma_lock->vma;
+
+ /*
+ * vma_lock structure may or not be released as a result of put,
+ * it certainly will no longer be attached to vma so clear pointer.
+ * Semaphore synchronizes access to vma_lock->vma field.
+ */
+ vma_lock->vma = NULL;
+ vma->vm_private_data = NULL;
+ up_write(&vma_lock->rw_sema);
+ kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
+}
+
+static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
+{
+ if (__vma_shareable_lock(vma)) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ __hugetlb_vma_unlock_write_put(vma_lock);
+ }
+}
+
+static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+{
+ /*
+ * Only present in sharable vmas.
+ */
+ if (!vma || !__vma_shareable_lock(vma))
+ return;
+
+ if (vma->vm_private_data) {
+ struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+
+ down_write(&vma_lock->rw_sema);
+ __hugetlb_vma_unlock_write_put(vma_lock);
+ }
+}
+
+static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+{
+ struct hugetlb_vma_lock *vma_lock;
+
+ /* Only establish in (flags) sharable vmas */
+ if (!vma || !(vma->vm_flags & VM_MAYSHARE))
+ return;
+
+ /* Should never get here with non-NULL vm_private_data */
+ if (vma->vm_private_data)
+ return;
+
+ vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
+ if (!vma_lock) {
+ /*
+ * If we can not allocate structure, then vma can not
+ * participate in pmd sharing. This is only a possible
+ * performance enhancement and memory saving issue.
+ * However, the lock is also used to synchronize page
+ * faults with truncation. If the lock is not present,
+ * unlikely races could leave pages in a file past i_size
+ * until the file is removed. Warn in the unlikely case of
+ * allocation failure.
+ */
+ pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
+ return;
+ }
+
+ kref_init(&vma_lock->refs);
+ init_rwsem(&vma_lock->rw_sema);
+ vma_lock->vma = vma;
+ vma->vm_private_data = vma_lock;
+}
+
/* Helper that removes a struct file_region from the resv_map cache and returns
* it for use.
*/
@@ -6616,7 +6762,8 @@ bool hugetlb_reserve_pages(struct inode
}
/*
- * vma specific semaphore used for pmd sharing synchronization
+ * vma specific semaphore used for pmd sharing and fault/truncation
+ * synchronization
*/
hugetlb_vma_lock_alloc(vma);
@@ -6872,149 +7019,6 @@ void adjust_range_if_pmd_sharing_possibl
*end = ALIGN(*end, PUD_SIZE);
}
-static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
-{
- return vma->vm_flags & (VM_MAYSHARE | VM_SHARED) &&
- vma->vm_private_data;
-}
-
-void hugetlb_vma_lock_read(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- down_read(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- up_read(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_lock_write(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- down_write(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- up_write(&vma_lock->rw_sema);
- }
-}
-
-int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
-{
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- if (!__vma_shareable_flags_pmd(vma))
- return 1;
-
- return down_write_trylock(&vma_lock->rw_sema);
-}
-
-void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- lockdep_assert_held(&vma_lock->rw_sema);
- }
-}
-
-void hugetlb_vma_lock_release(struct kref *kref)
-{
- struct hugetlb_vma_lock *vma_lock = container_of(kref,
- struct hugetlb_vma_lock, refs);
-
- kfree(vma_lock);
-}
-
-static void __hugetlb_vma_unlock_write_put(struct hugetlb_vma_lock *vma_lock)
-{
- struct vm_area_struct *vma = vma_lock->vma;
-
- /*
- * vma_lock structure may or not be released as a result of put,
- * it certainly will no longer be attached to vma so clear pointer.
- * Semaphore synchronizes access to vma_lock->vma field.
- */
- vma_lock->vma = NULL;
- vma->vm_private_data = NULL;
- up_write(&vma_lock->rw_sema);
- kref_put(&vma_lock->refs, hugetlb_vma_lock_release);
-}
-
-static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
-{
- if (__vma_shareable_flags_pmd(vma)) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- __hugetlb_vma_unlock_write_put(vma_lock);
- }
-}
-
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
-{
- /*
- * Only present in sharable vmas.
- */
- if (!vma || !__vma_shareable_flags_pmd(vma))
- return;
-
- if (vma->vm_private_data) {
- struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
-
- down_write(&vma_lock->rw_sema);
- __hugetlb_vma_unlock_write_put(vma_lock);
- }
-}
-
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
-{
- struct hugetlb_vma_lock *vma_lock;
-
- /* Only establish in (flags) sharable vmas */
- if (!vma || !(vma->vm_flags & VM_MAYSHARE))
- return;
-
- /* Should never get here with non-NULL vm_private_data */
- if (vma->vm_private_data)
- return;
-
- vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
- if (!vma_lock) {
- /*
- * If we can not allocate structure, then vma can not
- * participate in pmd sharing. This is only a possible
- * performance enhancement and memory saving issue.
- * However, the lock is also used to synchronize page
- * faults with truncation. If the lock is not present,
- * unlikely races could leave pages in a file past i_size
- * until the file is removed. Warn in the unlikely case of
- * allocation failure.
- */
- pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
- return;
- }
-
- kref_init(&vma_lock->refs);
- init_rwsem(&vma_lock->rw_sema);
- vma_lock->vma = vma;
- vma->vm_private_data = vma_lock;
-}
-
/*
* Search for a shareable pmd page for hugetlb. In any case calls pmd_alloc()
* and returns the corresponding pte. While this is not necessary for the
@@ -7103,47 +7107,6 @@ int huge_pmd_unshare(struct mm_struct *m
#else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
-void hugetlb_vma_lock_read(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_lock_write(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
-{
-}
-
-int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
-{
- return 1;
-}
-
-void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
-{
-}
-
-void hugetlb_vma_lock_release(struct kref *kref)
-{
-}
-
-static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
-{
-}
-
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
-{
-}
-
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
-{
-}
-
pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long addr, pud_t *pud)
{
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-really-allocate-vma-lock-for-all-sharable-vmas.patch
The patch titled
Subject: mm/uffd: fix pte marker when fork() without fork event
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-uffd-fix-pte-marker-when-fork-without-fork-event.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/uffd: fix pte marker when fork() without fork event
Date: Wed, 14 Dec 2022 15:04:52 -0500
Patch series "mm: Fixes on pte markers".
Patch 1 resolves the syzkiller report from Pengfei.
Patch 2 further hardens pte markers when used with the recent swapin error
markers. The major case is that we should persist a swapin error marker
after fork(), so the child shouldn't read a corrupted page.
This patch (of 2):
On fork(), dst_vma is not guaranteed to have VM_UFFD_WP even if the src
vma has it and has a pte marker installed. The warning, along with the
comment, is improper. The right thing is to inherit the pte marker when
needed, or to keep the dst pte empty.
A vague guess is that this happened by accident when a prior patch
introduced src/dst vma into this helper while the uffd-wp feature was
being developed, and I probably messed it up in the rebase; if we replace
dst_vma with src_vma, the warning and the comment both make sense.
Hugetlb already does exactly the right thing here
(copy_hugetlb_page_range()). Fix the general path.
Reproducer:
https://github.com/xupengfe/syzkaller_logs/blob/main/221208_115556_copy_pag…
Bugzilla report: https://bugzilla.kernel.org/show_bug.cgi?id=216808
Link: https://lkml.kernel.org/r/20221214200453.1772655-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20221214200453.1772655-2-peterx@redhat.com
Fixes: c56d1b62cce8 ("mm/shmem: handle uffd-wp during fork()")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Pengfei Xu <pengfei.xu(a)intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 5.19+
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
--- a/mm/memory.c~mm-uffd-fix-pte-marker-when-fork-without-fork-event
+++ a/mm/memory.c
@@ -828,12 +828,8 @@ copy_nonpresent_pte(struct mm_struct *ds
return -EBUSY;
return -ENOENT;
} else if (is_pte_marker_entry(entry)) {
- /*
- * We're copying the pgtable should only because dst_vma has
- * uffd-wp enabled, do sanity check.
- */
- WARN_ON_ONCE(!userfaultfd_wp(dst_vma));
- set_pte_at(dst_mm, addr, dst_pte, pte);
+ if (userfaultfd_wp(dst_vma))
+ set_pte_at(dst_mm, addr, dst_pte, pte);
return 0;
}
if (!userfaultfd_wp(dst_vma))
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-uffd-fix-pte-marker-when-fork-without-fork-event.patch
mm-fix-a-few-rare-cases-of-using-swapin-error-pte-marker.patch
The patch titled
Subject: kmsan: export kmsan_handle_urb
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-export-kmsan_handle_urb.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: kmsan: export kmsan_handle_urb
Date: Thu, 15 Dec 2022 17:26:57 +0100
USB support can be in a loadable module, and this causes a link failure
with KMSAN:
ERROR: modpost: "kmsan_handle_urb" [drivers/usb/core/usbcore.ko] undefined!
Export the symbol so it can be used by this module.
Link: https://lkml.kernel.org/r/20221215162710.3802378-1-arnd@kernel.org
Fixes: 553a80188a5d ("kmsan: handle memory sent to/from USB")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/hooks.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/kmsan/hooks.c~kmsan-export-kmsan_handle_urb
+++ a/mm/kmsan/hooks.c
@@ -260,6 +260,7 @@ void kmsan_handle_urb(const struct urb *
urb->transfer_buffer_length,
/*checked*/ false);
}
+EXPORT_SYMBOL_GPL(kmsan_handle_urb);
static void kmsan_handle_dma_page(const void *addr, size_t size,
enum dma_data_direction dir)
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kmsan-include-linux-vmalloch.patch
kmsan-export-kmsan_handle_urb.patch
The patch titled
Subject: kmsan: include linux/vmalloc.h
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-include-linux-vmalloch.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: kmsan: include linux/vmalloc.h
Date: Thu, 15 Dec 2022 17:30:17 +0100
This is needed for the vmap/vunmap declarations:
mm/kmsan/kmsan_test.c:316:9: error: implicit declaration of function 'vmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:316:29: error: use of undeclared identifier 'VM_MAP'
vbuf = vmap(pages, npages, VM_MAP, PAGE_KERNEL);
^
mm/kmsan/kmsan_test.c:322:3: error: implicit declaration of function 'vunmap' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
vunmap(vbuf);
^
Link: https://lkml.kernel.org/r/20221215163046.4079767-1-arnd@kernel.org
Fixes: 8ed691b02ade ("kmsan: add tests for KMSAN")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/kmsan_test.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/kmsan/kmsan_test.c~kmsan-include-linux-vmalloch
+++ a/mm/kmsan/kmsan_test.c
@@ -22,6 +22,7 @@
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/tracepoint.h>
+#include <linux/vmalloc.h>
#include <trace/events/printk.h>
static DEFINE_PER_CPU(int, per_cpu_var);
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kmsan-include-linux-vmalloch.patch
The patch titled
Subject: mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-fix-memory-leak-in-set_mempolicy_home_node-system-call.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Subject: mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
Date: Thu, 15 Dec 2022 14:46:21 -0500
When encountering any vma in the range with policy other than MPOL_BIND or
MPOL_PREFERRED_MANY, an error is returned without issuing a mpol_put on
the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
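As a rough illustration of the leak path (a hedged sketch, not a tested
exploit; the syscall number is the x86-64 one, error handling is omitted,
and mbind() comes from libnuma, so link with -lnuma): any vma carrying, say,
an MPOL_PREFERRED policy makes the syscall take the -EOPNOTSUPP exit after
mpol_dup(), so looping on it leaked one struct mempolicy per call before
this fix.

#include <numaif.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef __NR_set_mempolicy_home_node
#define __NR_set_mempolicy_home_node 450        /* x86-64 */
#endif

int main(void)
{
        size_t len = 4096;
        unsigned long nodemask = 1;             /* prefer node 0 */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        /* MPOL_PREFERRED is neither MPOL_BIND nor MPOL_PREFERRED_MANY, so
         * set_mempolicy_home_node() bails out with -EOPNOTSUPP after having
         * duplicated the vma policy with mpol_dup(). */
        mbind(p, len, MPOL_PREFERRED, &nodemask, 2, 0);

        for (;;)        /* each call leaked one struct mempolicy */
                syscall(__NR_set_mempolicy_home_node,
                        (unsigned long)p, len, 0UL, 0UL);
}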
Link: https://lkml.kernel.org/r/20221215194621.202816-1-mathieu.desnoyers@efficio…
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mempolicy.c~mm-mempolicy-fix-memory-leak-in-set_mempolicy_home_node-system-call
+++ a/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node,
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
_
Patches currently in -mm which might be from mathieu.desnoyers(a)efficios.com are
mm-mempolicy-fix-memory-leak-in-set_mempolicy_home_node-system-call.patch
When encountering any vma in the range with policy other than MPOL_BIND
or MPOL_PREFERRED_MANY, an error is returned without issuing a mpol_put
on the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky(a)intel.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <linux-api(a)vger.kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org # 5.17+
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 61aa9aedb728..02c8a712282f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
--
2.25.1
When encountering any vma in the range with policy other than MPOL_BIND
or MPOL_PREFERRED_MANY, an error is returned without issuing a mpol_put
on the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Reviewed-by: Randy Dunlap <rdunlap(a)infradead.org>
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Feng Tang <feng.tang(a)intel.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: <linux-api(a)vger.kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org # 5.17+
---
mm/mempolicy.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 61aa9aedb728..02c8a712282f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1540,6 +1540,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
* the home node for vmas we already updated before.
*/
if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) {
+ mpol_put(new);
err = -EOPNOTSUPP;
break;
}
--
2.25.1
HP EliteBook 865 supports both the AMD GUID w/ _REV 2 and the Microsoft
GUID with _REV 0. Both have very similar code, but the AMD GUID
has a special workaround that is specific to a problem with
spurious wakeups on systems with Qualcomm WLAN.
This is believed to be a bug in the Qualcomm WLAN F/W (it doesn't
affect any other WLAN H/W). If this WLAN firmware is fixed, this
quirk can be dropped.
Cc: stable(a)vger.kernel.org # 6.1
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/acpi/x86/s2idle.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
index 5350c73564b6..422415cb14f4 100644
--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -401,6 +401,13 @@ static const struct acpi_device_id amd_hid_ids[] = {
{}
};
+static int lps0_prefer_amd(const struct dmi_system_id *id)
+{
+ pr_debug("Using AMD GUID w/ _REV 2.\n");
+ rev_id = 2;
+ return 0;
+}
+
static int lps0_prefer_microsoft(const struct dmi_system_id *id)
{
pr_debug("Preferring Microsoft GUID.\n");
@@ -462,6 +469,19 @@ static const struct dmi_system_id s2idle_dmi_table[] __initconst = {
DMI_MATCH(DMI_PRODUCT_NAME, "ROG Flow X16 GV601"),
},
},
+ {
+ /*
+ * AMD Rembrandt based HP EliteBook 835/845/865 G9
+ * Contains specialized AML in AMD/_REV 2 path to avoid
+ * triggering a bug in Qualcomm WLAN firmware. This may be
+ * removed in the future if that firmware is fixed.
+ */
+ .callback = lps0_prefer_amd,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "HP"),
+ DMI_MATCH(DMI_BOARD_NAME, "8990"),
+ },
+ },
{}
};
--
2.34.1
Hello:
This patch was applied to netdev/net.git (master)
by Paolo Abeni <pabeni(a)redhat.com>:
On Wed, 14 Dec 2022 10:51:18 +0000 you wrote:
> This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch
> device to config mode" during unbind.
>
> We are doing register access after pm_runtime_put_sync().
>
> We usually do cleanup in reverse order of init. Currently in
> remove(), the "pm_runtime_put_sync" is not in reverse order.
>
> [...]
Here is the summary with links:
- [net,v2] ravb: Fix "failed to switch device to config mode" message during unbind
https://git.kernel.org/netdev/net/c/c72a7e42592b
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
When updating the operating mode as part of regulator enable, the caller
has already locked the regulator tree and drms_uA_update() must not try
to do the same in order not to trigger a deadlock.
The lock inversion is reported by lockdep as:
======================================================
WARNING: possible circular locking dependency detected
6.1.0-next-20221215 #142 Not tainted
------------------------------------------------------
udevd/154 is trying to acquire lock:
ffffc11f123d7e50 (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x54/0x280
but task is already holding lock:
ffff80000e4c36e8 (regulator_ww_class_acquire){+.+.}-{0:0}, at: regulator_enable+0x34/0x80
which lock already depends on the new lock.
...
Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(regulator_ww_class_acquire);
                               lock(regulator_list_mutex);
                               lock(regulator_ww_class_acquire);
  lock(regulator_list_mutex);
 *** DEADLOCK ***
This splat is reported just before probe of a Qualcomm UFS controller,
which (occasionally) deadlocks when enabling one of its regulators.
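For readers tracing the report, a hedged sketch of the call chain behind it
(reconstructed from drivers/regulator/core.c; intermediate helpers are
elided and may differ between kernel versions):

/*
 * Rough shape of the inversion behind the splat (a sketch, not verbatim
 * kernel code):
 *
 *   regulator_enable()
 *     regulator_lock_dependent()      locks regulator_list_mutex, then the
 *                                     regulator_ww_class tree locks, then
 *                                     drops regulator_list_mutex
 *       _regulator_enable()
 *         drms_uA_update()
 *           regulator_get_voltage()   calls regulator_lock_dependent()
 *                                     again: regulator_list_mutex is taken
 *                                     while the ww class is already held
 *
 * regulator_get_voltage_rdev() reads the voltage without re-acquiring the
 * dependent locks, which is safe because the caller already holds them;
 * hence the one-line change below.
 */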
Fixes: 9243a195be7a ("regulator: core: Change voltage setting path")
Fixes: f8702f9e4aa7 ("regulator: core: Use ww_mutex for regulators locking")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/regulator/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 729c45393803..ae69e493913d 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -1002,7 +1002,7 @@ static int drms_uA_update(struct regulator_dev *rdev)
/* get input voltage */
input_uV = 0;
if (rdev->supply)
- input_uV = regulator_get_voltage(rdev->supply);
+ input_uV = regulator_get_voltage_rdev(rdev->supply->rdev);
if (input_uV <= 0)
input_uV = rdev->constraints->input_uV;
--
2.37.4
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
46307fd6e27a ("cgroup: Reorganize css_set_lock and kernfs path processing")
4534dee94105 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id")
74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
e210a89f5b07 ("cgroup.c: add helper __cset_cgroup_from_root to cleanup duplicated codes")
be288169712f ("cgroup: reduce dependency on cgroup_mutex")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46307fd6e27a3f678a1678b02e667678c22aa8cc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny(a)suse.com>
Date: Mon, 10 Oct 2022 10:29:18 +0200
Subject: [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
kernfs_walk_and_get.
We only need to make sure that the returned root_cgrp won't be freed
under us. This is given in the case of global root because it is static
(cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
is pinned by cgroup_ns->root_cset (and `current` task cannot switch
namespace asynchronously so ns_proxy pins cgroup_ns).
Note this reasoning won't hold for root cgroups in v1 hierarchies,
therefore create a special-cased helper function just for the default
hierarchy.
Fixes: 74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 764bdd5fd8d1..ecf409e3c3a7 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1392,6 +1392,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
cgroup_free_root(root);
}
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
@@ -1403,6 +1406,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
res_cgroup = cset->dfl_cgrp;
} else {
struct cgrp_cset_link *link;
+ lockdep_assert_held(&css_set_lock);
list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
struct cgroup *c = link->cgrp;
@@ -1414,6 +1418,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
}
}
+ BUG_ON(!res_cgroup);
return res_cgroup;
}
@@ -1436,23 +1441,36 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
rcu_read_unlock();
- BUG_ON(!res);
return res;
}
+/*
+ * Look up cgroup associated with current task's cgroup namespace on the default
+ * hierarchy.
+ *
+ * Unlike current_cgns_cgroup_from_root(), this doesn't need locks:
+ * - Internal rcu_read_lock is unnecessary because we don't dereference any rcu
+ * pointers.
+ * - css_set_lock is not needed because we just read cset->dfl_cgrp.
+ * - As a bonus returned cgrp is pinned with the current because it cannot
+ * switch cgroup_ns asynchronously.
+ */
+static struct cgroup *current_cgns_cgroup_dfl(void)
+{
+ struct css_set *cset;
+
+ cset = current->nsproxy->cgroup_ns->root_cset;
+ return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
+}
+
/* look up cgroup associated with given css_set on the specified hierarchy */
static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
- struct cgroup *res = NULL;
-
lockdep_assert_held(&cgroup_mutex);
lockdep_assert_held(&css_set_lock);
- res = __cset_cgroup_from_root(cset, root);
-
- BUG_ON(!res);
- return res;
+ return __cset_cgroup_from_root(cset, root);
}
/*
@@ -6105,9 +6123,7 @@ struct cgroup *cgroup_get_from_id(u64 id)
if (!cgrp)
return ERR_PTR(-ENOENT);
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
- spin_unlock_irq(&css_set_lock);
+ root_cgrp = current_cgns_cgroup_dfl();
if (!cgroup_is_descendant(cgrp, root_cgrp)) {
cgroup_put(cgrp);
return ERR_PTR(-ENOENT);
@@ -6686,10 +6702,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
struct cgroup *cgrp = ERR_PTR(-ENOENT);
struct cgroup *root_cgrp;
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
+ root_cgrp = current_cgns_cgroup_dfl();
kn = kernfs_walk_and_get(root_cgrp->kn, path);
- spin_unlock_irq(&css_set_lock);
if (!kn)
goto out;
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
46307fd6e27a ("cgroup: Reorganize css_set_lock and kernfs path processing")
4534dee94105 ("cgroup: cgroup: Honor caller's cgroup NS when resolving cgroup id")
74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46307fd6e27a3f678a1678b02e667678c22aa8cc Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Michal=20Koutn=C3=BD?= <mkoutny(a)suse.com>
Date: Mon, 10 Oct 2022 10:29:18 +0200
Subject: [PATCH] cgroup: Reorganize css_set_lock and kernfs path processing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The commit 74e4b956eb1c incorrectly wrapped kernfs_walk_and_get
(might_sleep) under css_set_lock (spinlock). css_set_lock is needed by
__cset_cgroup_from_root to ensure stable cset->cgrp_links but not for
kernfs_walk_and_get.
We only need to make sure that the returned root_cgrp won't be freed
under us. This is given in the case of global root because it is static
(cgrp_dfl_root.cgrp). When the root_cgrp is lower in the hierarchy, it
is pinned by cgroup_ns->root_cset (and `current` task cannot switch
namespace asynchronously so ns_proxy pins cgroup_ns).
Note this reasoning won't hold for root cgroups in v1 hierarchies,
therefore create a special-cased helper function just for the default
hierarchy.
Fixes: 74e4b956eb1c ("cgroup: Honor caller's cgroup NS when resolving path")
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 764bdd5fd8d1..ecf409e3c3a7 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1392,6 +1392,9 @@ static void cgroup_destroy_root(struct cgroup_root *root)
cgroup_free_root(root);
}
+/*
+ * Returned cgroup is without refcount but it's valid as long as cset pins it.
+ */
static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
@@ -1403,6 +1406,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
res_cgroup = cset->dfl_cgrp;
} else {
struct cgrp_cset_link *link;
+ lockdep_assert_held(&css_set_lock);
list_for_each_entry(link, &cset->cgrp_links, cgrp_link) {
struct cgroup *c = link->cgrp;
@@ -1414,6 +1418,7 @@ static inline struct cgroup *__cset_cgroup_from_root(struct css_set *cset,
}
}
+ BUG_ON(!res_cgroup);
return res_cgroup;
}
@@ -1436,23 +1441,36 @@ current_cgns_cgroup_from_root(struct cgroup_root *root)
rcu_read_unlock();
- BUG_ON(!res);
return res;
}
+/*
+ * Look up cgroup associated with current task's cgroup namespace on the default
+ * hierarchy.
+ *
+ * Unlike current_cgns_cgroup_from_root(), this doesn't need locks:
+ * - Internal rcu_read_lock is unnecessary because we don't dereference any rcu
+ * pointers.
+ * - css_set_lock is not needed because we just read cset->dfl_cgrp.
+ * - As a bonus returned cgrp is pinned with the current because it cannot
+ * switch cgroup_ns asynchronously.
+ */
+static struct cgroup *current_cgns_cgroup_dfl(void)
+{
+ struct css_set *cset;
+
+ cset = current->nsproxy->cgroup_ns->root_cset;
+ return __cset_cgroup_from_root(cset, &cgrp_dfl_root);
+}
+
/* look up cgroup associated with given css_set on the specified hierarchy */
static struct cgroup *cset_cgroup_from_root(struct css_set *cset,
struct cgroup_root *root)
{
- struct cgroup *res = NULL;
-
lockdep_assert_held(&cgroup_mutex);
lockdep_assert_held(&css_set_lock);
- res = __cset_cgroup_from_root(cset, root);
-
- BUG_ON(!res);
- return res;
+ return __cset_cgroup_from_root(cset, root);
}
/*
@@ -6105,9 +6123,7 @@ struct cgroup *cgroup_get_from_id(u64 id)
if (!cgrp)
return ERR_PTR(-ENOENT);
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
- spin_unlock_irq(&css_set_lock);
+ root_cgrp = current_cgns_cgroup_dfl();
if (!cgroup_is_descendant(cgrp, root_cgrp)) {
cgroup_put(cgrp);
return ERR_PTR(-ENOENT);
@@ -6686,10 +6702,8 @@ struct cgroup *cgroup_get_from_path(const char *path)
struct cgroup *cgrp = ERR_PTR(-ENOENT);
struct cgroup *root_cgrp;
- spin_lock_irq(&css_set_lock);
- root_cgrp = current_cgns_cgroup_from_root(&cgrp_dfl_root);
+ root_cgrp = current_cgns_cgroup_dfl();
kn = kernfs_walk_and_get(root_cgrp->kn, path);
- spin_unlock_irq(&css_set_lock);
if (!kn)
goto out;
Dear Stable,
[NB: Re-poking Stable with the correct contact address this time! :)]
> > > area_cache_get() is used to distribute cache->area and set cache->id,
> > > and if cache->id is not 0 and cache->area->kref refcount is 0, it will
> > > release the cache->area by nfp_cpp_area_release(). area_cache_get()
> > > set cache->id before cpp->op->area_init() and nfp_cpp_area_acquire().
> > >
> > > But if area_init() or nfp_cpp_area_acquire() fails, the cache->id is
> > > is already set but the refcount is not increased as expected. At this
> > > time, calling the nfp_cpp_area_release() will cause use-after-free.
> > >
> > > To avoid the use-after-free, set cache->id after area_init() and
> > > nfp_cpp_area_acquire() complete successfully.
> > >
> > > Note: This vulnerability is triggerable by providing an emulated
> > > device equipped with a specific configuration.
> > >
> > > BUG: KASAN: use-after-free in nfp6000_area_init (/home/user/Kernel/v5.19
> > > /x86_64/src/drivers/net/ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
> > > Write of size 4 at addr ffff888005b7f4a0 by task swapper/0/1
> > >
> > > Call Trace:
> > > <TASK>
> > > nfp6000_area_init (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > > /ethernet/netronome/nfp/nfpcore/nfp6000_pcie.c:760)
> > > area_cache_get.constprop.8 (/home/user/Kernel/v5.19/x86_64/src/drivers
> > > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:884)
> > >
> > > Allocated by task 1:
> > > nfp_cpp_area_alloc_with_name (/home/user/Kernel/v5.19/x86_64/src/drivers
> > > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:303)
> > > nfp_cpp_area_cache_add (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > > /ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:802)
> > > nfp6000_init (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfpcore/nfp6000_pcie.c:1230)
> > > nfp_cpp_from_operations (/home/user/Kernel/v5.19/x86_64/src/drivers/net
> > > /ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:1215)
> > > nfp_pci_probe (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfp_main.c:744)
> > >
> > > Freed by task 1:
> > > kfree (/home/user/Kernel/v5.19/x86_64/src/mm/slub.c:4562)
> > > area_cache_get.constprop.8 (/home/user/Kernel/v5.19/x86_64/src/drivers
> > > /net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:873)
> > > nfp_cpp_read (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfpcore/nfp_cppcore.c:924 /home/user/Kernel/v5.19/x86_64
> > > /src/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_cppcore.c:973)
> > > nfp_cpp_readl (/home/user/Kernel/v5.19/x86_64/src/drivers/net/ethernet
> > > /netronome/nfp/nfpcore/nfp_cpplib.c:48)
> > >
> > > Signed-off-by: Jialiang Wang <wangjialiang0806(a)163.com>
> >
> > Any reason why this doesn't have a Fixes: tag applied and/or didn't
> > get sent to Stable?
> >
> > Looks as if this needs to go back as far as v4.19.
> >
> > Fixes: 4cb584e0ee7df ("nfp: add CPP access core")
> >
> > commit 02e1a114fdb71e59ee6770294166c30d437bf86a upstream.
Would you be able to take this with the information provided please?
--
Lee Jones [李琼斯]
From: Miklos Szeredi <mszeredi(a)redhat.com>
commit df8629af293493757beccac2d3168fe5a315636e upstream.
Failure to revalidate the dentry on exclusive create may result in EEXIST
even if the file only exists in the cache and not in the filesystem.
The atomic nature of O_EXCL mandates that the cached state should be
ignored and existence verified anew.
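As a hedged userspace illustration of the symptom (the mount path is
hypothetical and assumes the file was removed on the backing store behind
the kernel's back, leaving only a stale positive dentry in the cache):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
        /* "/mnt/fuse/f" no longer exists in the filesystem, but its dentry
         * may still sit in the dcache. Before this fix, the stale dentry
         * was trusted and O_CREAT|O_EXCL failed with EEXIST instead of
         * revalidating and creating the file. */
        int fd = open("/mnt/fuse/f", O_CREAT | O_EXCL | O_WRONLY, 0600);

        if (fd < 0 && errno == EEXIST)
                printf("spurious EEXIST from a cached-only dentry\n");
        return 0;
}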
Change-Id: I0f173de6f9f1af05d6e816246b5c56b670ec079c
Reported-by: Ken Schalk <kschalk(a)nvidia.com>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Wu Bo <bo.wu(a)vivo.com>
---
fs/fuse/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 8e95a75a4559..80a9e50392a0 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -205,7 +205,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
if (inode && fuse_is_bad(inode))
goto invalid;
else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
- (flags & LOOKUP_REVAL)) {
+ (flags & (LOOKUP_EXCL | LOOKUP_REVAL))) {
struct fuse_entry_out outarg;
FUSE_ARGS(args);
struct fuse_forget_link *forget;
--
2.35.3
Greg,
The recent history of copy_file_range() API is somewhat convoluted.
The API changes are documented in copy_file_range(2) man page.
I've just posted a man page update patch [1] to fix some wrong kernel
version references in the man page.
The problem is that it took many kernel releases to get reports on the
regression from v5.3 and yet more releases (v5.12..v5.19) to work on
the solution and get it merged.
This situation leads to confusion among users as can be seen in [2].
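For context, a minimal user of the API in question (a sketch; paths come
from argv, only a single 1 MiB chunk is copied, and the version-specific
semantics are deliberately left to the man page being fixed in [1]):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int in = open(argv[1], O_RDONLY);
        int out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        ssize_t n;

        /* Depending on the kernel version, a cross-filesystem copy either
         * works via a kernel fallback or fails with EXDEV; robust callers
         * must be prepared to fall back to plain read()/write(), which is
         * exactly the confusion described above. */
        n = copy_file_range(in, NULL, out, NULL, 1 << 20, 0);
        if (n < 0)
                perror("copy_file_range");
        return n < 0;
}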
You've already picked the patch [1/2] to 5.15.y and I sent you a
request to pick patch [2/2] (from v6.1) as well.
Following are backports of the two patches to 5.10.y, which I verified
with the relevant test in fstests.
Backport to 5.4.y is more challenging because Darrick did a lot of
cleanup in that area between v5.4..v5.10, so I did not do it.
Thanks,
Amir.
[1] https://lore.kernel.org/linux-fsdevel/20221213120834.948163-1-amir73il@gmai…
[2] https://bugzilla.kernel.org/show_bug.cgi?id=216800
Amir Goldstein (2):
vfs: fix copy_file_range() regression in cross-fs copies
vfs: fix copy_file_range() averts filesystem freeze protection
fs/nfsd/vfs.c | 8 ++++-
fs/read_write.c | 90 +++++++++++++++++++++++++++++++++---------------------
include/linux/fs.h | 8 +++++
3 files changed, 70 insertions(+), 36 deletions(-)
--
2.16.5
v5.11 changed the blkdev lookup mechanism completely with commit
22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get"),
and a small part of the change is to unhash the partition's bdev inode
when deleting the partition. It turns out this change also fixes one
nasty issue in case of BLOCK_EXT_MAJOR:
1) when one partition is deleted & closed, disk_put_part() is always
called before bdput(bdev), see blkdev_put(); so the part's devt can
be freed & re-used before the inode is dropped
2) then a new partition with the same devt can be created just before the
inode in 1) is dropped, so the old inode/bdev structure in 1) is
re-used for this new partition, which causes a use-after-free and
kernel panic.
It isn't possible to backport the whole big patchset of "merge struct
block_device and struct hd_struct v4" for addressing this issue.
https://lore.kernel.org/linux-block/20201128161510.347752-1-hch@lst.de/
So fix it by unhashing the partition's bdev inode in delete_partition();
this is actually aligned with the v5.11+ behavior.
Backported from the following 5.10.y commit:
5f2f77560591 ("block: unhash blkdev part inode when the part is deleted")
Reported-by: Shiwei Cui <cuishw(a)inspur.com>
Tested-by: Shiwei Cui <cuishw(a)inspur.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Jan Kara <jack(a)suse.cz>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/partition-generic.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/block/partition-generic.c b/block/partition-generic.c
index 298c05f8b5e3..434c122cb958 100644
--- a/block/partition-generic.c
+++ b/block/partition-generic.c
@@ -254,6 +254,7 @@ void delete_partition(struct gendisk *disk, int partno)
{
struct disk_part_tbl *ptbl = disk->part_tbl;
struct hd_struct *part;
+ struct block_device *bdev;
if (partno >= ptbl->len)
return;
@@ -267,6 +268,11 @@ void delete_partition(struct gendisk *disk, int partno)
kobject_put(part->holder_dir);
device_del(part_to_dev(part));
+ bdev = bdget(part_devt(part));
+ if (bdev) {
+ remove_inode_hash(bdev->bd_inode);
+ bdput(bdev);
+ }
hd_struct_kill(part);
}
--
2.38.1
A bad bug in clang's implementation of -fzero-call-used-regs can result
in NULL pointer dereferences (see the links above the check for more
information). Restrict CONFIG_CC_HAS_ZERO_CALL_USED_REGS to either a
supported GCC version or a clang newer than 15.0.6, which will catch
both a theoretical 15.0.7 and the upcoming 16.0.0, which will both have
the bug fixed.
Cc: stable(a)vger.kernel.org # v5.15+
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
security/Kconfig.hardening | 3 +++
1 file changed, 3 insertions(+)
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index d766b7d0ffd1..53baa95cb644 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -257,6 +257,9 @@ config INIT_ON_FREE_DEFAULT_ON
config CC_HAS_ZERO_CALL_USED_REGS
def_bool $(cc-option,-fzero-call-used-regs=used-gpr)
+ # https://github.com/ClangBuiltLinux/linux/issues/1766
+ # https://github.com/llvm/llvm-project/issues/59242
+ depends on !CC_IS_CLANG || CLANG_VERSION > 150006
config ZERO_CALL_USED_REGS
bool "Enable register zeroing on function exit"
base-commit: 830b3c68c1fb1e9176028d02ef86f3cf76aa2476
--
2.39.0
Commit c9bfcb315104 ("spi_mpc83xx: much improved driver") made
modifications to the driver to not perform speed changes while
chipselect is active. But those changes were lost with the
conversion to transfer_one.
The previous implementation allowed speed changes during
message transfer when the cs_change flag was set.
For the time being, core SPI does not provide any feature to change
speed while chipselect is off, so do not allow any speed change during
message transfer, and perform the transfer setup in prepare_message
in order to set the correct speed while chipselect is still off.
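As a hedged example of what this means for consumers (hypothetical client
code, names invented; not part of this patch): a message whose transfers
disagree on speed_hz is now rejected in prepare_message rather than
glitching the clock mid-message.

#include <linux/spi/spi.h>

static int demo_two_speed_message(struct spi_device *spi,
                                  const u8 *cmd, const u8 *data)
{
        struct spi_transfer xfers[2] = {
                { .tx_buf = cmd,  .len = 1,  .speed_hz = 1000000,
                  .cs_change = 1 },
                { .tx_buf = data, .len = 16, .speed_hz = 4000000 },
        };
        struct spi_message msg;

        spi_message_init_with_transfers(&msg, xfers, ARRAY_SIZE(xfers));

        /* On fsl_spi, prepare_message now rejects the mismatched speed_hz
         * with -EINVAL instead of changing speed while a transfer with
         * cs_change keeps chipselect asserted. */
        return spi_sync(spi, &msg);
}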
Reported-by: Herve Codina <herve.codina(a)bootlin.com>
Fixes: 64ca1a034f00 ("spi: fsl_spi: Convert to transfer_one")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Tested-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/spi/spi-fsl-spi.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-fsl-spi.c b/drivers/spi/spi-fsl-spi.c
index 731624f157fc..93152144fd2e 100644
--- a/drivers/spi/spi-fsl-spi.c
+++ b/drivers/spi/spi-fsl-spi.c
@@ -333,13 +333,26 @@ static int fsl_spi_prepare_message(struct spi_controller *ctlr,
{
struct mpc8xxx_spi *mpc8xxx_spi = spi_controller_get_devdata(ctlr);
struct spi_transfer *t;
+ struct spi_transfer *first;
+
+ first = list_first_entry(&m->transfers, struct spi_transfer,
+ transfer_list);
/*
* In CPU mode, optimize large byte transfers to use larger
* bits_per_word values to reduce number of interrupts taken.
+ *
+ * Some glitches can appear on the SPI clock when the mode changes.
+ * Check that there is no speed change during the transfer and set it up
+ * now to change the mode without having a chip-select asserted.
*/
- if (!(mpc8xxx_spi->flags & SPI_CPM_MODE)) {
- list_for_each_entry(t, &m->transfers, transfer_list) {
+ list_for_each_entry(t, &m->transfers, transfer_list) {
+ if (t->speed_hz != first->speed_hz) {
+ dev_err(&m->spi->dev,
+ "speed_hz cannot change during message.\n");
+ return -EINVAL;
+ }
+ if (!(mpc8xxx_spi->flags & SPI_CPM_MODE)) {
if (t->len < 256 || t->bits_per_word != 8)
continue;
if ((t->len & 3) == 0)
@@ -348,7 +361,7 @@ static int fsl_spi_prepare_message(struct spi_controller *ctlr,
t->bits_per_word = 16;
}
}
- return 0;
+ return fsl_spi_setup_transfer(m->spi, first);
}
static int fsl_spi_transfer_one(struct spi_controller *controller,
--
2.38.1
From: Baolin Wang <baolin.wang(a)linux.alibaba.com>
[ Upstream commit fac35ba763ed07ba93154c95ffc0c4a55023707f ]
Some architectures (like ARM64) can support CONT-PTE/PMD size hugetlb,
which means they can support not only PMD/PUD size hugetlb (2M and
1G), but also CONT-PTE/PMD sizes (64K and 32M) if a 4K page size is
specified.
So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
hugetlb in follow_page_pte(). However, this pte entry lock is incorrect
for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
the correct lock, which is mm->page_table_lock.
That means the pte entry of the CONT-PTE size hugetlb under the current pte
lock is unstable in follow_page_pte(): we can continue to migrate or
poison the pte entry of the CONT-PTE size hugetlb, which can cause some
potential race issues, even though they are under the 'pte lock'.
For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
page by the move_pages() syscall under the lock, while another thread B
migrates the CONT-PTE hugetlb page at the same time, which will cause
thread A to get an incorrect page; if thread A also wants to do page
migration, then a data inconsistency error occurs.
Moreover we have the same issue for CONT-PMD size hugetlb in
follow_huge_pmd().
To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
get the correct pte entry lock to make the pte entry stable.
Mike said:
Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb:
refactor find_num_contig()"). Patch series "Support for contiguous pte
hugepages", v4. However, I do not believe these code paths were
executed until migration support was added with 5480280d3f2d ("arm64/mm:
enable HugeTLB migration for contiguous bit HugeTLB pages") I would go
with 5480280d3f2d for the Fixes: target.
Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.16620175…
Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Suggested-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[5.4: Fixup contextual diffs before pin_user_pages()]
Signed-off-by: Samuel Mendoza-Jonas <samjonas(a)amazon.com>
---
v2: Fix up stray out label that is now unused.
include/linux/hugetlb.h | 6 +++---
mm/gup.c | 13 ++++++++++++-
mm/hugetlb.c | 30 +++++++++++++++---------------
3 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index cef70d6e1657..c7507135c2a0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -127,8 +127,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address,
struct page *follow_huge_pd(struct vm_area_struct *vma,
unsigned long address, hugepd_t hpd,
int flags, int pdshift);
-struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags);
+struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address,
+ int flags);
struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags);
struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
@@ -175,7 +175,7 @@ static inline void hugetlb_show_meminfo(void)
{
}
#define follow_huge_pd(vma, addr, hpd, flags, pdshift) NULL
-#define follow_huge_pmd(mm, addr, pmd, flags) NULL
+#define follow_huge_pmd_pte(vma, addr, flags) NULL
#define follow_huge_pud(mm, addr, pud, flags) NULL
#define follow_huge_pgd(mm, addr, pgd, flags) NULL
#define prepare_hugepage_range(file, addr, len) (-EINVAL)
diff --git a/mm/gup.c b/mm/gup.c
index 3ef769529548..98bd5e0a51c8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -188,6 +188,17 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
spinlock_t *ptl;
pte_t *ptep, pte;
+ /*
+ * Considering PTE level hugetlb, like continuous-PTE hugetlb on
+ * ARM64 architecture.
+ */
+ if (is_vm_hugetlb_page(vma)) {
+ page = follow_huge_pmd_pte(vma, address, flags);
+ if (page)
+ return page;
+ return no_page_table(vma, flags);
+ }
+
retry:
if (unlikely(pmd_bad(*pmd)))
return no_page_table(vma, flags);
@@ -333,7 +344,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (pmd_none(pmdval))
return no_page_table(vma, flags);
if (pmd_huge(pmdval) && vma->vm_flags & VM_HUGETLB) {
- page = follow_huge_pmd(mm, address, pmd, flags);
+ page = follow_huge_pmd_pte(vma, address, flags);
if (page)
return page;
return no_page_table(vma, flags);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2126d78f053d..e4478b62d0f7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5157,30 +5157,30 @@ follow_huge_pd(struct vm_area_struct *vma,
}
struct page * __weak
-follow_huge_pmd(struct mm_struct *mm, unsigned long address,
- pmd_t *pmd, int flags)
+follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, int flags)
{
+ struct hstate *h = hstate_vma(vma);
+ struct mm_struct *mm = vma->vm_mm;
struct page *page = NULL;
spinlock_t *ptl;
- pte_t pte;
+ pte_t *ptep, pte;
+
retry:
- ptl = pmd_lockptr(mm, pmd);
- spin_lock(ptl);
- /*
- * make sure that the address range covered by this pmd is not
- * unmapped from other threads.
- */
- if (!pmd_huge(*pmd))
- goto out;
- pte = huge_ptep_get((pte_t *)pmd);
+ ptep = huge_pte_offset(mm, address, huge_page_size(h));
+ if (!ptep)
+ return NULL;
+
+ ptl = huge_pte_lock(h, mm, ptep);
+ pte = huge_ptep_get(ptep);
if (pte_present(pte)) {
- page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
+ page = pte_page(pte) +
+ ((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
if (flags & FOLL_GET)
get_page(page);
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
- __migration_entry_wait(mm, (pte_t *)pmd, ptl);
+ __migration_entry_wait(mm, ptep, ptl);
goto retry;
}
/*
@@ -5188,7 +5188,7 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
* follow_page_mask().
*/
}
-out:
+
spin_unlock(ptl);
return page;
}
--
2.25.1
Currently, the max_loop commandline argument can be used to specify how
many loop block devices are created at init time. If it is not
specified on the commandline, CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block
devices will be created.
The max_loop commandline argument can be used to override the value of
CONFIG_BLK_DEV_LOOP_MIN_COUNT. However, when max_loop is set to 0
through the commandline, the current logic treats it as if it had not
been set, and creates CONFIG_BLK_DEV_LOOP_MIN_COUNT devices anyway.
Fix this by initializing max_loop to CONFIG_BLK_DEV_LOOP_MIN_COUNT.
This preserves the intended behavior of creating
CONFIG_BLK_DEV_LOOP_MIN_COUNT loop block devices if the max_loop
commandline parameter is not specified, and allowing max_loop to
be respected for all values, including 0.
This allows environments that can create all of their required loop
block devices on demand to not have to unnecessarily preallocate loop
block devices.
Fixes: 732850827450 ("remove artificial software max_loop limit")
Cc: stable(a)vger.kernel.org
Cc: Ken Chen <kenchen(a)google.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
---
drivers/block/loop.c | 28 ++++++++++++----------------
1 file changed, 12 insertions(+), 16 deletions(-)
This is a resend because I misspelled the address for
stable(a)vger.kernel.org the first time.
--Isaac
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ad92192c7d61..d12d3d171ec4 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1773,7 +1773,16 @@ static const struct block_device_operations lo_fops = {
/*
* And now the modules code and kernel interface.
*/
-static int max_loop;
+
+/*
+ * If max_loop is specified, create that many devices upfront.
+ * This also becomes a hard limit. If max_loop is not specified,
+ * create CONFIG_BLK_DEV_LOOP_MIN_COUNT loop devices at module
+ * init time. Loop devices can be requested on-demand with the
+ * /dev/loop-control interface, or be instantiated by accessing
+ * a 'dead' device node.
+ */
+static int max_loop = CONFIG_BLK_DEV_LOOP_MIN_COUNT;
module_param(max_loop, int, 0444);
MODULE_PARM_DESC(max_loop, "Maximum number of loop devices");
module_param(max_part, int, 0444);
@@ -2181,7 +2190,7 @@ MODULE_ALIAS("devname:loop-control");
static int __init loop_init(void)
{
- int i, nr;
+ int i;
int err;
part_shift = 0;
@@ -2209,19 +2218,6 @@ static int __init loop_init(void)
goto err_out;
}
- /*
- * If max_loop is specified, create that many devices upfront.
- * This also becomes a hard limit. If max_loop is not specified,
- * create CONFIG_BLK_DEV_LOOP_MIN_COUNT loop devices at module
- * init time. Loop devices can be requested on-demand with the
- * /dev/loop-control interface, or be instantiated by accessing
- * a 'dead' device node.
- */
- if (max_loop)
- nr = max_loop;
- else
- nr = CONFIG_BLK_DEV_LOOP_MIN_COUNT;
-
err = misc_register(&loop_misc);
if (err < 0)
goto err_out;
@@ -2233,7 +2229,7 @@ static int __init loop_init(void)
}
/* pre-create number of devices given by config or max_loop */
- for (i = 0; i < nr; i++)
+ for (i = 0; i < max_loop; i++)
loop_add(i);
printk(KERN_INFO "loop: module loaded\n");
--
2.39.0.rc1.256.g54fd8350bd-goog