The patch titled
Subject: kmsan: do not wipe out origin when doing partial unpoisoning
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexander Potapenko <glider(a)google.com>
Subject: kmsan: do not wipe out origin when doing partial unpoisoning
Date: Tue, 28 May 2024 12:48:06 +0200
As noticed by Brian, KMSAN should not be zeroing the origin when
unpoisoning parts of a four-byte uninitialized value, e.g.:
char a[4];
kmsan_unpoison_memory(a, 1);
This led to false negatives, as certain poisoned values could receive zero
origins, preventing those values from being reported.
To fix the problem, check that kmsan_internal_set_shadow_origin() writes
zero origins only to slots which have zero shadow.
Link: https://lkml.kernel.org/r/20240528104807.738758-1-glider@google.com
Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reported-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Link: https://lore.kernel.org/lkml/20240524232804.1984355-1-bjohannesmeyer@gmail.…
Reviewed-by: Marco Elver <elver(a)google.com>
Tested-by: Brian Johannesmeyer <bjohannesmeyer(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kmsan/core.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
--- a/mm/kmsan/core.c~kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning
+++ a/mm/kmsan/core.c
@@ -196,8 +196,7 @@ void kmsan_internal_set_shadow_origin(vo
u32 origin, bool checked)
{
u64 address = (u64)addr;
- void *shadow_start;
- u32 *origin_start;
+ u32 *shadow_start, *origin_start;
size_t pad = 0;
KMSAN_WARN_ON(!kmsan_metadata_is_contiguous(addr, size));
@@ -225,8 +224,16 @@ void kmsan_internal_set_shadow_origin(vo
origin_start =
(u32 *)kmsan_get_metadata((void *)address, KMSAN_META_ORIGIN);
- for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++)
- origin_start[i] = origin;
+ /*
+ * If the new origin is non-zero, assume that the shadow byte is also non-zero,
+ * and unconditionally overwrite the old origin slot.
+ * If the new origin is zero, overwrite the old origin slot iff the
+ * corresponding shadow slot is zero.
+ */
+ for (int i = 0; i < size / KMSAN_ORIGIN_SIZE; i++) {
+ if (origin || !shadow_start[i])
+ origin_start[i] = origin;
+ }
}
struct page *kmsan_vmalloc_to_page_or_null(void *vaddr)
_
Patches currently in -mm which might be from glider(a)google.com are
kmsan-do-not-wipe-out-origin-when-doing-partial-unpoisoning.patch
This fixes a NULL pointer dereference bug due to a data race which
looks like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43
Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018
Workqueue: events_unbound netfs_rreq_write_to_cache_work
RIP: 0010:cachefiles_prepare_write+0x30/0xa0
Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10
RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286
RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000
RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438
RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001
R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68
R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00
FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0
Call Trace:
<TASK>
? __die+0x1f/0x70
? page_fault_oops+0x15d/0x440
? search_module_extables+0xe/0x40
? fixup_exception+0x22/0x2f0
? exc_page_fault+0x5f/0x100
? asm_exc_page_fault+0x22/0x30
? cachefiles_prepare_write+0x30/0xa0
netfs_rreq_write_to_cache_work+0x135/0x2e0
process_one_work+0x137/0x2c0
worker_thread+0x2e9/0x400
? __pfx_worker_thread+0x10/0x10
kthread+0xcc/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x30/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1b/0x30
</TASK>
Modules linked in:
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was
still running while another process invoked fscache_unuse_cookie();
this led to a fscache_cookie_lru_do_one() call, setting the
FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by
fscache_cookie_state_machine(), withdrawing the cookie via
cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked
cachefiles_prepare_write(), which found a NULL pointer in this code
line:
struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line crashes, obviously:
struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is
non-zero (via fscache_begin_operation()). The cookie must not be
withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before
switching to FSCACHE_COOKIE_STATE_RELINQUISHING and
FSCACHE_COOKIE_STATE_WITHDRAWING (in "case
FSCACHE_COOKIE_STATE_FAILED"), but not for
FSCACHE_COOKIES_TATE_LRU_DISCARDING ("case
FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter,
the function returns and the next fscache_end_cookie_access() call
will queue another fscache_cookie_state_machine() call to handle the
still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Max Kellermann <max.kellermann(a)ionos.com>
---
fs/netfs/fscache_cookie.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/netfs/fscache_cookie.c b/fs/netfs/fscache_cookie.c
index bce2492186d0..d4d4b3a8b106 100644
--- a/fs/netfs/fscache_cookie.c
+++ b/fs/netfs/fscache_cookie.c
@@ -741,6 +741,10 @@ static void fscache_cookie_state_machine(struct fscache_cookie *cookie)
spin_lock(&cookie->lock);
}
if (test_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags)) {
+ if (atomic_read(&cookie->n_accesses) != 0)
+ /* still being accessed: postpone it */
+ break;
+
__fscache_set_cookie_state(cookie,
FSCACHE_COOKIE_STATE_LRU_DISCARDING);
wake = true;
--
2.39.2
On Mon, May 27, 2024 at 11:40 PM Chengen Du <chengen.du(a)canonical.com> wrote:
>
> Hi Willem,
>
> Thank you for your suggestions on the patch.
> However, there are some parts I am not familiar with, and I would appreciate more detailed information from your side.
Please respond with plain-text email. This message did not make it to
the list. Also no top posting.
https://docs.kernel.org/process/submitting-patches.htmlhttps://subspace.kernel.org/etiquette.html
> > > @@ -2457,7 +2465,8 @@ static int tpacket_rcv(struct sk_buff *skb, struct net_device *dev,
> > > sll->sll_halen = dev_parse_header(skb, sll->sll_addr);
> > > sll->sll_family = AF_PACKET;
> > > sll->sll_hatype = dev->type;
> > > - sll->sll_protocol = skb->protocol;
> > > + sll->sll_protocol = (skb->protocol == htons(ETH_P_8021Q)) ?
> > > + vlan_eth_hdr(skb)->h_vlan_encapsulated_proto : skb->protocol;
> >
> > In SOCK_RAW mode, the VLAN tag will be present, so should be returned.
>
> Based on libpcap's handling, the SLL may not be used in SOCK_RAW mode.
The kernel fills in the sockaddr_ll fields in tpacket_rcv for both
SOCK_RAW and SOCK_DGRAM. Libpcap already can use both SOCK_RAW and
SOCK_DGRAM. And constructs the sll2_header pseudo header that tcpdump
sees itself, in pcap_handle_packet_mmap.
> Do you recommend evaluating the mode and maintaining the original logic in SOCK_RAW mode,
> or should we use the same logic for both SOCK_DGRAM and SOCK_RAW modes?
I suggest keeping as is for SOCK_RAW, as returning data that starts at
a VLAN header together with skb->protocol of ETH_P_IPV6 would be just
as confusing as the inverse that we do today on SOCK_DGRAM.
> >
> > I'm concerned about returning a different value between SOCK_RAW and
> > SOCK_DGRAM. But don't immediately see a better option. And for
> > SOCK_DGRAM this approach is indistinguishable from the result on a
> > device with hardware offload, so is acceptable.
> >
> > This test for ETH_P_8021Q ignores the QinQ stacked VLAN case. When
> > fixing VLAN encap, both variants should be addressed at the same time.
> > Note that ETH_P_8021AD is included in the eth_type_vlan test you call
> > above.
>
> In patch 1, the eth_type_vlan() function is used to determine if we need to set the sll_protocol to the VLAN-encapsulated protocol, which includes both ETH_P_8021Q and ETH_P_8021AD.
> You mentioned previously that we might want the true network protocol instead of the inner VLAN tag in the QinQ case (which means 802.1ad?).
> I believe I may have misunderstood your point.
I mean that if SOCK_DGRAM strips all VLAN headers to return the data
from the start of the true network header, then skb->protocol should
return that network protocol.
With vlan stacking, your patch currently returns ETH_P_8021Q.
See the packet formats in
https://en.wikipedia.org/wiki/IEEE_802.1ad#Frame_format if you're
confused about how stacking works.
> Could you please confirm if both ETH_P_8021Q and ETH_P_8021AD should use the VLAN-encapsulated protocol when VLAN hardware offloading is unavailable?
> Or are there other aspects that this judgment does not handle correctly?
Addresses an issue where a CAN bus error during a BAM transmission could
stall the socket queue, preventing further transmissions even after the
bus error is resolved. The fix activates the next queued session after
the error recovery, allowing communication to continue.
Fixes: 9d71dd0c70099 ("can: add support of SAE J1939 protocol")
Cc: stable(a)vger.kernel.org
Reported-by: Alexander Hölzl <alexander.hoelzl(a)gmx.net>
Tested-by: Alexander Hölzl <alexander.hoelzl(a)gmx.net>
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
---
net/can/j1939/transport.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index fe3df23a25957..9805124d16763 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -1681,6 +1681,8 @@ static int j1939_xtp_rx_rts_session_active(struct j1939_session *session,
j1939_session_timers_cancel(session);
j1939_session_cancel(session, J1939_XTP_ABORT_BUSY);
+ if (session->transmission)
+ j1939_session_deactivate_activate_next(session);
return -EBUSY;
}
--
2.39.2
Just a quick note to say I've just booted 6.1.92, plus the 274 patches
in the current stable queue applied, on an x86-64 workstation and no
problems or issues to report, no unexpected errors in dmesg.
(AMD Ryzen, 3 x AMD gpu, sata, nvme, lots of USB, Gentoo, SELinux, KDE)
Eddie
In LUCID EVO PLL CAL_L_VAL and L_VAL bitfields are part of single
PLL_L_VAL register. Update for L_VAL bitfield values in PLL_L_VAL
register using regmap_write() API in __alpha_pll_trion_set_rate
callback will override LUCID EVO PLL initial configuration related
to PLL_CAL_L_VAL bit fields in PLL_L_VAL register.
Observed random PLL lock failures during PLL enable due to such
override in PLL calibration value. Use regmap_update_bits() with
L_VAL bitfield mask instead of regmap_write() API to update only
PLL_L_VAL bitfields in __alpha_pll_trion_set_rate callback.
Fixes: 260e36606a03 ("clk: qcom: clk-alpha-pll: add Lucid EVO PLL configuration interfaces")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ajit Pandey <quic_ajipan(a)quicinc.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
---
drivers/clk/qcom/clk-alpha-pll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c
index d4227909d1fe..62138ed3df88 100644
--- a/drivers/clk/qcom/clk-alpha-pll.c
+++ b/drivers/clk/qcom/clk-alpha-pll.c
@@ -1665,7 +1665,7 @@ static int __alpha_pll_trion_set_rate(struct clk_hw *hw, unsigned long rate,
if (ret < 0)
return ret;
- regmap_write(pll->clkr.regmap, PLL_L_VAL(pll), l);
+ regmap_update_bits(pll->clkr.regmap, PLL_L_VAL(pll), LUCID_EVO_PLL_L_VAL_MASK, l);
regmap_write(pll->clkr.regmap, PLL_ALPHA_VAL(pll), a);
/* Latch the PLL input */
--
2.25.1
As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the
Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams
do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188
as well.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com>
---
Changes in v2:
- Porting to latest release
drivers/usb/host/xhci-pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index b47d57d80b96..effeec5cf1fa 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -399,6 +399,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
pdev->device == PCI_DEVICE_ID_EJ188) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
+ xhci->quirks |= XHCI_BROKEN_STREAMS;
}
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0014) {
--
2.25.1
There are a number of error paths in the probe function that do not call
of_node_put() to decrement the np device node refcount, leading to
memory leaks if those errors occur.
In order to ease backporting, the fix has been divided into two patches:
the first one simply adds the missing calls to of_node_put(), and the
second one adds the __free() macro to the existing device nodes to
remove the need for of_node_put(), ensuring that the same bug will not
arise in the future.
The issue was found by chance while analyzing the code, and I do not
have the hardware to test it beyond compiling and static analysis tools.
Although the issue is clear and the fix too, if someone wants to
volunteer to test the series with real hardware, it would be great.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
cpufreq: qcom-nvmem: fix memory leaks in probe error paths
cpufreq: qcom-nvmem: eliminate uses of of_node_put()
drivers/cpufreq/qcom-cpufreq-nvmem.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
---
base-commit: 3689b0ef08b70e4e03b82ebd37730a03a672853a
change-id: 20240523-qcom-cpufreq-nvmem_memleak-6b6821db52b1
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the
Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams
do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188
as well.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com>
---
Changes in v2:
- Porting to latest release
drivers/usb/host/xhci-pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index b47d57d80b96..effeec5cf1fa 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -399,6 +399,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
pdev->device == PCI_DEVICE_ID_EJ188) {
xhci->quirks |= XHCI_RESET_ON_RESUME;
+ xhci->quirks |= XHCI_BROKEN_STREAMS;
}
if (pdev->vendor == PCI_VENDOR_ID_RENESAS &&
pdev->device == 0x0014) {
--
2.25.1