The patch titled
Subject: padata: Fix possible divide-by-0 panic in padata_mt_helper()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
padata-fix-possible-divide-by-0-panic-in-padata_mt_helper.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Waiman Long <longman(a)redhat.com>
Subject: padata: Fix possible divide-by-0 panic in padata_mt_helper()
Date: Tue, 6 Aug 2024 13:46:47 -0400
We are hit with a not easily reproducible divide-by-0 panic in padata.c at
bootup time.
[ 10.017908] Oops: divide error: 0000 1 PREEMPT SMP NOPTI
[ 10.017908] CPU: 26 PID: 2627 Comm: kworker/u1666:1 Not tainted 6.10.0-15.el10.x86_64 #1
[ 10.017908] Hardware name: Lenovo ThinkSystem SR950 [7X12CTO1WW]/[7X12CTO1WW], BIOS [PSE140J-2.30] 07/20/2021
[ 10.017908] Workqueue: events_unbound padata_mt_helper
[ 10.017908] RIP: 0010:padata_mt_helper+0x39/0xb0
:
[ 10.017963] Call Trace:
[ 10.017968] <TASK>
[ 10.018004] ? padata_mt_helper+0x39/0xb0
[ 10.018084] process_one_work+0x174/0x330
[ 10.018093] worker_thread+0x266/0x3a0
[ 10.018111] kthread+0xcf/0x100
[ 10.018124] ret_from_fork+0x31/0x50
[ 10.018138] ret_from_fork_asm+0x1a/0x30
[ 10.018147] </TASK>
Looking at the padata_mt_helper() function, the only way a divide-by-0
panic can happen is when ps->chunk_size is 0. The way that chunk_size is
initialized in padata_do_multithreaded(), chunk_size can be 0 when the
min_chunk in the passed-in padata_mt_job structure is 0.
Fix this divide-by-0 panic by making sure that chunk_size will be at least
1 no matter what the input parameters are.
Link: https://lkml.kernel.org/r/20240806174647.1050398-1-longman@redhat.com
Fixes: 004ed42638f4 ("padata: add basic support for multithreaded jobs")
Signed-off-by: Waiman Long <longman(a)redhat.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Waiman Long <longman(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/padata.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/kernel/padata.c~padata-fix-possible-divide-by-0-panic-in-padata_mt_helper
+++ a/kernel/padata.c
@@ -517,6 +517,13 @@ void __init padata_do_multithreaded(stru
ps.chunk_size = max(ps.chunk_size, job->min_chunk);
ps.chunk_size = roundup(ps.chunk_size, job->align);
+ /*
+ * chunk_size can be 0 if the caller sets min_chunk to 0. So force it
+ * to at least 1 to prevent divide-by-0 panic in padata_mt_helper().`
+ */
+ if (!ps.chunk_size)
+ ps.chunk_size = 1U;
+
list_for_each_entry(pw, &works, pw_list)
if (job->numa_aware) {
int old_node = atomic_read(&last_used_nid);
_
Patches currently in -mm which might be from longman(a)redhat.com are
padata-fix-possible-divide-by-0-panic-in-padata_mt_helper.patch
watchdog-handle-the-enodev-failure-case-of-lockup_detector_delay_init-separately.patch
The following changes since commit 6d834691da474ed1c648753d3d3a3ef8379fa1c1:
virtio_pci_modern: remove admin queue serialization lock (2024-07-17 05:43:21 -0400)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 0823dc64586ba5ea13a7d200a5d33e4c5fa45950:
vhost-vdpa: switch to use vmf_insert_pfn() in the fault handler (2024-07-26 03:26:02 -0400)
----------------------------------------------------------------
virtio: bugfix
Fixes a single, long-standing issue with kick pass-through vdpa.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Jason Wang (1):
vhost-vdpa: switch to use vmf_insert_pfn() in the fault handler
drivers/vhost/vdpa.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
Hi,
Cannon Estimation,LLC brings you a major discount on Cost Estimating & Quantities Take-Off Services. We claim a 98% accuracy guarantee with a refund policy in case of any error in quantities. We are using certified software’s like PlanSwift, BlueBeams, Accu-Bid, Auto-Bid & RSmeans etc
Send us your plans for a quote on our service charges before getting started. Please reply to that email, so I can share some sample estimates.
Thanks & Have a Great Day.
Regards,
Elvis Lehmann
Business Development Manager
Cannon Estimation, LLC
From: Willem de Bruijn <willemb(a)google.com>
[ Upstream commit 89add40066f9ed9abe5f7f886fe5789ff7e0c50e ]
Tighten csum_start and csum_offset checks in virtio_net_hdr_to_skb
for GSO packets.
The function already checks that a checksum requested with
VIRTIO_NET_HDR_F_NEEDS_CSUM is in skb linear. But for GSO packets
this might not hold for segs after segmentation.
Syzkaller demonstrated to reach this warning in skb_checksum_help
offset = skb_checksum_start_offset(skb);
ret = -EINVAL;
if (WARN_ON_ONCE(offset >= skb_headlen(skb)))
By injecting a TSO packet:
WARNING: CPU: 1 PID: 3539 at net/core/dev.c:3284 skb_checksum_help+0x3d0/0x5b0
ip_do_fragment+0x209/0x1b20 net/ipv4/ip_output.c:774
ip_finish_output_gso net/ipv4/ip_output.c:279 [inline]
__ip_finish_output+0x2bd/0x4b0 net/ipv4/ip_output.c:301
iptunnel_xmit+0x50c/0x930 net/ipv4/ip_tunnel_core.c:82
ip_tunnel_xmit+0x2296/0x2c70 net/ipv4/ip_tunnel.c:813
__gre_xmit net/ipv4/ip_gre.c:469 [inline]
ipgre_xmit+0x759/0xa60 net/ipv4/ip_gre.c:661
__netdev_start_xmit include/linux/netdevice.h:4850 [inline]
netdev_start_xmit include/linux/netdevice.h:4864 [inline]
xmit_one net/core/dev.c:3595 [inline]
dev_hard_start_xmit+0x261/0x8c0 net/core/dev.c:3611
__dev_queue_xmit+0x1b97/0x3c90 net/core/dev.c:4261
packet_snd net/packet/af_packet.c:3073 [inline]
The geometry of the bad input packet at tcp_gso_segment:
[ 52.003050][ T8403] skb len=12202 headroom=244 headlen=12093 tailroom=0
[ 52.003050][ T8403] mac=(168,24) mac_len=24 net=(192,52) trans=244
[ 52.003050][ T8403] shinfo(txflags=0 nr_frags=1 gso(size=1552 type=3 segs=0))
[ 52.003050][ T8403] csum(0x60000c7 start=199 offset=1536
ip_summed=3 complete_sw=0 valid=0 level=0)
Mitigate with stricter input validation.
csum_offset: for GSO packets, deduce the correct value from gso_type.
This is already done for USO. Extend it to TSO. Let UFO be:
udp[46]_ufo_fragment ignores these fields and always computes the
checksum in software.
csum_start: finding the real offset requires parsing to the transport
header. Do not add a parser, use existing segmentation parsing. Thanks
to SKB_GSO_DODGY, that also catches bad packets that are hw offloaded.
Again test both TSO and USO. Do not test UFO for the above reason, and
do not test UDP tunnel offload.
GSO packet are almost always CHECKSUM_PARTIAL. USO packets may be
CHECKSUM_NONE since commit 10154dbded6d6 ("udp: Allow GSO transmit
from devices with no checksum offload"), but then still these fields
are initialized correctly in udp4_hwcsum/udp6_hwcsum_outgoing. So no
need to test for ip_summed == CHECKSUM_PARTIAL first.
This revises an existing fix mentioned in the Fixes tag, which broke
small packets with GSO offload, as detected by kselftests.
Link: https://syzkaller.appspot.com/bug?extid=e1db31216c789f552871
Link: https://lore.kernel.org/netdev/20240723223109.2196886-1-kuba@kernel.org
Fixes: e269d79c7d35 ("net: missing check virtio")
Cc: stable(a)vger.kernel.org
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Link: https://patch.msgid.link/20240729201108.1615114-1-willemdebruijn.kernel@gma…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
Hi,
This patch fixes network failures on OpenStack VMs running with Kernel
6.6.44 (it was working fine with 6.6.43).
```
[ 237.422038] eth0: bad gso: type: 1, size: 1432
```
This has been tested on Flatcar Linux CI with Kernel 6.6.44.
I think it has to be backported on Linux branches that have the "net:
missing check virtio" commit.
At this moment, I know those releases to be concerned:
* 6.1.y (with 6.1.103)
* 6.6.y (with 6.6.44)
* 6.10.y (with 6.10.3)
Thanks,
Mathieu - @tormath1
include/linux/virtio_net.h | 16 +++++-----------
net/ipv4/tcp_offload.c | 3 +++
net/ipv4/udp_offload.c | 4 ++++
3 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index d1d7825318c3..6c395a2600e8 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -56,7 +56,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
unsigned int thlen = 0;
unsigned int p_off = 0;
unsigned int ip_proto;
- u64 ret, remainder, gso_size;
if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
@@ -99,16 +98,6 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
u32 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
u32 needed = start + max_t(u32, thlen, off + sizeof(__sum16));
- if (hdr->gso_size) {
- gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
- ret = div64_u64_rem(skb->len, gso_size, &remainder);
- if (!(ret && (hdr->gso_size > needed) &&
- ((remainder > needed) || (remainder == 0)))) {
- return -EINVAL;
- }
- skb_shinfo(skb)->tx_flags |= SKBFL_SHARED_FRAG;
- }
-
if (!pskb_may_pull(skb, needed))
return -EINVAL;
@@ -182,6 +171,11 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
if (gso_type != SKB_GSO_UDP_L4)
return -EINVAL;
break;
+ case SKB_GSO_TCPV4:
+ case SKB_GSO_TCPV6:
+ if (skb->csum_offset != offsetof(struct tcphdr, check))
+ return -EINVAL;
+ break;
}
/* Kernel has a special handling for GSO_BY_FRAGS. */
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 8311c38267b5..69e6012ae82f 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -73,6 +73,9 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
if (thlen < sizeof(*th))
goto out;
+ if (unlikely(skb_checksum_start(skb) != skb_transport_header(skb)))
+ goto out;
+
if (!pskb_may_pull(skb, thlen))
goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index e5971890d637..9cb13a50011e 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -278,6 +278,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
if (gso_skb->len <= sizeof(*uh) + mss)
return ERR_PTR(-EINVAL);
+ if (unlikely(skb_checksum_start(gso_skb) !=
+ skb_transport_header(gso_skb)))
+ return ERR_PTR(-EINVAL);
+
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
--
2.44.2
From: Francesco Dolcini <francesco.dolcini(a)toradex.com>
This reverts commit 3935fbc87ddebea5439f3ab6a78b1e83e976bf88.
CTRL_SLEEP_MOCI# is a signal that is defined for all the SoM
implementing the Verdin family specification, this signal is supposed to
control the power enable in the carrier board when the system is in deep
sleep mode. However this is not possible with Texas Instruments AM62
SoC, IOs output buffer is disabled in deep sleep and IOs are in
tri-state mode.
Given that we cannot properly control this pin, force it to be always
high to minimize potential issues.
Fixes: 3935fbc87dde ("arm64: dts: ti: k3-am62-verdin-dahlia: support sleep-moci")
Cc: <stable(a)vger.kernel.org>
Link: https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1…
Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
---
.../boot/dts/ti/k3-am62-verdin-dahlia.dtsi | 22 -------------------
arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi | 6 -----
2 files changed, 28 deletions(-)
diff --git a/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi b/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi
index e8f4d136e5df..9202181fbd65 100644
--- a/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62-verdin-dahlia.dtsi
@@ -43,15 +43,6 @@ simple-audio-card,cpu {
sound-dai = <&mcasp0>;
};
};
-
- reg_usb_hub: regulator-usb-hub {
- compatible = "regulator-fixed";
- enable-active-high;
- /* Verdin CTRL_SLEEP_MOCI# (SODIMM 256) */
- gpio = <&main_gpio0 31 GPIO_ACTIVE_HIGH>;
- regulator-boot-on;
- regulator-name = "HUB_PWR_EN";
- };
};
/* Verdin ETHs */
@@ -193,11 +184,6 @@ &ospi0 {
status = "okay";
};
-/* Do not force CTRL_SLEEP_MOCI# always enabled */
-®_force_sleep_moci {
- status = "disabled";
-};
-
/* Verdin SD_1 */
&sdhci1 {
status = "okay";
@@ -218,15 +204,7 @@ &usbss1 {
};
&usb1 {
- #address-cells = <1>;
- #size-cells = <0>;
status = "okay";
-
- usb-hub@1 {
- compatible = "usb424,2744";
- reg = <1>;
- vdd-supply = <®_usb_hub>;
- };
};
/* Verdin CTRL_WAKE1_MICO# */
diff --git a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi
index 359f53f3e019..5bef31b8577b 100644
--- a/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-am62-verdin.dtsi
@@ -138,12 +138,6 @@ reg_1v8_eth: regulator-1v8-eth {
vin-supply = <®_1v8>;
};
- /*
- * By default we enable CTRL_SLEEP_MOCI#, this is required to have
- * peripherals on the carrier board powered.
- * If more granularity or power saving is required this can be disabled
- * in the carrier board device tree files.
- */
reg_force_sleep_moci: regulator-force-sleep-moci {
compatible = "regulator-fixed";
enable-active-high;
--
2.39.2
Unaccepted memory is considered unusable free memory, which is not
counted as free on the zone watermark check. This causes
get_page_from_freelist() to accept more memory to hit the high
watermark, but it creates problems in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts
to reclaim memory. This is usually successful, but if there is little or
no reclaimable memory, it can result in endless reclaim with little to
no progress. This can occur early in the boot process, just after start
of the init process when the only reclaimable memory is the page cache
of the init executable and its libraries.
Make unaccepted memory free from watermark check point of view. This way
unaccepted memory will never be the trigger of memory reclaim.
Accept more memory in the get_page_from_freelist() if needed.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Reported-by: Jianxiong Gao <jxgao(a)google.com>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Cc: stable(a)vger.kernel.org # v6.5+
---
mm/page_alloc.c | 42 ++++++++++++++++++++----------------------
1 file changed, 20 insertions(+), 22 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 28f80daf5c04..aa9b1eaa638c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -287,7 +287,7 @@ EXPORT_SYMBOL(nr_online_nodes);
static bool page_contains_unaccepted(struct page *page, unsigned int order);
static void accept_page(struct page *page, unsigned int order);
-static bool try_to_accept_memory(struct zone *zone, unsigned int order);
+static bool cond_accept_memory(struct zone *zone, unsigned int order);
static inline bool has_unaccepted_memory(void);
static bool __free_unaccepted(struct page *page);
@@ -3072,9 +3072,6 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
if (!(alloc_flags & ALLOC_CMA))
unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
#endif
-#ifdef CONFIG_UNACCEPTED_MEMORY
- unusable_free += zone_page_state(z, NR_UNACCEPTED);
-#endif
return unusable_free;
}
@@ -3368,6 +3365,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
}
}
+ cond_accept_memory(zone, order);
+
/*
* Detect whether the number of free pages is below high
* watermark. If so, we will decrease pcp->high and free
@@ -3393,10 +3392,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
gfp_mask)) {
int ret;
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/*
@@ -3450,10 +3447,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
return page;
} else {
- if (has_unaccepted_memory()) {
- if (try_to_accept_memory(zone, order))
- goto try_this_zone;
- }
+ if (cond_accept_memory(zone, order))
+ goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
/* Try again if zone has deferred pages */
@@ -6951,9 +6946,6 @@ static bool try_to_accept_memory_one(struct zone *zone)
struct page *page;
bool last;
- if (list_empty(&zone->unaccepted_pages))
- return false;
-
spin_lock_irqsave(&zone->lock, flags);
page = list_first_entry_or_null(&zone->unaccepted_pages,
struct page, lru);
@@ -6979,23 +6971,29 @@ static bool try_to_accept_memory_one(struct zone *zone)
return true;
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
long to_accept;
- int ret = false;
+ bool ret = false;
+
+ if (!has_unaccepted_memory())
+ return false;
+
+ if (list_empty(&zone->unaccepted_pages))
+ return false;
/* How much to accept to get to high watermark? */
to_accept = high_wmark_pages(zone) -
(zone_page_state(zone, NR_FREE_PAGES) -
- __zone_watermark_unusable_free(zone, order, 0));
+ __zone_watermark_unusable_free(zone, order, 0) -
+ zone_page_state(zone, NR_UNACCEPTED));
- /* Accept at least one page */
- do {
+ while (to_accept > 0) {
if (!try_to_accept_memory_one(zone))
break;
ret = true;
to_accept -= MAX_ORDER_NR_PAGES;
- } while (to_accept > 0);
+ }
return ret;
}
@@ -7038,7 +7036,7 @@ static void accept_page(struct page *page, unsigned int order)
{
}
-static bool try_to_accept_memory(struct zone *zone, unsigned int order)
+static bool cond_accept_memory(struct zone *zone, unsigned int order)
{
return false;
}
--
2.43.0
uevent_show() wants to de-reference dev->driver->name. There is no clean
way for a device attribute to de-reference dev->driver unless that
attribute is defined via (struct device_driver).dev_groups. Instead, the
anti-pattern of taking the device_lock() in the attribute handler risks
deadlocks with code paths that remove device attributes while holding
the lock.
This deadlock is typically invisible to lockdep given the device_lock()
is marked lockdep_set_novalidate_class(), but some subsystems allocate a
local lockdep key for @dev->mutex to reveal reports of the form:
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7+ #275 Tainted: G OE N
------------------------------------------------------
modprobe/2374 is trying to acquire lock:
ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
but task is already holding lock:
ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&cxl_root_key){+.+.}-{3:3}:
__mutex_lock+0x99/0xc30
uevent_show+0xac/0x130
dev_attr_show+0x18/0x40
sysfs_kf_seq_show+0xac/0xf0
seq_read_iter+0x110/0x450
vfs_read+0x25b/0x340
ksys_read+0x67/0xf0
do_syscall_64+0x75/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (kn->active#6){++++}-{0:0}:
__lock_acquire+0x121a/0x1fa0
lock_acquire+0xd6/0x2e0
kernfs_drain+0x1e9/0x200
__kernfs_remove+0xde/0x220
kernfs_remove_by_name_ns+0x5e/0xa0
device_del+0x168/0x410
device_unregister+0x13/0x60
devres_release_all+0xb8/0x110
device_unbind_cleanup+0xe/0x70
device_release_driver_internal+0x1c7/0x210
driver_detach+0x47/0x90
bus_remove_driver+0x6c/0xf0
cxl_acpi_exit+0xc/0x11 [cxl_acpi]
__do_sys_delete_module.isra.0+0x181/0x260
do_syscall_64+0x75/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The observation though is that driver objects are typically much longer
lived than device objects. It is reasonable to perform lockless
de-reference of a @driver pointer even if it is racing detach from a
device. Given the infrequency of driver unregistration, use
synchronize_rcu() in module_remove_driver() to close any potential
races. It is potentially overkill to suffer synchronize_rcu() just to
handle the rare module removal racing uevent_show() event.
Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
Reported-by: syzbot+4762dd74e32532cda5ff(a)syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.n… [1]
Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.n…
Cc: stable(a)vger.kernel.org
Cc: Ashish Sangwan <a.sangwan(a)samsung.com>
Cc: Namjae Jeon <namjae.jeon(a)samsung.com>
Cc: Dirk Behme <dirk.behme(a)de.bosch.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/base/core.c | 13 ++++++++-----
drivers/base/module.c | 4 ++++
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b4c0624b704..b5399262198a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/netdevice.h>
+#include <linux/rcupdate.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
#include <linux/string_helpers.h>
@@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
{
const struct device *dev = kobj_to_dev(kobj);
+ struct device_driver *driver;
int retval = 0;
/* add device node properties if present */
@@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
if (dev->type && dev->type->name)
add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
- if (dev->driver)
- add_uevent_var(env, "DRIVER=%s", dev->driver->name);
+ /* Synchronize with module_remove_driver() */
+ rcu_read_lock();
+ driver = READ_ONCE(dev->driver);
+ if (driver)
+ add_uevent_var(env, "DRIVER=%s", driver->name);
+ rcu_read_unlock();
/* Add common DT information about the device */
of_device_uevent(dev, env);
@@ -2739,11 +2745,8 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
if (!env)
return -ENOMEM;
- /* Synchronize with really_probe() */
- device_lock(dev);
/* let the kset specific function add its keys */
retval = kset->uevent_ops->uevent(&dev->kobj, env);
- device_unlock(dev);
if (retval)
goto out;
diff --git a/drivers/base/module.c b/drivers/base/module.c
index a1b55da07127..b0b79b9c189d 100644
--- a/drivers/base/module.c
+++ b/drivers/base/module.c
@@ -7,6 +7,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/rcupdate.h>
#include "base.h"
static char *make_driver_name(struct device_driver *drv)
@@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv)
if (!drv)
return;
+ /* Synchronize with dev_uevent() */
+ synchronize_rcu();
+
sysfs_remove_link(&drv->p->kobj, "module");
if (drv->owner)