As explained in the comment block this change adds, we can't tell what
userspace's intent is when the stack grows towards an inaccessible VMA.
I have a (highly contrived) C testcase for 32-bit x86 userspace with glibc
that mixes malloc(), pthread creation, and recursion in just the right way
such that the main stack overflows into malloc() arena memory.
(Let me know if you want me to share that.)
I don't know of any specific scenario where this is actually exploitable,
but it seems like it could be a security problem for sufficiently unlucky
userspace.
I believe we should ensure that, as long as code is compiled with something
like -fstack-check, a stack overflow in it can never cause the main stack
to overflow into adjacent heap memory.
My fix effectively reverts the behavior for !vma_is_accessible() VMAs to
the behavior before commit 1be7107fbe18 ("mm: larger stack guard gap,
between vmas"), so I think it should be a fairly safe change even in
case A.
Fixes: 561b5e0709e4 ("mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
I have tested that Libreoffice still launches after this change, though
I don't know if that means anything.
Note that I haven't tested the growsup code.
---
mm/mmap.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 46 insertions(+), 7 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index dd4b35a25aeb..971bfd6c1cea 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1064,10 +1064,12 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
gap_addr = TASK_SIZE;
next = find_vma_intersection(mm, vma->vm_end, gap_addr);
- if (next && vma_is_accessible(next)) {
- if (!(next->vm_flags & VM_GROWSUP))
+ if (next && !(next->vm_flags & VM_GROWSUP)) {
+ /* see comments in expand_downwards() */
+ if (vma_is_accessible(prev))
+ return -ENOMEM;
+ if (address == next->vm_start)
return -ENOMEM;
- /* Check that both stack segments have the same anon_vma? */
}
if (next)
@@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
/* Enforce stack_guard_gap */
prev = vma_prev(&vmi);
/* Check that both stack segments have the same anon_vma? */
- if (prev) {
- if (!(prev->vm_flags & VM_GROWSDOWN) &&
- vma_is_accessible(prev) &&
- (address - prev->vm_end < stack_guard_gap))
+ if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+ (address - prev->vm_end < stack_guard_gap)) {
+ /*
+ * If the previous VMA is accessible, this is the normal case
+ * where the main stack is growing down towards some unrelated
+ * VMA. Enforce the full stack guard gap.
+ */
+ if (vma_is_accessible(prev))
+ return -ENOMEM;
+
+ /*
+ * If the previous VMA is not accessible, we have a problem:
+ * We can't tell what userspace's intent is.
+ *
+ * Case A:
+ * Maybe userspace wants to use the previous VMA as a
+ * "guard region" at the bottom of the main stack, in which case
+ * userspace wants us to grow the stack until it is adjacent to
+ * the guard region. Apparently some Java runtime environments
+ * and Rust do that?
+ * That is kind of ugly, and in that case userspace really ought
+ * to ensure that the stack is fully expanded immediately, but
+ * we have to handle this case.
+ *
+ * Case B:
+ * But maybe the previous VMA is entirely unrelated to the stack
+ * and is only *temporarily* PROT_NONE. For example, glibc
+ * malloc arenas create a big PROT_NONE region and then
+ * progressively mark parts of it as writable.
+ * In that case, we must not let the stack become adjacent to
+ * the previous VMA. Otherwise, after the region later becomes
+ * writable, a stack overflow will cause the stack to grow into
+ * the previous VMA, and we won't have any stack gap to protect
+ * against this.
+ *
+ * As an ugly tradeoff, enforce a single-page gap.
+ * A single page will hopefully be small enough to not be
+ * noticed in case A, while providing the same level of
+ * protection in case B that normal userspace threads get.
+ */
+ if (address == prev->vm_end)
return -ENOMEM;
}
---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241008-stack-gap-inaccessible-c7319f7d4b1b
--
Jann Horn <jannh(a)google.com>
This series of patches contains 3 separate changes that fix some bugs in
the qla2xxx driver.
---
v2:
- Change a spinlock wrap to a WRITE_ONCE() in patch 1
- Add Reviewed-by tags on patches 2 and 3
---
Anastasia Kovaleva (3):
scsi: qla2xxx: Drop starvation counter on success
scsi: qla2xxx: Make target send correct LOGO
scsi: qla2xxx: Remove incorrect trap
drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++
drivers/scsi/qla2xxx/qla_isr.c | 4 ++++
drivers/scsi/qla2xxx/qla_target.c | 16 +++++++---------
3 files changed, 22 insertions(+), 9 deletions(-)
--
2.40.1
This series of patches contains 3 separate changes that fix some bugs in
the qla2xxx driver.
---
v3:
- Fix build issue in patch 1
v2:
- Change a spinlock wrap to a WRITE_ONCE() in patch 1
- Add Reviewed-by tags on patches 2 and 3
---
Anastasia Kovaleva (3):
scsi: qla2xxx: Drop starvation counter on success
scsi: qla2xxx: Make target send correct LOGO
scsi: qla2xxx: Remove incorrect trap
drivers/scsi/qla2xxx/qla_iocb.c | 11 +++++++++++
drivers/scsi/qla2xxx/qla_isr.c | 4 ++++
drivers/scsi/qla2xxx/qla_target.c | 16 +++++++---------
3 files changed, 22 insertions(+), 9 deletions(-)
--
2.40.1
Currently, there's nothing actually stopping a driver from only registering
vblank support for some of it's CRTCs and not for others. As far as I can
tell, this isn't really defined behavior on the C side of things - as the
documentation explicitly mentions to not use drm_vblank_init() if you don't
have vblank support - since DRM then steps in and adds its own vblank
emulation implementation.
So, let's fix this edge case and check to make sure it's all or none.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 3ed4351a83ca ("drm: Extract drm_vblank.[hc]")
Cc: Stefan Agner <stefan(a)agner.ch>
Cc: Daniel Vetter <daniel.vetter(a)intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Simona Vetter <simona(a)ffwll.ch>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.13+
---
drivers/gpu/drm/drm_vblank.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 94e45ed6869d0..4d00937e8ca2e 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -525,9 +525,19 @@ static void drm_vblank_init_release(struct drm_device *dev, void *ptr)
*/
int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs)
{
+ struct drm_crtc *crtc;
int ret;
unsigned int i;
+ // Confirm that the required vblank functions have been filled out for all CRTCS
+ drm_for_each_crtc(crtc, dev) {
+ if (!crtc->funcs->enable_vblank || !crtc->funcs->disable_vblank) {
+ drm_err(dev, "CRTC vblank functions not initialized for %s, abort\n",
+ crtc->name);
+ return -EINVAL;
+ }
+ }
+
spin_lock_init(&dev->vbl_lock);
spin_lock_init(&dev->vblank_time_lock);
base-commit: 22512c3ee0f47faab5def71c4453638923c62522
--
2.46.1
From: Zijun Hu <quic_zijuhu(a)quicinc.com>
dev_pm_get_subsys_data() has below 2 issues under condition
(@dev->power.subsys_data != NULL):
- it will do unnecessary kzalloc() and kfree().
- it will return -ENOMEM if the kzalloc() fails, that is wrong
since the kzalloc() is not needed.
Fixed by not doing kzalloc() and returning 0 for the condition.
Fixes: ef27bed1870d ("PM: Reference counting of power.subsys_data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
drivers/base/power/common.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/base/power/common.c b/drivers/base/power/common.c
index 8c34ae1cd8d5..13cb1f2a06e7 100644
--- a/drivers/base/power/common.c
+++ b/drivers/base/power/common.c
@@ -26,6 +26,14 @@ int dev_pm_get_subsys_data(struct device *dev)
{
struct pm_subsys_data *psd;
+ spin_lock_irq(&dev->power.lock);
+ if (dev->power.subsys_data) {
+ dev->power.subsys_data->refcount++;
+ spin_unlock_irq(&dev->power.lock);
+ return 0;
+ }
+ spin_unlock_irq(&dev->power.lock);
+
psd = kzalloc(sizeof(*psd), GFP_KERNEL);
if (!psd)
return -ENOMEM;
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241010-fix_dev_pm_get_subsys_data-2478bb200fde
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
When a devlink instance is unregistered the following happens (among
other things):
t0 - The instance is marked with 'DEVLINK_UNREGISTERING'.
t1 - Blocking until an RCU grace period passes.
t2 - The 'DEVLINK_UNREGISTERING' mark is cleared from the instance.
When iterating over devlink instances (f.e., when requesting a dump of
available instances) and encountering an instance that is currently
being unregistered, the current code will loop around until the
'DEVLINK_UNREGISTERING' mark is cleared.
The iteration over devlink instances happens in an RCU critical section,
so if the instance that is currently being unregistered was encountered
between t0 and t1, the system will deadlock and RCU stalls will be
reported [1]. The task unregistering the instance will forever wait for an
RCU grace period to pass and the task iterating over the instances will
forever wait for the mark to be cleared.
The issue can be reliably reproduced by increasing the time window
between t0 and t1 (used a 60 seconds sleep) and running the following
reproducer [2].
Fix by skipping over instances that are currently being unregistered.
[1]
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P344
(detected by 4, t=26002 jiffies, g=5773, q=12 ncpus=8)
task:devlink state:R running task stack:25568 pid:344 ppid:260 flags:0x00004002
[...]
Call Trace:
xa_get_mark+0x184/0x3e0
devlinks_xa_find_get.constprop.0+0xc6/0x2e0
devlink_nl_cmd_get_dumpit+0x105/0x3f0
netlink_dump+0x568/0xff0
__netlink_dump_start+0x651/0x900
genl_family_rcv_msg_dumpit+0x201/0x340
genl_rcv_msg+0x573/0x780
netlink_rcv_skb+0x15f/0x430
genl_rcv+0x29/0x40
netlink_unicast+0x546/0x800
netlink_sendmsg+0x958/0xe60
__sys_sendto+0x3a2/0x480
__x64_sys_sendto+0xe1/0x1b0
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x68/0xd2
[2]
# echo 10 > /sys/bus/netdevsim/new_device
# echo 10 > /sys/bus/netdevsim/del_device &
# devlink dev
Fixes: c2368b19807a ("net: devlink: introduce "unregistering" mark and use it during devlinks iteration")
Reported-by: Vivek Reddy Karri <vkarri(a)nvidia.com>
Signed-off-by: Ido Schimmel <idosch(a)nvidia.com>
---
I read the stable rules and I am not providing an "upstream commit ID"
since the code in upstream has been reworked, making this fix
irrelevant. The only affected stable kernel is 6.1.y.
---
net/devlink/leftover.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/devlink/leftover.c b/net/devlink/leftover.c
index 032c7af065cd..c6f781a08d06 100644
--- a/net/devlink/leftover.c
+++ b/net/devlink/leftover.c
@@ -301,6 +301,9 @@ devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter,
if (!devlink)
goto unlock;
+ /* For a possible retry, the xa_find_after() should be always used */
+ xa_find_fn = xa_find_after;
+
/* In case devlink_unregister() was already called and "unregistering"
* mark was set, do not allow to get a devlink reference here.
* This prevents live-lock of devlink_unregister() wait for completion.
@@ -308,8 +311,6 @@ devlinks_xa_find_get(struct net *net, unsigned long *indexp, xa_mark_t filter,
if (xa_get_mark(&devlinks, *indexp, DEVLINK_UNREGISTERING))
goto retry;
- /* For a possible retry, the xa_find_after() should be always used */
- xa_find_fn = xa_find_after;
if (!devlink_try_get(devlink))
goto retry;
if (!net_eq(devlink_net(devlink), net)) {
--
2.46.1
From: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
commit b30289e7fa927f921bfb4d0d04727461706ae822 upstream.
If the retain context is enabled we will unconditionally increment the
device's pm use count on each exception and when the drivers are unloaded
we do not correct this (as we don't know how many times we 'prevented
d3 entry').
Introduce a flag to make sure that we do not increment the use count more
than once and on module unload decrement the use count if needed to
balance it.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Link: https://msgid.link/r/20240213114729.7055-1-peter.ujfalusi@linux.intel.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Vitaliy Shevtsov <v.shevtsov(a)maxima.ru>
---
sound/soc/sof/core.c | 10 ++++++++++
sound/soc/sof/debug.c | 8 +++++---
sound/soc/sof/sof-priv.h | 1 +
3 files changed, 16 insertions(+), 3 deletions(-)
diff --git a/sound/soc/sof/core.c b/sound/soc/sof/core.c
index 0938b259f703..cba0af5b86aa 100644
--- a/sound/soc/sof/core.c
+++ b/sound/soc/sof/core.c
@@ -477,6 +477,16 @@ int snd_sof_device_remove(struct device *dev)
*/
snd_sof_machine_unregister(sdev, pdata);
+ /*
+ * Balance the runtime pm usage count in case we are faced with an
+ * exception and we forcably prevented D3 power state to preserve
+ * context
+ */
+ if (sdev->d3_prevented) {
+ sdev->d3_prevented = false;
+ pm_runtime_put_noidle(sdev->dev);
+ }
+
if (sdev->fw_state > SOF_FW_BOOT_NOT_STARTED) {
sof_fw_trace_free(sdev);
ret = snd_sof_dsp_power_down_notify(sdev);
diff --git a/sound/soc/sof/debug.c b/sound/soc/sof/debug.c
index d547318e0d32..7c8aafca8fde 100644
--- a/sound/soc/sof/debug.c
+++ b/sound/soc/sof/debug.c
@@ -433,13 +433,15 @@ static void snd_sof_ipc_dump(struct snd_sof_dev *sdev)
void snd_sof_handle_fw_exception(struct snd_sof_dev *sdev, const char *msg)
{
- if (IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT) ||
- sof_debug_check_flag(SOF_DBG_RETAIN_CTX)) {
+ if ((IS_ENABLED(CONFIG_SND_SOC_SOF_DEBUG_RETAIN_DSP_CONTEXT) ||
+ sof_debug_check_flag(SOF_DBG_RETAIN_CTX)) && !sdev->d3_prevented) {
/* should we prevent DSP entering D3 ? */
if (!sdev->ipc_dump_printed)
dev_info(sdev->dev,
"Attempting to prevent DSP from entering D3 state to preserve context\n");
- pm_runtime_get_if_in_use(sdev->dev);
+
+ if (pm_runtime_get_if_in_use(sdev->dev) == 1)
+ sdev->d3_prevented = true;
}
/* dump vital information to the logs */
diff --git a/sound/soc/sof/sof-priv.h b/sound/soc/sof/sof-priv.h
index d4f6702e93dc..b27a5f9e1bc8 100644
--- a/sound/soc/sof/sof-priv.h
+++ b/sound/soc/sof/sof-priv.h
@@ -593,6 +593,7 @@ struct snd_sof_dev {
struct list_head dfsentry_list;
bool dbg_dump_printed;
bool ipc_dump_printed;
+ bool d3_prevented; /* runtime pm use count incremented to prevent context lost */
/* firmware loader */
struct sof_ipc_fw_ready fw_ready;
--
2.46.2
If:
1) the user requested USO, but
2) there is not enough payload for GSO to kick in, and
3) the egress device doesn't offer checksum offload, then
we want to compute the L4 checksum in software early on.
In the case when we taking the GSO path, but it has been requested, the
software checksum fallback in skb_segment doesn't get a chance to compute
the full checksum, if the egress device can't do it. As a result we end up
sending UDP datagrams with only a partial checksum filled in, which the
peer will discard.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: Ivan Babrou <ivan(a)cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Cc: stable(a)vger.kernel.org
---
This shouldn't have fallen through the cracks. I clearly need to extend the
net/udpgso selftests further to cover the whole TX path for software
USO+csum case. I will follow up with that but I wanted to get the fix out
in the meantime. Apologies for the oversight.
---
net/ipv4/udp.c | 4 +++-
net/ipv6/udp.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8accbf4cb295..2849b273b131 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -951,8 +951,10 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
cork->gso_size);
+
+ /* Don't checksum the payload, skb will get segmented */
+ goto csum_partial;
}
- goto csum_partial;
}
if (is_udplite) /* UDP-Lite */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 52dfbb2ff1a8..0cef8ae5d1ea 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1266,8 +1266,10 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
cork->gso_size);
+
+ /* Don't checksum the payload, skb will get segmented */
+ goto csum_partial;
}
- goto csum_partial;
}
if (is_udplite)