On Wed, Oct 9, 2024 at 4:53 PM Lorenzo Stoakes lorenzo.stoakes@oracle.com wrote:
On Tue, Oct 08, 2024 at 04:20:13PM +0200, Jann Horn wrote:
Though, while writing the above reproducer, I noticed another dodgy scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently ignores the stack guard region, because it only checks for VMA intersection, see this example:
[snip]
That could also be bad: MAP_FIXED_NOREPLACE exists, from what I understand, partly so that malloc implementations can use it to grow heap memory chunks (though glibc doesn't use it, I'm not sure who actually uses it that way). We wouldn't want a malloc implementation to grow a heap memory chunk until it is directly adjacent to a stack.
It seems... weird to use it that way as you couldn't be sure you weren't overwriting another VMA.
Here I'm talking about MAP_FIXED_NOREPLACE, not MAP_FIXED. MAP_FIXED_NOREPLACE is supposed to be sort of like calling mmap() with an address hint, except that if creating the VMA at the provided hint is not possible, it fails. I remember Daniel Micay talking about using it in his memory allocator at some point...
@@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address) /* Enforce stack_guard_gap */ prev = vma_prev(&vmi); /* Check that both stack segments have the same anon_vma? */
if (prev) {
if (!(prev->vm_flags & VM_GROWSDOWN) &&
vma_is_accessible(prev) &&
(address - prev->vm_end < stack_guard_gap))
if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
(address - prev->vm_end < stack_guard_gap)) {
/*
* If the previous VMA is accessible, this is the normal case
* where the main stack is growing down towards some unrelated
* VMA. Enforce the full stack guard gap.
*/
if (vma_is_accessible(prev))
return -ENOMEM;
/*
* If the previous VMA is not accessible, we have a problem:
* We can't tell what userspace's intent is.
*
* Case A:
* Maybe userspace wants to use the previous VMA as a
* "guard region" at the bottom of the main stack, in which case
* userspace wants us to grow the stack until it is adjacent to
* the guard region. Apparently some Java runtime environments
* and Rust do that?
* That is kind of ugly, and in that case userspace really ought
* to ensure that the stack is fully expanded immediately, but
* we have to handle this case.
Yeah we can't break userspace on this, no doubt somebody is relying on this _somewhere_.
It would have to be a new user who appeared after commit 1be7107fbe18. And they'd have to install a "guard vma" somewhere below the main stack, and they'd have to care so much about the size of the stack that a single page makes a difference.
You did say 'Apparently some Java runtime environments and Rust do that' though right? Or am I misunderstanding?
Ah, sorry, the context for this is in the commit message of commit 561b5e0709e4, and the upstream discussion leading up to it (https://lore.kernel.org/all/1499126133.2707.20.camel@decadent.org.uk/T/).
So before commit 1be7107fbe18, these workloads worked fine despite the kernel unconditionally enforcing a single-page gap; and only when 1be7107fbe18 changed that gap to be 1MB, people started seeing issues, which 561b5e0709e4 was supposed to address.
So my idea with this patch was to revert the behavior for such workloads to the pre-1be7107fbe18 situation.