area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Signed-off-by: Roman Penyaev rpenyaev@suse.de Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Joe Perches joe@perches.com Cc: "Luis R. Rodriguez" mcgrof@kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org --- mm/vmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 871e41c55e23..2cd24186ba84 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr, if (!(area->flags & VM_USERMAP)) return -EINVAL;
- if (kaddr + size > area->addr + area->size) + if (kaddr + size > area->addr + get_vm_area_size(area)) return -EINVAL;
do {
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Can this actually happen? I am not really familiar with all the callers of this API but VM_NO_GUARD is not really used wildly in the kernel. All I can see is kasan na arm64 which doesn't really seem to use it for vmalloc.
So is the problem real or this is a mere cleanup?
Signed-off-by: Roman Penyaev rpenyaev@suse.de Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Joe Perches joe@perches.com Cc: "Luis R. Rodriguez" mcgrof@kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org
mm/vmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 871e41c55e23..2cd24186ba84 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr, if (!(area->flags & VM_USERMAP)) return -EINVAL;
- if (kaddr + size > area->addr + area->size)
- if (kaddr + size > area->addr + get_vm_area_size(area)) return -EINVAL;
do { -- 2.19.1
On 2019-01-03 16:13, Michal Hocko wrote:
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Can this actually happen? I am not really familiar with all the callers of this API but VM_NO_GUARD is not really used wildly in the kernel.
Exactly, by default (VM_NO_GUARD is not set) each area has guard page, thus the area->size will be bigger. The bug is not reproduced if VM_NO_GUARD is set.
All I can see is kasan na arm64 which doesn't really seem to use it for vmalloc.
So is the problem real or this is a mere cleanup?
This is the real problem, try this hunk for any file descriptor which provides mapping, or say modify epoll as example:
-------------------------------- diff --git a/fs/eventpoll.c b/fs/eventpoll.c
+static int ep_mmap(struct file *file, struct vm_area_struct *vma) +{ + void *mem; + + mem = vmalloc_user(4096); + BUG_ON(!mem); + /* Do not care about mem leak */ + + return remap_vmalloc_range(vma, mem, 0); +} + /* File callbacks that implement the eventpoll file behaviour */ static const struct file_operations eventpoll_fops = { #ifdef CONFIG_PROC_FS .show_fdinfo = ep_show_fdinfo, #endif + .mmap = ep_mmap, .release = ep_eventpoll_release, --------------------------------
and the following code from userspace, which maps 2 pages, instead of 1:
-------------------------------- epfd = epoll_create1(0); assert(epfd >= 0);
p = mmap(NULL, 2<<12, PROT_WRITE|PROT_READ, MAP_PRIVATE, epfd, 0); assert(p != MAP_FAILED); --------------------------------
You immediately get the following oops:
[ 38.894571] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 38.899048] #PF error: [normal kernel read fault] [ 38.901487] PGD 0 P4D 0 [ 38.902801] Oops: 0000 [#1] PREEMPT SMP PTI [ 38.904984] CPU: 2 PID: 399 Comm: mmap-epoll Not tainted 4.20.0-1 #238 [ 38.914064] RIP: 0010:vm_insert_page+0x3b/0x1d0 [ 38.941181] Call Trace: [ 38.941656] remap_vmalloc_range_partial+0x8d/0xd0 [ 38.942417] mmap_region+0x3c7/0x630 [ 38.942982] do_mmap+0x38d/0x560 [ 38.943479] vm_mmap_pgoff+0x9a/0xf0 [ 38.944028] ksys_mmap_pgoff+0x18e/0x220 [ 38.944554] do_syscall_64+0x48/0xf0 [ 38.945076] entry_SYSCALL_64_after_hwframe+0x44/0xa9
-- Roman
On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
On 2019-01-03 16:13, Michal Hocko wrote:
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Can this actually happen? I am not really familiar with all the callers of this API but VM_NO_GUARD is not really used wildly in the kernel.
Exactly, by default (VM_NO_GUARD is not set) each area has guard page, thus the area->size will be bigger. The bug is not reproduced if VM_NO_GUARD is set.
All I can see is kasan na arm64 which doesn't really seem to use it for vmalloc.
So is the problem real or this is a mere cleanup?
This is the real problem, try this hunk for any file descriptor which provides mapping, or say modify epoll as example:
OK, my response was more confusing than I intended. I meant to say. Is there any in kernel code that would allow the bug have had in mind? In other words can userspace trick any existing code?
On 2019-01-03 20:40, Michal Hocko wrote:
On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
On 2019-01-03 16:13, Michal Hocko wrote:
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Can this actually happen? I am not really familiar with all the callers of this API but VM_NO_GUARD is not really used wildly in the kernel.
Exactly, by default (VM_NO_GUARD is not set) each area has guard page, thus the area->size will be bigger. The bug is not reproduced if VM_NO_GUARD is set.
All I can see is kasan na arm64 which doesn't really seem to use it for vmalloc.
So is the problem real or this is a mere cleanup?
This is the real problem, try this hunk for any file descriptor which provides mapping, or say modify epoll as example:
OK, my response was more confusing than I intended. I meant to say. Is there any in kernel code that would allow the bug have had in mind? In other words can userspace trick any existing code?
In theory any existing caller of remap_vmalloc_range() which does not have an explicit size check should trigger an oops, e.g. this is a good candidate:
*** drivers/media/usb/stkwebcam/stk-webcam.c: v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
According to the code no explicit size check, should be easy to reproduce: mmap the frame buffer and you are done.
Other callers are not so easy to follow. But wait, here is another example:
(drivers/video/fbdev/core/fbmem.c) static int fb_mmap(struct file *file, struct vm_area_struct * vma) ... res = fb->fb_mmap(info, vma);
(drivers/video/fbdev/vfb.c) static int vfb_mmap(struct fb_info *info, struct vm_area_struct *vma) { return remap_vmalloc_range(vma, (void *)info->fix.smem_start, vma->vm_pgoff); }
No checks, naked calls, should be also the candidate.
-- Roman
On Thu 03-01-19 21:31:58, Roman Penyaev wrote:
On 2019-01-03 20:40, Michal Hocko wrote:
On Thu 03-01-19 20:27:26, Roman Penyaev wrote:
On 2019-01-03 16:13, Michal Hocko wrote:
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Can this actually happen? I am not really familiar with all the callers of this API but VM_NO_GUARD is not really used wildly in the kernel.
Exactly, by default (VM_NO_GUARD is not set) each area has guard page, thus the area->size will be bigger. The bug is not reproduced if VM_NO_GUARD is set.
All I can see is kasan na arm64 which doesn't really seem to use it for vmalloc.
So is the problem real or this is a mere cleanup?
This is the real problem, try this hunk for any file descriptor which provides mapping, or say modify epoll as example:
OK, my response was more confusing than I intended. I meant to say. Is there any in kernel code that would allow the bug have had in mind? In other words can userspace trick any existing code?
In theory any existing caller of remap_vmalloc_range() which does not have an explicit size check should trigger an oops, e.g. this is a good candidate:
*** drivers/media/usb/stkwebcam/stk-webcam.c: v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have buf->v4lbuf.length. mmap callback maps this buffer to the vma size and that is indeed not enforced to be <= length AFAICS. So you are right!
Can we have an example in the changelog please?
Thanks!
On 2019-01-04 10:38, Michal Hocko wrote:
[...]
OK, my response was more confusing than I intended. I meant to say. Is there any in kernel code that would allow the bug have had in mind? In other words can userspace trick any existing code?
In theory any existing caller of remap_vmalloc_range() which does not have an explicit size check should trigger an oops, e.g. this is a good candidate:
*** drivers/media/usb/stkwebcam/stk-webcam.c: v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have buf->v4lbuf.length. mmap callback maps this buffer to the vma size and that is indeed not enforced to be <= length AFAICS. So you are right!
Can we have an example in the changelog please?
You mean to resend this particular patch with the list of possible candidates for oops in a comment message? Sure thing.
-- Roman
On Fri 04-01-19 11:21:39, Roman Penyaev wrote:
On 2019-01-04 10:38, Michal Hocko wrote:
[...]
OK, my response was more confusing than I intended. I meant to say. Is there any in kernel code that would allow the bug have had in mind? In other words can userspace trick any existing code?
In theory any existing caller of remap_vmalloc_range() which does not have an explicit size check should trigger an oops, e.g. this is a good candidate:
*** drivers/media/usb/stkwebcam/stk-webcam.c: v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
Hmm, sbuf->buffer is allocated in stk_setup_siobuf to have buf->v4lbuf.length. mmap callback maps this buffer to the vma size and that is indeed not enforced to be <= length AFAICS. So you are right!
Can we have an example in the changelog please?
You mean to resend this particular patch with the list of possible candidates for oops in a comment message? Sure thing.
I would just reply to the original patch with an updated changelog wording (to include the above example and explain how the vma setup is completely independent on the buffer allocation and ask Andrew to update the changelog of the patch that is already in the mmotm tree).
Thanks!
On Thu 03-01-19 15:59:52, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Signed-off-by: Roman Penyaev rpenyaev@suse.de Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Joe Perches joe@perches.com Cc: "Luis R. Rodriguez" mcgrof@kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org
Forgot to add Acked-by: Michal Hocko mhocko@suse.com Although I am not really sure the stable backport is really needed as I haven't seen any explicit example of a buggy kernel code to trigger the issue.
mm/vmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 871e41c55e23..2cd24186ba84 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr, if (!(area->flags & VM_USERMAP)) return -EINVAL;
- if (kaddr + size > area->addr + area->size)
- if (kaddr + size > area->addr + get_vm_area_size(area)) return -EINVAL;
do { -- 2.19.1
Hi Andrew,
Could you please update the commit message with something as the following (I hope it will become a bit clearer now):
------------- When VM_NO_GUARD is not set area->size includes adjacent guard page, thus for correct size checking get_vm_area_size() should be used, but not area->size.
This fixes possible kernel oops when userspace tries to mmap an area on 1 page bigger than was allocated by vmalloc_user() call: the size check inside remap_vmalloc_range_partial() accounts non-existing guard page also, so check successfully passes but vmalloc_to_page() returns NULL (guard page does not physically exist).
The following code pattern example should trigger an oops:
static int oops_mmap(struct file *file, struct vm_area_struct *vma) { void *mem;
mem = vmalloc_user(4096); BUG_ON(!mem); /* Do not care about mem leak */
return remap_vmalloc_range(vma, mem, 0); }
And userspace simply mmaps size + PAGE_SIZE:
mmap(NULL, 8192, PROT_WRITE|PROT_READ, MAP_PRIVATE, fd, 0);
Possible candidates for oops which do not have any explicit size checks:
*** drivers/media/usb/stkwebcam/stk-webcam.c: v4l_stk_mmap[789] ret = remap_vmalloc_range(vma, sbuf->buffer, 0);
Or the following one:
*** drivers/video/fbdev/core/fbmem.c static int fb_mmap(struct file *file, struct vm_area_struct * vma) ... res = fb->fb_mmap(info, vma);
Where fb_mmap callback calls remap_vmalloc_range() directly without any explicit checks:
*** drivers/video/fbdev/vfb.c static int vfb_mmap(struct fb_info *info, struct vm_area_struct *vma) { return remap_vmalloc_range(vma, (void *)info->fix.smem_start, vma->vm_pgoff); } --------
-- Roman
On 2019-01-03 15:59, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Signed-off-by: Roman Penyaev rpenyaev@suse.de Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Joe Perches joe@perches.com Cc: "Luis R. Rodriguez" mcgrof@kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org
mm/vmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 871e41c55e23..2cd24186ba84 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2248,7 +2248,7 @@ int remap_vmalloc_range_partial(struct vm_area_struct *vma, unsigned long uaddr, if (!(area->flags & VM_USERMAP)) return -EINVAL;
- if (kaddr + size > area->addr + area->size)
if (kaddr + size > area->addr + get_vm_area_size(area)) return -EINVAL;
do {
On 1/3/19 5:59 PM, Roman Penyaev wrote:
area->size can include adjacent guard page but get_vm_area_size() returns actual size of the area.
This fixes possible kernel crash when userspace tries to map area on 1 page bigger: size check passes but the following vmalloc_to_page() returns NULL on last guard (non-existing) page.
Signed-off-by: Roman Penyaev rpenyaev@suse.de Cc: Andrew Morton akpm@linux-foundation.org Cc: Michal Hocko mhocko@suse.com Cc: Andrey Ryabinin aryabinin@virtuozzo.com Cc: Joe Perches joe@perches.com Cc: "Luis R. Rodriguez" mcgrof@kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org
Fixes: e69e9d4aee71 ("vmalloc: introduce remap_vmalloc_range_partial") Acked-by: Andrey Ryabinin aryabinin@virtuozzo.com
linux-stable-mirror@lists.linaro.org