[Linux-stable-mirror] [PATCH] exit: move exit_task_namespaces() after exit_task_work()

List overview All Threads
Download

newer

older

[Linux-stable-mirror] v4.9.70...

[Linux-stable-mirror] v4.4.106...

Cong Wang

14 Dec 2017 14 Dec '17

8:17 p.m.

syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

Reported-by: syzbot syzkaller@googlegroups.com Cc: Ingo Molnar mingo@kernel.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com --- kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c index 6b4298a41167..909e43c45158 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -861,8 +861,8 @@ void __noreturn do_exit(long code) exit_fs(tsk); if (group_dead) disassociate_ctty(1); - exit_task_namespaces(tsk); exit_task_work(tsk); + exit_task_namespaces(tsk); exit_thread(tsk);

-- 2.13.0

Show replies by date

Al Viro

14 Dec 14 Dec

9:08 p.m.

On Thu, Dec 14, 2017 at 12:17:57PM -0800, Cong Wang wrote:

...

syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

What's to prevent somebody else holding a reference to the same inode past the exit(2)? IOW, I don't believe that this is fixing anything - in the best case, your patch papers over a specific reproducer.

Cong Wang

15 Dec 15 Dec

11:59 p.m.

On Thu, Dec 14, 2017 at 1:08 PM, Al Viro viro@zeniv.linux.org.uk wrote:

...

On Thu, Dec 14, 2017 at 12:17:57PM -0800, Cong Wang wrote:

...
syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

What's to prevent somebody else holding a reference to the same inode past the exit(2)? IOW, I don't believe that this is fixing anything - in the best case, your patch papers over a specific reproducer.

You are right, I missed mq_clear_sbinfo().

And the offending commit is:

commit 9c583773d036336176e9e50441890659bc4eeae8 Author: Giuseppe Scrivano gscrivan@redhat.com Date: Fri Dec 15 01:06:28 2017 +0000

ipc, mqueue: lazy call kern_mount_data in new namespaces

kern_mount_data is a relatively expensive operation when creating a new IPC namespace, so delay the mount until its first usage when not creating the the global namespace.

This is a net saving for new IPC namespaces that don't use mq_open(). In this case there won't be any kern_mount_data() cost at all.

On my machine, the time for creating 1000 new IPC namespaces dropped from ~8s to ~2s.

Link: http://lkml.kernel.org/r/20171206151422.9660-1-gscrivan@redhat.com Signed-off-by: Giuseppe Scrivano gscrivan@redhat.com Cc: Manfred Spraul manfred@colorfullife.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Andrew Morton akpm@linux-foundation.org

Giuseppe Scrivano

16 Dec 16 Dec

10:04 p.m.

Cong Wang xiyou.wangcong@gmail.com writes:

...

On Thu, Dec 14, 2017 at 1:08 PM, Al Viro viro@zeniv.linux.org.uk wrote:

...
On Thu, Dec 14, 2017 at 12:17:57PM -0800, Cong Wang wrote:

...
syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

What's to prevent somebody else holding a reference to the same inode past the exit(2)? IOW, I don't believe that this is fixing anything - in the best case, your patch papers over a specific reproducer.

You are right, I missed mq_clear_sbinfo().

yes, the patch 9c583773d036336176e9e50441890659bc4eeae8 introduced an issue since it doesn't reset s_fs_info when ns->mq_mnt == NULL.

The following patch solves the issue for me. I store the superblock in "struct ipc_namespace" since we cannot assume "mq_mnt" exists.

Does the patch look fine? I'll clean it up and submit it separately if this approach is correct.

Regards, Giuseppe

diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index 554e31494f69..66d4a5740a60 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -53,6 +53,7 @@ struct ipc_namespace {

/* The kern_mount of the mqueuefs sb. We take a ref on it */ struct vfsmount *mq_mnt; + struct super_block *mq_sb;

/* # queues in this ns, protected by mq_lock */ unsigned int mq_queues_count; diff --git a/ipc/mqueue.c b/ipc/mqueue.c index f78d6687a61b..16347cf1de2d 100644 --- a/ipc/mqueue.c +++ b/ipc/mqueue.c @@ -341,6 +341,7 @@ static int mqueue_fill_super(struct super_block *sb, void *data, int silent) sb->s_root = d_make_root(inode); if (!sb->s_root) return -ENOMEM; + ns->mq_sb = sb; return 0; }

@@ -1554,6 +1555,7 @@ int mq_init_ns(struct ipc_namespace *ns, bool mount) ns->mq_msg_max = DFLT_MSGMAX; ns->mq_msgsize_max = DFLT_MSGSIZEMAX; ns->mq_msg_default = DFLT_MSG; + ns->mq_sb = NULL; ns->mq_msgsize_default = DFLT_MSGSIZE;

if (!mount) @@ -1573,8 +1575,8 @@ int mq_init_ns(struct ipc_namespace *ns, bool mount)

void mq_clear_sbinfo(struct ipc_namespace *ns) { - if (ns->mq_mnt) - ns->mq_mnt->mnt_sb->s_fs_info = NULL; + if (ns->mq_sb) + ns->mq_sb->s_fs_info = NULL; }

void mq_put_mnt(struct ipc_namespace *ns)

ebiederm＠xmission.com

15 Dec 15 Dec

6:56 a.m.

Cong Wang xiyou.wangcong@gmail.com writes:

...

syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

How does that possibly work. (I haven't seen this syzbot report).

Looking at the code we have get_ns_from_inode. Which takes the mq_lock, sees if the pointer is NULL and takes a reference if it is non-NULL.

Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held when the count drops to zero.

Where is the race in that?

The rest of mqueue_evict_inode uses the returned pointer and tests that the pointer is non-NULL before user it.

So either szbot is giving you a bad report or there is a subtle race there I am not seeing. The change below is not at all the proper way to fix a subtle race.

Eric

...

Reported-by: syzbot syzkaller@googlegroups.com Cc: Ingo Molnar mingo@kernel.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com

kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c index 6b4298a41167..909e43c45158 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -861,8 +861,8 @@ void __noreturn do_exit(long code) exit_fs(tsk); if (group_dead) disassociate_ctty(1);

exit_task_namespaces(tsk); exit_task_work(tsk);

exit_task_namespaces(tsk); exit_thread(tsk);

/*

Dmitry Vyukov

7:35 a.m.

On Fri, Dec 15, 2017 at 7:56 AM, Eric W. Biederman ebiederm@xmission.com wrote:

...

Cong Wang xiyou.wangcong@gmail.com writes:

...
syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

How does that possibly work. (I haven't seen this syzbot report).

Looking at the code we have get_ns_from_inode. Which takes the mq_lock, sees if the pointer is NULL and takes a reference if it is non-NULL.

Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held when the count drops to zero.

Where is the race in that?

The rest of mqueue_evict_inode uses the returned pointer and tests that the pointer is non-NULL before user it.

So either szbot is giving you a bad report or there is a subtle race there I am not seeing. The change below is not at all the proper way to fix a subtle race.

Eric

Cong, what was that report? Searching by "exit_task_work|exit_task_namespaces" there are too many of them: https://groups.google.com/forum/#%21searchin/syzkaller-bugs/%22exit_task_wor...

I can only say that syzbot does not make up reports. That's something that actually happened and was provoked by userspace.

...

...
Reported-by: syzbot syzkaller@googlegroups.com Cc: Ingo Molnar mingo@kernel.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com

kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c index 6b4298a41167..909e43c45158 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -861,8 +861,8 @@ void __noreturn do_exit(long code) exit_fs(tsk); if (group_dead) disassociate_ctty(1);
exit_task_namespaces(tsk);
exit_task_work(tsk);
exit_task_namespaces(tsk);
exit_thread(tsk);

/*

Dmitry Vyukov

8 a.m.

On Fri, Dec 15, 2017 at 8:35 AM, Dmitry Vyukov dvyukov@google.com wrote:

...

On Fri, Dec 15, 2017 at 7:56 AM, Eric W. Biederman ebiederm@xmission.com wrote:

...
Cong Wang xiyou.wangcong@gmail.com writes:

...
syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

How does that possibly work. (I haven't seen this syzbot report).

Looking at the code we have get_ns_from_inode. Which takes the mq_lock, sees if the pointer is NULL and takes a reference if it is non-NULL.

Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held when the count drops to zero.

Where is the race in that?

The rest of mqueue_evict_inode uses the returned pointer and tests that the pointer is non-NULL before user it.

So either szbot is giving you a bad report or there is a subtle race there I am not seeing. The change below is not at all the proper way to fix a subtle race.

Eric

Cong, what was that report? Searching by "exit_task_work|exit_task_namespaces" there are too many of them: https://groups.google.com/forum/#%21searchin/syzkaller-bugs/%22exit_task_wor...

I can only say that syzbot does not make up reports. That's something that actually happened and was provoked by userspace.

Ah, found that bug: https://groups.google.com/d/msg/syzkaller-bugs/1XBaqnPSXzs/VF-eCSPuCQAJ

...

...
...
Reported-by: syzbot syzkaller@googlegroups.com Cc: Ingo Molnar mingo@kernel.org Cc: Al Viro viro@zeniv.linux.org.uk Cc: Andrew Morton akpm@linux-foundation.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: stable@vger.kernel.org Signed-off-by: Cong Wang xiyou.wangcong@gmail.com

kernel/exit.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/exit.c b/kernel/exit.c index 6b4298a41167..909e43c45158 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -861,8 +861,8 @@ void __noreturn do_exit(long code) exit_fs(tsk); if (group_dead) disassociate_ctty(1);
exit_task_namespaces(tsk);
exit_task_work(tsk);
exit_task_namespaces(tsk);
exit_thread(tsk);

/*

Cong Wang

16 Dec 16 Dec

midnight

On Fri, Dec 15, 2017 at 12:00 AM, Dmitry Vyukov dvyukov@google.com wrote:

...

On Fri, Dec 15, 2017 at 8:35 AM, Dmitry Vyukov dvyukov@google.com wrote:

...
On Fri, Dec 15, 2017 at 7:56 AM, Eric W. Biederman ebiederm@xmission.com wrote:

...
Cong Wang xiyou.wangcong@gmail.com writes:

...
syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

How does that possibly work. (I haven't seen this syzbot report).

Looking at the code we have get_ns_from_inode. Which takes the mq_lock, sees if the pointer is NULL and takes a reference if it is non-NULL.

Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held when the count drops to zero.

Where is the race in that?

The rest of mqueue_evict_inode uses the returned pointer and tests that the pointer is non-NULL before user it.

So either szbot is giving you a bad report or there is a subtle race there I am not seeing. The change below is not at all the proper way to fix a subtle race.

Eric

Cong, what was that report? Searching by "exit_task_work|exit_task_namespaces" there are too many of them: https://groups.google.com/forum/#%21searchin/syzkaller-bugs/%22exit_task_wor...

I can only say that syzbot does not make up reports. That's something that actually happened and was provoked by userspace.

Ah, found that bug: https://groups.google.com/d/msg/syzkaller-bugs/1XBaqnPSXzs/VF-eCSPuCQAJ

Yeah, and it is introduced by:

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/commit/?id=9c583773d03633617...

Dmitry Vyukov

7:40 a.m.

On Sat, Dec 16, 2017 at 1:00 AM, Cong Wang xiyou.wangcong@gmail.com wrote:

...

...
...
...
...
syzbot reported we have a use-after-free when mqueue_evict_inode() is called on __cleanup_mnt() path, where the ipc ns is already freed by the previous exit_task_namespaces(). We can just move it after after exit_task_work() to avoid this use-after-free.

How does that possibly work. (I haven't seen this syzbot report).

Looking at the code we have get_ns_from_inode. Which takes the mq_lock, sees if the pointer is NULL and takes a reference if it is non-NULL.

Meanwhile put_ipc_ns calls mq_clear_sbinfo(ns) with the mq_lock held when the count drops to zero.

Where is the race in that?

The rest of mqueue_evict_inode uses the returned pointer and tests that the pointer is non-NULL before user it.

So either szbot is giving you a bad report or there is a subtle race there I am not seeing. The change below is not at all the proper way to fix a subtle race.

Eric

Cong, what was that report? Searching by "exit_task_work|exit_task_namespaces" there are too many of them: https://groups.google.com/forum/#%21searchin/syzkaller-bugs/%22exit_task_wor...

I can only say that syzbot does not make up reports. That's something that actually happened and was provoked by userspace.

Ah, found that bug: https://groups.google.com/d/msg/syzkaller-bugs/1XBaqnPSXzs/VF-eCSPuCQAJ

Yeah, and it is introduced by:

http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/commit/?id=9c583773d03633617...

If it's going to be discussed here, note that it was reported by syzbot and requires "#syz fix" tag after fixing here: https://groups.google.com/d/msg/syzkaller-bugs/1XBaqnPSXzs/jGDCMoz_AQAJ

2936

days inactive

2938

days old

linux-stable-mirror@lists.linaro.org

8 comments

participants

tags (0)

participants (5)

Al Viro
Cong Wang
Dmitry Vyukov
ebiederm＠xmission.com
Giuseppe Scrivano