BugLink: https://bugs.launchpad.net/bugs/1775165
[Note to upstream] I understand that this patch is a little long for -stable, but this patch series fixes a real issue, seen by real users, is testable, and is made up from upstream commits. Please consider it.
[Impact]
When userspace tasks which are processing fanotify permission events act incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes the whole notification subsystem to hang.
This has been seen in production, and it can also be seen when running the Linux Test Project testsuite, specifically fanotify07.
[Fix]
Instead of holding the SRCU lock while waiting for userspace to respond, which may never happen, or not in the order we are expecting, we drop the fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then reacquire the lock again when userspace responds.
The fixes are from a series of upstream commits:
05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick) 9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport) abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)
[Testcase]
You can reproduce the problem pretty quickly with the Linux Test Project:
Steps (with root): 1. sudo apt-get install git xfsprogs -y 2. git clone --depth=1 https://github.com/linux-test-project/ltp.git 3. cd ltp 4. make autotools 5. ./configure 6. make; make install 7. cd /opt/ltp 8. echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs 9. ./runltp -f /tmp/jobs
On a stock 4.9.168 kernel, the system will hang, and the testcase will look like:
<<<test_start>>> tag=fanotify07 stime=1554326200 cmdline="fanotify07 " contacts="" analysis=exit <<<test_output>>> tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Test timeouted, sending SIGKILL! Cannot kill test processes! Congratulation, likely test hit a kernel bug. Exitting uncleanly... <<<execution_status>>> initiation_status="ok" duration=350 termination_type=exited termination_id=1 corefile=no cutime=0 cstime=0 <<<test_end>>>
Looking at dmesg, we see the following call stack
[ 41.648244] LTP: starting fanotify07 (fanotify07 ) [ 242.729211] INFO: task fanotify07:1511 blocked for more than 120 seconds. [ 242.729257] Not tainted 4.9.168vanilla #1 [ 242.729281] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 242.729320] fanotify07 D 0 1511 1510 0x00000000 [ 242.729325] ffff92faf98a64c0 ffff92faf50f8000 ffff92faf19f2d00 ffff92faf989ad00 [ 242.729329] ffff92faffc19900 ffffb2f5c0cbbc70 ffffffffae8cf2c2 ffffb2f5c0cbbd60 [ 242.729333] 00e590a200000010 ffff92faffc19900 0000000000000046 7fffffffffffffff [ 242.729336] Call Trace: [ 242.729345] [<ffffffffae8cf2c2>] ? __schedule+0x242/0x700 [ 242.729348] [<ffffffffae8cf7ac>] schedule+0x2c/0x80 [ 242.729351] [<ffffffffae8d2e0b>] schedule_timeout+0x1fb/0x370 [ 242.729355] [<ffffffffae0fcaae>] ? add_timer+0x11e/0x290 [ 242.729358] [<ffffffffae8d021a>] wait_for_completion+0xba/0x140 [ 242.729361] [<ffffffffae0b4ac0>] ? wake_up_q+0x80/0x80 [ 242.729364] [<ffffffffae0f1a44>] __synchronize_srcu+0xf4/0x140 [ 242.729367] [<ffffffffae0f08a0>] ? trace_raw_output_rcu_utilization+0x60/0x60 [ 242.729370] [<ffffffffae0f1ab3>] synchronize_srcu+0x23/0x40 [ 242.729374] [<ffffffffae283bab>] fsnotify_mark_destroy_list+0x7b/0xe0 [ 242.729377] [<ffffffffae282d7f>] fsnotify_destroy_group+0x1f/0x50 [ 242.729380] [<ffffffffae285e66>] fanotify_release+0xd6/0x140 [ 242.729384] [<ffffffffae23e70a>] __fput+0xea/0x230 [ 242.729386] [<ffffffffae23e88e>] ____fput+0xe/0x10 [ 242.729390] [<ffffffffae0a6f2c>] task_work_run+0x7c/0xa0 [ 242.729394] [<ffffffffae003753>] exit_to_usermode_loop+0x93/0xa0 [ 242.729397] [<ffffffffae003bc7>] do_syscall_64+0xc7/0xe0 [ 242.729400] [<ffffffffae8d448e>] entry_SYSCALL_64_after_swapgs+0x58/0xc6
Note this call stack is the same as the one for 4.4, but some function names have changed due to the two commits already included in 4.9.
On a patched kernel, the test will pass successfully, and there will be no messages in dmesg.
[Regression Potential]
This makes modifications to how locking is performed in fsnotify / fanotify and there may be some cause for regression. Running all fanotify Linux Test Project tests shows that there are no extra failures caused by the patches, and instead fewer failures are seen due to the bugfix.
Running the entire Linux Test Project testsuite actually works and runs to completion, something which doesn't happen in a unpatched kernel since it will hang on the fanotify07 test.
The patches are taken from upstream, and all necessary commits have been taken into account, so I am happy with the potential risks and that testing has been completed.
Jan Kara (3): fsnotify: Provide framework for dropping SRCU lock in ->handle_event fsnotify: Pass fsnotify_iter_info into handle_event handler fanotify: Release SRCU lock when waiting for userspace response
fs/notify/dnotify/dnotify.c | 3 +- fs/notify/fanotify/fanotify.c | 20 ++++++- fs/notify/fsnotify.c | 19 +++++-- fs/notify/fsnotify.h | 6 ++ fs/notify/group.c | 1 + fs/notify/inotify/inotify.h | 3 +- fs/notify/inotify/inotify_fsnotify.c | 3 +- fs/notify/inotify/inotify_user.c | 2 +- fs/notify/mark.c | 83 +++++++++++++++++++++++++++- include/linux/fsnotify_backend.h | 8 ++- kernel/audit_fsnotify.c | 3 +- kernel/audit_tree.c | 3 +- kernel/audit_watch.c | 3 +- 13 files changed, 139 insertions(+), 18 deletions(-)
From: Jan Kara jack@suse.cz
commit abc77577a669f424c5d0c185b9994f2621c52aa4 upstream.
fanotify wants to drop fsnotify_mark_srcu lock when waiting for response from userspace so that the whole notification subsystem is not blocked during that time. This patch provides a framework for safely getting mark reference for a mark found in the object list which pins the mark in that list. We can then drop fsnotify_mark_srcu, wait for userspace response and then safely continue iteration of the object list once we reaquire fsnotify_mark_srcu.
Reviewed-by: Miklos Szeredi mszeredi@redhat.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz [mruffell: backport: realign file fs/notify/mark.c] Signed-off-by: Matthew Ruffell matthew.ruffell@canonical.com --- fs/notify/fsnotify.h | 6 +++ fs/notify/group.c | 1 + fs/notify/mark.c | 83 +++++++++++++++++++++++++++++++- include/linux/fsnotify_backend.h | 5 ++ 4 files changed, 94 insertions(+), 1 deletion(-)
diff --git a/fs/notify/fsnotify.h b/fs/notify/fsnotify.h index 0a3bc2cf192c..0ad0eb9f2e14 100644 --- a/fs/notify/fsnotify.h +++ b/fs/notify/fsnotify.h @@ -8,6 +8,12 @@
#include "../mount.h"
+struct fsnotify_iter_info { + struct fsnotify_mark *inode_mark; + struct fsnotify_mark *vfsmount_mark; + int srcu_idx; +}; + /* destroy all events sitting in this groups notification queue */ extern void fsnotify_flush_notify(struct fsnotify_group *group);
diff --git a/fs/notify/group.c b/fs/notify/group.c index fbe3cbebec16..864103b707f4 100644 --- a/fs/notify/group.c +++ b/fs/notify/group.c @@ -124,6 +124,7 @@ struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops) /* set to 0 when there a no external references to this group */ atomic_set(&group->refcnt, 1); atomic_set(&group->num_marks, 0); + atomic_set(&group->user_waits, 0);
spin_lock_init(&group->notification_lock); INIT_LIST_HEAD(&group->notification_list); diff --git a/fs/notify/mark.c b/fs/notify/mark.c index d3fea0bd89e2..d3005d95d530 100644 --- a/fs/notify/mark.c +++ b/fs/notify/mark.c @@ -105,6 +105,16 @@ void fsnotify_get_mark(struct fsnotify_mark *mark) atomic_inc(&mark->refcnt); }
+/* + * Get mark reference when we found the mark via lockless traversal of object + * list. Mark can be already removed from the list by now and on its way to be + * destroyed once SRCU period ends. + */ +static bool fsnotify_get_mark_safe(struct fsnotify_mark *mark) +{ + return atomic_inc_not_zero(&mark->refcnt); +} + void fsnotify_put_mark(struct fsnotify_mark *mark) { if (atomic_dec_and_test(&mark->refcnt)) { @@ -125,6 +135,72 @@ u32 fsnotify_recalc_mask(struct hlist_head *head) return new_mask; }
+bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info) +{ + struct fsnotify_group *group; + + if (WARN_ON_ONCE(!iter_info->inode_mark && !iter_info->vfsmount_mark)) + return false; + + if (iter_info->inode_mark) + group = iter_info->inode_mark->group; + else + group = iter_info->vfsmount_mark->group; + + /* + * Since acquisition of mark reference is an atomic op as well, we can + * be sure this inc is seen before any effect of refcount increment. + */ + atomic_inc(&group->user_waits); + + if (iter_info->inode_mark) { + /* This can fail if mark is being removed */ + if (!fsnotify_get_mark_safe(iter_info->inode_mark)) + goto out_wait; + } + if (iter_info->vfsmount_mark) { + if (!fsnotify_get_mark_safe(iter_info->vfsmount_mark)) + goto out_inode; + } + + /* + * Now that both marks are pinned by refcount in the inode / vfsmount + * lists, we can drop SRCU lock, and safely resume the list iteration + * once userspace returns. + */ + srcu_read_unlock(&fsnotify_mark_srcu, iter_info->srcu_idx); + + return true; +out_inode: + if (iter_info->inode_mark) + fsnotify_put_mark(iter_info->inode_mark); +out_wait: + if (atomic_dec_and_test(&group->user_waits) && group->shutdown) + wake_up(&group->notification_waitq); + return false; +} + +void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info) +{ + struct fsnotify_group *group = NULL; + + iter_info->srcu_idx = srcu_read_lock(&fsnotify_mark_srcu); + if (iter_info->inode_mark) { + group = iter_info->inode_mark->group; + fsnotify_put_mark(iter_info->inode_mark); + } + if (iter_info->vfsmount_mark) { + group = iter_info->vfsmount_mark->group; + fsnotify_put_mark(iter_info->vfsmount_mark); + } + /* + * We abuse notification_waitq on group shutdown for waiting for all + * marks pinned when waiting for userspace. + */ + if (atomic_dec_and_test(&group->user_waits) && group->shutdown) + wake_up(&group->notification_waitq); +} + /* * Remove mark from inode / vfsmount list, group list, drop inode reference * if we got one. @@ -161,7 +237,6 @@ void fsnotify_detach_mark(struct fsnotify_mark *mark) * __fsnotify_parent() lazily when next event happens on one of our * children. */ - list_del_init(&mark->g_list);
spin_unlock(&mark->lock); @@ -508,6 +583,12 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group) __fsnotify_free_mark(mark); fsnotify_put_mark(mark); } + /* + * Some marks can still be pinned when waiting for response from + * userspace. Wait for those now. fsnotify_prepare_user_wait() will + * not succeed now so this wait is race-free. + */ + wait_event(group->notification_waitq, !atomic_read(&group->user_waits)); }
void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old) diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 79467b239fcf..3de53a2b8944 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -79,6 +79,7 @@ struct fsnotify_event; struct fsnotify_mark; struct fsnotify_event_private_data; struct fsnotify_fname; +struct fsnotify_iter_info;
/* * Each group much define these ops. The fsnotify infrastructure will call @@ -162,6 +163,8 @@ struct fsnotify_group { struct fsnotify_event *overflow_event; /* Event we queue when the * notification list is too * full */ + atomic_t user_waits; /* Number of tasks waiting for user + * response */
/* groups can define private fields here or use the void *private */ union { @@ -350,6 +353,8 @@ extern void fsnotify_clear_marks_by_group_flags(struct fsnotify_group *group, un extern void fsnotify_get_mark(struct fsnotify_mark *mark); extern void fsnotify_put_mark(struct fsnotify_mark *mark); extern void fsnotify_unmount_inodes(struct super_block *sb); +extern void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info); +extern bool fsnotify_prepare_user_wait(struct fsnotify_iter_info *iter_info);
/* put here because inotify does some weird stuff when destroying watches */ extern void fsnotify_init_event(struct fsnotify_event *event,
From: Jan Kara jack@suse.cz
commit 9385a84d7e1f658bb2d96ab798393e4b16268aaa upstream.
Pass fsnotify_iter_info into ->handle_event() handler so that it can release and reacquire SRCU lock via fsnotify_prepare_user_wait() and fsnotify_finish_user_wait() functions. These functions also make sure current marks are appropriately pinned so that iteration protected by srcu in fsnotify() stays safe.
Reviewed-by: Miklos Szeredi mszeredi@redhat.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz [mruffell: backport: removing const keyword and minor realignment] Signed-off-by: Matthew Ruffell matthew.ruffell@canonical.com --- fs/notify/dnotify/dnotify.c | 3 ++- fs/notify/fanotify/fanotify.c | 3 ++- fs/notify/fsnotify.c | 19 +++++++++++++------ fs/notify/inotify/inotify.h | 3 ++- fs/notify/inotify/inotify_fsnotify.c | 3 ++- fs/notify/inotify/inotify_user.c | 2 +- include/linux/fsnotify_backend.h | 3 ++- kernel/audit_fsnotify.c | 3 ++- kernel/audit_tree.c | 3 ++- kernel/audit_watch.c | 3 ++- 10 files changed, 30 insertions(+), 15 deletions(-)
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c index 6faaf710e563..264bfd99a694 100644 --- a/fs/notify/dnotify/dnotify.c +++ b/fs/notify/dnotify/dnotify.c @@ -86,7 +86,8 @@ static int dnotify_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *file_name, u32 cookie) + const unsigned char *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info) { struct dnotify_mark *dn_mark; struct dnotify_struct *dn; diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 8a459b179183..4944956cdbd9 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -174,7 +174,8 @@ static int fanotify_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *fanotify_mark, u32 mask, void *data, int data_type, - const unsigned char *file_name, u32 cookie) + const unsigned char *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info) { int ret = 0; struct fanotify_event_info *event; diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c index 56b4f855fa9b..da8c6674990d 100644 --- a/fs/notify/fsnotify.c +++ b/fs/notify/fsnotify.c @@ -131,7 +131,8 @@ static int send_to_group(struct inode *to_tell, struct fsnotify_mark *vfsmount_mark, __u32 mask, void *data, int data_is, u32 cookie, - const unsigned char *file_name) + const unsigned char *file_name, + struct fsnotify_iter_info *iter_info) { struct fsnotify_group *group = NULL; __u32 inode_test_mask = 0; @@ -182,7 +183,7 @@ static int send_to_group(struct inode *to_tell,
return group->ops->handle_event(group, to_tell, inode_mark, vfsmount_mark, mask, data, data_is, - file_name, cookie); + file_name, cookie, iter_info); }
/* @@ -197,8 +198,9 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, struct hlist_node *inode_node = NULL, *vfsmount_node = NULL; struct fsnotify_mark *inode_mark = NULL, *vfsmount_mark = NULL; struct fsnotify_group *inode_group, *vfsmount_group; + struct fsnotify_iter_info iter_info; struct mount *mnt; - int idx, ret = 0; + int ret = 0; /* global tests shouldn't care about events on child only the specific event */ __u32 test_mask = (mask & ~FS_EVENT_ON_CHILD);
@@ -231,7 +233,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, !(mnt && test_mask & mnt->mnt_fsnotify_mask)) return 0;
- idx = srcu_read_lock(&fsnotify_mark_srcu); + iter_info.srcu_idx = srcu_read_lock(&fsnotify_mark_srcu);
if ((mask & FS_MODIFY) || (test_mask & to_tell->i_fsnotify_mask)) @@ -280,8 +282,13 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, vfsmount_mark = NULL; } } + + iter_info.inode_mark = inode_mark; + iter_info.vfsmount_mark = vfsmount_mark; + ret = send_to_group(to_tell, inode_mark, vfsmount_mark, mask, - data, data_is, cookie, file_name); + data, data_is, cookie, file_name, + &iter_info);
if (ret && (mask & ALL_FSNOTIFY_PERM_EVENTS)) goto out; @@ -295,7 +302,7 @@ int fsnotify(struct inode *to_tell, __u32 mask, void *data, int data_is, } ret = 0; out: - srcu_read_unlock(&fsnotify_mark_srcu, idx); + srcu_read_unlock(&fsnotify_mark_srcu, iter_info.srcu_idx);
return ret; } diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h index ed855ef6f077..726b06b303b8 100644 --- a/fs/notify/inotify/inotify.h +++ b/fs/notify/inotify/inotify.h @@ -27,6 +27,7 @@ extern int inotify_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *file_name, u32 cookie); + const unsigned char *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info);
extern const struct fsnotify_ops inotify_fsnotify_ops; diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c index 2cd900c2c737..79a5f06b9100 100644 --- a/fs/notify/inotify/inotify_fsnotify.c +++ b/fs/notify/inotify/inotify_fsnotify.c @@ -67,7 +67,8 @@ int inotify_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *file_name, u32 cookie) + const unsigned char *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info) { struct inotify_inode_mark *i_mark; struct inotify_event_info *event; diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c index 69d1ea3d292a..25336bd8d9aa 100644 --- a/fs/notify/inotify/inotify_user.c +++ b/fs/notify/inotify/inotify_user.c @@ -494,7 +494,7 @@ void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
/* Queue ignore event for the watch */ inotify_handle_event(group, NULL, fsn_mark, NULL, FS_IN_IGNORED, - NULL, FSNOTIFY_EVENT_NONE, NULL, 0); + NULL, FSNOTIFY_EVENT_NONE, NULL, 0, NULL);
i_mark = container_of(fsn_mark, struct inotify_inode_mark, fsn_mark); /* remove this mark from the idr */ diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 3de53a2b8944..4174d7e2898a 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -98,7 +98,8 @@ struct fsnotify_ops { struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *file_name, u32 cookie); + const unsigned char *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info); void (*free_group_priv)(struct fsnotify_group *group); void (*freeing_mark)(struct fsnotify_mark *mark, struct fsnotify_group *group); void (*free_event)(struct fsnotify_event *event); diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c index f84f8d06e1f6..231c0f3cbf2b 100644 --- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -169,7 +169,8 @@ static int audit_mark_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *dname, u32 cookie) + const unsigned char *dname, u32 cookie, + struct fsnotify_iter_info *iter_info) { struct audit_fsnotify_mark *audit_mark; struct inode *inode = NULL; diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c index 25772476fa4a..b0bdebaf811d 100644 --- a/kernel/audit_tree.c +++ b/kernel/audit_tree.c @@ -949,7 +949,8 @@ static int audit_tree_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *file_name, u32 cookie) + const unsigned char *file_name, u32 cookie, + struct fsnotify_iter_info *iter_info) { return 0; } diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index f036b6ada6ef..160d52b662e2 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -485,7 +485,8 @@ static int audit_watch_handle_event(struct fsnotify_group *group, struct fsnotify_mark *inode_mark, struct fsnotify_mark *vfsmount_mark, u32 mask, void *data, int data_type, - const unsigned char *dname, u32 cookie) + const unsigned char *dname, u32 cookie, + struct fsnotify_iter_info *iter_info) { struct inode *inode; struct audit_parent *parent;
From: Jan Kara jack@suse.cz
commit 05f0e38724e8449184acd8fbf0473ee5a07adc6c upstream.
When userspace task processing fanotify permission events screws up and does not respond, fsnotify_mark_srcu SRCU is held indefinitely which causes further hangs in the whole notification subsystem. Although we cannot easily solve the problem of operations blocked waiting for response from userspace, we can at least somewhat localize the damage by dropping SRCU lock before waiting for userspace response and reacquiring it when userspace responds.
Reviewed-by: Miklos Szeredi mszeredi@redhat.com Reviewed-by: Amir Goldstein amir73il@gmail.com Signed-off-by: Jan Kara jack@suse.cz [mruffell: cherry picked] Signed-off-by: Matthew Ruffell matthew.ruffell@canonical.com --- fs/notify/fanotify/fanotify.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 4944956cdbd9..eeb5cc1f6978 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -61,14 +61,26 @@ static int fanotify_merge(struct list_head *list, struct fsnotify_event *event)
#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS static int fanotify_get_response(struct fsnotify_group *group, - struct fanotify_perm_event_info *event) + struct fanotify_perm_event_info *event, + struct fsnotify_iter_info *iter_info) { int ret;
pr_debug("%s: group=%p event=%p\n", __func__, group, event);
+ /* + * fsnotify_prepare_user_wait() fails if we race with mark deletion. + * Just let the operation pass in that case. + */ + if (!fsnotify_prepare_user_wait(iter_info)) { + event->response = FAN_ALLOW; + goto out; + } + wait_event(group->fanotify_data.access_waitq, event->response);
+ fsnotify_finish_user_wait(iter_info); +out: /* userspace responded, convert to something usable */ switch (event->response) { case FAN_ALLOW: @@ -216,7 +228,8 @@ static int fanotify_handle_event(struct fsnotify_group *group,
#ifdef CONFIG_FANOTIFY_ACCESS_PERMISSIONS if (mask & FAN_ALL_PERM_EVENTS) { - ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event)); + ret = fanotify_get_response(group, FANOTIFY_PE(fsn_event), + iter_info); fsnotify_destroy_event(group, fsn_event); } #endif
On Thu, Apr 11, 2019 at 03:24:27PM +1200, Matthew Ruffell wrote:
BugLink: https://bugs.launchpad.net/bugs/1775165
[Note to upstream] I understand that this patch is a little long for -stable, but this patch series fixes a real issue, seen by real users, is testable, and is made up from upstream commits. Please consider it.
[Impact]
When userspace tasks which are processing fanotify permission events act incorrectly, the fsnotify_mark_srcu SRCU is held indefinitely which causes the whole notification subsystem to hang.
This has been seen in production, and it can also be seen when running the Linux Test Project testsuite, specifically fanotify07.
[Fix]
Instead of holding the SRCU lock while waiting for userspace to respond, which may never happen, or not in the order we are expecting, we drop the fsnotify_mark_srcu SRCU lock before waiting for userspace response, and then reacquire the lock again when userspace responds.
The fixes are from a series of upstream commits:
05f0e38724e8449184acd8fbf0473ee5a07adc6c (cherry-pick) 9385a84d7e1f658bb2d96ab798393e4b16268aaa (backport) abc77577a669f424c5d0c185b9994f2621c52aa4 (backport)
[Testcase]
You can reproduce the problem pretty quickly with the Linux Test Project:
Steps (with root):
- sudo apt-get install git xfsprogs -y
- git clone --depth=1 https://github.com/linux-test-project/ltp.git
- cd ltp
- make autotools
- ./configure
- make; make install
- cd /opt/ltp
- echo -e "fanotify07 fanotify07 \nfanotify08 fanotify08" > /tmp/jobs
- ./runltp -f /tmp/jobs
I've queued up this and the 4.4 patches, thank you!
-- Thanks, Sasha
linux-stable-mirror@lists.linaro.org