From: SeongJae Park sjpark@amazon.de
SeongJae Park (5): xen/xenbus: Allow watches discard events before queueing xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path() xen/xenbus/xen_bus_type: Support will_handle watch callback xen/xenbus: Count pending messages for each watch xenbus/xenbus_backend: Disallow pending watch messages
drivers/block/xen-blkback/xenbus.c | 3 +- drivers/net/xen-netback/xenbus.c | 4 ++- drivers/xen/xen-pciback/xenbus.c | 2 +- drivers/xen/xenbus/xenbus_client.c | 8 ++++- drivers/xen/xenbus/xenbus_probe.c | 1 + drivers/xen/xenbus/xenbus_probe.h | 2 ++ drivers/xen/xenbus/xenbus_probe_backend.c | 7 +++++ drivers/xen/xenbus/xenbus_xs.c | 38 +++++++++++++++-------- include/xen/xenbus.h | 15 ++++++++- 9 files changed, 62 insertions(+), 18 deletions(-)
From: SeongJae Park sjpark@amazon.de
If handling logics of watch events are slower than the events enqueue logic and the events can be created from the guests, the guests could trigger memory pressure by intensively inducing the events, because it will create a huge number of pending events that exhausting the memory. This is known as XSA-349.
Fortunately, some watch events could be ignored, depending on its handler callback. For example, if the callback has interest in only one single path, the watch wouldn't want multiple pending events. Or, some watches could ignore events to same path.
To let such watches to volutarily help avoiding the memory pressure situation, this commit introduces new watch callback, 'will_handle'. If it is not NULL, it will be called for each new event just before enqueuing it. Then, if the callback returns false, the event will be discarded. No watch is using the callback for now, though.
This is part of XSA-349
This is upstream commit fed1755b118147721f2c87b37b9d66e62c39b668
Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park sjpark@amazon.de Reported-by: Michael Kurth mku@amazon.de Reported-by: Pawel Wieczorkiewicz wipawel@amazon.de Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com --- drivers/net/xen-netback/xenbus.c | 2 ++ drivers/xen/xenbus/xenbus_client.c | 1 + drivers/xen/xenbus/xenbus_xs.c | 7 ++++++- include/xen/xenbus.h | 7 +++++++ 4 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index b44f37fff890..5cb9dbf62816 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -770,12 +770,14 @@ static int xen_register_credit_watch(struct xenbus_device *dev, return -ENOMEM; snprintf(node, maxlen, "%s/rate", dev->nodename); vif->credit_watch.node = node; + vif->credit_watch.will_handle = NULL; vif->credit_watch.callback = xen_net_rate_changed; err = register_xenbus_watch(&vif->credit_watch); if (err) { pr_err("Failed to set watcher %s\n", vif->credit_watch.node); kfree(node); vif->credit_watch.node = NULL; + vif->credit_watch.will_handle = NULL; vif->credit_watch.callback = NULL; } return err; diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index 266f446ba331..d02d25f784c9 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -120,6 +120,7 @@ int xenbus_watch_path(struct xenbus_device *dev, const char *path, int err;
watch->node = path; + watch->will_handle = NULL; watch->callback = callback;
err = register_xenbus_watch(watch); diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c index 22f7cd711c57..d200aa707988 100644 --- a/drivers/xen/xenbus/xenbus_xs.c +++ b/drivers/xen/xenbus/xenbus_xs.c @@ -901,7 +901,12 @@ static int process_msg(void) spin_lock(&watches_lock); msg->u.watch.handle = find_watch( msg->u.watch.vec[XS_WATCH_TOKEN]); - if (msg->u.watch.handle != NULL) { + if (msg->u.watch.handle != NULL && + (!msg->u.watch.handle->will_handle || + msg->u.watch.handle->will_handle( + msg->u.watch.handle, + (const char **)msg->u.watch.vec, + msg->u.watch.vec_size))) { spin_lock(&watch_events_lock); list_add_tail(&msg->list, &watch_events); wake_up(&watch_events_waitq); diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 32b944b7cebd..11697aa023b5 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -58,6 +58,13 @@ struct xenbus_watch /* Path being watched. */ const char *node;
+ /* + * Called just before enqueing new event while a spinlock is held. + * The event will be discarded if this callback returns false. + */ + bool (*will_handle)(struct xenbus_watch *, + const char **vec, unsigned int len); + /* Callback (executed in a process context with no locks held). */ void (*callback)(struct xenbus_watch *, const char **vec, unsigned int len);
From: SeongJae Park sjpark@amazon.de
Some code does not directly make 'xenbus_watch' object and call 'register_xenbus_watch()' but use 'xenbus_watch_path()' instead. This commit adds support of 'will_handle' callback in the 'xenbus_watch_path()' and it's wrapper, 'xenbus_watch_pathfmt()'.
This is part of XSA-349
This is upstream commit 2e85d32b1c865bec703ce0c962221a5e955c52c2
Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park sjpark@amazon.de Reported-by: Michael Kurth mku@amazon.de Reported-by: Pawel Wieczorkiewicz wipawel@amazon.de Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com --- drivers/block/xen-blkback/xenbus.c | 3 ++- drivers/net/xen-netback/xenbus.c | 2 +- drivers/xen/xen-pciback/xenbus.c | 2 +- drivers/xen/xenbus/xenbus_client.c | 9 +++++++-- drivers/xen/xenbus/xenbus_probe.c | 2 +- include/xen/xenbus.h | 6 +++++- 6 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 9a4ac6fd262a..27c9d7a5b4de 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -646,7 +646,8 @@ static int xen_blkbk_probe(struct xenbus_device *dev, /* setup back pointer */ be->blkif->be = be;
- err = xenbus_watch_pathfmt(dev, &be->backend_watch, backend_changed, + err = xenbus_watch_pathfmt(dev, &be->backend_watch, NULL, + backend_changed, "%s/%s", dev->nodename, "physical-device"); if (err) goto fail; diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c index 5cb9dbf62816..78788402edd8 100644 --- a/drivers/net/xen-netback/xenbus.c +++ b/drivers/net/xen-netback/xenbus.c @@ -1040,7 +1040,7 @@ static void connect(struct backend_info *be) xenvif_carrier_on(be->vif);
unregister_hotplug_status_watch(be); - err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, + err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch, NULL, hotplug_status_changed, "%s/%s", dev->nodename, "hotplug-status"); if (!err) diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c index f33eb40cb414..36ec99cff507 100644 --- a/drivers/xen/xen-pciback/xenbus.c +++ b/drivers/xen/xen-pciback/xenbus.c @@ -689,7 +689,7 @@ static int xen_pcibk_xenbus_probe(struct xenbus_device *dev,
/* watch the backend node for backend configuration information */ err = xenbus_watch_path(dev, dev->nodename, &pdev->be_watch, - xen_pcibk_be_watch); + NULL, xen_pcibk_be_watch); if (err) goto out;
diff --git a/drivers/xen/xenbus/xenbus_client.c b/drivers/xen/xenbus/xenbus_client.c index d02d25f784c9..8bbd887ca422 100644 --- a/drivers/xen/xenbus/xenbus_client.c +++ b/drivers/xen/xenbus/xenbus_client.c @@ -114,19 +114,22 @@ EXPORT_SYMBOL_GPL(xenbus_strstate); */ int xenbus_watch_path(struct xenbus_device *dev, const char *path, struct xenbus_watch *watch, + bool (*will_handle)(struct xenbus_watch *, + const char **, unsigned int), void (*callback)(struct xenbus_watch *, const char **, unsigned int)) { int err;
watch->node = path; - watch->will_handle = NULL; + watch->will_handle = will_handle; watch->callback = callback;
err = register_xenbus_watch(watch);
if (err) { watch->node = NULL; + watch->will_handle = NULL; watch->callback = NULL; xenbus_dev_fatal(dev, err, "adding watch on %s", path); } @@ -153,6 +156,8 @@ EXPORT_SYMBOL_GPL(xenbus_watch_path); */ int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch, + bool (*will_handle)(struct xenbus_watch *, + const char **, unsigned int), void (*callback)(struct xenbus_watch *, const char **, unsigned int), const char *pathfmt, ...) @@ -169,7 +174,7 @@ int xenbus_watch_pathfmt(struct xenbus_device *dev, xenbus_dev_fatal(dev, -ENOMEM, "allocating path for watch"); return -ENOMEM; } - err = xenbus_watch_path(dev, path, watch, callback); + err = xenbus_watch_path(dev, path, watch, will_handle, callback);
if (err) kfree(path); diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index c2d447687e33..c560c1b8489a 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -137,7 +137,7 @@ static int watch_otherend(struct xenbus_device *dev) container_of(dev->dev.bus, struct xen_bus_type, bus);
return xenbus_watch_pathfmt(dev, &dev->otherend_watch, - bus->otherend_changed, + NULL, bus->otherend_changed, "%s/%s", dev->otherend, "state"); }
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 11697aa023b5..1772507dc2c9 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -201,10 +201,14 @@ void xenbus_suspend_cancel(void);
int xenbus_watch_path(struct xenbus_device *dev, const char *path, struct xenbus_watch *watch, + bool (*will_handle)(struct xenbus_watch *, + const char **, unsigned int), void (*callback)(struct xenbus_watch *, const char **, unsigned int)); -__printf(4, 5) +__printf(5, 6) int xenbus_watch_pathfmt(struct xenbus_device *dev, struct xenbus_watch *watch, + bool (*will_handle)(struct xenbus_watch *, + const char **, unsigned int), void (*callback)(struct xenbus_watch *, const char **, unsigned int), const char *pathfmt, ...);
From: SeongJae Park sjpark@amazon.de
This commit adds support of the 'will_handle' watch callback for 'xen_bus_type' users.
This is part of XSA-349
This is upstream commit be987200fbaceaef340872841d4f7af2c5ee8dc3
Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park sjpark@amazon.de Reported-by: Michael Kurth mku@amazon.de Reported-by: Pawel Wieczorkiewicz wipawel@amazon.de Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com --- drivers/xen/xenbus/xenbus_probe.c | 3 ++- drivers/xen/xenbus/xenbus_probe.h | 2 ++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index c560c1b8489a..ba7590d75985 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -137,7 +137,8 @@ static int watch_otherend(struct xenbus_device *dev) container_of(dev->dev.bus, struct xen_bus_type, bus);
return xenbus_watch_pathfmt(dev, &dev->otherend_watch, - NULL, bus->otherend_changed, + bus->otherend_will_handle, + bus->otherend_changed, "%s/%s", dev->otherend, "state"); }
diff --git a/drivers/xen/xenbus/xenbus_probe.h b/drivers/xen/xenbus/xenbus_probe.h index c9ec7ca1f7ab..2c394c6ba605 100644 --- a/drivers/xen/xenbus/xenbus_probe.h +++ b/drivers/xen/xenbus/xenbus_probe.h @@ -42,6 +42,8 @@ struct xen_bus_type { int (*get_bus_id)(char bus_id[XEN_BUS_ID_SIZE], const char *nodename); int (*probe)(struct xen_bus_type *bus, const char *type, const char *dir); + bool (*otherend_will_handle)(struct xenbus_watch *watch, + const char **vec, unsigned int len); void (*otherend_changed)(struct xenbus_watch *watch, const char **vec, unsigned int len); struct bus_type bus;
From: SeongJae Park sjpark@amazon.de
This commit adds a counter of pending messages for each watch in the struct. It is used to skip unnecessary pending messages lookup in 'unregister_xenbus_watch()'. It could also be used in 'will_handle' callback.
This is part of XSA-349
This is upstream commit 3dc86ca6b4c8cfcba9da7996189d1b5a358a94fc
Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park sjpark@amazon.de Reported-by: Michael Kurth mku@amazon.de Reported-by: Pawel Wieczorkiewicz wipawel@amazon.de Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com --- drivers/xen/xenbus/xenbus_xs.c | 31 +++++++++++++++++++------------ include/xen/xenbus.h | 2 ++ 2 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c index d200aa707988..88b443637106 100644 --- a/drivers/xen/xenbus/xenbus_xs.c +++ b/drivers/xen/xenbus/xenbus_xs.c @@ -699,6 +699,8 @@ int register_xenbus_watch(struct xenbus_watch *watch)
sprintf(token, "%lX", (long)watch);
+ watch->nr_pending = 0; + down_read(&xs_state.watch_mutex);
spin_lock(&watches_lock); @@ -748,12 +750,15 @@ void unregister_xenbus_watch(struct xenbus_watch *watch)
/* Cancel pending watch events. */ spin_lock(&watch_events_lock); - list_for_each_entry_safe(msg, tmp, &watch_events, list) { - if (msg->u.watch.handle != watch) - continue; - list_del(&msg->list); - kfree(msg->u.watch.vec); - kfree(msg); + if (watch->nr_pending) { + list_for_each_entry_safe(msg, tmp, &watch_events, list) { + if (msg->u.watch.handle != watch) + continue; + list_del(&msg->list); + kfree(msg->u.watch.vec); + kfree(msg); + } + watch->nr_pending = 0; } spin_unlock(&watch_events_lock);
@@ -800,7 +805,6 @@ void xs_suspend_cancel(void)
static int xenwatch_thread(void *unused) { - struct list_head *ent; struct xs_stored_msg *msg;
for (;;) { @@ -813,13 +817,15 @@ static int xenwatch_thread(void *unused) mutex_lock(&xenwatch_mutex);
spin_lock(&watch_events_lock); - ent = watch_events.next; - if (ent != &watch_events) - list_del(ent); + msg = list_first_entry_or_null(&watch_events, + struct xs_stored_msg, list); + if (msg) { + list_del(&msg->list); + msg->u.watch.handle->nr_pending--; + } spin_unlock(&watch_events_lock);
- if (ent != &watch_events) { - msg = list_entry(ent, struct xs_stored_msg, list); + if (msg) { msg->u.watch.handle->callback( msg->u.watch.handle, (const char **)msg->u.watch.vec, @@ -909,6 +915,7 @@ static int process_msg(void) msg->u.watch.vec_size))) { spin_lock(&watch_events_lock); list_add_tail(&msg->list, &watch_events); + msg->u.watch.handle->nr_pending++; wake_up(&watch_events_waitq); spin_unlock(&watch_events_lock); } else { diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h index 1772507dc2c9..ed9e7e3307b7 100644 --- a/include/xen/xenbus.h +++ b/include/xen/xenbus.h @@ -58,6 +58,8 @@ struct xenbus_watch /* Path being watched. */ const char *node;
+ unsigned int nr_pending; + /* * Called just before enqueing new event while a spinlock is held. * The event will be discarded if this callback returns false.
From: SeongJae Park sjpark@amazon.de
'xenbus_backend' watches 'state' of devices, which is writable by guests. Hence, if guests intensively updates it, dom0 will have lots of pending events that exhausting memory of dom0. In other words, guests can trigger dom0 memory pressure. This is known as XSA-349. However, the watch callback of it, 'frontend_changed()', reads only 'state', so doesn't need to have the pending events.
To avoid the problem, this commit disallows pending watch messages for 'xenbus_backend' using the 'will_handle()' watch callback.
This is part of XSA-349
This is upstream commit 9996bd494794a2fe393e97e7a982388c6249aa76
Cc: stable@vger.kernel.org Signed-off-by: SeongJae Park sjpark@amazon.de Reported-by: Michael Kurth mku@amazon.de Reported-by: Pawel Wieczorkiewicz wipawel@amazon.de Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Juergen Gross jgross@suse.com --- drivers/xen/xenbus/xenbus_probe_backend.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c b/drivers/xen/xenbus/xenbus_probe_backend.c index 04f7f85a5edf..597c0b038454 100644 --- a/drivers/xen/xenbus/xenbus_probe_backend.c +++ b/drivers/xen/xenbus/xenbus_probe_backend.c @@ -181,6 +181,12 @@ static int xenbus_probe_backend(struct xen_bus_type *bus, const char *type, return err; }
+static bool frontend_will_handle(struct xenbus_watch *watch, + const char **vec, unsigned int len) +{ + return watch->nr_pending == 0; +} + static void frontend_changed(struct xenbus_watch *watch, const char **vec, unsigned int len) { @@ -192,6 +198,7 @@ static struct xen_bus_type xenbus_backend = { .levels = 3, /* backend/type/<frontend>/<id> */ .get_bus_id = backend_bus_id, .probe = xenbus_probe_backend, + .otherend_will_handle = frontend_will_handle, .otherend_changed = frontend_changed, .bus = { .name = "xen-backend",
On Tue, Jan 05, 2021 at 12:01:37PM +0100, SeongJae Park wrote:
From: SeongJae Park sjpark@amazon.de
SeongJae Park (5): xen/xenbus: Allow watches discard events before queueing xen/xenbus: Add 'will_handle' callback support in xenbus_watch_path() xen/xenbus/xen_bus_type: Support will_handle watch callback xen/xenbus: Count pending messages for each watch xenbus/xenbus_backend: Disallow pending watch messages
Now queued up, thanks.
greg k-h
linux-stable-mirror@lists.linaro.org