This v2 rebases onto the latest linux-next tag, next-20210701. A few changes were needed, namely:
1) changes kernfs_init_failure_injection() to return int instead of void. On the latest linux-next we have a new static build check for this, so this mistake was captured when building.
2) I made kernfs_init_failure_injection static
3) lib/test_sysfs.c moved to the new blk_alloc_disk() added by Christoph as direct queue allocation is no longer supported, ie, blk_alloc_queue() is no longer exported. This work was done by Christoph in preparation to help make add_disk*() callers eventually return an error code and make the error handling much saner. Because of this same change blk_cleanup_queue() is no longer needed so we embrace the shiny new blk_cleanup_disk().
I've put this up on my linux-next git tree [0] under the branch named 20210701-sysfs-fix-races-v2.
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux-next.git/log/?h...
Luis Chamberlain (4): selftests: add tests_sysfs module kernfs: add initial failure injection support test_sysfs: add support to use kernfs failure injection test_sysfs: demonstrate deadlock fix
.../fault-injection/fault-injection.rst | 22 + MAINTAINERS | 9 +- fs/kernfs/Makefile | 1 + fs/kernfs/failure-injection.c | 83 + fs/kernfs/file.c | 13 + fs/kernfs/kernfs-internal.h | 72 + include/linux/kernfs.h | 5 + lib/Kconfig.debug | 23 + lib/Makefile | 1 + lib/test_sysfs.c | 1027 ++++++++++++ tools/testing/selftests/sysfs/Makefile | 12 + tools/testing/selftests/sysfs/config | 5 + tools/testing/selftests/sysfs/sysfs.sh | 1376 +++++++++++++++++ 13 files changed, 2648 insertions(+), 1 deletion(-) create mode 100644 fs/kernfs/failure-injection.c create mode 100644 lib/test_sysfs.c create mode 100644 tools/testing/selftests/sysfs/Makefile create mode 100644 tools/testing/selftests/sysfs/config create mode 100755 tools/testing/selftests/sysfs/sysfs.sh
This adds a new selftest module which can be used to test sysfs, which would otherwise require using an existing driver. This lets us muck with a template driver to test breaking things without affecting system behaviour or requiring the dependencies of a real device driver.
A series of 28 tests are added. Support for using two device types are supported:
* misc * block
Contrary to sysctls, sysfs requires a full write to happen at once, and so we reduce the digit tests to single writes. Two main sysfs knobs are provided for testing reading/storing, one which doesn't inclur any delays and another which can incur programmed delays. What locks are held, if any, are configurable, at module load time, or through dynamic configuration at run time.
Since sysfs is a technically filesystem, but a pseudo one, which requires a kernel user, our test_sysfs module and respective test script embraces fstests format for tests in the kernel ring bufffer. Likewise, a scraper for kernel crashes is provided which matches what fstests does as well.
Two tests are kept disabled as they currently cause a deadlock, and so this provides a mechanism to easily show proof and demo how the deadlock can happen:
Demos the deadlock with a device specific lock ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
Demos the deadlock with rtnl_lock() ./tools/testing/selftests/sysfs/sysfs.sh -t 0028
Two separate solutions to the deadlock issue have been proposed, and so now its a matter of either documenting this limitation or eventually adopting a generic fix.
This selftests will shortly be expanded upon with more tests which require further kernel changes in order to provide better test coverage.
Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- MAINTAINERS | 7 + lib/Kconfig.debug | 10 + lib/Makefile | 1 + lib/test_sysfs.c | 943 +++++++++++++++++++ tools/testing/selftests/sysfs/Makefile | 12 + tools/testing/selftests/sysfs/config | 2 + tools/testing/selftests/sysfs/sysfs.sh | 1202 ++++++++++++++++++++++++ 7 files changed, 2177 insertions(+) create mode 100644 lib/test_sysfs.c create mode 100644 tools/testing/selftests/sysfs/Makefile create mode 100644 tools/testing/selftests/sysfs/config create mode 100755 tools/testing/selftests/sysfs/sysfs.sh
diff --git a/MAINTAINERS b/MAINTAINERS index 66d047dc6880..fd369ed50040 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17958,6 +17958,13 @@ L: linux-mmc@vger.kernel.org S: Maintained F: drivers/mmc/host/sdhci-pci-dwc-mshc.c
+SYSFS TEST DRIVER +M: Luis Chamberlain mcgrof@kernel.org +L: linux-kernel@vger.kernel.org +S: Maintained +F: lib/test_sysfs.c +F: tools/testing/selftests/sysfs/ + SYSTEM CONFIGURATION (SYSCON) M: Lee Jones lee.jones@linaro.org M: Arnd Bergmann arnd@arndb.de diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index fb370c7c4756..568838ac1189 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2360,6 +2360,16 @@ config TEST_SYSCTL
If unsure, say N.
+config TEST_SYSFS + tristate "sysfs test driver" + depends on SYSFS + help + This builds the "test_sysfs" module. This driver enables to test the + sysfs file system safely without affecting production knobs which + might alter system functionality. + + If unsure, say N. + config BITFIELD_KUNIT tristate "KUnit test bitfield functions at runtime" depends on KUNIT diff --git a/lib/Makefile b/lib/Makefile index 5efd1b435a37..effd1ef806f0 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -61,6 +61,7 @@ obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o obj-$(CONFIG_TEST_BITOPS) += test_bitops.o CFLAGS_test_bitops.o += -Werror obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o +obj-$(CONFIG_TEST_SYSFS) += test_sysfs.o obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o obj-$(CONFIG_TEST_IDA) += test_ida.o obj-$(CONFIG_KASAN_KUNIT_TEST) += test_kasan.o diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c new file mode 100644 index 000000000000..bf43016d40b5 --- /dev/null +++ b/lib/test_sysfs.c @@ -0,0 +1,943 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * sysfs test driver + * + * Copyright (C) 2021 Luis Chamberlain mcgrof@kernel.org + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the Free + * Software Foundation; either version 2 of the License, or at your option any + * later version; or, when distributed separately from the Linux kernel or + * when incorporated into other software packages, subject to the following + * license: + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of copyleft-next (version 0.3.1 or later) as published + * at http://copyleft-next.org/. + */ + +/* + * This module allows us to add race conditions which we can test for + * against the sysfs filesystem. + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include <linux/init.h> +#include <linux/list.h> +#include <linux/module.h> +#include <linux/printk.h> +#include <linux/fs.h> +#include <linux/miscdevice.h> +#include <linux/slab.h> +#include <linux/uaccess.h> +#include <linux/async.h> +#include <linux/delay.h> +#include <linux/vmalloc.h> +#include <linux/debugfs.h> +#include <linux/rtnetlink.h> +#include <linux/genhd.h> +#include <linux/blkdev.h> + +static bool enable_lock; +module_param(enable_lock, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_lock, + "enable locking on reads / stores from the start"); + +static bool enable_lock_on_rmmod; +module_param(enable_lock_on_rmmod, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_lock_on_rmmod, + "enable locking on rmmod"); + +static bool use_rtnl_lock; +module_param(use_rtnl_lock, bool_enable_only, 0644); +MODULE_PARM_DESC(use_rtnl_lock, + "use an rtnl_lock instead of the device mutex_lock"); + +static unsigned int write_delay_msec_y = 500; +module_param_named(write_delay_msec_y, write_delay_msec_y, uint, 0644); +MODULE_PARM_DESC(write_delay_msec_y, "msec write delay for writes to y"); + +static unsigned int test_devtype; +module_param_named(devtype, test_devtype, uint, 0644); +MODULE_PARM_DESC(devtype, "device type to register"); + +static bool enable_busy_alloc; +module_param(enable_busy_alloc, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_busy_alloc, "do a fake allocation during writes"); + +static bool enable_debugfs; +module_param(enable_debugfs, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_debugfs, "enable a few debugfs files"); + +static bool enable_verbose_writes; +module_param(enable_verbose_writes, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_debugfs, "enable stores to print verbose information"); + +static unsigned int delay_rmmod_ms; +module_param_named(delay_rmmod_ms, delay_rmmod_ms, uint, 0644); +MODULE_PARM_DESC(delay_rmmod_ms, "if set how many ms to delay rmmod before device deletion"); + +static bool enable_verbose_rmmod; +module_param(enable_verbose_rmmod, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_verbose_rmmod, "enable verbose print messages on rmmod"); + +static int sysfs_test_major; + +/** + * test_config - used for configuring how the sysfs test device will behave + * + * @enable_lock: if enabled a lock will be used when reading/storing variables + * @enable_lock_on_rmmod: if enabled a lock will be used when reading/storing + * sysfs attributes, but it will also be used to lock on rmmod. This is + * useful to test for a deadlock. + * @use_rtnl_lock: if enabled instead of configuration specific mutex, we'll + * use the rtnl_lock. If your test case is modifying this on the fly + * while doing other stores / reads, things will break as a lock can be + * left contending. Best is that tests use this knob serially, without + * allowing userspace to modify other knobs while this one changes. + * @write_delay_msec_y: the amount of delay to use when writing to y + * @enable_busy_alloc: if enabled we'll do a large allocation between + * writes. We immediately free right away. We also schedule to give the + * kernel some time to re-use any memory we don't need. This is intened + * to mimic typical driver behaviour. + */ +struct test_config { + bool enable_lock; + bool enable_lock_on_rmmod; + bool use_rtnl_lock; + unsigned int write_delay_msec_y; + bool enable_busy_alloc; +}; + +/** + * enum sysfs_test_devtype - sysfs device type + * @TESTDEV_TYPE_MISC: misc device type + * @TESTDEV_TYPE_BLOCK: use a block device for the sysfs test device. + */ +enum sysfs_test_devtype { + TESTDEV_TYPE_MISC = 0, + TESTDEV_TYPE_BLOCK, +}; + +/** + * sysfs_test_device - test device to help test sysfs + * + * @devtype: the type of device to use + * @config: configuration for the test + * @config_mutex: protects configuration of test + * @misc_dev: we use a misc device under the hood + * @disk: represents a disk when used as a block device + * @dev: pointer to misc_dev's own struct device + * @dev_idx: unique ID for test device + * @x: variable we can use to test read / store + * @y: slow variable we can use to test read / store + */ +struct sysfs_test_device { + enum sysfs_test_devtype devtype; + struct test_config config; + struct mutex config_mutex; + struct miscdevice misc_dev; + struct gendisk *disk; + struct device *dev; + int dev_idx; + int x; + int y; +}; + +struct sysfs_test_device *first_test_dev; + +static struct miscdevice *dev_to_misc_dev(struct device *dev) +{ + return dev_get_drvdata(dev); +} + +static struct sysfs_test_device *misc_dev_to_test_dev(struct miscdevice *misc_dev) +{ + return container_of(misc_dev, struct sysfs_test_device, misc_dev); +} + +static struct sysfs_test_device *devblock_to_test_dev(struct device *dev) +{ + return (struct sysfs_test_device *)dev_to_disk(dev)->private_data; +} + +static struct sysfs_test_device *devmisc_to_testdev(struct device *dev) +{ + struct miscdevice *misc_dev; + + misc_dev = dev_to_misc_dev(dev); + return misc_dev_to_test_dev(misc_dev); +} + +static struct sysfs_test_device *dev_to_test_dev(struct device *dev) +{ + if (test_devtype == TESTDEV_TYPE_MISC) + return devmisc_to_testdev(dev); + else if (test_devtype == TESTDEV_TYPE_BLOCK) + return devblock_to_test_dev(dev); + return NULL; +} + +static void test_dev_config_lock(struct sysfs_test_device *test_dev) +{ + struct test_config *config; + + config = &test_dev->config; + if (config->enable_lock) { + if (config->use_rtnl_lock) + rtnl_lock(); + else + mutex_lock(&test_dev->config_mutex); + } +} + +static void test_dev_config_unlock(struct sysfs_test_device *test_dev) +{ + struct test_config *config; + + config = &test_dev->config; + if (config->enable_lock) { + if (config->use_rtnl_lock) + rtnl_unlock(); + else + mutex_unlock(&test_dev->config_mutex); + } +} + +static void test_dev_config_lock_rmmod(struct sysfs_test_device *test_dev) +{ + struct test_config *config; + + config = &test_dev->config; + if (config->enable_lock_on_rmmod) + test_dev_config_lock(test_dev); +} + +static void test_dev_config_unlock_rmmod(struct sysfs_test_device *test_dev) +{ + struct test_config *config; + + config = &test_dev->config; + if (config->enable_lock_on_rmmod) + test_dev_config_unlock(test_dev); +} + +static void free_test_dev_sysfs(struct sysfs_test_device *test_dev) +{ + if (test_dev) { + kfree_const(test_dev->misc_dev.name); + test_dev->misc_dev.name = NULL; + kfree(test_dev); + test_dev = NULL; + } +} + +static void test_sysfs_reset_vals(struct sysfs_test_device *test_dev) +{ + test_dev->x = 3; + test_dev->y = 4; +} + +static ssize_t config_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config = &test_dev->config; + int len = 0; + + test_dev_config_lock(test_dev); + + len += snprintf(buf, PAGE_SIZE, + "Configuration for: %s\n", + dev_name(dev)); + + len += snprintf(buf+len, PAGE_SIZE - len, + "x:\t%d\n", + test_dev->x); + + len += snprintf(buf+len, PAGE_SIZE - len, + "y:\t%d\n", + test_dev->y); + + len += snprintf(buf+len, PAGE_SIZE - len, + "enable_lock:\t%s\n", + config->enable_lock ? "true" : "false"); + + len += snprintf(buf+len, PAGE_SIZE - len, + "enable_lock_on_rmmmod:\t%s\n", + config->enable_lock_on_rmmod ? "true" : "false"); + + len += snprintf(buf+len, PAGE_SIZE - len, + "use_rtnl_lock:\t%s\n", + config->use_rtnl_lock ? "true" : "false"); + + len += snprintf(buf+len, PAGE_SIZE - len, + "write_delay_msec_y:\t%d\n", + config->write_delay_msec_y); + + len += snprintf(buf+len, PAGE_SIZE - len, + "enable_busy_alloc:\t%s\n", + config->enable_busy_alloc ? "true" : "false"); + + len += snprintf(buf+len, PAGE_SIZE - len, + "enable_debugfs:\t%s\n", + enable_debugfs ? "true" : "false"); + + len += snprintf(buf+len, PAGE_SIZE - len, + "enable_verbose_writes:\t%s\n", + enable_verbose_writes ? "true" : "false"); + + test_dev_config_unlock(test_dev); + + return len; +} +static DEVICE_ATTR_RO(config); + +static ssize_t reset_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + + config = &test_dev->config; + + /* + * We compromise and simplify this condition and do not use a lock + * here as the lock type can change. + */ + config->enable_lock = false; + config->enable_lock_on_rmmod = false; + config->use_rtnl_lock = false; + config->enable_busy_alloc = false; + test_sysfs_reset_vals(test_dev); + + dev_info(dev, "reset\n"); + + return count; +} +static DEVICE_ATTR_WO(reset); + +static void test_dev_busy_alloc(struct sysfs_test_device *test_dev) +{ + struct test_config *config; + char *ignore; + + config = &test_dev->config; + if (!config->enable_busy_alloc) + return; + + ignore = kzalloc(sizeof(struct sysfs_test_device) * 10, GFP_KERNEL); + kfree(ignore); + + schedule(); +} + +static ssize_t test_dev_x_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + int ret; + + test_dev_busy_alloc(test_dev); + test_dev_config_lock(test_dev); + + ret = kstrtoint(buf, 10, &test_dev->x); + if (ret) + count = ret; + + if (enable_verbose_writes) + dev_info(test_dev->dev, "wrote x = %d\n", test_dev->x); + + test_dev_config_unlock(test_dev); + + return count; +} + +static ssize_t test_dev_x_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + int ret; + + test_dev_config_lock(test_dev); + ret = snprintf(buf, PAGE_SIZE, "%d\n", test_dev->x); + test_dev_config_unlock(test_dev); + + return ret; +} +static DEVICE_ATTR_RW(test_dev_x); + +static ssize_t test_dev_y_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + int y; + int ret; + + test_dev_busy_alloc(test_dev); + test_dev_config_lock(test_dev); + + config = &test_dev->config; + + ret = kstrtoint(buf, 10, &y); + if (ret) + count = ret; + + msleep(config->write_delay_msec_y); + test_dev->y = test_dev->x + y + 7; + + if (enable_verbose_writes) + dev_info(test_dev->dev, "wrote y = %d\n", test_dev->y); + + test_dev_config_unlock(test_dev); + + return count; +} + +static ssize_t test_dev_y_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + int ret; + + test_dev_config_lock(test_dev); + ret = snprintf(buf, PAGE_SIZE, "%d\n", test_dev->y); + test_dev_config_unlock(test_dev); + + return ret; +} +static DEVICE_ATTR_RW(test_dev_y); + +static ssize_t config_enable_lock_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + int ret; + int val; + + config = &test_dev->config; + ret = kstrtoint(buf, 10, &val); + if (ret) + return ret; + + /* + * We compromise for simplicty and do not lock when changing + * locking configuration, with the assumption userspace tests + * will know this. + */ + if (val) + config->enable_lock = true; + else + config->enable_lock = false; + + return count; +} + +static ssize_t config_enable_lock_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + ssize_t ret; + + config = &test_dev->config; + + test_dev_config_lock(test_dev); + ret = snprintf(buf, PAGE_SIZE, "%d\n", config->enable_lock); + test_dev_config_unlock(test_dev); + + return ret; +} +static DEVICE_ATTR_RW(config_enable_lock); + +static ssize_t config_enable_lock_on_rmmod_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + int ret; + int val; + + config = &test_dev->config; + ret = kstrtoint(buf, 10, &val); + if (ret) + return ret; + + test_dev_config_lock(test_dev); + if (val) + config->enable_lock_on_rmmod = true; + else + config->enable_lock_on_rmmod = false; + test_dev_config_unlock(test_dev); + + return count; +} + +static ssize_t config_enable_lock_on_rmmod_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + ssize_t ret; + + config = &test_dev->config; + + test_dev_config_lock(test_dev); + ret = snprintf(buf, PAGE_SIZE, "%d\n", config->enable_lock_on_rmmod); + test_dev_config_unlock(test_dev); + + return ret; +} +static DEVICE_ATTR_RW(config_enable_lock_on_rmmod); + +static ssize_t config_use_rtnl_lock_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + int ret; + int val; + + config = &test_dev->config; + ret = kstrtoint(buf, 10, &val); + if (ret) + return ret; + + /* + * We compromise and simplify this condition and do not use a lock + * here as the lock type can change. + */ + if (val) + config->use_rtnl_lock = true; + else + config->use_rtnl_lock = false; + + return count; +} + +static ssize_t config_use_rtnl_lock_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + + config = &test_dev->config; + + return snprintf(buf, PAGE_SIZE, "%d\n", config->use_rtnl_lock); +} +static DEVICE_ATTR_RW(config_use_rtnl_lock); + +static ssize_t config_write_delay_msec_y_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + int ret; + int val; + + config = &test_dev->config; + ret = kstrtoint(buf, 10, &val); + if (ret) + return ret; + + test_dev_config_lock(test_dev); + config->write_delay_msec_y = val; + test_dev_config_unlock(test_dev); + + return count; +} + +static ssize_t config_write_delay_msec_y_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + + config = &test_dev->config; + + return snprintf(buf, PAGE_SIZE, "%d\n", config->write_delay_msec_y); +} +static DEVICE_ATTR_RW(config_write_delay_msec_y); + +static ssize_t config_enable_busy_alloc_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + int ret; + int val; + + config = &test_dev->config; + ret = kstrtoint(buf, 10, &val); + if (ret) + return ret; + + test_dev_config_lock(test_dev); + config->enable_busy_alloc = val; + test_dev_config_unlock(test_dev); + + return count; +} + +static ssize_t config_enable_busy_alloc_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct sysfs_test_device *test_dev = dev_to_test_dev(dev); + struct test_config *config; + + config = &test_dev->config; + + return snprintf(buf, PAGE_SIZE, "%d\n", config->enable_busy_alloc); +} +static DEVICE_ATTR_RW(config_enable_busy_alloc); + +#define TEST_SYSFS_DEV_ATTR(name) (&dev_attr_##name.attr) + +static struct attribute *test_dev_attrs[] = { + /* Generic driver knobs go here */ + TEST_SYSFS_DEV_ATTR(config), + TEST_SYSFS_DEV_ATTR(reset), + + /* These are used to test sysfs */ + TEST_SYSFS_DEV_ATTR(test_dev_x), + TEST_SYSFS_DEV_ATTR(test_dev_y), + + /* + * These are configuration knobs to modify how we test sysfs when + * doing reads / stores. + */ + TEST_SYSFS_DEV_ATTR(config_enable_lock), + TEST_SYSFS_DEV_ATTR(config_enable_lock_on_rmmod), + TEST_SYSFS_DEV_ATTR(config_use_rtnl_lock), + TEST_SYSFS_DEV_ATTR(config_write_delay_msec_y), + TEST_SYSFS_DEV_ATTR(config_enable_busy_alloc), + + NULL, +}; + +ATTRIBUTE_GROUPS(test_dev); + +static int sysfs_test_dev_alloc_miscdev(struct sysfs_test_device *test_dev) +{ + struct miscdevice *misc_dev; + + misc_dev = &test_dev->misc_dev; + misc_dev->minor = MISC_DYNAMIC_MINOR; + misc_dev->name = kasprintf(GFP_KERNEL, "test_sysfs%d", test_dev->dev_idx); + if (!misc_dev->name) { + pr_err("Cannot alloc misc_dev->name\n"); + return -ENOMEM; + } + misc_dev->groups = test_dev_groups; + + return 0; +} + +static int testdev_open(struct block_device *bdev, fmode_t mode) +{ + return -EINVAL; +} + +static blk_qc_t testdev_submit_bio(struct bio *bio) +{ + return BLK_QC_T_NONE; +} + +static void testdev_slot_free_notify(struct block_device *bdev, + unsigned long index) +{ +} + +static int testdev_rw_page(struct block_device *bdev, sector_t sector, + struct page *page, unsigned int op) +{ + return -EOPNOTSUPP; +} + +static const struct block_device_operations sysfs_testdev_ops = { + .open = testdev_open, + .submit_bio = testdev_submit_bio, + .swap_slot_free_notify = testdev_slot_free_notify, + .rw_page = testdev_rw_page, + .owner = THIS_MODULE +}; + +static int sysfs_test_dev_alloc_blockdev(struct sysfs_test_device *test_dev) +{ + int ret = -ENOMEM; + + test_dev->disk = blk_alloc_disk(NUMA_NO_NODE); + if (!test_dev->disk) { + pr_err("Error allocating disk structure for device %d\n", + test_dev->dev_idx); + goto out; + } + + test_dev->disk->major = sysfs_test_major; + test_dev->disk->first_minor = test_dev->dev_idx + 1; + test_dev->disk->fops = &sysfs_testdev_ops; + test_dev->disk->private_data = test_dev; + snprintf(test_dev->disk->disk_name, 16, "test_sysfs%d", + test_dev->dev_idx); + set_capacity(test_dev->disk, 0); + blk_queue_flag_set(QUEUE_FLAG_NONROT, test_dev->disk->queue); + blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, test_dev->disk->queue); + blk_queue_physical_block_size(test_dev->disk->queue, PAGE_SIZE); + blk_queue_max_discard_sectors(test_dev->disk->queue, UINT_MAX); + blk_queue_flag_set(QUEUE_FLAG_DISCARD, test_dev->disk->queue); + + return 0; +out: + return ret; +} + +static struct sysfs_test_device *alloc_test_dev_sysfs(int idx) +{ + struct sysfs_test_device *test_dev; + int ret; + + switch (test_devtype) { + case TESTDEV_TYPE_MISC: + fallthrough; + case TESTDEV_TYPE_BLOCK: + break; + default: + return NULL; + } + + test_dev = kzalloc(sizeof(struct sysfs_test_device), GFP_KERNEL); + if (!test_dev) + goto err_out; + + mutex_init(&test_dev->config_mutex); + test_dev->dev_idx = idx; + test_dev->devtype = test_devtype; + + if (test_dev->devtype == TESTDEV_TYPE_MISC) { + ret = sysfs_test_dev_alloc_miscdev(test_dev); + if (ret) + goto err_out_free; + } else if (test_dev->devtype == TESTDEV_TYPE_BLOCK) { + ret = sysfs_test_dev_alloc_blockdev(test_dev); + if (ret) + goto err_out_free; + } + return test_dev; + +err_out_free: + kfree(test_dev); + test_dev = NULL; +err_out: + return NULL; +} + +static int register_test_dev_sysfs_misc(struct sysfs_test_device *test_dev) +{ + int ret; + + ret = misc_register(&test_dev->misc_dev); + if (ret) + return ret; + + test_dev->dev = test_dev->misc_dev.this_device; + + return 0; +} + +static int register_test_dev_sysfs_block(struct sysfs_test_device *test_dev) +{ + device_add_disk(NULL, test_dev->disk, test_dev_groups); + test_dev->dev = disk_to_dev(test_dev->disk); + + return 0; +} + +static struct sysfs_test_device *register_test_dev_sysfs(void) +{ + struct sysfs_test_device *test_dev = NULL; + int ret; + + test_dev = alloc_test_dev_sysfs(0); + if (!test_dev) + goto out; + + if (test_dev->devtype == TESTDEV_TYPE_MISC) { + ret = register_test_dev_sysfs_misc(test_dev); + if (ret) { + pr_err("could not register misc device: %d\n", ret); + goto out_free_dev; + } + } else if (test_dev->devtype == TESTDEV_TYPE_BLOCK) { + ret = register_test_dev_sysfs_block(test_dev); + if (ret) { + pr_err("could not register block device: %d\n", ret); + goto out_free_dev; + } + } + + dev_info(test_dev->dev, "interface ready\n"); + +out: + return test_dev; +out_free_dev: + free_test_dev_sysfs(test_dev); + return NULL; +} + +static struct sysfs_test_device *register_test_dev_set_config(void) +{ + struct sysfs_test_device *test_dev; + struct test_config *config; + + test_dev = register_test_dev_sysfs(); + if (!test_dev) + return NULL; + + config = &test_dev->config; + + if (enable_lock) + config->enable_lock = true; + if (enable_lock_on_rmmod) + config->enable_lock_on_rmmod = true; + if (use_rtnl_lock) + config->use_rtnl_lock = true; + if (enable_busy_alloc) + config->enable_busy_alloc = true; + + config->write_delay_msec_y = write_delay_msec_y; + test_sysfs_reset_vals(test_dev); + + return test_dev; +} + +static void unregister_test_dev_sysfs_misc(struct sysfs_test_device *test_dev) +{ + misc_deregister(&test_dev->misc_dev); +} + +static void unregister_test_dev_sysfs_block(struct sysfs_test_device *test_dev) +{ + del_gendisk(test_dev->disk); + blk_cleanup_disk(test_dev->disk); +} + +static void unregister_test_dev_sysfs(struct sysfs_test_device *test_dev) +{ + test_dev_config_lock_rmmod(test_dev); + + dev_info(test_dev->dev, "removing interface\n"); + + if (test_dev->devtype == TESTDEV_TYPE_MISC) + unregister_test_dev_sysfs_misc(test_dev); + else if (test_dev->devtype == TESTDEV_TYPE_BLOCK) + unregister_test_dev_sysfs_block(test_dev); + + test_dev_config_unlock_rmmod(test_dev); + + free_test_dev_sysfs(test_dev); +} + +static struct dentry *debugfs_dir; + +/* When read represents how many times we have reset the first_test_dev */ +static u8 reset_first_test_dev; + +static ssize_t read_reset_first_test_dev(struct file *file, + char __user *user_buf, + size_t count, loff_t *ppos) +{ + ssize_t len; + char buf[32]; + + reset_first_test_dev++; + len = sprintf(buf, "%d\n", reset_first_test_dev); + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +static ssize_t write_reset_first_test_dev(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + if (!try_module_get(THIS_MODULE)) + return -ENODEV; + + if (!first_test_dev) { + module_put(THIS_MODULE); + return -ENODEV; + } + + dev_info(first_test_dev->dev, "going to reset first interface ...\n"); + + unregister_test_dev_sysfs(first_test_dev); + first_test_dev = register_test_dev_set_config(); + + dev_info(first_test_dev->dev, "first interface reset complete\n"); + + module_put(THIS_MODULE); + + return count; +} + +static const struct file_operations fops_reset_first_test_dev = { + .read = read_reset_first_test_dev, + .write = write_reset_first_test_dev, + .open = simple_open, + .owner = THIS_MODULE, + .llseek = default_llseek, +}; + +static int __init test_sysfs_init(void) +{ + first_test_dev = register_test_dev_set_config(); + if (!first_test_dev) + return -ENOMEM; + + if (!enable_debugfs) + return 0; + + debugfs_dir = debugfs_create_dir("test_sysfs", NULL); + if (!debugfs_dir) { + unregister_test_dev_sysfs(first_test_dev); + return -ENOMEM; + } + + debugfs_create_file("reset_first_test_dev", 0600, debugfs_dir, + NULL, &fops_reset_first_test_dev); + return 0; +} +module_init(test_sysfs_init); + +static void __exit test_sysfs_exit(void) +{ + if (enable_debugfs) + debugfs_remove(debugfs_dir); + if (delay_rmmod_ms) + msleep(delay_rmmod_ms); + unregister_test_dev_sysfs(first_test_dev); + if (enable_verbose_rmmod) + pr_info("unregister_test_dev_sysfs() completed\n"); + first_test_dev = NULL; +} +module_exit(test_sysfs_exit); + +MODULE_AUTHOR("Luis Chamberlain mcgrof@kernel.org"); +MODULE_LICENSE("GPL"); diff --git a/tools/testing/selftests/sysfs/Makefile b/tools/testing/selftests/sysfs/Makefile new file mode 100644 index 000000000000..fde99caa2338 --- /dev/null +++ b/tools/testing/selftests/sysfs/Makefile @@ -0,0 +1,12 @@ +# SPDX-License-Identifier: GPL-2.0-only +# Makefile for sysfs selftests. + +# No binaries, but make sure arg-less "make" doesn't trigger "run_tests". +all: + +TEST_PROGS := sysfs.sh + +include ../lib.mk + +# Nothing to clean up. +clean: diff --git a/tools/testing/selftests/sysfs/config b/tools/testing/selftests/sysfs/config new file mode 100644 index 000000000000..9196f452ecd5 --- /dev/null +++ b/tools/testing/selftests/sysfs/config @@ -0,0 +1,2 @@ +CONFIG_SYSFS=m +CONFIG_TEST_SYSFS=m diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh new file mode 100755 index 000000000000..681b27579f6f --- /dev/null +++ b/tools/testing/selftests/sysfs/sysfs.sh @@ -0,0 +1,1202 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (C) 2021 Luis Chamberlain mcgrof@kernel.org +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the Free +# Software Foundation; either version 2 of the License, or at your option any +# later version; or, when distributed separately from the Linux kernel or +# when incorporated into other software packages, subject to the following +# license: +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of copyleft-next (version 0.3.1 or later) as published +# at http://copyleft-next.org/. + +# This performs a series tests against the sysfs filesystem. + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +TEST_NAME="sysfs" +TEST_DRIVER="test_${TEST_NAME}" +TEST_DIR=$(dirname $0) +TEST_FILE=$(mktemp) + +# This represents +# +# TEST_ID:TEST_COUNT:ENABLED:TARGET +# +# TEST_ID: is the test id number +# TEST_COUNT: number of times we should run the test +# ENABLED: 1 if enabled, 0 otherwise +# TARGET: test target file required on the test_sysfs module +# +# Once these are enabled please leave them as-is. Write your own test, +# we have tons of space. +ALL_TESTS="0001:3:1:test_dev_x:misc" +ALL_TESTS="$ALL_TESTS 0002:3:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0003:3:1:test_dev_x:misc" +ALL_TESTS="$ALL_TESTS 0004:3:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0005:1:1:test_dev_x:misc" +ALL_TESTS="$ALL_TESTS 0006:1:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0007:1:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0008:1:1:test_dev_x:misc" +ALL_TESTS="$ALL_TESTS 0009:1:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0010:1:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0011:1:1:test_dev_x:misc" +ALL_TESTS="$ALL_TESTS 0012:1:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0013:1:1:test_dev_y:misc" +ALL_TESTS="$ALL_TESTS 0014:3:1:test_dev_x:block" # block equivalent set +ALL_TESTS="$ALL_TESTS 0015:3:1:test_dev_x:block" +ALL_TESTS="$ALL_TESTS 0016:3:1:test_dev_x:block" +ALL_TESTS="$ALL_TESTS 0017:3:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0018:1:1:test_dev_x:block" +ALL_TESTS="$ALL_TESTS 0019:1:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0020:1:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0021:1:1:test_dev_x:block" +ALL_TESTS="$ALL_TESTS 0022:1:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0023:1:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0024:1:1:test_dev_x:block" +ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block" +ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test +ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock + +allow_user_defaults() +{ + if [ -z $DIR ]; then + case $TEST_DEV_TYPE in + misc) + DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0" + ;; + block) + DIR="/sys/devices/virtual/block/${TEST_DRIVER}0" + ;; + *) + DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0" + ;; + esac + fi + case $TEST_DEV_TYPE in + misc) + MODPROBE_TESTDEV_TYPE="" + ;; + block) + MODPROBE_TESTDEV_TYPE="devtype=1" + ;; + *) + MODPROBE_TESTDEV_TYPE="" + ;; + esac + if [ -z $SYSFS_DEBUGFS_DIR ]; then + SYSFS_DEBUGFS_DIR="/sys/kernel/debug/test_sysfs" + fi + if [ -z $PAGE_SIZE ]; then + PAGE_SIZE=$(getconf PAGESIZE) + fi + if [ -z $MAX_DIGITS ]; then + MAX_DIGITS=$(($PAGE_SIZE/8)) + fi + if [ -z $INT_MAX ]; then + INT_MAX=$(getconf INT_MAX) + fi + if [ -z $UINT_MAX ]; then + UINT_MAX=$(getconf UINT_MAX) + fi +} + +test_reqs() +{ + uid=$(id -u) + if [ $uid -ne 0 ]; then + echo $msg must be run as root >&2 + exit $ksft_skip + fi + + if ! which modprobe 2> /dev/null > /dev/null; then + echo "$0: You need modprobe installed" >&2 + exit $ksft_skip + fi + if ! which getconf 2> /dev/null > /dev/null; then + echo "$0: You need getconf installed" + exit $ksft_skip + fi + if ! which diff 2> /dev/null > /dev/null; then + echo "$0: You need diff installed" + exit $ksft_skip + fi + if ! which perl 2> /dev/null > /dev/null; then + echo "$0: You need perl installed" + exit $ksft_skip + fi +} + +call_modprobe() +{ + modprobe $TEST_DRIVER $MODPROBE_TESTDEV_TYPE $FIRST_MODPROBE_ARGS $MODPROBE_ARGS + return $? +} + +modprobe_reset() +{ + modprobe -q -r $TEST_DRIVER + call_modprobe + return $? +} + +modprobe_reset_enable_debugfs() +{ + FIRST_MODPROBE_ARGS="enable_debugfs=1" + modprobe_reset + unset FIRST_MODPROBE_ARGS +} + +modprobe_reset_enable_lock_on_rmmod() +{ + FIRST_MODPROBE_ARGS="enable_lock=1 enable_lock_on_rmmod=1 enable_verbose_writes=1" + modprobe_reset + unset FIRST_MODPROBE_ARGS +} + +modprobe_reset_enable_rtnl_lock_on_rmmod() +{ + FIRST_MODPROBE_ARGS="enable_lock=1 use_rtnl_lock=1 enable_lock_on_rmmod=1" + FIRST_MODPROBE_ARGS="$FIRST_MODPROBE_ARGS enable_verbose_writes=1" + modprobe_reset + unset FIRST_MODPROBE_ARGS +} + +load_req_mod() +{ + modprobe_reset + if [ ! -d $DIR ]; then + if ! modprobe -q -n $TEST_DRIVER; then + echo "$0: module $TEST_DRIVER not found [SKIP]" + echo "You must set CONFIG_TEST_SYSFS=m in your kernel" >&2 + exit $ksft_skip + fi + call_modprobe + if [ $? -ne 0 ]; then + echo "$0: modprobe $TEST_DRIVER failed." + exit + fi + fi +} + +config_reset() +{ + if ! echo -n "1" >"$DIR"/reset; then + echo "$0: reset should have worked" >&2 + exit 1 + fi +} + +debugfs_reset_first_test_dev_ignore_errors() +{ + echo -n "1" >"$SYSFS_DEBUGFS_DIR"/reset_first_test_dev +} + +set_orig() +{ + if [[ ! -z $TARGET ]] && [[ ! -z $ORIG ]]; then + if [ -f ${TARGET} ]; then + echo "${ORIG}" > "${TARGET}" + fi + fi +} + +set_test() +{ + echo "${TEST_STR}" > "${TARGET}" +} + +set_test_ignore_errors() +{ + echo "${TEST_STR}" > "${TARGET}" 2> /dev/null +} + +verify() +{ + local seen + seen=$(cat "$1") + target_short=$(basename $TARGET) + case $target_short in + test_dev_x) + if [ "${seen}" != "${TEST_STR}" ]; then + return 1 + fi + ;; + test_dev_y) + DIRNAME=$(dirname $1) + EXPECTED_RESULT="" + # If our target was the test file then what we write to it + # is the same as what that we expect when we read from it. + # When we write to test_dev_y directly though we expect + # a computed value which is driver specific. + if [[ "$DIRNAME" == "/tmp" ]]; then + let EXPECTED_RESULT="${TEST_STR}" + else + x=$(cat ${DIR}/test_dev_x) + let EXPECTED_RESULT="$x+${TEST_STR}+7" + fi + + if [[ "${seen}" != "${EXPECTED_RESULT}" ]]; then + return 1 + fi + ;; + *) + echo "Unsupported target type update test script: $target_short" + exit 1 + esac + return 0 +} + +verify_diff_w() +{ + echo "$TEST_STR" | diff -q -w -u - $1 > /dev/null + return $? +} + +test_rc() +{ + if [[ $rc != 0 ]]; then + echo "Failed test, return value: $rc" >&2 + exit $rc + fi +} + +test_finish() +{ + set_orig + rm -f "${TEST_FILE}" + + if [ ! -z ${old_strict} ]; then + echo ${old_strict} > ${WRITES_STRICT} + fi + exit $rc +} + +# kernfs requires us to write everything we want in one shot because +# There is no easy way for us to know if userspace is only doing a partial +# write, so we don't support them. We expect the entire buffer to come on +# the first write. If you're writing a value, first read the file, +# modify only the value you're changing, then write entire buffer back. +# Since we are only testing digits we just full single writes and old stuff. +# For more details, refer to kernfs_fop_write_iter(). +run_numerictests_single_write() +{ + echo "== Testing sysfs behavior against ${TARGET} ==" + + rc=0 + + echo -n "Writing test file ... " + echo "${TEST_STR}" > "${TEST_FILE}" + if ! verify "${TEST_FILE}"; then + echo "FAIL" >&2 + exit 1 + else + echo "ok" + fi + + echo -n "Checking the sysfs file is not set to test value ... " + if verify "${TARGET}"; then + echo "FAIL" >&2 + exit 1 + else + echo "ok" + fi + + echo -n "Writing to sysfs file from shell ... " + set_test + if ! verify "${TARGET}"; then + echo "FAIL" >&2 + exit 1 + else + echo "ok" + fi + + echo -n "Resetting sysfs file to original value ... " + set_orig + if verify "${TARGET}"; then + echo "FAIL" >&2 + exit 1 + else + echo "ok" + fi + + # Now that we've validated the sanity of "set_test" and "set_orig", + # we can use those functions to set starting states before running + # specific behavioral tests. + + echo -n "Writing to the entire sysfs file in a single write ... " + set_orig + dd if="${TEST_FILE}" of="${TARGET}" bs=4096 2>/dev/null + if ! verify "${TARGET}"; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + + echo -n "Writing to the sysfs file with multiple long writes ... " + set_orig + (perl -e 'print "A" x 50;'; echo "${TEST_STR}") | \ + dd of="${TARGET}" bs=50 2>/dev/null + if verify "${TARGET}"; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + test_rc +} + +reset_vals() +{ + echo -n 3 > $DIR/test_dev_x + echo -n 4 > $DIR/test_dev_x +} + +check_failure() +{ + echo -n "Testing that $1 fails as expected..." + reset_vals + TEST_STR="$1" + orig="$(cat $TARGET)" + echo -n "$TEST_STR" > $TARGET 2> /dev/null + + # write should fail and $TARGET should retain its original value + if [ $? = 0 ] || [ "$(cat $TARGET)" != "$orig" ]; then + echo "FAIL" >&2 + rc=1 + else + echo "ok" + fi + test_rc +} + +load_modreqs() +{ + export TEST_DEV_TYPE=$(get_test_type $1) + unset DIR + allow_user_defaults + load_req_mod +} + +target_exists() +{ + TARGET="${DIR}/$1" + TEST_ID="$2" + + if [ ! -f ${TARGET} ] ; then + echo "Target for test $TEST_ID: $TARGET does not exist, skipping test ..." + return 0 + fi + return 1 +} + +config_enable_lock() +{ + if ! echo -n 1 > $DIR/config_enable_lock; then + echo "$0: Unable to enable locks" >&2 + exit 1 + fi +} + +config_write_delay_msec_y() +{ + if ! echo -n $1 > $DIR/config_write_delay_msec_y ; then + echo "$0: Unable to set write_delay_msec_y to $1" >&2 + exit 1 + fi +} + +# Default filter for dmesg scanning. +# Ignore lockdep complaining about its own bugginess when scanning dmesg +# output, because we shouldn't be failing filesystem tests on account of +# lockdep. +_check_dmesg_filter() +{ + egrep -v -e "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low" \ + -e "BUG: MAX_STACK_TRACE_ENTRIES too low" +} + +check_dmesg() +{ + # filter out intentional WARNINGs or Oopses + local filter=${1:-_check_dmesg_filter} + + _dmesg_since_test_start | $filter >$seqres.dmesg + egrep -q -e "kernel BUG at" \ + -e "WARNING:" \ + -e "\bBUG:" \ + -e "Oops:" \ + -e "possible recursive locking detected" \ + -e "Internal error" \ + -e "(INFO|ERR): suspicious RCU usage" \ + -e "INFO: possible circular locking dependency detected" \ + -e "general protection fault:" \ + -e "BUG .* remaining" \ + -e "UBSAN:" \ + $seqres.dmesg + if [ $? -eq 0 ]; then + echo "something found in dmesg (see $seqres.dmesg)" + return 1 + else + if [ "$KEEP_DMESG" != "yes" ]; then + rm -f $seqres.dmesg + fi + return 0 + fi +} + +log_kernel_fstest_dmesg() +{ + export FSTYP="$1" + export seqnum="$FSTYP/$2" + export date_time=$(date +"%F %T") + echo "run fstests $seqnum at $date_time" > /dev/kmsg +} + +modprobe_loop() +{ + while true; do + call_modprobe > /dev/null 2>&1 + modprobe -r $TEST_DRIVER > /dev/null 2>&1 + done > /dev/null 2>&1 +} + +write_loop() +{ + while true; do + set_test_ignore_errors > /dev/null 2>&1 + TEST_STR=$(( $TEST_STR + 1 )) + done > /dev/null 2>&1 +} + +write_loop_reset() +{ + while true; do + set_test_ignore_errors > /dev/null 2>&1 + debugfs_reset_first_test_dev_ignore_errors > /dev/null 2>&1 + done > /dev/null 2>&1 +} + +write_loop_bg() +{ + BG_WRITES=1000 > /dev/null 2>&1 + while true; do + for i in $(seq 1 $BG_WRITES); do + set_test_ignore_errors > /dev/null 2>&1 & + TEST_STR=$(( $TEST_STR + 1 )) + done > /dev/null 2>&1 + wait + done > /dev/null 2>&1 + wait +} + +reset_loop() +{ + while true; do + debugfs_reset_first_test_dev_ignore_errors > /dev/null 2>&1 + done > /dev/null 2>&1 +} + +kill_trigger_loop() +{ + + local my_first_loop_pid=$1 + local my_second_loop_pid=$2 + local my_sleep_max=$3 + local my_loop=0 + + while true; do + sleep 1 + if [[ $my_loop -ge $my_sleep_max ]]; then + break + fi + let my_loop=$my_loop+1 + done + + kill -s TERM $my_first_loop_pid 2>&1 > /dev/null + kill -s TERM $my_second_loop_pid 2>&1 > /dev/null +} + +_dmesg_since_test_start() +{ + # search the dmesg log of last run of $seqnum for possible failures + # use sed \cregexpc address type, since $seqnum contains "/" + dmesg | tac | sed -ne "0,#run fstests $seqnum at $date_time#p" | tac +} + +sysfs_test_0001() +{ + TARGET="${DIR}/$(get_test_target 0001)" + config_reset + reset_vals + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + + run_numerictests_single_write +} + +sysfs_test_0002() +{ + TARGET="${DIR}/$(get_test_target 0002)" + config_reset + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + + run_numerictests_single_write +} + +sysfs_test_0003() +{ + TARGET="${DIR}/$(get_test_target 0003)" + config_reset + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + + config_enable_lock + + run_numerictests_single_write +} + +sysfs_test_0004() +{ + TARGET="${DIR}/$(get_test_target 0004)" + config_reset + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + + config_enable_lock + + run_numerictests_single_write +} + +sysfs_test_0005() +{ + TARGET="${DIR}/$(get_test_target 0005)" + modprobe_reset + config_reset + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop writing x while loading/unloading the module... " + + modprobe_loop & + modprobe_pid=$! + + write_loop & + write_pid=$! + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0006() +{ + TARGET="${DIR}/$(get_test_target 0006)" + modprobe_reset + config_reset + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop writing y while loading/unloading the module... " + modprobe_loop & + modprobe_pid=$! + + write_loop & + write_pid=$! + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0007() +{ + TARGET="${DIR}/$(get_test_target 0007)" + modprobe_reset + config_reset + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop writing y with a larger delay while loading/unloading the module... " + + MODPROBE_ARGS="write_delay_msec_y=1500" + modprobe_loop > /dev/null 2>&1 & + modprobe_pid=$! + unset MODPROBE_ARGS + + write_loop & + write_pid=$! + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0008() +{ + TARGET="${DIR}/$(get_test_target 0008)" + modprobe_reset + config_reset + reset_vals + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop busy writing x while loading/unloading the module... " + + modprobe_loop > /dev/null 2>&1 & + modprobe_pid=$! + + write_loop_bg > /dev/null 2>&1 & + write_pid=$! + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0009() +{ + TARGET="${DIR}/$(get_test_target 0009)" + modprobe_reset + config_reset + reset_vals + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop busy writing y while loading/unloading the module... " + + modprobe_loop > /dev/null 2>&1 & + modprobe_pid=$! + + write_loop_bg > /dev/null 2>&1 & + write_pid=$! + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0010() +{ + TARGET="${DIR}/$(get_test_target 0010)" + modprobe_reset + config_reset + reset_vals + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop busy writing y with a larger delay while loading/unloading the module... " + modprobe -q -r $TEST_DRIVER > /dev/null 2>&1 + + MODPROBE_ARGS="write_delay_msec_y=1500" + modprobe_loop > /dev/null 2>&1 & + modprobe_pid=$! + unset MODPROBE_ARGS + + write_loop_bg > /dev/null 2>&1 & + write_pid=$! + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0011() +{ + TARGET="${DIR}/$(get_test_target 0011)" + modprobe_reset_enable_debugfs + config_reset + reset_vals + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop writing x and resetting ... " + + write_loop > /dev/null 2>&1 & + write_pid=$! + + reset_loop > /dev/null 2>&1 & + reset_pid=$! + + kill_trigger_loop $write_pid $reset_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0012() +{ + TARGET="${DIR}/$(get_test_target 0012)" + modprobe_reset_enable_debugfs + config_reset + reset_vals + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop writing y and resetting ... " + + write_loop > /dev/null 2>&1 & + write_pid=$! + + reset_loop > /dev/null 2>&1 & + reset_pid=$! + + kill_trigger_loop $write_pid $reset_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0013() +{ + TARGET="${DIR}/$(get_test_target 0013)" + modprobe_reset_enable_debugfs + config_reset + reset_vals + config_write_delay_msec_y 1500 + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Loop writing y with a larger delay and resetting ... " + + write_loop > /dev/null 2>&1 & + write_pid=$! + + reset_loop > /dev/null 2>&1 & + reset_pid=$! + + kill_trigger_loop $write_pid $reset_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0014() +{ + sysfs_test_0001 +} + +sysfs_test_0015() +{ + sysfs_test_0002 +} + +sysfs_test_0016() +{ + sysfs_test_0003 +} + +sysfs_test_0017() +{ + sysfs_test_0004 +} + +sysfs_test_0018() +{ + sysfs_test_0005 +} + +sysfs_test_0019() +{ + sysfs_test_0006 +} + +sysfs_test_0020() +{ + sysfs_test_0007 +} + +sysfs_test_0021() +{ + sysfs_test_0008 +} + +sysfs_test_0022() +{ + sysfs_test_0009 +} + +sysfs_test_0023() +{ + sysfs_test_0010 +} + +sysfs_test_0024() +{ + sysfs_test_0011 +} + +sysfs_test_0025() +{ + sysfs_test_0012 +} + +sysfs_test_0026() +{ + sysfs_test_0013 +} + +sysfs_test_0027() +{ + TARGET="${DIR}/$(get_test_target 0027)" + modprobe_reset_enable_lock_on_rmmod + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Test for possible rmmod deadlock while writing x ... " + + write_loop > /dev/null 2>&1 & + write_pid=$! + + MODPROBE_ARGS="enable_lock=1 enable_lock_on_rmmod=1 enable_verbose_writes=1" + modprobe_loop > /dev/null 2>&1 & + modprobe_pid=$! + unset MODPROBE_ARGS + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0028() +{ + TARGET="${DIR}/$(get_test_target 0028)" + modprobe_reset_enable_lock_on_rmmod + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + WAIT_TIME=2 + + echo -n "Test for possible rmmod deadlock using rtnl_lock while writing x ... " + + write_loop > /dev/null 2>&1 & + write_pid=$! + + MODPROBE_ARGS="enable_lock=1 enable_lock_on_rmmod=1 use_rtnl_lock=1 enable_verbose_writes=1" + modprobe_loop > /dev/null 2>&1 & + modprobe_pid=$! + unset MODPROBE_ARGS + + kill_trigger_loop $modprobe_pid $write_pid $WAIT_TIME > /dev/null 2>&1 & + kill_pid=$! + + wait $kill_pid > /dev/null 2>&1 + + if [[ $? -eq 0 ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +test_gen_desc() +{ + echo -n "$1 x $(get_test_count $1)" +} + +list_tests() +{ + echo "Test ID list:" + echo + echo "TEST_ID x NUM_TEST" + echo "TEST_ID: Test ID" + echo "NUM_TESTS: Number of recommended times to run the test" + echo + echo "$(test_gen_desc 0001) - misc test writing x in different ways" + echo "$(test_gen_desc 0002) - misc test writing y in different ways" + echo "$(test_gen_desc 0003) - misc test writing x in different ways using a mutex lock" + echo "$(test_gen_desc 0004) - misc test writing y in different ways using a mutex lock" + echo "$(test_gen_desc 0005) - misc test writing x load and remove the test_sysfs module" + echo "$(test_gen_desc 0006) - misc writing y load and remove the test_sysfs module" + echo "$(test_gen_desc 0007) - misc test writing y larger delay, load, remove test_sysfs" + echo "$(test_gen_desc 0008) - misc test busy writing x remove test_sysfs module" + echo "$(test_gen_desc 0009) - misc test busy writing y remove the test_sysfs module" + echo "$(test_gen_desc 0010) - misc test busy writing y larger delay, remove test_sysfs" + echo "$(test_gen_desc 0011) - misc test writing x and resetting device" + echo "$(test_gen_desc 0012) - misc test writing y and resetting device" + echo "$(test_gen_desc 0013) - misc test writing y with a larger delay and resetting device" + echo "$(test_gen_desc 0014) - block test writing x in different ways" + echo "$(test_gen_desc 0015) - block test writing y in different ways" + echo "$(test_gen_desc 0016) - block test writing x in different ways using a mutex lock" + echo "$(test_gen_desc 0017) - block test writing y in different ways using a mutex lock" + echo "$(test_gen_desc 0018) - block test writing x load and remove the test_sysfs module" + echo "$(test_gen_desc 0019) - block test writing y load and remove the test_sysfs module" + echo "$(test_gen_desc 0020) - block test writing y larger delay, load, remove test_sysfs" + echo "$(test_gen_desc 0021) - block test busy writing x remove the test_sysfs module" + echo "$(test_gen_desc 0022) - block test busy writing y remove the test_sysfs module" + echo "$(test_gen_desc 0023) - block test busy writing y larger delay, remove test_sysfs" + echo "$(test_gen_desc 0024) - block test writing x and resetting device" + echo "$(test_gen_desc 0025) - block test writing y and resetting device" + echo "$(test_gen_desc 0026) - block test writing y larger delay and resetting device" + echo "$(test_gen_desc 0027) - test rmmod deadlock while writing x ... " + echo "$(test_gen_desc 0028) - test rmmod deadlock using rtnl_lock while writing x ..." +} + +usage() +{ + NUM_TESTS=$(grep -o ' ' <<<"$ALL_TESTS" | grep -c .) + let NUM_TESTS=$NUM_TESTS+1 + MAX_TEST=$(printf "%04d\n" $NUM_TESTS) + echo "Usage: $0 [ -t <4-number-digit> ] | [ -w <4-number-digit> ] |" + echo " [ -s <4-number-digit> ] | [ -c <4-number-digit> <test- count>" + echo " [ all ] [ -h | --help ] [ -l ]" + echo "" + echo "Valid tests: 0001-$MAX_TEST" + echo "" + echo " all Runs all tests (default)" + echo " -t Run test ID the number amount of times is recommended" + echo " -w Watch test ID run until it runs into an error" + echo " -c Run test ID once" + echo " -s Run test ID x test-count number of times" + echo " -l List all test ID list" + echo " -h|--help Help" + echo + echo "If an error every occurs execution will immediately terminate." + echo "If you are adding a new test try using -w <test-ID> first to" + echo "make sure the test passes a series of tests." + echo + echo Example uses: + echo + echo "$TEST_NAME.sh -- executes all tests" + echo "$TEST_NAME.sh -t 0002 -- Executes test ID 0002 number of times is recomended" + echo "$TEST_NAME.sh -w 0002 -- Watch test ID 0002 run until an error occurs" + echo "$TEST_NAME.sh -s 0002 -- Run test ID 0002 once" + echo "$TEST_NAME.sh -c 0002 3 -- Run test ID 0002 three times" + echo + list_tests + exit 1 +} + +test_num() +{ + re='^[0-9]+$' + if ! [[ $1 =~ $re ]]; then + usage + fi +} + +get_test_count() +{ + test_num $1 + TEST_NUM=$(echo $1 | sed 's/^0*//') + TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}') + echo ${TEST_DATA} | awk -F":" '{print $2}' +} + +get_test_enabled() +{ + test_num $1 + TEST_NUM=$(echo $1 | sed 's/^0*//') + TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}') + echo ${TEST_DATA} | awk -F":" '{print $3}' +} + +get_test_target() +{ + test_num $1 + TEST_NUM=$(echo $1 | sed 's/^0*//') + TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}') + echo ${TEST_DATA} | awk -F":" '{print $4}' +} + +get_test_type() +{ + test_num $1 + TEST_NUM=$(echo $1 | sed 's/^0*//') + TEST_DATA=$(echo $ALL_TESTS | awk '{print $'$TEST_NUM'}') + echo ${TEST_DATA} | awk -F":" '{print $5}' +} + +run_all_tests() +{ + for i in $ALL_TESTS ; do + TEST_ID=$(echo $i | awk -F":" '{print $1}') + ENABLED=$(get_test_enabled $TEST_ID) + TEST_COUNT=$(get_test_count $TEST_ID) + TEST_TARGET=$(get_test_target $TEST_ID) + if [[ $ENABLED -eq "1" ]]; then + test_case $TEST_ID $TEST_COUNT $TEST_TARGET + else + echo -n "Skipping test $TEST_ID as its disabled, likely " + echo "could crash your system ..." + fi + done +} + +watch_log() +{ + if [ $# -ne 3 ]; then + clear + fi + echo "Running test: $2 - run #$1" +} + +watch_case() +{ + i=0 + while [ 1 ]; do + + if [ $# -eq 1 ]; then + test_num $1 + watch_log $i ${TEST_NAME}_test_$1 + ${TEST_NAME}_test_$1 + check_dmesg + if [[ $? -eq 0 ]]; then + exit 1 + fi + else + watch_log $i all + run_all_tests + fi + let i=$i+1 + done +} + +test_case() +{ + NUM_TESTS=$2 + + i=0 + + load_modreqs $1 + if target_exists $3 $1; then + return + fi + + while [[ $i -lt $NUM_TESTS ]]; do + test_num $1 + watch_log $i ${TEST_NAME}_test_$1 noclear + log_kernel_fstest_dmesg sysfs $1 + RUN_TEST=${TEST_NAME}_test_$1 + $RUN_TEST + let i=$i+1 + done + check_dmesg + if [[ $? -ne 0 ]]; then + exit 1 + fi +} + +parse_args() +{ + if [ $# -eq 0 ]; then + run_all_tests + else + if [[ "$1" = "all" ]]; then + run_all_tests + elif [[ "$1" = "-w" ]]; then + shift + watch_case $@ + elif [[ "$1" = "-t" ]]; then + shift + test_num $1 + test_case $1 $(get_test_count $1) $(get_test_target $1) + elif [[ "$1" = "-c" ]]; then + shift + test_num $1 + test_num $2 + test_case $1 $2 $(get_test_target $1) + elif [[ "$1" = "-s" ]]; then + shift + test_case $1 1 $(get_test_target $1) + elif [[ "$1" = "-l" ]]; then + list_tests + elif [[ "$1" = "-h" || "$1" = "--help" ]]; then + usage + else + usage + fi + fi +} + +test_reqs +allow_user_defaults + +trap "test_finish" EXIT + +parse_args $@ + +exit 0
On Fri, Jul 02, 2021 at 05:46:29PM -0700, Luis Chamberlain wrote:
This adds a new selftest module which can be used to test sysfs, which would otherwise require using an existing driver. This lets us muck with a template driver to test breaking things without affecting system behaviour or requiring the dependencies of a real device driver.
A series of 28 tests are added. Support for using two device types are supported:
- misc
- block
Contrary to sysctls, sysfs requires a full write to happen at once, and so we reduce the digit tests to single writes. Two main sysfs knobs are provided for testing reading/storing, one which doesn't inclur any delays and another which can incur programmed delays. What locks are held, if any, are configurable, at module load time, or through dynamic configuration at run time.
Since sysfs is a technically filesystem, but a pseudo one, which requires a kernel user, our test_sysfs module and respective test script embraces fstests format for tests in the kernel ring bufffer. Likewise, a scraper for kernel crashes is provided which matches what fstests does as well.
Two tests are kept disabled as they currently cause a deadlock, and so this provides a mechanism to easily show proof and demo how the deadlock can happen:
Demos the deadlock with a device specific lock ./tools/testing/selftests/sysfs/sysfs.sh -t 0027
Demos the deadlock with rtnl_lock() ./tools/testing/selftests/sysfs/sysfs.sh -t 0028
Two separate solutions to the deadlock issue have been proposed, and so now its a matter of either documenting this limitation or eventually adopting a generic fix.
This selftests will shortly be expanded upon with more tests which require further kernel changes in order to provide better test coverage.
Why is this not using kunit? We should not be adding new in-kernel tests that are not using that api anymore.
Signed-off-by: Luis Chamberlain mcgrof@kernel.org
MAINTAINERS | 7 + lib/Kconfig.debug | 10 + lib/Makefile | 1 + lib/test_sysfs.c | 943 +++++++++++++++++++ tools/testing/selftests/sysfs/Makefile | 12 + tools/testing/selftests/sysfs/config | 2 + tools/testing/selftests/sysfs/sysfs.sh | 1202 ++++++++++++++++++++++++ 7 files changed, 2177 insertions(+) create mode 100644 lib/test_sysfs.c create mode 100644 tools/testing/selftests/sysfs/Makefile create mode 100644 tools/testing/selftests/sysfs/config create mode 100755 tools/testing/selftests/sysfs/sysfs.sh
diff --git a/MAINTAINERS b/MAINTAINERS index 66d047dc6880..fd369ed50040 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -17958,6 +17958,13 @@ L: linux-mmc@vger.kernel.org S: Maintained F: drivers/mmc/host/sdhci-pci-dwc-mshc.c +SYSFS TEST DRIVER +M: Luis Chamberlain mcgrof@kernel.org +L: linux-kernel@vger.kernel.org +S: Maintained +F: lib/test_sysfs.c +F: tools/testing/selftests/sysfs/
SYSTEM CONFIGURATION (SYSCON) M: Lee Jones lee.jones@linaro.org M: Arnd Bergmann arnd@arndb.de diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index fb370c7c4756..568838ac1189 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2360,6 +2360,16 @@ config TEST_SYSCTL If unsure, say N. +config TEST_SYSFS
- tristate "sysfs test driver"
- depends on SYSFS
- help
This builds the "test_sysfs" module. This driver enables to test the
sysfs file system safely without affecting production knobs which
might alter system functionality.
If unsure, say N.
config BITFIELD_KUNIT tristate "KUnit test bitfield functions at runtime" depends on KUNIT diff --git a/lib/Makefile b/lib/Makefile index 5efd1b435a37..effd1ef806f0 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -61,6 +61,7 @@ obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o obj-$(CONFIG_TEST_BITOPS) += test_bitops.o CFLAGS_test_bitops.o += -Werror obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o +obj-$(CONFIG_TEST_SYSFS) += test_sysfs.o obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o obj-$(CONFIG_TEST_IDA) += test_ida.o obj-$(CONFIG_KASAN_KUNIT_TEST) += test_kasan.o diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c new file mode 100644 index 000000000000..bf43016d40b5 --- /dev/null +++ b/lib/test_sysfs.c @@ -0,0 +1,943 @@ +// SPDX-License-Identifier: GPL-2.0-or-later
This does not match your "boiler-plate" text below, sorry.
thanks,
greg k-h
On Wed, Jul 21, 2021 at 01:32:41PM +0200, Greg KH wrote:
On Fri, Jul 02, 2021 at 05:46:29PM -0700, Luis Chamberlain wrote:
This selftests will shortly be expanded upon with more tests which require further kernel changes in order to provide better test coverage.
Why is this not using kunit? We should not be adding new in-kernel tests that are not using that api anymore.
No way. That cannot possibly be true. When was this decided? Did Shuah Khan, the maintainer of selftests, all of a sudden decide we are going to deprecate selftests in favor for trying to only use kunit? Did we have a conference where this was talked about and decided?
If so all these are huge news to me and I missed the memo!
If I would have been at such meeting I would have definitely yelled bloody murder!
kunit relies on UML and UML is a simple one core architecture, to start with. This means I cannot run tests for multicore with it, which is where many races do happen! Yes, you can run kunit on other architectures, but all that is new.
Second, I did help review kunit getting upstream, and suggested a few example tests, part of which were for sysctl to compare and contrast what is possible and what we cannot do.
Not everything we want to test should be written as a kunit test. No way.
In this case kunit is not ideal given I want to mimic something in userspace interaction, and expose races through error injection and if we can use as many cores to busy races out.
Trust me, I'm an advocate of kunit, and I'm even trying to see ideally what tests from fstests / blktests could be kunit'ified. But in this case, no. Using a selftests is much better target framework.
diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c new file mode 100644 index 000000000000..bf43016d40b5 --- /dev/null +++ b/lib/test_sysfs.c @@ -0,0 +1,943 @@ +// SPDX-License-Identifier: GPL-2.0-or-later
This does not match your "boiler-plate" text below, sorry.
Indeed, I'll fix it.
Luis
On Thu, Jul 22, 2021 at 03:34:49PM -0700, Luis Chamberlain wrote:
kunit relies on UML and UML is a simple one core architecture, to start with.
I thought the UML requirement was long gone, are you sure it is still present?
This means I cannot run tests for multicore with it, which is where many races do happen! Yes, you can run kunit on other architectures, but all that is new.
What do you mean by "new"? It should work today, in today's kernel tree, right?
In this case kunit is not ideal given I want to mimic something in userspace interaction, and expose races through error injection and if we can use as many cores to busy races out.
Can you not do that with kunit? If not, why not?
thanks,
greg k-h
On Fri, Jul 23, 2021 at 12:38:27PM +0200, Greg KH wrote:
On Thu, Jul 22, 2021 at 03:34:49PM -0700, Luis Chamberlain wrote:
kunit relies on UML and UML is a simple one core architecture, to start with.
I thought the UML requirement was long gone, are you sure it is still present?
It's *the* way to run kunit.
This means I cannot run tests for multicore with it, which is where many races do happen! Yes, you can run kunit on other architectures, but all that is new.
What do you mean by "new"? It should work today, in today's kernel tree, right?
That was experimental. And I know no one using it.
In this case kunit is not ideal given I want to mimic something in userspace interaction, and expose races through error injection and if we can use as many cores to busy races out.
Can you not do that with kunit? If not, why not?
kunit requires codifying everything in C, what we need is a lot of flexibility to do all sorts of races in userspace, and actually use userspace tools.
I am *not* going to even try to rewrite my selftest as a kunit test. It is just not going to happen.
Luis
This adds initial failure injection support to kernfs. We start off with debug knobs which when enabled allow test drivers, such as test_sysfs, to then make use of these to try to force certain difficult races to take place with a high degree of certainty.
This only adds runtime code *iff* the new bool CONFIG_FAIL_KERNFS_KNOBS is enabled in your kernel. If you don't have this enabled this provides no new functional. When CONFIG_FAIL_KERNFS_KNOBS is disabled the new routine kernfs_debug_should_wait() ends up being transformed to if (false), and so the compiler should optimize these out as dead code producing no new effective binary changes.
We start off with enabling failure injections in kernfs by allowing us to alter the way kernfs_fop_write_iter() behaves. We allow for the routine kernfs_fop_write_iter() to wait for a certain condition in the kernel to occur, after which it will sleep a predefined amount of time. This lets kernfs users to time exactly when it want kernfs_fop_write_iter() to complete, allowing for developing race conditions and test for correctness in kernfs.
You'd boot with this enabled on your kernel command line:
fail_kernfs_fop_write_iter=1,100,0,1
The values are <interval,probability,size,times>, we don't care for size, so for now we ignore it. The above ensures a failure will trigger only once.
*How* we allow for this routine to change behaviour is left to knobs we expose under debugfs:
# ls -1 /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/ wait_after_active wait_after_mutex wait_at_start wait_before_mutex
A debugfs entry also exists to allow us to sleep a configurabler amount of time after the completion:
/sys/kernel/debug/kernfs/sleep_after_wait_ms
These two sets of knobs allow us to construct races and demonstrate how the kernfs active reference should suffice to project against races.
Enabling CONFIG_FAULT_INJECTION_DEBUG_FS enables us to configure the differnt fault injection parametres for the new fail_kernfs_fop_write_iter fault injection at run time:
ls -1 /sys/kernel/debug/kernfs/fail_kernfs_fop_write_iter/ interval probability space task-filter times verbose verbose_ratelimit_burst verbose_ratelimit_interval_ms
Signed-off-by: Luis Chamberlain mcgrof@kernel.org
kernfs: fix for latest linux-next
Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- .../fault-injection/fault-injection.rst | 22 +++++ MAINTAINERS | 2 +- fs/kernfs/Makefile | 1 + fs/kernfs/failure-injection.c | 83 +++++++++++++++++++ fs/kernfs/file.c | 13 +++ fs/kernfs/kernfs-internal.h | 72 ++++++++++++++++ include/linux/kernfs.h | 5 ++ lib/Kconfig.debug | 10 +++ 8 files changed, 207 insertions(+), 1 deletion(-) create mode 100644 fs/kernfs/failure-injection.c
diff --git a/Documentation/fault-injection/fault-injection.rst b/Documentation/fault-injection/fault-injection.rst index f47d05ed0d94..d27620196ee8 100644 --- a/Documentation/fault-injection/fault-injection.rst +++ b/Documentation/fault-injection/fault-injection.rst @@ -24,6 +24,28 @@ Available fault injection capabilities
injects futex deadlock and uaddr fault errors.
+- fail_kernfs_fop_write_iter + + Allows for failures to be enabled inside kernfs_fop_write_iter(). Enabling + this does not immediately enable any errors to occur. You must configure + how you want this routine to fail or change behaviour by using the debugfs + knobs for it: + + # ls -1 /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/ + wait_after_active + wait_after_mutex + wait_at_start + wait_before_mutex + + You can also configure how long to sleep after a wait under + + /sys/kernel/debug/kernfs/sleep_after_wait_ms + + If you enable CONFIG_FAULT_INJECTION_DEBUG_FS the fail_add_disk failure + injection parameters are placed under: + + /sys/kernel/debug/kernfs/fail_kernfs_fop_write_iter/ + - fail_make_request
injects disk IO errors on devices permitted by setting diff --git a/MAINTAINERS b/MAINTAINERS index fd369ed50040..419405b2378a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -10246,7 +10246,7 @@ M: Greg Kroah-Hartman gregkh@linuxfoundation.org M: Tejun Heo tj@kernel.org S: Supported T: git git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git -F: fs/kernfs/ +F: fs/kernfs/* F: include/linux/kernfs.h
KEXEC diff --git a/fs/kernfs/Makefile b/fs/kernfs/Makefile index 4ca54ff54c98..bc5b32ca39f9 100644 --- a/fs/kernfs/Makefile +++ b/fs/kernfs/Makefile @@ -4,3 +4,4 @@ #
obj-y := mount.o inode.o dir.o file.o symlink.o +obj-$(CONFIG_FAIL_KERNFS_KNOBS) += failure-injection.o diff --git a/fs/kernfs/failure-injection.c b/fs/kernfs/failure-injection.c new file mode 100644 index 000000000000..7019385e8145 --- /dev/null +++ b/fs/kernfs/failure-injection.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/fault-inject.h> +#include <linux/delay.h> + +#include "kernfs-internal.h" + +static DECLARE_FAULT_ATTR(fail_kernfs_fop_write_iter); +struct kernfs_config_fail kernfs_config_fail; + +#define kernfs_config_fail(when) \ + kernfs_config_fail.kernfs_fop_write_iter_fail.wait_ ## when + +#define kernfs_config_fail(when) \ + kernfs_config_fail.kernfs_fop_write_iter_fail.wait_ ## when + +static int __init setup_fail_kernfs_fop_write_iter(char *str) +{ + return setup_fault_attr(&fail_kernfs_fop_write_iter, str); +} + +__setup("fail_kernfs_fop_write_iter=", setup_fail_kernfs_fop_write_iter); + +struct dentry *kernfs_debugfs_root; +struct dentry *config_fail_kernfs_fop_write_iter; + +static int __init kernfs_init_failure_injection(void) +{ + kernfs_config_fail.sleep_after_wait_ms = 100; + kernfs_debugfs_root = debugfs_create_dir("kernfs", NULL); + + fault_create_debugfs_attr("fail_kernfs_fop_write_iter", + kernfs_debugfs_root, &fail_kernfs_fop_write_iter); + + config_fail_kernfs_fop_write_iter = + debugfs_create_dir("config_fail_kernfs_fop_write_iter", + kernfs_debugfs_root); + + debugfs_create_u32("sleep_after_wait_ms", 0600, + kernfs_debugfs_root, + &kernfs_config_fail.sleep_after_wait_ms); + + debugfs_create_bool("wait_at_start", 0600, + config_fail_kernfs_fop_write_iter, + &kernfs_config_fail(at_start)); + debugfs_create_bool("wait_before_mutex", 0600, + config_fail_kernfs_fop_write_iter, + &kernfs_config_fail(before_mutex)); + debugfs_create_bool("wait_after_mutex", 0600, + config_fail_kernfs_fop_write_iter, + &kernfs_config_fail(after_mutex)); + debugfs_create_bool("wait_after_active", 0600, + config_fail_kernfs_fop_write_iter, + &kernfs_config_fail(after_active)); + return 0; +} +late_initcall(kernfs_init_failure_injection); + +int __kernfs_debug_should_wait_kernfs_fop_write_iter(bool evaluate) +{ + if (!evaluate) + return 0; + + return should_fail(&fail_kernfs_fop_write_iter, 0); +} + +DECLARE_COMPLETION(kernfs_debug_wait_completion); +EXPORT_SYMBOL_NS_GPL(kernfs_debug_wait_completion, KERNFS_DEBUG_PRIVATE); + +void kernfs_debug_wait(void) +{ + wait_for_completion(&kernfs_debug_wait_completion); + pr_info("%s received completion\n", __func__); + + /** + * The goal is wait for an event, and *then* once we have + * reached it, the other side will try to do something which + * it thinks will break. So we must give it some time to do + * that. The amount of time is configurable. + */ + msleep(kernfs_config_fail.sleep_after_wait_ms); + pr_info("%s ended\n", __func__); +} diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c index c75719312147..d45954600f11 100644 --- a/fs/kernfs/file.c +++ b/fs/kernfs/file.c @@ -259,6 +259,9 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) const struct kernfs_ops *ops; char *buf;
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, at_start)) + kernfs_debug_wait(); + if (of->atomic_write_len) { if (len > of->atomic_write_len) return -E2BIG; @@ -280,17 +283,27 @@ static ssize_t kernfs_fop_write_iter(struct kiocb *iocb, struct iov_iter *iter) } buf[len] = '\0'; /* guarantee string termination */
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, before_mutex)) + kernfs_debug_wait(); + /* * @of->mutex nests outside active ref and is used both to ensure that * the ops aren't called concurrently for the same open file. */ mutex_lock(&of->mutex); + + if (kernfs_debug_should_wait(kernfs_fop_write_iter, after_mutex)) + kernfs_debug_wait(); + if (!kernfs_get_active(of->kn)) { mutex_unlock(&of->mutex); len = -ENODEV; goto out_free; }
+ if (kernfs_debug_should_wait(kernfs_fop_write_iter, after_active)) + kernfs_debug_wait(); + ops = kernfs_ops(of->kn); if (ops->write) len = ops->write(of, buf, len, iocb->ki_pos); diff --git a/fs/kernfs/kernfs-internal.h b/fs/kernfs/kernfs-internal.h index ccc3b44f6306..221f341991d9 100644 --- a/fs/kernfs/kernfs-internal.h +++ b/fs/kernfs/kernfs-internal.h @@ -17,6 +17,7 @@
#include <linux/kernfs.h> #include <linux/fs_context.h> +#include <linux/stringify.h>
struct kernfs_iattrs { kuid_t ia_uid; @@ -127,4 +128,75 @@ void kernfs_drain_open_files(struct kernfs_node *kn); */ extern const struct inode_operations kernfs_symlink_iops;
+/* + * failure-injection.c + */ +#ifdef CONFIG_FAIL_KERNFS_KNOBS + +/** + * struct kernfs_fop_write_iter_fail - how kernfs_fop_write_iter_fail fails + * + * This lets you configure what part of kernfs_fop_write_iter() should behave + * in a specific way to allow userspace to capture possible failures in + * kernfs. The wait knobs are allowed to let you design capture possible + * race conditions which would otherwise be difficult to reproduce. A + * secondary driver would tell kernfs's wait completion when it is done. + * + * The point to the wait completion failure injection tests are to confirm + * that the kernfs active refcount suffice to ensure other objects in other + * layers are also gauranteed to exist, even they are opaque to kernfs. This + * includes kobjects, devices, and other objects built on top of this, like + * the block layer when using sysfs block device attributes. + * + * @wait_at_start: waits for completion from a third party at the start of + * the routine. + * @wait_before_mutex: waits for completion from a third party before we + * are allowed to continue before the of->mutex is held. + * @wait_after_mutex: waits for completion from a third party after we + * have held the of->mutex. + * @wait_after_active: waits for completion from a thid party after we + * have refcounted the struct kernfs_node. + */ +struct kernfs_fop_write_iter_fail { + bool wait_at_start; + bool wait_before_mutex; + bool wait_after_mutex; + bool wait_after_active; +}; + +/** + * struct kernfs_config_fail - kernfs configuration for failure injection + * + * You can kernfs failure injection on boot, and in particular we currently + * only support failures for kernfs_fop_write_iter(). However, we don't + * want to always enable errors on this call when failure injection is enabled + * as this routine is used by many parts of the kernel for proper functionality. + * The compromise we make is we let userspace start enabling which parts it + * wants to fail after boot, if and only if failure injection has been enabled. + * + * @kernfs_fop_write_iter_fail: configuration for how we want to allow + * for failure injection on kernfs_fop_write_iter() + * @sleep_after_wait_ms: how many ms to wait after completion is received. + */ +struct kernfs_config_fail { + struct kernfs_fop_write_iter_fail kernfs_fop_write_iter_fail; + u32 sleep_after_wait_ms; +}; + +extern struct kernfs_config_fail kernfs_config_fail; + +#define __kernfs_config_wait_var(func, when) \ + (kernfs_config_fail. func ## _fail.wait_ ## when) +#define __kernfs_debug_should_wait_func_name(func) __kernfs_debug_should_wait_## func + +#define kernfs_debug_should_wait(func, when) \ + __kernfs_debug_should_wait_func_name(func)(__kernfs_config_wait_var(func, when)) +int __kernfs_debug_should_wait_kernfs_fop_write_iter(bool evaluate); +void kernfs_debug_wait(void); +#else +static inline void kernfs_init_failure_injection(void) {} +#define kernfs_debug_should_wait(when) (false) +static inline void kernfs_debug_wait(void) {} +#endif + #endif /* __KERNFS_INTERNAL_H */ diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 9e8ca8743c26..0da36b8e18e1 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -410,6 +410,11 @@ void kernfs_init(void);
struct kernfs_node *kernfs_find_and_get_node_by_id(struct kernfs_root *root, u64 id); + +#ifdef CONFIG_FAIL_KERNFS_KNOBS +extern struct completion kernfs_debug_wait_completion; +#endif + #else /* CONFIG_KERNFS */
static inline enum kernfs_node_type kernfs_type(struct kernfs_node *kn) diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 568838ac1189..7dbfbfbd3cfc 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1930,6 +1930,16 @@ config FAULT_INJECTION_USERCOPY Provides fault-injection capability to inject failures in usercopy functions (copy_from_user(), get_user(), ...).
+config FAIL_KERNFS_KNOBS + bool "Fault-injection support in kernfs" + depends on FAULT_INJECTION + help + Provide fault-injection capability for kernfs. This only enables + the error injection functionality. To use it you must configure which + which path you want to trigger on error on using debugfs under + /sys/kernel/debug/kernfs/config_fail_kernfs_fop_write_iter/. By + default all of these are disabled. + config FAIL_MAKE_REQUEST bool "Fault-injection capability for disk IO" depends on FAULT_INJECTION && BLOCK
This extends test_sysfs with support for using the failure injection wait completion and knobs to force a few race conditions which demonstrates that kernfs active reference protection is sufficient for kobject / device protection at higher layers.
This adds 4 new tests which tries to remove the device attribute store operation in 4 different situations:
1) at the start of kernfs_kernfs_fop_write_iter() 2) before the of->mutex is held in kernfs_kernfs_fop_write_iter() 3) after the of->mutex is held in kernfs_kernfs_fop_write_iter() 4) after the kernfs node active reference is taken
A write fails in call cases.
No null dereferences are reproduced, even though this has been observed in some complex testing cases [0]. If this issue really exists we should have enough tools on the sysfs_test toolbox now to try to reproduce this easily without having to poke around other drivers.
[0] https://lkml.kernel.org/r/20210623215007.862787-1-mcgrof@kernel.org
Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- lib/Kconfig.debug | 3 + lib/test_sysfs.c | 31 +++++ tools/testing/selftests/sysfs/config | 3 + tools/testing/selftests/sysfs/sysfs.sh | 174 +++++++++++++++++++++++++ 4 files changed, 211 insertions(+)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 7dbfbfbd3cfc..717bc367a9db 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2373,6 +2373,9 @@ config TEST_SYSCTL config TEST_SYSFS tristate "sysfs test driver" depends on SYSFS + select FAULT_INJECTION + select FAULT_INJECTION_DEBUG_FS + select FAIL_KERNFS_KNOBS help This builds the "test_sysfs" module. This driver enables to test the sysfs file system safely without affecting production knobs which diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c index bf43016d40b5..f27ec0eab747 100644 --- a/lib/test_sysfs.c +++ b/lib/test_sysfs.c @@ -38,6 +38,11 @@ #include <linux/rtnetlink.h> #include <linux/genhd.h> #include <linux/blkdev.h> +#include <linux/kernfs.h> + +#ifdef CONFIG_FAIL_KERNFS_KNOBS +MODULE_IMPORT_NS(KERNFS_DEBUG_PRIVATE); +#endif
static bool enable_lock; module_param(enable_lock, bool_enable_only, 0644); @@ -82,6 +87,13 @@ static bool enable_verbose_rmmod; module_param(enable_verbose_rmmod, bool_enable_only, 0644); MODULE_PARM_DESC(enable_verbose_rmmod, "enable verbose print messages on rmmod");
+#ifdef CONFIG_FAIL_KERNFS_KNOBS +static bool enable_completion_on_rmmod; +module_param(enable_completion_on_rmmod, bool_enable_only, 0644); +MODULE_PARM_DESC(enable_completion_on_rmmod, + "enable sending a kernfs completion on rmmod"); +#endif + static int sysfs_test_major;
/** @@ -289,6 +301,12 @@ static ssize_t config_show(struct device *dev, "enable_verbose_writes:\t%s\n", enable_verbose_writes ? "true" : "false");
+#ifdef CONFIG_FAIL_KERNFS_KNOBS + len += snprintf(buf+len, PAGE_SIZE - len, + "enable_completion_on_rmmod:\t%s\n", + enable_completion_on_rmmod ? "true" : "false"); +#endif + test_dev_config_unlock(test_dev);
return len; @@ -926,10 +944,23 @@ static int __init test_sysfs_init(void) } module_init(test_sysfs_init);
+#ifdef CONFIG_FAIL_KERNFS_KNOBS +/* The goal is to race our device removal with a pending kernfs -> store call */ +static void test_sysfs_kernfs_send_completion_rmmod(void) +{ + if (!enable_completion_on_rmmod) + return; + complete(&kernfs_debug_wait_completion); +} +#else +static inline void test_sysfs_kernfs_send_completion_rmmod(void) {} +#endif + static void __exit test_sysfs_exit(void) { if (enable_debugfs) debugfs_remove(debugfs_dir); + test_sysfs_kernfs_send_completion_rmmod(); if (delay_rmmod_ms) msleep(delay_rmmod_ms); unregister_test_dev_sysfs(first_test_dev); diff --git a/tools/testing/selftests/sysfs/config b/tools/testing/selftests/sysfs/config index 9196f452ecd5..2876a229f95b 100644 --- a/tools/testing/selftests/sysfs/config +++ b/tools/testing/selftests/sysfs/config @@ -1,2 +1,5 @@ CONFIG_SYSFS=m CONFIG_TEST_SYSFS=m +CONFIG_FAULT_INJECTION=y +CONFIG_FAULT_INJECTION_DEBUG_FS=y +CONFIG_FAIL_KERNFS_KNOBS=y diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh index 681b27579f6f..f27ea61e0e95 100755 --- a/tools/testing/selftests/sysfs/sysfs.sh +++ b/tools/testing/selftests/sysfs/sysfs.sh @@ -62,6 +62,10 @@ ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block" ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block" ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock +ALL_TESTS="$ALL_TESTS 0029:1:1:test_dev_x:block" # kernfs race removal of store +ALL_TESTS="$ALL_TESTS 0030:1:1:test_dev_x:block" # kernfs race removal before mutex +ALL_TESTS="$ALL_TESTS 0031:1:1:test_dev_x:block" # kernfs race removal after mutex +ALL_TESTS="$ALL_TESTS 0032:1:1:test_dev_x:block" # kernfs race removal after active
allow_user_defaults() { @@ -92,6 +96,9 @@ allow_user_defaults() if [ -z $SYSFS_DEBUGFS_DIR ]; then SYSFS_DEBUGFS_DIR="/sys/kernel/debug/test_sysfs" fi + if [ -z $KERNFS_DEBUGFS_DIR ]; then + KERNFS_DEBUGFS_DIR="/sys/kernel/debug/kernfs" + fi if [ -z $PAGE_SIZE ]; then PAGE_SIZE=$(getconf PAGESIZE) fi @@ -167,6 +174,14 @@ modprobe_reset_enable_rtnl_lock_on_rmmod() unset FIRST_MODPROBE_ARGS }
+modprobe_reset_enable_completion() +{ + FIRST_MODPROBE_ARGS="enable_completion_on_rmmod=1 enable_verbose_writes=1" + FIRST_MODPROBE_ARGS="$FIRST_MODPROBE_ARGS enable_verbose_rmmod=1 delay_rmmod_ms=0" + modprobe_reset + unset FIRST_MODPROBE_ARGS +} + load_req_mod() { modprobe_reset @@ -197,6 +212,63 @@ debugfs_reset_first_test_dev_ignore_errors() echo -n "1" >"$SYSFS_DEBUGFS_DIR"/reset_first_test_dev }
+debugfs_kernfs_kernfs_fop_write_iter_exists() +{ + KNOB_DIR="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter" + if [[ ! -d $KNOB_DIR ]]; then + echo "kernfs debugfs does not exist $KNOB_DIR" + return 0; + fi + KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter" + if [[ ! -d $KNOB_DEBUGFS ]]; then + echo -n "kernfs debugfs for coniguring fail_kernfs_fop_write_iter " + echo "does not exist $KNOB_DIR" + return 0; + fi + return 1 +} + +debugfs_kernfs_kernfs_fop_write_iter_set_fail_once() +{ + KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter" + echo 1 > $KNOB_DEBUGFS/interval + echo 100 > $KNOB_DEBUGFS/probability + echo 0 > $KNOB_DEBUGFS/space + # Disable verbose messages on the kernel ring buffer which may + # confuse developers with a kernel panic. + echo 0 > $KNOB_DEBUGFS/verbose + + # Fail only once + echo 1 > $KNOB_DEBUGFS/times +} + +debugfs_kernfs_kernfs_fop_write_iter_set_fail_never() +{ + KNOB_DEBUGFS="${KERNFS_DEBUGFS_DIR}/fail_kernfs_fop_write_iter" + echo 0 > $KNOB_DEBUGFS/times +} + +debugfs_kernfs_set_wait_ms() +{ + SLEEP_AFTER_WAIT_MS="${KERNFS_DEBUGFS_DIR}/sleep_after_wait_ms" + echo $1 > $SLEEP_AFTER_WAIT_MS +} + +debugfs_kernfs_disable_wait_kernfs_fop_write_iter() +{ + ENABLE_WAIT_KNOB="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter/wait_" + for KNOB in ${ENABLE_WAIT_KNOB}*; do + echo 0 > $KNOB + done +} + +debugfs_kernfs_enable_wait_kernfs_fop_write_iter() +{ + ENABLE_WAIT_KNOB="${KERNFS_DEBUGFS_DIR}/config_fail_kernfs_fop_write_iter/wait_$1" + echo -n "1" > $ENABLE_WAIT_KNOB + return $? +} + set_orig() { if [[ ! -z $TARGET ]] && [[ ! -z $ORIG ]]; then @@ -972,6 +1044,104 @@ sysfs_test_0028() fi }
+sysfs_race_kernfs_kernfs_fop_write_iter() +{ + TARGET="${DIR}/$(get_test_target $1)" + WAIT_AT=$2 + EXPECT_WRITE_RETURNS=$3 + MSDELAY=$4 + + modprobe_reset_enable_completion + ORIG=$(cat "${TARGET}") + TEST_STR=$(( $ORIG + 1 )) + + echo -n "Test racing removal of sysfs store op with kernfs $WAIT_AT ... " + + if debugfs_kernfs_kernfs_fop_write_iter_exists; then + echo -n "skipping test as CONFIG_FAIL_KERNFS_KNOBS " + echo " or CONFIG_FAULT_INJECTION_DEBUG_FS is disabled" + return $ksft_skip + fi + + # Allow for failing the kernfs_kernfs_fop_write_iter call once, + # we'll provide exact context shortly afterwards. + debugfs_kernfs_kernfs_fop_write_iter_set_fail_once + + # First disable all waits + debugfs_kernfs_disable_wait_kernfs_fop_write_iter + + # Enable a wait_for_completion(&kernfs_debug_wait_completion) at the + # specified location inside the kernfs_fop_write_iter() routine + debugfs_kernfs_enable_wait_kernfs_fop_write_iter $WAIT_AT + + # Configure kernfs so that after its wait_for_completion() it + # will msleep() this amount of time and schedule(). We figure this + # will be sufficient time to allow for our module removal to complete. + debugfs_kernfs_set_wait_ms $MSDELAY + + # Now we trigger a kernfs write op, which will run kernfs_fop_write_iter, + # but will wait until our driver sends a respective completion + set_test_ignore_errors & + write_pid=$! + + # At this point kernfs_fop_write_iter() hasn't run our op, its + # waiting for our completion at the specified time $WAIT_AT. + # We now remove our module which will send a + # complete(&kernfs_debug_wait_completion) right before we deregister + # our device and the sysfs device attributes are removed. + # + # After the completion is sent, the test_sysfs driver races with + # kernfs to do the device deregistration with the kernfs msleep + # and schedule(). This should mean we've forced trying to remove the + # module prior to allowing kernfs to run our store operation. If the + # race did happen we'll panic with a null dereference on the store op. + # + # If no race happens we should see no write operation triggered. + modprobe -r $TEST_DRIVER > /dev/null 2>&1 + + debugfs_kernfs_kernfs_fop_write_iter_set_fail_never + + wait $write_pid + if [[ $? -eq $EXPECT_WRITE_RETURNS ]]; then + echo "ok" + else + echo "FAIL" >&2 + fi +} + +sysfs_test_0029() +{ + for delay in 0 2 4 8 16 32 64 128 246 512 1024; do + echo "Using delay-after-completion: $delay" + sysfs_race_kernfs_kernfs_fop_write_iter 0029 at_start 1 $delay + done +} + +sysfs_test_0030() +{ + for delay in 0 2 4 8 16 32 64 128 246 512 1024; do + echo "Using delay-after-completion: $delay" + sysfs_race_kernfs_kernfs_fop_write_iter 0030 before_mutex 1 $delay + done +} + +sysfs_test_0031() +{ + for delay in 0 2 4 8 16 32 64 128 246 512 1024; do + echo "Using delay-after-completion: $delay" + sysfs_race_kernfs_kernfs_fop_write_iter 0031 after_mutex 1 $delay + done +} + +# Even if we get the active reference the write fails +sysfs_test_0032() +{ + for delay in 0 2 4 8 16 32 64 128 246 512 1024; do + echo "Using delay-after-completion: $delay" + sysfs_race_kernfs_kernfs_fop_write_iter 0032 after_active 1 $delay + done +} + test_gen_desc() { echo -n "$1 x $(get_test_count $1)" @@ -1013,6 +1183,10 @@ list_tests() echo "$(test_gen_desc 0026) - block test writing y larger delay and resetting device" echo "$(test_gen_desc 0027) - test rmmod deadlock while writing x ... " echo "$(test_gen_desc 0028) - test rmmod deadlock using rtnl_lock while writing x ..." + echo "$(test_gen_desc 0029) - racing removal of store op with kernfs at start" + echo "$(test_gen_desc 0030) - racing removal of store op with kernfs before mutex" + echo "$(test_gen_desc 0031) - racing removal of store op with kernfs after mutex" + echo "$(test_gen_desc 0032) - racing removal of store op with kernfs after active" }
usage()
Two mechanisms have been proposed to fix the sysfs deadlock issue. The first approach proposed is by optionally allowing drivers to specify a module and augmenting attributes with module information [0]. A secondary approach is to use macros on drivers which needs this, in the meantime. This embraces the secondary approach, in lieu of agreement of a generic solution. This should be enough to allow for room for experimentation and demonstration of the issue.
This then also enables the two test cases which we have disabled as otherwise they would deadlock your system.
./tools/testing/selftests/sysfs/sysfs.sh -t 0027 Running test: sysfs_test_0027 - run #0 Test for possible rmmod deadlock while writing x ... ok
./tools/testing/selftests/sysfs/sysfs.sh -t 0028 Running test: sysfs_test_0028 - run #0 Test for possible rmmod deadlock using rtnl_lock while writing x ... ok
[0] https://lkml.kernel.org/r/20210401235925.GR4332@42.do-not-panic.com
Signed-off-by: Luis Chamberlain mcgrof@kernel.org --- lib/test_sysfs.c | 71 ++++++++++++++++++++++---- tools/testing/selftests/sysfs/sysfs.sh | 4 +- 2 files changed, 64 insertions(+), 11 deletions(-)
diff --git a/lib/test_sysfs.c b/lib/test_sysfs.c index f27ec0eab747..af14e992e1b8 100644 --- a/lib/test_sysfs.c +++ b/lib/test_sysfs.c @@ -94,6 +94,59 @@ MODULE_PARM_DESC(enable_completion_on_rmmod, "enable sending a kernfs completion on rmmod"); #endif
+#undef __ATTR_RO +#undef __ATTR_RW +#undef __ATTR_WO + +#define __ATTR_RO(_name) { \ + .attr = { .name = __stringify(_name), .mode = 0444 }, \ + .show = module_##_name##_show, \ +} +#define __ATTR_RW(_name) __ATTR(_name, 0644, module_##_name##_show, module_##_name##_store) +#define __ATTR_WO(_name) { \ + .attr = { .name = __stringify(_name), .mode = 0200 }, \ + .store = module_##_name##_store, \ +} + +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \ +static ssize_t module_ ## _name ## _store(struct device *dev, \ + struct device_attribute *attr, \ + const char *buf, size_t len) \ +{ \ + ssize_t __ret; \ + if (!try_module_get(THIS_MODULE)) \ + return -ENODEV; \ + __ret = _name ## _store(dev, attr, buf, len); \ + module_put(THIS_MODULE); \ + return __ret; \ +} + +#define MODULE_DEVICE_ATTR_FUNC_SHOW(_name) \ +static ssize_t module_ ## _name ## _show(struct device *dev, \ + struct device_attribute *attr, \ + char *buf) \ +{ \ + ssize_t __ret; \ + if (!try_module_get(THIS_MODULE)) \ + return -ENODEV; \ + __ret = _name ## _show(dev, attr, buf); \ + module_put(THIS_MODULE); \ + return __ret; \ +} + +#define MODULE_DEVICE_ATTR_WO(_name) \ +MODULE_DEVICE_ATTR_FUNC_STORE(_name); \ +static DEVICE_ATTR_WO(_name) + +#define MODULE_DEVICE_ATTR_RW(_name) \ +MODULE_DEVICE_ATTR_FUNC_STORE(_name); \ +MODULE_DEVICE_ATTR_FUNC_SHOW(_name); \ +static DEVICE_ATTR_RW(_name) + +#define MODULE_DEVICE_ATTR_RO(_name) \ +MODULE_DEVICE_ATTR_FUNC_SHOW(_name); \ +static DEVICE_ATTR_RO(_name) + static int sysfs_test_major;
/** @@ -311,7 +364,7 @@ static ssize_t config_show(struct device *dev,
return len; } -static DEVICE_ATTR_RO(config); +MODULE_DEVICE_ATTR_RO(config);
static ssize_t reset_store(struct device *dev, struct device_attribute *attr, @@ -336,7 +389,7 @@ static ssize_t reset_store(struct device *dev,
return count; } -static DEVICE_ATTR_WO(reset); +MODULE_DEVICE_ATTR_WO(reset);
static void test_dev_busy_alloc(struct sysfs_test_device *test_dev) { @@ -388,7 +441,7 @@ static ssize_t test_dev_x_show(struct device *dev,
return ret; } -static DEVICE_ATTR_RW(test_dev_x); +MODULE_DEVICE_ATTR_RW(test_dev_x);
static ssize_t test_dev_y_store(struct device *dev, struct device_attribute *attr, @@ -432,7 +485,7 @@ static ssize_t test_dev_y_show(struct device *dev,
return ret; } -static DEVICE_ATTR_RW(test_dev_y); +MODULE_DEVICE_ATTR_RW(test_dev_y);
static ssize_t config_enable_lock_store(struct device *dev, struct device_attribute *attr, @@ -477,7 +530,7 @@ static ssize_t config_enable_lock_show(struct device *dev,
return ret; } -static DEVICE_ATTR_RW(config_enable_lock); +MODULE_DEVICE_ATTR_RW(config_enable_lock);
static ssize_t config_enable_lock_on_rmmod_store(struct device *dev, struct device_attribute *attr, @@ -519,7 +572,7 @@ static ssize_t config_enable_lock_on_rmmod_show(struct device *dev,
return ret; } -static DEVICE_ATTR_RW(config_enable_lock_on_rmmod); +MODULE_DEVICE_ATTR_RW(config_enable_lock_on_rmmod);
static ssize_t config_use_rtnl_lock_store(struct device *dev, struct device_attribute *attr, @@ -558,7 +611,7 @@ static ssize_t config_use_rtnl_lock_show(struct device *dev,
return snprintf(buf, PAGE_SIZE, "%d\n", config->use_rtnl_lock); } -static DEVICE_ATTR_RW(config_use_rtnl_lock); +MODULE_DEVICE_ATTR_RW(config_use_rtnl_lock);
static ssize_t config_write_delay_msec_y_store(struct device *dev, struct device_attribute *attr, @@ -592,7 +645,7 @@ static ssize_t config_write_delay_msec_y_show(struct device *dev,
return snprintf(buf, PAGE_SIZE, "%d\n", config->write_delay_msec_y); } -static DEVICE_ATTR_RW(config_write_delay_msec_y); +MODULE_DEVICE_ATTR_RW(config_write_delay_msec_y);
static ssize_t config_enable_busy_alloc_store(struct device *dev, struct device_attribute *attr, @@ -626,7 +679,7 @@ static ssize_t config_enable_busy_alloc_show(struct device *dev,
return snprintf(buf, PAGE_SIZE, "%d\n", config->enable_busy_alloc); } -static DEVICE_ATTR_RW(config_enable_busy_alloc); +MODULE_DEVICE_ATTR_RW(config_enable_busy_alloc);
#define TEST_SYSFS_DEV_ATTR(name) (&dev_attr_##name.attr)
diff --git a/tools/testing/selftests/sysfs/sysfs.sh b/tools/testing/selftests/sysfs/sysfs.sh index f27ea61e0e95..2de9f37cb00b 100755 --- a/tools/testing/selftests/sysfs/sysfs.sh +++ b/tools/testing/selftests/sysfs/sysfs.sh @@ -60,8 +60,8 @@ ALL_TESTS="$ALL_TESTS 0023:1:1:test_dev_y:block" ALL_TESTS="$ALL_TESTS 0024:1:1:test_dev_x:block" ALL_TESTS="$ALL_TESTS 0025:1:1:test_dev_y:block" ALL_TESTS="$ALL_TESTS 0026:1:1:test_dev_y:block" -ALL_TESTS="$ALL_TESTS 0027:1:0:test_dev_x:block" # deadlock test -ALL_TESTS="$ALL_TESTS 0028:1:0:test_dev_x:block" # deadlock test with rntl_lock +ALL_TESTS="$ALL_TESTS 0027:1:1:test_dev_x:block" # deadlock test +ALL_TESTS="$ALL_TESTS 0028:1:1:test_dev_x:block" # deadlock test with rntl_lock ALL_TESTS="$ALL_TESTS 0029:1:1:test_dev_x:block" # kernfs race removal of store ALL_TESTS="$ALL_TESTS 0030:1:1:test_dev_x:block" # kernfs race removal before mutex ALL_TESTS="$ALL_TESTS 0031:1:1:test_dev_x:block" # kernfs race removal after mutex
On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
+#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \ +static ssize_t module_ ## _name ## _store(struct device *dev, \
struct device_attribute *attr, \
const char *buf, size_t len) \
+{ \
- ssize_t __ret; \
- if (!try_module_get(THIS_MODULE)) \
return -ENODEV; \
- __ret = _name ## _store(dev, attr, buf, len); \
- module_put(THIS_MODULE); \
- return __ret; \
+}
As I have pointed out before, doing try_module_get(THIS_MODULE) is racy and should not be added back to the kernel tree. We got rid of many instances of this "bad pattern" over the years, please do not encourage it to be added back as others will somehow think that it correct code.
I'll go over the rest of this after 5.14-rc1 is out, am busy until then.
thanks,
greg k-h
On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
+#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \ +static ssize_t module_ ## _name ## _store(struct device *dev, \
struct device_attribute *attr, \
const char *buf, size_t len) \
+{ \
- ssize_t __ret; \
- if (!try_module_get(THIS_MODULE)) \
return -ENODEV; \
- __ret = _name ## _store(dev, attr, buf, len); \
- module_put(THIS_MODULE); \
- return __ret; \
+}
As I have pointed out before, doing try_module_get(THIS_MODULE) is racy and should not be added back to the kernel tree. We got rid of many instances of this "bad pattern" over the years, please do not encourage it to be added back as others will somehow think that it correct code.
It is noted this is used in lieu of any agreed upon solution to *demonstrate* how this at least does fix it. In this case (and in the generic solution I also had suggested for kernfs a while ago), if the try fails, we give up. If it succeeds, we now know we can rely on the device pointer. If the refcount succeeds, can the module still not be present? Is try_module_get() racy in that way? In what way is it racy and where is this documented? Do we have a selftest to prove the race?
Luis
On Sat, Jul 03, 2021 at 10:28:28AM -0700, Luis Chamberlain wrote:
On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
+#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \ +static ssize_t module_ ## _name ## _store(struct device *dev, \
struct device_attribute *attr, \
const char *buf, size_t len) \
+{ \
- ssize_t __ret; \
- if (!try_module_get(THIS_MODULE)) \
return -ENODEV; \
- __ret = _name ## _store(dev, attr, buf, len); \
- module_put(THIS_MODULE); \
- return __ret; \
+}
As I have pointed out before, doing try_module_get(THIS_MODULE) is racy and should not be added back to the kernel tree. We got rid of many instances of this "bad pattern" over the years, please do not encourage it to be added back as others will somehow think that it correct code.
It is noted this is used in lieu of any agreed upon solution to *demonstrate* how this at least does fix it. In this case (and in the generic solution I also had suggested for kernfs a while ago), if the try fails, we give up. If it succeeds, we now know we can rely on the device pointer. If the refcount succeeds, can the module still not be present? Is try_module_get() racy in that way? In what way is it racy and where is this documented? Do we have a selftest to prove the race?
As I say in the other email where you tried to add this, think about what happens if the module is removed _right before_ you make this call.
Or a few instructions before that. The race is still there, this fixes nothing except make the window smaller.
thanks,
greg k-h
On Wed, Jul 21, 2021 at 01:33:54PM +0200, Greg KH wrote:
On Sat, Jul 03, 2021 at 10:28:28AM -0700, Luis Chamberlain wrote:
On Sat, Jul 03, 2021 at 06:49:46AM +0200, Greg KH wrote:
On Fri, Jul 02, 2021 at 05:46:32PM -0700, Luis Chamberlain wrote:
+#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \ +static ssize_t module_ ## _name ## _store(struct device *dev, \
struct device_attribute *attr, \
const char *buf, size_t len) \
+{ \
- ssize_t __ret; \
- if (!try_module_get(THIS_MODULE)) \
return -ENODEV; \
- __ret = _name ## _store(dev, attr, buf, len); \
- module_put(THIS_MODULE); \
- return __ret; \
+}
As I have pointed out before, doing try_module_get(THIS_MODULE) is racy and should not be added back to the kernel tree. We got rid of many instances of this "bad pattern" over the years, please do not encourage it to be added back as others will somehow think that it correct code.
It is noted this is used in lieu of any agreed upon solution to *demonstrate* how this at least does fix it. In this case (and in the generic solution I also had suggested for kernfs a while ago), if the try fails, we give up. If it succeeds, we now know we can rely on the device pointer. If the refcount succeeds, can the module still not be present? Is try_module_get() racy in that way? In what way is it racy and where is this documented? Do we have a selftest to prove the race?
As I say in the other email where you tried to add this, think about what happens if the module is removed _right before_ you make this call.
Or a few instructions before that. The race is still there, this fixes nothing except make the window smaller.
The kernfs active reference ensures that if the file is open the module must still exist. As such, the use within sysfs files should be safe as the module is the one in charge of removing the files.
Luis
linux-kselftest-mirror@lists.linaro.org