There are currently two ways in which ublk server exit is detected by
ublk_drv:
1. uring_cmd cancellation. If there are any outstanding uring_cmds which
have not been completed to the ublk server when it exits, io_uring
calls the uring_cmd callback with a special cancellation flag as the
issuing task is exiting.
2. I/O timeout. This is needed in addition to the above to handle the
"saturated queue" case, when all I/Os for a given queue are in the
ublk server, and therefore there are no outstanding uring_cmds to
cancel when the ublk server exits.
There are a couple of issues with this approach:
- It is complex and inelegant to have two methods to detect the same
condition
- The second method detects ublk server exit only after a long delay
(~30s, the default timeout assigned by the block layer). This delays
the nosrv behavior from kicking in and potential subsequent recovery
of the device.
The second issue is brought to light with the new test_generic_04. It
fails before this fix:
selftests: ublk: test_generic_04.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 30.0611 s, 0.0 kB/s
DEAD
dd took 31 seconds to exit (>= 5s tolerance)!
generic_04 : [FAIL]
Fix this by instead detecting and handling ublk server exit in the
character file release callback. This has several advantages:
- This one place can handle both saturated and unsaturated queues. Thus,
it replaces both preexisting methods of detecting ublk server exit.
- It runs quickly on ublk server exit - there is no 30s delay.
- It starts the process of removing task references in ublk_drv. This is
needed if we want to relax restrictions in the driver like letting
only one thread serve each queue
There is also the disadvantage that the character file release callback
can also be triggered by intentional close of the file, which is a
significant behavior change. Preexisting ublk servers (libublksrv) are
dependent on the ability to open/close the file multiple times. To
address this, only transition to a nosrv state if the file is released
while the ublk device is live. This allows for programs to open/close
the file multiple times during setup. It is still a behavior change if a
ublk server decides to close/reopen the file while the device is LIVE
(i.e. while it is responsible for serving I/O), but that would be highly
unusual. This behavior is in line with what is done by FUSE, which is
very similar to ublk in that a userspace daemon is providing services
traditionally provided by the kernel.
With this change in, the new test (and all other selftests, and all
ublksrv tests) pass:
selftests: ublk: test_generic_04.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.0376731 s, 0.0 kB/s
DEAD
generic_04 : [PASS]
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
---
Changes in v3:
- Quiesce queue earlier to avoid concurrent cancellation and "normal"
completion of io_uring cmds (Ming Lei)
- Fix del_gendisk hang, found by test_stress_02
- Remove unnecessary parameters in fault_inject target (Ming Lei)
- Fix delay implementation to have separate per-I/O delay instead of
blocking the whole thread (Ming Lei)
- Add delay_us to docs
- Link to v2: https://lore.kernel.org/r/20250402-ublk_timeout-v2-1-249bc5523000@purestora…
Changes in v2:
- Leave null ublk selftests target untouched, instead create new
fault_inject target for injecting per-I/O delay (Ming Lei)
- Allow multiple open/close of ublk character device with some
restrictions
- Drop patches which made it in separately at https://lore.kernel.org/r/20250401-ublk_selftests-v1-1-98129c9bc8bb@puresto…
- Consolidate more nosrv logic in ublk character device release, and
associated code cleanup
- Link to v1: https://lore.kernel.org/r/20250325-ublk_timeout-v1-0-262f0121a7bd@purestora…
---
drivers/block/ublk_drv.c | 228 +++++++++---------------
tools/testing/selftests/ublk/Makefile | 4 +-
tools/testing/selftests/ublk/fault_inject.c | 72 ++++++++
tools/testing/selftests/ublk/kublk.c | 6 +-
tools/testing/selftests/ublk/kublk.h | 4 +
tools/testing/selftests/ublk/test_generic_04.sh | 43 +++++
6 files changed, 215 insertions(+), 142 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 2fd05c1bd30b03343cb6f357f8c08dd92ff47af9..73baa9d22ccafb00723defa755a0b3aab7238934 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -162,7 +162,6 @@ struct ublk_queue {
bool force_abort;
bool timeout;
- bool canceling;
bool fail_io; /* copy of dev->state == UBLK_S_DEV_FAIL_IO */
unsigned short nr_io_ready; /* how many ios setup */
spinlock_t cancel_lock;
@@ -199,8 +198,6 @@ struct ublk_device {
struct completion completion;
unsigned int nr_queues_ready;
unsigned int nr_privileged_daemon;
-
- struct work_struct nosrv_work;
};
/* header of ublk_params */
@@ -209,8 +206,9 @@ struct ublk_params_header {
__u32 types;
};
-static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq);
-
+static void ublk_stop_dev_unlocked(struct ublk_device *ub);
+static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq);
+static void __ublk_quiesce_dev(struct ublk_device *ub);
static inline struct request *__ublk_check_and_get_req(struct ublk_device *ub,
struct ublk_queue *ubq, int tag, size_t offset);
static inline unsigned int ublk_req_build_flags(struct request *req);
@@ -1314,8 +1312,6 @@ static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
static enum blk_eh_timer_return ublk_timeout(struct request *rq)
{
struct ublk_queue *ubq = rq->mq_hctx->driver_data;
- unsigned int nr_inflight = 0;
- int i;
if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
if (!ubq->timeout) {
@@ -1326,26 +1322,6 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
return BLK_EH_DONE;
}
- if (!ubq_daemon_is_dying(ubq))
- return BLK_EH_RESET_TIMER;
-
- for (i = 0; i < ubq->q_depth; i++) {
- struct ublk_io *io = &ubq->ios[i];
-
- if (!(io->flags & UBLK_IO_FLAG_ACTIVE))
- nr_inflight++;
- }
-
- /* cancelable uring_cmd can't help us if all commands are in-flight */
- if (nr_inflight == ubq->q_depth) {
- struct ublk_device *ub = ubq->dev;
-
- if (ublk_abort_requests(ub, ubq)) {
- schedule_work(&ub->nosrv_work);
- }
- return BLK_EH_DONE;
- }
-
return BLK_EH_RESET_TIMER;
}
@@ -1356,19 +1332,16 @@ static blk_status_t ublk_prep_req(struct ublk_queue *ubq, struct request *rq)
if (unlikely(ubq->fail_io))
return BLK_STS_TARGET;
- /* With recovery feature enabled, force_abort is set in
- * ublk_stop_dev() before calling del_gendisk(). We have to
- * abort all requeued and new rqs here to let del_gendisk()
- * move on. Besides, we cannot not call io_uring_cmd_complete_in_task()
- * to avoid UAF on io_uring ctx.
+ /*
+ * force_abort is set in ublk_stop_dev() before calling
+ * del_gendisk(). We have to abort all requeued and new rqs here
+ * to let del_gendisk() move on. Besides, we cannot not call
+ * io_uring_cmd_complete_in_task() to avoid UAF on io_uring ctx.
*
* Note: force_abort is guaranteed to be seen because it is set
* before request queue is unqiuesced.
*/
- if (ublk_nosrv_should_queue_io(ubq) && unlikely(ubq->force_abort))
- return BLK_STS_IOERR;
-
- if (unlikely(ubq->canceling))
+ if (unlikely(ubq->force_abort))
return BLK_STS_IOERR;
/* fill iod to slot in io cmd buffer */
@@ -1391,16 +1364,6 @@ static blk_status_t ublk_queue_rq(struct blk_mq_hw_ctx *hctx,
if (res != BLK_STS_OK)
return res;
- /*
- * ->canceling has to be handled after ->force_abort and ->fail_io
- * is dealt with, otherwise this request may not be failed in case
- * of recovery, and cause hang when deleting disk
- */
- if (unlikely(ubq->canceling)) {
- __ublk_abort_rq(ubq, rq);
- return BLK_STS_OK;
- }
-
ublk_queue_cmd(ubq, rq);
return BLK_STS_OK;
}
@@ -1461,8 +1424,71 @@ static int ublk_ch_open(struct inode *inode, struct file *filp)
static int ublk_ch_release(struct inode *inode, struct file *filp)
{
struct ublk_device *ub = filp->private_data;
+ int i;
+
+ mutex_lock(&ub->mutex);
+ /*
+ * If the device is not live, we will not transition to a nosrv
+ * state. This protects against:
+ * - accidental poking of the ublk character device
+ * - some ublk servers which may open/close the ublk character
+ * device during startup
+ */
+ if (ub->dev_info.state != UBLK_S_DEV_LIVE)
+ goto out;
+
+ /*
+ * Since we are releasing the ublk character file descriptor, we
+ * know that there cannot be any concurrent file-related
+ * activity (e.g. uring_cmds or reads/writes). However, I/O
+ * might still be getting dispatched. Quiesce that too so that
+ * we don't need to worry about anything concurrent.
+ *
+ * We may have already quiesced the queue if we canceled any
+ * uring_cmds, so only quiesce if necessary (quiesce is not
+ * idempotent, it has an internal counter which we need to
+ * manage carefully).
+ */
+ if (!blk_queue_quiesced(ub->ub_disk->queue))
+ blk_mq_quiesce_queue(ub->ub_disk->queue);
+
+ /*
+ * Handle any requests outstanding to the ublk server
+ */
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_abort_queue(ub, ublk_get_queue(ub, i));
+ /*
+ * Transition the device to the nosrv state. What exactly this
+ * means depends on the recovery flags
+ */
+ if (ublk_nosrv_should_stop_dev(ub)) {
+ /*
+ * Allow any pending/future I/O to pass through quickly
+ * with an error. This is needed because del_gendisk
+ * waits for all pending I/O to complete
+ */
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_get_queue(ub, i)->force_abort = true;
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+
+ ublk_stop_dev_unlocked(ub);
+ } else {
+ if (ublk_nosrv_dev_should_queue_io(ub)) {
+ __ublk_quiesce_dev(ub);
+ } else {
+ ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_get_queue(ub, i)->fail_io = true;
+ }
+
+ /* pair with earlier quiesce */
+ blk_mq_unquiesce_queue(ub->ub_disk->queue);
+ }
+
+out:
clear_bit(UB_STATE_OPEN, &ub->state);
+ mutex_unlock(&ub->mutex);
return 0;
}
@@ -1556,57 +1582,6 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
}
}
-/* Must be called when queue is frozen */
-static bool ublk_mark_queue_canceling(struct ublk_queue *ubq)
-{
- bool canceled;
-
- spin_lock(&ubq->cancel_lock);
- canceled = ubq->canceling;
- if (!canceled)
- ubq->canceling = true;
- spin_unlock(&ubq->cancel_lock);
-
- return canceled;
-}
-
-static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
-{
- bool was_canceled = ubq->canceling;
- struct gendisk *disk;
-
- if (was_canceled)
- return false;
-
- spin_lock(&ub->lock);
- disk = ub->ub_disk;
- if (disk)
- get_device(disk_to_dev(disk));
- spin_unlock(&ub->lock);
-
- /* Our disk has been dead */
- if (!disk)
- return false;
-
- /*
- * Now we are serialized with ublk_queue_rq()
- *
- * Make sure that ubq->canceling is set when queue is frozen,
- * because ublk_queue_rq() has to rely on this flag for avoiding to
- * touch completed uring_cmd
- */
- blk_mq_quiesce_queue(disk->queue);
- was_canceled = ublk_mark_queue_canceling(ubq);
- if (!was_canceled) {
- /* abort queue is for making forward progress */
- ublk_abort_queue(ub, ubq);
- }
- blk_mq_unquiesce_queue(disk->queue);
- put_device(disk_to_dev(disk));
-
- return !was_canceled;
-}
-
static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
unsigned int issue_flags)
{
@@ -1634,9 +1609,8 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
{
struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
struct ublk_queue *ubq = pdu->ubq;
+ struct ublk_device *ub = ubq->dev;
struct task_struct *task;
- struct ublk_device *ub;
- bool need_schedule;
struct ublk_io *io;
if (WARN_ON_ONCE(!ubq))
@@ -1649,16 +1623,20 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
if (WARN_ON_ONCE(task && task != ubq->ubq_daemon))
return;
- ub = ubq->dev;
- need_schedule = ublk_abort_requests(ub, ubq);
+ /*
+ * We could be the first to notice that the ublk server is dying
+ * here. If we are, quiesce the queue to eliminate concurrent
+ * "normal" io_uring cmd completions in the I/O submission path.
+ */
+ mutex_lock(&ub->mutex);
+ if (ub->dev_info.state == UBLK_S_DEV_LIVE &&
+ !blk_queue_quiesced(ub->ub_disk->queue))
+ blk_mq_quiesce_queue(ub->ub_disk->queue);
+ mutex_unlock(&ub->mutex);
io = &ubq->ios[pdu->tag];
WARN_ON_ONCE(io->cmd != cmd);
ublk_cancel_cmd(ubq, io, issue_flags);
-
- if (need_schedule) {
- schedule_work(&ub->nosrv_work);
- }
}
static inline bool ublk_queue_ready(struct ublk_queue *ubq)
@@ -1756,13 +1734,13 @@ static struct gendisk *ublk_detach_disk(struct ublk_device *ub)
return disk;
}
-static void ublk_stop_dev(struct ublk_device *ub)
+static void ublk_stop_dev_unlocked(struct ublk_device *ub)
+ __must_hold(&ub->mutex)
{
struct gendisk *disk;
- mutex_lock(&ub->mutex);
if (ub->dev_info.state == UBLK_S_DEV_DEAD)
- goto unlock;
+ return;
if (ublk_nosrv_dev_should_queue_io(ub)) {
if (ub->dev_info.state == UBLK_S_DEV_LIVE)
__ublk_quiesce_dev(ub);
@@ -1771,38 +1749,12 @@ static void ublk_stop_dev(struct ublk_device *ub)
del_gendisk(ub->ub_disk);
disk = ublk_detach_disk(ub);
put_disk(disk);
- unlock:
- mutex_unlock(&ub->mutex);
- ublk_cancel_dev(ub);
}
-static void ublk_nosrv_work(struct work_struct *work)
+static void ublk_stop_dev(struct ublk_device *ub)
{
- struct ublk_device *ub =
- container_of(work, struct ublk_device, nosrv_work);
- int i;
-
- if (ublk_nosrv_should_stop_dev(ub)) {
- ublk_stop_dev(ub);
- return;
- }
-
mutex_lock(&ub->mutex);
- if (ub->dev_info.state != UBLK_S_DEV_LIVE)
- goto unlock;
-
- if (ublk_nosrv_dev_should_queue_io(ub)) {
- __ublk_quiesce_dev(ub);
- } else {
- blk_mq_quiesce_queue(ub->ub_disk->queue);
- ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
- for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
- ublk_get_queue(ub, i)->fail_io = true;
- }
- blk_mq_unquiesce_queue(ub->ub_disk->queue);
- }
-
- unlock:
+ ublk_stop_dev_unlocked(ub);
mutex_unlock(&ub->mutex);
ublk_cancel_dev(ub);
}
@@ -2388,7 +2340,6 @@ static void ublk_remove(struct ublk_device *ub)
bool unprivileged;
ublk_stop_dev(ub);
- cancel_work_sync(&ub->nosrv_work);
cdev_device_del(&ub->cdev, &ub->cdev_dev);
unprivileged = ub->dev_info.flags & UBLK_F_UNPRIVILEGED_DEV;
ublk_put_device(ub);
@@ -2675,7 +2626,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
goto out_unlock;
mutex_init(&ub->mutex);
spin_lock_init(&ub->lock);
- INIT_WORK(&ub->nosrv_work, ublk_nosrv_work);
ret = ublk_alloc_dev_number(ub, header->dev_id);
if (ret < 0)
@@ -2807,7 +2757,6 @@ static inline void ublk_ctrl_cmd_dump(struct io_uring_cmd *cmd)
static int ublk_ctrl_stop_dev(struct ublk_device *ub)
{
ublk_stop_dev(ub);
- cancel_work_sync(&ub->nosrv_work);
return 0;
}
@@ -2927,7 +2876,6 @@ static void ublk_queue_reinit(struct ublk_device *ub, struct ublk_queue *ubq)
/* We have to reset it to NULL, otherwise ub won't accept new FETCH_REQ */
ubq->ubq_daemon = NULL;
ubq->timeout = false;
- ubq->canceling = false;
for (i = 0; i < ubq->q_depth; i++) {
struct ublk_io *io = &ubq->ios[i];
diff --git a/tools/testing/selftests/ublk/Makefile b/tools/testing/selftests/ublk/Makefile
index c7781efea0f33c02f340f90f547d3a37c1d1b8a0..afee027cccdd1b8f13f1cb9a90a3348cd54b18bc 100644
--- a/tools/testing/selftests/ublk/Makefile
+++ b/tools/testing/selftests/ublk/Makefile
@@ -6,6 +6,7 @@ LDLIBS += -lpthread -lm -luring
TEST_PROGS := test_generic_01.sh
TEST_PROGS += test_generic_02.sh
TEST_PROGS += test_generic_03.sh
+TEST_PROGS += test_generic_04.sh
TEST_PROGS += test_null_01.sh
TEST_PROGS += test_null_02.sh
@@ -26,7 +27,8 @@ TEST_GEN_PROGS_EXTENDED = kublk
include ../lib.mk
-$(TEST_GEN_PROGS_EXTENDED): kublk.c null.c file_backed.c common.c stripe.c
+$(TEST_GEN_PROGS_EXTENDED): kublk.c null.c file_backed.c common.c stripe.c \
+ fault_inject.c
check:
shellcheck -x -f gcc *.sh
diff --git a/tools/testing/selftests/ublk/fault_inject.c b/tools/testing/selftests/ublk/fault_inject.c
new file mode 100644
index 0000000000000000000000000000000000000000..3a8574e6a73767b1f9d0d81c62c7dbf28d2445d0
--- /dev/null
+++ b/tools/testing/selftests/ublk/fault_inject.c
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Fault injection ublk target. Hack this up however you like for
+ * testing specific behaviors of ublk_drv. Currently is a null target
+ * with a configurable delay before completing each I/O. This delay can
+ * be used to test ublk_drv's handling of I/O outstanding to the ublk
+ * server when it dies.
+ */
+
+#include "kublk.h"
+
+static int ublk_fault_inject_tgt_init(const struct dev_ctx *ctx,
+ struct ublk_dev *dev)
+{
+ const struct ublksrv_ctrl_dev_info *info = &dev->dev_info;
+ unsigned long dev_size = 250UL << 30;
+
+ dev->tgt.dev_size = dev_size;
+ dev->tgt.params = (struct ublk_params) {
+ .types = UBLK_PARAM_TYPE_BASIC,
+ .basic = {
+ .logical_bs_shift = 9,
+ .physical_bs_shift = 12,
+ .io_opt_shift = 12,
+ .io_min_shift = 9,
+ .max_sectors = info->max_io_buf_bytes >> 9,
+ .dev_sectors = dev_size >> 9,
+ },
+ };
+
+ dev->private_data = (void *)(ctx->delay_us * 1000);
+ return 0;
+}
+
+static int ublk_fault_inject_queue_io(struct ublk_queue *q, int tag)
+{
+ const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+ struct io_uring_sqe *sqe;
+ struct __kernel_timespec ts = {
+ .tv_nsec = (long long)q->dev->private_data,
+ };
+
+ ublk_queue_alloc_sqes(q, &sqe, 1);
+ io_uring_prep_timeout(sqe, &ts, 1, 0);
+ sqe->user_data = build_user_data(tag, ublksrv_get_op(iod), 0, 1);
+
+ ublk_queued_tgt_io(q, tag, 1);
+
+ return 0;
+}
+
+static void ublk_fault_inject_tgt_io_done(struct ublk_queue *q, int tag,
+ const struct io_uring_cqe *cqe)
+{
+ const struct ublksrv_io_desc *iod = ublk_get_iod(q, tag);
+
+ if (cqe->res != -ETIME)
+ ublk_err("%s: unexpected cqe res %d\n", __func__, cqe->res);
+
+ if (ublk_completed_tgt_io(q, tag))
+ ublk_complete_io(q, tag, iod->nr_sectors << 9);
+ else
+ ublk_err("%s: io not complete after 1 cqe\n", __func__);
+}
+
+const struct ublk_tgt_ops fault_inject_tgt_ops = {
+ .name = "fault_inject",
+ .init_tgt = ublk_fault_inject_tgt_init,
+ .queue_io = ublk_fault_inject_queue_io,
+ .tgt_io_done = ublk_fault_inject_tgt_io_done,
+};
diff --git a/tools/testing/selftests/ublk/kublk.c b/tools/testing/selftests/ublk/kublk.c
index 91c282bc767449a418cce7fc816dc8e9fc732d6a..b741d91b2288b19d450ad22a045b014da18c3f8d 100644
--- a/tools/testing/selftests/ublk/kublk.c
+++ b/tools/testing/selftests/ublk/kublk.c
@@ -10,6 +10,7 @@ static const struct ublk_tgt_ops *tgt_ops_list[] = {
&null_tgt_ops,
&loop_tgt_ops,
&stripe_tgt_ops,
+ &fault_inject_tgt_ops,
};
static const struct ublk_tgt_ops *ublk_find_tgt(const char *name)
@@ -1041,7 +1042,7 @@ static int cmd_dev_get_features(void)
static int cmd_dev_help(char *exe)
{
- printf("%s add -t [null|loop] [-q nr_queues] [-d depth] [-n dev_id] [backfile1] [backfile2] ...\n", exe);
+ printf("%s add -t [null|loop|stripe|fault_inject] [-q nr_queues] [-d depth] [-n dev_id] [--delay_us delay] [backfile1] [backfile2] ...\n", exe);
printf("\t default: nr_queues=2(max 4), depth=128(max 128), dev_id=-1(auto allocation)\n");
printf("%s del [-n dev_id] -a \n", exe);
printf("\t -a delete all devices -n delete specified device\n");
@@ -1064,6 +1065,7 @@ int main(int argc, char *argv[])
{ "zero_copy", 0, NULL, 'z' },
{ "foreground", 0, NULL, 0 },
{ "chunk_size", 1, NULL, 0 },
+ { "delay_us", 1, NULL, 0 },
{ 0, 0, 0, 0 }
};
int option_idx, opt;
@@ -1112,6 +1114,8 @@ int main(int argc, char *argv[])
ctx.fg = 1;
if (!strcmp(longopts[option_idx].name, "chunk_size"))
ctx.chunk_size = strtol(optarg, NULL, 10);
+ if (!strcmp(longopts[option_idx].name, "delay_us"))
+ ctx.delay_us = strtoll(optarg, NULL, 10);
}
}
diff --git a/tools/testing/selftests/ublk/kublk.h b/tools/testing/selftests/ublk/kublk.h
index 760ff8ffb8107037a19a8fb7ab408818845e010d..a1a8a802fb43f0fe9272f33c8a3161e9316a5507 100644
--- a/tools/testing/selftests/ublk/kublk.h
+++ b/tools/testing/selftests/ublk/kublk.h
@@ -70,6 +70,9 @@ struct dev_ctx {
/* stripe */
unsigned int chunk_size;
+ /* fault_inject */
+ long long delay_us;
+
int _evtfd;
};
@@ -357,6 +360,7 @@ static inline int ublk_queue_use_zc(const struct ublk_queue *q)
extern const struct ublk_tgt_ops null_tgt_ops;
extern const struct ublk_tgt_ops loop_tgt_ops;
extern const struct ublk_tgt_ops stripe_tgt_ops;
+extern const struct ublk_tgt_ops fault_inject_tgt_ops;
void backing_file_tgt_deinit(struct ublk_dev *dev);
int backing_file_tgt_init(struct ublk_dev *dev);
diff --git a/tools/testing/selftests/ublk/test_generic_04.sh b/tools/testing/selftests/ublk/test_generic_04.sh
new file mode 100755
index 0000000000000000000000000000000000000000..48af48164aa444d8ac6a58fef1743d2a16a56a14
--- /dev/null
+++ b/tools/testing/selftests/ublk/test_generic_04.sh
@@ -0,0 +1,43 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+. "$(cd "$(dirname "$0")" && pwd)"/test_common.sh
+
+TID="generic_04"
+ERR_CODE=0
+
+_prep_test "fault_inject" "fast cleanup when all I/Os of one hctx are in server"
+
+# configure ublk server to sleep 2s before completing each I/O
+dev_id=$(_add_ublk_dev -t fault_inject -q 2 -d 1 --delay_us 2000000)
+_check_add_dev $TID $?
+
+echo "dev id is ${dev_id}"
+
+STARTTIME=${SECONDS}
+
+dd if=/dev/urandom of=/dev/ublkb${dev_id} oflag=direct bs=4k count=1 &
+dd_pid=$!
+
+__ublk_kill_daemon ${dev_id} "DEAD"
+
+wait $dd_pid
+dd_exitcode=$?
+
+ENDTIME=${SECONDS}
+ELAPSED=$(($ENDTIME - $STARTTIME))
+
+# assert that dd sees an error and exits quickly after ublk server is
+# killed. previously this relied on seeing an I/O timeout and so would
+# take ~30s
+if [ $dd_exitcode -eq 0 ]; then
+ echo "dd unexpectedly exited successfully!"
+ ERR_CODE=255
+fi
+if [ $ELAPSED -ge 5 ]; then
+ echo "dd took $ELAPSED seconds to exit (>= 5s tolerance)!"
+ ERR_CODE=255
+fi
+
+_cleanup_test "fault_inject"
+_show_result $TID $ERR_CODE
---
base-commit: 710e2c687a16b28a873a282517a85faf02a8b7cc
change-id: 20250325-ublk_timeout-b06b9b51c591
Best regards,
--
Uday Shankar <ushankar(a)purestorage.com>
A hds-thresh value is not set correctly if input value is 0.
The cause is that ethtool_ringparam_get_cfg(), which is a internal
function that returns ringparameters from both ->get_ringparam() and
dev->cfg can't return a correct hds-thresh value.
The first patch fixes ethtool_ringparam_get_cfg() to set hds-thresh
value correcltly.
The second patch adds random test for hds-thresh value.
So that we can test 0 value for a hds-thresh properly.
v2:
- Skips set_hds_thresh_random test when hds-thresh-max value is too
small. (2/2)
- Change random range from 1-MAX to 1-(MAX-1). (2/2)
Taehee Yoo (2):
net: ethtool: fix ethtool_ringparam_get_cfg() returns a hds_thresh
value always as 0.
selftests: drv-net: test random value for hds-thresh
net/ethtool/common.c | 1 +
tools/testing/selftests/drivers/net/hds.py | 33 +++++++++++++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)
--
2.34.1
Bingbu reported an issue in [1] that udmabuf vmap failed and in [2], we
discussed the scenario of folio vmap due to the misuse of vmap_pfn
in udmabuf.
We reached the conclusion that vmap_pfn prohibits the use of page-based
PFNs:
Christoph Hellwig : 'No, vmap_pfn is entirely for memory not backed by
pages or folios, i.e. PCIe BARs and similar memory. This must not be
mixed with proper folio backed memory.'
But udmabuf still need consider HVO based folio's vmap, and need fix
vmap issue. This RFC code want to show the two point that I mentioned
in [2], and more deep talk it:
Point1. simple copy vmap_pfn code, don't bother common vmap_pfn, use by
itself and remove pfn_valid check.
Point2. implement folio array based vmap(vmap_folios), which can given a
range of each folio(offset, nr_pages), so can suit HVO folio's vmap.
Patch 1-2 implement point1, and add a test simple set in udmabuf driver.
Patch 3-5 implement point2, also can test it.
Kasireddy also show that 'another option is to just limit udmabuf's vmap()
to only shmem folios'(This I guess folio_test_hugetlb_vmemmap_optimized
can help.)
But I prefer point2 to solution this issue, and IMO, folio based vmap still
need.
Compare to page based vmap(or pfn based), we need split each large folio
into single page struct, this need more large array struct and more longer
iter. If each tail page struct not exist(like HVO), can only use pfn vmap,
but there are no common api to do this.
In [2], we talked that udmabuf can use hugetlb as the memory
provider, and can give a range use. So if HVO used in hugetlb, each folio's
tail page may freed, so we can't use page based vmap, only can use pfn
based, which show in point1.
Further more, Folio based vmap only need record each folio(and offset,
nr_pages if range need). For 20MB vmap, page based need 5120 pages(40KB),
2MB folios only need 10 folio(80Byte).
Matthew show that Vishal also offered a folio based vmap - vmap_file[3].
This RFC patch want a range based folio, not only a full folio's map(like
file's folio), to resolve some problem like HVO's range folio vmap.
Please give me more suggestion.
Test case:
//enable/disable HVO
1. echo [1|0] > /proc/sys/vm/hugetlb_optimize_vmemmap
//prepare HUGETLB
2. echo 10 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
3. ./udmabuf_vmap
4. check output, and dmesg if any warn.
[1] https://lore.kernel.org/all/9172a601-c360-0d5b-ba1b-33deba430455@linux.inte…
[2] https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/
[3] https://lore.kernel.org/linux-mm/20250131001806.92349-1-vishal.moola@gmail.…
Huan Yang (6):
udmabuf: try fix udmabuf vmap
udmabuf: try udmabuf vmap test
mm/vmalloc: try add vmap folios range
udmabuf: use vmap_range_folios
udmabuf: vmap test suit for pages and pfns compare
udmabuf: remove no need code
drivers/dma-buf/udmabuf.c | 29 +++++++++-----------
include/linux/vmalloc.h | 57 +++++++++++++++++++++++++++++++++++++++
mm/vmalloc.c | 47 ++++++++++++++++++++++++++++++++
3 files changed, 117 insertions(+), 16 deletions(-)
--
2.48.1
Hi!
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
https://netdev-3.bots.linux.dev/kunit/results/60103/stdout
It seems like new cases continue to appear and we have to keep adding
them to the local ignored list. Is it possible to get these fixed or
can we exclude the cirrus tests from KUNIT_ALL_TESTS?
If no explicit XARCH is specified, use the toolchains default.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: drop dependency from sysroot to defconfig
selftests/nolibc: only consider XARCH for CFLAGS when requested
tools/testing/selftests/nolibc/Makefile | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 9a9b20007ab833c1aa3791efcfdf67e7e3ea8902
change-id: 20250330-nolibc-nolibc-test-native-6d4d84d764eb
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that two of its test child cgroups which
have a memmory.low of 0 or an effective memory.low of 0 still have low
events generated for them since mem_cgroup_below_low() use the ">="
operator when comparing to elow.
The two failed use cases are as follows:
1) memory.low is set to 0, but low events can still be triggered and
so the cgroup may have a non-zero low event count. I doubt users are
looking for that as they didn't set memory.low at all.
2) memory.low is set to a non-zero value but the cgroup has no task in
it so that it has an effective low value of 0. Again it may have a
non-zero low event count if memory reclaim happens. This is probably
not a result expected by the users and it is really doubtful that
users will check an empty cgroup with no task in it and expecting
some non-zero event counts.
The simple and naive fix of changing the operator to ">", however,
changes the memory reclaim behavior which can lead to other failures
as low events are needed to facilitate memory reclaim. So we can't do
that without some relatively riskier changes in memory reclaim.
Another simpler alternative is to avoid reporting below_low failure
if either memory.low or its effective equivalent is 0 which is done
by this patch specifically for the two failed use cases above.
With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases. Though both test_memcg_low
and test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.
To be consistent, similar change is appled to mem_cgroup_below_min()
as to avoid the two failed use cases above with low replaced by min.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
include/linux/memcontrol.h | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..4d4a1f159eaa 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -601,21 +601,31 @@ static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
struct mem_cgroup *memcg)
{
+ unsigned long elow;
+
if (mem_cgroup_unprotected(target, memcg))
return false;
- return READ_ONCE(memcg->memory.elow) >=
- page_counter_read(&memcg->memory);
+ elow = READ_ONCE(memcg->memory.elow);
+ if (!elow || !READ_ONCE(memcg->memory.low))
+ return false;
+
+ return page_counter_read(&memcg->memory) <= elow;
}
static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
struct mem_cgroup *memcg)
{
+ unsigned long emin;
+
if (mem_cgroup_unprotected(target, memcg))
return false;
- return READ_ONCE(memcg->memory.emin) >=
- page_counter_read(&memcg->memory);
+ emin = READ_ONCE(memcg->memory.emin);
+ if (!emin || !READ_ONCE(memcg->memory.min))
+ return false;
+
+ return page_counter_read(&memcg->memory) <= emin;
}
int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
--
2.48.1
┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v16:
- remove arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
because there are better patches, which under review.
- Add document for pcie-ep msi-map usage
- other change to see each patch's change log
About IMMUTABLE (No change for this part, tglx provide feedback)
> - This IMMUTABLE thing serves no purpose, because you don't randomly
> plug this end-point block on any MSI controller. They come as part
> of an SoC.
"Yes and no. The problem is that the EP implementation is meant to be a
generic library and while GIC-ITS guarantees immutability of the
address/data pair after setup, there are architectures (x86, loongson,
riscv) where the base MSI controller does not and immutability is only
achieved when interrupt remapping is enabled. The latter can be disabled
at boot-time and then the EP implementation becomes a lottery across
affinity changes.
That was my concern about this library implementation and that's why I
asked for a mechanism to ensure that the underlying irqdomain provides a
immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters.
Thanks,
tglx
"
So it does not matter for GIC-ITS, but in the larger picture it matters.
- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com
Changes in v15:
- rebase to v6.14-rc1
- fix build issue find by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (15):
platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
dt-bindings: pci: pci-msi: Add support for PCI Endpoint msi-map
irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
PCI: endpoint: Set ID and of_node for function driver
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
pci: imx6: Add helper function imx_pcie_add_lut_by_rid()
pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode
arm64: dts: imx95: Add msi-map for pci-ep device
Documentation/devicetree/bindings/pci/pci-msi.txt | 51 ++++++++
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/base/platform-msi.c | 1 +
drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/misc/pci_endpoint_test.c | 82 ++++++++++++
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Makefile | 1 +
drivers/pci/endpoint/functions/pci-epf-test.c | 142 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 90 +++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 48 +++++++
include/linux/irqdomain.h | 7 +
include/linux/pci-ep-msi.h | 28 ++++
include/linux/pci-epf.h | 21 +++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 28 ++++
16 files changed, 527 insertions(+), 9 deletions(-)
---
base-commit: a4949bd40778aa9beac77c89e4c6a1da52875c8b
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
---
Frank Li <Frank.Li(a)nxp.com>
Currently if the filesystem for the cgroups version it wants to use is
not mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh
tests will attempt to mount it on the hard coded path
/dev/cgroup/memory, deleting that directory when the test finishes. This
will fail if there is not a preexisting directory at that path, and
since the directory is deleted subsequent runs of the test will fail.
Instead of relying on this hard coded directory name use mktemp to
generate a temporary directory to use as a mountpoint, fixing both the
assumption and the disruption caused by deleting a preexisting
directory.
This means that if the relevant cgroup filesystem is not already mounted
then we rely on having coreutils (which provides mktemp) installed. I
suspect that many current users are relying on having things automounted
by default, and given that the script relies on bash it's probably not
an unreasonable requirement.
Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 4 ++--
tools/testing/selftests/mm/hugetlb_reparenting_test.sh | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index 67df7b47087f..e1fe16bcbbe8 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -29,7 +29,7 @@ fi
if [[ $cgroup2 ]]; then
cgroup_path=$(mount -t cgroup2 | head -1 | awk '{print $3}')
if [[ -z "$cgroup_path" ]]; then
- cgroup_path=/dev/cgroup/memory
+ cgroup_path=$(mktemp -d)
mount -t cgroup2 none $cgroup_path
do_umount=1
fi
@@ -37,7 +37,7 @@ if [[ $cgroup2 ]]; then
else
cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk '{print $3}')
if [[ -z "$cgroup_path" ]]; then
- cgroup_path=/dev/cgroup/memory
+ cgroup_path=$(mktemp -d)
mount -t cgroup memory,hugetlb $cgroup_path
do_umount=1
fi
diff --git a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
index 11f9bbe7dc22..0b0d4ba1af27 100755
--- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
@@ -23,7 +23,7 @@ fi
if [[ $cgroup2 ]]; then
CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk '{print $3}')
if [[ -z "$CGROUP_ROOT" ]]; then
- CGROUP_ROOT=/dev/cgroup/memory
+ CGROUP_ROOT=$(mktemp -d)
mount -t cgroup2 none $CGROUP_ROOT
do_umount=1
fi
---
base-commit: a4cda136f021ad44b8b52286aafd613030a6db5f
change-id: 20250403-kselftest-mm-cgroup2-detection-b761fd232f9d
Best regards,
--
Mark Brown <broonie(a)kernel.org>