The patch titled
Subject: mm/damon/core: initialize damo_filter->list from damos_new_filter()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-core-initialize-damo_filter-list-from-damos_new_filter.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/core: initialize damo_filter->list from damos_new_filter()
Date: Sat, 29 Jul 2023 20:37:32 +0000
damos_new_filter() is not initializing the list field of newly allocated
filter object. However, DAMON sysfs interface and DAMON_RECLAIM are not
initializing it after calling damos_new_filter(). As a result, accessing
uninitialized memory is possible. Actually, adding multiple DAMOS filters
via DAMON sysfs interface caused NULL pointer dereferencing. Initialize
the field just after the allocation from damos_new_filter().
Link: https://lkml.kernel.org/r/20230729203733.38949-2-sj@kernel.org
Fixes: 98def236f63c ("mm/damon/core: implement damos filter")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/core.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/damon/core.c~mm-damon-core-initialize-damo_filter-list-from-damos_new_filter
+++ a/mm/damon/core.c
@@ -273,6 +273,7 @@ struct damos_filter *damos_new_filter(en
return NULL;
filter->type = type;
filter->matching = matching;
+ INIT_LIST_HEAD(&filter->list);
return filter;
}
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-core-initialize-damo_filter-list-from-damos_new_filter.patch
mm-damon-core-test-add-a-test-for-damos_new_filter.patch
[ Upstream commit 7dae593cd226a0bca61201cf85ceb9335cf63682 ]
In a couple of situations like
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
return -ENOSPC;
the error is not actually "No space left on device", but "Out of memory".
It is semantically correct to return -ENOMEM in all failed kstrndup()
and kzalloc() cases in this driver, as it is not a problem with disk
space, but with kernel memory allocator failing allocation.
The semantically correct should be:
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
return -ENOMEM;
Cc: Dan Carpenter <error27(a)gmail.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Brian Norris <computersforpeace(a)gmail.com>
Cc: stable(a)vger.kernel.org # 4.19, 4.14
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Fixes: 0a8adf584759c ("test: add firmware_class loader test")
Fixes: eb910947c82f9 ("test: firmware_class: add asynchronous request trigger")
Fixes: 061132d2b9c95 ("test_firmware: add test custom fallback trigger")
Link: https://lore.kernel.org/all/20230606070808.9300-1-mirsad.todorovac@alu.uniz…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is the backport of the patch to 4.19 and 4.14 branches. There are no ]
[ semantic differences in the commit. Backport is provided for completenes sake ]
[ so it would apply to all of the supported LTS kernels ]
---
v1:
no semantic difference, but there is change in code (one less patched instance of -ENOSPC),
so I am submitting the patch for a review.
lib/test_firmware.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index e4688821eab8..b5e779bcfb34 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -160,7 +160,7 @@ static int __kstrncpy(char **dst, const char *name, size_t count, gfp_t gfp)
{
*dst = kstrndup(name, count, gfp);
if (!*dst)
- return -ENOSPC;
+ return -ENOMEM;
return count;
}
@@ -456,7 +456,7 @@ static ssize_t trigger_request_store(struct device *dev,
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
- return -ENOSPC;
+ return -ENOMEM;
pr_info("loading '%s'\n", name);
@@ -497,7 +497,7 @@ static ssize_t trigger_async_request_store(struct device *dev,
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
- return -ENOSPC;
+ return -ENOMEM;
pr_info("loading '%s'\n", name);
@@ -540,7 +540,7 @@ static ssize_t trigger_custom_fallback_store(struct device *dev,
name = kstrndup(buf, count, GFP_KERNEL);
if (!name)
- return -ENOSPC;
+ return -ENOMEM;
pr_info("loading '%s' using custom fallback mechanism\n", name);
--
2.34.1
pl330_pause() does not set anything to indicate paused condition which
causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250
DMA flush after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix
DMA Rx rearm race"). The function comment for pl330_pause() claims
pause is supported but resume is not which is enough for 8250 DMA flush
to work as long as DMA status reports DMA_PAUSED when appropriate.
Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED
in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the
descriptor is PAUSED.
Reported-by: Richard Tresidder <rtresidd(a)electromag.com.au>
Tested-by: Richard Tresidder <rtresidd(a)electromag.com.au>
Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/linux-serial/f8a86ecd-64b1-573f-c2fa-59f541083f1a@e…
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
---
$ diff -u <(git grep -l -e '\.device_pause' -e '->device_pause') <(git grep -l DMA_PAUSED)
...tells there might a few other drivers which do not properly return
DMA_PAUSED status despite having a pause function.
drivers/dma/pl330.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 0d9257fbdfb0..daad25f2c498 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -403,6 +403,12 @@ enum desc_status {
* of a channel can be BUSY at any time.
*/
BUSY,
+ /*
+ * Pause was called while descriptor was BUSY. Due to hardware
+ * limitations, only termination is possible for descriptors
+ * that have been paused.
+ */
+ PAUSED,
/*
* Sitting on the channel work_list but xfer done
* by PL330 core
@@ -2041,7 +2047,7 @@ static inline void fill_queue(struct dma_pl330_chan *pch)
list_for_each_entry(desc, &pch->work_list, node) {
/* If already submitted */
- if (desc->status == BUSY)
+ if (desc->status == BUSY || desc->status == PAUSED)
continue;
ret = pl330_submit_req(pch->thread, desc);
@@ -2326,6 +2332,7 @@ static int pl330_pause(struct dma_chan *chan)
{
struct dma_pl330_chan *pch = to_pchan(chan);
struct pl330_dmac *pl330 = pch->dmac;
+ struct dma_pl330_desc *desc;
unsigned long flags;
pm_runtime_get_sync(pl330->ddma.dev);
@@ -2335,6 +2342,10 @@ static int pl330_pause(struct dma_chan *chan)
_stop(pch->thread);
spin_unlock(&pl330->lock);
+ list_for_each_entry(desc, &pch->work_list, node) {
+ if (desc->status == BUSY)
+ desc->status = PAUSED;
+ }
spin_unlock_irqrestore(&pch->lock, flags);
pm_runtime_mark_last_busy(pl330->ddma.dev);
pm_runtime_put_autosuspend(pl330->ddma.dev);
@@ -2425,7 +2436,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
else if (running && desc == running)
transferred =
pl330_get_current_xferred_count(pch, desc);
- else if (desc->status == BUSY)
+ else if (desc->status == BUSY || desc->status == PAUSED)
/*
* Busy but not running means either just enqueued,
* or finished and not yet marked done
@@ -2442,6 +2453,9 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
case DONE:
ret = DMA_COMPLETE;
break;
+ case PAUSED:
+ ret = DMA_PAUSED;
+ break;
case PREP:
case BUSY:
ret = DMA_IN_PROGRESS;
--
2.30.2
Some architectures do not populate the entire range categorised by
KCORE_TEXT, so we must ensure that the kernel address we read from is
valid.
Unfortunately there is no solution currently available to do so with a
purely iterator solution so reinstate the bounce buffer in this instance so
we can use copy_from_kernel_nofault() in order to avoid page faults when
regions are unmapped.
This change partly reverts commit 2e1c0170771e ("fs/proc/kcore: avoid
bounce buffer for ktext data"), reinstating the bounce buffer, but adapts
the code to continue to use an iterator.
Fixes: 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
Reported-by: Jiri Olsa <olsajiri(a)gmail.com>
Closes: https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava
Cc: stable(a)vger.kernel.org
Signed-off-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
---
fs/proc/kcore.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 9cb32e1a78a0..3bc689038232 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -309,6 +309,8 @@ static void append_kcore_note(char *notes, size_t *i, const char *name,
static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
{
+ struct file *file = iocb->ki_filp;
+ char *buf = file->private_data;
loff_t *fpos = &iocb->ki_pos;
size_t phdrs_offset, notes_offset, data_offset;
size_t page_offline_frozen = 1;
@@ -554,11 +556,22 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
fallthrough;
case KCORE_VMEMMAP:
case KCORE_TEXT:
+ /*
+ * Sadly we must use a bounce buffer here to be able to
+ * make use of copy_from_kernel_nofault(), as these
+ * memory regions might not always be mapped on all
+ * architectures.
+ */
+ if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
+ if (iov_iter_zero(tsz, iter) != tsz) {
+ ret = -EFAULT;
+ goto out;
+ }
/*
* We use _copy_to_iter() to bypass usermode hardening
* which would otherwise prevent this operation.
*/
- if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
+ } else if (_copy_to_iter(buf, tsz, iter) != tsz) {
ret = -EFAULT;
goto out;
}
@@ -595,6 +608,10 @@ static int open_kcore(struct inode *inode, struct file *filp)
if (ret)
return ret;
+ filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!filp->private_data)
+ return -ENOMEM;
+
if (kcore_need_update)
kcore_update_ram();
if (i_size_read(inode) != proc_root_kcore->size) {
@@ -605,9 +622,16 @@ static int open_kcore(struct inode *inode, struct file *filp)
return 0;
}
+static int release_kcore(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+ return 0;
+}
+
static const struct proc_ops kcore_proc_ops = {
.proc_read_iter = read_kcore_iter,
.proc_open = open_kcore,
+ .proc_release = release_kcore,
.proc_lseek = default_llseek,
};
--
2.41.0
The patch titled
Subject: fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-proc-kcore-reinstate-bounce-buffer-for-kcore_text-regions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lstoakes(a)gmail.com>
Subject: fs/proc/kcore: reinstate bounce buffer for KCORE_TEXT regions
Date: Mon, 31 Jul 2023 22:50:21 +0100
Some architectures do not populate the entire range categorised by
KCORE_TEXT, so we must ensure that the kernel address we read from is
valid.
Unfortunately there is no solution currently available to do so with a
purely iterator solution so reinstate the bounce buffer in this instance
so we can use copy_from_kernel_nofault() in order to avoid page faults
when regions are unmapped.
This change partly reverts commit 2e1c0170771e ("fs/proc/kcore: avoid
bounce buffer for ktext data"), reinstating the bounce buffer, but adapts
the code to continue to use an iterator.
Link: https://lkml.kernel.org/r/20230731215021.70911-1-lstoakes@gmail.com
Fixes: 2e1c0170771e ("fs/proc/kcore: avoid bounce buffer for ktext data")
Signed-off-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
Reported-by: Jiri Olsa <olsajiri(a)gmail.com>
Closes: https://lore.kernel.org/all/ZHc2fm+9daF6cgCE@krava
Tested-by: Jiri Olsa <jolsa(a)kernel.org>
Tested-by: Will Deacon <will(a)kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mike Galbraith <efault(a)gmx.de>
Cc: Thorsten Leemhuis <regressions(a)leemhuis.info>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/kcore.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
--- a/fs/proc/kcore.c~fs-proc-kcore-reinstate-bounce-buffer-for-kcore_text-regions
+++ a/fs/proc/kcore.c
@@ -309,6 +309,8 @@ static void append_kcore_note(char *note
static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
{
+ struct file *file = iocb->ki_filp;
+ char *buf = file->private_data;
loff_t *fpos = &iocb->ki_pos;
size_t phdrs_offset, notes_offset, data_offset;
size_t page_offline_frozen = 1;
@@ -555,10 +557,21 @@ static ssize_t read_kcore_iter(struct ki
case KCORE_VMEMMAP:
case KCORE_TEXT:
/*
+ * Sadly we must use a bounce buffer here to be able to
+ * make use of copy_from_kernel_nofault(), as these
+ * memory regions might not always be mapped on all
+ * architectures.
+ */
+ if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
+ if (iov_iter_zero(tsz, iter) != tsz) {
+ ret = -EFAULT;
+ goto out;
+ }
+ /*
* We use _copy_to_iter() to bypass usermode hardening
* which would otherwise prevent this operation.
*/
- if (_copy_to_iter((char *)start, tsz, iter) != tsz) {
+ } else if (_copy_to_iter(buf, tsz, iter) != tsz) {
ret = -EFAULT;
goto out;
}
@@ -595,6 +608,10 @@ static int open_kcore(struct inode *inod
if (ret)
return ret;
+ filp->private_data = kmalloc(PAGE_SIZE, GFP_KERNEL);
+ if (!filp->private_data)
+ return -ENOMEM;
+
if (kcore_need_update)
kcore_update_ram();
if (i_size_read(inode) != proc_root_kcore->size) {
@@ -605,9 +622,16 @@ static int open_kcore(struct inode *inod
return 0;
}
+static int release_kcore(struct inode *inode, struct file *file)
+{
+ kfree(file->private_data);
+ return 0;
+}
+
static const struct proc_ops kcore_proc_ops = {
.proc_read_iter = read_kcore_iter,
.proc_open = open_kcore,
+ .proc_release = release_kcore,
.proc_lseek = default_llseek,
};
_
Patches currently in -mm which might be from lstoakes(a)gmail.com are
fs-proc-kcore-reinstate-bounce-buffer-for-kcore_text-regions.patch
fs-proc-kcore-reinstate-bounce-buffer-for-kcore_text-regions-fix.patch
[ commit be37bed754ed90b2655382f93f9724b3c1aae847 upstream ]
Dan Carpenter spotted that test_fw_config->reqs will be leaked if
trigger_batched_requests_store() is called two or more times.
The same appears with trigger_batched_requests_async_store().
This bug wasn't triggered by the tests, but observed by Dan's visual
inspection of the code.
The recommended workaround was to return -EBUSY if test_fw_config->reqs
is already allocated.
Fixes: c92316bf8e94 ("test_firmware: add batched firmware tests")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v4.19
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Suggested-by: Takashi Iwai <tiwai(a)suse.de>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Acked-by: Luis Chamberlain <mcgrof(a)kernel.org>
Link: https://lore.kernel.org/r/20230509084746.48259-2-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is a backport to v4.19 stable branch without a change in code from the 5.4+ patch ]
---
v1:
patch sumbmitted verbatim from the 5.4+ branch to 4.19
lib/test_firmware.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index f4cc874021da..e4688821eab8 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -618,6 +618,11 @@ static ssize_t trigger_batched_requests_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs =
vzalloc(array3_size(sizeof(struct test_batched_req),
test_fw_config->num_requests, 2));
@@ -721,6 +726,11 @@ ssize_t trigger_batched_requests_async_store(struct device *dev,
mutex_lock(&test_fw_mutex);
+ if (test_fw_config->reqs) {
+ rc = -EBUSY;
+ goto out_bail;
+ }
+
test_fw_config->reqs =
vzalloc(array3_size(sizeof(struct test_batched_req),
test_fw_config->num_requests, 2));
--
2.34.1
From: Alexander Stein <alexander.stein(a)ew.tq-group.com>
When hactive is not aligned to 8 pixels, it is aligned accordingly and
hfront porch needs to be reduced the same amount. Unfortunately the front
porch is set to the difference rather than reducing it. There are some
Samsung TVs which can't cope with a front porch of instead of 70.
Fixes: 94dfec48fca7 ("drm/imx: Add 8 pixel alignment fix")
Signed-off-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
Reviewed-by: Philipp Zabel <p.zabel(a)pengutronix.de>
Link: https://lore.kernel.org/r/20230515072137.116211-1-alexander.stein@ew.tq-gro…
[p.zabel(a)pengutronix.de: Fixed subject]
Signed-off-by: Philipp Zabel <p.zabel(a)pengutronix.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230515072137.116211-1-alexa…
---
drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c b/drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
index 5f26090b0c98..89585b31b985 100644
--- a/drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
+++ b/drivers/gpu/drm/imx/ipuv3/ipuv3-crtc.c
@@ -310,7 +310,7 @@ static void ipu_crtc_mode_set_nofb(struct drm_crtc *crtc)
dev_warn(ipu_crtc->dev, "8-pixel align hactive %d -> %d\n",
sig_cfg.mode.hactive, new_hactive);
- sig_cfg.mode.hfront_porch = new_hactive - sig_cfg.mode.hactive;
+ sig_cfg.mode.hfront_porch -= new_hactive - sig_cfg.mode.hactive;
sig_cfg.mode.hactive = new_hactive;
}
--
2.34.1
Hi,
Please help backport following two commits to 6.1.y
f781f661e8c9 dma-buf: keep the signaling time of merged fences v3
00ae1491f970 dma-buf: fix an error pointer vs NULL bug
The first one is to fix some Android CTS failures founded with android14-6.1 GKI kernel.
run cts -m CtsDeqpTestCases -t dEQP-EGL.functional.get_frame_timestamps*
The second patch is to fix the error introduced by the first one.
Thanks!
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7b72d661f1f2f950ab8c12de7e2bc48bdac8ed69
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080154-handed-folic-f52f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7b72d661f1f2 ("io_uring: gate iowait schedule on having pending requests")
8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
d33a39e57768 ("io_uring: keep timeout in io_wait_queue")
46ae7eef44f6 ("io_uring: optimise non-timeout waiting")
846072f16eed ("io_uring: mimimise io_cqring_wait_schedule")
3fcf19d592d5 ("io_uring: parse check_cq out of wq waiting")
12521a5d5cb7 ("io_uring: fix CQ waiting timeout handling")
52ea806ad983 ("io_uring: finish waiting before flushing overflow entries")
35d90f95cfa7 ("io_uring: include task_work run after scheduling in wait for events")
1b346e4aa8e7 ("io_uring: don't check overflow flush failures")
a85381d8326d ("io_uring: skip overflow CQE posting for dying ring")
c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
b4c98d59a787 ("io_uring: introduce io_has_work")
78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()")
c34398a8c018 ("io_uring: remove __io_req_task_work_add")
ed5ccb3beeba ("io_uring: remove priority tw list optimisation")
4a0fef62788b ("io_uring: optimize io_uring_task layout")
253993210bd8 ("io_uring: introduce locking helpers for CQE posting")
305bef988708 ("io_uring: hide eventfd assumptions in eventfd paths")
affa87db9010 ("io_uring: fix multi ctx cancellation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7b72d661f1f2f950ab8c12de7e2bc48bdac8ed69 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 24 Jul 2023 11:28:17 -0600
Subject: [PATCH] io_uring: gate iowait schedule on having pending requests
A previous commit made all cqring waits marked as iowait, as a way to
improve performance for short schedules with pending IO. However, for
use cases that have a special reaper thread that does nothing but
wait on events on the ring, this causes a cosmetic issue where we
know have one core marked as being "busy" with 100% iowait.
While this isn't a grave issue, it is confusing to users. Rather than
always mark us as being in iowait, gate setting of current->in_iowait
to 1 by whether or not the waiting task has pending requests.
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/io-uring/CAMEGJJ2RxopfNQ7GNLhr7X9=bHXKo+G5OOe0LUq=+…
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217699
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217700
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Reported-by: Phil Elwell <phil(a)raspberrypi.com>
Tested-by: Andres Freund <andres(a)anarazel.de>
Fixes: 8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 89a611541bc4..f4591b912ea8 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2493,11 +2493,20 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
return 0;
}
+static bool current_pending_io(void)
+{
+ struct io_uring_task *tctx = current->io_uring;
+
+ if (!tctx)
+ return false;
+ return percpu_counter_read_positive(&tctx->inflight);
+}
+
/* when returns >0, the caller should retry */
static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
struct io_wait_queue *iowq)
{
- int token, ret;
+ int io_wait, ret;
if (unlikely(READ_ONCE(ctx->check_cq)))
return 1;
@@ -2511,17 +2520,19 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
return 0;
/*
- * Use io_schedule_prepare/finish, so cpufreq can take into account
- * that the task is waiting for IO - turns out to be important for low
- * QD IO.
+ * Mark us as being in io_wait if we have pending requests, so cpufreq
+ * can take into account that the task is waiting for IO - turns out
+ * to be important for low QD IO.
*/
- token = io_schedule_prepare();
+ io_wait = current->in_iowait;
+ if (current_pending_io())
+ current->in_iowait = 1;
ret = 0;
if (iowq->timeout == KTIME_MAX)
schedule();
else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
ret = -ETIME;
- io_schedule_finish(token);
+ current->in_iowait = io_wait;
return ret;
}