The driver's fsync() is supposed to flush any pending operation to
hardware. It is implemented in this driver by cancelling the queued
deferred IO first, then schedule it for "immediate execution" by calling
schedule_delayed_work() again with delay=0. However, setting delay=0
only means the work is scheduled immediately, it does not mean the work
is executed immediately. There is no guarantee that the work is finished
after schedule_delayed_work() returns. After this driver's fsync()
returns, there can still be pending work. Furthermore, if close() is
called by users immediately after fsync(), the pending work gets
cancelled and fsync() may do nothing.
To ensure that the deferred IO completes, use flush_delayed_work()
instead. Write operations to this driver either write to the device
directly, or invoke schedule_delayed_work(); so by flushing the
workqueue, it can be guaranteed that all previous writes make it to the
device.
Fixes: 5e841b88d23d ("fb: fsync() method for deferred I/O flush.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
drivers/video/fbdev/core/fb_defio.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 274f5d0fa247..6c8b81c452f0 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -132,11 +132,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy
return 0;
inode_lock(inode);
- /* Kill off the delayed work */
- cancel_delayed_work_sync(&info->deferred_work);
-
- /* Run it immediately */
- schedule_delayed_work(&info->deferred_work, 0);
+ flush_delayed_work(&info->deferred_work);
inode_unlock(inode);
return 0;
--
2.39.2
In ext4_move_extents(), moved_len is only updated when all moves are
successfully executed, and only discards orig_inode and donor_inode
preallocations when moved_len is not zero. When the loop fails to exit
after successfully moving some extents, moved_len is not updated and
remains at 0, so it does not discard the preallocations.
If the moved extents overlap with the preallocated extents, the
overlapped extents are freed twice in ext4_mb_release_inode_pa() and
ext4_process_freed_data() (as described in commit 94d7c16cbbbd), and
bb_free is incremented twice. Hence when trim is executed, a zero-division
bug is triggered in mb_update_avg_fragment_size() because bb_free is not
zero and bb_fragments is zero.
Therefore, update move_len after each extent move to avoid the issue.
Reported-by: Wei Chen <harperchen1110(a)gmail.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR…
Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base")
CC: stable(a)vger.kernel.org # 3.18
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
fs/ext4/move_extent.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 3aa57376d9c2..4b9b503c6346 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -672,7 +672,7 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
*/
ext4_double_up_write_data_sem(orig_inode, donor_inode);
/* Swap original branches with new branches */
- move_extent_per_page(o_filp, donor_inode,
+ *moved_len += move_extent_per_page(o_filp, donor_inode,
orig_page_index, donor_page_index,
offset_in_page, cur_len,
unwritten, &ret);
@@ -682,7 +682,6 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp, __u64 orig_blk,
o_start += cur_len;
d_start += cur_len;
}
- *moved_len = o_start - orig_blk;
if (*moved_len > len)
*moved_len = len;
--
2.31.1
Hi all,
5.10.194 includes 331d85f0bc6e (ovl: Always reevaluate the file
signature for IMA), which resulted in a performance regression for
workloads that use IMA and run from overlayfs. 5.10.202 includes
cd5a262a07a5 (ima: detect changes to the backing overlay file), which
resolved the regression in the upstream kernel. However, from my
testing [1], this change doesn't resolve the regression on stable
kernels.
From what I can tell, cd5a262a07a5 (ima: detect changes to the
backing overlay file) depends on both db1d1e8b9867 (IMA: use
vfs_getattr_nosec to get the i_version) and a1175d6b1bda (vfs: plumb
i_version handling into struct kstat). These two dependent changes
were not backported to stable kernels. As a result, IMA seems to be
caching the wrong i_version value when using overlayfs. From my
testing, backporting these two dependent changes is sufficient to
resolve the issue in stable kernels.
Would it make sense to backport those changes to stable kernels? It's
possible that they may not follow the stable kernel patching rules. I
think the issue can also be fixed directly in stable trees with the
following diff (which doesn't make sense in the upstream kernel):
diff --git a/security/integrity/ima/ima_api.c b/security/integrity/ima/ima_api.c
index 70efd4aa1bd1..c84ae6b62b3a 100644
--- a/security/integrity/ima/ima_api.c
+++ b/security/integrity/ima/ima_api.c
@@ -239,7 +239,7 @@ int ima_collect_measurement(struct integrity_iint_cache *iint,
* which do not support i_version, support is limited to an initial
* measurement/appraisal/audit.
*/
- i_version = inode_query_iversion(inode);
+ i_version = inode_query_iversion(real_inode);
hash.hdr.algo = algo;
/* Initialize hash digest to 0's in case of failure */
I've verified that this diff resolves the performance regression.
Which approach would make the most sense to fix the issue in stable
kernels? Backporting the dependent commits, or merging the above diff?
I'd be happy to prepare and mail patches either way. Looking forward
to your thoughts.
Thanks,
-Robert
[1] I'm benchmarking with the following:
cat <<EOF > /sys/kernel/security/ima/policy
audit func=BPRM_CHECK
audit func=FILE_MMAP mask=MAY_EXEC
EOF
docker run --rm debian bash -c "TIMEFORMAT=%3R; time bash -c :"
A good result looks like 0.002 (units are seconds), and a bad result
looks like 0.034.