The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 166d3863231667c4f64dee72b77d1102cdfad11f Mon Sep 17 00:00:00 2001
From: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Date: Sun, 10 Jul 2022 19:05:22 -0400
Subject: [PATCH] xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
The error paths of gntdev_mmap() can call unmap_grant_pages() even
though not all of the pages have been successfully mapped. This will
trigger the WARN_ON()s in __unmap_grant_pages_done(). The number of
warnings can be very large; I have observed thousands of lines of
warnings in the systemd journal.
Avoid this problem by only warning on unmapping failure if the handle
being unmapped is not INVALID_GRANT_HANDLE. The handle field of any
page that was not successfully mapped will be INVALID_GRANT_HANDLE, so
this catches all cases where unmapping can legitimately fail.
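
[Editor's sketch, not part of the original commit.] The warning condition
described above can be modelled in plain userspace C. The struct and the
SKETCH_ names are hypothetical stand-ins for the Xen definitions, and the
constant values are assumptions made for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for GNTST_okay and INVALID_GRANT_HANDLE. */
#define SKETCH_GNTST_OKAY            0
#define SKETCH_INVALID_GRANT_HANDLE  ((uint32_t)~0U)

/* Hypothetical, reduced model of one unmap operation. */
struct sketch_unmap_op {
	uint32_t handle;   /* grant handle, or the invalid marker */
	int16_t  status;   /* result reported for the unmap */
};

/*
 * Warn only when the unmap failed AND the handle was valid. A page that
 * was never mapped keeps the invalid handle, so its failure is expected.
 */
static bool sketch_should_warn(const struct sketch_unmap_op *op)
{
	return op->status != SKETCH_GNTST_OKAY &&
	       op->handle != SKETCH_INVALID_GRANT_HANDLE;
}
```

With this predicate, a failed unmap of a never-mapped page stays silent,
while a failed unmap of a real handle still produces a warning.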
Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
Cc: stable(a)vger.kernel.org
Suggested-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko(a)epam.com>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Link: https://lore.kernel.org/r/20220710230522.1563-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross(a)suse.com>
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4b56c39f766d..84b143eef395 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -396,13 +396,15 @@ static void __unmap_grant_pages_done(int result,
unsigned int offset = data->unmap_ops - map->unmap_ops;
for (i = 0; i < data->count; i++) {
- WARN_ON(map->unmap_ops[offset+i].status);
+ WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
+ map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
pr_debug("unmap handle=%d st=%d\n",
map->unmap_ops[offset+i].handle,
map->unmap_ops[offset+i].status);
map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
if (use_ptemod) {
- WARN_ON(map->kunmap_ops[offset+i].status);
+ WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
+ map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
pr_debug("kunmap handle=%u st=%d\n",
map->kunmap_ops[offset+i].handle,
map->kunmap_ops[offset+i].status);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 166d3863231667c4f64dee72b77d1102cdfad11f Mon Sep 17 00:00:00 2001
From: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Date: Sun, 10 Jul 2022 19:05:22 -0400
Subject: [PATCH] xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
The error paths of gntdev_mmap() can call unmap_grant_pages() even
though not all of the pages have been successfully mapped. This will
trigger the WARN_ON()s in __unmap_grant_pages_done(). The number of
warnings can be very large; I have observed thousands of lines of
warnings in the systemd journal.
Avoid this problem by only warning on unmapping failure if the handle
being unmapped is not INVALID_GRANT_HANDLE. The handle field of any
page that was not successfully mapped will be INVALID_GRANT_HANDLE, so
this catches all cases where unmapping can legitimately fail.
Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
Cc: stable(a)vger.kernel.org
Suggested-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko(a)epam.com>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Link: https://lore.kernel.org/r/20220710230522.1563-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross(a)suse.com>
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4b56c39f766d..84b143eef395 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -396,13 +396,15 @@ static void __unmap_grant_pages_done(int result,
unsigned int offset = data->unmap_ops - map->unmap_ops;
for (i = 0; i < data->count; i++) {
- WARN_ON(map->unmap_ops[offset+i].status);
+ WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
+ map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
pr_debug("unmap handle=%d st=%d\n",
map->unmap_ops[offset+i].handle,
map->unmap_ops[offset+i].status);
map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
if (use_ptemod) {
- WARN_ON(map->kunmap_ops[offset+i].status);
+ WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
+ map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
pr_debug("kunmap handle=%u st=%d\n",
map->kunmap_ops[offset+i].handle,
map->kunmap_ops[offset+i].status);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 166d3863231667c4f64dee72b77d1102cdfad11f Mon Sep 17 00:00:00 2001
From: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Date: Sun, 10 Jul 2022 19:05:22 -0400
Subject: [PATCH] xen/gntdev: Ignore failure to unmap INVALID_GRANT_HANDLE
The error paths of gntdev_mmap() can call unmap_grant_pages() even
though not all of the pages have been successfully mapped. This will
trigger the WARN_ON()s in __unmap_grant_pages_done(). The number of
warnings can be very large; I have observed thousands of lines of
warnings in the systemd journal.
Avoid this problem by only warning on unmapping failure if the handle
being unmapped is not INVALID_GRANT_HANDLE. The handle field of any
page that was not successfully mapped will be INVALID_GRANT_HANDLE, so
this catches all cases where unmapping can legitimately fail.
Fixes: dbe97cff7dd9 ("xen/gntdev: Avoid blocking in unmap_grant_pages()")
Cc: stable(a)vger.kernel.org
Suggested-by: Juergen Gross <jgross(a)suse.com>
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko(a)epam.com>
Reviewed-by: Juergen Gross <jgross(a)suse.com>
Link: https://lore.kernel.org/r/20220710230522.1563-1-demi@invisiblethingslab.com
Signed-off-by: Juergen Gross <jgross(a)suse.com>
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4b56c39f766d..84b143eef395 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -396,13 +396,15 @@ static void __unmap_grant_pages_done(int result,
unsigned int offset = data->unmap_ops - map->unmap_ops;
for (i = 0; i < data->count; i++) {
- WARN_ON(map->unmap_ops[offset+i].status);
+ WARN_ON(map->unmap_ops[offset + i].status != GNTST_okay &&
+ map->unmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
pr_debug("unmap handle=%d st=%d\n",
map->unmap_ops[offset+i].handle,
map->unmap_ops[offset+i].status);
map->unmap_ops[offset+i].handle = INVALID_GRANT_HANDLE;
if (use_ptemod) {
- WARN_ON(map->kunmap_ops[offset+i].status);
+ WARN_ON(map->kunmap_ops[offset + i].status != GNTST_okay &&
+ map->kunmap_ops[offset + i].handle != INVALID_GRANT_HANDLE);
pr_debug("kunmap handle=%u st=%d\n",
map->kunmap_ops[offset+i].handle,
map->kunmap_ops[offset+i].status);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 39cdb68c64d84e71a4a717000b6e5de208ee60cc Mon Sep 17 00:00:00 2001
From: Yangxi Xiang <xyangxi5(a)gmail.com>
Date: Tue, 28 Jun 2022 17:33:22 +0800
Subject: [PATCH] vt: fix memory overlapping when deleting chars in the buffer
An overlapping memory copy occurs when deleting characters from a long
line. This overlapping copy can cause data corruption when scr_memcpyw is
optimized to memcpy, because the behavior of memcpy is undefined if the
destination buffer overlaps the source buffer. The line buffer is not
always corrupted, because memcpy can use hardware acceleration, whose
result is not deterministic.
Fix this problem by replacing scr_memcpyw with scr_memmovew.
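
[Editor's sketch, not part of the original commit.] The overlap can be
reproduced in userspace C. The helper below is hypothetical and only
mirrors the shape of delete_char(): the source (p + nr) and destination (p)
ranges overlap whenever more than nr cells remain after the deletion, so
memmove() is the safe call:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: delete nr cells starting at column x in a row of
 * cols 16-bit cells, then fill the vacated tail with the erase cell.
 */
static void sketch_delete_cells(unsigned short *row, size_t cols, size_t x,
				size_t nr, unsigned short erase)
{
	unsigned short *p;
	size_t i;

	if (x >= cols)
		return;
	if (nr > cols - x)
		nr = cols - x;

	p = row + x;
	/* Ranges [p, ...) and [p + nr, ...) may overlap: use memmove. */
	memmove(p, p + nr, (cols - x - nr) * sizeof(*row));

	for (i = cols - nr; i < cols; i++)
		row[i] = erase;
}
```

Using memcpy() in place of memmove() here would be undefined behavior for
overlapping ranges, which matches the intermittent corruption described
above.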
Fixes: 81732c3b2fed ("tty vt: Fix line garbage in virtual console on command line edition")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Yangxi Xiang <xyangxi5(a)gmail.com>
Link: https://lore.kernel.org/r/20220628093322.5688-1-xyangxi5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index f8c87c4d7399..dfc1f4b445f3 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -855,7 +855,7 @@ static void delete_char(struct vc_data *vc, unsigned int nr)
unsigned short *p = (unsigned short *) vc->vc_pos;
vc_uniscr_delete(vc, nr);
- scr_memcpyw(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
+ scr_memmovew(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
scr_memsetw(p + vc->vc_cols - vc->state.x - nr, vc->vc_video_erase_char,
nr * 2);
vc->vc_need_wrap = 0;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 39cdb68c64d84e71a4a717000b6e5de208ee60cc Mon Sep 17 00:00:00 2001
From: Yangxi Xiang <xyangxi5(a)gmail.com>
Date: Tue, 28 Jun 2022 17:33:22 +0800
Subject: [PATCH] vt: fix memory overlapping when deleting chars in the buffer
An overlapping memory copy occurs when deleting characters from a long
line. This overlapping copy can cause data corruption when scr_memcpyw is
optimized to memcpy, because the behavior of memcpy is undefined if the
destination buffer overlaps the source buffer. The line buffer is not
always corrupted, because memcpy can use hardware acceleration, whose
result is not deterministic.
Fix this problem by replacing scr_memcpyw with scr_memmovew.
Fixes: 81732c3b2fed ("tty vt: Fix line garbage in virtual console on command line edition")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Yangxi Xiang <xyangxi5(a)gmail.com>
Link: https://lore.kernel.org/r/20220628093322.5688-1-xyangxi5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index f8c87c4d7399..dfc1f4b445f3 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -855,7 +855,7 @@ static void delete_char(struct vc_data *vc, unsigned int nr)
unsigned short *p = (unsigned short *) vc->vc_pos;
vc_uniscr_delete(vc, nr);
- scr_memcpyw(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
+ scr_memmovew(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
scr_memsetw(p + vc->vc_cols - vc->state.x - nr, vc->vc_video_erase_char,
nr * 2);
vc->vc_need_wrap = 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 39cdb68c64d84e71a4a717000b6e5de208ee60cc Mon Sep 17 00:00:00 2001
From: Yangxi Xiang <xyangxi5(a)gmail.com>
Date: Tue, 28 Jun 2022 17:33:22 +0800
Subject: [PATCH] vt: fix memory overlapping when deleting chars in the buffer
An overlapping memory copy occurs when deleting characters from a long
line. This overlapping copy can cause data corruption when scr_memcpyw is
optimized to memcpy, because the behavior of memcpy is undefined if the
destination buffer overlaps the source buffer. The line buffer is not
always corrupted, because memcpy can use hardware acceleration, whose
result is not deterministic.
Fix this problem by replacing scr_memcpyw with scr_memmovew.
Fixes: 81732c3b2fed ("tty vt: Fix line garbage in virtual console on command line edition")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Yangxi Xiang <xyangxi5(a)gmail.com>
Link: https://lore.kernel.org/r/20220628093322.5688-1-xyangxi5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index f8c87c4d7399..dfc1f4b445f3 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -855,7 +855,7 @@ static void delete_char(struct vc_data *vc, unsigned int nr)
unsigned short *p = (unsigned short *) vc->vc_pos;
vc_uniscr_delete(vc, nr);
- scr_memcpyw(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
+ scr_memmovew(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
scr_memsetw(p + vc->vc_cols - vc->state.x - nr, vc->vc_video_erase_char,
nr * 2);
vc->vc_need_wrap = 0;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 39cdb68c64d84e71a4a717000b6e5de208ee60cc Mon Sep 17 00:00:00 2001
From: Yangxi Xiang <xyangxi5(a)gmail.com>
Date: Tue, 28 Jun 2022 17:33:22 +0800
Subject: [PATCH] vt: fix memory overlapping when deleting chars in the buffer
An overlapping memory copy occurs when deleting characters from a long
line. This overlapping copy can cause data corruption when scr_memcpyw is
optimized to memcpy, because the behavior of memcpy is undefined if the
destination buffer overlaps the source buffer. The line buffer is not
always corrupted, because memcpy can use hardware acceleration, whose
result is not deterministic.
Fix this problem by replacing scr_memcpyw with scr_memmovew.
Fixes: 81732c3b2fed ("tty vt: Fix line garbage in virtual console on command line edition")
Cc: stable <stable(a)kernel.org>
Signed-off-by: Yangxi Xiang <xyangxi5(a)gmail.com>
Link: https://lore.kernel.org/r/20220628093322.5688-1-xyangxi5@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index f8c87c4d7399..dfc1f4b445f3 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -855,7 +855,7 @@ static void delete_char(struct vc_data *vc, unsigned int nr)
unsigned short *p = (unsigned short *) vc->vc_pos;
vc_uniscr_delete(vc, nr);
- scr_memcpyw(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
+ scr_memmovew(p, p + nr, (vc->vc_cols - vc->state.x - nr) * 2);
scr_memsetw(p + vc->vc_cols - vc->state.x - nr, vc->vc_video_erase_char,
nr * 2);
vc->vc_need_wrap = 0;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43b5240ca6b33108998810593248186b1e3ae34a Mon Sep 17 00:00:00 2001
From: Muchun Song <songmuchun(a)bytedance.com>
Date: Thu, 9 Jun 2022 18:40:32 +0800
Subject: [PATCH] mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE
"numa_stat" should not be included in the scope of CONFIG_HUGETLB_PAGE, if
CONFIG_HUGETLB_PAGE is not configured even if CONFIG_NUMA is configured,
"numa_stat" is missing from /proc. Move it out of CONFIG_HUGETLB_PAGE to
fix it.
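
[Editor's sketch, not part of the original commit.] The scoping problem can
be shown with a reduced table of entry names. The SKETCH_ macros are
hypothetical stand-ins for the kernel config symbols; this models a build
with NUMA enabled and hugetlb support disabled, using the layout after the
fix:

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_CONFIG_NUMA 1
/* SKETCH_CONFIG_HUGETLB_PAGE is intentionally left undefined. */

/* Reduced model of the table: entry names only, NULL-terminated. */
static const char *const sketch_vm_table[] = {
#ifdef SKETCH_CONFIG_NUMA
	"numa_stat",          /* guarded by its own option after the fix */
#endif
#ifdef SKETCH_CONFIG_HUGETLB_PAGE
	"nr_hugepages",       /* hugetlb-only entries stay in this block */
#endif
	"hugetlb_shm_group",
	NULL
};

/* Return 1 if the named entry was compiled into the table, else 0. */
static int sketch_has_entry(const char *name)
{
	size_t i;

	for (i = 0; sketch_vm_table[i] != NULL; i++)
		if (strcmp(sketch_vm_table[i], name) == 0)
			return 1;
	return 0;
}
```

Had "numa_stat" been left inside the hugetlb block, the preprocessor would
have dropped it from this build together with "nr_hugepages".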
Fixes: 4518085e127d ("mm, sysctl: make NUMA stats configurable")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index e52b6e372c60..aaf0b1f1dc57 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2091,6 +2091,17 @@ static struct ctl_table vm_table[] = {
.extra1 = SYSCTL_ZERO,
.extra2 = SYSCTL_TWO_HUNDRED,
},
+#ifdef CONFIG_NUMA
+ {
+ .procname = "numa_stat",
+ .data = &sysctl_vm_numa_stat,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = sysctl_vm_numa_stat_handler,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE,
+ },
+#endif
#ifdef CONFIG_HUGETLB_PAGE
{
.procname = "nr_hugepages",
@@ -2107,15 +2118,6 @@ static struct ctl_table vm_table[] = {
.mode = 0644,
.proc_handler = &hugetlb_mempolicy_sysctl_handler,
},
- {
- .procname = "numa_stat",
- .data = &sysctl_vm_numa_stat,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = sysctl_vm_numa_stat_handler,
- .extra1 = SYSCTL_ZERO,
- .extra2 = SYSCTL_ONE,
- },
#endif
{
.procname = "hugetlb_shm_group",
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fac47b43c760ea90e64b895dba60df0327be7775 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Mon, 11 Jul 2022 12:11:21 +0800
Subject: [PATCH] netfs: do not unlock and put the folio twice
check_write_begin() will unlock and put the folio when it returns
non-zero, so we should avoid unlocking and putting it a second time in
the netfs layer.
Change the way ->check_write_begin() works in the following two ways:
(1) Pass it a pointer to the folio pointer, allowing it to unlock and put
the folio prior to doing the stuff it wants to do, provided it clears
the folio pointer.
(2) Change the return values such that 0 with folio pointer set means
continue, 0 with folio pointer cleared means re-get, and all error
codes indicate an error (no special treatment for -EAGAIN).
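
[Editor's sketch, not part of the original commit.] The calling convention
in (1) and (2) can be modelled in userspace C. Every name below is
hypothetical; the folio is reduced to a lock flag and a reference count so
that a double put would show up as a negative count:

```c
#include <stddef.h>

/* Hypothetical stand-in for a locked, referenced folio. */
struct sketch_folio {
	int locked;
	int refs;
};

enum sketch_action { SKETCH_CONTINUE, SKETCH_RETRY, SKETCH_ERROR };

typedef int (*sketch_check_fn)(struct sketch_folio **foliop);

static void sketch_release(struct sketch_folio *folio)
{
	folio->locked = 0;
	folio->refs--;
}

/* Callback: nothing to flush, folio left untouched. */
static int sketch_check_ok(struct sketch_folio **foliop)
{
	(void)foliop;
	return 0;
}

/* Callback: drops the folio, clears the pointer, asks for a re-get. */
static int sketch_check_retry(struct sketch_folio **foliop)
{
	sketch_release(*foliop);
	*foliop = NULL;
	return 0;
}

/* Callback: drops the folio, clears the pointer, then fails. */
static int sketch_check_fail_released(struct sketch_folio **foliop)
{
	sketch_release(*foliop);
	*foliop = NULL;
	return -5;
}

/* Callback: fails while the caller still owns the folio. */
static int sketch_check_fail_kept(struct sketch_folio **foliop)
{
	(void)foliop;
	return -5;
}

/* Caller side: release only if the callback left the pointer set. */
static enum sketch_action sketch_write_begin_step(struct sketch_folio *folio,
						  sketch_check_fn check,
						  int *err)
{
	int ret = check(&folio);

	*err = ret < 0 ? ret : 0;
	if (ret < 0) {
		if (folio)
			sketch_release(folio);
		return SKETCH_ERROR;
	}
	return folio ? SKETCH_CONTINUE : SKETCH_RETRY;
}
```

In every path the folio is released at most once, because a cleared
pointer tells the caller that the callback has already done the release.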
[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com
Co-developed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index 4d19b19bcc08..73a4176144b3 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -301,7 +301,7 @@ through which it can issue requests and negotiate::
void (*issue_read)(struct netfs_io_subrequest *subreq);
bool (*is_still_valid)(struct netfs_io_request *rreq);
int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
void (*done)(struct netfs_io_request *rreq);
};
@@ -381,8 +381,10 @@ The operations are as follows:
allocated/grabbed the folio to be modified to allow the filesystem to flush
conflicting state before allowing it to be modified.
- It should return 0 if everything is now fine, -EAGAIN if the folio should be
- regrabbed and any other error code to abort the operation.
+ It may unlock and discard the folio it was given and set the caller's folio
+ pointer to NULL. It should return 0 if everything is now fine (``*foliop``
+ left set) or the op should be retried (``*foliop`` cleared) and any other
+ error code to abort the operation.
* ``done``
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 42118a4f3383..d1cfb235c4b9 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -375,7 +375,7 @@ static int afs_begin_cache_operation(struct netfs_io_request *rreq)
}
static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata)
+ struct folio **foliop, void **_fsdata)
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6dee88815491..d6e5916138e4 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -63,7 +63,7 @@
(CONGESTION_ON_THRESH(congestion_kb) >> 2))
static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
static inline struct ceph_snap_context *page_snap_context(struct page *page)
{
@@ -1288,18 +1288,19 @@ ceph_find_incompatible(struct page *page)
}
static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,
- struct folio *folio, void **_fsdata)
+ struct folio **foliop, void **_fsdata)
{
struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_snap_context *snapc;
- snapc = ceph_find_incompatible(folio_page(folio, 0));
+ snapc = ceph_find_incompatible(folio_page(*foliop, 0));
if (snapc) {
int r;
- folio_unlock(folio);
- folio_put(folio);
+ folio_unlock(*foliop);
+ folio_put(*foliop);
+ *foliop = NULL;
if (IS_ERR(snapc))
return PTR_ERR(snapc);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 42f892c5712e..0ce535852151 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -319,8 +319,9 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
* conflicting writes once the folio is grabbed and locked. It is passed a
* pointer to the fsdata cookie that gets returned to the VM to be passed to
* write_end. It is permitted to sleep. It should return 0 if the request
- * should go ahead; unlock the folio and return -EAGAIN to cause the folio to
- * be regot; or return an error.
+ * should go ahead or it may return an error. It may also unlock and put the
+ * folio, provided it sets ``*foliop`` to NULL, in which case a return of 0
+ * will cause the folio to be re-got and the process to be retried.
*
* The calling netfs must initialise a netfs context contiguous to the vfs
* inode before calling this.
@@ -348,13 +349,13 @@ int netfs_write_begin(struct netfs_inode *ctx,
if (ctx->ops->check_write_begin) {
/* Allow the netfs (eg. ceph) to flush conflicts. */
- ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
+ ret = ctx->ops->check_write_begin(file, pos, len, &folio, _fsdata);
if (ret < 0) {
trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
- if (ret == -EAGAIN)
- goto retry;
goto error;
}
+ if (!folio)
+ goto retry;
}
if (folio_test_uptodate(folio))
@@ -416,8 +417,10 @@ int netfs_write_begin(struct netfs_inode *ctx,
error_put:
netfs_put_request(rreq, false, netfs_rreq_trace_put_failed);
error:
- folio_unlock(folio);
- folio_put(folio);
+ if (folio) {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
_leave(" = %d", ret);
return ret;
}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 1773e5df8e65..1b18dfa52e48 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -214,7 +214,7 @@ struct netfs_request_ops {
void (*issue_read)(struct netfs_io_subrequest *subreq);
bool (*is_still_valid)(struct netfs_io_request *rreq);
int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
void (*done)(struct netfs_io_request *rreq);
};
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fac47b43c760ea90e64b895dba60df0327be7775 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Mon, 11 Jul 2022 12:11:21 +0800
Subject: [PATCH] netfs: do not unlock and put the folio twice
check_write_begin() will unlock and put the folio when it returns
non-zero, so we should avoid unlocking and putting it a second time in
the netfs layer.
Change the way ->check_write_begin() works in the following two ways:
(1) Pass it a pointer to the folio pointer, allowing it to unlock and put
the folio prior to doing the stuff it wants to do, provided it clears
the folio pointer.
(2) Change the return values such that 0 with folio pointer set means
continue, 0 with folio pointer cleared means re-get, and all error
codes indicate an error (no special treatment for -EAGAIN).
[ bagasdotme: use Sphinx code text syntax for *foliop pointer ]
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/56423
Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com
Co-developed-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/Documentation/filesystems/netfs_library.rst b/Documentation/filesystems/netfs_library.rst
index 4d19b19bcc08..73a4176144b3 100644
--- a/Documentation/filesystems/netfs_library.rst
+++ b/Documentation/filesystems/netfs_library.rst
@@ -301,7 +301,7 @@ through which it can issue requests and negotiate::
void (*issue_read)(struct netfs_io_subrequest *subreq);
bool (*is_still_valid)(struct netfs_io_request *rreq);
int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
void (*done)(struct netfs_io_request *rreq);
};
@@ -381,8 +381,10 @@ The operations are as follows:
allocated/grabbed the folio to be modified to allow the filesystem to flush
conflicting state before allowing it to be modified.
- It should return 0 if everything is now fine, -EAGAIN if the folio should be
- regrabbed and any other error code to abort the operation.
+ It may unlock and discard the folio it was given and set the caller's folio
+ pointer to NULL. It should return 0 if everything is now fine (``*foliop``
+ left set) or the op should be retried (``*foliop`` cleared) and any other
+ error code to abort the operation.
* ``done``
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 42118a4f3383..d1cfb235c4b9 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -375,7 +375,7 @@ static int afs_begin_cache_operation(struct netfs_io_request *rreq)
}
static int afs_check_write_begin(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata)
+ struct folio **foliop, void **_fsdata)
{
struct afs_vnode *vnode = AFS_FS_I(file_inode(file));
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 6dee88815491..d6e5916138e4 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -63,7 +63,7 @@
(CONGESTION_ON_THRESH(congestion_kb) >> 2))
static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
static inline struct ceph_snap_context *page_snap_context(struct page *page)
{
@@ -1288,18 +1288,19 @@ ceph_find_incompatible(struct page *page)
}
static int ceph_netfs_check_write_begin(struct file *file, loff_t pos, unsigned int len,
- struct folio *folio, void **_fsdata)
+ struct folio **foliop, void **_fsdata)
{
struct inode *inode = file_inode(file);
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_snap_context *snapc;
- snapc = ceph_find_incompatible(folio_page(folio, 0));
+ snapc = ceph_find_incompatible(folio_page(*foliop, 0));
if (snapc) {
int r;
- folio_unlock(folio);
- folio_put(folio);
+ folio_unlock(*foliop);
+ folio_put(*foliop);
+ *foliop = NULL;
if (IS_ERR(snapc))
return PTR_ERR(snapc);
diff --git a/fs/netfs/buffered_read.c b/fs/netfs/buffered_read.c
index 42f892c5712e..0ce535852151 100644
--- a/fs/netfs/buffered_read.c
+++ b/fs/netfs/buffered_read.c
@@ -319,8 +319,9 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
* conflicting writes once the folio is grabbed and locked. It is passed a
* pointer to the fsdata cookie that gets returned to the VM to be passed to
* write_end. It is permitted to sleep. It should return 0 if the request
- * should go ahead; unlock the folio and return -EAGAIN to cause the folio to
- * be regot; or return an error.
+ * should go ahead or it may return an error. It may also unlock and put the
+ * folio, provided it sets ``*foliop`` to NULL, in which case a return of 0
+ * will cause the folio to be re-got and the process to be retried.
*
* The calling netfs must initialise a netfs context contiguous to the vfs
* inode before calling this.
@@ -348,13 +349,13 @@ int netfs_write_begin(struct netfs_inode *ctx,
if (ctx->ops->check_write_begin) {
/* Allow the netfs (eg. ceph) to flush conflicts. */
- ret = ctx->ops->check_write_begin(file, pos, len, folio, _fsdata);
+ ret = ctx->ops->check_write_begin(file, pos, len, &folio, _fsdata);
if (ret < 0) {
trace_netfs_failure(NULL, NULL, ret, netfs_fail_check_write_begin);
- if (ret == -EAGAIN)
- goto retry;
goto error;
}
+ if (!folio)
+ goto retry;
}
if (folio_test_uptodate(folio))
@@ -416,8 +417,10 @@ int netfs_write_begin(struct netfs_inode *ctx,
error_put:
netfs_put_request(rreq, false, netfs_rreq_trace_put_failed);
error:
- folio_unlock(folio);
- folio_put(folio);
+ if (folio) {
+ folio_unlock(folio);
+ folio_put(folio);
+ }
_leave(" = %d", ret);
return ret;
}
diff --git a/include/linux/netfs.h b/include/linux/netfs.h
index 1773e5df8e65..1b18dfa52e48 100644
--- a/include/linux/netfs.h
+++ b/include/linux/netfs.h
@@ -214,7 +214,7 @@ struct netfs_request_ops {
void (*issue_read)(struct netfs_io_subrequest *subreq);
bool (*is_still_valid)(struct netfs_io_request *rreq);
int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
- struct folio *folio, void **_fsdata);
+ struct folio **foliop, void **_fsdata);
void (*done)(struct netfs_io_request *rreq);
};