This is a note to let you know that I've just added the patch titled
afs: Fix afs_kill_pages()
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
afs-fix-afs_kill_pages.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 14:12:34 CET 2017
From: David Howells <dhowells(a)redhat.com>
Date: Thu, 16 Mar 2017 16:27:48 +0000
Subject: afs: Fix afs_kill_pages()
From: David Howells <dhowells(a)redhat.com>
[ Upstream commit 7286a35e893176169b09715096a4aca557e2ccd2 ]
Fix afs_kill_pages() in two ways:
(1) If a writeback has been partially flushed, then if we try and kill the
pages it contains, some of them may no longer be undergoing writeback
and end_page_writeback() will assert.
Fix this by checking to see whether the page in question is actually
undergoing writeback before ending that writeback.
(2) The loop that scans for pages to kill doesn't increase the first page
index, and so the loop may not terminate, but it will try to process
the same pages over and over again.
Fix this by increasing the first page index to one after the last page
we processed.
Signed-off-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/afs/write.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -299,10 +299,14 @@ static void afs_kill_pages(struct afs_vn
ASSERTCMP(pv.nr, ==, count);
for (loop = 0; loop < count; loop++) {
- ClearPageUptodate(pv.pages[loop]);
+ struct page *page = pv.pages[loop];
+ ClearPageUptodate(page);
if (error)
- SetPageError(pv.pages[loop]);
- end_page_writeback(pv.pages[loop]);
+ SetPageError(page);
+ if (PageWriteback(page))
+ end_page_writeback(page);
+ if (page->index >= first)
+ first = page->index + 1;
}
__pagevec_release(&pv);
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-4.9/afs-flush-outstanding-writes-when-an-fd-is-closed.patch
queue-4.9/afs-fix-the-maths-in-afs_fs_store_data.patch
queue-4.9/afs-populate-group-id-from-vnode-status.patch
queue-4.9/afs-prevent-callback-expiry-timer-overflow.patch
queue-4.9/crypto-rsa-fix-buffer-overread-when-stripping-leading-zeroes.patch
queue-4.9/afs-adjust-mode-bits-processing.patch
queue-4.9/rxrpc-wake-up-the-transmitter-if-rx-window-size-increases-on-the-peer.patch
queue-4.9/afs-invalid-op-id-should-abort-with-rxgen_opcode.patch
queue-4.9/afs-fix-page-leak-in-afs_write_begin.patch
queue-4.9/afs-better-abort-and-net-error-handling.patch
queue-4.9/afs-fix-missing-put_page.patch
queue-4.9/afs-migrate-vlocation-fields-to-64-bit.patch
queue-4.9/rxrpc-ignore-busy-packets-on-old-calls.patch
queue-4.9/afs-populate-and-use-client-modification-time.patch
queue-4.9/afs-fix-afs_kill_pages.patch
queue-4.9/afs-fix-abort-on-signal-while-waiting-for-call-completion.patch
This is a note to let you know that I've just added the patch titled
afs: Fix abort on signal while waiting for call completion
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
afs-fix-abort-on-signal-while-waiting-for-call-completion.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 14:12:34 CET 2017
From: David Howells <dhowells(a)redhat.com>
Date: Thu, 16 Mar 2017 16:27:49 +0000
Subject: afs: Fix abort on signal while waiting for call completion
From: David Howells <dhowells(a)redhat.com>
[ Upstream commit 954cd6dc02a65065aecb7150962c0870c5b0e322 ]
Fix the way in which a call that's in progress and being waited for is
aborted in the case that EINTR is detected. We should be sending
RX_USER_ABORT rather than RX_CALL_DEAD as the abort code.
Note that since the only two ways out of the loop are if the call completes
or if a signal happens, the kill-the-call clause after the loop has
finished can only happen in the case of EINTR. This means that we only
have one abort case to deal with, not two, and the "KWC" case can never
happen and so can be deleted.
Note further that simply aborting the call isn't necessarily the best thing
here since at this point: the request has been entirely sent and it's
likely the server will do the operation anyway - whether we abort it or
not. In future, we should punt the handling of the remainder of the call
off to a background thread.
Reported-by: Marc Dionne <marc.c.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/afs/rxrpc.c | 19 ++++++-------------
1 file changed, 6 insertions(+), 13 deletions(-)
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -492,7 +492,6 @@ call_complete:
*/
static int afs_wait_for_call_to_complete(struct afs_call *call)
{
- const char *abort_why;
int ret;
DECLARE_WAITQUEUE(myself, current);
@@ -511,13 +510,8 @@ static int afs_wait_for_call_to_complete
continue;
}
- abort_why = "KWC";
- ret = call->error;
- if (call->state == AFS_CALL_COMPLETE)
- break;
- abort_why = "KWI";
- ret = -EINTR;
- if (signal_pending(current))
+ if (call->state == AFS_CALL_COMPLETE ||
+ signal_pending(current))
break;
schedule();
}
@@ -525,15 +519,14 @@ static int afs_wait_for_call_to_complete
remove_wait_queue(&call->waitq, &myself);
__set_current_state(TASK_RUNNING);
- /* kill the call */
+ /* Kill off the call if it's still live. */
if (call->state < AFS_CALL_COMPLETE) {
- _debug("call incomplete");
+ _debug("call interrupted");
rxrpc_kernel_abort_call(afs_socket, call->rxcall,
- RX_CALL_DEAD, -ret, abort_why);
- } else if (call->error < 0) {
- ret = call->error;
+ RX_USER_ABORT, -EINTR, "KWI");
}
+ ret = call->error;
_debug("call complete");
afs_end_call(call);
_leave(" = %d", ret);
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-4.9/afs-flush-outstanding-writes-when-an-fd-is-closed.patch
queue-4.9/afs-fix-the-maths-in-afs_fs_store_data.patch
queue-4.9/afs-populate-group-id-from-vnode-status.patch
queue-4.9/afs-prevent-callback-expiry-timer-overflow.patch
queue-4.9/crypto-rsa-fix-buffer-overread-when-stripping-leading-zeroes.patch
queue-4.9/afs-adjust-mode-bits-processing.patch
queue-4.9/rxrpc-wake-up-the-transmitter-if-rx-window-size-increases-on-the-peer.patch
queue-4.9/afs-invalid-op-id-should-abort-with-rxgen_opcode.patch
queue-4.9/afs-fix-page-leak-in-afs_write_begin.patch
queue-4.9/afs-better-abort-and-net-error-handling.patch
queue-4.9/afs-fix-missing-put_page.patch
queue-4.9/afs-migrate-vlocation-fields-to-64-bit.patch
queue-4.9/rxrpc-ignore-busy-packets-on-old-calls.patch
queue-4.9/afs-populate-and-use-client-modification-time.patch
queue-4.9/afs-fix-afs_kill_pages.patch
queue-4.9/afs-fix-abort-on-signal-while-waiting-for-call-completion.patch
This is a note to let you know that I've just added the patch titled
afs: Deal with an empty callback array
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
afs-deal-with-an-empty-callback-array.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 14:12:34 CET 2017
From: Marc Dionne <marc.dionne(a)auristor.com>
Date: Thu, 16 Mar 2017 16:27:44 +0000
Subject: afs: Deal with an empty callback array
From: Marc Dionne <marc.dionne(a)auristor.com>
[ Upstream commit bcd89270d93b7edebb5de5e5e7dca1a77a33496e ]
Servers may send a callback array that is the same size as
the FID array, or an empty array. If the callback count is
0, the code would attempt to read (fid_count * 12) bytes of
data, which would fail and result in an unmarshalling error.
This would lead to stale data for remotely modified files
or directories.
Store the callback array size in the internal afs_call
structure and use that to determine the amount of data to
read.
Signed-off-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/afs/cmservice.c | 11 +++++------
fs/afs/internal.h | 5 ++++-
2 files changed, 9 insertions(+), 7 deletions(-)
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -168,7 +168,6 @@ static int afs_deliver_cb_callback(struc
struct afs_callback *cb;
struct afs_server *server;
__be32 *bp;
- u32 tmp;
int ret, loop;
_enter("{%u}", call->unmarshall);
@@ -230,9 +229,9 @@ static int afs_deliver_cb_callback(struc
if (ret < 0)
return ret;
- tmp = ntohl(call->tmp);
- _debug("CB count: %u", tmp);
- if (tmp != call->count && tmp != 0)
+ call->count2 = ntohl(call->tmp);
+ _debug("CB count: %u", call->count2);
+ if (call->count2 != call->count && call->count2 != 0)
return -EBADMSG;
call->offset = 0;
call->unmarshall++;
@@ -240,14 +239,14 @@ static int afs_deliver_cb_callback(struc
case 4:
_debug("extract CB array");
ret = afs_extract_data(call, call->buffer,
- call->count * 3 * 4, false);
+ call->count2 * 3 * 4, false);
if (ret < 0)
return ret;
_debug("unmarshall CB array");
cb = call->request;
bp = call->buffer;
- for (loop = call->count; loop > 0; loop--, cb++) {
+ for (loop = call->count2; loop > 0; loop--, cb++) {
cb->version = ntohl(*bp++);
cb->expiry = ntohl(*bp++);
cb->type = ntohl(*bp++);
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -105,7 +105,10 @@ struct afs_call {
unsigned request_size; /* size of request data */
unsigned reply_max; /* maximum size of reply */
unsigned first_offset; /* offset into mapping[first] */
- unsigned last_to; /* amount of mapping[last] */
+ union {
+ unsigned last_to; /* amount of mapping[last] */
+ unsigned count2; /* count used in unmarshalling */
+ };
unsigned char unmarshall; /* unmarshalling phase */
bool incoming; /* T if incoming call */
bool send_pages; /* T if data from mapping should be sent */
Patches currently in stable-queue which might be from marc.dionne(a)auristor.com are
queue-4.9/afs-populate-group-id-from-vnode-status.patch
queue-4.9/afs-adjust-mode-bits-processing.patch
queue-4.9/afs-populate-and-use-client-modification-time.patch
queue-4.9/afs-deal-with-an-empty-callback-array.patch
This is a note to let you know that I've just added the patch titled
afs: Better abort and net error handling
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
afs-better-abort-and-net-error-handling.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 14:12:34 CET 2017
From: David Howells <dhowells(a)redhat.com>
Date: Thu, 16 Mar 2017 16:27:47 +0000
Subject: afs: Better abort and net error handling
From: David Howells <dhowells(a)redhat.com>
[ Upstream commit 70af0e3bd65142f9e674961c975451638a7ce1d5 ]
If we receive a network error, a remote abort or a protocol error whilst
we're still transmitting data, make sure we return an appropriate error to
the caller rather than ESHUTDOWN or ECONNABORTED.
Signed-off-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/afs/rxrpc.c | 35 +++++++++++++++++++++++++++--------
1 file changed, 27 insertions(+), 8 deletions(-)
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -321,6 +321,8 @@ int afs_make_call(struct in_addr *addr,
struct rxrpc_call *rxcall;
struct msghdr msg;
struct kvec iov[1];
+ size_t offset;
+ u32 abort_code;
int ret;
_enter("%x,{%d},", addr->s_addr, ntohs(call->port));
@@ -368,9 +370,11 @@ int afs_make_call(struct in_addr *addr,
msg.msg_controllen = 0;
msg.msg_flags = (call->send_pages ? MSG_MORE : 0);
- /* have to change the state *before* sending the last packet as RxRPC
- * might give us the reply before it returns from sending the
- * request */
+ /* We have to change the state *before* sending the last packet as
+ * rxrpc might give us the reply before it returns from sending the
+ * request. Further, if the send fails, we may already have been given
+ * a notification and may have collected it.
+ */
if (!call->send_pages)
call->state = AFS_CALL_AWAIT_REPLY;
ret = rxrpc_kernel_send_data(afs_socket, rxcall,
@@ -389,7 +393,17 @@ int afs_make_call(struct in_addr *addr,
return wait_mode->wait(call);
error_do_abort:
- rxrpc_kernel_abort_call(afs_socket, rxcall, RX_USER_ABORT, -ret, "KSD");
+ call->state = AFS_CALL_COMPLETE;
+ if (ret != -ECONNABORTED) {
+ rxrpc_kernel_abort_call(afs_socket, rxcall, RX_USER_ABORT,
+ -ret, "KSD");
+ } else {
+ abort_code = 0;
+ offset = 0;
+ rxrpc_kernel_recv_data(afs_socket, rxcall, NULL, 0, &offset,
+ false, &abort_code);
+ ret = call->type->abort_to_error(abort_code);
+ }
error_kill_call:
afs_end_call(call);
_leave(" = %d", ret);
@@ -434,16 +448,18 @@ static void afs_deliver_to_call(struct a
case -EINPROGRESS:
case -EAGAIN:
goto out;
+ case -ECONNABORTED:
+ goto call_complete;
case -ENOTCONN:
abort_code = RX_CALL_DEAD;
rxrpc_kernel_abort_call(afs_socket, call->rxcall,
abort_code, -ret, "KNC");
- goto do_abort;
+ goto save_error;
case -ENOTSUPP:
abort_code = RXGEN_OPCODE;
rxrpc_kernel_abort_call(afs_socket, call->rxcall,
abort_code, -ret, "KIV");
- goto do_abort;
+ goto save_error;
case -ENODATA:
case -EBADMSG:
case -EMSGSIZE:
@@ -453,7 +469,7 @@ static void afs_deliver_to_call(struct a
abort_code = RXGEN_SS_UNMARSHAL;
rxrpc_kernel_abort_call(afs_socket, call->rxcall,
abort_code, EBADMSG, "KUM");
- goto do_abort;
+ goto save_error;
}
}
@@ -464,8 +480,9 @@ out:
_leave("");
return;
-do_abort:
+save_error:
call->error = ret;
+call_complete:
call->state = AFS_CALL_COMPLETE;
goto done;
}
@@ -513,6 +530,8 @@ static int afs_wait_for_call_to_complete
_debug("call incomplete");
rxrpc_kernel_abort_call(afs_socket, call->rxcall,
RX_CALL_DEAD, -ret, abort_why);
+ } else if (call->error < 0) {
+ ret = call->error;
}
_debug("call complete");
Patches currently in stable-queue which might be from dhowells(a)redhat.com are
queue-4.9/afs-flush-outstanding-writes-when-an-fd-is-closed.patch
queue-4.9/afs-fix-the-maths-in-afs_fs_store_data.patch
queue-4.9/afs-populate-group-id-from-vnode-status.patch
queue-4.9/afs-prevent-callback-expiry-timer-overflow.patch
queue-4.9/crypto-rsa-fix-buffer-overread-when-stripping-leading-zeroes.patch
queue-4.9/afs-adjust-mode-bits-processing.patch
queue-4.9/rxrpc-wake-up-the-transmitter-if-rx-window-size-increases-on-the-peer.patch
queue-4.9/afs-invalid-op-id-should-abort-with-rxgen_opcode.patch
queue-4.9/afs-fix-page-leak-in-afs_write_begin.patch
queue-4.9/afs-better-abort-and-net-error-handling.patch
queue-4.9/afs-fix-missing-put_page.patch
queue-4.9/afs-migrate-vlocation-fields-to-64-bit.patch
queue-4.9/rxrpc-ignore-busy-packets-on-old-calls.patch
queue-4.9/afs-populate-and-use-client-modification-time.patch
queue-4.9/afs-fix-afs_kill_pages.patch
queue-4.9/afs-fix-abort-on-signal-while-waiting-for-call-completion.patch
This is a note to let you know that I've just added the patch titled
afs: Adjust mode bits processing
to the 4.9-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
afs-adjust-mode-bits-processing.patch
and it can be found in the queue-4.9 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 14:12:34 CET 2017
From: Marc Dionne <marc.dionne(a)auristor.com>
Date: Thu, 16 Mar 2017 16:27:44 +0000
Subject: afs: Adjust mode bits processing
From: Marc Dionne <marc.dionne(a)auristor.com>
[ Upstream commit 627f46943ff90bcc32ddeb675d881c043c6fa2ae ]
Mode bits for an afs file should not be enforced in the usual
way.
For files, the absence of user bits can restrict file access
with respect to what is granted by the server.
These bits apply regardless of the owner or the current uid; the
rest of the mode bits (group, other) are ignored.
Signed-off-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/afs/security.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/fs/afs/security.c
+++ b/fs/afs/security.c
@@ -340,17 +340,22 @@ int afs_permission(struct inode *inode,
} else {
if (!(access & AFS_ACE_LOOKUP))
goto permission_denied;
+ if ((mask & MAY_EXEC) && !(inode->i_mode & S_IXUSR))
+ goto permission_denied;
if (mask & (MAY_EXEC | MAY_READ)) {
if (!(access & AFS_ACE_READ))
goto permission_denied;
+ if (!(inode->i_mode & S_IRUSR))
+ goto permission_denied;
} else if (mask & MAY_WRITE) {
if (!(access & AFS_ACE_WRITE))
goto permission_denied;
+ if (!(inode->i_mode & S_IWUSR))
+ goto permission_denied;
}
}
key_put(key);
- ret = generic_permission(inode, mask);
_leave(" = %d", ret);
return ret;
Patches currently in stable-queue which might be from marc.dionne(a)auristor.com are
queue-4.9/afs-populate-group-id-from-vnode-status.patch
queue-4.9/afs-adjust-mode-bits-processing.patch
queue-4.9/afs-populate-and-use-client-modification-time.patch
queue-4.9/afs-deal-with-an-empty-callback-array.patch
This is a note to let you know that I've just added the patch titled
xprtrdma: Don't defer fencing an async RPC's chunks
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
xprtrdma-don-t-defer-fencing-an-async-rpc-s-chunks.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 13:29:00 CET 2017
From: Chuck Lever <chuck.lever(a)oracle.com>
Date: Mon, 9 Oct 2017 12:03:26 -0400
Subject: xprtrdma: Don't defer fencing an async RPC's chunks
From: Chuck Lever <chuck.lever(a)oracle.com>
[ Upstream commit 8f66b1a529047a972cb9602a919c53a95f3d7a2b ]
In current kernels, waiting in xprt_release appears to be safe to
do. I had erroneously believed that for ASYNC RPCs, waiting of any
kind in xprt_release->xprt_rdma_free would result in deadlock. I've
done injection testing and consulted with Trond to confirm that
waiting in the RPC release path is safe.
For the very few times where RPC resources haven't yet been released
earlier by the reply handler, it is safe to wait synchronously in
xprt_rdma_free for invalidation rather than defering it to MR
recovery.
Note: When the QP is error state, posting a LocalInvalidate should
flush and mark the MR as bad. There is no way the remote HCA can
access that MR via a QP in error state, so it is effectively already
inaccessible and thus safe for the Upper Layer to access. The next
time the MR is used it should be recognized and cleaned up properly
by frwr_op_map.
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker(a)Netapp.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sunrpc/xprtrdma/transport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -686,7 +686,7 @@ xprt_rdma_free(struct rpc_task *task)
dprintk("RPC: %s: called on 0x%p\n", __func__, req->rl_reply);
if (!list_empty(&req->rl_registered))
- ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
+ ia->ri_ops->ro_unmap_sync(r_xprt, &req->rl_registered);
rpcrdma_unmap_sges(ia, req);
rpcrdma_buffer_put(req);
}
Patches currently in stable-queue which might be from chuck.lever(a)oracle.com are
queue-4.14/xprtrdma-don-t-defer-fencing-an-async-rpc-s-chunks.patch
queue-4.14/sunrpc-fix-a-race-in-the-receive-code-path.patch
This is a note to let you know that I've just added the patch titled
xfs: truncate pagecache before writeback in xfs_setattr_size()
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
xfs-truncate-pagecache-before-writeback-in-xfs_setattr_size.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 13:28:59 CET 2017
From: Eryu Guan <eguan(a)redhat.com>
Date: Wed, 1 Nov 2017 21:43:50 -0700
Subject: xfs: truncate pagecache before writeback in xfs_setattr_size()
From: Eryu Guan <eguan(a)redhat.com>
[ Upstream commit 350976ae21873b0d36584ea005076356431b8f79 ]
On truncate down, if new size is not block size aligned, we zero the
rest of block to avoid exposing stale data to user, and
iomap_truncate_page() skips zeroing if the range is already in
unwritten state or a hole. Then we writeback from on-disk i_size to
the new size if this range hasn't been written to disk yet, and
truncate page cache beyond new EOF and set in-core i_size.
The problem is that we could write data between di_size and newsize
before removing the page cache beyond newsize, as the extents may
still be in unwritten state right after a buffer write. As such, the
page of data that newsize lies in has not been zeroed by page cache
invalidation before it is written, and xfs_do_writepage() hasn't
triggered it's "zero data beyond EOF" case because we haven't
updated in-core i_size yet. Then a subsequent mmap read could see
non-zeros past EOF.
I occasionally see this in fsx runs in fstests generic/112, a
simplified fsx operation sequence is like (assuming 4k block size
xfs):
fallocate 0x0 0x1000 0x0 keep_size
write 0x0 0x1000 0x0
truncate 0x0 0x800 0x1000
punch_hole 0x0 0x800 0x800
mapread 0x0 0x800 0x800
where fallocate allocates unwritten extent but doesn't update
i_size, buffer write populates the page cache and extent is still
unwritten, truncate skips zeroing page past new EOF and writes the
page to disk, punch_hole invalidates the page cache, at last mapread
reads the block back and sees non-zero beyond EOF.
Fix it by moving truncate_setsize() to before writeback so the page
cache invalidation zeros the partial page at the new EOF. This also
triggers "zero data beyond EOF" in xfs_do_writepage() at writeback
time, because newsize has been set and page straddles the newsize.
Also fixed the wrong 'end' param of filemap_write_and_wait_range()
call while we're at it, the 'end' is inclusive and should be
'newsize - 1'.
Suggested-by: Dave Chinner <dchinner(a)redhat.com>
Signed-off-by: Eryu Guan <eguan(a)redhat.com>
Acked-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/xfs/xfs_iops.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -886,22 +886,6 @@ xfs_setattr_size(
return error;
/*
- * We are going to log the inode size change in this transaction so
- * any previous writes that are beyond the on disk EOF and the new
- * EOF that have not been written out need to be written here. If we
- * do not write the data out, we expose ourselves to the null files
- * problem. Note that this includes any block zeroing we did above;
- * otherwise those blocks may not be zeroed after a crash.
- */
- if (did_zeroing ||
- (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
- error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
- ip->i_d.di_size, newsize);
- if (error)
- return error;
- }
-
- /*
* We've already locked out new page faults, so now we can safely remove
* pages from the page cache knowing they won't get refaulted until we
* drop the XFS_MMAP_EXCL lock after the extent manipulations are
@@ -917,9 +901,29 @@ xfs_setattr_size(
* user visible changes). There's not much we can do about this, except
* to hope that the caller sees ENOMEM and retries the truncate
* operation.
+ *
+ * And we update in-core i_size and truncate page cache beyond newsize
+ * before writeback the [di_size, newsize] range, so we're guaranteed
+ * not to write stale data past the new EOF on truncate down.
*/
truncate_setsize(inode, newsize);
+ /*
+ * We are going to log the inode size change in this transaction so
+ * any previous writes that are beyond the on disk EOF and the new
+ * EOF that have not been written out need to be written here. If we
+ * do not write the data out, we expose ourselves to the null files
+ * problem. Note that this includes any block zeroing we did above;
+ * otherwise those blocks may not be zeroed after a crash.
+ */
+ if (did_zeroing ||
+ (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) {
+ error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping,
+ ip->i_d.di_size, newsize - 1);
+ if (error)
+ return error;
+ }
+
error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
if (error)
return error;
Patches currently in stable-queue which might be from eguan(a)redhat.com are
queue-4.14/ext4-fix-fdatasync-2-after-fallocate-2-operation.patch
queue-4.14/xfs-truncate-pagecache-before-writeback-in-xfs_setattr_size.patch
This is a note to let you know that I've just added the patch titled
xfs: return a distinct error code value for IGET_INCORE cache misses
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
xfs-return-a-distinct-error-code-value-for-iget_incore-cache-misses.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 13:28:59 CET 2017
From: "Darrick J. Wong" <darrick.wong(a)oracle.com>
Date: Tue, 17 Oct 2017 21:37:32 -0700
Subject: xfs: return a distinct error code value for IGET_INCORE cache misses
From: "Darrick J. Wong" <darrick.wong(a)oracle.com>
[ Upstream commit ed438b476b611c67089760037139f93ea8ed41d5 ]
For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache,
return ENODATA so that we don't confuse it with the pre-existing ENOENT
cases (inode is in cache, but freed).
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Reviewed-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/xfs/xfs_icache.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -610,7 +610,7 @@ again:
} else {
rcu_read_unlock();
if (flags & XFS_IGET_INCORE) {
- error = -ENOENT;
+ error = -ENODATA;
goto out_error_or_again;
}
XFS_STATS_INC(mp, xs_ig_missed);
Patches currently in stable-queue which might be from darrick.wong(a)oracle.com are
queue-4.14/xfs-fix-incorrect-extent-state-in-xfs_bmap_add_extent_unwritten_real.patch
queue-4.14/xfs-fix-log-block-underflow-during-recovery-cycle-verification.patch
queue-4.14/xfs-truncate-pagecache-before-writeback-in-xfs_setattr_size.patch
queue-4.14/xfs-return-a-distinct-error-code-value-for-iget_incore-cache-misses.patch
This is a note to let you know that I've just added the patch titled
xfs: fix log block underflow during recovery cycle verification
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
xfs-fix-log-block-underflow-during-recovery-cycle-verification.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Mon Dec 18 13:28:59 CET 2017
From: Brian Foster <bfoster(a)redhat.com>
Date: Thu, 26 Oct 2017 09:31:16 -0700
Subject: xfs: fix log block underflow during recovery cycle verification
From: Brian Foster <bfoster(a)redhat.com>
[ Upstream commit 9f2a4505800607e537e9dd9dea4f55c4b0c30c7a ]
It is possible for mkfs to format very small filesystems with too
small of an internal log with respect to the various minimum size
and block count requirements. If this occurs when the log happens to
be smaller than the scan window used for cycle verification and the
scan wraps the end of the log, the start_blk calculation in
xlog_find_head() underflows and leads to an attempt to scan an
invalid range of log blocks. This results in log recovery failure
and a failed mount.
Since there may be filesystems out in the wild with this kind of
geometry, we cannot simply refuse to mount. Instead, cap the scan
window for cycle verification to the size of the physical log. This
ensures that the cycle verification proceeds as expected when the
scan wraps the end of the log.
Reported-by: Zorro Lang <zlang(a)redhat.com>
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/xfs/xfs_log_recover.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -753,7 +753,7 @@ xlog_find_head(
* in the in-core log. The following number can be made tighter if
* we actually look at the block size of the filesystem.
*/
- num_scan_bblks = XLOG_TOTAL_REC_SHIFT(log);
+ num_scan_bblks = min_t(int, log_bbnum, XLOG_TOTAL_REC_SHIFT(log));
if (head_blk >= num_scan_bblks) {
/*
* We are guaranteed that the entire check can be performed
Patches currently in stable-queue which might be from bfoster(a)redhat.com are
queue-4.14/xfs-fix-incorrect-extent-state-in-xfs_bmap_add_extent_unwritten_real.patch
queue-4.14/xfs-fix-log-block-underflow-during-recovery-cycle-verification.patch
queue-4.14/xfs-truncate-pagecache-before-writeback-in-xfs_setattr_size.patch
queue-4.14/xfs-return-a-distinct-error-code-value-for-iget_incore-cache-misses.patch