From: Josef Bacik <jbacik(a)fb.com>
We discovered a box that had double allocations, and suspected the space
cache may be to blame. While auditing the write out path I noticed that
if we've already setup the space cache we will just carry on. This
means that any error we hit after cache_save_setup before we go to
actually write the cache out we won't reset the inode generation, so
whatever was already written will be considered correct, except it'll be
stale. Fix this by _always_ resetting the generation on the block group
inode, this way we only ever have valid or invalid cache.
With this patch I was no longer able to reproduce cache corruption with
dm-log-writes and my bpf error injection tool.
Cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <jbacik(a)fb.com>
---
fs/btrfs/extent-tree.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index e2d7e86b51d1..67b26b209e23 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -3526,13 +3526,6 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
goto again;
}
- /* We've already setup this transaction, go ahead and exit */
- if (block_group->cache_generation == trans->transid &&
- i_size_read(inode)) {
- dcs = BTRFS_DC_SETUP;
- goto out_put;
- }
-
/*
* We want to set the generation to 0, that way if anything goes wrong
* from here on out we know not to trust this cache when we load up next
@@ -3556,6 +3549,13 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group,
}
WARN_ON(ret);
+ /* We've already setup this transaction, go ahead and exit */
+ if (block_group->cache_generation == trans->transid &&
+ i_size_read(inode)) {
+ dcs = BTRFS_DC_SETUP;
+ goto out_put;
+ }
+
if (i_size_read(inode) > 0) {
ret = btrfs_check_trunc_cache_free_space(fs_info,
&fs_info->global_block_rsv);
--
2.7.5
From: Bob Moore <robert.moore(a)intel.com>
[ Upstream commit 57707a9a7780fab426b8ae9b4c7b65b912a748b3 ]
ACPICA commit 9f76de2d249b18804e35fb55d14b1c2604d627a1
ACPICA commit b2e89d72ef1e9deefd63c3fd1dee90f893575b3a
ACPICA commit 23b5bbe6d78afd3c5abf3adb91a1b098a3000b2e
The declared buffer length must be the same as the length of the
byte initializer list, otherwise not a valid resource descriptor.
Link: https://github.com/acpica/acpica/commit/9f76de2d
Link: https://github.com/acpica/acpica/commit/b2e89d72
Link: https://github.com/acpica/acpica/commit/23b5bbe6
Signed-off-by: Bob Moore <robert.moore(a)intel.com>
Signed-off-by: Lv Zheng <lv.zheng(a)intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <alexander.levin(a)verizon.com>
---
drivers/acpi/acpica/utresrc.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/acpi/acpica/utresrc.c b/drivers/acpi/acpica/utresrc.c
index 1de3376da66a..2ad99ea3d496 100644
--- a/drivers/acpi/acpica/utresrc.c
+++ b/drivers/acpi/acpica/utresrc.c
@@ -421,8 +421,10 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state,
ACPI_FUNCTION_TRACE(ut_walk_aml_resources);
- /* The absolute minimum resource template is one end_tag descriptor */
-
+ /*
+ * The absolute minimum resource template is one end_tag descriptor.
+ * However, we will treat a lone end_tag as just a simple buffer.
+ */
if (aml_length < sizeof(struct aml_resource_end_tag)) {
return_ACPI_STATUS(AE_AML_NO_RESOURCE_END_TAG);
}
@@ -454,9 +456,8 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state,
/* Invoke the user function */
if (user_function) {
- status =
- user_function(aml, length, offset, resource_index,
- context);
+ status = user_function(aml, length, offset,
+ resource_index, context);
if (ACPI_FAILURE(status)) {
return_ACPI_STATUS(status);
}
@@ -480,6 +481,12 @@ acpi_ut_walk_aml_resources(struct acpi_walk_state *walk_state,
*context = aml;
}
+ /* Check if buffer is defined to be longer than the resource length */
+
+ if (aml_length > (offset + length)) {
+ return_ACPI_STATUS(AE_AML_NO_RESOURCE_END_TAG);
+ }
+
/* Normal exit */
return_ACPI_STATUS(AE_OK);
--
2.11.0
From: Andreas Rohner <andreas.rohner(a)gmx.net>
Subject: nilfs2: fix race condition that causes file system corruption
There is a race condition between nilfs_dirty_inode() and
nilfs_set_file_dirty().
When a file is opened, nilfs_dirty_inode() is called to update the access
timestamp in the inode. It calls __nilfs_mark_inode_dirty() in a separate
transaction. __nilfs_mark_inode_dirty() caches the ifile buffer_head in
the i_bh field of the inode info structure and marks it as dirty.
After some data was written to the file in another transaction, the
function nilfs_set_file_dirty() is called, which adds the inode to the
ns_dirty_files list.
Then the segment construction calls nilfs_segctor_collect_dirty_files(),
which goes through the ns_dirty_files list and checks the i_bh field. If
there is a cached buffer_head in i_bh it is not marked as dirty again.
Since nilfs_dirty_inode() and nilfs_set_file_dirty() use separate
transactions, it is possible that a segment construction that writes out
the ifile occurs in-between the two. If this happens the inode is not on
the ns_dirty_files list, but its ifile block is still marked as dirty and
written out.
In the next segment construction, the data for the file is written out and
nilfs_bmap_propagate() updates the b-tree. Eventually the bmap root is
written into the i_bh block, which is not dirty, because it was written
out in another segment construction.
As a result the bmap update can be lost, which leads to file system
corruption. Either the virtual block address points to an unallocated DAT
block, or the DAT entry will be reused for something different.
The error can remain undetected for a long time. A typical error message
would be one of the "bad btree" errors or a warning that a DAT entry could
not be found.
This bug can be reproduced reliably by a simple benchmark that creates and
overwrites millions of 4k files.
Link: http://lkml.kernel.org/r/1509367935-3086-2-git-send-email-konishi.ryusuke@l…
Signed-off-by: Andreas Rohner <andreas.rohner(a)gmx.net>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)lab.ntt.co.jp>
Tested-by: Andreas Rohner <andreas.rohner(a)gmx.net>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)lab.ntt.co.jp>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff -puN fs/nilfs2/segment.c~nilfs2-fix-race-condition-that-causes-file-system-corruption fs/nilfs2/segment.c
--- a/fs/nilfs2/segment.c~nilfs2-fix-race-condition-that-causes-file-system-corruption
+++ a/fs/nilfs2/segment.c
@@ -1954,8 +1954,6 @@ static int nilfs_segctor_collect_dirty_f
err, ii->vfs_inode.i_ino);
return err;
}
- mark_buffer_dirty(ibh);
- nilfs_mdt_mark_dirty(ifile);
spin_lock(&nilfs->ns_inode_lock);
if (likely(!ii->i_bh))
ii->i_bh = ibh;
@@ -1964,6 +1962,10 @@ static int nilfs_segctor_collect_dirty_f
goto retry;
}
+ // Always redirty the buffer to avoid race condition
+ mark_buffer_dirty(ii->i_bh);
+ nilfs_mdt_mark_dirty(ifile);
+
clear_bit(NILFS_I_QUEUED, &ii->i_state);
set_bit(NILFS_I_BUSY, &ii->i_state);
list_move_tail(&ii->i_dirty, &sci->sc_dirty_files);
_
From: NeilBrown <neilb(a)suse.com>
Subject: autofs: don't fail mount for transient error
Currently if the autofs kernel module gets an error when writing to the
pipe which links to the daemon, then it marks the whole moutpoint as
catatonic, and it will stop working.
It is possible that the error is transient. This can happen if the daemon
is slow and more than 16 requests queue up. If a subsequent process tries
to queue a request, and is then signalled, the write to the pipe will
return -ERESTARTSYS and autofs will take that as total failure.
So change the code to assess -ERESTARTSYS and -ENOMEM as transient
failures which only abort the current request, not the whole mountpoint.
It isn't a crash or a data corruption, but having autofs mountpoints
suddenly stop working is rather inconvenient.
Ian said:
: And given the problems with a half dozen (or so) user space applications
: consuming large amounts of CPU under heavy mount and umount activity this
: could happen more easily than we expect.
Link: http://lkml.kernel.org/r/87y3norvgp.fsf@notabene.neil.brown.name
Signed-off-by: NeilBrown <neilb(a)suse.com>
Acked-by: Ian Kent <raven(a)themaw.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/autofs4/waitq.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff -puN fs/autofs4/waitq.c~autofs-dont-fail-mount-for-transient-error fs/autofs4/waitq.c
--- a/fs/autofs4/waitq.c~autofs-dont-fail-mount-for-transient-error
+++ a/fs/autofs4/waitq.c
@@ -81,7 +81,8 @@ static int autofs4_write(struct autofs_s
spin_unlock_irqrestore(¤t->sighand->siglock, flags);
}
- return (bytes > 0);
+ /* if 'wr' returned 0 (impossible) we assume -EIO (safe) */
+ return bytes == 0 ? 0 : wr < 0 ? wr : -EIO;
}
static void autofs4_notify_daemon(struct autofs_sb_info *sbi,
@@ -95,6 +96,7 @@ static void autofs4_notify_daemon(struct
} pkt;
struct file *pipe = NULL;
size_t pktsz;
+ int ret;
pr_debug("wait id = 0x%08lx, name = %.*s, type=%d\n",
(unsigned long) wq->wait_queue_token,
@@ -169,7 +171,18 @@ static void autofs4_notify_daemon(struct
mutex_unlock(&sbi->wq_mutex);
if (autofs4_write(sbi, pipe, &pkt, pktsz))
+ switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) {
+ case 0:
+ break;
+ case -ENOMEM:
+ case -ERESTARTSYS:
+ /* Just fail this one */
+ autofs4_wait_release(sbi, wq->wait_queue_token, ret);
+ break;
+ default:
autofs4_catatonic_mode(sbi);
+ break;
+ }
fput(pipe);
}
_