Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 62b84b3cf313 Linux 4.19.8-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Patch merge: OK
Compile: OK
Kernel tests: OK
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 2 architectures:
aarch64:
make options: make INSTALL_MOD_STRIP=1 -j56 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/62b84b3cf313e064c6ad249e95…
x86_64:
make options: make INSTALL_MOD_STRIP=1 -j56 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/62b84b3cf313e064c6ad249e95a…
Hardware testing
----------------
We booted each kernel and ran the following tests:
arm64:
/distribution/kpkginstall (boot test)
LTP lite - release 20180515
xfstests: ext4
xfstests: xfs
/kernel/misc/amtu
x86_64:
/distribution/kpkginstall (boot test)
LTP lite - release 20180515
xfstests: ext4
xfstests: xfs
/kernel/misc/amtu
From: Josef Bacik <jbacik(a)fb.com>
With my delayed refs patches in place we started seeing a large amount
of aborts in __btrfs_free_extent
BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0
Call Trace:
? btrfs_merge_delayed_refs+0xaf/0x340
__btrfs_run_delayed_refs+0x6ea/0xfc0
? btrfs_set_path_blocking+0x31/0x60
btrfs_run_delayed_refs+0xeb/0x180
btrfs_commit_transaction+0x179/0x7f0
? btrfs_check_space_for_delayed_refs+0x30/0x50
? should_end_transaction.isra.19+0xe/0x40
btrfs_drop_snapshot+0x41c/0x7c0
btrfs_clean_one_deleted_snapshot+0xb5/0xd0
cleaner_kthread+0xf6/0x120
kthread+0xf8/0x130
? btree_invalidatepage+0x90/0x90
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
This was because btrfs_drop_snapshot depends on the root not being modified
while it's dropping the snapshot. It will unlock the root node (and really
every node) as it walks down the tree, only to re-lock it when it needs to do
something. This is a problem because if we modify the tree we could cow a block
in our path, which free's our reference to that block. Then once we get back to
that shared block we'll free our reference to it again, and get ENOENT when
trying to lookup our extent reference to that block in __btrfs_free_extent.
This is ultimately happening because we have delayed items left to be processed
for our deleted snapshot _after_ all of the inodes are closed for the snapshot.
We only run the delayed inode item if we're deleting the inode, and even then we
do not run the delayed insertions or delayed removals. These can be run at any
point after our final inode does it's last iput, which is what triggers the
snapshot deletion. We can end up with the snapshot deletion happening and then
have the delayed items run on that file system, resulting in the above problem.
This problem has existed forever, however my patches made it much easier to hit
as I wake up the cleaner much more often to deal with delayed iputs, which made
us more likely to start the snapshot dropping work before the transaction
commits, which is when the delayed items would generally be run. Before,
generally speaking, we would run the delayed items, commit the transaction, and
wakeup the cleaner thread to start deleting snapshots, which means we were less
likely to hit this problem. You could still hit it if you had multiple
snapshots to be deleted and ended up with lots of delayed items, but it was
definitely harder.
Fix for now by simply running all the delayed items before starting to drop the
snapshot. We could make this smarter in the future by making the delayed items
per-root, and then simply drop any delayed items for roots that we are going to
delete. But for now just a quick and easy solution is the safest.
Cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
v1->v2:
- check for errors from btrfs_run_delayed_items.
- Dave I can reroll the series, but the second version of patch 1 is the same,
let me know what you want.
fs/btrfs/extent-tree.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index dcb699dd57f3..473084eb7a2d 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9330,6 +9330,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_free;
}
+ err = btrfs_run_delayed_items(trans);
+ if (err)
+ goto out_end_trans;
+
if (block_rsv)
trans->block_rsv = block_rsv;
--
2.14.3
This is a note to let you know that I've just added the patch titled
USB: serial: console: fix reported terminal settings
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From f51ccf46217c28758b1f3b5bc0ccfc00eca658b2 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Tue, 4 Dec 2018 17:00:36 +0100
Subject: USB: serial: console: fix reported terminal settings
The USB-serial console implementation has never reported the actual
terminal settings used. Despite storing the corresponding cflags in its
struct console, these were never honoured on later tty open() where the
tty termios would be left initialised to the driver defaults.
Unlike the serial console implementation, the USB-serial code calls
subdriver open() already at console setup. While calling set_termios()
and write() before open() looks like it could work for some USB-serial
drivers, others definitely do not expect this, so modelling this after
serial core is going to be intrusive, if at all possible.
Instead, use a (renamed) tty helper to save the termios data used at
console setup so that the tty termios reflects the actual terminal
settings after a subsequent tty open().
Note that the calls to tty_init_termios() (tty_driver_install()) and
tty_save_termios() are serialised using the disconnect mutex.
This specifically fixes a regression that was triggered by a recent
change adding software flow control to the pl2303 driver: a getty trying
to disable flow control while leaving the baud rate unchanged would now
also set the baud rate to the driver default (prior to the flow-control
change this had been a noop).
Fixes: 7041d9c3f01b ("USB: serial: pl2303: add support for tx xon/xoff flow control")
Cc: stable <stable(a)vger.kernel.org> # 4.18
Cc: Florian Zumbiehl <florz(a)florz.de>
Reported-by: Jarkko Nikula <jarkko.nikula(a)linux.intel.com>
Tested-by: Jarkko Nikula <jarkko.nikula(a)linux.intel.com>
Acked-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/tty/tty_io.c | 11 +++++++++--
drivers/usb/serial/console.c | 2 +-
include/linux/tty.h | 1 +
3 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index ee80dfbd5442..687250ec8032 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1373,7 +1373,13 @@ struct tty_struct *tty_init_dev(struct tty_driver *driver, int idx)
return ERR_PTR(retval);
}
-static void tty_free_termios(struct tty_struct *tty)
+/**
+ * tty_save_termios() - save tty termios data in driver table
+ * @tty: tty whose termios data to save
+ *
+ * Locking: Caller guarantees serialisation with tty_init_termios().
+ */
+void tty_save_termios(struct tty_struct *tty)
{
struct ktermios *tp;
int idx = tty->index;
@@ -1392,6 +1398,7 @@ static void tty_free_termios(struct tty_struct *tty)
}
*tp = tty->termios;
}
+EXPORT_SYMBOL_GPL(tty_save_termios);
/**
* tty_flush_works - flush all works of a tty/pty pair
@@ -1491,7 +1498,7 @@ static void release_tty(struct tty_struct *tty, int idx)
WARN_ON(!mutex_is_locked(&tty_mutex));
if (tty->ops->shutdown)
tty->ops->shutdown(tty);
- tty_free_termios(tty);
+ tty_save_termios(tty);
tty_driver_remove_tty(tty->driver, tty);
tty->port->itty = NULL;
if (tty->link)
diff --git a/drivers/usb/serial/console.c b/drivers/usb/serial/console.c
index 17940589c647..7d289302ff6c 100644
--- a/drivers/usb/serial/console.c
+++ b/drivers/usb/serial/console.c
@@ -101,7 +101,6 @@ static int usb_console_setup(struct console *co, char *options)
cflag |= PARENB;
break;
}
- co->cflag = cflag;
/*
* no need to check the index here: if the index is wrong, console
@@ -164,6 +163,7 @@ static int usb_console_setup(struct console *co, char *options)
serial->type->set_termios(tty, port, &dummy);
tty_port_tty_set(&port->port, NULL);
+ tty_save_termios(tty);
tty_kref_put(tty);
}
tty_port_set_initialized(&port->port, 1);
diff --git a/include/linux/tty.h b/include/linux/tty.h
index 414db2bce715..392138fe59b6 100644
--- a/include/linux/tty.h
+++ b/include/linux/tty.h
@@ -556,6 +556,7 @@ extern struct tty_struct *tty_init_dev(struct tty_driver *driver, int idx);
extern void tty_release_struct(struct tty_struct *tty, int idx);
extern int tty_release(struct inode *inode, struct file *filp);
extern void tty_init_termios(struct tty_struct *tty);
+extern void tty_save_termios(struct tty_struct *tty);
extern int tty_standard_install(struct tty_driver *driver,
struct tty_struct *tty);
--
2.19.2
Suppose adapter (open) recovery is between opened QDIO queues and before
(the end of) initial posting of status read buffers (SRBs). This time
window can be seconds long due to FSF_PROT_HOST_CONNECTION_INITIALIZING
causing by design looping with exponential increase sleeps in the function
performing exchange config data during recovery
[zfcp_erp_adapter_strat_fsf_xconf()]. Recovery triggered by local link up.
Suppose an event occurs for which the FCP channel would send an
unsolicited notification to zfcp by means of a previously posted SRB.
We saw it with local cable pull (link down) in multi-initiator zoning
with multiple NPIV-enabled subchannels of the same shared FCP channel.
As soon as zfcp_erp_adapter_strategy_open_fsf() starts posting the
initial status read buffers from within the adapter's ERP thread,
the channel does send an unsolicited notification.
Since v2.6.27 commit d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted
status can lead to I/O stall"), zfcp_fsf_status_read_handler() schedules
adapter->stat_work to re-fill the just consumed SRB from a work item.
Now the ERP thread and the work item post SRBs in parallel.
Both contexts call the helper function zfcp_status_read_refill().
The tracking of missing (to be posted / re-filled) SRBs is not thread-safe
due to separate atomic_read() and atomic_dec(), in order to depend on
posting success. Hence, both contexts can see
atomic_read(&adapter->stat_miss) == 1. One of the two contexts posts
one too many SRB. Zfcp gets QDIO_ERROR_SLSB_STATE on the output queue
(trace tag "qdireq1") leading to zfcp_erp_adapter_shutdown() in
zfcp_qdio_handler_error().
An obvious and seemingly clean fix would be to schedule stat_work
from the ERP thread and wait for it to finish. This would serialize
all SRB re-fills. However, we already have another work item wait on the
ERP thread: adapter->scan_work runs zfcp_fc_scan_ports() which calls
zfcp_fc_eval_gpn_ft(). The latter calls zfcp_erp_wait() to wait for all
the open port recoveries during zfcp auto port scan, but in fact it waits
for any pending recovery including an adapter recovery. This approach
leads to a deadlock.
[see also v3.19 commit 18f87a67e6d6 ("zfcp: auto port scan resiliency");
v2.6.37 commit d3e1088d6873
("[SCSI] zfcp: No ERP escalation on gpn_ft eval");
v2.6.28 commit fca55b6fb587
("[SCSI] zfcp: fix deadlock between wq triggered port scan and ERP")
fixing v2.6.27 commit c57a39a45a76
("[SCSI] zfcp: wait until adapter is finished with ERP during auto-port");
v2.6.27 commit cc8c282963bd
("[SCSI] zfcp: Automatically attach remote ports")]
Instead make the accounting of missing SRBs atomic for parallel
execution in both the ERP thread and adapter->stat_work.
Signed-off-by: Steffen Maier <maier(a)linux.ibm.com>
Fixes: d26ab06ede83 ("[SCSI] zfcp: receiving an unsolicted status can lead to I/O stall")
Cc: <stable(a)vger.kernel.org> #2.6.27+
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
---
drivers/s390/scsi/zfcp_aux.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/s390/scsi/zfcp_aux.c b/drivers/s390/scsi/zfcp_aux.c
index df10f4e07a4a..882789fff574 100644
--- a/drivers/s390/scsi/zfcp_aux.c
+++ b/drivers/s390/scsi/zfcp_aux.c
@@ -271,16 +271,16 @@ static void zfcp_free_low_mem_buffers(struct zfcp_adapter *adapter)
*/
int zfcp_status_read_refill(struct zfcp_adapter *adapter)
{
- while (atomic_read(&adapter->stat_miss) > 0)
+ while (atomic_add_unless(&adapter->stat_miss, -1, 0))
if (zfcp_fsf_status_read(adapter->qdio)) {
+ atomic_inc(&adapter->stat_miss); /* undo add -1 */
if (atomic_read(&adapter->stat_miss) >=
adapter->stat_read_buf_num) {
zfcp_erp_adapter_reopen(adapter, 0, "axsref1");
return 1;
}
break;
- } else
- atomic_dec(&adapter->stat_miss);
+ }
return 0;
}
--
2.16.4