From: Martin Wilck <mwilck(a)suse.com>
We have observed a few crashes run_timer_softirq(), where a broken
timer_list struct belonging to an anatt_timer was encountered. The broken
structures look like this, and we see actually multiple ones attached to
the same timer base:
crash> struct timer_list 0xffff92471bcfdc90
struct timer_list {
entry = {
next = 0xdead000000000122, // LIST_POISON2
pprev = 0x0
},
expires = 4296022933,
function = 0xffffffffc06de5e0 <nvme_anatt_timeout>,
flags = 20
}
If such a timer is encountered in run_timer_softirq(), the kernel
crashes. The test scenario was an I/O load test with lots of NVMe
controllers, some of which were removed and re-added on the storage side.
I think this may happen if the rdma recovery_work starts, in this call
chain:
nvme_rdma_error_recovery_work()
/* this stops all sorts of activity for the controller, but not
the multipath-related work queue and timer */
nvme_rdma_reconnect_or_remove(ctrl)
=> kicks reconnect_work
work queue: reconnect_work
nvme_rdma_reconnect_ctrl_work()
nvme_rdma_setup_ctrl()
nvme_rdma_configure_admin_queue()
nvme_init_identify()
nvme_mpath_init()
# this sets some fields of the timer_list without taking a lock
timer_setup()
nvme_read_ana_log()
mod_timer() or del_timer_sync()
Similar for TCP. The idea for the patch is based on the observation that
nvme_rdma_reset_ctrl_work() calls nvme_stop_ctrl()->nvme_mpath_stop(),
whereas nvme_rdma_error_recovery_work() stops only the keepalive timer, but
not the anatt timer. Also, nvme_mpath_init() is the only place where
the anatt_timer structure is accessed without locking.
[The following paragraph was contributed by Chao Leng <lengchao(a)huawei.com>]
The process maybe:1.ana_work add the timer;2.error recovery occurs,
in reconnecting, reinitialize the timer and call nvme_read_ana_log,
nvme_read_ana_log may add the timer again.
The same timer is added twice, crash will happens later.
This situation has actually been observed in a crash dump, where we
found an anatt timer pending that had been started ~80s ago, despite a
log message telling that the anatt timer for the same controller had
timed out a few seconds ago. This could only be explained by the same
timer having been attached multiple times.
Signed-off-by: Martin Wilck <mwilck(a)suse.com>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Chao Leng <lengchao(a)huawei.com>
Cc: stable(a)vger.kernel.org
----
Changes in v4: Updated commit message with Chao Leng's analysis, as
suggested by Daniel Wagner.
Changes in v3: Changed the subject line, as suggested by Sagi Grimberg
Changes in v2: Moved call to nvme_mpath_stop() further down, directly before
the call of nvme_rdma_reconnect_or_remove() (Chao Leng)
---
drivers/nvme/host/multipath.c | 1 +
drivers/nvme/host/rdma.c | 1 +
drivers/nvme/host/tcp.c | 1 +
3 files changed, 3 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index a1d476e1ac02..c63dd5dfa7ff 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -586,6 +586,7 @@ void nvme_mpath_stop(struct nvme_ctrl *ctrl)
del_timer_sync(&ctrl->anatt_timer);
cancel_work_sync(&ctrl->ana_work);
}
+EXPORT_SYMBOL_GPL(nvme_mpath_stop);
#define SUBSYS_ATTR_RW(_name, _mode, _show, _store) \
struct device_attribute subsys_attr_##_name = \
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index be905d4fdb47..fc07a7b0dc1d 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -1202,6 +1202,7 @@ static void nvme_rdma_error_recovery_work(struct work_struct *work)
return;
}
+ nvme_mpath_stop(&ctrl->ctrl);
nvme_rdma_reconnect_or_remove(ctrl);
}
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index a0f00cb8f9f3..46287b4f4d10 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -2068,6 +2068,7 @@ static void nvme_tcp_error_recovery_work(struct work_struct *work)
return;
}
+ nvme_mpath_stop(ctrl);
nvme_tcp_reconnect_or_remove(ctrl);
}
--
2.31.1
The problem was in calculate_skip() function.
int skip = calculate_skip(i_size_read(inode) >> msblk->block_log);
i_size_read(inode) and msblk->block_log are unsigned integers,
but calculate_skip had a signed int as argument. This cast led
to wrong skip value and then to divide by zero bug.
Fixes: 1701aecb6849 ("Squashfs: regular file operations")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+e8f781243ce16ac2f962(a)syzkaller.appspotmail.com
Signed-off-by: Pavel Skripkin <paskripkin(a)gmail.com>
---
fs/squashfs/file.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index 7b1128398976..2ebcbd4f84cc 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -44,8 +44,8 @@
* Locate cache slot in range [offset, index] for specified inode. If
* there's more than one return the slot closest to index.
*/
-static struct meta_index *locate_meta_index(struct inode *inode, int offset,
- int index)
+static struct meta_index *locate_meta_index(struct inode *inode, unsigned int offset,
+ unsigned int index)
{
struct meta_index *meta = NULL;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
@@ -83,8 +83,8 @@ static struct meta_index *locate_meta_index(struct inode *inode, int offset,
/*
* Find and initialise an empty cache slot for index offset.
*/
-static struct meta_index *empty_meta_index(struct inode *inode, int offset,
- int skip)
+static struct meta_index *empty_meta_index(struct inode *inode, unsigned int offset,
+ unsigned int skip)
{
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
struct meta_index *meta = NULL;
@@ -211,11 +211,11 @@ static long long read_indexes(struct super_block *sb, int n,
* If the skip factor is limited in this way then the file will use multiple
* slots.
*/
-static inline int calculate_skip(int blocks)
+static inline unsigned int calculate_skip(unsigned int blocks)
{
- int skip = blocks / ((SQUASHFS_META_ENTRIES + 1)
+ unsigned int skip = blocks / ((SQUASHFS_META_ENTRIES + 1)
* SQUASHFS_META_INDEXES);
- return min(SQUASHFS_CACHED_BLKS - 1, skip + 1);
+ return min((unsigned int) SQUASHFS_CACHED_BLKS - 1, skip + 1);
}
@@ -224,12 +224,12 @@ static inline int calculate_skip(int blocks)
* on-disk locations of the datablock and block list metadata block
* <index_block, index_offset> for index (scaled to nearest cache index).
*/
-static int fill_meta_index(struct inode *inode, int index,
+static int fill_meta_index(struct inode *inode, unsigned int index,
u64 *index_block, int *index_offset, u64 *data_block)
{
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
- int skip = calculate_skip(i_size_read(inode) >> msblk->block_log);
- int offset = 0;
+ unsigned int skip = calculate_skip(i_size_read(inode) >> msblk->block_log);
+ unsigned int offset = 0;
struct meta_index *meta;
struct meta_entry *meta_entry;
u64 cur_index_block = squashfs_i(inode)->block_list_start;
@@ -323,7 +323,7 @@ static int fill_meta_index(struct inode *inode, int index,
* Get the on-disk location and compressed size of the datablock
* specified by index. Fill_meta_index() does most of the work.
*/
-static int read_blocklist(struct inode *inode, int index, u64 *block)
+static int read_blocklist(struct inode *inode, unsigned int index, u64 *block)
{
u64 start;
long long blks;
@@ -448,7 +448,7 @@ static int squashfs_readpage(struct file *file, struct page *page)
{
struct inode *inode = page->mapping->host;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
- int index = page->index >> (msblk->block_log - PAGE_SHIFT);
+ unsigned int index = page->index >> (msblk->block_log - PAGE_SHIFT);
int file_end = i_size_read(inode) >> msblk->block_log;
int expected = index == file_end ?
(i_size_read(inode) & (msblk->block_size - 1)) :
--
2.31.1
Hi,
Please may we request that commit 6e6026f2dd20 ("igb: Enable RSS for
Intel I211 Ethernet Controller") be backported to the 5.4 and 5.10 LTS
kernels?
The Intel i211 Ethernet Controller supports 2 Receive Side Scaling
(RSS) queues, the patch corrects the issue that the i211 should not be
excluded from having this feature enabled.
Best regards,
Nick
---------- Forwarded message ---------
From: Jakub Kicinski <kuba(a)kernel.org>
Date: Mon, 3 May 2021 at 19:30
Subject: Re: [PATCH net] igb: Enable RSS for Intel I211 Ethernet Controller
To: Nick Lowe <nick.lowe(a)gmail.com>
Cc: Matt Corallo <linux-wired-list(a)bluematt.me>, Nguyen, Anthony L
<anthony.l.nguyen(a)intel.com>, netdev(a)vger.kernel.org
<netdev(a)vger.kernel.org>, davem(a)davemloft.net <davem(a)davemloft.net>,
Brandeburg, Jesse <jesse.brandeburg(a)intel.com>,
intel-wired-lan(a)lists.osuosl.org <intel-wired-lan(a)lists.osuosl.org>
On Mon, 3 May 2021 13:32:24 +0100 Nick Lowe wrote:
> Hi all,
>
> Now that the 5.12 kernel has released, please may we consider
> backporting commit 6e6026f2dd2005844fb35c3911e8083c09952c6c to both
> the 5.4 and 5.10 LTS kernels so that RSS starts to function with the
> i211?
No objections here. Please submit the backport request to stable@.
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opt…