From: Baokun Li <libaokun1(a)huawei.com>
[ Upstream commit 9e8e819f8f272c4e5dcd0bd6c7450e36481ed139 ]
When setting values of type unsigned int through sysfs, we parse them
with kstrtoul() and then truncate the result to produce the final stored
value. When the written value is greater than UINT_MAX, the stored value
therefore does not match what we read back, because of the truncation.
For example:
$ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
$ cat /sys/fs/ext4/sda/mb_max_linear_groups
0
So use kstrtouint() to parse attributes of the attr_pointer_ui type and
avoid the inconsistency described above. In addition, a check is added
to avoid setting s_resv_clusters to a negative value.
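For illustration, a minimal userspace sketch of the truncation (assuming
a 64-bit unsigned long, as on the affected systems; this is not the
kernel code itself):

	#include <stdio.h>

	int main(void)
	{
		/* what kstrtoul() would hand back for "4294967296" */
		unsigned long t = 4294967296UL;
		/* the old code then stored it into a 32-bit field */
		unsigned int mb_max_linear_groups = t;	/* silent truncation */

		printf("%u\n", mb_max_linear_groups);	/* prints 0 */
		return 0;
	}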
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20240319113325.3110393-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Stable-dep-of: 13df4d44a3aa ("ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()")
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/ext4/sysfs.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 6d332dff79dd..ca820620b974 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
int ret;
ret = kstrtoull(skip_spaces(buf), 0, &val);
- if (ret || val >= clusters)
+ if (ret || val >= clusters || (s64)val < 0)
return -EINVAL;
atomic64_set(&sbi->s_resv_clusters, val);
@@ -451,7 +451,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
s_kobj);
struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
void *ptr = calc_ptr(a, sbi);
- unsigned long t;
+ unsigned int t;
+ unsigned long lt;
int ret;
switch (a->attr_id) {
@@ -460,7 +461,7 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
case attr_pointer_ui:
if (!ptr)
return 0;
- ret = kstrtoul(skip_spaces(buf), 0, &t);
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
if (ret)
return ret;
if (a->attr_ptr == ptr_ext4_super_block_offset)
@@ -471,10 +472,10 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
case attr_pointer_ul:
if (!ptr)
return 0;
- ret = kstrtoul(skip_spaces(buf), 0, &t);
+ ret = kstrtoul(skip_spaces(buf), 0, &lt);
if (ret)
return ret;
- *((unsigned long *) ptr) = t;
+ *((unsigned long *) ptr) = lt;
return len;
case attr_inode_readahead:
return inode_readahead_blks_store(sbi, buf, len);
--
2.39.2
Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
fully fix the potential null pointer dereference issue. While
gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
and gs_start_tx() release the lock during endpoint request submission.
This creates a window where gs_close() can set port->port.tty to NULL,
leading to a dereference once the lock is reacquired.
This patch adds a NULL pointer check for port->port.tty after RX/TX
submission, and removes the initial NULL pointer check in gs_start_io(),
since the caller must hold port_lock and guarantee non-NULL port_usb and
port.tty.
Fixes: ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in gs_start_io")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kuen-Han Tsai <khtsai(a)google.com>
---
Explanation:
CPU1:                                CPU2:
gserial_connect()  // lock
                                     gs_close()  // await lock
gs_start_rx()      // unlock
usb_ep_queue()
                                     gs_close()  // lock, reset port.tty, unlock
gs_start_rx()      // lock
tty_wakeup()       // dereference
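The window exists because the RX/TX helpers drop port_lock around
endpoint submission. An abridged sketch of that pattern (names from the
driver, body simplified, not verbatim):

	static unsigned gs_start_rx_sketch(struct gs_port *port)
	{
		unsigned started = 0;

		while (!list_empty(&port->read_pool)) {
			struct usb_request *req = list_first_entry(&port->read_pool,
							struct usb_request, list);

			list_del(&req->list);
			spin_unlock(&port->port_lock);		/* window opens */
			if (usb_ep_queue(port->port_usb->out, req, GFP_ATOMIC) == 0)
				started++;
			spin_lock(&port->port_lock);		/* window closes */
			/* gs_close() may have run and cleared port->port.tty */
		}
		return started;
	}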
Stack traces:
[ 51.494375][ T278] ttyGS1: shutdown
[ 51.494817][ T269] android_work: sent uevent USB_STATE=DISCONNECTED
[ 52.115792][ T1508] usb: [dm_bind] generic ttyGS1: super speed IN/ep1in OUT/ep1out
[ 52.516288][ T1026] android_work: sent uevent USB_STATE=CONNECTED
[ 52.551667][ T1533] gserial_connect: start ttyGS1
[ 52.565634][ T1533] [khtsai] enter gs_start_io, ttyGS1, port->port.tty=0000000046bd4060
[ 52.565671][ T1533] [khtsai] gs_start_rx, unlock port ttyGS1
[ 52.591552][ T1533] [khtsai] gs_start_rx, lock port ttyGS1
[ 52.619901][ T1533] [khtsai] gs_start_rx, unlock port ttyGS1
[ 52.638659][ T1325] [khtsai] gs_close, lock port ttyGS1
[ 52.656842][ T1325] gs_close: ttyGS1 (0000000046bd4060,00000000be9750a5) ...
[ 52.683005][ T1325] [khtsai] gs_close, clear ttyGS1
[ 52.683007][ T1325] gs_close: ttyGS1 (0000000046bd4060,00000000be9750a5) done!
[ 52.708643][ T1325] [khtsai] gs_close, unlock port ttyGS1
[ 52.747592][ T1533] [khtsai] gs_start_rx, lock port ttyGS1
[ 52.747616][ T1533] [khtsai] gs_start_io, ttyGS1, going to call tty_wakeup(), port->port.tty=0000000000000000
[ 52.747629][ T1533] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001f8
---
drivers/usb/gadget/function/u_serial.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index a92eb6d90976..2f1890c8f473 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -539,20 +539,16 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
static int gs_start_io(struct gs_port *port)
{
struct list_head *head = &port->read_pool;
- struct usb_ep *ep;
+ struct usb_ep *ep = port->port_usb->out;
int status;
unsigned started;
- if (!port->port_usb || !port->port.tty)
- return -EIO;
-
/* Allocate RX and TX I/O buffers. We can't easily do this much
* earlier (with GFP_KERNEL) because the requests are coupled to
* endpoints, as are the packet sizes we'll be using. Different
* configurations may use different endpoints with a given port;
* and high speed vs full speed changes packet sizes too.
*/
- ep = port->port_usb->out;
status = gs_alloc_requests(ep, head, gs_read_complete,
&port->read_allocated);
if (status)
@@ -569,12 +565,22 @@ static int gs_start_io(struct gs_port *port)
port->n_read = 0;
started = gs_start_rx(port);
+ /*
+ * The TTY may be set to NULL by gs_close() after gs_start_rx() or
+ * gs_start_tx() release locks for endpoint request submission.
+ */
+ if (!port->port.tty)
+ goto out;
+
if (started) {
gs_start_tx(port);
/* Unblock any pending writes into our circular buffer, in case
* we didn't in gs_start_tx() */
+ if (!port->port.tty)
+ goto out;
tty_wakeup(port->port.tty);
} else {
+out:
gs_free_requests(ep, head, &port->read_allocated);
gs_free_requests(port->port_usb->in, &port->write_pool,
&port->write_allocated);
--
2.43.0.275.g3460e3d667-goog
Since the commit below, there are regressions for legacy setups:
1/ conntracks are created while there is no listener
2/ a listener starts and dumps all conntracks to get the current state
3/ conntracks deleted before the listener started are not advertised
This is problematic in containers, where conntracks could be created
early. This sysctl belongs to the 'unsafe' sysctl set and cannot easily
be changed in some environments.
Let's switch back to the legacy behavior.
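For reference, the mode semantics being changed can be sketched like
this (simplified; the helper and its parameters are illustrative, not
exact upstream identifiers):

	/* Illustrative sketch, not the upstream code. */
	static bool ct_events_extension_wanted(int sysctl_events, bool has_listener)
	{
		switch (sysctl_events) {
		case 0:		/* disabled: never allocate the ecache extension */
			return false;
		case 1:		/* enabled (restored default): always allocate */
			return true;
		case 2:		/* auto: allocate only once a listener exists */
			return has_listener;
		}
		return false;
	}

With the default back to 1, conntracks created before the first listener
starts still get the extension, so their events are not lost.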
CC: stable(a)vger.kernel.org
Fixes: 90d1daa45849 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
---
Documentation/networking/nf_conntrack-sysctl.rst | 10 ++++++----
net/netfilter/nf_conntrack_ecache.c | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index c383a394c665..edc04f99e1aa 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -34,13 +34,15 @@ nf_conntrack_count - INTEGER (read-only)
nf_conntrack_events - BOOLEAN
- 0 - disabled
- - 1 - enabled
- - 2 - auto (default)
+ - 1 - enabled (default)
+ - 2 - auto
If this option is enabled, the connection tracking code will
provide userspace with connection tracking events via ctnetlink.
- The default allocates the extension if a userspace program is
- listening to ctnetlink events.
+ The 'auto' allocates the extension if a userspace program is
+ listening to ctnetlink events. Note that conntracks created
+ before the first listener has started won't trigger any netlink
+ event.
nf_conntrack_expect_max - INTEGER
Maximum size of expectation table. Default value is
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 69948e1d6974..4c8559529e18 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -334,7 +334,7 @@ bool nf_ct_ecache_ext_add(struct nf_conn *ct, u16 ctmask, u16 expmask, gfp_t gfp
}
EXPORT_SYMBOL_GPL(nf_ct_ecache_ext_add);
-#define NF_CT_EVENTS_DEFAULT 2
+#define NF_CT_EVENTS_DEFAULT 1
static int nf_ct_events __read_mostly = NF_CT_EVENTS_DEFAULT;
void nf_conntrack_ecache_pernet_init(struct net *net)
--
2.43.1
There's no reason to keep jbd2_journal_get_max_txn_bufs() as a public
function. All current users are internal and can use
journal->j_max_transaction_buffers instead. As a bonus, this saves some
unnecessary recomputations of the limit, which becomes important as the
function grows more complex in the following patch.
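The cached value and the helper compute the same thing; a worked example
with hypothetical journal geometry:

	/* Hypothetical numbers, not from the patch: a 32768-block journal
	 * reserving 256 blocks for fast commits allows
	 * (32768 - 256) / 4 = 8128 buffers per transaction.
	 */
	static int max_txn_bufs(int j_total_len, int j_fc_wbufsize)
	{
		return (j_total_len - j_fc_wbufsize) / 4;
	}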
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/jbd2/commit.c | 2 +-
fs/jbd2/journal.c | 5 +++++
include/linux/jbd2.h | 5 -----
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 75ea4e9a5cab..e7fc912693bd 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -766,7 +766,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (first_block < journal->j_tail)
freed += journal->j_last - journal->j_first;
/* Update tail only if we free significant amount of space */
- if (freed < jbd2_journal_get_max_txn_bufs(journal))
+ if (freed < journal->j_max_transaction_buffers)
update_tail = 0;
}
J_ASSERT(commit_transaction->t_state == T_COMMIT);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 03c4b9214f56..1bb73750d307 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1698,6 +1698,11 @@ journal_t *jbd2_journal_init_inode(struct inode *inode)
return journal;
}
+static int jbd2_journal_get_max_txn_bufs(journal_t *journal)
+{
+ return (journal->j_total_len - journal->j_fc_wbufsize) / 4;
+}
+
/*
* Given a journal_t structure, initialise the various fields for
* startup of a new journaling session. We use this both when creating
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index ab04c1c27fae..f91b930abe20 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1660,11 +1660,6 @@ int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
int jbd2_fc_release_bufs(journal_t *journal);
-static inline int jbd2_journal_get_max_txn_bufs(journal_t *journal)
-{
- return (journal->j_total_len - journal->j_fc_wbufsize) / 4;
-}
-
/*
* is_journal_abort
*
--
2.35.3
From: Martin Wilck <martin.wilck(a)suse.com>
[ Upstream commit 10157b1fc1a762293381e9145041253420dfc6ad ]
When a host is configured with a few LUNs and I/O is running, injecting FC
faults repeatedly leads to path recovery problems. The LUNs have 4 paths
each, and after an FC fault that takes 2 of the paths down, only 3 of
them come back active instead of all 4. This happens after several
iterations of continuous FC faults.
The reason is that we return an I/O error whenever we encounter sense
code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS STATE
TRANSITION) instead of retrying.
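For context, 06/04/0a decodes as sense key 0x06 (UNIT ATTENTION) with
ASC/ASCQ 0x04/0x0a; the same ASC/ASCQ pair is also reported under
NOT READY (key 0x02). A condition equivalent to what the patch retries
(illustrative helper, not part of the patch):

	static bool alua_sense_is_state_transition(const struct scsi_sense_hdr *h)
	{
		return (h->sense_key == NOT_READY ||
			h->sense_key == UNIT_ATTENTION) &&
		       h->asc == 0x04 && h->ascq == 0x0a;
	}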
[mwilck: The original patch was developed by Rajashekhar M A and Hannes
Reinecke. I moved the code to alua_check_sense() as suggested by Mike
Christie [1]. Evan Milne had raised the question whether pg->state should
be set to transitioning in the UA case [2]. I believe that doing this is
correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O
errors. Our handler schedules an RTPG, which will only result in an I/O
error condition if the transitioning timeout expires.]
[1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
[2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVE…
Co-developed-by: Rajashekhar M A <rajs(a)netapp.com>
Co-developed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Martin Wilck <martin.wilck(a)suse.com>
Link: https://lore.kernel.org/r/20240514140344.19538-1-mwilck@suse.com
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Mike Christie <michael.christie(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 31 +++++++++++++++-------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 0781f991e7845..f5fc8631883d5 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -406,28 +406,40 @@ static char print_alua_state(unsigned char state)
}
}
-static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
- struct scsi_sense_hdr *sense_hdr)
+static void alua_handle_state_transition(struct scsi_device *sdev)
{
struct alua_dh_data *h = sdev->handler_data;
struct alua_port_group *pg;
+ rcu_read_lock();
+ pg = rcu_dereference(h->pg);
+ if (pg)
+ pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+ rcu_read_unlock();
+ alua_check(sdev, false);
+}
+
+static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
+ struct scsi_sense_hdr *sense_hdr)
+{
switch (sense_hdr->sense_key) {
case NOT_READY:
if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
/*
* LUN Not Accessible - ALUA state transition
*/
- rcu_read_lock();
- pg = rcu_dereference(h->pg);
- if (pg)
- pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
- rcu_read_unlock();
- alua_check(sdev, false);
+ alua_handle_state_transition(sdev);
return NEEDS_RETRY;
}
break;
case UNIT_ATTENTION:
+ if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
+ /*
+ * LUN Not Accessible - ALUA state transition
+ */
+ alua_handle_state_transition(sdev);
+ return NEEDS_RETRY;
+ }
if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
/*
* Power On, Reset, or Bus Device Reset.
@@ -494,7 +506,8 @@ static int alua_tur(struct scsi_device *sdev)
retval = scsi_test_unit_ready(sdev, ALUA_FAILOVER_TIMEOUT * HZ,
ALUA_FAILOVER_RETRIES, &sense_hdr);
- if (sense_hdr.sense_key == NOT_READY &&
+ if ((sense_hdr.sense_key == NOT_READY ||
+ sense_hdr.sense_key == UNIT_ATTENTION) &&
sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
return SCSI_DH_RETRY;
else if (retval)
--
2.43.0
If a power domain supplying logic is allowed to transition from a
level (L) --> power collapse (0) --> retention (1), or vice versa from
retention (1) --> power collapse (0) --> level (L), the logic loses its
configuration; the ARC does not support the retention-to-collapse
transition on MxC rails.
On targets from SM8450 onwards, the PLL logic of the clock controllers
is connected to the MxC rails, and the recommended configuration is
programmed during clock controller probe. The MxC transitions described
above must therefore be skipped to ensure the PLL settings stay intact
across clock controller power on & off.
Older targets do not split MX into MxA and MxC; they never collapse the
logic and always park it at RETENTION, so this issue is not observed
there.
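A minimal userspace sketch of the resulting level mapping (hypothetical
cmd-db table; RPMH_REGULATOR_LEVEL_RETENTION is 16 in the dt-bindings
header):

	#include <stdio.h>

	#define RPMH_REGULATOR_LEVEL_RETENTION	16

	int main(void)
	{
		/* hypothetical level table as read from cmd-db for an MxC domain */
		unsigned int buf[] = { 16, 48, 64, 128, 256 };
		unsigned int level[5] = { 0 };
		int i;

		for (i = 0; i < 5; i++) {
			if (buf[i] == RPMH_REGULATOR_LEVEL_RETENTION)
				continue;	/* as in the patch: RETENTION is never offered */
			level[i] = buf[i];
		}

		for (i = 0; i < 5; i++)
			printf("level[%d] = %u\n", i, level[i]);
		return 0;
	}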
Cc: stable(a)vger.kernel.org # v5.17
Reviewed-by: Bjorn Andersson <andersson(a)kernel.org>
Signed-off-by: Taniya Das <quic_tdas(a)quicinc.com>
---
[Changes in v2]: Incorporate the comments in the commit text.
---
drivers/pmdomain/qcom/rpmhpd.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/pmdomain/qcom/rpmhpd.c b/drivers/pmdomain/qcom/rpmhpd.c
index de9121ef4216..d2cb4271a1ca 100644
--- a/drivers/pmdomain/qcom/rpmhpd.c
+++ b/drivers/pmdomain/qcom/rpmhpd.c
@@ -40,6 +40,7 @@
* @addr: Resource address as looped up using resource name from
* cmd-db
* @state_synced: Indicator that sync_state has been invoked for the rpmhpd resource
+ * @skip_retention_level: Indicate that retention level should not be used for the power domain
*/
struct rpmhpd {
struct device *dev;
@@ -56,6 +57,7 @@ struct rpmhpd {
const char *res_name;
u32 addr;
bool state_synced;
+ bool skip_retention_level;
};
struct rpmhpd_desc {
@@ -173,6 +175,7 @@ static struct rpmhpd mxc = {
.pd = { .name = "mxc", },
.peer = &mxc_ao,
.res_name = "mxc.lvl",
+ .skip_retention_level = true,
};
static struct rpmhpd mxc_ao = {
@@ -180,6 +183,7 @@ static struct rpmhpd mxc_ao = {
.active_only = true,
.peer = &mxc,
.res_name = "mxc.lvl",
+ .skip_retention_level = true,
};
static struct rpmhpd nsp = {
@@ -819,6 +823,9 @@ static int rpmhpd_update_level_mapping(struct rpmhpd *rpmhpd)
return -EINVAL;
for (i = 0; i < rpmhpd->level_count; i++) {
+ if (rpmhpd->skip_retention_level && buf[i] == RPMH_REGULATOR_LEVEL_RETENTION)
+ continue;
+
rpmhpd->level[i] = buf[i];
/* Remember the first corner with non-zero level */
---
base-commit: 62c97045b8f720c2eac807a5f38e26c9ed512371
change-id: 20240625-avoid_mxc_retention-b095a761d981
Best regards,
--
Taniya Das <quic_tdas(a)quicinc.com>
From: Nilay Shroff <nilay(a)linux.ibm.com>
[ Upstream commit d3a043733f25d743f3aa617c7f82dbcb5ee2211a ]
In the current native multipath design, when a shared namespace is
created we loop through each possible NUMA node, calculate the NUMA
distance of that node from each nvme controller, and then cache the
optimal IO path for future use when sending IO. The issue with this
design is that we may consult the NUMA distance table for an offline
node, which may not be populated at the time, and so inadvertently find
and cache a non-optimal IO path. Later, when the corresponding NUMA node
comes online and its NUMA distance table entry is created, we should
ideally re-calculate the multipath node distance for the newly added
node; however, that doesn't happen unless we rescan/reset the
controller. So essentially we may keep using a non-optimal IO path for a
node which came online after the namespace was created.
This patch fixes the issue by ensuring that when a shared namespace is
created, we calculate the multipath node distance for each online NUMA
node instead of each possible NUMA node. Later, when a node comes online
and we receive IO on it, we calculate the multipath node distance for
the newly added node; by then its NUMA distance table entry has been
populated, so we can correctly calculate the distance and choose the
optimal path for the IO.
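For context, the IO-side fallback needs no change: a node without a
cached path gets one computed on first IO, by which time its NUMA
distance table entry exists. An assumed-shape sketch (simplified, not
the exact upstream code):

	static struct nvme_ns *nvme_find_path_sketch(struct nvme_ns_head *head)
	{
		int node = numa_node_id();
		struct nvme_ns *ns;

		ns = srcu_dereference(head->current_path[node], &head->srcu);
		if (unlikely(!ns))
			ns = __nvme_find_path(head, node);	/* lazy per-node compute */
		return ns;
	}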
Signed-off-by: Nilay Shroff <nilay(a)linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/nvme/host/multipath.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 0a88d7bdc5e37..6a444ce273366 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -592,7 +592,7 @@ static void nvme_mpath_set_live(struct nvme_ns *ns)
int node, srcu_idx;
srcu_idx = srcu_read_lock(&head->srcu);
- for_each_node(node)
+ for_each_online_node(node)
__nvme_find_path(head, node);
srcu_read_unlock(&head->srcu, srcu_idx);
}
--
2.43.0