From: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
The qib driver load has been failing with the
following message:
[ 21.186138] sysfs: cannot create duplicate filename '/devices/pci0000:80/0000:80:02.0/0000:81:00.0/infiniband/qib0/ports/1/linkcontrol'
The patch below has two "linkcontrol" names causing the duplication.
Fix by using the correct "diag_counters" name on the second instance.
Fixes: 4a7aaf88c89f ("RDMA/qib: Use attributes for the port sysfs")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro(a)cornelisnetworks.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn(a)cornelisnetworks.com>
---
drivers/infiniband/hw/qib/qib_sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/qib/qib_sysfs.c b/drivers/infiniband/hw/qib/qib_sysfs.c
index 0a3b281..41c2729 100644
--- a/drivers/infiniband/hw/qib/qib_sysfs.c
+++ b/drivers/infiniband/hw/qib/qib_sysfs.c
@@ -541,7 +541,7 @@ static ssize_t rc_delayed_comp_store(struct ib_device *ibdev, u32 port_num,
};
static const struct attribute_group port_diagc_group = {
- .name = "linkcontrol",
+ .name = "diag_counters",
.attrs = port_diagc_attributes,
};
--
1.8.3.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c832962ac972082b3a1f89775c9d4274c8cb5670 Mon Sep 17 00:00:00 2001
From: Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
Date: Tue, 15 Feb 2022 18:53:03 +0200
Subject: [PATCH] net: bridge: multicast: notify switchdev driver whenever MC
processing gets disabled
Whenever bridge driver hits the max capacity of MDBs, it disables
the MC processing (by setting corresponding bridge option), but never
notifies switchdev about such change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).
This could lead to situation when Software MDB processing gets disabled,
but this event never gets offloaded to the underlying Hardware.
Fix this by adding a notify message in such case.
Fixes: 147c1e9b902c ("switchdev: bridge: Offload multicast disabled")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
Acked-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index de2409889489..db4f2641d1cd 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -82,6 +82,9 @@ static void br_multicast_find_del_pg(struct net_bridge *br,
struct net_bridge_port_group *pg);
static void __br_multicast_stop(struct net_bridge_mcast *brmctx);
+static int br_mc_disabled_update(struct net_device *dev, bool value,
+ struct netlink_ext_ack *extack);
+
static struct net_bridge_port_group *
br_sg_port_find(struct net_bridge *br,
struct net_bridge_port_group_sg_key *sg_p)
@@ -1156,6 +1159,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
return mp;
if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+ br_mc_disabled_update(br->dev, false, NULL);
br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
return ERR_PTR(-E2BIG);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c832962ac972082b3a1f89775c9d4274c8cb5670 Mon Sep 17 00:00:00 2001
From: Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
Date: Tue, 15 Feb 2022 18:53:03 +0200
Subject: [PATCH] net: bridge: multicast: notify switchdev driver whenever MC
processing gets disabled
Whenever bridge driver hits the max capacity of MDBs, it disables
the MC processing (by setting corresponding bridge option), but never
notifies switchdev about such change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).
This could lead to situation when Software MDB processing gets disabled,
but this event never gets offloaded to the underlying Hardware.
Fix this by adding a notify message in such case.
Fixes: 147c1e9b902c ("switchdev: bridge: Offload multicast disabled")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
Acked-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index de2409889489..db4f2641d1cd 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -82,6 +82,9 @@ static void br_multicast_find_del_pg(struct net_bridge *br,
struct net_bridge_port_group *pg);
static void __br_multicast_stop(struct net_bridge_mcast *brmctx);
+static int br_mc_disabled_update(struct net_device *dev, bool value,
+ struct netlink_ext_ack *extack);
+
static struct net_bridge_port_group *
br_sg_port_find(struct net_bridge *br,
struct net_bridge_port_group_sg_key *sg_p)
@@ -1156,6 +1159,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
return mp;
if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+ br_mc_disabled_update(br->dev, false, NULL);
br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
return ERR_PTR(-E2BIG);
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c832962ac972082b3a1f89775c9d4274c8cb5670 Mon Sep 17 00:00:00 2001
From: Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
Date: Tue, 15 Feb 2022 18:53:03 +0200
Subject: [PATCH] net: bridge: multicast: notify switchdev driver whenever MC
processing gets disabled
Whenever bridge driver hits the max capacity of MDBs, it disables
the MC processing (by setting corresponding bridge option), but never
notifies switchdev about such change (the notifiers are called only upon
explicit setting of this option, through the registered netlink interface).
This could lead to situation when Software MDB processing gets disabled,
but this event never gets offloaded to the underlying Hardware.
Fix this by adding a notify message in such case.
Fixes: 147c1e9b902c ("switchdev: bridge: Offload multicast disabled")
Signed-off-by: Oleksandr Mazur <oleksandr.mazur(a)plvision.eu>
Acked-by: Nikolay Aleksandrov <nikolay(a)nvidia.com>
Link: https://lore.kernel.org/r/20220215165303.31908-1-oleksandr.mazur@plvision.eu
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/bridge/br_multicast.c b/net/bridge/br_multicast.c
index de2409889489..db4f2641d1cd 100644
--- a/net/bridge/br_multicast.c
+++ b/net/bridge/br_multicast.c
@@ -82,6 +82,9 @@ static void br_multicast_find_del_pg(struct net_bridge *br,
struct net_bridge_port_group *pg);
static void __br_multicast_stop(struct net_bridge_mcast *brmctx);
+static int br_mc_disabled_update(struct net_device *dev, bool value,
+ struct netlink_ext_ack *extack);
+
static struct net_bridge_port_group *
br_sg_port_find(struct net_bridge *br,
struct net_bridge_port_group_sg_key *sg_p)
@@ -1156,6 +1159,7 @@ struct net_bridge_mdb_entry *br_multicast_new_group(struct net_bridge *br,
return mp;
if (atomic_read(&br->mdb_hash_tbl.nelems) >= br->hash_max) {
+ br_mc_disabled_update(br->dev, false, NULL);
br_opt_toggle(br, BROPT_MULTICAST_ENABLED, false);
return ERR_PTR(-E2BIG);
}
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 86006f996346e8a5a1ea80637ec949ceeea4ecbc Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Fri, 11 Feb 2022 09:14:18 -0800
Subject: [PATCH] ice: enable parsing IPSEC SPI headers for RSS
The COMMS package can enable the hardware parser to recognize IPSEC
frames with ESP header and SPI identifier. If this package is available
and configured for loading in /lib/firmware, then the driver will
succeed in enabling this protocol type for RSS.
This in turn allows the hardware to hash over the SPI and use it to pick
a consistent receive queue for the same secure flow. Without this all
traffic is steered to the same queue for multiple traffic threads from
the same IP address. For that reason this is marked as a fix, as the
driver supports the model, but it wasn't enabled.
If the package is not available, adding this type will fail, but the
failure is ignored on purpose as it has no negative affect.
Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 0c187cf04fcf..53256aca27c7 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1684,6 +1684,12 @@ static void ice_vsi_set_rss_flow_fld(struct ice_vsi *vsi)
if (status)
dev_dbg(dev, "ice_add_rss_cfg failed for sctp6 flow, vsi = %d, error = %d\n",
vsi_num, status);
+
+ status = ice_add_rss_cfg(hw, vsi_handle, ICE_FLOW_HASH_ESP_SPI,
+ ICE_FLOW_SEG_HDR_ESP);
+ if (status)
+ dev_dbg(dev, "ice_add_rss_cfg failed for esp/spi flow, vsi = %d, error = %d\n",
+ vsi_num, status);
}
/**
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 86006f996346e8a5a1ea80637ec949ceeea4ecbc Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Fri, 11 Feb 2022 09:14:18 -0800
Subject: [PATCH] ice: enable parsing IPSEC SPI headers for RSS
The COMMS package can enable the hardware parser to recognize IPSEC
frames with ESP header and SPI identifier. If this package is available
and configured for loading in /lib/firmware, then the driver will
succeed in enabling this protocol type for RSS.
This in turn allows the hardware to hash over the SPI and use it to pick
a consistent receive queue for the same secure flow. Without this all
traffic is steered to the same queue for multiple traffic threads from
the same IP address. For that reason this is marked as a fix, as the
driver supports the model, but it wasn't enabled.
If the package is not available, adding this type will fail, but the
failure is ignored on purpose as it has no negative affect.
Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 0c187cf04fcf..53256aca27c7 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1684,6 +1684,12 @@ static void ice_vsi_set_rss_flow_fld(struct ice_vsi *vsi)
if (status)
dev_dbg(dev, "ice_add_rss_cfg failed for sctp6 flow, vsi = %d, error = %d\n",
vsi_num, status);
+
+ status = ice_add_rss_cfg(hw, vsi_handle, ICE_FLOW_HASH_ESP_SPI,
+ ICE_FLOW_SEG_HDR_ESP);
+ if (status)
+ dev_dbg(dev, "ice_add_rss_cfg failed for esp/spi flow, vsi = %d, error = %d\n",
+ vsi_num, status);
}
/**
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 86006f996346e8a5a1ea80637ec949ceeea4ecbc Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Fri, 11 Feb 2022 09:14:18 -0800
Subject: [PATCH] ice: enable parsing IPSEC SPI headers for RSS
The COMMS package can enable the hardware parser to recognize IPSEC
frames with ESP header and SPI identifier. If this package is available
and configured for loading in /lib/firmware, then the driver will
succeed in enabling this protocol type for RSS.
This in turn allows the hardware to hash over the SPI and use it to pick
a consistent receive queue for the same secure flow. Without this all
traffic is steered to the same queue for multiple traffic threads from
the same IP address. For that reason this is marked as a fix, as the
driver supports the model, but it wasn't enabled.
If the package is not available, adding this type will fail, but the
failure is ignored on purpose as it has no negative affect.
Fixes: c90ed40cefe1 ("ice: Enable writing hardware filtering tables")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Gurucharan G <gurucharanx.g(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 0c187cf04fcf..53256aca27c7 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1684,6 +1684,12 @@ static void ice_vsi_set_rss_flow_fld(struct ice_vsi *vsi)
if (status)
dev_dbg(dev, "ice_add_rss_cfg failed for sctp6 flow, vsi = %d, error = %d\n",
vsi_num, status);
+
+ status = ice_add_rss_cfg(hw, vsi_handle, ICE_FLOW_HASH_ESP_SPI,
+ ICE_FLOW_SEG_HDR_ESP);
+ if (status)
+ dev_dbg(dev, "ice_add_rss_cfg failed for esp/spi flow, vsi = %d, error = %d\n",
+ vsi_num, status);
}
/**
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab75cec1e461f8a35559054c146c21428430b8 Mon Sep 17 00:00:00 2001
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Date: Wed, 16 Feb 2022 22:18:08 +0800
Subject: [PATCH] bonding: force carrier update when releasing slave
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0
Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh(a)canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 238b56d77c36..aebeb46e6fa6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2379,10 +2379,9 @@ static int __bond_release_one(struct net_device *bond_dev,
bond_select_active_slave(bond);
}
- if (!bond_has_slaves(bond)) {
- bond_set_carrier(bond);
+ bond_set_carrier(bond);
+ if (!bond_has_slaves(bond))
eth_hw_addr_random(bond_dev);
- }
unblock_netpoll_tx();
synchronize_rcu();
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab75cec1e461f8a35559054c146c21428430b8 Mon Sep 17 00:00:00 2001
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Date: Wed, 16 Feb 2022 22:18:08 +0800
Subject: [PATCH] bonding: force carrier update when releasing slave
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0
Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh(a)canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 238b56d77c36..aebeb46e6fa6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2379,10 +2379,9 @@ static int __bond_release_one(struct net_device *bond_dev,
bond_select_active_slave(bond);
}
- if (!bond_has_slaves(bond)) {
- bond_set_carrier(bond);
+ bond_set_carrier(bond);
+ if (!bond_has_slaves(bond))
eth_hw_addr_random(bond_dev);
- }
unblock_netpoll_tx();
synchronize_rcu();
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a6ab75cec1e461f8a35559054c146c21428430b8 Mon Sep 17 00:00:00 2001
From: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Date: Wed, 16 Feb 2022 22:18:08 +0800
Subject: [PATCH] bonding: force carrier update when releasing slave
In __bond_release_one(), bond_set_carrier() is only called when bond
device has no slave. Therefore, if we remove the up slave from a master
with two slaves and keep the down slave, the master will remain up.
Fix this by moving bond_set_carrier() out of if (!bond_has_slaves(bond))
statement.
Reproducer:
$ insmod bonding.ko mode=0 miimon=100 max_bonds=2
$ ifconfig bond0 up
$ ifenslave bond0 eth0 eth1
$ ifconfig eth0 down
$ ifenslave -d bond0 eth1
$ cat /proc/net/bonding/bond0
Fixes: ff59c4563a8d ("[PATCH] bonding: support carrier state for master")
Signed-off-by: Zhang Changzhong <zhangchangzhong(a)huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh(a)canonical.com>
Link: https://lore.kernel.org/r/1645021088-38370-1-git-send-email-zhangchangzhong…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 238b56d77c36..aebeb46e6fa6 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2379,10 +2379,9 @@ static int __bond_release_one(struct net_device *bond_dev,
bond_select_active_slave(bond);
}
- if (!bond_has_slaves(bond)) {
- bond_set_carrier(bond);
+ bond_set_carrier(bond);
+ if (!bond_has_slaves(bond))
eth_hw_addr_random(bond_dev);
- }
unblock_netpoll_tx();
synchronize_rcu();
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35a79e64de29e8d57a5989aac57611c0cd29e13e Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin(a)gmail.com>
Date: Wed, 16 Feb 2022 00:20:52 -0500
Subject: [PATCH] ping: fix the dif and sdif check in ping_lookup
When 'ping' changes to use PING socket instead of RAW socket by:
# sysctl -w net.ipv4.ping_group_range="0 100"
There is another regression caused when matching sk_bound_dev_if
and dif, RAW socket is using inet_iif() while PING socket lookup
is using skb->dev->ifindex, the cmd below fails due to this:
# ip link add dummy0 type dummy
# ip link set dummy0 up
# ip addr add 192.168.111.1/24 dev dummy0
# ping -I dummy0 192.168.111.1 -c1
The issue was also reported on:
https://github.com/iputils/iputils/issues/104
But fixed in iputils in a wrong way by not binding to device when
destination IP is on device, and it will cause some of kselftests
to fail, as Jianlin noticed.
This patch is to use inet(6)_iif and inet(6)_sdif to get dif and
sdif for PING socket, and keep consistent with RAW socket.
Fixes: c319b4d76b9e ("net: ipv4: add IPPROTO_ICMP socket kind")
Reported-by: Jianlin Shi <jishi(a)redhat.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index bcf7bc71cb56..3a5994b50571 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -172,16 +172,23 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident)
struct sock *sk = NULL;
struct inet_sock *isk;
struct hlist_nulls_node *hnode;
- int dif = skb->dev->ifindex;
+ int dif, sdif;
if (skb->protocol == htons(ETH_P_IP)) {
+ dif = inet_iif(skb);
+ sdif = inet_sdif(skb);
pr_debug("try to find: num = %d, daddr = %pI4, dif = %d\n",
(int)ident, &ip_hdr(skb)->daddr, dif);
#if IS_ENABLED(CONFIG_IPV6)
} else if (skb->protocol == htons(ETH_P_IPV6)) {
+ dif = inet6_iif(skb);
+ sdif = inet6_sdif(skb);
pr_debug("try to find: num = %d, daddr = %pI6c, dif = %d\n",
(int)ident, &ipv6_hdr(skb)->daddr, dif);
#endif
+ } else {
+ pr_err("ping: protocol(%x) is not supported\n", ntohs(skb->protocol));
+ return NULL;
}
read_lock_bh(&ping_table.lock);
@@ -221,7 +228,7 @@ static struct sock *ping_lookup(struct net *net, struct sk_buff *skb, u16 ident)
}
if (sk->sk_bound_dev_if && sk->sk_bound_dev_if != dif &&
- sk->sk_bound_dev_if != inet_sdif(skb))
+ sk->sk_bound_dev_if != sdif)
continue;
sock_hold(sk);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 430065e2671905ac675f97b7af240cc255964e93 Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 16 Feb 2022 20:48:18 +0000
Subject: [PATCH] net: dsa: lan9303: add VLAN IDs to master device
If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received. Do this in the
port_enable() function, and remove them in port_disable().
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index c0c91440340a..0029d279616f 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -82,6 +82,7 @@ config NET_DSA_REALTEK_SMI
config NET_DSA_SMSC_LAN9303
tristate
+ depends on VLAN_8021Q || VLAN_8021Q=n
select NET_DSA_TAG_LAN9303
select REGMAP
help
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 873a5588171b..3969d89fa4db 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -10,6 +10,7 @@
#include <linux/mii.h>
#include <linux/phy.h>
#include <linux/if_bridge.h>
+#include <linux/if_vlan.h>
#include <linux/etherdevice.h>
#include "lan9303.h"
@@ -1083,21 +1084,27 @@ static void lan9303_adjust_link(struct dsa_switch *ds, int port,
static int lan9303_port_enable(struct dsa_switch *ds, int port,
struct phy_device *phy)
{
+ struct dsa_port *dp = dsa_to_port(ds, port);
struct lan9303 *chip = ds->priv;
- if (!dsa_is_user_port(ds, port))
+ if (!dsa_port_is_user(dp))
return 0;
+ vlan_vid_add(dp->cpu_dp->master, htons(ETH_P_8021Q), port);
+
return lan9303_enable_processing_port(chip, port);
}
static void lan9303_port_disable(struct dsa_switch *ds, int port)
{
+ struct dsa_port *dp = dsa_to_port(ds, port);
struct lan9303 *chip = ds->priv;
- if (!dsa_is_user_port(ds, port))
+ if (!dsa_port_is_user(dp))
return;
+ vlan_vid_del(dp->cpu_dp->master, htons(ETH_P_8021Q), port);
+
lan9303_disable_processing_port(chip, port);
lan9303_phy_write(ds, chip->phy_addr_base + port, MII_BMCR, BMCR_PDOWN);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 430065e2671905ac675f97b7af240cc255964e93 Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 16 Feb 2022 20:48:18 +0000
Subject: [PATCH] net: dsa: lan9303: add VLAN IDs to master device
If the master device does VLAN filtering, the IDs used by the switch
must be added for any frames to be received. Do this in the
port_enable() function, and remove them in port_disable().
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Florian Fainelli <f.fainelli(a)gmail.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://lore.kernel.org/r/20220216204818.28746-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index c0c91440340a..0029d279616f 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -82,6 +82,7 @@ config NET_DSA_REALTEK_SMI
config NET_DSA_SMSC_LAN9303
tristate
+ depends on VLAN_8021Q || VLAN_8021Q=n
select NET_DSA_TAG_LAN9303
select REGMAP
help
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 873a5588171b..3969d89fa4db 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -10,6 +10,7 @@
#include <linux/mii.h>
#include <linux/phy.h>
#include <linux/if_bridge.h>
+#include <linux/if_vlan.h>
#include <linux/etherdevice.h>
#include "lan9303.h"
@@ -1083,21 +1084,27 @@ static void lan9303_adjust_link(struct dsa_switch *ds, int port,
static int lan9303_port_enable(struct dsa_switch *ds, int port,
struct phy_device *phy)
{
+ struct dsa_port *dp = dsa_to_port(ds, port);
struct lan9303 *chip = ds->priv;
- if (!dsa_is_user_port(ds, port))
+ if (!dsa_port_is_user(dp))
return 0;
+ vlan_vid_add(dp->cpu_dp->master, htons(ETH_P_8021Q), port);
+
return lan9303_enable_processing_port(chip, port);
}
static void lan9303_port_disable(struct dsa_switch *ds, int port)
{
+ struct dsa_port *dp = dsa_to_port(ds, port);
struct lan9303 *chip = ds->priv;
- if (!dsa_is_user_port(ds, port))
+ if (!dsa_port_is_user(dp))
return;
+ vlan_vid_del(dp->cpu_dp->master, htons(ETH_P_8021Q), port);
+
lan9303_disable_processing_port(chip, port);
lan9303_phy_write(ds, chip->phy_addr_base + port, MII_BMCR, BMCR_PDOWN);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 017b355bbdc6620fd8fe05fe297f553ce9d855ee Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 16 Feb 2022 12:46:34 +0000
Subject: [PATCH] net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present. Otherwise,
use __skb_vlan_pop() like the other tag parsers do. This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index cb548188f813..98d7d7120bab 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -77,7 +77,6 @@ static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
{
- __be16 *lan9303_tag;
u16 lan9303_tag1;
unsigned int source_port;
@@ -87,14 +86,15 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- lan9303_tag = dsa_etype_header_pos_rx(skb);
-
- if (lan9303_tag[0] != htons(ETH_P_8021Q)) {
- dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid VLAN marker\n");
- return NULL;
+ if (skb_vlan_tag_present(skb)) {
+ lan9303_tag1 = skb_vlan_tag_get(skb);
+ __vlan_hwaccel_clear_tag(skb);
+ } else {
+ skb_push_rcsum(skb, ETH_HLEN);
+ __skb_vlan_pop(skb, &lan9303_tag1);
+ skb_pull_rcsum(skb, ETH_HLEN);
}
- lan9303_tag1 = ntohs(lan9303_tag[1]);
source_port = lan9303_tag1 & 0x3;
skb->dev = dsa_master_find_slave(dev, 0, source_port);
@@ -103,13 +103,6 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- /* remove the special VLAN tag between the MAC addresses
- * and the current ethertype field.
- */
- skb_pull_rcsum(skb, 2 + 2);
-
- dsa_strip_etype_header(skb, LAN9303_TAG_LEN);
-
if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
dsa_default_offload_fwd_mark(skb);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 017b355bbdc6620fd8fe05fe297f553ce9d855ee Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 16 Feb 2022 12:46:34 +0000
Subject: [PATCH] net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present. Otherwise,
use __skb_vlan_pop() like the other tag parsers do. This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index cb548188f813..98d7d7120bab 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -77,7 +77,6 @@ static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
{
- __be16 *lan9303_tag;
u16 lan9303_tag1;
unsigned int source_port;
@@ -87,14 +86,15 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- lan9303_tag = dsa_etype_header_pos_rx(skb);
-
- if (lan9303_tag[0] != htons(ETH_P_8021Q)) {
- dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid VLAN marker\n");
- return NULL;
+ if (skb_vlan_tag_present(skb)) {
+ lan9303_tag1 = skb_vlan_tag_get(skb);
+ __vlan_hwaccel_clear_tag(skb);
+ } else {
+ skb_push_rcsum(skb, ETH_HLEN);
+ __skb_vlan_pop(skb, &lan9303_tag1);
+ skb_pull_rcsum(skb, ETH_HLEN);
}
- lan9303_tag1 = ntohs(lan9303_tag[1]);
source_port = lan9303_tag1 & 0x3;
skb->dev = dsa_master_find_slave(dev, 0, source_port);
@@ -103,13 +103,6 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- /* remove the special VLAN tag between the MAC addresses
- * and the current ethertype field.
- */
- skb_pull_rcsum(skb, 2 + 2);
-
- dsa_strip_etype_header(skb, LAN9303_TAG_LEN);
-
if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
dsa_default_offload_fwd_mark(skb);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 017b355bbdc6620fd8fe05fe297f553ce9d855ee Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 16 Feb 2022 12:46:34 +0000
Subject: [PATCH] net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present. Otherwise,
use __skb_vlan_pop() like the other tag parsers do. This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index cb548188f813..98d7d7120bab 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -77,7 +77,6 @@ static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
{
- __be16 *lan9303_tag;
u16 lan9303_tag1;
unsigned int source_port;
@@ -87,14 +86,15 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- lan9303_tag = dsa_etype_header_pos_rx(skb);
-
- if (lan9303_tag[0] != htons(ETH_P_8021Q)) {
- dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid VLAN marker\n");
- return NULL;
+ if (skb_vlan_tag_present(skb)) {
+ lan9303_tag1 = skb_vlan_tag_get(skb);
+ __vlan_hwaccel_clear_tag(skb);
+ } else {
+ skb_push_rcsum(skb, ETH_HLEN);
+ __skb_vlan_pop(skb, &lan9303_tag1);
+ skb_pull_rcsum(skb, ETH_HLEN);
}
- lan9303_tag1 = ntohs(lan9303_tag[1]);
source_port = lan9303_tag1 & 0x3;
skb->dev = dsa_master_find_slave(dev, 0, source_port);
@@ -103,13 +103,6 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- /* remove the special VLAN tag between the MAC addresses
- * and the current ethertype field.
- */
- skb_pull_rcsum(skb, 2 + 2);
-
- dsa_strip_etype_header(skb, LAN9303_TAG_LEN);
-
if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
dsa_default_offload_fwd_mark(skb);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 017b355bbdc6620fd8fe05fe297f553ce9d855ee Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 16 Feb 2022 12:46:34 +0000
Subject: [PATCH] net: dsa: lan9303: handle hwaccel VLAN tags
Check for a hwaccel VLAN tag on rx and use it if present. Otherwise,
use __skb_vlan_pop() like the other tag parsers do. This fixes the case
where the VLAN tag has already been consumed by the master.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Vladimir Oltean <olteanv(a)gmail.com>
Link: https://lore.kernel.org/r/20220216124634.23123-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/dsa/tag_lan9303.c b/net/dsa/tag_lan9303.c
index cb548188f813..98d7d7120bab 100644
--- a/net/dsa/tag_lan9303.c
+++ b/net/dsa/tag_lan9303.c
@@ -77,7 +77,6 @@ static struct sk_buff *lan9303_xmit(struct sk_buff *skb, struct net_device *dev)
static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
{
- __be16 *lan9303_tag;
u16 lan9303_tag1;
unsigned int source_port;
@@ -87,14 +86,15 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- lan9303_tag = dsa_etype_header_pos_rx(skb);
-
- if (lan9303_tag[0] != htons(ETH_P_8021Q)) {
- dev_warn_ratelimited(&dev->dev, "Dropping packet due to invalid VLAN marker\n");
- return NULL;
+ if (skb_vlan_tag_present(skb)) {
+ lan9303_tag1 = skb_vlan_tag_get(skb);
+ __vlan_hwaccel_clear_tag(skb);
+ } else {
+ skb_push_rcsum(skb, ETH_HLEN);
+ __skb_vlan_pop(skb, &lan9303_tag1);
+ skb_pull_rcsum(skb, ETH_HLEN);
}
- lan9303_tag1 = ntohs(lan9303_tag[1]);
source_port = lan9303_tag1 & 0x3;
skb->dev = dsa_master_find_slave(dev, 0, source_port);
@@ -103,13 +103,6 @@ static struct sk_buff *lan9303_rcv(struct sk_buff *skb, struct net_device *dev)
return NULL;
}
- /* remove the special VLAN tag between the MAC addresses
- * and the current ethertype field.
- */
- skb_pull_rcsum(skb, 2 + 2);
-
- dsa_strip_etype_header(skb, LAN9303_TAG_LEN);
-
if (!(lan9303_tag1 & LAN9303_TAG_RX_TRAPPED_TO_CPU))
dsa_default_offload_fwd_mark(skb);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2614140dc0f467a83aa3bb4b6ee2d6480a76202 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Date: Fri, 11 Feb 2022 19:45:06 +0200
Subject: [PATCH] net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
removing VLAN
mv88e6xxx is special among DSA drivers in that it requires the VTU to
contain the VID of the FDB entry it modifies in
mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.
Sometimes due to races this is not always satisfied even if external
code does everything right (first deletes the FDB entries, then the
VLAN), because DSA commits to hardware FDB entries asynchronously since
commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
notification").
Therefore, the mv88e6xxx driver must close this race condition by
itself, by asking DSA to flush the switchdev workqueue of any FDB
deletions in progress, prior to exiting a VLAN.
Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
Reported-by: Rafael Richter <rafael.richter(a)gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8530dbe403f4..ab1676553714 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2284,6 +2284,13 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
if (!mv88e6xxx_max_vid(chip))
return -EOPNOTSUPP;
+ /* The ATU removal procedure needs the FID to be mapped in the VTU,
+ * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA
+ * switchdev workqueue to ensure that all FDB entries are deleted
+ * before we remove the VLAN.
+ */
+ dsa_flush_workqueue();
+
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 57b3e4e7413b..85a5ba3772f5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1187,6 +1187,7 @@ void dsa_unregister_switch(struct dsa_switch *ds);
int dsa_register_switch(struct dsa_switch *ds);
void dsa_switch_shutdown(struct dsa_switch *ds);
struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
+void dsa_flush_workqueue(void);
#ifdef CONFIG_PM_SLEEP
int dsa_switch_suspend(struct dsa_switch *ds);
int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d9d0d227092c..c43f7446a75d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -349,6 +349,7 @@ void dsa_flush_workqueue(void)
{
flush_workqueue(dsa_owq);
}
+EXPORT_SYMBOL_GPL(dsa_flush_workqueue);
int dsa_devlink_param_get(struct devlink *dl, u32 id,
struct devlink_param_gset_ctx *ctx)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 760306f0012f..23c79e91ac67 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -147,7 +147,6 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
bool dsa_schedule_work(struct work_struct *work);
-void dsa_flush_workqueue(void);
const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2614140dc0f467a83aa3bb4b6ee2d6480a76202 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Date: Fri, 11 Feb 2022 19:45:06 +0200
Subject: [PATCH] net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
removing VLAN
mv88e6xxx is special among DSA drivers in that it requires the VTU to
contain the VID of the FDB entry it modifies in
mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.
Sometimes due to races this is not always satisfied even if external
code does everything right (first deletes the FDB entries, then the
VLAN), because DSA commits to hardware FDB entries asynchronously since
commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
notification").
Therefore, the mv88e6xxx driver must close this race condition by
itself, by asking DSA to flush the switchdev workqueue of any FDB
deletions in progress, prior to exiting a VLAN.
Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
Reported-by: Rafael Richter <rafael.richter(a)gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8530dbe403f4..ab1676553714 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2284,6 +2284,13 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
if (!mv88e6xxx_max_vid(chip))
return -EOPNOTSUPP;
+ /* The ATU removal procedure needs the FID to be mapped in the VTU,
+ * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA
+ * switchdev workqueue to ensure that all FDB entries are deleted
+ * before we remove the VLAN.
+ */
+ dsa_flush_workqueue();
+
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 57b3e4e7413b..85a5ba3772f5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1187,6 +1187,7 @@ void dsa_unregister_switch(struct dsa_switch *ds);
int dsa_register_switch(struct dsa_switch *ds);
void dsa_switch_shutdown(struct dsa_switch *ds);
struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
+void dsa_flush_workqueue(void);
#ifdef CONFIG_PM_SLEEP
int dsa_switch_suspend(struct dsa_switch *ds);
int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d9d0d227092c..c43f7446a75d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -349,6 +349,7 @@ void dsa_flush_workqueue(void)
{
flush_workqueue(dsa_owq);
}
+EXPORT_SYMBOL_GPL(dsa_flush_workqueue);
int dsa_devlink_param_get(struct devlink *dl, u32 id,
struct devlink_param_gset_ctx *ctx)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 760306f0012f..23c79e91ac67 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -147,7 +147,6 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
bool dsa_schedule_work(struct work_struct *work);
-void dsa_flush_workqueue(void);
const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2614140dc0f467a83aa3bb4b6ee2d6480a76202 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Date: Fri, 11 Feb 2022 19:45:06 +0200
Subject: [PATCH] net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
removing VLAN
mv88e6xxx is special among DSA drivers in that it requires the VTU to
contain the VID of the FDB entry it modifies in
mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.
Sometimes due to races this is not always satisfied even if external
code does everything right (first deletes the FDB entries, then the
VLAN), because DSA commits to hardware FDB entries asynchronously since
commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
notification").
Therefore, the mv88e6xxx driver must close this race condition by
itself, by asking DSA to flush the switchdev workqueue of any FDB
deletions in progress, prior to exiting a VLAN.
Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
Reported-by: Rafael Richter <rafael.richter(a)gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8530dbe403f4..ab1676553714 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2284,6 +2284,13 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
if (!mv88e6xxx_max_vid(chip))
return -EOPNOTSUPP;
+ /* The ATU removal procedure needs the FID to be mapped in the VTU,
+ * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA
+ * switchdev workqueue to ensure that all FDB entries are deleted
+ * before we remove the VLAN.
+ */
+ dsa_flush_workqueue();
+
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 57b3e4e7413b..85a5ba3772f5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1187,6 +1187,7 @@ void dsa_unregister_switch(struct dsa_switch *ds);
int dsa_register_switch(struct dsa_switch *ds);
void dsa_switch_shutdown(struct dsa_switch *ds);
struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
+void dsa_flush_workqueue(void);
#ifdef CONFIG_PM_SLEEP
int dsa_switch_suspend(struct dsa_switch *ds);
int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d9d0d227092c..c43f7446a75d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -349,6 +349,7 @@ void dsa_flush_workqueue(void)
{
flush_workqueue(dsa_owq);
}
+EXPORT_SYMBOL_GPL(dsa_flush_workqueue);
int dsa_devlink_param_get(struct devlink *dl, u32 id,
struct devlink_param_gset_ctx *ctx)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 760306f0012f..23c79e91ac67 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -147,7 +147,6 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
bool dsa_schedule_work(struct work_struct *work);
-void dsa_flush_workqueue(void);
const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a2614140dc0f467a83aa3bb4b6ee2d6480a76202 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Date: Fri, 11 Feb 2022 19:45:06 +0200
Subject: [PATCH] net: dsa: mv88e6xxx: flush switchdev FDB workqueue before
removing VLAN
mv88e6xxx is special among DSA drivers in that it requires the VTU to
contain the VID of the FDB entry it modifies in
mv88e6xxx_port_db_load_purge(), otherwise it will return -EOPNOTSUPP.
Sometimes due to races this is not always satisfied even if external
code does everything right (first deletes the FDB entries, then the
VLAN), because DSA commits to hardware FDB entries asynchronously since
commit c9eb3e0f8701 ("net: dsa: Add support for learning FDB through
notification").
Therefore, the mv88e6xxx driver must close this race condition by
itself, by asking DSA to flush the switchdev workqueue of any FDB
deletions in progress, prior to exiting a VLAN.
Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification")
Reported-by: Rafael Richter <rafael.richter(a)gin.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 8530dbe403f4..ab1676553714 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2284,6 +2284,13 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
if (!mv88e6xxx_max_vid(chip))
return -EOPNOTSUPP;
+ /* The ATU removal procedure needs the FID to be mapped in the VTU,
+ * but FDB deletion runs concurrently with VLAN deletion. Flush the DSA
+ * switchdev workqueue to ensure that all FDB entries are deleted
+ * before we remove the VLAN.
+ */
+ dsa_flush_workqueue();
+
mv88e6xxx_reg_lock(chip);
err = mv88e6xxx_port_get_pvid(chip, port, &pvid);
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 57b3e4e7413b..85a5ba3772f5 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -1187,6 +1187,7 @@ void dsa_unregister_switch(struct dsa_switch *ds);
int dsa_register_switch(struct dsa_switch *ds);
void dsa_switch_shutdown(struct dsa_switch *ds);
struct dsa_switch *dsa_switch_find(int tree_index, int sw_index);
+void dsa_flush_workqueue(void);
#ifdef CONFIG_PM_SLEEP
int dsa_switch_suspend(struct dsa_switch *ds);
int dsa_switch_resume(struct dsa_switch *ds);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d9d0d227092c..c43f7446a75d 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -349,6 +349,7 @@ void dsa_flush_workqueue(void)
{
flush_workqueue(dsa_owq);
}
+EXPORT_SYMBOL_GPL(dsa_flush_workqueue);
int dsa_devlink_param_get(struct devlink *dl, u32 id,
struct devlink_param_gset_ctx *ctx)
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 760306f0012f..23c79e91ac67 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -147,7 +147,6 @@ void dsa_tag_driver_put(const struct dsa_device_ops *ops);
const struct dsa_device_ops *dsa_find_tagger_by_name(const char *buf);
bool dsa_schedule_work(struct work_struct *work);
-void dsa_flush_workqueue(void);
const char *dsa_tag_protocol_to_str(const struct dsa_device_ops *ops);
static inline int dsa_tag_protocol_overhead(const struct dsa_device_ops *ops)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6bb9681a43f34f2cab4aad6e2a02da4ce54d13c5 Mon Sep 17 00:00:00 2001
From: Mans Rullgard <mans(a)mansr.com>
Date: Wed, 9 Feb 2022 14:54:54 +0000
Subject: [PATCH] net: dsa: lan9303: fix reset on probe
The reset input to the LAN9303 chip is active low, and devicetree
gpio handles reflect this. Therefore, the gpio should be requested
with an initial state of high in order for the reset signal to be
asserted. Other uses of the gpio already use the correct polarity.
Fixes: a1292595e006 ("net: dsa: add new DSA switch driver for the SMSC-LAN9303")
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
Reviewed-by: Florian Fianelil <f.fainelli(a)gmail.com>
Link: https://lore.kernel.org/r/20220209145454.19749-1-mans@mansr.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index d55784d19fa4..873a5588171b 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -1310,7 +1310,7 @@ static int lan9303_probe_reset_gpio(struct lan9303 *chip,
struct device_node *np)
{
chip->reset_gpio = devm_gpiod_get_optional(chip->dev, "reset",
- GPIOD_OUT_LOW);
+ GPIOD_OUT_HIGH);
if (IS_ERR(chip->reset_gpio))
return PTR_ERR(chip->reset_gpio);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0b0dff5b3b98c5c7ce848151df9da0b3cdf0cc8b Mon Sep 17 00:00:00 2001
From: Willem de Bruijn <willemb(a)google.com>
Date: Tue, 15 Feb 2022 11:00:37 -0500
Subject: [PATCH] ipv6: per-netns exclusive flowlabel checks
Ipv6 flowlabels historically require a reservation before use.
Optionally in exclusive mode (e.g., user-private).
Commit 59c820b2317f ("ipv6: elide flowlabel check if no exclusive
leases exist") introduced a fastpath that avoids this check when no
exclusive leases exist in the system, and thus any flowlabel use
will be granted.
That allows skipping the control operation to reserve a flowlabel
entirely. Though with a warning if the fast path fails:
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Still, this is subtle. Better isolate network namespaces from each
other. Flowlabels are per-netns. Also record per-netns whether
exclusive leases are in use. Then behavior does not change based on
activity in other netns.
Changes
v2
- wrap in IS_ENABLED(CONFIG_IPV6) to avoid breakage if disabled
Fixes: 59c820b2317f ("ipv6: elide flowlabel check if no exclusive leases exist")
Link: https://lore.kernel.org/netdev/MWHPR2201MB1072BCCCFCE779E4094837ACD0329@MWH…
Reported-by: Congyu Liu <liu3101(a)purdue.edu>
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
Tested-by: Congyu Liu <liu3101(a)purdue.edu>
Link: https://lore.kernel.org/r/20220215160037.1976072-1-willemdebruijn.kernel@gm…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 3afcb128e064..92eec13d1693 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -393,17 +393,20 @@ static inline void txopt_put(struct ipv6_txoptions *opt)
kfree_rcu(opt, rcu);
}
+#if IS_ENABLED(CONFIG_IPV6)
struct ip6_flowlabel *__fl6_sock_lookup(struct sock *sk, __be32 label);
extern struct static_key_false_deferred ipv6_flowlabel_exclusive;
static inline struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk,
__be32 label)
{
- if (static_branch_unlikely(&ipv6_flowlabel_exclusive.key))
+ if (static_branch_unlikely(&ipv6_flowlabel_exclusive.key) &&
+ READ_ONCE(sock_net(sk)->ipv6.flowlabel_has_excl))
return __fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);
return NULL;
}
+#endif
struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
struct ip6_flowlabel *fl,
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index a4b550380316..6bd7e5a85ce7 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -77,9 +77,10 @@ struct netns_ipv6 {
spinlock_t fib6_gc_lock;
unsigned int ip6_rt_gc_expire;
unsigned long ip6_rt_last_gc;
+ unsigned char flowlabel_has_excl;
#ifdef CONFIG_IPV6_MULTIPLE_TABLES
- unsigned int fib6_rules_require_fldissect;
bool fib6_has_custom_rules;
+ unsigned int fib6_rules_require_fldissect;
#ifdef CONFIG_IPV6_SUBTREES
unsigned int fib6_routes_require_src;
#endif
diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
index aa673a6a7e43..ceb85c67ce39 100644
--- a/net/ipv6/ip6_flowlabel.c
+++ b/net/ipv6/ip6_flowlabel.c
@@ -450,8 +450,10 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
err = -EINVAL;
goto done;
}
- if (fl_shared_exclusive(fl) || fl->opt)
+ if (fl_shared_exclusive(fl) || fl->opt) {
+ WRITE_ONCE(sock_net(sk)->ipv6.flowlabel_has_excl, 1);
static_branch_deferred_inc(&ipv6_flowlabel_exclusive);
+ }
return fl;
done:
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 59f39bfa6553d598cb22f694d45e89547f420d85 Mon Sep 17 00:00:00 2001
From: Robin Murphy <robin.murphy(a)arm.com>
Date: Wed, 13 Oct 2021 10:36:54 -0400
Subject: [PATCH] drm/cma-helper: Set VM_DONTEXPAND for mmap
drm_gem_cma_mmap() cannot assume every implementation of dma_mmap_wc()
will end up calling remap_pfn_range() (which happens to set the relevant
vma flag, among others), so in order to make sure expectations around
VM_DONTEXPAND are met, let it explicitly set the flag like most other
GEM mmap implementations do.
This avoids repeated warnings on a small minority of systems where the
display is behind an IOMMU, and has a simple driver which does not
override drm_gem_cma_default_funcs. Arm hdlcd is an in-tree affected
driver. Out-of-tree, the Apple DCP driver is affected; this fix is
required for DCP to be mainlined.
[Alyssa: Update commit message.]
Fixes: c40069cb7bd6 ("drm: add mmap() to drm_gem_object_funcs")
Acked-by: Daniel Vetter <daniel(a)ffwll.ch>
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig(a)collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211013143654.39031-1-alyssa…
diff --git a/drivers/gpu/drm/drm_gem_cma_helper.c b/drivers/gpu/drm/drm_gem_cma_helper.c
index cefd0cbf9deb..dc275c466c9c 100644
--- a/drivers/gpu/drm/drm_gem_cma_helper.c
+++ b/drivers/gpu/drm/drm_gem_cma_helper.c
@@ -512,6 +512,7 @@ int drm_gem_cma_mmap(struct drm_gem_cma_object *cma_obj, struct vm_area_struct *
*/
vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
vma->vm_flags &= ~VM_PFNMAP;
+ vma->vm_flags |= VM_DONTEXPAND;
if (cma_obj->map_noncoherent) {
vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
From: Sean Christopherson <seanjc(a)google.com>
commit 55467fcd55b89c622e62b4afe60ac0eb2fae91f2 upstream.
Always signal that emulation is possible for !SEV guests regardless of
whether or not the CPU provided a valid instruction byte stream. KVM can
read all guest state (memory and registers) for !SEV guests, i.e. can
fetch the code stream from memory even if the CPU failed to do so because
of the SMAP errata.
Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)")
Cc: stable(a)vger.kernel.org
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Brijesh Singh <brijesh.singh(a)amd.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Liam Merwick <liam.merwick(a)oracle.com>
Message-Id: <20220120010719.711476-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[jwang: adjust context for kernel 5.10.101]
Signed-off-by: Jack Wang <jinpu.wang(a)ionos.com>
---
arch/x86/kvm/svm/svm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d515c8e68314..7773a765f548 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4103,6 +4103,10 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int i
bool smep, smap, is_user;
unsigned long cr4;
+ /* Emulation is always possible when KVM has access to all guest state. */
+ if (!sev_guest(vcpu->kvm))
+ return true;
+
/*
* Detect and workaround Errata 1096 Fam_17h_00_0Fh.
*
@@ -4151,9 +4155,6 @@ static bool svm_can_emulate_instruction(struct kvm_vcpu *vcpu, void *insn, int i
smap = cr4 & X86_CR4_SMAP;
is_user = svm_get_cpl(vcpu) == 3;
if (smap && (!smep || is_user)) {
- if (!sev_guest(vcpu->kvm))
- return true;
-
pr_err_ratelimited("KVM: SEV Guest triggered AMD Erratum 1096\n");
/*
--
2.25.1
Wp-gpios property can be used on NVMEM nodes and the same property can
be also used on MTD NAND nodes. In case of the wp-gpios property is
defined at NAND level node, the GPIO management is done at NAND driver
level. Write protect is disabled when the driver is probed or resumed
and is enabled when the driver is released or suspended.
When no partitions are defined in the NAND DT node, then the NAND DT node
will be passed to NVMEM framework. If wp-gpios property is defined in
this node, the GPIO resource is taken twice and the NAND controller
driver fails to probe.
A new Boolean flag named ignore_wp has been added in nvmem_config.
In case ignore_wp is set, it means that the GPIO is handled by the
provider. Lets set this flag in MTD layer to avoid the conflict on
wp_gpios property.
Fixes: 2a127da461a9 ("nvmem: add support for the write-protect pin")
Signed-off-by: Christophe Kerello <christophe.kerello(a)foss.st.com>
Cc: stable(a)vger.kernel.org
---
Changes in v3:
- add a fixes tag
- rename skip_wp_gpio by ignore_wp in nvmen_config.
drivers/mtd/mtdcore.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 70f492dce158..eef87b28d6c8 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -546,6 +546,7 @@ static int mtd_nvmem_add(struct mtd_info *mtd)
config.stride = 1;
config.read_only = true;
config.root_only = true;
+ config.ignore_wp = true;
config.no_of_node = !of_device_is_compatible(node, "nvmem-cells");
config.priv = mtd;
@@ -833,6 +834,7 @@ static struct nvmem_device *mtd_otp_nvmem_register(struct mtd_info *mtd,
config.owner = THIS_MODULE;
config.type = NVMEM_TYPE_OTP;
config.root_only = true;
+ config.ignore_wp = true;
config.reg_read = reg_read;
config.size = size;
config.of_node = np;
--
2.25.1
Wp-gpios property can be used on NVMEM nodes and the same property can
be also used on MTD NAND nodes. In case of the wp-gpios property is
defined at NAND level node, the GPIO management is done at NAND driver
level. Write protect is disabled when the driver is probed or resumed
and is enabled when the driver is released or suspended.
When no partitions are defined in the NAND DT node, then the NAND DT node
will be passed to NVMEM framework. If wp-gpios property is defined in
this node, the GPIO resource is taken twice and the NAND controller
driver fails to probe.
It would be possible to set config->wp_gpio at MTD level before calling
nvmem_register function but NVMEM framework will toggle this GPIO on
each write when this GPIO should only be controlled at NAND level driver
to ensure that the Write Protect has not been enabled.
A way to fix this conflict is to add a new boolean flag in nvmem_config
named ignore_wp. In case ignore_wp is set, the GPIO resource will
be managed by the provider.
Fixes: 2a127da461a9 ("nvmem: add support for the write-protect pin")
Signed-off-by: Christophe Kerello <christophe.kerello(a)foss.st.com>
Cc: stable(a)vger.kernel.org
---
Changes in v3:
- add a fixes tag.
- rename skip_wp_gpio by ignore_wp in nvmen_config.
Changes in v2:
- rework the proposal done to fix a conflict between MTD and NVMEM on
wp-gpios property.
drivers/nvmem/core.c | 2 +-
include/linux/nvmem-provider.h | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvmem/core.c b/drivers/nvmem/core.c
index 23a38dcf0fc4..9fd1602b539d 100644
--- a/drivers/nvmem/core.c
+++ b/drivers/nvmem/core.c
@@ -771,7 +771,7 @@ struct nvmem_device *nvmem_register(const struct nvmem_config *config)
if (config->wp_gpio)
nvmem->wp_gpio = config->wp_gpio;
- else
+ else if (!config->ignore_wp)
nvmem->wp_gpio = gpiod_get_optional(config->dev, "wp",
GPIOD_OUT_HIGH);
if (IS_ERR(nvmem->wp_gpio)) {
diff --git a/include/linux/nvmem-provider.h b/include/linux/nvmem-provider.h
index 98efb7b5660d..c9a3ac9efeaa 100644
--- a/include/linux/nvmem-provider.h
+++ b/include/linux/nvmem-provider.h
@@ -70,7 +70,8 @@ struct nvmem_keepout {
* @word_size: Minimum read/write access granularity.
* @stride: Minimum read/write access stride.
* @priv: User context passed to read/write callbacks.
- * @wp-gpio: Write protect pin
+ * @wp-gpio: Write protect pin
+ * @ignore_wp: Write Protect pin is managed by the provider.
*
* Note: A default "nvmem<id>" name will be assigned to the device if
* no name is specified in its configuration. In such case "<id>" is
@@ -92,6 +93,7 @@ struct nvmem_config {
enum nvmem_type type;
bool read_only;
bool root_only;
+ bool ignore_wp;
struct device_node *of_node;
bool no_of_node;
nvmem_reg_read_t reg_read;
--
2.25.1
Hello Dear Friend,
With due respect to your person and much sincerity of purpose I wish
to write to you today for our mutual benefit in this investment
transaction.
I'm Mrs. Aisha Al-Gaddafi, presently residing herein Oman the
Southeastern coast of the Arabian Peninsula in Western Asia, I'm a
single Mother and a widow with three Children. I am the only
biological Daughter of the late Libyan President (Late Colonel.
Muammar Gaddafi). I have an investment funds worth Twenty Seven
Million Five Hundred Thousand United State Dollars ($27.500.000.00 )
and i need an investment Manager/Partner and because of my Asylum
Status I will authorize you the ownership of the investment funds,
However, I am interested in you for investment project assistance in
your country, may be from there,. we can build a business relationship
in the nearest future.
I am willing to negotiate an investment/business profit sharing ratio
with you based on the future investment earning profits. If you are
willing to handle this project kindly reply urgently to enable me to
provide you more information about the investment funds.
Your urgent reply will be appreciated if only you are interested in
this investment project..
Best Regards
Mrs. Aisha Al-Gaddafi.
The function generic_copy_file_checks() checks that the ends of the
input and output file ranges do not overflow. Unfortunately, there is
an issue with the check itself.
Due to the integer promotion rules in C, the expressions
(pos_in + count) and (pos_out + count) have an unsigned type because
the count variable has the type uint64_t. Thus, in many cases where we
should detect signed integer overflow to have occurred (and thus one or
more of the ranges being invalid), the expressions will instead be
interpreted as large unsigned integers. This means the check is broken.
Fix this by explicitly casting the expressions to loff_t.
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Reviewed-by: Anton Altaparmakov <anton(a)tuxera.com>
Signed-off-by: Ari Sundholm <ari(a)tuxera.com>
---
fs/read_write.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 0074afa7ecb3..64166e74adc5 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1431,7 +1431,8 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in,
return -ETXTBSY;
/* Ensure offsets don't wrap. */
- if (pos_in + count < pos_in || pos_out + count < pos_out)
+ if ((loff_t)(pos_in + count) < pos_in ||
+ (loff_t)(pos_out + count) < pos_out)
return -EOVERFLOW;
/* Shorten the copy to EOF */
--
2.20.1
commit aceeafefff736057e8f93f19bbfbef26abd94604 upstream
Adds a driver private tee_context to struct optee.
The new driver internal tee_context is used when allocating driver
private shared memory. This decouples the shared memory object from its
original tee_context. This is needed when the life time of such a memory
allocation outlives the client tee_context.
This patch fixes the problem described below:
The addition of a shutdown hook by commit f25889f93184 ("optee: fix tee out
of memory failure seen during kexec reboot") introduced a kernel shutdown
regression that can be triggered after running the OP-TEE xtest suites.
Once the shutdown hook is called it is not possible to communicate any more
with the supplicant process because the system is not scheduling task any
longer. Thus if the optee driver shutdown path receives a supplicant RPC
request from the OP-TEE we will deadlock the kernel's shutdown.
Fixes: f25889f93184 ("optee: fix tee out of memory failure seen during kexec reboot")
Fixes: 217e0250cccb ("tee: use reference counting for tee_context")
Reported-by: Lars Persson <larper(a)axis.com>
Cc: stable(a)vger.kernel.org # 1e2c3ef0496e tee: export teedev_open() and teedev_close_context()
Cc: stable(a)vger.kernel.org
Reviewed-by: Sumit Garg <sumit.garg(a)linaro.org>
[JW: backport to 5.15-stable + update commit message]
Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org>
---
drivers/tee/optee/core.c | 8 ++++++++
drivers/tee/optee/optee_private.h | 2 ++
drivers/tee/optee/rpc.c | 8 +++++---
3 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index 5363ebebfc35..bbd97239de95 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -588,6 +588,7 @@ static int optee_remove(struct platform_device *pdev)
/* Unregister OP-TEE specific client devices on TEE bus */
optee_unregister_devices();
+ teedev_close_context(optee->ctx);
/*
* Ask OP-TEE to free all cached shared memory objects to decrease
* reference counters and also avoid wild pointers in secure world
@@ -633,6 +634,7 @@ static int optee_probe(struct platform_device *pdev)
struct optee *optee = NULL;
void *memremaped_shm = NULL;
struct tee_device *teedev;
+ struct tee_context *ctx;
u32 sec_caps;
int rc;
@@ -719,6 +721,12 @@ static int optee_probe(struct platform_device *pdev)
optee_supp_init(&optee->supp);
optee->memremaped_shm = memremaped_shm;
optee->pool = pool;
+ ctx = teedev_open(optee->teedev);
+ if (IS_ERR(ctx)) {
+ rc = rc = PTR_ERR(ctx);
+ goto err;
+ }
+ optee->ctx = ctx;
/*
* Ensure that there are no pre-existing shm objects before enabling
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index f6bb4a763ba9..ea09533e30cd 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -70,6 +70,7 @@ struct optee_supp {
* struct optee - main service struct
* @supp_teedev: supplicant device
* @teedev: client device
+ * @ctx: driver internal TEE context
* @invoke_fn: function to issue smc or hvc
* @call_queue: queue of threads waiting to call @invoke_fn
* @wait_queue: queue of threads from secure world waiting for a
@@ -87,6 +88,7 @@ struct optee {
struct tee_device *supp_teedev;
struct tee_device *teedev;
optee_invoke_fn *invoke_fn;
+ struct tee_context *ctx;
struct optee_call_queue call_queue;
struct optee_wait_queue wait_queue;
struct optee_supp supp;
diff --git a/drivers/tee/optee/rpc.c b/drivers/tee/optee/rpc.c
index efbaff7ad7e5..456833d82007 100644
--- a/drivers/tee/optee/rpc.c
+++ b/drivers/tee/optee/rpc.c
@@ -285,6 +285,7 @@ static struct tee_shm *cmd_alloc_suppl(struct tee_context *ctx, size_t sz)
}
static void handle_rpc_func_cmd_shm_alloc(struct tee_context *ctx,
+ struct optee *optee,
struct optee_msg_arg *arg,
struct optee_call_ctx *call_ctx)
{
@@ -314,7 +315,8 @@ static void handle_rpc_func_cmd_shm_alloc(struct tee_context *ctx,
shm = cmd_alloc_suppl(ctx, sz);
break;
case OPTEE_RPC_SHM_TYPE_KERNEL:
- shm = tee_shm_alloc(ctx, sz, TEE_SHM_MAPPED | TEE_SHM_PRIV);
+ shm = tee_shm_alloc(optee->ctx, sz,
+ TEE_SHM_MAPPED | TEE_SHM_PRIV);
break;
default:
arg->ret = TEEC_ERROR_BAD_PARAMETERS;
@@ -471,7 +473,7 @@ static void handle_rpc_func_cmd(struct tee_context *ctx, struct optee *optee,
break;
case OPTEE_RPC_CMD_SHM_ALLOC:
free_pages_list(call_ctx);
- handle_rpc_func_cmd_shm_alloc(ctx, arg, call_ctx);
+ handle_rpc_func_cmd_shm_alloc(ctx, optee, arg, call_ctx);
break;
case OPTEE_RPC_CMD_SHM_FREE:
handle_rpc_func_cmd_shm_free(ctx, arg);
@@ -502,7 +504,7 @@ void optee_handle_rpc(struct tee_context *ctx, struct optee_rpc_param *param,
switch (OPTEE_SMC_RETURN_GET_RPC_FUNC(param->a0)) {
case OPTEE_SMC_RPC_FUNC_ALLOC:
- shm = tee_shm_alloc(ctx, param->a1,
+ shm = tee_shm_alloc(optee->ctx, param->a1,
TEE_SHM_MAPPED | TEE_SHM_PRIV);
if (!IS_ERR(shm) && !tee_shm_get_pa(shm, 0, &pa)) {
reg_pair_from_64(¶m->a1, ¶m->a2, pa);
--
2.31.1
You're right, I did not get an email about 5.4, maybe it was caught up somewhere.
Greg, can you apply this to 5.4, too?
From: Adrian Hunter <adrian.hunter(a)intel.com>
Sent: Friday, February 18, 2022 12:02 PM
To: Christian Löhle
Subject: Re: [PATCH] mmc: block: fix read single on recovery logic
On 18/02/2022 12:36, Christian Löhle wrote:
> This is the backport for 4.19, it applied for more recent branches and is not applicable to 4.14.
Was 5.4 done? There was a "failed to apply email for that also"
>
>
>
> From: gregkh(a)linuxfoundation.org <gregkh(a)linuxfoundation.org>
> Sent: Friday, February 18, 2022 11:33 AM
> To: Christian Löhle
> Cc: stable(a)vger.kernel.org; Ulf Hansson; Adrian Hunter
> Subject: Re: [PATCH] mmc: block: fix read single on recovery logic
>
> On Fri, Feb 18, 2022 at 09:50:54AM +0000, Christian Löhle wrote:
>> commit 54309fde1a352ad2674ebba004a79f7d20b9f037 upstream.
>>
>> On reads with MMC_READ_MULTIPLE_BLOCK that fail,
>> the recovery handler will use MMC_READ_SINGLE_BLOCK for
>> each of the blocks, up to MMC_READ_SINGLE_RETRIES times each.
>> The logic for this is fixed to never report unsuccessful reads
>> as success to the block layer.
>>
>> On command error with retries remaining, blk_update_request was
>> called with whatever value error was set last to.
>> In case it was last set to BLK_STS_OK (default), the read will be
>> reported as success, even though there was no data read from the device.
>> This could happen on a CRC mismatch for the response,
>> a card rejecting the command (e.g. again due to a CRC mismatch).
>> In case it was last set to BLK_STS_IOERR, the error is reported correctly,
>> but no retries will be attempted.
>>
>> Fixes: 81196976ed946c ("mmc: block: Add blk-mq support")
>>
>> Signed-off-by: Christian Loehle <cloehle(a)hyperstone.com>
>> ---
>> drivers/mmc/core/block.c | 28 ++++++++++++++--------------
>> 1 file changed, 14 insertions(+), 14 deletions(-)
>
> What stable branch(s) is this backport for?
>
> thanks,
>
> greg k-h
> =
> Hyperstone GmbH | Reichenaustr. 39a | 78467 Konstanz
> Managing Director: Dr. Jan Peter Berns.
> Commercial register of local courts: Freiburg HRB381782
>
=
Hyperstone GmbH | Reichenaustr. 39a | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54309fde1a352ad2674ebba004a79f7d20b9f037 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20L=C3=B6hle?= <CLoehle(a)hyperstone.com>
Date: Fri, 4 Feb 2022 15:11:37 +0000
Subject: [PATCH] mmc: block: fix read single on recovery logic
On reads with MMC_READ_MULTIPLE_BLOCK that fail,
the recovery handler will use MMC_READ_SINGLE_BLOCK for
each of the blocks, up to MMC_READ_SINGLE_RETRIES times each.
The logic for this is fixed to never report unsuccessful reads
as success to the block layer.
On command error with retries remaining, blk_update_request was
called with whatever value error was set last to.
In case it was last set to BLK_STS_OK (default), the read will be
reported as success, even though there was no data read from the device.
This could happen on a CRC mismatch for the response,
a card rejecting the command (e.g. again due to a CRC mismatch).
In case it was last set to BLK_STS_IOERR, the error is reported correctly,
but no retries will be attempted.
Fixes: 81196976ed946c ("mmc: block: Add blk-mq support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Loehle <cloehle(a)hyperstone.com>
Reviewed-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/bc706a6ab08c4fe2834ba0c05a804672@hyperstone.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 4e61b28a002f..8d718aa56d33 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1682,31 +1682,31 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
struct mmc_card *card = mq->card;
struct mmc_host *host = card->host;
blk_status_t error = BLK_STS_OK;
- int retries = 0;
do {
u32 status;
int err;
+ int retries = 0;
- mmc_blk_rw_rq_prep(mqrq, card, 1, mq);
+ while (retries++ <= MMC_READ_SINGLE_RETRIES) {
+ mmc_blk_rw_rq_prep(mqrq, card, 1, mq);
- mmc_wait_for_req(host, mrq);
+ mmc_wait_for_req(host, mrq);
- err = mmc_send_status(card, &status);
- if (err)
- goto error_exit;
-
- if (!mmc_host_is_spi(host) &&
- !mmc_ready_for_data(status)) {
- err = mmc_blk_fix_state(card, req);
+ err = mmc_send_status(card, &status);
if (err)
goto error_exit;
- }
- if (mrq->cmd->error && retries++ < MMC_READ_SINGLE_RETRIES)
- continue;
+ if (!mmc_host_is_spi(host) &&
+ !mmc_ready_for_data(status)) {
+ err = mmc_blk_fix_state(card, req);
+ if (err)
+ goto error_exit;
+ }
- retries = 0;
+ if (!mrq->cmd->error)
+ break;
+ }
if (mrq->cmd->error ||
mrq->data->error ||
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 44cad52cc14ae10062f142ec16ede489bccf4469
Gitweb: https://git.kernel.org/tip/44cad52cc14ae10062f142ec16ede489bccf4469
Author: Andy Lutomirski <luto(a)kernel.org>
AuthorDate: Mon, 14 Feb 2022 13:05:49 +01:00
Committer: Borislav Petkov <bp(a)suse.de>
CommitterDate: Fri, 18 Feb 2022 11:23:21 +01:00
x86/ptrace: Fix xfpregs_set()'s incorrect xmm clearing
xfpregs_set() handles 32-bit REGSET_XFP and 64-bit REGSET_FP. The actual
code treats these regsets as modern FX state (i.e. the beginning part of
XSTATE). The declarations of the regsets thought they were the legacy
i387 format. The code thought they were the 32-bit (no xmm8..15) variant
of XSTATE and, for good measure, made the high bits disappear by zeroing
the wrong part of the buffer. The latter broke ptrace, and everything
else confused anyone trying to understand the code. In particular, the
nonsense definitions of the regsets confused me when I wrote this code.
Clean this all up. Change the declarations to match reality (which
shouldn't change the generated code, let alone the ABI) and fix
xfpregs_set() to clear the correct bits and to only do so for 32-bit
callers.
Fixes: 6164331d15f7 ("x86/fpu: Rewrite xfpregs_set()")
Reported-by: Luís Ferreira <contact(a)lsferreira.net>
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215524
Link: https://lore.kernel.org/r/YgpFnZpF01WwR8wU@zn.tnic
---
arch/x86/kernel/fpu/regset.c | 9 ++++-----
arch/x86/kernel/ptrace.c | 4 ++--
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 437d7c9..75ffaef 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -91,11 +91,9 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
const void *kbuf, const void __user *ubuf)
{
struct fpu *fpu = &target->thread.fpu;
- struct user32_fxsr_struct newstate;
+ struct fxregs_state newstate;
int ret;
- BUILD_BUG_ON(sizeof(newstate) != sizeof(struct fxregs_state));
-
if (!cpu_feature_enabled(X86_FEATURE_FXSR))
return -ENODEV;
@@ -116,9 +114,10 @@ int xfpregs_set(struct task_struct *target, const struct user_regset *regset,
/* Copy the state */
memcpy(&fpu->fpstate->regs.fxsave, &newstate, sizeof(newstate));
- /* Clear xmm8..15 */
+ /* Clear xmm8..15 for 32-bit callers */
BUILD_BUG_ON(sizeof(fpu->__fpstate.regs.fxsave.xmm_space) != 16 * 16);
- memset(&fpu->fpstate->regs.fxsave.xmm_space[8], 0, 8 * 16);
+ if (in_ia32_syscall())
+ memset(&fpu->fpstate->regs.fxsave.xmm_space[8*4], 0, 8 * 16);
/* Mark FP and SSE as in use when XSAVE is enabled */
if (use_xsave())
diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 6d2244c..8d2f2f9 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -1224,7 +1224,7 @@ static struct user_regset x86_64_regsets[] __ro_after_init = {
},
[REGSET_FP] = {
.core_note_type = NT_PRFPREG,
- .n = sizeof(struct user_i387_struct) / sizeof(long),
+ .n = sizeof(struct fxregs_state) / sizeof(long),
.size = sizeof(long), .align = sizeof(long),
.active = regset_xregset_fpregs_active, .regset_get = xfpregs_get, .set = xfpregs_set
},
@@ -1271,7 +1271,7 @@ static struct user_regset x86_32_regsets[] __ro_after_init = {
},
[REGSET_XFP] = {
.core_note_type = NT_PRXFPREG,
- .n = sizeof(struct user32_fxsr_struct) / sizeof(u32),
+ .n = sizeof(struct fxregs_state) / sizeof(u32),
.size = sizeof(u32), .align = sizeof(u32),
.active = regset_xregset_fpregs_active, .regset_get = xfpregs_get, .set = xfpregs_set
},
Hi,
we observed a regression on our atmel platforms on branch 5.4.Y from
version 5.4.174.
uarts are no longer functional if DMA is enabled.
the following patch
Commit e6af9b05bec63cd4d1de2a33968cd0be2a91282a dmaengine: at_xdmac:
Start transfer for cyclic channels in issue_pending
Link:
https://lore.kernel.org/r/20211215110115.191749-3-tudor.ambarus@microchip.c…
has not been backported but is needed if the patch
Commit 4f4b9b5895614eb2e2b5f4cab7858f44bd113e1b tty: serial: atmel: Call
dma_async_issue_pending()
Link:
https://lore.kernel.org/r/20211125090028.786832-4-tudor.ambarus@microchip.c…
is applied.
enclosed commit fix this issue and work for me.
Best regards,
Mickael GARDET
From 9877f93c066be8c2f4e33a6edd4dfa3d6d6974d9 Mon Sep 17 00:00:00 2001
From: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Date: Wed, 15 Dec 2021 13:01:05 +0200
Subject: [PATCH] dmaengine: at_xdmac: Start transfer for cyclic channels
in issue_pending
commit e6af9b05bec63cd4d1de2a33968cd0be2a91282a upstream.
Cyclic channels must too call issue_pending in order to start a transfer.
Start the transfer in issue_pending regardless of the type of channel.
This wrongly worked before, because in the past the transfer was started
at tx_submit level when only a desc in the transfer list.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel
eXtended DMA Controller driver")
Change-Id: If1bf3e13329cebb9904ae40620f6cf2b7f06fe9f
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)microchip.com>
Link:
https://lore.kernel.org/r/20211215110115.191749-3-tudor.ambarus@microchip.c…
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
---
drivers/dma/at_xdmac.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c
index f63d141481a3..9aae6b3da356 100644
--- a/drivers/dma/at_xdmac.c
+++ b/drivers/dma/at_xdmac.c
@@ -1726,11 +1726,13 @@ static irqreturn_t at_xdmac_interrupt(int irq,
void *dev_id)
static void at_xdmac_issue_pending(struct dma_chan *chan)
{
struct at_xdmac_chan *atchan = to_at_xdmac_chan(chan);
+ unsigned long flags;
dev_dbg(chan2dev(&atchan->chan), "%s\n", __func__);
- if (!at_xdmac_chan_is_cyclic(atchan))
- at_xdmac_advance_work(atchan);
+ spin_lock_irqsave(&atchan->lock, flags);
+ at_xdmac_advance_work(atchan);
+ spin_unlock_irqrestore(&atchan->lock, flags);
return;
}
base-commit: 7b3eb66d0daf61e91cccdb2fe5d271ae5adc5a76
--
2.35.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b5a42d9c85f0e731f01c8d1129001fd8531a8a0 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Mon, 3 Jan 2022 11:32:36 -0600
Subject: [PATCH] taskstats: Cleanup the use of task->exit_code
In the function bacct_add_task the code reading task->exit_code was
introduced in commit f3cef7a99469 ("[PATCH] csa: basic accounting over
taskstats"), and it is not entirely clear what the taskstats interface
is trying to return as only returning the exit_code of the first task
in a process doesn't make a lot of sense.
As best as I can figure the intent is to return task->exit_code after
a task exits. The field is returned with per task fields, so the
exit_code of the entire process is not wanted. Only the value of the
first task is returned so this is not a useful way to get the per task
ptrace stop code. The ordinary case of returning this value is
returning after a task exits, which also precludes use for getting
a ptrace value.
It is common to for the first task of a process to also be the last
task of a process so this field may have done something reasonable by
accident in testing.
Make ac_exitcode a reliable per task value by always returning it for
every exited task.
Setting ac_exitcode in a sensible mannter makes it possible to continue
to provide this value going forward.
Cc: Balbir Singh <bsingharora(a)gmail.com>
Fixes: f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats")
Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/tsacct.c b/kernel/tsacct.c
index f00de83d0246..1d261fbe367b 100644
--- a/kernel/tsacct.c
+++ b/kernel/tsacct.c
@@ -38,11 +38,10 @@ void bacct_add_tsk(struct user_namespace *user_ns,
stats->ac_btime = clamp_t(time64_t, btime, 0, U32_MAX);
stats->ac_btime64 = btime;
- if (thread_group_leader(tsk)) {
+ if (tsk->flags & PF_EXITING)
stats->ac_exitcode = tsk->exit_code;
- if (tsk->flags & PF_FORKNOEXEC)
- stats->ac_flag |= AFORK;
- }
+ if (thread_group_leader(tsk) && (tsk->flags & PF_FORKNOEXEC))
+ stats->ac_flag |= AFORK;
if (tsk->flags & PF_SUPERPRIV)
stats->ac_flag |= ASU;
if (tsk->flags & PF_DUMPCORE)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e7b1bfed61e301853b5e35472820d919498278 Mon Sep 17 00:00:00 2001
From: Guillaume Nault <gnault(a)redhat.com>
Date: Mon, 10 Jan 2022 14:43:06 +0100
Subject: [PATCH] xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"),
clear the ECN bits from iph->tos when setting ->flowi4_tos.
This ensures that the last bit of ->flowi4_tos is cleared, so
ip_route_output_key_hash() isn't going to restrict the scope of the
route lookup.
Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason
to clear the high order bits.
Found by code inspection, compile tested only.
Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups")
Signed-off-by: Guillaume Nault <gnault(a)redhat.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index dccb8f3318ef..04d1ce9b510f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -31,6 +31,7 @@
#include <linux/if_tunnel.h>
#include <net/dst.h>
#include <net/flow.h>
+#include <net/inet_ecn.h>
#include <net/xfrm.h>
#include <net/ip.h>
#include <net/gre.h>
@@ -3295,7 +3296,7 @@ decode_session4(struct sk_buff *skb, struct flowi *fl, bool reverse)
fl4->flowi4_proto = iph->protocol;
fl4->daddr = reverse ? iph->saddr : iph->daddr;
fl4->saddr = reverse ? iph->daddr : iph->saddr;
- fl4->flowi4_tos = iph->tos;
+ fl4->flowi4_tos = iph->tos & ~INET_ECN_MASK;
if (!ip_is_fragment(iph)) {
switch (iph->protocol) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea958422291de248b9e2eaaeea36004e84b64043 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Thu, 10 Feb 2022 12:36:42 +0200
Subject: [PATCH] drm/i915/opregion: check port number bounds for SWSCI display
power state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The mapping from enum port to whatever port numbering scheme is used by
the SWSCI Display Power State Notification is odd, and the memory of it
has faded. In any case, the parameter only has space for ports numbered
[0..4], and UBSAN reports bit shift beyond it when the platform has port
F or more.
Since the SWSCI functionality is supposed to be obsolete for new
platforms (i.e. ones that might have port F or more), just bail out
early if the mapped and mangled port number is beyond what the Display
Power State Notification can support.
Fixes: 9c4b0a683193 ("drm/i915: add opregion function to notify bios of encoder enable/disable")
Cc: <stable(a)vger.kernel.org> # v3.13+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4800
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cc363f42d6b5a5932b6d218fefcc8…
(cherry picked from commit 24a644ebbfd3b13cda702f98907f9dd123e34bf9)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c
index 0065111593a6..4a2662838cd8 100644
--- a/drivers/gpu/drm/i915/display/intel_opregion.c
+++ b/drivers/gpu/drm/i915/display/intel_opregion.c
@@ -360,6 +360,21 @@ int intel_opregion_notify_encoder(struct intel_encoder *intel_encoder,
port++;
}
+ /*
+ * The port numbering and mapping here is bizarre. The now-obsolete
+ * swsci spec supports ports numbered [0..4]. Port E is handled as a
+ * special case, but port F and beyond are not. The functionality is
+ * supposed to be obsolete for new platforms. Just bail out if the port
+ * number is out of bounds after mapping.
+ */
+ if (port > 4) {
+ drm_dbg_kms(&dev_priv->drm,
+ "[ENCODER:%d:%s] port %c (index %u) out of bounds for display power state notification\n",
+ intel_encoder->base.base.id, intel_encoder->base.name,
+ port_name(intel_encoder->port), port);
+ return -EINVAL;
+ }
+
if (!enable)
parm |= 4 << 8;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea958422291de248b9e2eaaeea36004e84b64043 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Thu, 10 Feb 2022 12:36:42 +0200
Subject: [PATCH] drm/i915/opregion: check port number bounds for SWSCI display
power state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The mapping from enum port to whatever port numbering scheme is used by
the SWSCI Display Power State Notification is odd, and the memory of it
has faded. In any case, the parameter only has space for ports numbered
[0..4], and UBSAN reports bit shift beyond it when the platform has port
F or more.
Since the SWSCI functionality is supposed to be obsolete for new
platforms (i.e. ones that might have port F or more), just bail out
early if the mapped and mangled port number is beyond what the Display
Power State Notification can support.
Fixes: 9c4b0a683193 ("drm/i915: add opregion function to notify bios of encoder enable/disable")
Cc: <stable(a)vger.kernel.org> # v3.13+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4800
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cc363f42d6b5a5932b6d218fefcc8…
(cherry picked from commit 24a644ebbfd3b13cda702f98907f9dd123e34bf9)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c
index 0065111593a6..4a2662838cd8 100644
--- a/drivers/gpu/drm/i915/display/intel_opregion.c
+++ b/drivers/gpu/drm/i915/display/intel_opregion.c
@@ -360,6 +360,21 @@ int intel_opregion_notify_encoder(struct intel_encoder *intel_encoder,
port++;
}
+ /*
+ * The port numbering and mapping here is bizarre. The now-obsolete
+ * swsci spec supports ports numbered [0..4]. Port E is handled as a
+ * special case, but port F and beyond are not. The functionality is
+ * supposed to be obsolete for new platforms. Just bail out if the port
+ * number is out of bounds after mapping.
+ */
+ if (port > 4) {
+ drm_dbg_kms(&dev_priv->drm,
+ "[ENCODER:%d:%s] port %c (index %u) out of bounds for display power state notification\n",
+ intel_encoder->base.base.id, intel_encoder->base.name,
+ port_name(intel_encoder->port), port);
+ return -EINVAL;
+ }
+
if (!enable)
parm |= 4 << 8;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea958422291de248b9e2eaaeea36004e84b64043 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Thu, 10 Feb 2022 12:36:42 +0200
Subject: [PATCH] drm/i915/opregion: check port number bounds for SWSCI display
power state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The mapping from enum port to whatever port numbering scheme is used by
the SWSCI Display Power State Notification is odd, and the memory of it
has faded. In any case, the parameter only has space for ports numbered
[0..4], and UBSAN reports bit shift beyond it when the platform has port
F or more.
Since the SWSCI functionality is supposed to be obsolete for new
platforms (i.e. ones that might have port F or more), just bail out
early if the mapped and mangled port number is beyond what the Display
Power State Notification can support.
Fixes: 9c4b0a683193 ("drm/i915: add opregion function to notify bios of encoder enable/disable")
Cc: <stable(a)vger.kernel.org> # v3.13+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4800
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cc363f42d6b5a5932b6d218fefcc8…
(cherry picked from commit 24a644ebbfd3b13cda702f98907f9dd123e34bf9)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c
index 0065111593a6..4a2662838cd8 100644
--- a/drivers/gpu/drm/i915/display/intel_opregion.c
+++ b/drivers/gpu/drm/i915/display/intel_opregion.c
@@ -360,6 +360,21 @@ int intel_opregion_notify_encoder(struct intel_encoder *intel_encoder,
port++;
}
+ /*
+ * The port numbering and mapping here is bizarre. The now-obsolete
+ * swsci spec supports ports numbered [0..4]. Port E is handled as a
+ * special case, but port F and beyond are not. The functionality is
+ * supposed to be obsolete for new platforms. Just bail out if the port
+ * number is out of bounds after mapping.
+ */
+ if (port > 4) {
+ drm_dbg_kms(&dev_priv->drm,
+ "[ENCODER:%d:%s] port %c (index %u) out of bounds for display power state notification\n",
+ intel_encoder->base.base.id, intel_encoder->base.name,
+ port_name(intel_encoder->port), port);
+ return -EINVAL;
+ }
+
if (!enable)
parm |= 4 << 8;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea958422291de248b9e2eaaeea36004e84b64043 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Thu, 10 Feb 2022 12:36:42 +0200
Subject: [PATCH] drm/i915/opregion: check port number bounds for SWSCI display
power state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The mapping from enum port to whatever port numbering scheme is used by
the SWSCI Display Power State Notification is odd, and the memory of it
has faded. In any case, the parameter only has space for ports numbered
[0..4], and UBSAN reports bit shift beyond it when the platform has port
F or more.
Since the SWSCI functionality is supposed to be obsolete for new
platforms (i.e. ones that might have port F or more), just bail out
early if the mapped and mangled port number is beyond what the Display
Power State Notification can support.
Fixes: 9c4b0a683193 ("drm/i915: add opregion function to notify bios of encoder enable/disable")
Cc: <stable(a)vger.kernel.org> # v3.13+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4800
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/cc363f42d6b5a5932b6d218fefcc8…
(cherry picked from commit 24a644ebbfd3b13cda702f98907f9dd123e34bf9)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_opregion.c b/drivers/gpu/drm/i915/display/intel_opregion.c
index 0065111593a6..4a2662838cd8 100644
--- a/drivers/gpu/drm/i915/display/intel_opregion.c
+++ b/drivers/gpu/drm/i915/display/intel_opregion.c
@@ -360,6 +360,21 @@ int intel_opregion_notify_encoder(struct intel_encoder *intel_encoder,
port++;
}
+ /*
+ * The port numbering and mapping here is bizarre. The now-obsolete
+ * swsci spec supports ports numbered [0..4]. Port E is handled as a
+ * special case, but port F and beyond are not. The functionality is
+ * supposed to be obsolete for new platforms. Just bail out if the port
+ * number is out of bounds after mapping.
+ */
+ if (port > 4) {
+ drm_dbg_kms(&dev_priv->drm,
+ "[ENCODER:%d:%s] port %c (index %u) out of bounds for display power state notification\n",
+ intel_encoder->base.base.id, intel_encoder->base.name,
+ port_name(intel_encoder->port), port);
+ return -EINVAL;
+ }
+
if (!enable)
parm |= 4 << 8;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 698bef8ff5d2edea5d1c9d6e5adf1bfed1e8a106 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Mon, 7 Feb 2022 15:26:59 +0200
Subject: [PATCH] drm/i915: Fix dbuf slice config lookup
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Apparently I totally fumbled the loop condition when I
removed the ARRAY_SIZE() stuff from the dbuf slice config
lookup. Comparing the loop index with the active_pipes bitmask
is utter nonsense, what we want to do is check to see if the
mask is zero or not.
Note that the code actually ended up working correctly despite
the fumble, up until commit eef173954432 ("drm/i915: Allow
!join_mbus cases for adlp+ dbuf configuration") when things
broke for real.
Cc: stable(a)vger.kernel.org
Fixes: 05e8155afe35 ("drm/i915: Use a sentinel to terminate the dbuf slice arrays")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220207132700.481-1-ville.sy…
Reviewed-by: Jani Nikula <jani.nikula(a)intel.com>
(cherry picked from commit a28fde308c3c1c174249ff9559b57f24e6850086)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 3edba7fd0c49..da6ce24a42b2 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4870,7 +4870,7 @@ static u8 compute_dbuf_slices(enum pipe pipe, u8 active_pipes, bool join_mbus,
{
int i;
- for (i = 0; i < dbuf_slices[i].active_pipes; i++) {
+ for (i = 0; dbuf_slices[i].active_pipes != 0; i++) {
if (dbuf_slices[i].active_pipes == active_pipes &&
dbuf_slices[i].join_mbus == join_mbus)
return dbuf_slices[i].dbuf_mask[pipe];
I'm announcing the release of the 4.4.302 kernel.
This kernel branch is now END-OF-LIFE. It will not be getting any more
updates from the kernel stable team, and will most likely quickly become
insecure and out-of-date. Do not use it anymore unless you really know
what you are doing.
Note, the CIP project at https://www.cip-project.org/ is considering to
maintain the 4.4 branch in a limited capability going forward. If you
really need to use this kernel version, please contact them.
Some simple statistics for the 4.4.y kernel branch:
First release: January 10, 2016
Last release: Febuary 3, 2022
Days supported: 2216
Changes added: 18712
Changes added per day: 8.44
Unique developers: 3532
Unique companies: 503
It was a good kernel branch, helped out by many to work as well as it
has, thanks to all for your help with this. It has powered many
millions, maybe a few billion, devices out in the world, but now it's
time to say good-bye.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/s390/hypfs/hypfs_vm.c | 6 +-
arch/x86/kvm/x86.c | 14 ++--
drivers/gpu/drm/msm/msm_drv.c | 2
drivers/gpu/drm/radeon/ci_dpm.c | 6 --
drivers/hwmon/lm90.c | 2
drivers/input/serio/i8042-x86ia64io.h | 11 ++-
drivers/media/i2c/tc358743.c | 2
drivers/s390/scsi/zfcp_fc.c | 13 ++++
drivers/scsi/bnx2fc/bnx2fc_fcoe.c | 20 +-----
drivers/tty/n_gsm.c | 4 +
drivers/tty/serial/8250/8250_pci.c | 100 +++++++++++++++++++++++++++++++++-
drivers/tty/serial/stm32-usart.c | 2
drivers/usb/core/hcd.c | 14 ++++
drivers/usb/core/urb.c | 12 ++++
drivers/usb/storage/unusual_devs.h | 10 +++
fs/udf/inode.c | 9 +--
include/linux/netdevice.h | 1
include/net/ip.h | 21 +++----
kernel/power/wakelock.c | 12 +---
net/bluetooth/hci_event.c | 10 +--
net/bluetooth/mgmt.c | 8 +-
net/can/bcm.c | 20 +++---
net/core/net-procfs.c | 38 +++++++++++-
net/ipv4/ip_output.c | 11 +++
net/ipv4/raw.c | 5 +
net/ipv6/ip6_tunnel.c | 8 +-
net/packet/af_packet.c | 2
28 files changed, 267 insertions(+), 98 deletions(-)
Alan Stern (2):
usb-storage: Add unusual-devs entry for VL817 USB-SATA bridge
USB: core: Fix hang in usb_kill_urb by adding memory barriers
Brian Gix (1):
Bluetooth: refactor malicious adv data check
Cameron Williams (1):
tty: Add support for Brainboxes UC cards.
Congyu Liu (1):
net: fix information leakage in /proc/net/ptype
Eric Dumazet (3):
ipv4: avoid using shared IP generator for connected sockets
ipv4: raw: lock the socket in raw_bind()
ipv4: tcp: send zero IPID in SYNACK messages
Greg Kroah-Hartman (2):
PM: wakeup: simplify the output logic of pm_show_wakelocks()
Linux 4.4.302
Guenter Roeck (1):
hwmon: (lm90) Reduce maximum conversion rate for G781
Guillaume Bertholon (5):
Bluetooth: MGMT: Fix misplaced BT_HS check
Revert "drm/radeon/ci: disable mclk switching for high refresh rates (v2)"
Revert "tc358743: fix register i2c_rd/wr function fix"
KVM: x86: Fix misplaced backport of "work around leak of uninitialized stack contents"
Input: i8042 - Fix misplaced backport of "add ASUS Zenbook Flip to noselftest list"
Ido Schimmel (1):
ipv6_tunnel: Rate limit warning messages
Jan Kara (2):
udf: Restore i_lenAlloc when inode expansion fails
udf: Fix NULL ptr deref when converting from inline format
Jianguo Wu (1):
net-procfs: show net devices bound packet types
John Meneghini (1):
scsi: bnx2fc: Flush destroy_work queue before calling bnx2fc_interface_put()
Steffen Maier (1):
scsi: zfcp: Fix failed recovery on gone remote port with non-NPIV FCP devices
Valentin Caron (1):
serial: stm32: fix software flow control transfer
Vasily Gorbik (1):
s390/hypfs: include z/VM guests with access control group set
Xianting Tian (1):
drm/msm: Fix wrong size calculation
Ziyang Xuan (1):
can: bcm: fix UAF of bcm op
daniel.starke(a)siemens.com (1):
tty: n_gsm: fix SW flow control encoding/handling
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
If the only thing that is changing is SAGV vs. no SAGV but
the number of active planes and the total data rates end up
unchanged we currently bail out of intel_bw_atomic_check()
early and forget to actually compute the new WGV point
mask and thus won't actually enable/disable SAGV as requested.
This ends up poorly if we end up running with SAGV enabled
when we shouldn't. Usually ends up in underruns.
To fix this let's go through the QGV point mask computation
if either the data rates/number of planes, or the state
of SAGV is changing.
v2: Check more carefully if things are changing to avoid
the extra calculations/debugs from introducing unwanted
overhead
Cc: stable(a)vger.kernel.org
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com> #v1
Fixes: 20f505f22531 ("drm/i915: Restrict qgv points which don't have enough bandwidth.")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_bw.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c
index 7bbe0dc5926b..1fd1d2182d8f 100644
--- a/drivers/gpu/drm/i915/display/intel_bw.c
+++ b/drivers/gpu/drm/i915/display/intel_bw.c
@@ -830,6 +830,7 @@ int intel_bw_atomic_check(struct intel_atomic_state *state)
unsigned int max_bw_point = 0, max_bw = 0;
unsigned int num_qgv_points = dev_priv->max_bw[0].num_qgv_points;
unsigned int num_psf_gv_points = dev_priv->max_bw[0].num_psf_gv_points;
+ bool changed = false;
u32 mask = 0;
/* FIXME earlier gens need some checks too */
@@ -873,6 +874,8 @@ int intel_bw_atomic_check(struct intel_atomic_state *state)
new_bw_state->data_rate[crtc->pipe] = new_data_rate;
new_bw_state->num_active_planes[crtc->pipe] = new_active_planes;
+ changed = true;
+
drm_dbg_kms(&dev_priv->drm,
"pipe %c data rate %u num active planes %u\n",
pipe_name(crtc->pipe),
@@ -880,7 +883,19 @@ int intel_bw_atomic_check(struct intel_atomic_state *state)
new_bw_state->num_active_planes[crtc->pipe]);
}
- if (!new_bw_state)
+ old_bw_state = intel_atomic_get_old_bw_state(state);
+ new_bw_state = intel_atomic_get_new_bw_state(state);
+
+ if (new_bw_state &&
+ intel_can_enable_sagv(dev_priv, old_bw_state) !=
+ intel_can_enable_sagv(dev_priv, new_bw_state))
+ changed = true;
+
+ /*
+ * If none of our inputs (data rates, number of active
+ * planes, SAGV yes/no) changed then nothing to do here.
+ */
+ if (!changed)
return 0;
ret = intel_atomic_lock_global_state(&new_bw_state->base);
@@ -966,7 +981,6 @@ int intel_bw_atomic_check(struct intel_atomic_state *state)
*/
new_bw_state->qgv_points_mask = ~allowed_points & mask;
- old_bw_state = intel_atomic_get_old_bw_state(state);
/*
* If the actual mask had changed we need to make sure that
* the commits are serialized(in case this is a nomodeset, nonblocking)
--
2.34.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
When changing between SAGV vs. no SAGV on tgl+ we have to
update the use_sagv_wm flag for all the crtcs or else
an active pipe not already in the state will end up using
the wrong watermarks. That is especially bad when we end up
with the tighter non-SAGV watermarks with SAGV enabled.
Usually ends up in underruns.
Cc: stable(a)vger.kernel.org
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
Fixes: 7241c57d3140 ("drm/i915: Add TGL+ SAGV support")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/intel_pm.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d4d487f040a1..d0fcc4586983 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4008,6 +4008,17 @@ static int intel_compute_sagv_mask(struct intel_atomic_state *state)
return ret;
}
+ if (intel_can_enable_sagv(dev_priv, new_bw_state) !=
+ intel_can_enable_sagv(dev_priv, old_bw_state)) {
+ ret = intel_atomic_serialize_global_state(&new_bw_state->base);
+ if (ret)
+ return ret;
+ } else if (new_bw_state->pipe_sagv_reject != old_bw_state->pipe_sagv_reject) {
+ ret = intel_atomic_lock_global_state(&new_bw_state->base);
+ if (ret)
+ return ret;
+ }
+
for_each_new_intel_crtc_in_state(state, crtc,
new_crtc_state, i) {
struct skl_pipe_wm *pipe_wm = &new_crtc_state->wm.skl.optimal;
@@ -4023,17 +4034,6 @@ static int intel_compute_sagv_mask(struct intel_atomic_state *state)
intel_can_enable_sagv(dev_priv, new_bw_state);
}
- if (intel_can_enable_sagv(dev_priv, new_bw_state) !=
- intel_can_enable_sagv(dev_priv, old_bw_state)) {
- ret = intel_atomic_serialize_global_state(&new_bw_state->base);
- if (ret)
- return ret;
- } else if (new_bw_state->pipe_sagv_reject != old_bw_state->pipe_sagv_reject) {
- ret = intel_atomic_lock_global_state(&new_bw_state->base);
- if (ret)
- return ret;
- }
-
return 0;
}
--
2.34.1
Allthough kernel text is always mapped with BATs, we still have
inittext mapped with pages, so TLB miss handling is required
when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.
The final solution should be to set a BAT that also maps inittext
but that BAT then needs to be cleared at end of init, and it will
require more changes to be able to do it properly.
As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big
deal so let's fix it simply for now to enable easy stable application.
Reported-by: Maxime Bizon <mbizon(a)freebox.fr>
Fixes: 035b19a15a98 ("powerpc/32s: Always map kernel text and rodata with BATs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
---
arch/powerpc/kernel/head_book3s_32.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 68e5c0a7e99d..2e2a8211b17b 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -421,14 +421,14 @@ InstructionTLBMiss:
*/
/* Get PTE (linux-style) and check access */
mfspr r3,SPRN_IMISS
-#ifdef CONFIG_MODULES
+#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
lis r1, TASK_SIZE@h /* check if kernel address */
cmplw 0,r1,r3
#endif
mfspr r2, SPRN_SDR1
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
rlwinm r2, r2, 28, 0xfffff000
-#ifdef CONFIG_MODULES
+#if defined(CONFIG_MODULES) || defined(CONFIG_DEBUG_PAGEALLOC) || defined(CONFIG_KFENCE)
bgt- 112f
lis r2, (swapper_pg_dir - PAGE_OFFSET)@ha /* if kernel address, use */
li r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
--
2.33.1
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit e386dfc56f837da66d00a078e5314bc8382fab83 upstream.
Commit 054aa8d439b9 ("fget: check that the fd still exists after getting
a ref to it") fixed a race with getting a reference to a file just as it
was being closed. It was a fairly minimal patch, and I didn't think
re-checking the file pointer lookup would be a measurable overhead,
since it was all right there and cached.
But I was wrong, as pointed out by the kernel test robot.
The 'poll2' case of the will-it-scale.per_thread_ops benchmark regressed
quite noticeably. Admittedly it seems to be a very artificial test:
doing "poll()" system calls on regular files in a very tight loop in
multiple threads.
That means that basically all the time is spent just looking up file
descriptors without ever doing anything useful with them (not that doing
'poll()' on a regular file is useful to begin with). And as a result it
shows the extra "re-check fd" cost as a sore thumb.
Happily, the regression is fixable by just writing the code to loook up
the fd to be better and clearer. There's still a cost to verify the
file pointer, but now it's basically in the noise even for that
benchmark that does nothing else - and the code is more understandable
and has better comments too.
[ Side note: this patch is also a classic case of one that looks very
messy with the default greedy Myers diff - it's much more legible with
either the patience of histogram diff algorithm ]
Link: https://lore.kernel.org/lkml/20211210053743.GA36420@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/20211213083154.GA20853@linux.intel.com/
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Tested-by: Carel Si <beibei.si(a)intel.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Miklos Szeredi <mszeredi(a)redhat.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
fs/file.c | 72 ++++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 56 insertions(+), 16 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index 9d02352fa18c..79a76d04c7c3 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -817,28 +817,68 @@ void do_close_on_exec(struct files_struct *files)
spin_unlock(&files->file_lock);
}
-static struct file *__fget_files(struct files_struct *files, unsigned int fd,
- fmode_t mask, unsigned int refs)
+static inline struct file *__fget_files_rcu(struct files_struct *files,
+ unsigned int fd, fmode_t mask, unsigned int refs)
{
- struct file *file;
+ for (;;) {
+ struct file *file;
+ struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+ struct file __rcu **fdentry;
- rcu_read_lock();
-loop:
- file = fcheck_files(files, fd);
- if (file) {
- /* File object ref couldn't be taken.
- * dup2() atomicity guarantee is the reason
- * we loop to catch the new file (or NULL pointer)
+ if (unlikely(fd >= fdt->max_fds))
+ return NULL;
+
+ fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+ file = rcu_dereference_raw(*fdentry);
+ if (unlikely(!file))
+ return NULL;
+
+ if (unlikely(file->f_mode & mask))
+ return NULL;
+
+ /*
+ * Ok, we have a file pointer. However, because we do
+ * this all locklessly under RCU, we may be racing with
+ * that file being closed.
+ *
+ * Such a race can take two forms:
+ *
+ * (a) the file ref already went down to zero,
+ * and get_file_rcu_many() fails. Just try
+ * again:
*/
- if (file->f_mode & mask)
- file = NULL;
- else if (!get_file_rcu_many(file, refs))
- goto loop;
- else if (__fcheck_files(files, fd) != file) {
+ if (unlikely(!get_file_rcu_many(file, refs)))
+ continue;
+
+ /*
+ * (b) the file table entry has changed under us.
+ * Note that we don't need to re-check the 'fdt->fd'
+ * pointer having changed, because it always goes
+ * hand-in-hand with 'fdt'.
+ *
+ * If so, we need to put our refs and try again.
+ */
+ if (unlikely(rcu_dereference_raw(files->fdt) != fdt) ||
+ unlikely(rcu_dereference_raw(*fdentry) != file)) {
fput_many(file, refs);
- goto loop;
+ continue;
}
+
+ /*
+ * Ok, we have a ref to the file, and checked that it
+ * still exists.
+ */
+ return file;
}
+}
+
+static struct file *__fget_files(struct files_struct *files, unsigned int fd,
+ fmode_t mask, unsigned int refs)
+{
+ struct file *file;
+
+ rcu_read_lock();
+ file = __fget_files_rcu(files, fd, mask, refs);
rcu_read_unlock();
return file;
--
2.31.1
The mmc0 clock gate bit was mistakenly assigned to "i2s" clock.
You can find that the same bit is assigned to "mmc0" too.
It leads to mmc0 hang for a long time after any sound activity
also it prevented PM_SLEEP to work properly.
I guess it was introduced by copy-paste from jz4740 driver
where it is really controls I2S clock gate.
Changelog v2 .. v3:
- Added tags Fixes and Reviewed-by.
Changelog v1 .. v2:
- Added useful info above to the commit itself.
Siarhei Volkau (1):
clk: jz4725b: fix mmc0 clock gating
drivers/clk/ingenic/jz4725b-cgu.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--
2.35.1
In v5.16 btrfs had a big rework (originally considered as a refactor
only) on defrag, which brings quite some regressions.
With those regressions, extra scrutiny is brought to defrag code, and
some existing bugs are exposed.
For v5.15, there are those patches need to be backported:
(Tracked through github issue:
https://github.com/btrfs/linux/issues/422)
- Already upstreamed:
"btrfs: don't hold CPU for too long when defragging a file"
The first patch.
- In maintainer's tree
btrfs: defrag: don't try to merge regular extents with preallocated extents
btrfs: defrag: don't defrag extents which are already at max capacity
btrfs: defrag: remove an ambiguous condition for rejection
Those will be backported when they get merged upstream.
- One special fix for v5.15
The 2nd patch.
The problem is, that patch has no upstream commit, since v5.16
reworked the defrag code and fix the problem unintentionally.
It's not a good idea to bring the whole rework (along with its bugs)
to stable kernel.
So here I crafted a small fix for v5.15 only.
Not sure if this is conflicted with any stable policy.
Qu Wenruo (2):
btrfs: don't hold CPU for too long when defragging a file
btrfs: defrag: use the same cluster size for defrag ioctl and
autodefrag
fs/btrfs/ioctl.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--
2.35.1
The patch titled
Subject: userfaultfd: mark uffd_wp regardless of VM_WRITE flag
has been added to the -mm tree. Its filename is
userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/userfaultfd-mark-uffd_wp-regardle…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/userfaultfd-mark-uffd_wp-regardle…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Nadav Amit <namit(a)vmware.com>
Subject: userfaultfd: mark uffd_wp regardless of VM_WRITE flag
When a PTE is set by UFFD operations such as UFFDIO_COPY, the PTE is
currently only marked as write-protected if the VMA has VM_WRITE flag set.
This seems incorrect or at least would be unexpected by the users.
Consider the following sequence of operations that are being performed on
a certain page:
mprotect(PROT_READ)
UFFDIO_COPY(UFFDIO_COPY_MODE_WP)
mprotect(PROT_READ|PROT_WRITE)
At this point the user would expect to still get UFFD notification when
the page is accessed for write, but the user would not get one, since the
PTE was not marked as UFFD_WP during UFFDIO_COPY.
Fix it by always marking PTEs as UFFD_WP regardless on the
write-permission in the VMA flags.
Link: https://lkml.kernel.org/r/20220217211602.2769-1-namit@vmware.com
Fixes: 292924b26024 ("userfaultfd: wp: apply _PAGE_UFFD_WP bit")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
--- a/mm/userfaultfd.c~userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag
+++ a/mm/userfaultfd.c
@@ -72,12 +72,15 @@ int mfill_atomic_install_pte(struct mm_s
_dst_pte = pte_mkdirty(_dst_pte);
if (page_in_cache && !vm_shared)
writable = false;
- if (writable) {
- if (wp_copy)
- _dst_pte = pte_mkuffd_wp(_dst_pte);
- else
- _dst_pte = pte_mkwrite(_dst_pte);
- }
+
+ /*
+ * Always mark a PTE as write-protected when needed, regardless of
+ * VM_WRITE, which the user might change.
+ */
+ if (wp_copy)
+ _dst_pte = pte_mkuffd_wp(_dst_pte);
+ else if (writable)
+ _dst_pte = pte_mkwrite(_dst_pte);
dst_pte = pte_offset_map_lock(dst_mm, dst_pmd, dst_addr, &ptl);
_
Patches currently in -mm which might be from namit(a)vmware.com are
userfaultfd-mark-uffd_wp-regardless-of-vm_write-flag.patch
Hi Greg,
Please consider cherry-pick this patch series into 5.16.x stable. It
includes a fix to a bug in 5.16 stable which allows a user with cap_bpf
privileges to get root privileges. The patch that fixes the bug is
patch 7/9: bpf: Make per_cpu_ptr return rdonly
The rest are the depedences required by the fix patch. This patchset has
been merged in mainline v5.17. The patches were not planned to backport
because of its complex dependences.
Tested by compile, build and run through a subset of bpf test_progs.
Hao Luo (9):
bpf: Introduce composable reg, ret and arg types.
bpf: Replace ARG_XXX_OR_NULL with ARG_XXX | PTR_MAYBE_NULL
bpf: Replace RET_XXX_OR_NULL with RET_XXX | PTR_MAYBE_NULL
bpf: Replace PTR_TO_XXX_OR_NULL with PTR_TO_XXX | PTR_MAYBE_NULL
bpf: Introduce MEM_RDONLY flag
bpf: Convert PTR_TO_MEM_OR_NULL to composable types.
bpf: Make per_cpu_ptr return rdonly PTR_TO_MEM.
bpf: Add MEM_RDONLY for helper args that are pointers to rdonly mem.
bpf/selftests: Test PTR_TO_RDONLY_MEM
include/linux/bpf.h | 101 +++-
include/linux/bpf_verifier.h | 17 +
kernel/bpf/btf.c | 12 +-
kernel/bpf/cgroup.c | 2 +-
kernel/bpf/helpers.c | 12 +-
kernel/bpf/map_iter.c | 4 +-
kernel/bpf/ringbuf.c | 2 +-
kernel/bpf/syscall.c | 2 +-
kernel/bpf/verifier.c | 488 +++++++++---------
kernel/trace/bpf_trace.c | 26 +-
net/core/bpf_sk_storage.c | 2 +-
net/core/filter.c | 64 +--
net/core/sock_map.c | 2 +-
.../selftests/bpf/prog_tests/ksyms_btf.c | 14 +
.../bpf/progs/test_ksyms_btf_write_check.c | 29 ++
15 files changed, 444 insertions(+), 333 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_ksyms_btf_write_check.c
--
2.35.1.265.g69c8d7142f-goog
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 23e7b1bfed61e301853b5e35472820d919498278 Mon Sep 17 00:00:00 2001
From: Guillaume Nault <gnault(a)redhat.com>
Date: Mon, 10 Jan 2022 14:43:06 +0100
Subject: [PATCH] xfrm: Don't accidentally set RTO_ONLINK in decode_session4()
Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"),
clear the ECN bits from iph->tos when setting ->flowi4_tos.
This ensures that the last bit of ->flowi4_tos is cleared, so
ip_route_output_key_hash() isn't going to restrict the scope of the
route lookup.
Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason
to clear the high order bits.
Found by code inspection, compile tested only.
Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups")
Signed-off-by: Guillaume Nault <gnault(a)redhat.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index dccb8f3318ef..04d1ce9b510f 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -31,6 +31,7 @@
#include <linux/if_tunnel.h>
#include <net/dst.h>
#include <net/flow.h>
+#include <net/inet_ecn.h>
#include <net/xfrm.h>
#include <net/ip.h>
#include <net/gre.h>
@@ -3295,7 +3296,7 @@ decode_session4(struct sk_buff *skb, struct flowi *fl, bool reverse)
fl4->flowi4_proto = iph->protocol;
fl4->daddr = reverse ? iph->saddr : iph->daddr;
fl4->saddr = reverse ? iph->daddr : iph->saddr;
- fl4->flowi4_tos = iph->tos;
+ fl4->flowi4_tos = iph->tos & ~INET_ECN_MASK;
if (!ip_is_fragment(iph)) {
switch (iph->protocol) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54309fde1a352ad2674ebba004a79f7d20b9f037 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20L=C3=B6hle?= <CLoehle(a)hyperstone.com>
Date: Fri, 4 Feb 2022 15:11:37 +0000
Subject: [PATCH] mmc: block: fix read single on recovery logic
On reads with MMC_READ_MULTIPLE_BLOCK that fail,
the recovery handler will use MMC_READ_SINGLE_BLOCK for
each of the blocks, up to MMC_READ_SINGLE_RETRIES times each.
The logic for this is fixed to never report unsuccessful reads
as success to the block layer.
On command error with retries remaining, blk_update_request was
called with whatever value error was set last to.
In case it was last set to BLK_STS_OK (default), the read will be
reported as success, even though there was no data read from the device.
This could happen on a CRC mismatch for the response,
a card rejecting the command (e.g. again due to a CRC mismatch).
In case it was last set to BLK_STS_IOERR, the error is reported correctly,
but no retries will be attempted.
Fixes: 81196976ed946c ("mmc: block: Add blk-mq support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christian Loehle <cloehle(a)hyperstone.com>
Reviewed-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/bc706a6ab08c4fe2834ba0c05a804672@hyperstone.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 4e61b28a002f..8d718aa56d33 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1682,31 +1682,31 @@ static void mmc_blk_read_single(struct mmc_queue *mq, struct request *req)
struct mmc_card *card = mq->card;
struct mmc_host *host = card->host;
blk_status_t error = BLK_STS_OK;
- int retries = 0;
do {
u32 status;
int err;
+ int retries = 0;
- mmc_blk_rw_rq_prep(mqrq, card, 1, mq);
+ while (retries++ <= MMC_READ_SINGLE_RETRIES) {
+ mmc_blk_rw_rq_prep(mqrq, card, 1, mq);
- mmc_wait_for_req(host, mrq);
+ mmc_wait_for_req(host, mrq);
- err = mmc_send_status(card, &status);
- if (err)
- goto error_exit;
-
- if (!mmc_host_is_spi(host) &&
- !mmc_ready_for_data(status)) {
- err = mmc_blk_fix_state(card, req);
+ err = mmc_send_status(card, &status);
if (err)
goto error_exit;
- }
- if (mrq->cmd->error && retries++ < MMC_READ_SINGLE_RETRIES)
- continue;
+ if (!mmc_host_is_spi(host) &&
+ !mmc_ready_for_data(status)) {
+ err = mmc_blk_fix_state(card, req);
+ if (err)
+ goto error_exit;
+ }
- retries = 0;
+ if (!mrq->cmd->error)
+ break;
+ }
if (mrq->cmd->error ||
mrq->data->error ||
Upstream commit 2b17c400aeb44daf041627722581ade527bb3c1d
The fixes tag of the uptream patch points to commit 921ca574cd38 ("can:
isotp: add SF_BROADCAST support for functional addressing") which showed
up in Linux 5.11 but the described issue already existed in Linux 5.10.
Norbert Slusarek writes:
A race condition was found in isotp_setsockopt() which allows to
change socket options after the socket was bound.
For the specific case of SF_BROADCAST support, this might lead to possible
use-after-free because can_rx_unregister() is not called.
Checking for the flag under the socket lock in isotp_bind() and taking
the lock in isotp_setsockopt() fixes the issue.
Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol")
Link: https://lore.kernel.org/r/trinity-e6ae9efa-9afb-4326-84c0-f3609b9b8168-1620…
Reported-by: Norbert Slusarek <nslusarek(a)gmx.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Signed-off-by: Norbert Slusarek <nslusarek(a)gmx.net>
Acked-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
---
net/can/isotp.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/net/can/isotp.c b/net/can/isotp.c
index 37db4d232313..3f11d2b314b6 100644
--- a/net/can/isotp.c
+++ b/net/can/isotp.c
@@ -1191,20 +1191,17 @@ static int isotp_getname(struct socket *sock, struct sockaddr *uaddr, int peer)
addr->can_addr.tp.tx_id = so->txid;
return ISOTP_MIN_NAMELEN;
}
-static int isotp_setsockopt(struct socket *sock, int level, int optname,
+static int isotp_setsockopt_locked(struct socket *sock, int level, int optname,
sockptr_t optval, unsigned int optlen)
{
struct sock *sk = sock->sk;
struct isotp_sock *so = isotp_sk(sk);
int ret = 0;
- if (level != SOL_CAN_ISOTP)
- return -EINVAL;
-
if (so->bound)
return -EISCONN;
switch (optname) {
case CAN_ISOTP_OPTS:
@@ -1275,10 +1272,26 @@ static int isotp_setsockopt(struct socket *sock, int level, int optname,
}
return ret;
}
+static int isotp_setsockopt(struct socket *sock, int level, int optname,
+ sockptr_t optval, unsigned int optlen)
+
+{
+ struct sock *sk = sock->sk;
+ int ret;
+
+ if (level != SOL_CAN_ISOTP)
+ return -EINVAL;
+
+ lock_sock(sk);
+ ret = isotp_setsockopt_locked(sock, level, optname, optval, optlen);
+ release_sock(sk);
+ return ret;
+}
+
static int isotp_getsockopt(struct socket *sock, int level, int optname,
char __user *optval, int __user *optlen)
{
struct sock *sk = sock->sk;
struct isotp_sock *so = isotp_sk(sk);
--
2.30.2
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit bfb3aa735f82c8d98b32a669934ee7d6b346264d ]
An outgoing CPU is marked offline in a stop-machine handler and most
of that CPU's services stop at that point, including IRQ work queues.
However, that CPU must take another pass through the scheduler and through
a number of CPU-hotplug notifiers, many of which contain RCU readers.
In the past, these readers were not a problem because the outgoing CPU
has interrupts disabled, so that rcu_read_unlock_special() would not
be invoked, and thus RCU would never attempt to queue IRQ work on the
outgoing CPU.
This changed with the advent of the CONFIG_RCU_STRICT_GRACE_PERIOD
Kconfig option, in which rcu_read_unlock_special() is invoked upon exit
from almost all RCU read-side critical sections. Worse yet, because
interrupts are disabled, rcu_read_unlock_special() cannot immediately
report a quiescent state and will therefore attempt to defer this
reporting, for example, by queueing IRQ work. Which fails with a splat
because the CPU is already marked as being offline.
But it turns out that there is no need to report this quiescent state
because rcu_report_dead() will do this job shortly after the outgoing
CPU makes its final dive into the idle loop. This commit therefore
makes rcu_read_unlock_special() refrain from queuing IRQ work onto
outgoing CPUs.
Fixes: 44bad5b3cca2 ("rcu: Do full report for .need_qs for strict GPs")
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Cc: Jann Horn <jannh(a)google.com>
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
---
kernel/rcu/tree_plugin.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 6ed153f226b3925..244f32e98360fdf 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -628,7 +628,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
set_tsk_need_resched(current);
set_preempt_need_resched();
if (IS_ENABLED(CONFIG_IRQ_WORK) && irqs_were_disabled &&
- !rdp->defer_qs_iw_pending && exp) {
+ !rdp->defer_qs_iw_pending && exp && cpu_online(rdp->cpu)) {
// Get scheduler to re-evaluate and call hooks.
// If !IRQ_WORK, FQS scan will eventually IPI.
init_irq_work(&rdp->defer_qs_iw,
--
2.26.0.106.g9fadedd
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0764db9b49c932b89ee4d9e3236dff4bb07b4a66 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Fri, 11 Feb 2022 16:32:32 -0800
Subject: [PATCH] mm: memcg: synchronize objcg lists with a dedicated spinlock
Alexander reported a circular lock dependency revealed by the mmap1 ltp
test:
LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1))
WARNING: possible circular locking dependency detected
5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted
------------------------------------------------------
mmap1/202299 is trying to acquire lock:
00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0
but task is already holding lock:
00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&sighand->siglock){-.-.}-{2:2}:
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
__lock_task_sighand+0x90/0x190
cgroup_freeze_task+0x2e/0x90
cgroup_migrate_execute+0x11c/0x608
cgroup_update_dfl_csses+0x246/0x270
cgroup_subtree_control_write+0x238/0x518
kernfs_fop_write_iter+0x13e/0x1e0
new_sync_write+0x100/0x190
vfs_write+0x22c/0x2d8
ksys_write+0x6c/0xf8
__do_syscall+0x1da/0x208
system_call+0x82/0xb0
-> #0 (css_set_lock){..-.}-{2:2}:
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sighand->siglock);
lock(css_set_lock);
lock(&sighand->siglock);
lock(css_set_lock);
*** DEADLOCK ***
2 locks held by mmap1/202299:
#0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180
#1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168
stack backtrace:
CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1
Hardware name: IBM 3906 M04 704 (LPAR)
Call Trace:
dump_stack_lvl+0x76/0x98
check_noncircular+0x136/0x158
check_prev_add+0xe0/0xed8
validate_chain+0x736/0xb20
__lock_acquire+0x604/0xbd8
lock_acquire.part.0+0xe2/0x238
lock_acquire+0xb0/0x200
_raw_spin_lock_irqsave+0x6a/0xd8
obj_cgroup_release+0x4a/0xe0
percpu_ref_put_many.constprop.0+0x150/0x168
drain_obj_stock+0x94/0xe8
refill_obj_stock+0x94/0x278
obj_cgroup_charge+0x164/0x1d8
kmem_cache_alloc+0xac/0x528
__sigqueue_alloc+0x150/0x308
__send_signal+0x260/0x550
send_signal+0x7e/0x348
force_sig_info_to_task+0x104/0x180
force_sig_fault+0x48/0x58
__do_pgm_check+0x120/0x1f0
pgm_check_handler+0x11e/0x180
INFO: lockdep is turned off.
In this example a slab allocation from __send_signal() caused a
refilling and draining of a percpu objcg stock, resulted in a releasing
of another non-related objcg. Objcg release path requires taking the
css_set_lock, which is used to synchronize objcg lists.
This can create a circular dependency with the sighandler lock, which is
taken with the locked css_set_lock by the freezer code (to freeze a
task).
In general it seems that using css_set_lock to synchronize objcg lists
makes any slab allocations and deallocation with the locked css_set_lock
and any intervened locks risky.
To fix the problem and make the code more robust let's stop using
css_set_lock to synchronize objcg lists and use a new dedicated spinlock
instead.
Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reported-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Tested-by: Alexander Egorenkov <egorenar(a)linux.ibm.com>
Reviewed-by: Waiman Long <longman(a)redhat.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Jeremy Linton <jeremy.linton(a)arm.com>
Tested-by: Jeremy Linton <jeremy.linton(a)arm.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b72d75141e12..0abbd685703b 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -219,7 +219,7 @@ struct obj_cgroup {
struct mem_cgroup *memcg;
atomic_t nr_charged_bytes;
union {
- struct list_head list;
+ struct list_head list; /* protected by objcg_lock */
struct rcu_head rcu;
};
};
@@ -315,7 +315,8 @@ struct mem_cgroup {
#ifdef CONFIG_MEMCG_KMEM
int kmemcg_id;
struct obj_cgroup __rcu *objcg;
- struct list_head objcg_list; /* list of inherited objcgs */
+ /* list of inherited objcgs, protected by objcg_lock */
+ struct list_head objcg_list;
#endif
MEMCG_PADDING(_pad2_);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09d342c7cbd0..36e9f38c919d 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -254,7 +254,7 @@ struct mem_cgroup *vmpressure_to_memcg(struct vmpressure *vmpr)
}
#ifdef CONFIG_MEMCG_KMEM
-extern spinlock_t css_set_lock;
+static DEFINE_SPINLOCK(objcg_lock);
bool mem_cgroup_kmem_disabled(void)
{
@@ -298,9 +298,9 @@ static void obj_cgroup_release(struct percpu_ref *ref)
if (nr_pages)
obj_cgroup_uncharge_pages(objcg, nr_pages);
- spin_lock_irqsave(&css_set_lock, flags);
+ spin_lock_irqsave(&objcg_lock, flags);
list_del(&objcg->list);
- spin_unlock_irqrestore(&css_set_lock, flags);
+ spin_unlock_irqrestore(&objcg_lock, flags);
percpu_ref_exit(ref);
kfree_rcu(objcg, rcu);
@@ -332,7 +332,7 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
- spin_lock_irq(&css_set_lock);
+ spin_lock_irq(&objcg_lock);
/* 1) Ready to reparent active objcg. */
list_add(&objcg->list, &memcg->objcg_list);
@@ -342,7 +342,7 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
/* 3) Move already reparented objcgs to the parent's list */
list_splice(&memcg->objcg_list, &parent->objcg_list);
- spin_unlock_irq(&css_set_lock);
+ spin_unlock_irq(&objcg_lock);
percpu_ref_kill(&objcg->refcnt);
}
subject: drm/nouveau/pmu/gm200-: use alternate falcon reset sequence
commit id: 4cdd2450bf739bada353e82d27b00db9af8c3001
kernels: 5.16 5.15 and 5.10
1d2271d2fb85e54bfc9630a6c30ac0feb9ffb983 got backported to a bunch of
kernel versions, which wasn't really intentional from nouveaus side. I
assume this happened because autosel picked it up due to the
"Reported-by" tag? Anyway, It actually requires
4cdd2450bf739bada353e82d27b00db9af8c3001 as well in order to not
regress systems. One bug report can be found here:
https://gitlab.freedesktop.org/drm/nouveau/-/issues/149
Users did test that applying 4cdd2450bf739bada353e82d27b00db9af8c3001
on top fixes the problems.
We still want to figure out what to do with 5.4 and older, because the
fix doesn't apply there, so we will either request a revert or send
modified patches.
Thanks
commit 57bc3d3ae8c14df3ceb4e17d26ddf9eeab304581 upstream.
ax88179_rx_fixup() contains several out-of-bounds accesses that can be
triggered by a malicious (or defective) USB device, in particular:
- The metadata array (hdr_off..hdr_off+2*pkt_cnt) can be out of bounds,
causing OOB reads and (on big-endian systems) OOB endianness flips.
- A packet can overlap the metadata array, causing a later OOB
endianness flip to corrupt data used by a cloned SKB that has already
been handed off into the network stack.
- A packet SKB can be constructed whose tail is far beyond its end,
causing out-of-bounds heap data to be considered part of the SKB's
data.
I have tested that this can be used by a malicious USB device to send a
bogus ICMPv6 Echo Request and receive an ICMPv6 Echo Reply in response
that contains random kernel heap data.
It's probably also possible to get OOB writes from this on a
little-endian system somehow - maybe by triggering skb_cow() via IP
options processing -, but I haven't tested that.
Fixes: e2ca90c276e1 ("ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver")
Cc: stable(a)kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
drivers/net/usb/ax88179_178a.c | 68 +++++++++++++++++++---------------
1 file changed, 39 insertions(+), 29 deletions(-)
diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c
index b2434b479846..684eec0aa0d6 100644
--- a/drivers/net/usb/ax88179_178a.c
+++ b/drivers/net/usb/ax88179_178a.c
@@ -1373,59 +1373,69 @@ static int ax88179_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
u16 hdr_off;
u32 *pkt_hdr;
- /* This check is no longer done by usbnet */
- if (skb->len < dev->net->hard_header_len)
+ /* At the end of the SKB, there's a header telling us how many packets
+ * are bundled into this buffer and where we can find an array of
+ * per-packet metadata (which contains elements encoded into u16).
+ */
+ if (skb->len < 4)
return 0;
-
skb_trim(skb, skb->len - 4);
memcpy(&rx_hdr, skb_tail_pointer(skb), 4);
le32_to_cpus(&rx_hdr);
-
pkt_cnt = (u16)rx_hdr;
hdr_off = (u16)(rx_hdr >> 16);
+
+ if (pkt_cnt == 0)
+ return 0;
+
+ /* Make sure that the bounds of the metadata array are inside the SKB
+ * (and in front of the counter at the end).
+ */
+ if (pkt_cnt * 2 + hdr_off > skb->len)
+ return 0;
pkt_hdr = (u32 *)(skb->data + hdr_off);
- while (pkt_cnt--) {
+ /* Packets must not overlap the metadata array */
+ skb_trim(skb, hdr_off);
+
+ for (; ; pkt_cnt--, pkt_hdr++) {
u16 pkt_len;
le32_to_cpus(pkt_hdr);
pkt_len = (*pkt_hdr >> 16) & 0x1fff;
- /* Check CRC or runt packet */
- if ((*pkt_hdr & AX_RXHDR_CRC_ERR) ||
- (*pkt_hdr & AX_RXHDR_DROP_ERR)) {
- skb_pull(skb, (pkt_len + 7) & 0xFFF8);
- pkt_hdr++;
- continue;
- }
-
- if (pkt_cnt == 0) {
- skb->len = pkt_len;
- /* Skip IP alignment pseudo header */
- skb_pull(skb, 2);
- skb_set_tail_pointer(skb, skb->len);
- skb->truesize = pkt_len + sizeof(struct sk_buff);
- ax88179_rx_checksum(skb, pkt_hdr);
- return 1;
- }
+ if (pkt_len > skb->len)
+ return 0;
- ax_skb = skb_clone(skb, GFP_ATOMIC);
- if (ax_skb) {
+ /* Check CRC or runt packet */
+ if (((*pkt_hdr & (AX_RXHDR_CRC_ERR | AX_RXHDR_DROP_ERR)) == 0) &&
+ pkt_len >= 2 + ETH_HLEN) {
+ bool last = (pkt_cnt == 0);
+
+ if (last) {
+ ax_skb = skb;
+ } else {
+ ax_skb = skb_clone(skb, GFP_ATOMIC);
+ if (!ax_skb)
+ return 0;
+ }
ax_skb->len = pkt_len;
/* Skip IP alignment pseudo header */
skb_pull(ax_skb, 2);
skb_set_tail_pointer(ax_skb, ax_skb->len);
ax_skb->truesize = pkt_len + sizeof(struct sk_buff);
ax88179_rx_checksum(ax_skb, pkt_hdr);
+
+ if (last)
+ return 1;
+
usbnet_skb_return(dev, ax_skb);
- } else {
- return 0;
}
- skb_pull(skb, (pkt_len + 7) & 0xFFF8);
- pkt_hdr++;
+ /* Trim this packet away from the SKB */
+ if (!skb_pull(skb, (pkt_len + 7) & 0xFFF8))
+ return 0;
}
- return 1;
}
static struct sk_buff *
base-commit: 6b09c9f0e648f3b91449afb6a308488f3af414c1
--
2.35.1.265.g69c8d7142f-goog
commit 1cf5f151d25fcca94689efd91afa0253621fb33a upstream.
-Wunaligned-access is a new warning in clang that is default enabled for
arm and arm64 under certain circumstances within the clang frontend (see
LLVM commit below). On v5.17-rc2, an ARCH=arm allmodconfig build shows
1284 total/70 unique instances of this warning (most of the instances
are in header files), which is quite noisy.
To keep a normal build green through CONFIG_WERROR, only show this
warning with W=1, which will allow automated build systems to catch new
instances of the warning so that the total number can be driven down to
zero eventually since catching unaligned accesses at compile time would
be generally useful.
Cc: stable(a)vger.kernel.org
Link: https://github.com/llvm/llvm-project/commit/35737df4dcd28534bd3090157c224c1…
Link: https://github.com/ClangBuiltLinux/linux/issues/1569
Link: https://github.com/ClangBuiltLinux/linux/issues/1576
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
[nathan: Fix conflict due to lack of afe956c577b2d]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I am not sure how many people are using ToT clang with ARCH=arm on
stable but given how noisy this warning can be, I think it is worth
applying this to all applicable stable branches.
This applies to 4.9 through 5.4 with 'patch -Np1'.
scripts/Makefile.extrawarn | 1 +
1 file changed, 1 insertion(+)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index ca08f2fe7c34..854e2ba9daa2 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -49,6 +49,7 @@ KBUILD_CFLAGS += -Wno-format
KBUILD_CFLAGS += -Wno-sign-compare
KBUILD_CFLAGS += -Wno-format-zero-length
KBUILD_CFLAGS += $(call cc-disable-warning, pointer-to-enum-cast)
+KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
endif
endif
base-commit: 52871671099d1bb3fca5ed076029e4b937bfc053
--
2.35.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
If the only thing that is changing is SAGV vs. no SAGV but
the number of active planes and the total data rates end up
unchanged we currently bail out of intel_bw_atomic_check()
early and forget to actually compute the new WGV point
mask and thus won't actually enable/disable SAGV as requested.
This ends up poorly if we end up running with SAGV enabled
when we shouldn't. Usually ends up in underruns.
To fix this let's go through the QGV point mask computation
if anyone else already added the bw state for us.
Cc: stable(a)vger.kernel.org
Cc: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
Fixes: 20f505f22531 ("drm/i915: Restrict qgv points which don't have enough bandwidth.")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_bw.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c
index 23aa8e06de18..d72ccee7d53b 100644
--- a/drivers/gpu/drm/i915/display/intel_bw.c
+++ b/drivers/gpu/drm/i915/display/intel_bw.c
@@ -846,6 +846,13 @@ int intel_bw_atomic_check(struct intel_atomic_state *state)
if (num_psf_gv_points > 0)
mask |= REG_GENMASK(num_psf_gv_points - 1, 0) << ADLS_PSF_PT_SHIFT;
+ /*
+ * If we already have the bw state then recompute everything
+ * even if pipe data_rate / active_planes didn't change.
+ * Other things (such as SAGV) may have changed.
+ */
+ new_bw_state = intel_atomic_get_new_bw_state(state);
+
for_each_oldnew_intel_crtc_in_state(state, crtc, old_crtc_state,
new_crtc_state, i) {
unsigned int old_data_rate =
--
2.34.1
This is a note to let you know that I've just added the patch titled
usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 62e3f0afe246720f7646eb1b034a6897dac34405 Mon Sep 17 00:00:00 2001
From: Hans de Goede <hdegoede(a)redhat.com>
Date: Sun, 13 Feb 2022 14:05:17 +0100
Subject: usb: dwc3: pci: Fix Bay Trail phy GPIO mappings
When the Bay Trail phy GPIO mappings where added cs and reset were swapped,
this did not cause any issues sofar, because sofar they were always driven
high/low at the same time.
Note the new mapping has been verified both in /sys/kernel/debug/gpio
output on Android factory images on multiple devices, as well as in
the schematics for some devices.
Fixes: 5741022cbdf3 ("usb: dwc3: pci: Add GPIO lookup table on platforms without ACPI GPIO resources")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20220213130524.18748-3-hdegoede@redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/dwc3-pci.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index 18ab49b8e66e..06d0e88ec8af 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -86,8 +86,8 @@ static const struct acpi_gpio_mapping acpi_dwc3_byt_gpios[] = {
static struct gpiod_lookup_table platform_bytcr_gpios = {
.dev_id = "0000:00:16.0",
.table = {
- GPIO_LOOKUP("INT33FC:00", 54, "reset", GPIO_ACTIVE_HIGH),
- GPIO_LOOKUP("INT33FC:02", 14, "cs", GPIO_ACTIVE_HIGH),
+ GPIO_LOOKUP("INT33FC:00", 54, "cs", GPIO_ACTIVE_HIGH),
+ GPIO_LOOKUP("INT33FC:02", 14, "reset", GPIO_ACTIVE_HIGH),
{}
},
};
--
2.35.1
This is a note to let you know that I've just added the patch titled
usb: dwc2: drd: fix soft connect when gadget is unconfigured
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 32fde84362c40961726a5c91f35ad37355ccc0c6 Mon Sep 17 00:00:00 2001
From: Fabrice Gasnier <fabrice.gasnier(a)foss.st.com>
Date: Wed, 16 Feb 2022 09:12:15 +0100
Subject: usb: dwc2: drd: fix soft connect when gadget is unconfigured
When the gadget driver hasn't been (yet) configured, and the cable is
connected to a HOST, the SFTDISCON gets cleared unconditionally, so the
HOST tries to enumerate it.
At the host side, this can result in a stuck USB port or worse. When
getting lucky, some dmesg can be observed at the host side:
new high-speed USB device number ...
device descriptor read/64, error -110
Fix it in drd, by checking the enabled flag before calling
dwc2_hsotg_core_connect(). It will be called later, once configured,
by the normal flow:
- udc_bind_to_driver
- usb_gadget_connect
- dwc2_hsotg_pullup
- dwc2_hsotg_core_connect
Fixes: 17f934024e84 ("usb: dwc2: override PHY input signals with usb role switch support")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Fabrice Gasnier <fabrice.gasnier(a)foss.st.com>
Link: https://lore.kernel.org/r/1644999135-13478-1-git-send-email-fabrice.gasnier…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc2/core.h | 2 ++
drivers/usb/dwc2/drd.c | 6 ++++--
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h
index 8a63da3ab39d..88c337bf564f 100644
--- a/drivers/usb/dwc2/core.h
+++ b/drivers/usb/dwc2/core.h
@@ -1418,6 +1418,7 @@ void dwc2_hsotg_core_connect(struct dwc2_hsotg *hsotg);
void dwc2_hsotg_disconnect(struct dwc2_hsotg *dwc2);
int dwc2_hsotg_set_test_mode(struct dwc2_hsotg *hsotg, int testmode);
#define dwc2_is_device_connected(hsotg) (hsotg->connected)
+#define dwc2_is_device_enabled(hsotg) (hsotg->enabled)
int dwc2_backup_device_registers(struct dwc2_hsotg *hsotg);
int dwc2_restore_device_registers(struct dwc2_hsotg *hsotg, int remote_wakeup);
int dwc2_gadget_enter_hibernation(struct dwc2_hsotg *hsotg);
@@ -1454,6 +1455,7 @@ static inline int dwc2_hsotg_set_test_mode(struct dwc2_hsotg *hsotg,
int testmode)
{ return 0; }
#define dwc2_is_device_connected(hsotg) (0)
+#define dwc2_is_device_enabled(hsotg) (0)
static inline int dwc2_backup_device_registers(struct dwc2_hsotg *hsotg)
{ return 0; }
static inline int dwc2_restore_device_registers(struct dwc2_hsotg *hsotg,
diff --git a/drivers/usb/dwc2/drd.c b/drivers/usb/dwc2/drd.c
index 1b39c4776369..d8d6493bc457 100644
--- a/drivers/usb/dwc2/drd.c
+++ b/drivers/usb/dwc2/drd.c
@@ -130,8 +130,10 @@ static int dwc2_drd_role_sw_set(struct usb_role_switch *sw, enum usb_role role)
already = dwc2_ovr_avalid(hsotg, true);
} else if (role == USB_ROLE_DEVICE) {
already = dwc2_ovr_bvalid(hsotg, true);
- /* This clear DCTL.SFTDISCON bit */
- dwc2_hsotg_core_connect(hsotg);
+ if (dwc2_is_device_enabled(hsotg)) {
+ /* This clear DCTL.SFTDISCON bit */
+ dwc2_hsotg_core_connect(hsotg);
+ }
} else {
if (dwc2_is_device_mode(hsotg)) {
if (!dwc2_ovr_bvalid(hsotg, false))
--
2.35.1
This is a note to let you know that I've just added the patch titled
tps6598x: clear int mask on probe failure
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From aba2081e0a9c977396124aa6df93b55ed5912b19 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Tue, 15 Feb 2022 11:22:04 -0700
Subject: tps6598x: clear int mask on probe failure
The interrupt mask is enabled before any potential failure points in
the driver, which can leave a failure path where we exit with
interrupts enabled but the device not live. This causes an infinite
stream of interrupts on an Apple M1 Pro laptop on USB-C.
Add a failure label that's used post enabling interrupts, where we
mask them again before returning an error.
Suggested-by: Sven Peter <sven(a)svenpeter.dev>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Link: https://lore.kernel.org/r/e6b80669-20f3-06e7-9ed5-8951a9c6db6f@kernel.dk
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tipd/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/typec/tipd/core.c b/drivers/usb/typec/tipd/core.c
index 6d27a5b5e3ca..7ffcda94d323 100644
--- a/drivers/usb/typec/tipd/core.c
+++ b/drivers/usb/typec/tipd/core.c
@@ -761,12 +761,12 @@ static int tps6598x_probe(struct i2c_client *client)
ret = tps6598x_read32(tps, TPS_REG_STATUS, &status);
if (ret < 0)
- return ret;
+ goto err_clear_mask;
trace_tps6598x_status(status);
ret = tps6598x_read32(tps, TPS_REG_SYSTEM_CONF, &conf);
if (ret < 0)
- return ret;
+ goto err_clear_mask;
/*
* This fwnode has a "compatible" property, but is never populated as a
@@ -855,7 +855,8 @@ static int tps6598x_probe(struct i2c_client *client)
usb_role_switch_put(tps->role_sw);
err_fwnode_put:
fwnode_handle_put(fwnode);
-
+err_clear_mask:
+ tps6598x_write64(tps, TPS_REG_INT_MASK1, 0);
return ret;
}
--
2.35.1
Hi, all:
What do you think about deleting unsupported linux-x.x.y branches from the
stable/linux.git repository (perhaps by moving them into stable-historic.git).
These branches hold a lot of git objects that are now largely dead weight but
are still shipped around every time stable/linux.git is cloned.
According to my cursory tests, dropping unsupported branches would remove
1.1GB of obsolete objects, dropping the repository size from 3.5GB to 2.4GB
(sizes may vary in your tests due to compression, window size, etc). That's
1.1GB less traffic for each stable/linux.git clone.
Thoughts?
-Konstantin
Hello Dear Friend,
With due respect to your person and much sincerity of purpose I wish
to write to you today for our mutual benefit in this investment
transaction..
I'm Mrs. Aisha Al-Gaddafi, presently residing herein Oman the
Southeastern coast of the Arabian Peninsula in Western Asia, I'm a
single Mother and a widow with three Children. I am the only
biological Daughter of the late Libyan President (Late Colonel Muammar
Gaddafi). I have an investment funds worth Twenty Seven Million Five
Hundred Thousand United State Dollars ($27.500.000.00 ) and i need an
investment Manager/Partner and because of my Asylum Status I will
authorize you the ownership of the investment funds, However, I am
interested in you for investment project assistance in your country,
may be from there,. we can build a business relationship in the
nearest future.
I am willing to negotiate an investment/business profit sharing ratio
with you based on the future investment earning profits. If you are
willing to handle this project kindly reply urgently to enable me to
provide you more information about the investment funds..
Your urgent reply will be appreciated if only you are interested in
this investment project.
Best Regards
Mrs. Aisha Al-Gaddafi.
Dzień dobry,
dostrzegam możliwość współpracy z Państwa firmą.
Świadczymy kompleksową obsługę inwestycji w fotowoltaikę, która obniża koszty energii elektrycznej nawet o 90%.
Czy są Państwo zainteresowani weryfikacją wstępnych propozycji?
Pozdrawiam,
Konrad Wiech
In https://lore.kernel.org/lkml/20211209150910.GA23668@axis.com/
Vincent's patch commented on, and worked around, a bug toggling
static_branch's, when a 2nd PRINTK-ish flag was added. The bug
results in a premature static_branch_disable when the 1st of 2 flags
was disabled.
The cited commit computed newflags, but then in the JUMP_LABEL block,
did not use that result, but used just one of the terms in it. Using
newflags instead made the code work properly.
This is Vincents test-case, reduced. It needs the 2nd flag to work
properly, but it's explanatory here.
pt_test() {
echo 5 > /sys/module/dynamic_debug/verbose
site="module tcp" # just one callsite
echo " $site =_ " > /proc/dynamic_debug/control # clear it
# A B ~A ~B
for flg in +T +p "-T #broke here" -p; do
echo " $site $flg " > /proc/dynamic_debug/control
done;
# A B ~B ~A
for flg in +T +p "-p #broke here" -T; do
echo " $site $flg " > /proc/dynamic_debug/control
done
}
pt_test
Fixes: 84da83a6ffc0 dyndbg: combine flags & mask into a struct, simplify with it
CC: vincent.whitchurch(a)axis.com
CC: stable(a)vger.kernel.org
Signed-off-by: Jim Cromie <jim.cromie(a)gmail.com>
---
lib/dynamic_debug.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index dd7f56af9aed..a56c1286ffa4 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -211,10 +211,11 @@ static int ddebug_change(const struct ddebug_query *query,
continue;
#ifdef CONFIG_JUMP_LABEL
if (dp->flags & _DPRINTK_FLAGS_PRINT) {
- if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
+ if (!(newflags & _DPRINTK_FLAGS_PRINT))
static_branch_disable(&dp->key.dd_key_true);
- } else if (modifiers->flags & _DPRINTK_FLAGS_PRINT)
+ } else if (newflags & _DPRINTK_FLAGS_PRINT) {
static_branch_enable(&dp->key.dd_key_true);
+ }
#endif
dp->flags = newflags;
v4pr_info("changed %s:%d [%s]%s =%s\n",
--
2.35.1
The patch titled
Subject: mm: don't skip swap entry even if zap_details specified
has been added to the -mm tree. Its filename is
mm-dont-skip-swap-entry-even-if-zap_details-specified.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-dont-skip-swap-entry-even-if-z…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-dont-skip-swap-entry-even-if-z…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm: don't skip swap entry even if zap_details specified
Patch series "mm: Rework zap ptes on swap entries", v4.
Patch 1 should fix a long standing bug for zap_pte_range() on zap_details
usage. The risk is we could have some swap entries skipped while we should
have zapped them.
Migration entries are not the major concern because file backed memory always
zap in the pattern that "first time without page lock, then re-zap with page
lock" hence the 2nd zap will always make sure all migration entries are already
recovered.
However there can be issues with real swap entries got skipped errornoously.
There's a reproducer provided in commit message of patch 1 for that.
Patch 2-4 are cleanups that are based on patch 1. After the whole patchset
applied, we should have a very clean view of zap_pte_range().
Only patch 1 needs to be backported to stable if necessary.
This patch (of 4):
The "details" pointer shouldn't be the token to decide whether we should
skip swap entries. For example, when the caller specified
details->zap_mapping==NULL, it means the caller wants to zap all the pages
(including COWed pages), then we need to look into swap entries because
there can be private COWed pages that was swapped out.
Skipping some swap entries when details is non-NULL may lead to wrongly
leaving some of the swap entries while we should have zapped them.
A reproducer of the problem:
===8<===
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <stdio.h>
#include <assert.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
int page_size;
int shmem_fd;
char *buffer;
void main(void)
{
int ret;
char val;
page_size = getpagesize();
shmem_fd = memfd_create("test", 0);
assert(shmem_fd >= 0);
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
buffer = mmap(NULL, page_size * 2, PROT_READ | PROT_WRITE,
MAP_PRIVATE, shmem_fd, 0);
assert(buffer != MAP_FAILED);
/* Write private page, swap it out */
buffer[page_size] = 1;
madvise(buffer, page_size * 2, MADV_PAGEOUT);
/* This should drop private buffer[page_size] already */
ret = ftruncate(shmem_fd, page_size);
assert(ret == 0);
/* Recover the size */
ret = ftruncate(shmem_fd, page_size * 2);
assert(ret == 0);
/* Re-read the data, it should be all zero */
val = buffer[page_size];
if (val == 0)
printf("Good\n");
else
printf("BUG\n");
}
===8<===
We don't need to touch up the pmd path, because pmd never had a issue with
swap entries. For example, shmem pmd migration will always be split into
pte level, and same to swapping on anonymous.
Add another helper should_zap_cows() so that we can also check whether we
should zap private mappings when there's no page pointer specified.
This patch drops that trick, so we handle swap ptes coherently. Meanwhile
we should do the same check upon migration entry, hwpoison entry and
genuine swap entries too. To be explicit, we should still remember to
keep the private entries if even_cows==false, and always zap them when
even_cows==true.
The issue seems to exist starting from the initial commit of git.
Link: https://lkml.kernel.org/r/20220216094810.60572-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20220216094810.60572-2-peterx@redhat.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: "Kirill A . Shutemov" <kirill(a)shutemov.name>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 45 ++++++++++++++++++++++++++++++++++++---------
1 file changed, 36 insertions(+), 9 deletions(-)
--- a/mm/memory.c~mm-dont-skip-swap-entry-even-if-zap_details-specified
+++ a/mm/memory.c
@@ -1313,6 +1313,17 @@ struct zap_details {
struct folio *single_folio; /* Locked folio to be unmapped */
};
+/* Whether we should zap all COWed (private) pages too */
+static inline bool should_zap_cows(struct zap_details *details)
+{
+ /* By default, zap all pages */
+ if (!details)
+ return true;
+
+ /* Or, we zap COWed pages only if the caller wants to */
+ return !details->zap_mapping;
+}
+
/*
* We set details->zap_mapping when we want to unmap shared but keep private
* pages. Return true if skip zapping this page, false otherwise.
@@ -1320,11 +1331,15 @@ struct zap_details {
static inline bool
zap_skip_check_mapping(struct zap_details *details, struct page *page)
{
- if (!details || !page)
+ /* If we can make a decision without *page.. */
+ if (should_zap_cows(details))
+ return false;
+
+ /* E.g. zero page */
+ if (!page)
return false;
- return details->zap_mapping &&
- (details->zap_mapping != page_rmapping(page));
+ return details->zap_mapping != page_rmapping(page);
}
static unsigned long zap_pte_range(struct mmu_gather *tlb,
@@ -1405,17 +1420,29 @@ again:
continue;
}
- /* If details->check_mapping, we leave swap entries. */
- if (unlikely(details))
- continue;
-
- if (!non_swap_entry(entry))
+ if (!non_swap_entry(entry)) {
+ /*
+ * If this is a genuine swap entry, then it must be an
+ * private anon page. If the caller wants to skip
+ * COWed pages, ignore it.
+ */
+ if (!should_zap_cows(details))
+ continue;
rss[MM_SWAPENTS]--;
- else if (is_migration_entry(entry)) {
+ } else if (is_migration_entry(entry)) {
struct page *page;
page = pfn_swap_entry_to_page(entry);
+ if (zap_skip_check_mapping(details, page))
+ continue;
rss[mm_counter(page)]--;
+ } else if (is_hwpoison_entry(entry)) {
+ /* If the caller wants to skip COWed pages, ignore it */
+ if (!should_zap_cows(details))
+ continue;
+ } else {
+ /* We should have covered all the swap entry types */
+ WARN_ON_ONCE(1);
}
if (unlikely(!free_swap_and_cache(entry)))
print_bad_pte(vma, addr, ptent, NULL);
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-fix-invalid-page-pointer-returned-with-foll_pin-gups.patch
mm-dont-skip-swap-entry-even-if-zap_details-specified.patch
mm-rename-zap_skip_check_mapping-to-should_zap_page.patch
mm-change-zap_detailszap_mapping-into-even_cows.patch
mm-rework-swap-handling-of-zap_pte_range.patch
When the cfg80211 option "bss_entries_limit" is set to 1, routine
cfg80211_bss_expire_oldest() fails by issuing repeated warnings.
The problem could also occur in the unlikely event that a scan only finds
a single SSID. In the first case, the warning is avoided by testing for
the special ivalue for the option before scanning for the oldest entry.
The second case is handled by converting the WARN_ON() to a WARN_ON_ONCE().
These changes fix commit 9853a55ef1bb ("cfg80211: limit scan results cache
size").
Reported-by: Maxim Klimenko Sergievich <klimenkomaximsergievich(a)gmail.com>
Fixes: 9853a55ef1bb ("cfg80211: limit scan results cache size").
Cc: stable(a)vger.kernel.org # 4.9+
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Signed-off-by: Larry Finger <Larry.Finger(a)lwfinger.net>
---
net/wireless/scan.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index 22e92be61938..d7aa38445fe6 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -463,6 +463,12 @@ static bool cfg80211_bss_expire_oldest(struct cfg80211_registered_device *rdev)
lockdep_assert_held(&rdev->bss_lock);
+ /* If the user has set cfg80211 option bss_entries_limit to 1,
+ * there cannot be an oldest BSS. Skip the scan.
+ */
+ if (unlikely(rdev->bss_entries == 1))
+ return false;
+
list_for_each_entry(bss, &rdev->bss_list, list) {
if (atomic_read(&bss->hold))
continue;
@@ -476,7 +482,7 @@ static bool cfg80211_bss_expire_oldest(struct cfg80211_registered_device *rdev)
oldest = bss;
}
- if (WARN_ON(!oldest))
+ if (WARN_ON_ONCE(!oldest))
return false;
/*
--
2.35.1
On 14.02.2022 18:38, Borislav Petkov wrote:
>On Wed, Feb 09, 2022 at 01:04:36PM -0800, Pawan Gupta wrote:
>> tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and
>> CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use
>> MSR_IA32_TSX_CTRL when supported.
>>
>> Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")
>> Reported-by: kernel test robot <lkp(a)intel.com>
>> Tested-by: Neelima Krishnan <neelima.krishnan(a)intel.com>
>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
>
><--- I'm assuming this needs to be
>
>Cc: <stable(a)vger.kernel.org>
>
>?
Yes, this needs to be backported to a few kernels that have the commit
293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts").
Once this is reviewed, I will send a separate email to stable@ with the
list of stable kernels.
>> @@ -106,13 +110,11 @@ void __init tsx_init(void)
>> int ret;
>>
>> /*
>> - * Hardware will always abort a TSX transaction if both CPUID bits
>> - * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
>> - * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
>> - * here.
>> + * Hardware will always abort a TSX transaction when CPUID
>> + * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
>> + * CPUID.RTM and CPUID.HLE bits. Clear them here.
>> */
>> - if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
>> - boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>> + if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
>
>So you test here X86_FEATURE_RTM_ALWAYS_ABORT and tsx_clear_cpuid()
>tests it again. Simplify I guess.
X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for
MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of
tsx_clear_cpuid() this condition is met, and test for
X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers
must also have this check, otherwise the MSR write will fault.
>> tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
>> tsx_clear_cpuid();
>> setup_clear_cpu_cap(X86_FEATURE_RTM);
>
>Also, here you clear X86_FEATURE_RTM and X86_FEATURE_HLE but the other
>caller of tsx_clear_cpuid() - init_intel() - doesn't clear those bits.
>Why?
Calling setup_clear_cpu_cap() by boot CPU is sufficient to clear these
bits for secondary CPUs. Moreover, init_intel() is also called during
cpu hotplug, clearing cached CPUID bits will not be safe after boot.
>IOW, can we concentrate the whole tsx_clear_cpuid() logic inside it and
>have callers only call it unconditionally. Then the decision whether
>to disable TSX and clear bits will happen all solely in that function
>instead of being spread around the boot code and being inconsistent.
There are certain cases where this will leave the system in an
inconsistent state, for example smt toggle after a late microcode update
that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we
unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of
the CPUs will report TSX feature and other half will not.
Thanks,
Pawan
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
When changing between SAGV vs. no SAGV on tgl+ we have to
update the use_sagv_wm flag for all the crtcs or else
an active pipe not already in the state will end up using
the wrong watermarks. That is especially bad when we end up
with the tighter non-SAGV watermarks with SAGV enabled.
Usually ends up in underruns.
Cc: stable(a)vger.kernel.org
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
Fixes: 7241c57d3140 ("drm/i915: Add TGL+ SAGV support")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/intel_pm.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 9f5e3c399f8d..bd32fd70e6b2 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -4007,6 +4007,17 @@ static int intel_compute_sagv_mask(struct intel_atomic_state *state)
return ret;
}
+ if (intel_can_enable_sagv(dev_priv, new_bw_state) !=
+ intel_can_enable_sagv(dev_priv, old_bw_state)) {
+ ret = intel_atomic_serialize_global_state(&new_bw_state->base);
+ if (ret)
+ return ret;
+ } else if (new_bw_state->pipe_sagv_reject != old_bw_state->pipe_sagv_reject) {
+ ret = intel_atomic_lock_global_state(&new_bw_state->base);
+ if (ret)
+ return ret;
+ }
+
for_each_new_intel_crtc_in_state(state, crtc,
new_crtc_state, i) {
struct skl_pipe_wm *pipe_wm = &new_crtc_state->wm.skl.optimal;
@@ -4022,17 +4033,6 @@ static int intel_compute_sagv_mask(struct intel_atomic_state *state)
intel_can_enable_sagv(dev_priv, new_bw_state);
}
- if (intel_can_enable_sagv(dev_priv, new_bw_state) !=
- intel_can_enable_sagv(dev_priv, old_bw_state)) {
- ret = intel_atomic_serialize_global_state(&new_bw_state->base);
- if (ret)
- return ret;
- } else if (new_bw_state->pipe_sagv_reject != old_bw_state->pipe_sagv_reject) {
- ret = intel_atomic_lock_global_state(&new_bw_state->base);
- if (ret)
- return ret;
- }
-
return 0;
}
--
2.34.1
Solar Designer <solar(a)openwall.com> wrote:
> I'm not aware of anyone actually running into this issue and reporting
> it. The systems that I personally know use suexec along with rlimits
> still run older/distro kernels, so would not yet be affected.
>
> So my mention was based on my understanding of how suexec works, and
> code review. Specifically, Apache httpd has the setting RLimitNPROC,
> which makes it set RLIMIT_NPROC:
>
> https://httpd.apache.org/docs/2.4/mod/core.html#rlimitnproc
>
> The above documentation for it includes:
>
> "This applies to processes forked from Apache httpd children servicing
> requests, not the Apache httpd children themselves. This includes CGI
> scripts and SSI exec commands, but not any processes forked from the
> Apache httpd parent, such as piped logs."
>
> In code, there are:
>
> ./modules/generators/mod_cgid.c: ( (cgid_req.limits.limit_nproc_set) && ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/generators/mod_cgi.c: ((rc = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC,
> ./modules/filters/mod_ext_filter.c: rv = apr_procattr_limit_set(procattr, APR_LIMIT_NPROC, conf->limit_nproc);
>
> For example, in mod_cgi.c this is in run_cgi_child().
>
> I think this means an httpd child sets RLIMIT_NPROC shortly before it
> execs suexec, which is a SUID root program. suexec then switches to the
> target user and execs the CGI script.
>
> Before 2863643fb8b9, the setuid() in suexec would set the flag, and the
> target user's process count would be checked against RLIMIT_NPROC on
> execve(). After 2863643fb8b9, the setuid() in suexec wouldn't set the
> flag because setuid() is (naturally) called when the process is still
> running as root (thus, has those limits bypass capabilities), and
> accordingly execve() would not check the target user's process count
> against RLIMIT_NPROC.
In commit 2863643fb8b9 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") capable calls were added to set_user to
make it more consistent with fork. Unfortunately because of call site
differences those capables calls were checking the credentials of the
user before set*id() instead of after set*id().
This breaks enforcement of RLIMIT_NPROC for applications that set the
rlimit and then call set*id() while holding a full set of
capabilities. The capabilities are only changed in the new credential
in security_task_fix_setuid().
The code in apache suexec appears to follow this pattern.
Commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits
(RLIMIT_NPROC)") where this check was added describes the targes of this
capability check as:
2/ When a root-owned process (e.g. cgiwrap) sets up process limits and then
calls setuid, the setuid should fail if the user would then be running
more than rlim_cur[RLIMIT_NPROC] processes, but it doesn't. This patch
adds an appropriate test. With this patch, and per-user process limit
imposed in cgiwrap really works.
So the original use case also of this check also appears to match the broken
pattern.
Restore the enforcement of RLIMIT_NPROC by removing the bad capable
checks added in set_user. This unfortunately restores the
inconsistencies state the code has been in for the last 11 years, but
dealing with the inconsistencies looks like a larger problem.
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20210907213042.GA22626@openwall.com/
Link: https://lkml.kernel.org/r/20220212221412.GA29214@openwall.com
Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
kernel/sys.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index ecc4cf019242..8dd938a3d2bf 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -480,8 +480,7 @@ static int set_user(struct cred *new)
* failure to the execve() stage.
*/
if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
- new_user != INIT_USER &&
- !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
+ new_user != INIT_USER)
current->flags |= PF_NPROC_EXCEEDED;
else
current->flags &= ~PF_NPROC_EXCEEDED;
--
2.29.2
While examining is_ucounts_overlimit and reading the various messages
I realized that is_ucounts_overlimit fails to deal with counts that
may have wrapped.
Being wrapped should be a transitory state for counts and they should
never be wrapped for long, but it can happen so handle it.
Cc: stable(a)vger.kernel.org
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
kernel/ucount.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/ucount.c b/kernel/ucount.c
index 65b597431c86..06ea04d44685 100644
--- a/kernel/ucount.c
+++ b/kernel/ucount.c
@@ -350,7 +350,8 @@ bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsign
if (rlimit > LONG_MAX)
max = LONG_MAX;
for (iter = ucounts; iter; iter = iter->ns->ucounts) {
- if (get_ucounts_value(iter, type) > max)
+ long val = get_ucounts_value(iter, type);
+ if (val < 0 || val > max)
return true;
max = READ_ONCE(iter->ns->ucount_max[type]);
}
--
2.29.2
During set*id() which cred->ucounts to charge the the current process
to is not known until after set_cred_ucounts. So move the
RLIMIT_NPROC checking into a new helper flag_nproc_exceeded and call
flag_nproc_exceeded after set_cred_ucounts.
This is very much an arbitrary subset of the places where we currently
change the RLIMIT_NPROC accounting, designed to preserve the existing
logic.
Fixing the existing logic will be the subject of another series of
changes.
Cc: stable(a)vger.kernel.org
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
kernel/sys.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/kernel/sys.c b/kernel/sys.c
index 8dd938a3d2bf..97dc9e5d6bf9 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -472,6 +472,16 @@ static int set_user(struct cred *new)
if (!new_user)
return -EAGAIN;
+ free_uid(new->user);
+ new->user = new_user;
+ return 0;
+}
+
+static void flag_nproc_exceeded(struct cred *new)
+{
+ if (new->ucounts == current_ucounts())
+ return;
+
/*
* We don't fail in case of NPROC limit excess here because too many
* poorly written programs don't check set*uid() return code, assuming
@@ -480,14 +490,10 @@ static int set_user(struct cred *new)
* failure to the execve() stage.
*/
if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
- new_user != INIT_USER)
+ new->user != INIT_USER)
current->flags |= PF_NPROC_EXCEEDED;
else
current->flags &= ~PF_NPROC_EXCEEDED;
-
- free_uid(new->user);
- new->user = new_user;
- return 0;
}
/*
@@ -562,6 +568,7 @@ long __sys_setreuid(uid_t ruid, uid_t euid)
if (retval < 0)
goto error;
+ flag_nproc_exceeded(new);
return commit_creds(new);
error:
@@ -624,6 +631,7 @@ long __sys_setuid(uid_t uid)
if (retval < 0)
goto error;
+ flag_nproc_exceeded(new);
return commit_creds(new);
error:
@@ -703,6 +711,7 @@ long __sys_setresuid(uid_t ruid, uid_t euid, uid_t suid)
if (retval < 0)
goto error;
+ flag_nproc_exceeded(new);
return commit_creds(new);
error:
--
2.29.2
Michal Koutný <mkoutny(a)suse.com> wrote:
> Tasks are associated to multiple users at once. Historically and as per
> setrlimit(2) RLIMIT_NPROC is enforce based on real user ID.
>
> The commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
> made the accounting structure "indexed" by euid and hence potentially
> account tasks differently.
>
> The effective user ID may be different e.g. for setuid programs but
> those are exec'd into already existing task (i.e. below limit), so
> different accounting is moot.
>
> Some special setresuid(2) users may notice the difference, justifying
> this fix.
I looked at cred->ucount and it is only used for rlimit operations
that were previously stored in cred->user. Making the fact
cred->ucount can refer to a different user from cred->user a bug,
affecting all uses of cred->ulimit not just RLIMIT_NPROC.
Fix set_cred_ucounts to always use the real uid not the effective uid.
Further simplify set_cred_ucounts by noticing that set_cred_ucounts
somehow retained a draft version of the check to see if alloc_ucounts
was needed that checks the new->user and new->user_ns against the
current_real_cred(). Remove that draft version of the check.
All that matters for setting the cred->ucounts are the user_ns and uid
fields in the cred.
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-4-mkoutny@suse.com
Reported-by: Michal Koutný <mkoutny(a)suse.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
kernel/cred.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/kernel/cred.c b/kernel/cred.c
index 473d17c431f3..933155c96922 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -665,21 +665,16 @@ EXPORT_SYMBOL(cred_fscmp);
int set_cred_ucounts(struct cred *new)
{
- struct task_struct *task = current;
- const struct cred *old = task->real_cred;
struct ucounts *new_ucounts, *old_ucounts = new->ucounts;
- if (new->user == old->user && new->user_ns == old->user_ns)
- return 0;
-
/*
* This optimization is needed because alloc_ucounts() uses locks
* for table lookups.
*/
- if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->euid))
+ if (old_ucounts->ns == new->user_ns && uid_eq(old_ucounts->uid, new->uid))
return 0;
- if (!(new_ucounts = alloc_ucounts(new->user_ns, new->euid)))
+ if (!(new_ucounts = alloc_ucounts(new->user_ns, new->uid)))
return -EAGAIN;
new->ucounts = new_ucounts;
--
2.29.2
Michal Koutný <mkoutny(a)suse.com> wrote:
> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is consequence of the commit 21d1c5e386bc ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.
This can be fixed either by fixing the test or by moving the increment
to be before the test. Fix it my moving copy_creds which contains
the increment before is_ucounts_overlimit.
In the case of CLONE_NEWUSER the ucounts in the task_cred changes.
The function is_ucounts_overlimit needs to use the final version of
the ucounts for the new process. Which means moving the
is_ucounts_overlimit test after copy_creds is necessary.
Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts. The change of the test in
fork was bad because it was before the increment. The test in
set_user was wrong and the change to ucounts fixed it. So this
fix only restores the old behavior in one lcation not two.
Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com
Cc: stable(a)vger.kernel.org
Reported-by: Michal Koutný <mkoutny(a)suse.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
kernel/fork.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..17d8a8c85e3b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2021,18 +2021,18 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_PROVE_LOCKING
DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
#endif
+ retval = copy_creds(p, clone_flags);
+ if (retval < 0)
+ goto bad_fork_free;
+
retval = -EAGAIN;
if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
if (p->real_cred->user != INIT_USER &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
- goto bad_fork_free;
+ goto bad_fork_cleanup_count;
}
current->flags &= ~PF_NPROC_EXCEEDED;
- retval = copy_creds(p, clone_flags);
- if (retval < 0)
- goto bad_fork_free;
-
/*
* If multiple threads are within copy_process(), then this check
* triggers too late. This doesn't hurt, the check is only there
--
2.29.2
In [1] Michal Koutný <mkoutny(a)suse.com> reported that it does not make
sense to unconditionally exempt the INIT_USER during fork and exec
from RLIMIT_NPROC and then to impose a limit on that same user with
is_ucounts_overlimit. So I looked into why the exeception was added.
commit 909cc4ae86f3 ("[PATCH] Fix two bugs with process limits (RLIMIT_NPROC)") says:
> If a setuid process swaps it's real and effective uids and then
> forks, the fork fails if the new realuid has more processes than
> the original process was limited to. This is particularly a
> problem if a user with a process limit (e.g. 256) runs a
> setuid-root program which does setuid() + fork() (e.g. lprng) while
> root already has more than 256 process (which is quite possible).
>
> The root problem here is that a limit which should be a per-user
> limit is being implemented as a per-process limit with per-process
> (e.g. CAP_SYS_RESOURCE) controls. Being a per-user limit, it
> should be that the root-user can over-ride it, not just some
> process with CAP_SYS_RESOURCE.
>
> This patch adds a test to ignore process limits if the real user is root.
With the moving of the limits from per-user to per-user-per-user_ns it
is clear that testing a user_struct is no longer the proper test and
the test should be a test against the ucounts struct to match the
rest of the logic change that was made.
With RLIMIT_NPROC not being enforced for the global root user anywhere
else should it be enforced in is_ucounts_overlimit for a user
namespace created by the global root user?
Since this is limited to just the global root user, and RLIMIT_NPROC
is already ignored for that user I am going to vote no.
This change does imply that it becomes possible to limit all users in
a user namespace but to not enforce the rlimits on the root user or
anyone with CAP_SYS_ADMIN and CAP_SYS_RESOURCE in the user namespace.
It is not clear to me why any of those exceptions exist so I figure
we should until this is actually a problem for someone before
we relax the permission checks here.
Cc: stable(a)vger.kernel.org
Reported-by: Michal Koutný <mkoutny(a)suse.com>
[1] https://lkml.kernel.org/r/20220207121800.5079-5-mkoutny@suse.com
History-Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
fs/exec.c | 2 +-
kernel/fork.c | 2 +-
kernel/user_namespace.c | 2 ++
3 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 1e7f757cbc2c..01c8c7bae9b4 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1881,7 +1881,7 @@ static int do_execveat_common(int fd, struct filename *filename,
*/
if ((current->flags & PF_NPROC_CHECK) &&
is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
- (current_user() != INIT_USER) &&
+ (current_ucounts() != &init_ucounts) &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
retval = -EAGAIN;
goto out_ret;
diff --git a/kernel/fork.c b/kernel/fork.c
index 2b6a28a86325..6f62d37f3650 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2027,7 +2027,7 @@ static __latent_entropy struct task_struct *copy_process(
retval = -EAGAIN;
if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
- if (p->real_cred->user != INIT_USER &&
+ if ((task_ucounts(p) != &init_ucounts) &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
goto bad_fork_cleanup_count;
}
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 6b2e3ca7ee99..f0c04073403d 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -123,6 +123,8 @@ int create_user_ns(struct cred *new)
ns->ucount_max[i] = INT_MAX;
}
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC));
+ if (new->ucounts == &init_ucounts)
+ set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_NPROC, RLIMIT_INFINITY);
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MSGQUEUE, rlimit(RLIMIT_MSGQUEUE));
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_SIGPENDING, rlimit(RLIMIT_SIGPENDING));
set_rlimit_ucount_max(ns, UCOUNT_RLIMIT_MEMLOCK, rlimit(RLIMIT_MEMLOCK));
--
2.29.2
Michal Koutný <mkoutny(a)suse.com> wrote:
> The check is currently against the current->cred but since those are
> going to change and we want to check RLIMIT_NPROC condition after the
> switch, supply the capability check with the new cred.
> But since we're checking new_user being INIT_USER any new cred's
> capability-based allowance may be redundant when the check fails and the
> alternative solution would be revert of the commit 2863643fb8b9
> ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
As of commit 2863643fb8b9 ("set_user: add capability check when
rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
should check RLIMIT_NPROC is buggy, as it tests the capabilites from
before the credential change and not aftwards.
As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
ucounts") examining the rlimit is buggy as cred->ucounts has not yet
been properly set in the new credential.
Make the code correct and more robust moving the test to see if
execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
of the rest of the logic into execve() itself.
As the flag only indicateds that RLIMIT_NPROC should be checked
in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@suse.com
Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@suse.com
Reported-by: Michal Koutný <mkoutny(a)suse.com>
Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
fs/exec.c | 15 ++++++++-------
include/linux/sched.h | 2 +-
kernel/cred.c | 13 +++++++++----
kernel/fork.c | 2 +-
kernel/sys.c | 14 --------------
5 files changed, 19 insertions(+), 27 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 79f2c9483302..1e7f757cbc2c 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1875,20 +1875,21 @@ static int do_execveat_common(int fd, struct filename *filename,
return PTR_ERR(filename);
/*
- * We move the actual failure in case of RLIMIT_NPROC excess from
- * set*uid() to execve() because too many poorly written programs
- * don't check setuid() return code. Here we additionally recheck
- * whether NPROC limit is still exceeded.
+ * After calling set*uid() is RLIMT_NPROC exceeded?
+ * This can not be checked in set*uid() because too many programs don't
+ * check the setuid() return code.
*/
- if ((current->flags & PF_NPROC_EXCEEDED) &&
- is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
+ if ((current->flags & PF_NPROC_CHECK) &&
+ is_ucounts_overlimit(current_ucounts(), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
+ (current_user() != INIT_USER) &&
+ !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) {
retval = -EAGAIN;
goto out_ret;
}
/* We're below the limit (still or again), so we don't want to make
* further execve() calls fail. */
- current->flags &= ~PF_NPROC_EXCEEDED;
+ current->flags &= ~PF_NPROC_CHECK;
bprm = alloc_bprm(fd, filename);
if (IS_ERR(bprm)) {
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 75ba8aa60248..6605a262a6be 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1678,7 +1678,7 @@ extern struct pid *cad_pid;
#define PF_DUMPCORE 0x00000200 /* Dumped core */
#define PF_SIGNALED 0x00000400 /* Killed by a signal */
#define PF_MEMALLOC 0x00000800 /* Allocating memory */
-#define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */
+#define PF_NPROC_CHECK 0x00001000 /* Check in execve if RLIMIT_NPROC was exceeded */
#define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */
#define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */
#define PF_FROZEN 0x00010000 /* Frozen for system suspend */
diff --git a/kernel/cred.c b/kernel/cred.c
index 933155c96922..229cff081167 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -490,13 +490,18 @@ int commit_creds(struct cred *new)
if (!gid_eq(new->fsgid, old->fsgid))
key_fsgid_changed(new);
- /* do it
- * RLIMIT_NPROC limits on user->processes have already been checked
- * in set_user().
+ /*
+ * Remember if the NPROC limit may be exceeded. The set*uid() functions
+ * can not fail if the NPROC limit is exceeded as too many programs
+ * don't check the return code. Instead enforce the NPROC limit for
+ * programs doing set*uid()+execve by harmlessly defering the failure
+ * to the execve() stage.
*/
alter_cred_subscribers(new, 2);
- if (new->user != old->user || new->user_ns != old->user_ns)
+ if (new->user != old->user || new->user_ns != old->user_ns) {
inc_rlimit_ucounts(new->ucounts, UCOUNT_RLIMIT_NPROC, 1);
+ task->flags |= PF_NPROC_CHECK;
+ }
rcu_assign_pointer(task->real_cred, new);
rcu_assign_pointer(task->cred, new);
if (new->user != old->user || new->user_ns != old->user_ns)
diff --git a/kernel/fork.c b/kernel/fork.c
index 17d8a8c85e3b..2b6a28a86325 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2031,7 +2031,7 @@ static __latent_entropy struct task_struct *copy_process(
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
goto bad_fork_cleanup_count;
}
- current->flags &= ~PF_NPROC_EXCEEDED;
+ current->flags &= ~PF_NPROC_CHECK;
/*
* If multiple threads are within copy_process(), then this check
diff --git a/kernel/sys.c b/kernel/sys.c
index ecc4cf019242..b1ed21d79f3b 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -472,20 +472,6 @@ static int set_user(struct cred *new)
if (!new_user)
return -EAGAIN;
- /*
- * We don't fail in case of NPROC limit excess here because too many
- * poorly written programs don't check set*uid() return code, assuming
- * it never fails if called by root. We may still enforce NPROC limit
- * for programs doing set*uid()+execve() by harmlessly deferring the
- * failure to the execve() stage.
- */
- if (is_ucounts_overlimit(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) &&
- new_user != INIT_USER &&
- !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
- current->flags |= PF_NPROC_EXCEEDED;
- else
- current->flags &= ~PF_NPROC_EXCEEDED;
-
free_uid(new->user);
new->user = new_user;
return 0;
--
2.29.2
Michal Koutný <mkoutny(a)suse.com> wrote:
> It was reported that v5.14 behaves differently when enforcing
> RLIMIT_NPROC limit, namely, it allows one more task than previously.
> This is consequence of the commit 21d1c5e386bc ("Reimplement
> RLIMIT_NPROC on top of ucounts") that missed the sharpness of
> equality in the forking path.
This can be fixed either by fixing the test or by moving the increment
to be before the test. Fix it my moving copy_creds which contains
the increment before is_ucounts_overlimit.
This is necessary so that in the case of CLONE_NEWUSER the new credential
with the new ucounts is available of is_ucounts_overlimit to test.
Both the test in fork and the test in set_user were semantically
changed when the code moved to ucounts. The change of the test in
fork was bad because it was before the increment. The test in
set_user was wrong and the change to ucounts fixed it. So this
fix is only restore the old behavior in one lcatio not two.
Link: https://lkml.kernel.org/r/20220204181144.24462-1-mkoutny@suse.com
Cc: stable(a)vger.kernel.org
Reported-by: Michal Koutný <mkoutny(a)suse.com>
Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
kernel/fork.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index d75a528f7b21..17d8a8c85e3b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2021,18 +2021,18 @@ static __latent_entropy struct task_struct *copy_process(
#ifdef CONFIG_PROVE_LOCKING
DEBUG_LOCKS_WARN_ON(!p->softirqs_enabled);
#endif
+ retval = copy_creds(p, clone_flags);
+ if (retval < 0)
+ goto bad_fork_free;
+
retval = -EAGAIN;
if (is_ucounts_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
if (p->real_cred->user != INIT_USER &&
!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
- goto bad_fork_free;
+ goto bad_fork_cleanup_count;
}
current->flags &= ~PF_NPROC_EXCEEDED;
- retval = copy_creds(p, clone_flags);
- if (retval < 0)
- goto bad_fork_free;
-
/*
* If multiple threads are within copy_process(), then this check
* triggers too late. This doesn't hurt, the check is only there
--
2.29.2
On 2021-07-30 12:35, Anders Roxell wrote:
> From: Robin Murphy <robin.murphy(a)arm.com>
>
>> Now that PCI inbound window restrictions are handled generically between
>> the of_pci resource parsing and the IOMMU layer, and described in the
>> Juno DT, we can finally enable the PCIe SMMU without the risk of DMA
>> mappings inadvertently allocating unusable addresses.
>>
>> Similarly, the relevant support for IOMMU mappings for peripheral
>> transfers has been hooked up in the pl330 driver for ages, so we can
>> happily enable the DMA SMMU without that breaking anything either.
>>
>> Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
>
> When we build a kernel with 64k page size and run the ltp syscalls we
> sporadically see a kernel crash while doing a mkfs on a connected SATA
> drive. This is happening every third test run on any juno-r2 device in
> the lab with the same kernel image (stable-rc 5.13.y, mainline and next)
> with gcc-11.
Hmm, I guess 64K pages might make a difference in that we'll chew
through IOVA space a lot faster with small mappings...
I'll have to try to reproduce this locally, since the interesting thing
would be knowing what DMA address it was trying to use that went wrong,
but IOMMU tracepoints and/or dma-debug are going to generate an crazy
amount of data to sift through and try to correlate - having done it
before it's not something I'd readily ask someone else to do for me :)
On a hunch, though, does it make any difference if you remove the first
entry from the PCIe "dma-ranges" (the 0x2c1c0000 one)?
Robin.
> Here is a snippet of the boot log [1]:
>
> + mkfs -t ext4 /dev/disk/by-id/ata-SanDisk_SDSSDA120G_165192443611
> mke2fs 1.43.8 (1-Jan-2018)
> Discarding device blocks: 4096/29305200
> [ 55.344291] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
> frozen
> [ 55.351423] ata1.00: irq_stat 0x00020002, failed to transmit command
> FIS
> [ 55.358205] ata1.00: failed command: DATA SET MANAGEMENT
> [ 55.363561] ata1.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 12
> dma 512 out
> [ 55.363561] res ec/ff:00:00:00:00/00:00:00:00:ec/00 Emask
> 0x12 (ATA bus error)
> [ 55.378955] ata1.00: status: { Busy }
> [ 55.382658] ata1.00: error: { ICRC UNC AMNF IDNF ABRT }
> [ 55.387947] ata1: hard resetting link
> [ 55.391653] ata1: controller in dubious state, performing PORT_RST
> [ 57.588447] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
> [ 57.613471] ata1.00: configured for UDMA/100
> [ 57.617866] ata1.00: device reported invalid CHS sector 0
> [ 57.623397] ata1: EH complete
>
>
> When we revert this patch we don't see any issue.
>
> Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
>
> Cheers,
> Anders
> [1]
> https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.13.y/build/v5.13…
>
From: Paul Davey <paul.davey(a)alliedtelesis.co.nz>
On big endian architectures the mhi debugfs files which report pm state
give "Invalid State" for all states. This is caused by using
find_last_bit which takes an unsigned long* while the state is passed in
as an enum mhi_pm_state which will be of int size.
Fix by using __fls to pass the value of state instead of find_last_bit.
Fixes: a6e2e3522f29 ("bus: mhi: core: Add support for PM state transitions")
Signed-off-by: Paul Davey <paul.davey(a)alliedtelesis.co.nz>
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
Reviewed-by: Hemant Kumar <hemantk(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/bus/mhi/core/init.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
index 046f407dc5d6..af484b03558a 100644
--- a/drivers/bus/mhi/core/init.c
+++ b/drivers/bus/mhi/core/init.c
@@ -79,10 +79,12 @@ static const char * const mhi_pm_state_str[] = {
const char *to_mhi_pm_state_str(enum mhi_pm_state state)
{
- unsigned long pm_state = state;
- int index = find_last_bit(&pm_state, 32);
+ int index;
- if (index >= ARRAY_SIZE(mhi_pm_state_str))
+ if (state)
+ index = __fls(state);
+
+ if (!state || index >= ARRAY_SIZE(mhi_pm_state_str))
return "Invalid State";
return mhi_pm_state_str[index];
--
2.25.1
Greetings,
I am Miss Marilyne N'Goran. Please can you help and assist me invest my inheritance in your country? and to help me to come over to your country and start a new life and continue my education. Please can you help me?. When I hear from you I will give you every detail. Best regards, Miss Marilyne