This is the start of the stable review cycle for the 5.15.115 release. There are 42 patches in this series, all of which will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sat, 03 Jun 2023 13:19:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.115-rc...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Linux 5.15.115-rc1

Paul Blakey <paulb@nvidia.com>
    netfilter: ctnetlink: Support offloaded conntrack entry deletion

Nicolas Dichtel <nicolas.dichtel@6wind.com>
    ipv{4,6}/raw: fix output xfrm lookup wrt protocol

Carlos Llamas <cmllamas@google.com>
    binder: fix UAF of alloc->vma in race with munmap()

Carlos Llamas <cmllamas@google.com>
    binder: add lockless binder_alloc_(set|get)_vma()

Carlos Llamas <cmllamas@google.com>
    Revert "android: binder: stop saving a pointer to the VMA"

Carlos Llamas <cmllamas@google.com>
    Revert "binder_alloc: add missing mmap_lock calls when using the VMA"

Ruihan Li <lrh2000@pku.edu.cn>
    bluetooth: Add cmd validity checks at the start of hci_sock_ioctl()

Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    xdp: xdp_mem_allocator can be NULL in trace_mem_connect().

Jiaxun Yang <jiaxun.yang@flygoat.com>
    irqchip/mips-gic: Don't touch vl_map if a local interrupt is not routable

Yunsheng Lin <linyunsheng@huawei.com>
    page_pool: fix inconsistency for page_pool_ring_[un]lock()

Qingfang DENG <qingfang.deng@siflower.com.cn>
    net: page_pool: use in_softirq() instead

Toke Høiland-Jørgensen <toke@redhat.com>
    xdp: Allow registering memory model without rxq reference

Rahul Rameshbabu <rrameshbabu@nvidia.com>
    net/mlx5e: Fix SQ wake logic in ptp napi_poll context

Jiaxun Yang <jiaxun.yang@flygoat.com>
    irqchip/mips-gic: Use raw spinlock for gic_lock

Marc Zyngier <maz@kernel.org>
    irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()

Carlos Llamas <cmllamas@google.com>
    binder: fix UAF caused by faulty buffer cleanup

Hangbin Liu <liuhangbin@gmail.com>
    bonding: fix send_peer_notif overflow

Hangbin Liu <liuhangbin@gmail.com>
    Bonding: add arp_missed_max option

Arınç ÜNAL <arinc.unal@arinc9.com>
    net: dsa: mt7530: fix network connectivity with multiple CPU ports

Daniel Golle <daniel@makrotopia.org>
    net: dsa: mt7530: split-off common parts from mt7531_setup

Frank Wunderlich <frank-w@public-files.de>
    net: dsa: mt7530: rework mt753[01]_setup

Vladimir Oltean <vladimir.oltean@nxp.com>
    net: dsa: introduce helpers for iterating through ports using dp

Claudio Imbrenda <imbrenda@linux.ibm.com>
    KVM: s390: fix race in gmap_make_secure()

Claudio Imbrenda <imbrenda@linux.ibm.com>
    KVM: s390: pv: add export before import

Claudiu Beznea <claudiu.beznea@microchip.com>
    dmaengine: at_xdmac: restore the content of grws register

Claudiu Beznea <claudiu.beznea@microchip.com>
    dmaengine: at_xdmac: do not resume channels paused by consumers

Claudiu Beznea <claudiu.beznea@microchip.com>
    dmaengine: at_xdmac: disable/enable clock directly on suspend/resume

Tudor Ambarus <tudor.ambarus@microchip.com>
    dmaengine: at_xdmac: Remove a level of indentation in at_xdmac_tasklet()

Tudor Ambarus <tudor.ambarus@microchip.com>
    dmaengine: at_xdmac: Move the free desc to the tail of the desc list

David Epping <david.epping@missinglinkelectronics.com>
    net: phy: mscc: enable VSC8501/2 RGMII RX clock

Steve Wahl <steve.wahl@hpe.com>
    platform/x86: ISST: Remove 8 socket limit

Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
    platform/x86: ISST: PUNIT device mapping with Sub-NUMA clustering

Shay Drory <shayd@nvidia.com>
    net/mlx5: Devcom, serialize devcom registration

Vlad Buslov <vladbu@nvidia.com>
    net/mlx5e: Fix deadlock in tc route query code

Mark Bloch <mbloch@nvidia.com>
    net/mlx5: devcom only supports 2 ports

Anton Protopopov <aspsk@isovalent.com>
    bpf: fix a memory leak in the LRU and LRU_PERCPU hash maps

Hans de Goede <hdegoede@redhat.com>
    power: supply: bq24190: Call power_supply_changed() after updating input current

Hans de Goede <hdegoede@redhat.com>
    power: supply: core: Refactor power_supply_set_input_current_limit_from_supplier()

Hans de Goede <hdegoede@redhat.com>
    power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize

Hans de Goede <hdegoede@redhat.com>
    power: supply: bq27xxx: Ensure power_supply_changed() is called on current sign changes

Hans de Goede <hdegoede@redhat.com>
    power: supply: bq27xxx: Move bq27xxx_battery_update() down

Sicelo A. Mhlongo <absicsz@gmail.com>
    power: supply: bq27xxx: expose battery data when CI=1
-------------
Diffstat:
 Documentation/networking/bonding.rst               |  11 ++
 Makefile                                           |   4 +-
 arch/s390/kernel/uv.c                              |  56 ++++---
 drivers/android/binder.c                           |  26 +++-
 drivers/android/binder_alloc.c                     |  64 +++-----
 drivers/android/binder_alloc.h                     |   2 +-
 drivers/android/binder_alloc_selftest.c            |   2 +-
 drivers/dma/at_xdmac.c                             | 145 +++++++++++------
 drivers/irqchip/irq-mips-gic.c                     |  65 +++++---
 drivers/net/bonding/bond_main.c                    |  17 +-
 drivers/net/bonding/bond_netlink.c                 |  22 ++-
 drivers/net/bonding/bond_options.c                 |  36 ++++-
 drivers/net/bonding/bond_procfs.c                  |   2 +
 drivers/net/bonding/bond_sysfs.c                   |  13 ++
 drivers/net/dsa/mt7530.c                           | 124 +++++++++------
 drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c   |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h  |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  19 +--
 drivers/net/ethernet/mellanox/mlx5/core/en_tx.c    |  19 ++-
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.c   |  81 +++++++---
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.h   |   3 +
 drivers/net/phy/mscc/mscc.h                        |   1 +
 drivers/net/phy/mscc/mscc_main.c                   |  54 +++----
 .../x86/intel/speed_select_if/isst_if_common.c     |  49 ++++--
 drivers/power/supply/bq24190_charger.c             |  13 +-
 drivers/power/supply/bq27xxx_battery.c             | 171 +++++++++++----------
 drivers/power/supply/power_supply_core.c           |  57 +++----
 include/linux/power/bq27xxx_battery.h              |   3 +
 include/linux/power_supply.h                       |   5 +-
 include/net/bond_options.h                         |   1 +
 include/net/bonding.h                              |   3 +-
 include/net/dsa.h                                  |  28 ++++
 include/net/ip.h                                   |   2 +
 include/net/page_pool.h                            |  18 ---
 include/net/xdp.h                                  |   3 +
 include/uapi/linux/if_link.h                       |   1 +
 include/uapi/linux/in.h                            |   2 +
 kernel/bpf/hashtab.c                               |   6 +-
 net/bluetooth/hci_sock.c                           |  28 ++++
 net/core/page_pool.c                               |  34 +++-
 net/core/xdp.c                                     |  93 +++++++----
 net/ipv4/ip_sockglue.c                             |  12 +-
 net/ipv4/raw.c                                     |   5 +-
 net/ipv6/raw.c                                     |   3 +-
 net/netfilter/nf_conntrack_netlink.c               |   8 -
 tools/include/uapi/linux/if_link.h                 |   1 +
 46 files changed, 861 insertions(+), 455 deletions(-)
From: Sicelo A. Mhlongo <absicsz@gmail.com>
[ Upstream commit 68fdbe090c362e8be23890a7333d156e18c27781 ]
When the Capacity Inaccurate flag is set, the chip still provides data about the battery, albeit inaccurate. Instead of discarding capacity values for CI=1, expose the stale data and use the POWER_SUPPLY_HEALTH_CALIBRATION_REQUIRED property to indicate that the values should be used with care.
Reviewed-by: Pali Rohár <pali@kernel.org>
Signed-off-by: Sicelo A. Mhlongo <absicsz@gmail.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Stable-dep-of: ff4c4a2a4437 ("power: supply: bq27xxx: Move bq27xxx_battery_update() down")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/power/supply/bq27xxx_battery.c | 60 ++++++++++++--------------
 1 file changed, 27 insertions(+), 33 deletions(-)
diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
index 7334a8b8007e5..66199f1c1c5d1 100644
--- a/drivers/power/supply/bq27xxx_battery.c
+++ b/drivers/power/supply/bq27xxx_battery.c
@@ -1572,14 +1572,6 @@ static int bq27xxx_battery_read_charge(struct bq27xxx_device_info *di, u8 reg)
  */
 static inline int bq27xxx_battery_read_nac(struct bq27xxx_device_info *di)
 {
-	int flags;
-
-	if (di->opts & BQ27XXX_O_ZERO) {
-		flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, true);
-		if (flags >= 0 && (flags & BQ27000_FLAG_CI))
-			return -ENODATA;
-	}
-
 	return bq27xxx_battery_read_charge(di, BQ27XXX_REG_NAC);
 }
 
@@ -1742,6 +1734,18 @@ static bool bq27xxx_battery_dead(struct bq27xxx_device_info *di, u16 flags)
 	return flags & (BQ27XXX_FLAG_SOC1 | BQ27XXX_FLAG_SOCF);
 }
 
+/*
+ * Returns true if reported battery capacity is inaccurate
+ */
+static bool bq27xxx_battery_capacity_inaccurate(struct bq27xxx_device_info *di,
+						u16 flags)
+{
+	if (di->opts & BQ27XXX_O_HAS_CI)
+		return (flags & BQ27000_FLAG_CI);
+	else
+		return false;
+}
+
 static int bq27xxx_battery_read_health(struct bq27xxx_device_info *di)
 {
 	/* Unlikely but important to return first */
@@ -1751,6 +1755,8 @@ static int bq27xxx_battery_read_health(struct bq27xxx_device_info *di)
 		return POWER_SUPPLY_HEALTH_COLD;
 	if (unlikely(bq27xxx_battery_dead(di, di->cache.flags)))
 		return POWER_SUPPLY_HEALTH_DEAD;
+	if (unlikely(bq27xxx_battery_capacity_inaccurate(di, di->cache.flags)))
+		return POWER_SUPPLY_HEALTH_CALIBRATION_REQUIRED;
 
 	return POWER_SUPPLY_HEALTH_GOOD;
 }
@@ -1758,7 +1764,6 @@ static int bq27xxx_battery_read_health(struct bq27xxx_device_info *di)
 static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
 {
 	struct bq27xxx_reg_cache cache = {0, };
-	bool has_ci_flag = di->opts & BQ27XXX_O_HAS_CI;
 	bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
 
 	cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag);
@@ -1766,30 +1771,19 @@ static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
 		cache.flags = -1; /* read error */
 	if (cache.flags >= 0) {
 		cache.temperature = bq27xxx_battery_read_temperature(di);
-		if (has_ci_flag && (cache.flags & BQ27000_FLAG_CI)) {
-			dev_info_once(di->dev, "battery is not calibrated! ignoring capacity values\n");
-			cache.capacity = -ENODATA;
-			cache.energy = -ENODATA;
-			cache.time_to_empty = -ENODATA;
-			cache.time_to_empty_avg = -ENODATA;
-			cache.time_to_full = -ENODATA;
-			cache.charge_full = -ENODATA;
-			cache.health = -ENODATA;
-		} else {
-			if (di->regs[BQ27XXX_REG_TTE] != INVALID_REG_ADDR)
-				cache.time_to_empty = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTE);
-			if (di->regs[BQ27XXX_REG_TTECP] != INVALID_REG_ADDR)
-				cache.time_to_empty_avg = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTECP);
-			if (di->regs[BQ27XXX_REG_TTF] != INVALID_REG_ADDR)
-				cache.time_to_full = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTF);
-
-			cache.charge_full = bq27xxx_battery_read_fcc(di);
-			cache.capacity = bq27xxx_battery_read_soc(di);
-			if (di->regs[BQ27XXX_REG_AE] != INVALID_REG_ADDR)
-				cache.energy = bq27xxx_battery_read_energy(di);
-			di->cache.flags = cache.flags;
-			cache.health = bq27xxx_battery_read_health(di);
-		}
+		if (di->regs[BQ27XXX_REG_TTE] != INVALID_REG_ADDR)
+			cache.time_to_empty = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTE);
+		if (di->regs[BQ27XXX_REG_TTECP] != INVALID_REG_ADDR)
+			cache.time_to_empty_avg = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTECP);
+		if (di->regs[BQ27XXX_REG_TTF] != INVALID_REG_ADDR)
+			cache.time_to_full = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTF);
+
+		cache.charge_full = bq27xxx_battery_read_fcc(di);
+		cache.capacity = bq27xxx_battery_read_soc(di);
+		if (di->regs[BQ27XXX_REG_AE] != INVALID_REG_ADDR)
+			cache.energy = bq27xxx_battery_read_energy(di);
+		di->cache.flags = cache.flags;
+		cache.health = bq27xxx_battery_read_health(di);
 		if (di->regs[BQ27XXX_REG_CYCT] != INVALID_REG_ADDR)
 			cache.cycle_count = bq27xxx_battery_read_cyct(di);
From: Hans de Goede <hdegoede@redhat.com>
[ Upstream commit ff4c4a2a4437a6d03787c7aafb2617f20c3ef45f ]
Move the bq27xxx_battery_update() functions to below the bq27xxx_battery_current_and_status() function.
This is just moving a block of text, no functional changes.
This is a preparation patch for making bq27xxx_battery_update() check the status and have it call power_supply_changed() on status changes.
Fixes: 297a533b3e62 ("bq27x00: Cache battery registers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/power/supply/bq27xxx_battery.c | 122 ++++++++++++-------------
 1 file changed, 61 insertions(+), 61 deletions(-)
diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
index 66199f1c1c5d1..2fca4b9403ff9 100644
--- a/drivers/power/supply/bq27xxx_battery.c
+++ b/drivers/power/supply/bq27xxx_battery.c
@@ -1761,67 +1761,6 @@ static int bq27xxx_battery_read_health(struct bq27xxx_device_info *di)
 	return POWER_SUPPLY_HEALTH_GOOD;
 }
 
-static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
-{
-	struct bq27xxx_reg_cache cache = {0, };
-	bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
-
-	cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag);
-	if ((cache.flags & 0xff) == 0xff)
-		cache.flags = -1; /* read error */
-	if (cache.flags >= 0) {
-		cache.temperature = bq27xxx_battery_read_temperature(di);
-		if (di->regs[BQ27XXX_REG_TTE] != INVALID_REG_ADDR)
-			cache.time_to_empty = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTE);
-		if (di->regs[BQ27XXX_REG_TTECP] != INVALID_REG_ADDR)
-			cache.time_to_empty_avg = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTECP);
-		if (di->regs[BQ27XXX_REG_TTF] != INVALID_REG_ADDR)
-			cache.time_to_full = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTF);
-
-		cache.charge_full = bq27xxx_battery_read_fcc(di);
-		cache.capacity = bq27xxx_battery_read_soc(di);
-		if (di->regs[BQ27XXX_REG_AE] != INVALID_REG_ADDR)
-			cache.energy = bq27xxx_battery_read_energy(di);
-		di->cache.flags = cache.flags;
-		cache.health = bq27xxx_battery_read_health(di);
-		if (di->regs[BQ27XXX_REG_CYCT] != INVALID_REG_ADDR)
-			cache.cycle_count = bq27xxx_battery_read_cyct(di);
-
-		/* We only have to read charge design full once */
-		if (di->charge_design_full <= 0)
-			di->charge_design_full = bq27xxx_battery_read_dcap(di);
-	}
-
-	if ((di->cache.capacity != cache.capacity) ||
-	    (di->cache.flags != cache.flags))
-		power_supply_changed(di->bat);
-
-	if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
-		di->cache = cache;
-
-	di->last_update = jiffies;
-
-	if (!di->removed && poll_interval > 0)
-		mod_delayed_work(system_wq, &di->work, poll_interval * HZ);
-}
-
-void bq27xxx_battery_update(struct bq27xxx_device_info *di)
-{
-	mutex_lock(&di->lock);
-	bq27xxx_battery_update_unlocked(di);
-	mutex_unlock(&di->lock);
-}
-EXPORT_SYMBOL_GPL(bq27xxx_battery_update);
-
-static void bq27xxx_battery_poll(struct work_struct *work)
-{
-	struct bq27xxx_device_info *di =
-		container_of(work, struct bq27xxx_device_info,
-			     work.work);
-
-	bq27xxx_battery_update(di);
-}
-
 static bool bq27xxx_battery_is_full(struct bq27xxx_device_info *di, int flags)
 {
 	if (di->opts & BQ27XXX_O_ZERO)
@@ -1895,6 +1834,67 @@ static int bq27xxx_battery_current_and_status(
 	return 0;
 }
 
+static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
+{
+	struct bq27xxx_reg_cache cache = {0, };
+	bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
+
+	cache.flags = bq27xxx_read(di, BQ27XXX_REG_FLAGS, has_singe_flag);
+	if ((cache.flags & 0xff) == 0xff)
+		cache.flags = -1; /* read error */
+	if (cache.flags >= 0) {
+		cache.temperature = bq27xxx_battery_read_temperature(di);
+		if (di->regs[BQ27XXX_REG_TTE] != INVALID_REG_ADDR)
+			cache.time_to_empty = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTE);
+		if (di->regs[BQ27XXX_REG_TTECP] != INVALID_REG_ADDR)
+			cache.time_to_empty_avg = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTECP);
+		if (di->regs[BQ27XXX_REG_TTF] != INVALID_REG_ADDR)
+			cache.time_to_full = bq27xxx_battery_read_time(di, BQ27XXX_REG_TTF);
+
+		cache.charge_full = bq27xxx_battery_read_fcc(di);
+		cache.capacity = bq27xxx_battery_read_soc(di);
+		if (di->regs[BQ27XXX_REG_AE] != INVALID_REG_ADDR)
+			cache.energy = bq27xxx_battery_read_energy(di);
+		di->cache.flags = cache.flags;
+		cache.health = bq27xxx_battery_read_health(di);
+		if (di->regs[BQ27XXX_REG_CYCT] != INVALID_REG_ADDR)
+			cache.cycle_count = bq27xxx_battery_read_cyct(di);
+
+		/* We only have to read charge design full once */
+		if (di->charge_design_full <= 0)
+			di->charge_design_full = bq27xxx_battery_read_dcap(di);
+	}
+
+	if ((di->cache.capacity != cache.capacity) ||
+	    (di->cache.flags != cache.flags))
+		power_supply_changed(di->bat);
+
+	if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
+		di->cache = cache;
+
+	di->last_update = jiffies;
+
+	if (!di->removed && poll_interval > 0)
+		mod_delayed_work(system_wq, &di->work, poll_interval * HZ);
+}
+
+void bq27xxx_battery_update(struct bq27xxx_device_info *di)
+{
+	mutex_lock(&di->lock);
+	bq27xxx_battery_update_unlocked(di);
+	mutex_unlock(&di->lock);
+}
+EXPORT_SYMBOL_GPL(bq27xxx_battery_update);
+
+static void bq27xxx_battery_poll(struct work_struct *work)
+{
+	struct bq27xxx_device_info *di =
+		container_of(work, struct bq27xxx_device_info,
+			     work.work);
+
+	bq27xxx_battery_update(di);
+}
+
 /*
  * Get the average power in µW
  * Return < 0 if something fails.
From: Hans de Goede <hdegoede@redhat.com>
[ Upstream commit 939a116142012926e25de0ea6b7e2f8d86a5f1b6 ]
On gauges where the current register is signed, there is no charging flag in the flags register. So only checking flags will not result in power_supply_changed() getting called when e.g. a charger is plugged in and the current sign changes from negative (discharging) to positive (charging).
This causes userspace's notion of the status to lag until userspace does a poll.
And when a power_supply_leds.c LED trigger is used to indicate charging status with a LED, this LED will lag until the capacity percentage changes, which may take many minutes (because the LED trigger only is updated on power_supply_changed() calls).
Fix this by calling bq27xxx_battery_current_and_status() on gauges with a signed current register and checking if the status has changed.
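For illustration, a minimal sketch of the decision this relies on (a hypothetical helper, not the driver's exact code; the real driver logic also checks the full-charge flag):

static int example_status_from_current(int curr_ua)
{
	if (curr_ua > 0)
		return POWER_SUPPLY_STATUS_CHARGING;
	if (curr_ua < 0)
		return POWER_SUPPLY_STATUS_DISCHARGING;
	return POWER_SUPPLY_STATUS_NOT_CHARGING;
}

With an unsigned current register (the BQ27XXX_O_ZERO case) this sign information does not exist, which is why those gauges are excluded from the new check.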
Fixes: 297a533b3e62 ("bq27x00: Cache battery registers")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/power/supply/bq27xxx_battery.c | 13 ++++++++++++-
 include/linux/power/bq27xxx_battery.h  |  3 +++
 2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
index 2fca4b9403ff9..e6d67655732aa 100644
--- a/drivers/power/supply/bq27xxx_battery.c
+++ b/drivers/power/supply/bq27xxx_battery.c
@@ -1836,6 +1836,7 @@ static int bq27xxx_battery_current_and_status(
 
 static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
 {
+	union power_supply_propval status = di->last_status;
 	struct bq27xxx_reg_cache cache = {0, };
 	bool has_singe_flag = di->opts & BQ27XXX_O_ZERO;
 
@@ -1860,14 +1861,24 @@ static void bq27xxx_battery_update_unlocked(struct bq27xxx_device_info *di)
 		if (di->regs[BQ27XXX_REG_CYCT] != INVALID_REG_ADDR)
 			cache.cycle_count = bq27xxx_battery_read_cyct(di);
 
+		/*
+		 * On gauges with signed current reporting the current must be
+		 * checked to detect charging <-> discharging status changes.
+		 */
+		if (!(di->opts & BQ27XXX_O_ZERO))
+			bq27xxx_battery_current_and_status(di, NULL, &status, &cache);
+
 		/* We only have to read charge design full once */
 		if (di->charge_design_full <= 0)
 			di->charge_design_full = bq27xxx_battery_read_dcap(di);
 	}
 
 	if ((di->cache.capacity != cache.capacity) ||
-	    (di->cache.flags != cache.flags))
+	    (di->cache.flags != cache.flags) ||
+	    (di->last_status.intval != status.intval)) {
+		di->last_status.intval = status.intval;
 		power_supply_changed(di->bat);
+	}
 
 	if (memcmp(&di->cache, &cache, sizeof(cache)) != 0)
 		di->cache = cache;
diff --git a/include/linux/power/bq27xxx_battery.h b/include/linux/power/bq27xxx_battery.h
index e3322dad9c85c..7c8d65414a70a 100644
--- a/include/linux/power/bq27xxx_battery.h
+++ b/include/linux/power/bq27xxx_battery.h
@@ -2,6 +2,8 @@
 #ifndef __LINUX_BQ27X00_BATTERY_H__
 #define __LINUX_BQ27X00_BATTERY_H__
 
+#include <linux/power_supply.h>
+
 enum bq27xxx_chip {
 	BQ27000 = 1, /* bq27000, bq27200 */
 	BQ27010, /* bq27010, bq27210 */
@@ -70,6 +72,7 @@ struct bq27xxx_device_info {
 	int charge_design_full;
 	bool removed;
 	unsigned long last_update;
+	union power_supply_propval last_status;
 	struct delayed_work work;
 	struct power_supply *bat;
 	struct list_head list;
From: Hans de Goede <hdegoede@redhat.com>
[ Upstream commit 59a99cd462fbdf71f4e845e09f37783035088b4f ]
bq27xxx_external_power_changed() gets called when the charger is plugged in or out. Rather than immediately scheduling an update, wait 0.5 seconds for things to stabilize, so that e.g. the (dis)charge current is stable when bq27xxx_battery_update() runs.
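For reference, the semantics this relies on (both idioms appear in the diff below): cancel_delayed_work_sync() can sleep waiting for a running work item, while mod_delayed_work() atomically re-arms the pending timer, so a burst of plug events collapses into one update half a second after the last event:

	/* old: synchronous cancel, then immediate re-schedule */
	cancel_delayed_work_sync(&di->work);
	schedule_delayed_work(&di->work, 0);

	/* new: re-arm the work to run HZ / 2 jiffies (0.5 s) from now */
	mod_delayed_work(system_wq, &di->work, HZ / 2);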
Fixes: 740b755a3b34 ("bq27x00: Poll battery state")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/power/supply/bq27xxx_battery.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/power/supply/bq27xxx_battery.c b/drivers/power/supply/bq27xxx_battery.c
index e6d67655732aa..ffb5f5c028223 100644
--- a/drivers/power/supply/bq27xxx_battery.c
+++ b/drivers/power/supply/bq27xxx_battery.c
@@ -2099,8 +2099,8 @@ static void bq27xxx_external_power_changed(struct power_supply *psy)
 {
 	struct bq27xxx_device_info *di = power_supply_get_drvdata(psy);
 
-	cancel_delayed_work_sync(&di->work);
-	schedule_delayed_work(&di->work, 0);
+	/* After charger plug in/out wait 0.5s for things to stabilize */
+	mod_delayed_work(system_wq, &di->work, HZ / 2);
 }
 
 int bq27xxx_battery_setup(struct bq27xxx_device_info *di)
From: Hans de Goede <hdegoede@redhat.com>
[ Upstream commit 2220af8ca61ae67de4ec3deec1c6395a2f65b9fd ]
Some (USB) charger ICs have variants with USB D+ and D- pins to do their own builtin charger-type detection, e.g. the bq24190 and bq25890, as well as variants which lack this functionality, e.g. the bq24192 and bq25892.

When the charger-type (and thus input-current-limit) detection is done outside the charger IC, we need some way to communicate the result to the charger IC. In the past extcon was used for this, but if the external detection does e.g. full USB PD negotiation then the extcon cable-types do not convey enough information.

For these setups it was decided to model the external charging "brick" and the parameters negotiated with it as a power_supply class-device itself, and power_supply_set_input_current_limit_from_supplier() was introduced to allow drivers to get the input-current-limit this way.
But in some cases psy drivers may want to know other properties, e.g. the bq25892 can do "quick-charge" negotiation by pulsing its current draw, but this should only be done if the usb_type psy-property of its supplier is set to DCP (and device-properties indicate the board allows higher voltages).
Instead of adding extra helper functions for each property which a psy-driver wants to query from its supplier, refactor power_supply_set_input_current_limit_from_supplier() into a more generic power_supply_get_property_from_supplier() function.
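A hedged usage sketch of the resulting helper (the chosen property and the surrounding driver step are illustrative, modeled on the bq25892 example above; the function signature matches the diff below):

	union power_supply_propval val;
	int ret;

	/* ask the first supplier that reports the property */
	ret = power_supply_get_property_from_supplier(psy,
						      POWER_SUPPLY_PROP_USB_TYPE,
						      &val);
	if (!ret && val.intval == POWER_SUPPLY_USB_TYPE_DCP)
		enable_quick_charge_pulsing();	/* hypothetical driver step */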
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Stable-dep-of: 77c2a3097d70 ("power: supply: bq24190: Call power_supply_changed() after updating input current")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/power/supply/bq24190_charger.c   | 12 ++++-
 drivers/power/supply/power_supply_core.c | 57 +++++++++++++-----------
 include/linux/power_supply.h             |  5 ++-
 3 files changed, 44 insertions(+), 30 deletions(-)
diff --git a/drivers/power/supply/bq24190_charger.c b/drivers/power/supply/bq24190_charger.c
index ebb5ba7f8bb63..19537bd75013d 100644
--- a/drivers/power/supply/bq24190_charger.c
+++ b/drivers/power/supply/bq24190_charger.c
@@ -1201,8 +1201,18 @@ static void bq24190_input_current_limit_work(struct work_struct *work)
 	struct bq24190_dev_info *bdi =
 		container_of(work, struct bq24190_dev_info,
 			     input_current_limit_work.work);
+	union power_supply_propval val;
+	int ret;
+
+	ret = power_supply_get_property_from_supplier(bdi->charger,
+						      POWER_SUPPLY_PROP_CURRENT_MAX,
+						      &val);
+	if (ret)
+		return;
 
-	power_supply_set_input_current_limit_from_supplier(bdi->charger);
+	bq24190_charger_set_property(bdi->charger,
+				     POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT,
+				     &val);
 }
 
 /* Sync the input-current-limit with our parent supply (if we have one) */
diff --git a/drivers/power/supply/power_supply_core.c b/drivers/power/supply/power_supply_core.c
index 8161fad081a96..daf76d7885c05 100644
--- a/drivers/power/supply/power_supply_core.c
+++ b/drivers/power/supply/power_supply_core.c
@@ -375,46 +375,49 @@ int power_supply_is_system_supplied(void)
 }
 EXPORT_SYMBOL_GPL(power_supply_is_system_supplied);
 
-static int __power_supply_get_supplier_max_current(struct device *dev,
-						   void *data)
+struct psy_get_supplier_prop_data {
+	struct power_supply *psy;
+	enum power_supply_property psp;
+	union power_supply_propval *val;
+};
+
+static int __power_supply_get_supplier_property(struct device *dev, void *_data)
 {
-	union power_supply_propval ret = {0,};
 	struct power_supply *epsy = dev_get_drvdata(dev);
-	struct power_supply *psy = data;
+	struct psy_get_supplier_prop_data *data = _data;
 
-	if (__power_supply_is_supplied_by(epsy, psy))
-		if (!epsy->desc->get_property(epsy,
-					      POWER_SUPPLY_PROP_CURRENT_MAX,
-					      &ret))
-			return ret.intval;
+	if (__power_supply_is_supplied_by(epsy, data->psy))
+		if (!epsy->desc->get_property(epsy, data->psp, data->val))
+			return 1; /* Success */
 
-	return 0;
+	return 0; /* Continue iterating */
 }
 
-int power_supply_set_input_current_limit_from_supplier(struct power_supply *psy)
+int power_supply_get_property_from_supplier(struct power_supply *psy,
+					    enum power_supply_property psp,
+					    union power_supply_propval *val)
 {
-	union power_supply_propval val = {0,};
-	int curr;
-
-	if (!psy->desc->set_property)
-		return -EINVAL;
+	struct psy_get_supplier_prop_data data = {
+		.psy = psy,
+		.psp = psp,
+		.val = val,
+	};
+	int ret;
 
 	/*
 	 * This function is not intended for use with a supply with multiple
-	 * suppliers, we simply pick the first supply to report a non 0
-	 * max-current.
+	 * suppliers, we simply pick the first supply to report the psp.
 	 */
-	curr = class_for_each_device(power_supply_class, NULL, psy,
-				     __power_supply_get_supplier_max_current);
-	if (curr <= 0)
-		return (curr == 0) ? -ENODEV : curr;
-
-	val.intval = curr;
+	ret = class_for_each_device(power_supply_class, NULL, &data,
+				    __power_supply_get_supplier_property);
+	if (ret < 0)
+		return ret;
+	if (ret == 0)
+		return -ENODEV;
 
-	return psy->desc->set_property(psy,
-			POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT, &val);
+	return 0;
 }
-EXPORT_SYMBOL_GPL(power_supply_set_input_current_limit_from_supplier);
+EXPORT_SYMBOL_GPL(power_supply_get_property_from_supplier);
 
 int power_supply_set_battery_charged(struct power_supply *psy)
 {
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index 9ca1f120a2117..0735b8963e0af 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -420,8 +420,9 @@ power_supply_temp2resist_simple(struct power_supply_resistance_temp_table *table
 				int table_len, int temp);
 extern void power_supply_changed(struct power_supply *psy);
 extern int power_supply_am_i_supplied(struct power_supply *psy);
-extern int power_supply_set_input_current_limit_from_supplier(
-					struct power_supply *psy);
+int power_supply_get_property_from_supplier(struct power_supply *psy,
+					    enum power_supply_property psp,
+					    union power_supply_propval *val);
 extern int power_supply_set_battery_charged(struct power_supply *psy);
 
 #ifdef CONFIG_POWER_SUPPLY
From: Hans de Goede <hdegoede@redhat.com>
[ Upstream commit 77c2a3097d7029441e8a91aa0de1b4e5464593da ]
The bq24192 model relies on external charger-type detection, and once that is done, the bq24190_charger code will update the input current.

In this case, when the initial power_supply_changed() call is made from the interrupt handler, the input settings are 5V/0.5A, which on many devices is not enough power to charge (while the device is on).

On many devices the fuel-gauge relies on its external_power_changed callback to timely signal userspace about charging <-> discharging status changes. Add a power_supply_changed() call after updating the input current. This allows the fuel-gauge driver to timely recheck if the battery is charging after the new input current has been applied, and then it can immediately notify userspace about this.
Fixes: 18f8e6f695ac ("power: supply: bq24190_charger: Get input_current_limit from our supplier")
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/power/supply/bq24190_charger.c | 1 +
 1 file changed, 1 insertion(+)
diff --git a/drivers/power/supply/bq24190_charger.c b/drivers/power/supply/bq24190_charger.c
index 19537bd75013d..90ac5e59a5d6f 100644
--- a/drivers/power/supply/bq24190_charger.c
+++ b/drivers/power/supply/bq24190_charger.c
@@ -1213,6 +1213,7 @@ static void bq24190_input_current_limit_work(struct work_struct *work)
 	bq24190_charger_set_property(bdi->charger,
 				     POWER_SUPPLY_PROP_INPUT_CURRENT_LIMIT,
 				     &val);
+	power_supply_changed(bdi->charger);
 }
 
 /* Sync the input-current-limit with our parent supply (if we have one) */
From: Anton Protopopov <aspsk@isovalent.com>
[ Upstream commit b34ffb0c6d23583830f9327864b9c1f486003305 ]
The LRU and LRU_PERCPU maps allocate a new element on update before locking the target hash table bucket. Right after that, the maps try to lock the bucket. If this fails, the maps return -EBUSY to the caller without releasing the allocated element. This makes the element untracked: it doesn't belong to either of the free lists, nor to the hash table, so it can't be re-used; this eventually leads to a permanent -ENOMEM on LRU map updates, which is unexpected. Fix this by returning the element to the local free list if bucket locking fails.
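As a plain userspace analogy of the bug and the fix (not the hashtab.c code itself; the kernel version uses its own LRU free lists and htab_lock_bucket()):

#include <pthread.h>

struct elem { struct elem *next; };

static struct elem *free_list;	/* stand-in for the LRU free list */
static pthread_mutex_t free_lock = PTHREAD_MUTEX_INITIALIZER;

static void push_free(struct elem *e)
{
	pthread_mutex_lock(&free_lock);
	e->next = free_list;
	free_list = e;
	pthread_mutex_unlock(&free_lock);
}

/* The element was taken off the free list before the bucket lock is
 * acquired.  If locking fails, the element must go back to the free
 * list; just returning an error leaks it (the pre-fix behaviour). */
static int update_elem(pthread_mutex_t *bucket_lock, struct elem *e)
{
	if (pthread_mutex_trylock(bucket_lock)) {
		push_free(e);		/* the fix */
		return -1;		/* -EBUSY in the kernel code */
	}
	/* ... link e into the bucket ... */
	pthread_mutex_unlock(bucket_lock);
	return 0;
}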
Fixes: 20b6cc34ea74 ("bpf: Avoid hashtab deadlock with map_locked")
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Link: https://lore.kernel.org/r/20230522154558.2166815-1-aspsk@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/bpf/hashtab.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 10b37773d9e47..a63c68f5945cd 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -1165,7 +1165,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
 
 	ret = htab_lock_bucket(htab, b, hash, &flags);
 	if (ret)
-		return ret;
+		goto err_lock_bucket;
 
 	l_old = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1186,6 +1186,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
 err:
 	htab_unlock_bucket(htab, b, hash, flags);
 
+err_lock_bucket:
 	if (ret)
 		htab_lru_push_free(htab, l_new);
 	else if (l_old)
@@ -1288,7 +1289,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 
 	ret = htab_lock_bucket(htab, b, hash, &flags);
 	if (ret)
-		return ret;
+		goto err_lock_bucket;
 
 	l_old = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1311,6 +1312,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
 	ret = 0;
 err:
 	htab_unlock_bucket(htab, b, hash, flags);
+err_lock_bucket:
 	if (l_new)
 		bpf_lru_push_free(&htab->lru, &l_new->lru_node);
 	return ret;
From: Mark Bloch <mbloch@nvidia.com>
[ Upstream commit 8a6e75e5f57e9ac82268d9bfca3403598d9d0292 ]
The devcom API is intended to be used between 2 devices only. Add this implied assumption into the code and check when it's not true.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Stable-dep-of: 691c041bf208 ("net/mlx5e: Fix deadlock in tc route query code")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.c | 16 +++++++++-------
 .../net/ethernet/mellanox/mlx5/core/lib/devcom.h |  2 ++
 2 files changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
index abd066e952286..617eea1b1701b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
@@ -14,7 +14,7 @@ static LIST_HEAD(devcom_list);
 struct mlx5_devcom_component {
 	struct {
 		void *data;
-	} device[MLX5_MAX_PORTS];
+	} device[MLX5_DEVCOM_PORTS_SUPPORTED];
 
 	mlx5_devcom_event_handler_t handler;
 	struct rw_semaphore sem;
@@ -25,7 +25,7 @@ struct mlx5_devcom_list {
 	struct list_head list;
 
 	struct mlx5_devcom_component components[MLX5_DEVCOM_NUM_COMPONENTS];
-	struct mlx5_core_dev *devs[MLX5_MAX_PORTS];
+	struct mlx5_core_dev *devs[MLX5_DEVCOM_PORTS_SUPPORTED];
 };
 
 struct mlx5_devcom {
@@ -74,13 +74,15 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
 
 	if (!mlx5_core_is_pf(dev))
 		return NULL;
+	if (MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_DEVCOM_PORTS_SUPPORTED)
+		return NULL;
 
 	sguid0 = mlx5_query_nic_system_image_guid(dev);
 	list_for_each_entry(iter, &devcom_list, list) {
 		struct mlx5_core_dev *tmp_dev = NULL;
 
 		idx = -1;
-		for (i = 0; i < MLX5_MAX_PORTS; i++) {
+		for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++) {
 			if (iter->devs[i])
 				tmp_dev = iter->devs[i];
 			else
@@ -135,11 +137,11 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom *devcom)
 
 	kfree(devcom);
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
 		if (priv->devs[i])
 			break;
 
-	if (i != MLX5_MAX_PORTS)
+	if (i != MLX5_DEVCOM_PORTS_SUPPORTED)
 		return;
 
 	list_del(&priv->list);
@@ -192,7 +194,7 @@ int mlx5_devcom_send_event(struct mlx5_devcom *devcom,
 
 	comp = &devcom->priv->components[id];
 	down_write(&comp->sem);
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
 		if (i != devcom->idx && comp->device[i].data) {
 			err = comp->handler(event, comp->device[i].data,
 					    event_data);
@@ -240,7 +242,7 @@ void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
 		return NULL;
 	}
 
-	for (i = 0; i < MLX5_MAX_PORTS; i++)
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
 		if (i != devcom->idx)
 			break;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
index 939d5bf1581b5..94313c18bb647 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
@@ -6,6 +6,8 @@
 
 #include <linux/mlx5/driver.h>
 
+#define MLX5_DEVCOM_PORTS_SUPPORTED 2
+
 enum mlx5_devcom_components {
 	MLX5_DEVCOM_ESW_OFFLOADS,
From: Vlad Buslov <vladbu@nvidia.com>
[ Upstream commit 691c041bf20899fc13c793f92ba61ab660fa3a30 ]
Cited commit causes an ABBA deadlock[0] when peer flows are created while holding the devcom rw semaphore. Due to the peer flows offload implementation, the lock is taken much higher up the call chain and there is no obvious way to easily fix the deadlock. Instead, since the tc route query code needs the peer eswitch structure only to perform a lookup in an xarray and doesn't perform any sleeping operations with it, refactor the code for lockless execution in the following ways:
- RCUify the devcom 'data' pointer. When resetting the pointer, synchronously wait for the RCU grace period before returning (a generic sketch of this pattern follows the list). This is fine since devcom is currently only used for synchronization of pairing/unpairing of eswitches, which is rare and already expensive as-is.
- Wrap all usages of 'paired' boolean in {READ|WRITE}_ONCE(). The flag has already been used in some unlocked contexts without proper annotations (e.g. users of mlx5_devcom_is_paired() function), but it wasn't an issue since all relevant code paths checked it again after obtaining the devcom semaphore. Now it is also used by mlx5_devcom_get_peer_data_rcu() as "best effort" check to return NULL when devcom is being unpaired. Note that while RCU read lock doesn't prevent the unpaired flag from being changed concurrently it still guarantees that reader can continue to use 'data'.
- Refactor mlx5e_tc_query_route_vport() function to use new mlx5_devcom_get_peer_data_rcu() API which fixes the deadlock.
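For reference, a generic sketch of the RCU-protected pointer lifecycle applied here to the devcom 'data' pointer (standard kernel RCU primitives; 'slot' and 'use()' are placeholders):

	/* writer: publish (done under the component semaphore) */
	rcu_assign_pointer(slot->data, new_data);

	/* writer: unpublish, then wait for all readers before reuse */
	RCU_INIT_POINTER(slot->data, NULL);
	synchronize_rcu();

	/* lockless reader */
	rcu_read_lock();
	data = rcu_dereference(slot->data);
	if (data)
		use(data);	/* stays valid until rcu_read_unlock() */
	rcu_read_unlock();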
[0]:
[ 164.599612] ======================================================
[ 164.600142] WARNING: possible circular locking dependency detected
[ 164.600667] 6.3.0-rc3+ #1 Not tainted
[ 164.601021] ------------------------------------------------------
[ 164.601557] handler1/3456 is trying to acquire lock:
[ 164.601998] ffff88811f1714b0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}, at: mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.603078] but task is already holding lock:
[ 164.603617] ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.604459] which lock already depends on the new lock.

[ 164.605190] the existing dependency chain (in reverse order) is:
[ 164.605848] -> #1 (&comp->sem){++++}-{3:3}:
[ 164.606380]        down_read+0x39/0x50
[ 164.606772]        mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.607336]        mlx5e_tc_query_route_vport+0x86/0xc0 [mlx5_core]
[ 164.607914]        mlx5e_tc_tun_route_lookup+0x1a4/0x1d0 [mlx5_core]
[ 164.608495]        mlx5e_attach_decap_route+0xc6/0x1e0 [mlx5_core]
[ 164.609063]        mlx5e_tc_add_fdb_flow+0x1ea/0x360 [mlx5_core]
[ 164.609627]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.610175]        mlx5e_configure_flower+0x952/0x1a20 [mlx5_core]
[ 164.610741]        tc_setup_cb_add+0xd4/0x200
[ 164.611146]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.611661]        fl_change+0xc95/0x18a0 [cls_flower]
[ 164.612116]        tc_new_tfilter+0x3fc/0xd20
[ 164.612516]        rtnetlink_rcv_msg+0x418/0x5b0
[ 164.612936]        netlink_rcv_skb+0x54/0x100
[ 164.613339]        netlink_unicast+0x190/0x250
[ 164.613746]        netlink_sendmsg+0x245/0x4a0
[ 164.614150]        sock_sendmsg+0x38/0x60
[ 164.614522]        ____sys_sendmsg+0x1d0/0x1e0
[ 164.614934]        ___sys_sendmsg+0x80/0xc0
[ 164.615320]        __sys_sendmsg+0x51/0x90
[ 164.615701]        do_syscall_64+0x3d/0x90
[ 164.616083]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.616568] -> #0 (&esw->offloads.encap_tbl_lock){+.+.}-{3:3}:
[ 164.617210]        __lock_acquire+0x159e/0x26e0
[ 164.617638]        lock_acquire+0xc2/0x2a0
[ 164.618018]        __mutex_lock+0x92/0xcd0
[ 164.618401]        mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.618943]        post_process_attr+0x153/0x2d0 [mlx5_core]
[ 164.619471]        mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[ 164.620021]        __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.620564]        mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[ 164.621125]        tc_setup_cb_add+0xd4/0x200
[ 164.621531]        fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.622047]        fl_change+0xc95/0x18a0 [cls_flower]
[ 164.622500]        tc_new_tfilter+0x3fc/0xd20
[ 164.622906]        rtnetlink_rcv_msg+0x418/0x5b0
[ 164.623324]        netlink_rcv_skb+0x54/0x100
[ 164.623727]        netlink_unicast+0x190/0x250
[ 164.624138]        netlink_sendmsg+0x245/0x4a0
[ 164.624544]        sock_sendmsg+0x38/0x60
[ 164.624919]        ____sys_sendmsg+0x1d0/0x1e0
[ 164.625340]        ___sys_sendmsg+0x80/0xc0
[ 164.625731]        __sys_sendmsg+0x51/0x90
[ 164.626117]        do_syscall_64+0x3d/0x90
[ 164.626502]        entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.626995] other info that might help us debug this:

[ 164.627725] Possible unsafe locking scenario:

[ 164.628268]        CPU0                    CPU1
[ 164.628683]        ----                    ----
[ 164.629098]   lock(&comp->sem);
[ 164.629421]                                lock(&esw->offloads.encap_tbl_lock);
[ 164.630066]                                lock(&comp->sem);
[ 164.630555]   lock(&esw->offloads.encap_tbl_lock);
[ 164.630993]  *** DEADLOCK ***

[ 164.631575] 3 locks held by handler1/3456:
[ 164.631962]  #0: ffff888124b75130 (&block->cb_lock){++++}-{3:3}, at: tc_setup_cb_add+0x5b/0x200
[ 164.632703]  #1: ffff888116e512b8 (&esw->mode_lock){++++}-{3:3}, at: mlx5_esw_hold+0x39/0x50 [mlx5_core]
[ 164.633552]  #2: ffff88810137fc98 (&comp->sem){++++}-{3:3}, at: mlx5_devcom_get_peer_data+0x37/0x80 [mlx5_core]
[ 164.634435] stack backtrace:
[ 164.634883] CPU: 17 PID: 3456 Comm: handler1 Not tainted 6.3.0-rc3+ #1
[ 164.635431] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 164.636340] Call Trace:
[ 164.636616]  <TASK>
[ 164.636863]  dump_stack_lvl+0x47/0x70
[ 164.637217]  check_noncircular+0xfe/0x110
[ 164.637601]  __lock_acquire+0x159e/0x26e0
[ 164.637977]  ? mlx5_cmd_set_fte+0x5b0/0x830 [mlx5_core]
[ 164.638472]  lock_acquire+0xc2/0x2a0
[ 164.638828]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.639339]  ? lock_is_held_type+0x98/0x110
[ 164.639728]  __mutex_lock+0x92/0xcd0
[ 164.640074]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.640576]  ? __lock_acquire+0x382/0x26e0
[ 164.640958]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.641468]  ? mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.641965]  mlx5e_attach_encap+0xd8/0x8b0 [mlx5_core]
[ 164.642454]  ? lock_release+0xbf/0x240
[ 164.642819]  post_process_attr+0x153/0x2d0 [mlx5_core]
[ 164.643318]  mlx5e_tc_add_fdb_flow+0x164/0x360 [mlx5_core]
[ 164.643835]  __mlx5e_add_fdb_flow+0x2d2/0x430 [mlx5_core]
[ 164.644340]  mlx5e_configure_flower+0xe33/0x1a20 [mlx5_core]
[ 164.644862]  ? lock_acquire+0xc2/0x2a0
[ 164.645219]  tc_setup_cb_add+0xd4/0x200
[ 164.645588]  fl_hw_replace_filter+0x14c/0x1f0 [cls_flower]
[ 164.646067]  fl_change+0xc95/0x18a0 [cls_flower]
[ 164.646488]  tc_new_tfilter+0x3fc/0xd20
[ 164.646861]  ? tc_del_tfilter+0x810/0x810
[ 164.647236]  rtnetlink_rcv_msg+0x418/0x5b0
[ 164.647621]  ? rtnl_setlink+0x160/0x160
[ 164.647982]  netlink_rcv_skb+0x54/0x100
[ 164.648348]  netlink_unicast+0x190/0x250
[ 164.648722]  netlink_sendmsg+0x245/0x4a0
[ 164.649090]  sock_sendmsg+0x38/0x60
[ 164.649434]  ____sys_sendmsg+0x1d0/0x1e0
[ 164.649804]  ? copy_msghdr_from_user+0x6d/0xa0
[ 164.650213]  ___sys_sendmsg+0x80/0xc0
[ 164.650563]  ? lock_acquire+0xc2/0x2a0
[ 164.650926]  ? lock_acquire+0xc2/0x2a0
[ 164.651286]  ? __fget_files+0x5/0x190
[ 164.651644]  ? find_held_lock+0x2b/0x80
[ 164.652006]  ? __fget_files+0xb9/0x190
[ 164.652365]  ? lock_release+0xbf/0x240
[ 164.652723]  ? __fget_files+0xd3/0x190
[ 164.653079]  __sys_sendmsg+0x51/0x90
[ 164.653435]  do_syscall_64+0x3d/0x90
[ 164.653784]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 164.654229] RIP: 0033:0x7f378054f8bd
[ 164.654577] Code: 28 89 54 24 1c 48 89 74 24 10 89 7c 24 08 e8 6a c3 f4 ff 8b 54 24 1c 48 8b 74 24 10 41 89 c0 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 44 24 08 e8 be c3 f4 ff 48
[ 164.656041] RSP: 002b:00007f377fa114b0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[ 164.656701] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f378054f8bd
[ 164.657297] RDX: 0000000000000000 RSI: 00007f377fa11540 RDI: 0000000000000014
[ 164.657885] RBP: 00007f377fa12278 R08: 0000000000000000 R09: 000000000000015c
[ 164.658472] R10: 00007f377fa123d0 R11: 0000000000000293 R12: 0000560962d99bd0
[ 164.665317] R13: 0000000000000000 R14: 0000560962d99bd0 R15: 00007f377fa11540
Fixes: f9d196bd632b ("net/mlx5e: Use correct eswitch for stack devices with lag")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../net/ethernet/mellanox/mlx5/core/en_tc.c  | 19 ++++----
 .../ethernet/mellanox/mlx5/core/lib/devcom.c | 48 ++++++++++++++-----
 .../ethernet/mellanox/mlx5/core/lib/devcom.h |  1 +
 3 files changed, 48 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 9ea4281a55b81..5cef556223e2c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -1308,11 +1308,9 @@ bool mlx5e_tc_is_vf_tunnel(struct net_device *out_dev, struct net_device *route_
 int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *route_dev, u16 *vport)
 {
 	struct mlx5e_priv *out_priv, *route_priv;
-	struct mlx5_devcom *devcom = NULL;
 	struct mlx5_core_dev *route_mdev;
 	struct mlx5_eswitch *esw;
 	u16 vhca_id;
-	int err;
 
 	out_priv = netdev_priv(out_dev);
 	esw = out_priv->mdev->priv.eswitch;
@@ -1321,6 +1319,9 @@ int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *ro
 
 	vhca_id = MLX5_CAP_GEN(route_mdev, vhca_id);
 	if (mlx5_lag_is_active(out_priv->mdev)) {
+		struct mlx5_devcom *devcom;
+		int err;
+
 		/* In lag case we may get devices from different eswitch instances.
 		 * If we failed to get vport num, it means, mostly, that we on the wrong
 		 * eswitch.
@@ -1329,16 +1330,16 @@ int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *ro
 		if (err != -ENOENT)
 			return err;
 
+		rcu_read_lock();
 		devcom = out_priv->mdev->priv.devcom;
-		esw = mlx5_devcom_get_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
-		if (!esw)
-			return -ENODEV;
+		esw = mlx5_devcom_get_peer_data_rcu(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
+		err = esw ? mlx5_eswitch_vhca_id_to_vport(esw, vhca_id, vport) : -ENODEV;
+		rcu_read_unlock();
+
+		return err;
 	}
 
-	err = mlx5_eswitch_vhca_id_to_vport(esw, vhca_id, vport);
-	if (devcom)
-		mlx5_devcom_release_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
-	return err;
+	return mlx5_eswitch_vhca_id_to_vport(esw, vhca_id, vport);
 }
 
 int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
index 617eea1b1701b..8f978491dd32f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
@@ -13,7 +13,7 @@ static LIST_HEAD(devcom_list);
 
 struct mlx5_devcom_component {
 	struct {
-		void *data;
+		void __rcu *data;
 	} device[MLX5_DEVCOM_PORTS_SUPPORTED];
 
 	mlx5_devcom_event_handler_t handler;
@@ -163,7 +163,7 @@ void mlx5_devcom_register_component(struct mlx5_devcom *devcom,
 	comp = &devcom->priv->components[id];
 	down_write(&comp->sem);
 	comp->handler = handler;
-	comp->device[devcom->idx].data = data;
+	rcu_assign_pointer(comp->device[devcom->idx].data, data);
 	up_write(&comp->sem);
 }
 
@@ -177,8 +177,9 @@ void mlx5_devcom_unregister_component(struct mlx5_devcom *devcom,
 
 	comp = &devcom->priv->components[id];
 	down_write(&comp->sem);
-	comp->device[devcom->idx].data = NULL;
+	RCU_INIT_POINTER(comp->device[devcom->idx].data, NULL);
 	up_write(&comp->sem);
+	synchronize_rcu();
 }
 
 int mlx5_devcom_send_event(struct mlx5_devcom *devcom,
@@ -194,12 +195,15 @@ int mlx5_devcom_send_event(struct mlx5_devcom *devcom,
 
 	comp = &devcom->priv->components[id];
 	down_write(&comp->sem);
-	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
-		if (i != devcom->idx && comp->device[i].data) {
-			err = comp->handler(event, comp->device[i].data,
-					    event_data);
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++) {
+		void *data = rcu_dereference_protected(comp->device[i].data,
+						       lockdep_is_held(&comp->sem));
+
+		if (i != devcom->idx && data) {
+			err = comp->handler(event, data, event_data);
 			break;
 		}
+	}
 
 	up_write(&comp->sem);
 	return err;
@@ -214,7 +218,7 @@ void mlx5_devcom_set_paired(struct mlx5_devcom *devcom,
 	comp = &devcom->priv->components[id];
 	WARN_ON(!rwsem_is_locked(&comp->sem));
 
-	comp->paired = paired;
+	WRITE_ONCE(comp->paired, paired);
 }
 
 bool mlx5_devcom_is_paired(struct mlx5_devcom *devcom,
@@ -223,7 +227,7 @@ bool mlx5_devcom_is_paired(struct mlx5_devcom *devcom,
 	if (IS_ERR_OR_NULL(devcom))
 		return false;
 
-	return devcom->priv->components[id].paired;
+	return READ_ONCE(devcom->priv->components[id].paired);
 }
 
 void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
@@ -237,7 +241,7 @@ void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
 
 	comp = &devcom->priv->components[id];
 	down_read(&comp->sem);
-	if (!comp->paired) {
+	if (!READ_ONCE(comp->paired)) {
 		up_read(&comp->sem);
 		return NULL;
 	}
@@ -246,7 +250,29 @@ void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
 		if (i != devcom->idx)
 			break;
 
-	return comp->device[i].data;
+	return rcu_dereference_protected(comp->device[i].data, lockdep_is_held(&comp->sem));
+}
+
+void *mlx5_devcom_get_peer_data_rcu(struct mlx5_devcom *devcom, enum mlx5_devcom_components id)
+{
+	struct mlx5_devcom_component *comp;
+	int i;
+
+	if (IS_ERR_OR_NULL(devcom))
+		return NULL;
+
+	for (i = 0; i < MLX5_DEVCOM_PORTS_SUPPORTED; i++)
+		if (i != devcom->idx)
+			break;
+
+	comp = &devcom->priv->components[id];
+	/* This can change concurrently, however 'data' pointer will remain
+	 * valid for the duration of RCU read section.
+	 */
+	if (!READ_ONCE(comp->paired))
+		return NULL;
+
+	return rcu_dereference(comp->device[i].data);
 }
 
 void mlx5_devcom_release_peer_data(struct mlx5_devcom *devcom,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
index 94313c18bb647..9a496f4722dad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.h
@@ -41,6 +41,7 @@ bool mlx5_devcom_is_paired(struct mlx5_devcom *devcom,
 
 void *mlx5_devcom_get_peer_data(struct mlx5_devcom *devcom,
 				enum mlx5_devcom_components id);
+void *mlx5_devcom_get_peer_data_rcu(struct mlx5_devcom *devcom, enum mlx5_devcom_components id);
 void mlx5_devcom_release_peer_data(struct mlx5_devcom *devcom,
 				   enum mlx5_devcom_components id);
From: Shay Drory <shayd@nvidia.com>
[ Upstream commit 1f893f57a3bf9fe1f4bcb25b55aea7f7f9712fe7 ]
On one hand, the mlx5 driver allows probing PFs in parallel. On the other hand, devcom, which is a shared resource between PFs, is registered without any lock. This might result in memory problems.
Hence, use the global mlx5_dev_list_lock in order to serialize devcom registration.
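In outline, the pattern is (a sketch, not the mlx5 code verbatim; find_matching_priv() is a hypothetical stand-in for the list walk in the diff below): the search-or-create on the shared devcom_list must form one critical section, otherwise two PFs probing in parallel can both miss the lookup and both allocate an entry:

	mlx5_dev_list_lock();
	priv = find_matching_priv(&devcom_list, sguid0, sguid1);
	if (!priv)
		priv = mlx5_devcom_list_alloc();
	/* ... bind this device into priv ... */
	mlx5_dev_list_unlock();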
Fixes: fadd59fc50d0 ("net/mlx5: Introduce inter-device communication mechanism")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../ethernet/mellanox/mlx5/core/lib/devcom.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
index 8f978491dd32f..b7d779d08d837 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/devcom.c
@@ -3,6 +3,7 @@
 
 #include <linux/mlx5/vport.h>
 #include "lib/devcom.h"
+#include "mlx5_core.h"
 
 static LIST_HEAD(devcom_list);
 
@@ -77,6 +78,7 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
 	if (MLX5_CAP_GEN(dev, num_lag_ports) != MLX5_DEVCOM_PORTS_SUPPORTED)
 		return NULL;
 
+	mlx5_dev_list_lock();
 	sguid0 = mlx5_query_nic_system_image_guid(dev);
 	list_for_each_entry(iter, &devcom_list, list) {
 		struct mlx5_core_dev *tmp_dev = NULL;
@@ -102,8 +104,10 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
 
 	if (!priv) {
 		priv = mlx5_devcom_list_alloc();
-		if (!priv)
-			return ERR_PTR(-ENOMEM);
+		if (!priv) {
+			devcom = ERR_PTR(-ENOMEM);
+			goto out;
+		}
 
 		idx = 0;
 		new_priv = true;
@@ -114,12 +118,14 @@ struct mlx5_devcom *mlx5_devcom_register_device(struct mlx5_core_dev *dev)
 	if (!devcom) {
 		if (new_priv)
 			kfree(priv);
-		return ERR_PTR(-ENOMEM);
+		devcom = ERR_PTR(-ENOMEM);
+		goto out;
 	}
 
 	if (new_priv)
 		list_add(&priv->list, &devcom_list);
-
+out:
+	mlx5_dev_list_unlock();
 	return devcom;
 }
 
@@ -132,6 +138,7 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom *devcom)
 	if (IS_ERR_OR_NULL(devcom))
 		return;
 
+	mlx5_dev_list_lock();
 	priv = devcom->priv;
 	priv->devs[devcom->idx] = NULL;
 
@@ -142,10 +149,12 @@ void mlx5_devcom_unregister_device(struct mlx5_devcom *devcom)
 			break;
 
 	if (i != MLX5_DEVCOM_PORTS_SUPPORTED)
-		return;
+		goto out;
 
 	list_del(&priv->list);
 	kfree(priv);
+out:
+	mlx5_dev_list_unlock();
 }
 
 void mlx5_devcom_register_component(struct mlx5_devcom *devcom,
From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ Upstream commit 9a1aac8a96dc014bec49806a7a964bf2fdbd315f ]
On a multiple-package system using Sub-NUMA clustering, there is an issue in mapping a Linux CPU number to a PUNIT PCI device when the manufacturer decides to reuse PCI bus numbers across packages. A bus number can be reused as long as the buses are in different domains or segments. In this case some CPUs will fail to find a PCI device to issue SST requests to.

When bus numbers are reused across CPU packages, we are using proximity information by matching the CPU numa node id to the PUNIT PCI device numa node id. But a package can contain only one PUNIT PCI device while spanning multiple numa nodes (one for each sub cluster). So, the numa node id of the PUNIT PCI device can only match the numa node id of the CPUs in one sub cluster in the package.

Since there can be only one PUNIT PCI device per package, if we match with the numa node id of any sub cluster in that package, we can use that mapping for any CPU in that package. So, store the match information in a per-package data structure and return that information when there is no direct match.
While here, use defines for max bus number instead of hardcoding.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20220629194817.2418240-1-srinivas.pandruvada@linux...
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Stable-dep-of: bbb320bfe2c3 ("platform/x86: ISST: Remove 8 socket limit")
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../intel/speed_select_if/isst_if_common.c | 39 +++++++++++++++----
 1 file changed, 32 insertions(+), 7 deletions(-)
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
index e8424e70d81d2..fd102678c75f6 100644
--- a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
+++ b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c
@@ -277,29 +277,38 @@ static int isst_if_get_platform_info(void __user *argp)
 	return 0;
 }
 
+#define ISST_MAX_BUS_NUMBER	2
+
 struct isst_if_cpu_info {
 	/* For BUS 0 and BUS 1 only, which we need for PUNIT interface */
-	int bus_info[2];
-	struct pci_dev *pci_dev[2];
+	int bus_info[ISST_MAX_BUS_NUMBER];
+	struct pci_dev *pci_dev[ISST_MAX_BUS_NUMBER];
 	int punit_cpu_id;
 	int numa_node;
 };
 
+struct isst_if_pkg_info {
+	struct pci_dev *pci_dev[ISST_MAX_BUS_NUMBER];
+};
+
 static struct isst_if_cpu_info *isst_cpu_info;
+static struct isst_if_pkg_info *isst_pkg_info;
+
 #define ISST_MAX_PCI_DOMAINS	8
 
 static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn)
 {
 	struct pci_dev *matched_pci_dev = NULL;
 	struct pci_dev *pci_dev = NULL;
-	int no_matches = 0;
+	int no_matches = 0, pkg_id;
 	int i, bus_number;
 
-	if (bus_no < 0 || bus_no > 1 || cpu < 0 || cpu >= nr_cpu_ids ||
-	    cpu >= num_possible_cpus())
+	if (bus_no < 0 || bus_no >= ISST_MAX_BUS_NUMBER || cpu < 0 ||
+	    cpu >= nr_cpu_ids || cpu >= num_possible_cpus())
 		return NULL;
 
+	pkg_id = topology_physical_package_id(cpu);
+
 	bus_number = isst_cpu_info[cpu].bus_info[bus_no];
 	if (bus_number < 0)
 		return NULL;
@@ -324,6 +333,8 @@ static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn
 	}
 
 		if (node == isst_cpu_info[cpu].numa_node) {
+			isst_pkg_info[pkg_id].pci_dev[bus_no] = _pci_dev;
+
 			pci_dev = _pci_dev;
 			break;
 		}
@@ -342,6 +353,10 @@ static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn
 	if (!pci_dev && no_matches == 1)
 		pci_dev = matched_pci_dev;
 
+	/* Return pci_dev pointer for any matched CPU in the package */
+	if (!pci_dev)
+		pci_dev = isst_pkg_info[pkg_id].pci_dev[bus_no];
+
 	return pci_dev;
 }
 
@@ -361,8 +376,8 @@ struct pci_dev *isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn)
 {
 	struct pci_dev *pci_dev;
 
-	if (bus_no < 0 || bus_no > 1 || cpu < 0 || cpu >= nr_cpu_ids ||
-	    cpu >= num_possible_cpus())
+	if (bus_no < 0 || bus_no >= ISST_MAX_BUS_NUMBER || cpu < 0 ||
+	    cpu >= nr_cpu_ids || cpu >= num_possible_cpus())
 		return NULL;
 
 	pci_dev = isst_cpu_info[cpu].pci_dev[bus_no];
@@ -417,10 +432,19 @@ static int isst_if_cpu_info_init(void)
 	if (!isst_cpu_info)
 		return -ENOMEM;
 
+	isst_pkg_info = kcalloc(topology_max_packages(),
+				sizeof(*isst_pkg_info),
+				GFP_KERNEL);
+	if (!isst_pkg_info) {
+		kfree(isst_cpu_info);
+		return -ENOMEM;
+	}
+
 	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 				"platform/x86/isst-if:online",
 				isst_if_cpu_online, NULL);
 	if (ret < 0) {
+		kfree(isst_pkg_info);
 		kfree(isst_cpu_info);
 		return ret;
 	}
@@ -433,6 +457,7 @@ static int isst_if_cpu_info_init(void)
 static void isst_if_cpu_info_exit(void)
 {
 	cpuhp_remove_state(isst_if_online_id);
+	kfree(isst_pkg_info);
 	kfree(isst_cpu_info);
 };
From: Steve Wahl <steve.wahl@hpe.com>
[ Upstream commit bbb320bfe2c3e9740fe89cfa0a7089b4e8bfc4ff ]
Stop restricting the PCI search to a range of PCI domains fed to pci_get_domain_bus_and_slot(). Instead, use for_each_pci_dev() and look at all PCI domains in one pass.
On systems with more than 8 sockets, this avoids error messages like "Information: Invalid level, Can't get TDP control information at specified levels on cpu 480" from the intel speed select utility.
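As a condensed sketch of the change (paraphrasing the hunks below; _pci_dev, bus_number, dev and fn are the driver's own locals):

    /* Old lookup: only the first ISST_MAX_PCI_DOMAINS (8) PCI domains
     * were ever searched, so matching devices on later sockets were
     * silently skipped.
     */
    for (i = 0; i < ISST_MAX_PCI_DOMAINS; ++i) {
            _pci_dev = pci_get_domain_bus_and_slot(i, bus_number,
                                                   PCI_DEVFN(dev, fn));
            if (!_pci_dev)
                    continue;
            /* match checks */
    }

    /* New lookup: one pass over every PCI device, with no domain cap. */
    for_each_pci_dev(_pci_dev) {
            if (_pci_dev->bus->number != bus_number ||
                _pci_dev->devfn != PCI_DEVFN(dev, fn))
                    continue;
            /* match checks */
    }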
Fixes: aa2ddd242572 ("platform/x86: ISST: Use numa node id for cpu pci dev mapping") Signed-off-by: Steve Wahl steve.wahl@hpe.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20230519160420.2588475-1-steve.wahl@hpe.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../x86/intel/speed_select_if/isst_if_common.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c index fd102678c75f6..f6b32d31c5110 100644 --- a/drivers/platform/x86/intel/speed_select_if/isst_if_common.c +++ b/drivers/platform/x86/intel/speed_select_if/isst_if_common.c @@ -294,14 +294,13 @@ struct isst_if_pkg_info { static struct isst_if_cpu_info *isst_cpu_info; static struct isst_if_pkg_info *isst_pkg_info;
-#define ISST_MAX_PCI_DOMAINS 8 - static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn) { struct pci_dev *matched_pci_dev = NULL; struct pci_dev *pci_dev = NULL; + struct pci_dev *_pci_dev = NULL; int no_matches = 0, pkg_id; - int i, bus_number; + int bus_number;
if (bus_no < 0 || bus_no >= ISST_MAX_BUS_NUMBER || cpu < 0 || cpu >= nr_cpu_ids || cpu >= num_possible_cpus()) @@ -313,12 +312,11 @@ static struct pci_dev *_isst_if_get_pci_dev(int cpu, int bus_no, int dev, int fn if (bus_number < 0) return NULL;
- for (i = 0; i < ISST_MAX_PCI_DOMAINS; ++i) { - struct pci_dev *_pci_dev; + for_each_pci_dev(_pci_dev) { int node;
- _pci_dev = pci_get_domain_bus_and_slot(i, bus_number, PCI_DEVFN(dev, fn)); - if (!_pci_dev) + if (_pci_dev->bus->number != bus_number || + _pci_dev->devfn != PCI_DEVFN(dev, fn)) continue;
++no_matches;
From: David Epping david.epping@missinglinkelectronics.com
[ Upstream commit 71460c9ec5c743e9ffffca3c874d66267c36345e ]
By default the VSC8501 and VSC8502 RGMII/GMII/MII RX_CLK output is disabled. To allow packet forwarding towards the MAC it needs to be enabled.
For other PHYs supported by this driver the clock output is enabled by default.
Fixes: d3169863310d ("net: phy: mscc: add support for VSC8502") Signed-off-by: David Epping david.epping@missinglinkelectronics.com Reviewed-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Reviewed-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/mscc/mscc.h | 1 + drivers/net/phy/mscc/mscc_main.c | 54 +++++++++++++++++--------------- 2 files changed, 29 insertions(+), 26 deletions(-)
diff --git a/drivers/net/phy/mscc/mscc.h b/drivers/net/phy/mscc/mscc.h index a50235fdf7d99..055e4ca5b3b5c 100644 --- a/drivers/net/phy/mscc/mscc.h +++ b/drivers/net/phy/mscc/mscc.h @@ -179,6 +179,7 @@ enum rgmii_clock_delay { #define VSC8502_RGMII_CNTL 20 #define VSC8502_RGMII_RX_DELAY_MASK 0x0070 #define VSC8502_RGMII_TX_DELAY_MASK 0x0007 +#define VSC8502_RGMII_RX_CLK_DISABLE 0x0800
#define MSCC_PHY_WOL_LOWER_MAC_ADDR 21 #define MSCC_PHY_WOL_MID_MAC_ADDR 22 diff --git a/drivers/net/phy/mscc/mscc_main.c b/drivers/net/phy/mscc/mscc_main.c index 74f3aa752724f..cef43b1344a94 100644 --- a/drivers/net/phy/mscc/mscc_main.c +++ b/drivers/net/phy/mscc/mscc_main.c @@ -527,14 +527,27 @@ static int vsc85xx_mac_if_set(struct phy_device *phydev, * * 2.0 ns (which causes the data to be sampled at exactly half way between * clock transitions at 1000 Mbps) if delays should be enabled */ -static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl, - u16 rgmii_rx_delay_mask, - u16 rgmii_tx_delay_mask) +static int vsc85xx_update_rgmii_cntl(struct phy_device *phydev, u32 rgmii_cntl, + u16 rgmii_rx_delay_mask, + u16 rgmii_tx_delay_mask) { u16 rgmii_rx_delay_pos = ffs(rgmii_rx_delay_mask) - 1; u16 rgmii_tx_delay_pos = ffs(rgmii_tx_delay_mask) - 1; u16 reg_val = 0; - int rc; + u16 mask = 0; + int rc = 0; + + /* For traffic to pass, the VSC8502 family needs the RX_CLK disable bit + * to be unset for all PHY modes, so do that as part of the paged + * register modification. + * For some family members (like VSC8530/31/40/41) this bit is reserved + * and read-only, and the RX clock is enabled by default. + */ + if (rgmii_cntl == VSC8502_RGMII_CNTL) + mask |= VSC8502_RGMII_RX_CLK_DISABLE; + + if (phy_interface_is_rgmii(phydev)) + mask |= rgmii_rx_delay_mask | rgmii_tx_delay_mask;
mutex_lock(&phydev->lock);
@@ -545,10 +558,9 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl, phydev->interface == PHY_INTERFACE_MODE_RGMII_ID) reg_val |= RGMII_CLK_DELAY_2_0_NS << rgmii_tx_delay_pos;
- rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2, - rgmii_cntl, - rgmii_rx_delay_mask | rgmii_tx_delay_mask, - reg_val); + if (mask) + rc = phy_modify_paged(phydev, MSCC_PHY_PAGE_EXTENDED_2, + rgmii_cntl, mask, reg_val);
mutex_unlock(&phydev->lock);
@@ -557,19 +569,11 @@ static int vsc85xx_rgmii_set_skews(struct phy_device *phydev, u32 rgmii_cntl,
static int vsc85xx_default_config(struct phy_device *phydev) { - int rc; - phydev->mdix_ctrl = ETH_TP_MDI_AUTO;
- if (phy_interface_mode_is_rgmii(phydev->interface)) { - rc = vsc85xx_rgmii_set_skews(phydev, VSC8502_RGMII_CNTL, - VSC8502_RGMII_RX_DELAY_MASK, - VSC8502_RGMII_TX_DELAY_MASK); - if (rc) - return rc; - } - - return 0; + return vsc85xx_update_rgmii_cntl(phydev, VSC8502_RGMII_CNTL, + VSC8502_RGMII_RX_DELAY_MASK, + VSC8502_RGMII_TX_DELAY_MASK); }
static int vsc85xx_get_tunable(struct phy_device *phydev, @@ -1766,13 +1770,11 @@ static int vsc8584_config_init(struct phy_device *phydev) if (ret) return ret;
- if (phy_interface_is_rgmii(phydev)) { - ret = vsc85xx_rgmii_set_skews(phydev, VSC8572_RGMII_CNTL, - VSC8572_RGMII_RX_DELAY_MASK, - VSC8572_RGMII_TX_DELAY_MASK); - if (ret) - return ret; - } + ret = vsc85xx_update_rgmii_cntl(phydev, VSC8572_RGMII_CNTL, + VSC8572_RGMII_RX_DELAY_MASK, + VSC8572_RGMII_TX_DELAY_MASK); + if (ret) + return ret;
ret = genphy_soft_reset(phydev); if (ret)
From: Tudor Ambarus tudor.ambarus@microchip.com
[ Upstream commit 801db90bf294f647b967e8d99b9ae121bea63d0d ]
Move the freed descriptors to the tail of the list, so that the sequence of descriptors is easier to track when debugging. One would know which descriptor should come next and could more easily catch concurrent use of descriptors, for example. virt-dma uses list_splice_tail_init() as well; follow the core driver.
Signed-off-by: Tudor Ambarus tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20211215110115.191749-7-tudor.ambarus@microchip.co... Signed-off-by: Vinod Koul vkoul@kernel.org Stable-dep-of: 44fe8440bda5 ("dmaengine: at_xdmac: do not resume channels paused by consumers") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/at_xdmac.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c index 80c609aa2a91c..b45437aab1434 100644 --- a/drivers/dma/at_xdmac.c +++ b/drivers/dma/at_xdmac.c @@ -732,7 +732,8 @@ at_xdmac_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl, if (!desc) { dev_err(chan2dev(chan), "can't get descriptor\n"); if (first) - list_splice_init(&first->descs_list, &atchan->free_descs_list); + list_splice_tail_init(&first->descs_list, + &atchan->free_descs_list); goto spin_unlock; }
@@ -820,7 +821,8 @@ at_xdmac_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr, if (!desc) { dev_err(chan2dev(chan), "can't get descriptor\n"); if (first) - list_splice_init(&first->descs_list, &atchan->free_descs_list); + list_splice_tail_init(&first->descs_list, + &atchan->free_descs_list); spin_unlock_irqrestore(&atchan->lock, irqflags); return NULL; } @@ -1054,8 +1056,8 @@ at_xdmac_prep_interleaved(struct dma_chan *chan, src_addr, dst_addr, xt, chunk); if (!desc) { - list_splice_init(&first->descs_list, - &atchan->free_descs_list); + list_splice_tail_init(&first->descs_list, + &atchan->free_descs_list); return NULL; }
@@ -1135,7 +1137,8 @@ at_xdmac_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest, dma_addr_t src, if (!desc) { dev_err(chan2dev(chan), "can't get descriptor\n"); if (first) - list_splice_init(&first->descs_list, &atchan->free_descs_list); + list_splice_tail_init(&first->descs_list, + &atchan->free_descs_list); return NULL; }
@@ -1311,8 +1314,8 @@ at_xdmac_prep_dma_memset_sg(struct dma_chan *chan, struct scatterlist *sgl, sg_dma_len(sg), value); if (!desc && first) - list_splice_init(&first->descs_list, - &atchan->free_descs_list); + list_splice_tail_init(&first->descs_list, + &atchan->free_descs_list);
if (!first) first = desc; @@ -1709,7 +1712,8 @@ static void at_xdmac_tasklet(struct tasklet_struct *t)
spin_lock_irq(&atchan->lock); /* Move the xfer descriptors into the free descriptors list. */ - list_splice_init(&desc->descs_list, &atchan->free_descs_list); + list_splice_tail_init(&desc->descs_list, + &atchan->free_descs_list); at_xdmac_advance_work(atchan); spin_unlock_irq(&atchan->lock); } @@ -1858,7 +1862,8 @@ static int at_xdmac_device_terminate_all(struct dma_chan *chan) /* Cancel all pending transfers. */ list_for_each_entry_safe(desc, _desc, &atchan->xfers_list, xfer_node) { list_del(&desc->xfer_node); - list_splice_init(&desc->descs_list, &atchan->free_descs_list); + list_splice_tail_init(&desc->descs_list, + &atchan->free_descs_list); }
clear_bit(AT_XDMAC_CHAN_IS_PAUSED, &atchan->status);
From: Tudor Ambarus tudor.ambarus@microchip.com
[ Upstream commit a61210cae80cac0701d5aca9551466a389717fd2 ]
Apart from making the code easier to read, this patch is a prerequisite for a functional change: tasklets run with interrupts enabled, so we need to protect atchan->irq_status with spin_lock_irq(); otherwise the tasklet can be interrupted by the IRQ that modifies irq_status. atchan->irq_status will be protected in a further patch.
Signed-off-by: Tudor Ambarus tudor.ambarus@microchip.com Link: https://lore.kernel.org/r/20211215110115.191749-12-tudor.ambarus@microchip.c... Signed-off-by: Vinod Koul vkoul@kernel.org Stable-dep-of: 44fe8440bda5 ("dmaengine: at_xdmac: do not resume channels paused by consumers") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/at_xdmac.c | 66 ++++++++++++++++++++---------------------- 1 file changed, 32 insertions(+), 34 deletions(-)
diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c index b45437aab1434..f9aa5396c0f8e 100644 --- a/drivers/dma/at_xdmac.c +++ b/drivers/dma/at_xdmac.c @@ -1670,53 +1670,51 @@ static void at_xdmac_tasklet(struct tasklet_struct *t) { struct at_xdmac_chan *atchan = from_tasklet(atchan, t, tasklet); struct at_xdmac_desc *desc; + struct dma_async_tx_descriptor *txd; u32 error_mask;
dev_dbg(chan2dev(&atchan->chan), "%s: status=0x%08x\n", __func__, atchan->irq_status);
- error_mask = AT_XDMAC_CIS_RBEIS - | AT_XDMAC_CIS_WBEIS - | AT_XDMAC_CIS_ROIS; + if (at_xdmac_chan_is_cyclic(atchan)) + return at_xdmac_handle_cyclic(atchan);
- if (at_xdmac_chan_is_cyclic(atchan)) { - at_xdmac_handle_cyclic(atchan); - } else if ((atchan->irq_status & AT_XDMAC_CIS_LIS) - || (atchan->irq_status & error_mask)) { - struct dma_async_tx_descriptor *txd; + error_mask = AT_XDMAC_CIS_RBEIS | AT_XDMAC_CIS_WBEIS | + AT_XDMAC_CIS_ROIS;
- if (atchan->irq_status & error_mask) - at_xdmac_handle_error(atchan); + if (!(atchan->irq_status & AT_XDMAC_CIS_LIS) && + !(atchan->irq_status & error_mask)) + return;
- spin_lock_irq(&atchan->lock); - desc = list_first_entry(&atchan->xfers_list, - struct at_xdmac_desc, - xfer_node); - dev_vdbg(chan2dev(&atchan->chan), "%s: desc 0x%p\n", __func__, desc); - if (!desc->active_xfer) { - dev_err(chan2dev(&atchan->chan), "Xfer not active: exiting"); - spin_unlock_irq(&atchan->lock); - return; - } + if (atchan->irq_status & error_mask) + at_xdmac_handle_error(atchan);
- txd = &desc->tx_dma_desc; - dma_cookie_complete(txd); - /* Remove the transfer from the transfer list. */ - list_del(&desc->xfer_node); + spin_lock_irq(&atchan->lock); + desc = list_first_entry(&atchan->xfers_list, struct at_xdmac_desc, + xfer_node); + dev_vdbg(chan2dev(&atchan->chan), "%s: desc 0x%p\n", __func__, desc); + if (!desc->active_xfer) { + dev_err(chan2dev(&atchan->chan), "Xfer not active: exiting"); spin_unlock_irq(&atchan->lock); + return; + }
- if (txd->flags & DMA_PREP_INTERRUPT) - dmaengine_desc_get_callback_invoke(txd, NULL); + txd = &desc->tx_dma_desc; + dma_cookie_complete(txd); + /* Remove the transfer from the transfer list. */ + list_del(&desc->xfer_node); + spin_unlock_irq(&atchan->lock);
- dma_run_dependencies(txd); + if (txd->flags & DMA_PREP_INTERRUPT) + dmaengine_desc_get_callback_invoke(txd, NULL);
- spin_lock_irq(&atchan->lock); - /* Move the xfer descriptors into the free descriptors list. */ - list_splice_tail_init(&desc->descs_list, - &atchan->free_descs_list); - at_xdmac_advance_work(atchan); - spin_unlock_irq(&atchan->lock); - } + dma_run_dependencies(txd); + + spin_lock_irq(&atchan->lock); + /* Move the xfer descriptors into the free descriptors list. */ + list_splice_tail_init(&desc->descs_list, &atchan->free_descs_list); + at_xdmac_advance_work(atchan); + spin_unlock_irq(&atchan->lock); }
static irqreturn_t at_xdmac_interrupt(int irq, void *dev_id)
From: Claudiu Beznea claudiu.beznea@microchip.com
[ Upstream commit 2de5ddb5e68c94b781b3789bca1ce52000d7d0e0 ]
The runtime PM APIs for at_xdmac just play with clk_enable()/clk_disable(), leaving aside the clk_prepare()/clk_unprepare() that needs to be executed as well, since the clock is also prepared on probe. Thus, instead of using the runtime PM force suspend/resume APIs, use clk_disable_unprepare() + pm_runtime_put_noidle() on suspend and clk_prepare_enable() + pm_runtime_get_noresume() on resume. This approach has been chosen instead of runtime PM force suspend/resume with clk_unprepare()/clk_prepare() as it looks simpler and the final code is better.
While at it, add the missing pm_runtime_mark_last_busy() call on suspend before decrementing the reference counter.
Fixes: 650b0e990cbd ("dmaengine: at_xdmac: add runtime pm support") Signed-off-by: Claudiu Beznea claudiu.beznea@microchip.com Link: https://lore.kernel.org/r/20230214151827.1050280-2-claudiu.beznea@microchip.... Signed-off-by: Vinod Koul vkoul@kernel.org Stable-dep-of: 44fe8440bda5 ("dmaengine: at_xdmac: do not resume channels paused by consumers") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/at_xdmac.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c index f9aa5396c0f8e..af52429af9172 100644 --- a/drivers/dma/at_xdmac.c +++ b/drivers/dma/at_xdmac.c @@ -1992,6 +1992,7 @@ static int atmel_xdmac_suspend(struct device *dev)
at_xdmac_off(atxdmac); clk_disable_unprepare(atxdmac->clk); + return 0; }
@@ -2008,6 +2009,8 @@ static int atmel_xdmac_resume(struct device *dev) if (ret) return ret;
+ pm_runtime_get_noresume(atxdmac->dev); + at_xdmac_axi_config(pdev);
/* Clear pending interrupts. */
From: Claudiu Beznea claudiu.beznea@microchip.com
[ Upstream commit 44fe8440bda545b5d167329df88c47609a645168 ]
In case there are DMA channels not paused by their consumers in the suspend process (valid on AT91 SoCs for the serial driver when no_console_suspend is set), the driver pauses them using at_xdmac_device_pause() (the same function called by dmaengine_pause()) and then, in the resume process, resumes them by calling at_xdmac_device_resume() (the same function called by dmaengine_resume()). This is fine for DMA channels not paused by consumers, but for drivers that call dmaengine_pause()/dmaengine_resume() on the suspend/resume path it may lead to the DMA channel being enabled before the IP itself is enabled. For IPs that need strict ordering with regard to DMA channel enablement, this will lead to wrong behavior. To fix this, add a new set of functions, at_xdmac_device_pause_internal()/at_xdmac_device_resume_internal(), to be called only on suspend/resume.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Claudiu Beznea claudiu.beznea@microchip.com Link: https://lore.kernel.org/r/20230214151827.1050280-4-claudiu.beznea@microchip.... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/at_xdmac.c | 48 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 6 deletions(-)
diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c index af52429af9172..4965961f55aa2 100644 --- a/drivers/dma/at_xdmac.c +++ b/drivers/dma/at_xdmac.c @@ -186,6 +186,7 @@ enum atc_status { AT_XDMAC_CHAN_IS_CYCLIC = 0, AT_XDMAC_CHAN_IS_PAUSED, + AT_XDMAC_CHAN_IS_PAUSED_INTERNAL, };
struct at_xdmac_layout { @@ -346,6 +347,11 @@ static inline int at_xdmac_chan_is_paused(struct at_xdmac_chan *atchan) return test_bit(AT_XDMAC_CHAN_IS_PAUSED, &atchan->status); }
+static inline int at_xdmac_chan_is_paused_internal(struct at_xdmac_chan *atchan) +{ + return test_bit(AT_XDMAC_CHAN_IS_PAUSED_INTERNAL, &atchan->status); +} + static inline bool at_xdmac_chan_is_peripheral_xfer(u32 cfg) { return cfg & AT_XDMAC_CC_TYPE_PER_TRAN; @@ -1801,6 +1807,26 @@ static int at_xdmac_device_config(struct dma_chan *chan, return ret; }
+static void at_xdmac_device_pause_set(struct at_xdmac *atxdmac, + struct at_xdmac_chan *atchan) +{ + at_xdmac_write(atxdmac, atxdmac->layout->grws, atchan->mask); + while (at_xdmac_chan_read(atchan, AT_XDMAC_CC) & + (AT_XDMAC_CC_WRIP | AT_XDMAC_CC_RDIP)) + cpu_relax(); +} + +static void at_xdmac_device_pause_internal(struct at_xdmac_chan *atchan) +{ + struct at_xdmac *atxdmac = to_at_xdmac(atchan->chan.device); + unsigned long flags; + + spin_lock_irqsave(&atchan->lock, flags); + set_bit(AT_XDMAC_CHAN_IS_PAUSED_INTERNAL, &atchan->status); + at_xdmac_device_pause_set(atxdmac, atchan); + spin_unlock_irqrestore(&atchan->lock, flags); +} + static int at_xdmac_device_pause(struct dma_chan *chan) { struct at_xdmac_chan *atchan = to_at_xdmac_chan(chan); @@ -1813,15 +1839,25 @@ static int at_xdmac_device_pause(struct dma_chan *chan) return 0;
spin_lock_irqsave(&atchan->lock, flags); - at_xdmac_write(atxdmac, atxdmac->layout->grws, atchan->mask); - while (at_xdmac_chan_read(atchan, AT_XDMAC_CC) - & (AT_XDMAC_CC_WRIP | AT_XDMAC_CC_RDIP)) - cpu_relax(); + + at_xdmac_device_pause_set(atxdmac, atchan); + /* Decrement runtime PM ref counter for each active descriptor. */ spin_unlock_irqrestore(&atchan->lock, flags);
return 0; }
+static void at_xdmac_device_resume_internal(struct at_xdmac_chan *atchan) +{ + struct at_xdmac *atxdmac = to_at_xdmac(atchan->chan.device); + unsigned long flags; + + spin_lock_irqsave(&atchan->lock, flags); + at_xdmac_write(atxdmac, atxdmac->layout->grwr, atchan->mask); + clear_bit(AT_XDMAC_CHAN_IS_PAUSED_INTERNAL, &atchan->status); + spin_unlock_irqrestore(&atchan->lock, flags); +} + static int at_xdmac_device_resume(struct dma_chan *chan) { struct at_xdmac_chan *atchan = to_at_xdmac_chan(chan); @@ -1981,7 +2017,7 @@ static int atmel_xdmac_suspend(struct device *dev) atchan->save_cc = at_xdmac_chan_read(atchan, AT_XDMAC_CC); if (at_xdmac_chan_is_cyclic(atchan)) { if (!at_xdmac_chan_is_paused(atchan)) - at_xdmac_device_pause(chan); + at_xdmac_device_pause_internal(atchan); atchan->save_cim = at_xdmac_chan_read(atchan, AT_XDMAC_CIM); atchan->save_cnda = at_xdmac_chan_read(atchan, AT_XDMAC_CNDA); atchan->save_cndc = at_xdmac_chan_read(atchan, AT_XDMAC_CNDC); @@ -2026,7 +2062,7 @@ static int atmel_xdmac_resume(struct device *dev) at_xdmac_chan_write(atchan, AT_XDMAC_CC, atchan->save_cc); if (at_xdmac_chan_is_cyclic(atchan)) { if (at_xdmac_chan_is_paused(atchan)) - at_xdmac_device_resume(chan); + at_xdmac_device_resume_internal(atchan); at_xdmac_chan_write(atchan, AT_XDMAC_CNDA, atchan->save_cnda); at_xdmac_chan_write(atchan, AT_XDMAC_CNDC, atchan->save_cndc); at_xdmac_chan_write(atchan, AT_XDMAC_CIE, atchan->save_cim);
From: Claudiu Beznea claudiu.beznea@microchip.com
[ Upstream commit 7c5eb63d16b01c202aaa95f374ae15a807745a73 ]
In case the system suspends to a deep sleep state where power to the DMA controller is cut off, we need to restore the content of the GRWS register. This is a write-only register, and writing bit X tells the controller to suspend read and write requests for channel X. Thus, set GRWS before restoring the content of the GE (Global Enable) register.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Claudiu Beznea claudiu.beznea@microchip.com Link: https://lore.kernel.org/r/20230214151827.1050280-5-claudiu.beznea@microchip.... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/at_xdmac.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c index 4965961f55aa2..66bf570a8bd98 100644 --- a/drivers/dma/at_xdmac.c +++ b/drivers/dma/at_xdmac.c @@ -2063,6 +2063,15 @@ static int atmel_xdmac_resume(struct device *dev) if (at_xdmac_chan_is_cyclic(atchan)) { if (at_xdmac_chan_is_paused(atchan)) at_xdmac_device_resume_internal(atchan); + + /* + * We may resume from a deep sleep state where power + * to DMA controller is cut-off. Thus, restore the + * suspend state of channels set though dmaengine API. + */ + else if (at_xdmac_chan_is_paused(atchan)) + at_xdmac_device_pause_set(atxdmac, atchan); + at_xdmac_chan_write(atchan, AT_XDMAC_CNDA, atchan->save_cnda); at_xdmac_chan_write(atchan, AT_XDMAC_CNDC, atchan->save_cndc); at_xdmac_chan_write(atchan, AT_XDMAC_CIE, atchan->save_cim);
From: Claudio Imbrenda imbrenda@linux.ibm.com
[ Upstream commit 72b1daff2671cef2c8cccc6c4e52f8d5ce4ebe58 ]
Due to upcoming changes, it will be possible to temporarily have multiple protected VMs in the same address space, although only one will be actually active.
In that scenario, it is necessary to perform an export of every page that is to be imported, since the hardware does not allow a page belonging to a protected guest to be imported into a different protected guest.
This also applies to pages that are shared, and thus accessible by the host.
Signed-off-by: Claudio Imbrenda imbrenda@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Link: https://lore.kernel.org/r/20220628135619.32410-7-imbrenda@linux.ibm.com Message-Id: 20220628135619.32410-7-imbrenda@linux.ibm.com Signed-off-by: Janosch Frank frankja@linux.ibm.com Stable-dep-of: c148dc8e2fa4 ("KVM: s390: fix race in gmap_make_secure()") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/uv.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c index f95ccbd396925..7d7961c7b1281 100644 --- a/arch/s390/kernel/uv.c +++ b/arch/s390/kernel/uv.c @@ -189,6 +189,32 @@ static int make_secure_pte(pte_t *ptep, unsigned long addr, return rc; }
+/** + * should_export_before_import - Determine whether an export is needed + * before an import-like operation + * @uvcb: the Ultravisor control block of the UVC to be performed + * @mm: the mm of the process + * + * Returns whether an export is needed before every import-like operation. + * This is needed for shared pages, which don't trigger a secure storage + * exception when accessed from a different guest. + * + * Although considered as one, the Unpin Page UVC is not an actual import, + * so it is not affected. + * + * No export is needed also when there is only one protected VM, because the + * page cannot belong to the wrong VM in that case (there is no "other VM" + * it can belong to). + * + * Return: true if an export is needed before every import, otherwise false. + */ +static bool should_export_before_import(struct uv_cb_header *uvcb, struct mm_struct *mm) +{ + if (uvcb->cmd == UVC_CMD_UNPIN_PAGE_SHARED) + return false; + return atomic_read(&mm->context.protected_count) > 1; +} + /* * Requests the Ultravisor to make a page accessible to a guest. * If it's brought in the first time, it will be cleared. If @@ -232,6 +258,8 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
lock_page(page); ptep = get_locked_pte(gmap->mm, uaddr, &ptelock); + if (should_export_before_import(uvcb, gmap->mm)) + uv_convert_from_secure(page_to_phys(page)); rc = make_secure_pte(ptep, uaddr, page, uvcb); pte_unmap_unlock(ptep, ptelock); unlock_page(page);
From: Claudio Imbrenda imbrenda@linux.ibm.com
[ Upstream commit c148dc8e2fa403be501612ee409db866eeed35c0 ]
Fix a potential race in gmap_make_secure() and remove the last user of follow_page() without FOLL_GET.
The old code is locking something it doesn't have a reference to, and as explained by Jason and David in this discussion: https://lore.kernel.org/linux-mm/Y9J4P%2FRNvY1Ztn0Q@nvidia.com/ it can lead to all kinds of bad things, including the page getting unmapped (MADV_DONTNEED), freed, and reallocated as a larger folio, in which case the unlock_page() would target the wrong bit. There is also another race involving FOLL_WRITE, between the follow_page() and get_locked_pte() calls.
The main point is to remove the last use of follow_page() without FOLL_GET or FOLL_PIN; removing the races can be considered a nice bonus.
Link: https://lore.kernel.org/linux-mm/Y9J4P%2FRNvY1Ztn0Q@nvidia.com/ Suggested-by: Jason Gunthorpe jgg@nvidia.com Fixes: 214d9bbcd3a6 ("s390/mm: provide memory management functions for protected KVM guests") Reviewed-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Claudio Imbrenda imbrenda@linux.ibm.com Message-Id: 20230428092753.27913-2-imbrenda@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/uv.c | 32 +++++++++++--------------------- 1 file changed, 11 insertions(+), 21 deletions(-)
diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c index 7d7961c7b1281..66d1248c8c923 100644 --- a/arch/s390/kernel/uv.c +++ b/arch/s390/kernel/uv.c @@ -160,21 +160,10 @@ static int expected_page_refs(struct page *page) return res; }
-static int make_secure_pte(pte_t *ptep, unsigned long addr, - struct page *exp_page, struct uv_cb_header *uvcb) +static int make_page_secure(struct page *page, struct uv_cb_header *uvcb) { - pte_t entry = READ_ONCE(*ptep); - struct page *page; int expected, rc = 0;
- if (!pte_present(entry)) - return -ENXIO; - if (pte_val(entry) & _PAGE_INVALID) - return -ENXIO; - - page = pte_page(entry); - if (page != exp_page) - return -ENXIO; if (PageWriteback(page)) return -EAGAIN; expected = expected_page_refs(page); @@ -252,17 +241,18 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb) goto out;
rc = -ENXIO; - page = follow_page(vma, uaddr, FOLL_WRITE); - if (IS_ERR_OR_NULL(page)) - goto out; - - lock_page(page); ptep = get_locked_pte(gmap->mm, uaddr, &ptelock); - if (should_export_before_import(uvcb, gmap->mm)) - uv_convert_from_secure(page_to_phys(page)); - rc = make_secure_pte(ptep, uaddr, page, uvcb); + if (pte_present(*ptep) && !(pte_val(*ptep) & _PAGE_INVALID) && pte_write(*ptep)) { + page = pte_page(*ptep); + rc = -EAGAIN; + if (trylock_page(page)) { + if (should_export_before_import(uvcb, gmap->mm)) + uv_convert_from_secure(page_to_phys(page)); + rc = make_page_secure(page, uvcb); + unlock_page(page); + } + } pte_unmap_unlock(ptep, ptelock); - unlock_page(page); out: mmap_read_unlock(gmap->mm);
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 82b318983c515f29b8b3a0dad9f6a5fe8a68a7f4 ]
Since the DSA conversion from the ds->ports array into the dst->ports list, the DSA API has encouraged driver writers, as well as the core itself, to write inefficient code.
Currently, code that wants to filter by a specific type of port when iterating, like {!unused, user, cpu, dsa}, uses the dsa_is_*_port helper. Under the hood, this uses dsa_to_port which iterates again through dst->ports. But the driver iterates through the port list already, so the complexity is quadratic for the typical case of a single-switch tree.
This patch introduces some iteration helpers where the iterator is already a struct dsa_port *dp, so that the other variant of the filtering functions, dsa_port_is_{unused,user,cpu,dsa}, can be used directly on the iterator. This eliminates the second lookup.
These functions can be used both by the core and by drivers.
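For illustration, a hypothetical consumer of the new helpers; setup_cpu_port() is a made-up function standing in for whatever per-port work a driver does:

    struct dsa_port *dp;

    /* Visits only this switch's CPU ports in a single walk of
     * dst->ports, instead of calling dsa_is_cpu_port(ds, i) for each
     * index and re-walking the list inside dsa_to_port().
     */
    dsa_switch_for_each_cpu_port(dp, ds)
            setup_cpu_port(dp->index);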
Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 120a56b01bee ("net: dsa: mt7530: fix network connectivity with multiple CPU ports") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/dsa.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/include/net/dsa.h b/include/net/dsa.h index d784e76113b8d..bec439c4a0859 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -472,6 +472,34 @@ static inline bool dsa_is_user_port(struct dsa_switch *ds, int p) return dsa_to_port(ds, p)->type == DSA_PORT_TYPE_USER; }
+#define dsa_tree_for_each_user_port(_dp, _dst) \ + list_for_each_entry((_dp), &(_dst)->ports, list) \ + if (dsa_port_is_user((_dp))) + +#define dsa_switch_for_each_port(_dp, _ds) \ + list_for_each_entry((_dp), &(_ds)->dst->ports, list) \ + if ((_dp)->ds == (_ds)) + +#define dsa_switch_for_each_port_safe(_dp, _next, _ds) \ + list_for_each_entry_safe((_dp), (_next), &(_ds)->dst->ports, list) \ + if ((_dp)->ds == (_ds)) + +#define dsa_switch_for_each_port_continue_reverse(_dp, _ds) \ + list_for_each_entry_continue_reverse((_dp), &(_ds)->dst->ports, list) \ + if ((_dp)->ds == (_ds)) + +#define dsa_switch_for_each_available_port(_dp, _ds) \ + dsa_switch_for_each_port((_dp), (_ds)) \ + if (!dsa_port_is_unused((_dp))) + +#define dsa_switch_for_each_user_port(_dp, _ds) \ + dsa_switch_for_each_port((_dp), (_ds)) \ + if (dsa_port_is_user((_dp))) + +#define dsa_switch_for_each_cpu_port(_dp, _ds) \ + dsa_switch_for_each_port((_dp), (_ds)) \ + if (dsa_port_is_cpu((_dp))) + static inline u32 dsa_user_ports(struct dsa_switch *ds) { u32 mask = 0;
From: Frank Wunderlich frank-w@public-files.de
[ Upstream commit 6e19bc26cccdd34739b8c42aba2758777d18b211 ]
Enumerate the available CPU ports instead of using a hardcoded constant.
Suggested-by: Vladimir Oltean olteanv@gmail.com Signed-off-by: Frank Wunderlich frank-w@public-files.de Reviewed-by: Vladimir Oltean olteanv@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 120a56b01bee ("net: dsa: mt7530: fix network connectivity with multiple CPU ports") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index e7a551570cf3c..0e64873fbc37b 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -2094,11 +2094,12 @@ static int mt7530_setup(struct dsa_switch *ds) { struct mt7530_priv *priv = ds->priv; + struct device_node *dn = NULL; struct device_node *phy_node; struct device_node *mac_np; struct mt7530_dummy_poll p; phy_interface_t interface; - struct device_node *dn; + struct dsa_port *cpu_dp; u32 id, val; int ret, i;
@@ -2106,7 +2107,19 @@ mt7530_setup(struct dsa_switch *ds) * controller also is the container for two GMACs nodes representing * as two netdev instances. */ - dn = dsa_to_port(ds, MT7530_CPU_PORT)->master->dev.of_node->parent; + dsa_switch_for_each_cpu_port(cpu_dp, ds) { + dn = cpu_dp->master->dev.of_node->parent; + /* It doesn't matter which CPU port is found first, + * their masters should share the same parent OF node + */ + break; + } + + if (!dn) { + dev_err(ds->dev, "parent OF node of DSA master not found"); + return -EINVAL; + } + ds->assisted_learning_on_cpu_port = true; ds->mtu_enforcement_ingress = true;
@@ -2279,6 +2292,7 @@ mt7531_setup(struct dsa_switch *ds) { struct mt7530_priv *priv = ds->priv; struct mt7530_dummy_poll p; + struct dsa_port *cpu_dp; u32 val, id; int ret, i;
@@ -2353,8 +2367,11 @@ mt7531_setup(struct dsa_switch *ds) CORE_PLL_GROUP4, val);
/* BPDU to CPU port */ - mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK, - BIT(MT7530_CPU_PORT)); + dsa_switch_for_each_cpu_port(cpu_dp, ds) { + mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK, + BIT(cpu_dp->index)); + break; + } mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, MT753X_BPDU_CPU_ONLY);
From: Daniel Golle daniel@makrotopia.org
[ Upstream commit 7f54cc9772ced2d76ac11832f0ada43798443ac9 ]
MT7988 shares a significant part of the setup function with MT7531. Split off those parts into a shared function, which is also going to be used by mt7988_setup.
Signed-off-by: Daniel Golle daniel@makrotopia.org Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 120a56b01bee ("net: dsa: mt7530: fix network connectivity with multiple CPU ports") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 101 ++++++++++++++++++++++----------------- 1 file changed, 56 insertions(+), 45 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index 0e64873fbc37b..ae156b14b57d5 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -2287,14 +2287,67 @@ mt7530_setup(struct dsa_switch *ds) return 0; }
+static int +mt7531_setup_common(struct dsa_switch *ds) +{ + struct mt7530_priv *priv = ds->priv; + struct dsa_port *cpu_dp; + int ret, i; + + /* BPDU to CPU port */ + dsa_switch_for_each_cpu_port(cpu_dp, ds) { + mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK, + BIT(cpu_dp->index)); + break; + } + mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, + MT753X_BPDU_CPU_ONLY); + + /* Enable and reset MIB counters */ + mt7530_mib_reset(ds); + + for (i = 0; i < MT7530_NUM_PORTS; i++) { + /* Disable forwarding by default on all ports */ + mt7530_rmw(priv, MT7530_PCR_P(i), PCR_MATRIX_MASK, + PCR_MATRIX_CLR); + + /* Disable learning by default on all ports */ + mt7530_set(priv, MT7530_PSC_P(i), SA_DIS); + + mt7530_set(priv, MT7531_DBG_CNT(i), MT7531_DIS_CLR); + + if (dsa_is_cpu_port(ds, i)) { + ret = mt753x_cpu_port_enable(ds, i); + if (ret) + return ret; + } else { + mt7530_port_disable(ds, i); + + /* Set default PVID to 0 on all user ports */ + mt7530_rmw(priv, MT7530_PPBV1_P(i), G0_PORT_VID_MASK, + G0_PORT_VID_DEF); + } + + /* Enable consistent egress tag */ + mt7530_rmw(priv, MT7530_PVC_P(i), PVC_EG_TAG_MASK, + PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT)); + } + + /* Flush the FDB table */ + ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, NULL); + if (ret < 0) + return ret; + + return 0; +} + static int mt7531_setup(struct dsa_switch *ds) { struct mt7530_priv *priv = ds->priv; struct mt7530_dummy_poll p; - struct dsa_port *cpu_dp; u32 val, id; - int ret, i; + int ret;
/* Reset whole chip through gpio pin or memory-mapped registers for * different type of hardware @@ -2366,44 +2419,7 @@ mt7531_setup(struct dsa_switch *ds) mt7531_ind_c45_phy_write(priv, MT753X_CTRL_PHY_ADDR, MDIO_MMD_VEND2, CORE_PLL_GROUP4, val);
- /* BPDU to CPU port */ - dsa_switch_for_each_cpu_port(cpu_dp, ds) { - mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK, - BIT(cpu_dp->index)); - break; - } - mt7530_rmw(priv, MT753X_BPC, MT753X_BPDU_PORT_FW_MASK, - MT753X_BPDU_CPU_ONLY); - - /* Enable and reset MIB counters */ - mt7530_mib_reset(ds); - - for (i = 0; i < MT7530_NUM_PORTS; i++) { - /* Disable forwarding by default on all ports */ - mt7530_rmw(priv, MT7530_PCR_P(i), PCR_MATRIX_MASK, - PCR_MATRIX_CLR); - - /* Disable learning by default on all ports */ - mt7530_set(priv, MT7530_PSC_P(i), SA_DIS); - - mt7530_set(priv, MT7531_DBG_CNT(i), MT7531_DIS_CLR); - - if (dsa_is_cpu_port(ds, i)) { - ret = mt753x_cpu_port_enable(ds, i); - if (ret) - return ret; - } else { - mt7530_port_disable(ds, i); - - /* Set default PVID to 0 on all user ports */ - mt7530_rmw(priv, MT7530_PPBV1_P(i), G0_PORT_VID_MASK, - G0_PORT_VID_DEF); - } - - /* Enable consistent egress tag */ - mt7530_rmw(priv, MT7530_PVC_P(i), PVC_EG_TAG_MASK, - PVC_EG_TAG(MT7530_VLAN_EG_CONSISTENT)); - } + mt7531_setup_common(ds);
/* Setup VLAN ID 0 for VLAN-unaware bridges */ ret = mt7530_setup_vlan0(priv); @@ -2413,11 +2429,6 @@ mt7531_setup(struct dsa_switch *ds) ds->assisted_learning_on_cpu_port = true; ds->mtu_enforcement_ingress = true;
- /* Flush the FDB table */ - ret = mt7530_fdb_cmd(priv, MT7530_FDB_FLUSH, NULL); - if (ret < 0) - return ret; - return 0; }
From: Arınç ÜNAL arinc.unal@arinc9.com
[ Upstream commit 120a56b01beed51ab5956a734adcfd2760307107 ]
In mt753x_cpu_port_enable() there's code that enables flooding for the CPU port only. Since mt753x_cpu_port_enable() runs twice when both CPU ports are enabled, port 6 becomes the only port to forward the frames to. But port 5 is the active port, so no frames received from the user ports will be forwarded to port 5, which breaks network connectivity.
Every bit of the BC_FFP, UNM_FFP, and UNU_FFP bits represents a port. Fix this issue by setting the bit that corresponds to the CPU port without overwriting the other bits.
Clear the bits beforehand only for the MT7531 switch. According to the documents MT7621 Giga Switch Programming Guide v0.3 and MT7531 Reference Manual for Development Board v1.0, after reset, the BC_FFP, UNM_FFP, and UNU_FFP bits are set to 1 for MT7531, 0 for MT7530.
Commit 5e5502e012b8 ("net: dsa: mt7530: fix roaming from DSA user ports") silently changed the method of setting the bits on the MT7530_MFC register. Instead of clearing the relevant bits before mt7530_cpu_port_enable(), which runs under a for loop, the commit started doing it in mt7530_cpu_port_enable().
Back then, this didn't really matter as only a single CPU port could be used, since the CPU port number was hardcoded. The driver was later changed by commit 1f9a6abecf53 ("net: dsa: mt7530: get cpu-port via dp->cpu_dp instead of constant") to retrieve the CPU port via dp->cpu_dp. With that, this silent change became an issue when using multiple CPU ports.
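To see why the read-modify-write clobbers the other CPU port's bit, here is a small stand-alone demonstration; BC_FFP()/BC_FFP_MASK/BIT() are local stand-ins modeled on the driver's macros, not the real register layout:

    #include <stdint.h>
    #include <stdio.h>

    #define BIT(n)          (1u << (n))
    #define BC_FFP(x)       ((uint32_t)(x) << 24) /* broadcast flood field */
    #define BC_FFP_MASK     BC_FFP(0xff)

    int main(void)
    {
            uint32_t mfc = 0;

            /* rmw-style, as mt753x_cpu_port_enable() did: the second CPU
             * port's write erases the first one's bit.
             */
            mfc = (mfc & ~BC_FFP_MASK) | BC_FFP(BIT(5));
            mfc = (mfc & ~BC_FFP_MASK) | BC_FFP(BIT(6));
            printf("rmw: 0x%08x\n", (unsigned)mfc); /* only port 6 floods */

            /* set-style, as the fix does: the bits accumulate. */
            mfc = 0;
            mfc |= BC_FFP(BIT(5));
            mfc |= BC_FFP(BIT(6));
            printf("set: 0x%08x\n", (unsigned)mfc); /* ports 5 and 6 flood */
            return 0;
    }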
Fixes: 5e5502e012b8 ("net: dsa: mt7530: fix roaming from DSA user ports") Signed-off-by: Arınç ÜNAL arinc.unal@arinc9.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mt7530.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c index ae156b14b57d5..4ec598efc3332 100644 --- a/drivers/net/dsa/mt7530.c +++ b/drivers/net/dsa/mt7530.c @@ -1010,9 +1010,9 @@ mt753x_cpu_port_enable(struct dsa_switch *ds, int port) mt7530_write(priv, MT7530_PVC_P(port), PORT_SPEC_TAG);
- /* Disable flooding by default */ - mt7530_rmw(priv, MT7530_MFC, BC_FFP_MASK | UNM_FFP_MASK | UNU_FFP_MASK, - BC_FFP(BIT(port)) | UNM_FFP(BIT(port)) | UNU_FFP(BIT(port))); + /* Enable flooding on the CPU port */ + mt7530_set(priv, MT7530_MFC, BC_FFP(BIT(port)) | UNM_FFP(BIT(port)) | + UNU_FFP(BIT(port)));
/* Set CPU port number */ if (priv->id == ID_MT7621) @@ -2306,6 +2306,10 @@ mt7531_setup_common(struct dsa_switch *ds) /* Enable and reset MIB counters */ mt7530_mib_reset(ds);
+ /* Disable flooding on all ports */ + mt7530_clear(priv, MT7530_MFC, BC_FFP_MASK | UNM_FFP_MASK | + UNU_FFP_MASK); + for (i = 0; i < MT7530_NUM_PORTS; i++) { /* Disable forwarding by default on all ports */ mt7530_rmw(priv, MT7530_PCR_P(i), PCR_MATRIX_MASK,
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 5944b5abd8646e8c6ac6af2b55f87dede1dae898 ]
Currently, we use a hard-coded number to verify whether we are in the arp_interval timeslice. But some users may want to reduce or extend the verification timeslice. With an option similar to the team option 'missed_max', users can change that number based on their own environment.
Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 9949e2efb54e ("bonding: fix send_peer_notif overflow") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/bonding.rst | 11 +++++++++++ drivers/net/bonding/bond_main.c | 17 +++++++++-------- drivers/net/bonding/bond_netlink.c | 15 +++++++++++++++ drivers/net/bonding/bond_options.c | 28 ++++++++++++++++++++++++++++ drivers/net/bonding/bond_procfs.c | 2 ++ drivers/net/bonding/bond_sysfs.c | 13 +++++++++++++ include/net/bond_options.h | 1 + include/net/bonding.h | 1 + include/uapi/linux/if_link.h | 1 + tools/include/uapi/linux/if_link.h | 1 + 10 files changed, 82 insertions(+), 8 deletions(-)
diff --git a/Documentation/networking/bonding.rst b/Documentation/networking/bonding.rst index c0a789b008063..ab98373535ea6 100644 --- a/Documentation/networking/bonding.rst +++ b/Documentation/networking/bonding.rst @@ -422,6 +422,17 @@ arp_all_targets consider the slave up only when all of the arp_ip_targets are reachable
+arp_missed_max + + Specifies the number of arp_interval monitor checks that must + fail in order for an interface to be marked down by the ARP monitor. + + In order to provide orderly failover semantics, backup interfaces + are permitted an extra monitor check (i.e., they must fail + arp_missed_max + 1 times before being marked down). + + The default value is 2, and the allowable range is 1 - 255. + downdelay
Specifies the time, in milliseconds, to wait before disabling diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index a2ce9f0fb43c5..b4d613bdbc060 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -3145,8 +3145,8 @@ static void bond_loadbalance_arp_mon(struct bonding *bond) * when the source ip is 0, so don't take the link down * if we don't know our ip yet */ - if (!bond_time_in_interval(bond, trans_start, 2) || - !bond_time_in_interval(bond, slave->last_rx, 2)) { + if (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) || + !bond_time_in_interval(bond, slave->last_rx, bond->params.missed_max)) {
bond_propose_link_state(slave, BOND_LINK_DOWN); slave_state_changed = 1; @@ -3240,7 +3240,7 @@ static int bond_ab_arp_inspect(struct bonding *bond)
/* Backup slave is down if: * - No current_arp_slave AND - * - more than 3*delta since last receive AND + * - more than (missed_max+1)*delta since last receive AND * - the bond has an IP address * * Note: a non-null current_arp_slave indicates @@ -3252,20 +3252,20 @@ static int bond_ab_arp_inspect(struct bonding *bond) */ if (!bond_is_active_slave(slave) && !rcu_access_pointer(bond->current_arp_slave) && - !bond_time_in_interval(bond, last_rx, 3)) { + !bond_time_in_interval(bond, last_rx, bond->params.missed_max + 1)) { bond_propose_link_state(slave, BOND_LINK_DOWN); commit++; }
/* Active slave is down if: - * - more than 2*delta since transmitting OR - * - (more than 2*delta since receive AND + * - more than missed_max*delta since transmitting OR + * - (more than missed_max*delta since receive AND * the bond has an IP address) */ trans_start = dev_trans_start(slave->dev); if (bond_is_active_slave(slave) && - (!bond_time_in_interval(bond, trans_start, 2) || - !bond_time_in_interval(bond, last_rx, 2))) { + (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) || + !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) { bond_propose_link_state(slave, BOND_LINK_DOWN); commit++; } @@ -5886,6 +5886,7 @@ static int bond_check_params(struct bond_params *params) params->arp_interval = arp_interval; params->arp_validate = arp_validate_value; params->arp_all_targets = arp_all_targets_value; + params->missed_max = 2; params->updelay = updelay; params->downdelay = downdelay; params->peer_notif_delay = 0; diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c index 5d54e11d18fa5..1007bf6d385d4 100644 --- a/drivers/net/bonding/bond_netlink.c +++ b/drivers/net/bonding/bond_netlink.c @@ -110,6 +110,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = { .len = ETH_ALEN }, [IFLA_BOND_TLB_DYNAMIC_LB] = { .type = NLA_U8 }, [IFLA_BOND_PEER_NOTIF_DELAY] = { .type = NLA_U32 }, + [IFLA_BOND_MISSED_MAX] = { .type = NLA_U8 }, };
static const struct nla_policy bond_slave_policy[IFLA_BOND_SLAVE_MAX + 1] = { @@ -453,6 +454,15 @@ static int bond_changelink(struct net_device *bond_dev, struct nlattr *tb[], return err; }
+ if (data[IFLA_BOND_MISSED_MAX]) { + int missed_max = nla_get_u8(data[IFLA_BOND_MISSED_MAX]); + + bond_opt_initval(&newval, missed_max); + err = __bond_opt_set(bond, BOND_OPT_MISSED_MAX, &newval); + if (err) + return err; + } + return 0; }
@@ -515,6 +525,7 @@ static size_t bond_get_size(const struct net_device *bond_dev) nla_total_size(ETH_ALEN) + /* IFLA_BOND_AD_ACTOR_SYSTEM */ nla_total_size(sizeof(u8)) + /* IFLA_BOND_TLB_DYNAMIC_LB */ nla_total_size(sizeof(u32)) + /* IFLA_BOND_PEER_NOTIF_DELAY */ + nla_total_size(sizeof(u8)) + /* IFLA_BOND_MISSED_MAX */ 0; }
@@ -650,6 +661,10 @@ static int bond_fill_info(struct sk_buff *skb, bond->params.tlb_dynamic_lb)) goto nla_put_failure;
+ if (nla_put_u8(skb, IFLA_BOND_MISSED_MAX, + bond->params.missed_max)) + goto nla_put_failure; + if (BOND_MODE(bond) == BOND_MODE_8023AD) { struct ad_info info;
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index b93337b5a7211..2e8484a91a0e7 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -78,6 +78,8 @@ static int bond_option_ad_actor_system_set(struct bonding *bond, const struct bond_opt_value *newval); static int bond_option_ad_user_port_key_set(struct bonding *bond, const struct bond_opt_value *newval); +static int bond_option_missed_max_set(struct bonding *bond, + const struct bond_opt_value *newval);
static const struct bond_opt_value bond_mode_tbl[] = { @@ -213,6 +215,13 @@ static const struct bond_opt_value bond_ad_user_port_key_tbl[] = { { NULL, -1, 0}, };
+static const struct bond_opt_value bond_missed_max_tbl[] = { + { "minval", 1, BOND_VALFLAG_MIN}, + { "maxval", 255, BOND_VALFLAG_MAX}, + { "default", 2, BOND_VALFLAG_DEFAULT}, + { NULL, -1, 0}, +}; + static const struct bond_option bond_opts[BOND_OPT_LAST] = { [BOND_OPT_MODE] = { .id = BOND_OPT_MODE, @@ -270,6 +279,15 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = { .values = bond_intmax_tbl, .set = bond_option_arp_interval_set }, + [BOND_OPT_MISSED_MAX] = { + .id = BOND_OPT_MISSED_MAX, + .name = "arp_missed_max", + .desc = "Maximum number of missed ARP interval", + .unsuppmodes = BIT(BOND_MODE_8023AD) | BIT(BOND_MODE_TLB) | + BIT(BOND_MODE_ALB), + .values = bond_missed_max_tbl, + .set = bond_option_missed_max_set + }, [BOND_OPT_ARP_TARGETS] = { .id = BOND_OPT_ARP_TARGETS, .name = "arp_ip_target", @@ -1186,6 +1204,16 @@ static int bond_option_arp_all_targets_set(struct bonding *bond, return 0; }
+static int bond_option_missed_max_set(struct bonding *bond, + const struct bond_opt_value *newval) +{ + netdev_dbg(bond->dev, "Setting missed max to %s (%llu)\n", + newval->string, newval->value); + bond->params.missed_max = newval->value; + + return 0; +} + static int bond_option_primary_set(struct bonding *bond, const struct bond_opt_value *newval) { diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c index f3e3bfd72556c..2ec11af5f0cce 100644 --- a/drivers/net/bonding/bond_procfs.c +++ b/drivers/net/bonding/bond_procfs.c @@ -115,6 +115,8 @@ static void bond_info_show_master(struct seq_file *seq)
seq_printf(seq, "ARP Polling Interval (ms): %d\n", bond->params.arp_interval); + seq_printf(seq, "ARP Missed Max: %u\n", + bond->params.missed_max);
seq_printf(seq, "ARP IP target/s (n.n.n.n form):");
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index b9e9842fed94e..22aa22f4e0882 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -303,6 +303,18 @@ static ssize_t bonding_show_arp_targets(struct device *d, static DEVICE_ATTR(arp_ip_target, 0644, bonding_show_arp_targets, bonding_sysfs_store_option);
+/* Show the arp missed max. */ +static ssize_t bonding_show_missed_max(struct device *d, + struct device_attribute *attr, + char *buf) +{ + struct bonding *bond = to_bond(d); + + return sprintf(buf, "%u\n", bond->params.missed_max); +} +static DEVICE_ATTR(arp_missed_max, 0644, + bonding_show_missed_max, bonding_sysfs_store_option); + /* Show the up and down delays. */ static ssize_t bonding_show_downdelay(struct device *d, struct device_attribute *attr, @@ -779,6 +791,7 @@ static struct attribute *per_bond_attrs[] = { &dev_attr_ad_actor_sys_prio.attr, &dev_attr_ad_actor_system.attr, &dev_attr_ad_user_port_key.attr, + &dev_attr_arp_missed_max.attr, NULL, };
diff --git a/include/net/bond_options.h b/include/net/bond_options.h index e64833a674eb8..dd75c071f67e2 100644 --- a/include/net/bond_options.h +++ b/include/net/bond_options.h @@ -65,6 +65,7 @@ enum { BOND_OPT_NUM_PEER_NOTIF_ALIAS, BOND_OPT_PEER_NOTIF_DELAY, BOND_OPT_LACP_ACTIVE, + BOND_OPT_MISSED_MAX, BOND_OPT_LAST };
diff --git a/include/net/bonding.h b/include/net/bonding.h index de0bdcc7dc7f9..0db3c5f36868b 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -121,6 +121,7 @@ struct bond_params { int xmit_policy; int miimon; u8 num_peer_notif; + u8 missed_max; int arp_interval; int arp_validate; int arp_all_targets; diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index eebd3894fe89a..4ac53b30b6dc9 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -858,6 +858,7 @@ enum { IFLA_BOND_TLB_DYNAMIC_LB, IFLA_BOND_PEER_NOTIF_DELAY, IFLA_BOND_AD_LACP_ACTIVE, + IFLA_BOND_MISSED_MAX, __IFLA_BOND_MAX, };
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h index b3610fdd1feec..4772a115231ae 100644 --- a/tools/include/uapi/linux/if_link.h +++ b/tools/include/uapi/linux/if_link.h @@ -655,6 +655,7 @@ enum { IFLA_BOND_TLB_DYNAMIC_LB, IFLA_BOND_PEER_NOTIF_DELAY, IFLA_BOND_AD_LACP_ACTIVE, + IFLA_BOND_MISSED_MAX, __IFLA_BOND_MAX, };
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 9949e2efb54eb3001cb2f6512ff3166dddbfb75d ]
Bonding's send_peer_notif was defined as u8. Since commit 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications"), bond->send_peer_notif is num_peer_notif multiplied by peer_notif_delay, which is u8 * u32. This can easily overflow send_peer_notif, e.g.:
ip link add bond0 type bond mode 1 miimon 100 num_grat_arp 30 peer_notify_delay 1000
To fix the overflow, let's make send_peer_notif a u32 and limit peer_notif_delay to 300s.
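A stand-alone sketch of the arithmetic; the in-kernel formula is paraphrased here as roughly num_peer_notif * (peer_notif_delay / miimon), which is an assumption of this example rather than a quote of the bonding code:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t num_peer_notif = 30;     /* num_grat_arp 30 */
            uint32_t peer_notif_delay = 1000; /* ms */
            uint32_t miimon = 100;            /* ms */
            uint32_t wanted = num_peer_notif * (peer_notif_delay / miimon);
            uint8_t send_peer_notif = wanted; /* old u8 field truncates */

            printf("wanted %u notifications, u8 stores %u\n",
                   (unsigned)wanted, (unsigned)send_peer_notif);
            /* prints: wanted 300 notifications, u8 stores 44 */
            return 0;
    }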
Reported-by: Liang Li liali@redhat.com Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2090053 Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_netlink.c | 7 ++++++- drivers/net/bonding/bond_options.c | 8 +++++++- include/net/bonding.h | 2 +- 3 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/net/bonding/bond_netlink.c b/drivers/net/bonding/bond_netlink.c index 1007bf6d385d4..7398accd46805 100644 --- a/drivers/net/bonding/bond_netlink.c +++ b/drivers/net/bonding/bond_netlink.c @@ -79,6 +79,11 @@ static int bond_fill_slave_info(struct sk_buff *skb, return -EMSGSIZE; }
+/* Limit the max delay range to 300s */ +static struct netlink_range_validation delay_range = { + .max = 300000, +}; + static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = { [IFLA_BOND_MODE] = { .type = NLA_U8 }, [IFLA_BOND_ACTIVE_SLAVE] = { .type = NLA_U32 }, @@ -109,7 +114,7 @@ static const struct nla_policy bond_policy[IFLA_BOND_MAX + 1] = { [IFLA_BOND_AD_ACTOR_SYSTEM] = { .type = NLA_BINARY, .len = ETH_ALEN }, [IFLA_BOND_TLB_DYNAMIC_LB] = { .type = NLA_U8 }, - [IFLA_BOND_PEER_NOTIF_DELAY] = { .type = NLA_U32 }, + [IFLA_BOND_PEER_NOTIF_DELAY] = NLA_POLICY_FULL_RANGE(NLA_U32, &delay_range), [IFLA_BOND_MISSED_MAX] = { .type = NLA_U8 }, };
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 2e8484a91a0e7..5f883a18bbabd 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -165,6 +165,12 @@ static const struct bond_opt_value bond_num_peer_notif_tbl[] = { { NULL, -1, 0} };
+static const struct bond_opt_value bond_peer_notif_delay_tbl[] = { + { "off", 0, 0}, + { "maxval", 300000, BOND_VALFLAG_MAX}, + { NULL, -1, 0} +}; + static const struct bond_opt_value bond_primary_reselect_tbl[] = { { "always", BOND_PRI_RESELECT_ALWAYS, BOND_VALFLAG_DEFAULT}, { "better", BOND_PRI_RESELECT_BETTER, 0}, @@ -467,7 +473,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = { .id = BOND_OPT_PEER_NOTIF_DELAY, .name = "peer_notif_delay", .desc = "Delay between each peer notification on failover event, in milliseconds", - .values = bond_intmax_tbl, + .values = bond_peer_notif_delay_tbl, .set = bond_option_peer_notif_delay_set } }; diff --git a/include/net/bonding.h b/include/net/bonding.h index 0db3c5f36868b..e4453cf4f0171 100644 --- a/include/net/bonding.h +++ b/include/net/bonding.h @@ -228,7 +228,7 @@ struct bonding { */ spinlock_t mode_lock; spinlock_t stats_lock; - u8 send_peer_notif; + u32 send_peer_notif; u8 igmp_retrans; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry;
From: Carlos Llamas cmllamas@google.com
[ Upstream commit bdc1c5fac982845a58d28690cdb56db8c88a530d ]
In binder_transaction_buffer_release() the 'failed_at' offset indicates the number of objects to clean up. However, this function was changed by commit 44d8047f1d87 ("binder: use standard functions to allocate fds") to release all the objects in the buffer when 'failed_at' is zero.
This introduced an issue when a transaction buffer is released without any objects having been processed so far. In this case, 'failed_at' is indeed zero, yet it is misinterpreted as a request to release the entire buffer.
This leads to use-after-free errors where nodes are incorrectly freed and subsequently accessed. Such is the case in the following KASAN report:
================================================================== BUG: KASAN: slab-use-after-free in binder_thread_read+0xc40/0x1f30 Read of size 8 at addr ffff4faf037cfc58 by task poc/474
CPU: 6 PID: 474 Comm: poc Not tainted 6.3.0-12570-g7df047b3f0aa #5 Hardware name: linux,dummy-virt (DT) Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x48/0x60 print_report+0xf8/0x5b8 kasan_report+0xb8/0xfc __asan_load8+0x9c/0xb8 binder_thread_read+0xc40/0x1f30 binder_ioctl+0xd9c/0x1768 __arm64_sys_ioctl+0xd4/0x118 invoke_syscall+0x60/0x188 [...]
Allocated by task 474: kasan_save_stack+0x3c/0x64 kasan_set_track+0x2c/0x40 kasan_save_alloc_info+0x24/0x34 __kasan_kmalloc+0xb8/0xbc kmalloc_trace+0x48/0x5c binder_new_node+0x3c/0x3a4 binder_transaction+0x2b58/0x36f0 binder_thread_write+0x8e0/0x1b78 binder_ioctl+0x14a0/0x1768 __arm64_sys_ioctl+0xd4/0x118 invoke_syscall+0x60/0x188 [...]
Freed by task 475: kasan_save_stack+0x3c/0x64 kasan_set_track+0x2c/0x40 kasan_save_free_info+0x38/0x5c __kasan_slab_free+0xe8/0x154 __kmem_cache_free+0x128/0x2bc kfree+0x58/0x70 binder_dec_node_tmpref+0x178/0x1fc binder_transaction_buffer_release+0x430/0x628 binder_transaction+0x1954/0x36f0 binder_thread_write+0x8e0/0x1b78 binder_ioctl+0x14a0/0x1768 __arm64_sys_ioctl+0xd4/0x118 invoke_syscall+0x60/0x188 [...] ==================================================================
In order to avoid these issues, let's always calculate the intended 'failed_at' offset beforehand. This is renamed and wrapped in a helper function to make it clear and convenient.
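For reference, the pre-patch computation that conflated the two cases, as seen in the hunk below:

    /* is_failure with failed_at == 0 falls through to the full range,
     * releasing objects that were never translated in the first place.
     */
    off_end_offset = is_failure && failed_at ? failed_at :
                     off_start_offset + buffer->offsets_size;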
Fixes: 32e9f56a96d8 ("binder: don't detect sender/target during buffer cleanup") Reported-by: Zi Fan Tan zifantan@google.com Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas cmllamas@google.com Acked-by: Todd Kjos tkjos@google.com Link: https://lore.kernel.org/r/20230505203020.4101154-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/android/binder.c | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c index c8d33c5dbe295..a4749b6c3d730 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -1903,24 +1903,23 @@ static void binder_deferred_fd_close(int fd) static void binder_transaction_buffer_release(struct binder_proc *proc, struct binder_thread *thread, struct binder_buffer *buffer, - binder_size_t failed_at, + binder_size_t off_end_offset, bool is_failure) { int debug_id = buffer->debug_id; - binder_size_t off_start_offset, buffer_offset, off_end_offset; + binder_size_t off_start_offset, buffer_offset;
binder_debug(BINDER_DEBUG_TRANSACTION, "%d buffer release %d, size %zd-%zd, failed at %llx\n", proc->pid, buffer->debug_id, buffer->data_size, buffer->offsets_size, - (unsigned long long)failed_at); + (unsigned long long)off_end_offset);
if (buffer->target_node) binder_dec_node(buffer->target_node, 1, 0);
off_start_offset = ALIGN(buffer->data_size, sizeof(void *)); - off_end_offset = is_failure && failed_at ? failed_at : - off_start_offset + buffer->offsets_size; + for (buffer_offset = off_start_offset; buffer_offset < off_end_offset; buffer_offset += sizeof(binder_size_t)) { struct binder_object_header *hdr; @@ -2080,6 +2079,21 @@ static void binder_transaction_buffer_release(struct binder_proc *proc, } }
+/* Clean up all the objects in the buffer */ +static inline void binder_release_entire_buffer(struct binder_proc *proc, + struct binder_thread *thread, + struct binder_buffer *buffer, + bool is_failure) +{ + binder_size_t off_end_offset; + + off_end_offset = ALIGN(buffer->data_size, sizeof(void *)); + off_end_offset += buffer->offsets_size; + + binder_transaction_buffer_release(proc, thread, buffer, + off_end_offset, is_failure); +} + static int binder_translate_binder(struct flat_binder_object *fp, struct binder_transaction *t, struct binder_thread *thread) @@ -3578,7 +3592,7 @@ binder_free_buf(struct binder_proc *proc, binder_node_inner_unlock(buf_node); } trace_binder_transaction_buffer_release(buffer); - binder_transaction_buffer_release(proc, thread, buffer, 0, is_failure); + binder_release_entire_buffer(proc, thread, buffer, is_failure); binder_alloc_free_buf(&proc->alloc, buffer); }
From: Marc Zyngier maz@kernel.org
[ Upstream commit dd098a0e031928cf88c89f7577d31821e1f0e6de ]
The MIPS GIC driver uses irq_cpu_online() to go and program the per-CPU interrupts. However, this method iterates over all IRQs in the system, despite only 3 per-CPU interrupts being of interest.
Let's be terribly bold and do the iteration ourselves. To ensure mutual exclusion, hold the gic_lock spinlock that is otherwise taken while dealing with these interrupts.
Signed-off-by: Marc Zyngier maz@kernel.org Reviewed-by: Serge Semin fancer.lancer@gmail.com Reviewed-by: Florian Fainelli f.fainelli@gmail.com Tested-by: Serge Semin fancer.lancer@gmail.com Link: https://lore.kernel.org/r/20211021170414.3341522-3-maz@kernel.org Stable-dep-of: 3d6a0e4197c0 ("irqchip/mips-gic: Use raw spinlock for gic_lock") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-mips-gic.c | 37 ++++++++++++++++++++++++---------- 1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c index d815285f1efe3..0f14b2d7b19cb 100644 --- a/drivers/irqchip/irq-mips-gic.c +++ b/drivers/irqchip/irq-mips-gic.c @@ -383,24 +383,35 @@ static void gic_unmask_local_irq_all_vpes(struct irq_data *d) spin_unlock_irqrestore(&gic_lock, flags); }
-static void gic_all_vpes_irq_cpu_online(struct irq_data *d) +static void gic_all_vpes_irq_cpu_online(void) { - struct gic_all_vpes_chip_data *cd; - unsigned int intr; + static const unsigned int local_intrs[] = { + GIC_LOCAL_INT_TIMER, + GIC_LOCAL_INT_PERFCTR, + GIC_LOCAL_INT_FDC, + }; + unsigned long flags; + int i;
- intr = GIC_HWIRQ_TO_LOCAL(d->hwirq); - cd = irq_data_get_irq_chip_data(d); + spin_lock_irqsave(&gic_lock, flags);
- write_gic_vl_map(mips_gic_vx_map_reg(intr), cd->map); - if (cd->mask) - write_gic_vl_smask(BIT(intr)); + for (i = 0; i < ARRAY_SIZE(local_intrs); i++) { + unsigned int intr = local_intrs[i]; + struct gic_all_vpes_chip_data *cd; + + cd = &gic_all_vpes_chip_data[intr]; + write_gic_vl_map(mips_gic_vx_map_reg(intr), cd->map); + if (cd->mask) + write_gic_vl_smask(BIT(intr)); + } + + spin_unlock_irqrestore(&gic_lock, flags); }
static struct irq_chip gic_all_vpes_local_irq_controller = { .name = "MIPS GIC Local", .irq_mask = gic_mask_local_irq_all_vpes, .irq_unmask = gic_unmask_local_irq_all_vpes, - .irq_cpu_online = gic_all_vpes_irq_cpu_online, };
static void __gic_irq_dispatch(void) @@ -481,6 +492,10 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int virq, intr = GIC_HWIRQ_TO_LOCAL(hwirq); map = GIC_MAP_PIN_MAP_TO_PIN | gic_cpu_pin;
+ /* + * If adding support for more per-cpu interrupts, keep the + * array in gic_all_vpes_irq_cpu_online() in sync. + */ switch (intr) { case GIC_LOCAL_INT_TIMER: /* CONFIG_MIPS_CMP workaround (see __gic_init) */ @@ -711,8 +726,8 @@ static int gic_cpu_startup(unsigned int cpu) /* Clear all local IRQ masks (ie. disable all local interrupts) */ write_gic_vl_rmask(~0);
- /* Invoke irq_cpu_online callbacks to enable desired interrupts */ - irq_cpu_online(); + /* Enable desired interrupts */ + gic_all_vpes_irq_cpu_online();
return 0; }
From: Jiaxun Yang jiaxun.yang@flygoat.com
[ Upstream commit 3d6a0e4197c04599d75d85a608c8bb16a630a38c ]
Since we may hold gic_lock in hardirq context, using a raw spinlock makes more sense: it serves a low-level interrupt handling routine, the critical section is small, and spinlock_t becomes a sleeping lock under PREEMPT_RT, which is exactly what lockdep's wait-context checker flags below.
Fixes BUG:
[    0.426106] =============================
[    0.426257] [ BUG: Invalid wait context ]
[    0.426422] 6.3.0-rc7-next-20230421-dirty #54 Not tainted
[    0.426638] -----------------------------
[    0.426766] swapper/0/1 is trying to lock:
[    0.426954] ffffffff8104e7b8 (gic_lock){....}-{3:3}, at: gic_set_type+0x30/08
Fixes: 95150ae8b330 ("irqchip: mips-gic: Implement irq_set_type callback") Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Reviewed-by: Serge Semin fancer.lancer@gmail.com Tested-by: Serge Semin fancer.lancer@gmail.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20230424103156.66753-3-jiaxun.yang@flygoat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-mips-gic.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c index 0f14b2d7b19cb..0d4515257c59c 100644 --- a/drivers/irqchip/irq-mips-gic.c +++ b/drivers/irqchip/irq-mips-gic.c @@ -49,7 +49,7 @@ void __iomem *mips_gic_base;
static DEFINE_PER_CPU_READ_MOSTLY(unsigned long[GIC_MAX_LONGS], pcpu_masks);
-static DEFINE_SPINLOCK(gic_lock); +static DEFINE_RAW_SPINLOCK(gic_lock); static struct irq_domain *gic_irq_domain; static int gic_shared_intrs; static unsigned int gic_cpu_pin; @@ -210,7 +210,7 @@ static int gic_set_type(struct irq_data *d, unsigned int type)
irq = GIC_HWIRQ_TO_SHARED(d->hwirq);
- spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags); switch (type & IRQ_TYPE_SENSE_MASK) { case IRQ_TYPE_EDGE_FALLING: pol = GIC_POL_FALLING_EDGE; @@ -250,7 +250,7 @@ static int gic_set_type(struct irq_data *d, unsigned int type) else irq_set_chip_handler_name_locked(d, &gic_level_irq_controller, handle_level_irq, NULL); - spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags);
return 0; } @@ -268,7 +268,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *cpumask, return -EINVAL;
/* Assumption : cpumask refers to a single CPU */ - spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags);
/* Re-route this IRQ */ write_gic_map_vp(irq, BIT(mips_cm_vp_id(cpu))); @@ -279,7 +279,7 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *cpumask, set_bit(irq, per_cpu_ptr(pcpu_masks, cpu));
irq_data_update_effective_affinity(d, cpumask_of(cpu)); - spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags);
return IRQ_SET_MASK_OK; } @@ -357,12 +357,12 @@ static void gic_mask_local_irq_all_vpes(struct irq_data *d) cd = irq_data_get_irq_chip_data(d); cd->mask = false;
- spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags); for_each_online_cpu(cpu) { write_gic_vl_other(mips_cm_vp_id(cpu)); write_gic_vo_rmask(BIT(intr)); } - spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags); }
static void gic_unmask_local_irq_all_vpes(struct irq_data *d) @@ -375,12 +375,12 @@ static void gic_unmask_local_irq_all_vpes(struct irq_data *d) cd = irq_data_get_irq_chip_data(d); cd->mask = true;
- spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags); for_each_online_cpu(cpu) { write_gic_vl_other(mips_cm_vp_id(cpu)); write_gic_vo_smask(BIT(intr)); } - spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags); }
static void gic_all_vpes_irq_cpu_online(void) @@ -393,7 +393,7 @@ static void gic_all_vpes_irq_cpu_online(void) unsigned long flags; int i;
- spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags);
for (i = 0; i < ARRAY_SIZE(local_intrs); i++) { unsigned int intr = local_intrs[i]; @@ -405,7 +405,7 @@ static void gic_all_vpes_irq_cpu_online(void) write_gic_vl_smask(BIT(intr)); }
- spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags); }
static struct irq_chip gic_all_vpes_local_irq_controller = { @@ -435,11 +435,11 @@ static int gic_shared_irq_domain_map(struct irq_domain *d, unsigned int virq,
data = irq_get_irq_data(virq);
- spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags); write_gic_map_pin(intr, GIC_MAP_PIN_MAP_TO_PIN | gic_cpu_pin); write_gic_map_vp(intr, BIT(mips_cm_vp_id(cpu))); irq_data_update_effective_affinity(data, cpumask_of(cpu)); - spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags);
return 0; } @@ -534,12 +534,12 @@ static int gic_irq_domain_map(struct irq_domain *d, unsigned int virq, if (!gic_local_irq_is_routable(intr)) return -EPERM;
- spin_lock_irqsave(&gic_lock, flags); + raw_spin_lock_irqsave(&gic_lock, flags); for_each_online_cpu(cpu) { write_gic_vl_other(mips_cm_vp_id(cpu)); write_gic_vo_map(mips_gic_vx_map_reg(intr), map); } - spin_unlock_irqrestore(&gic_lock, flags); + raw_spin_unlock_irqrestore(&gic_lock, flags);
return 0; }
From: Rahul Rameshbabu rrameshbabu@nvidia.com
[ Upstream commit 7aa50380191635e5897a773f272829cc961a2be5 ]
Check in the mlx5e_ptp_poll_ts_cq context if the ptp tx sq should be woken up. Before this change, the ptp tx sq might never wake up if the ptp tx ts skb fifo was full when mlx5e_poll_tx_cq checked whether the queue should be woken up.
Fixes: 1880bc4e4a96 ("net/mlx5e: Add TX port timestamp support") Signed-off-by: Rahul Rameshbabu rrameshbabu@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/ptp.c | 2 ++ .../net/ethernet/mellanox/mlx5/core/en/txrx.h | 2 ++ .../net/ethernet/mellanox/mlx5/core/en_tx.c | 19 ++++++++++++------- 3 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c index 3a86f66d12955..ee95cc3a03786 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c @@ -126,6 +126,8 @@ static bool mlx5e_ptp_poll_ts_cq(struct mlx5e_cq *cq, int budget) /* ensure cq space is freed before enabling more cqes */ wmb();
+ mlx5e_txqsq_wake(&ptpsq->txqsq); + return work_done == budget; }
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h index f5c872043bcbd..cf62d1f6d7f20 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/txrx.h @@ -172,6 +172,8 @@ static inline u16 mlx5e_txqsq_get_next_pi(struct mlx5e_txqsq *sq, u16 size) return pi; }
+void mlx5e_txqsq_wake(struct mlx5e_txqsq *sq); + struct mlx5e_icosq_wqe_info { u8 wqe_type; u8 num_wqebbs; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c index e18fa5ae0fd84..6813279b57f89 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tx.c @@ -810,6 +810,17 @@ static void mlx5e_tx_wi_consume_fifo_skbs(struct mlx5e_txqsq *sq, struct mlx5e_t } }
+void mlx5e_txqsq_wake(struct mlx5e_txqsq *sq) +{ + if (netif_tx_queue_stopped(sq->txq) && + mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room) && + mlx5e_ptpsq_fifo_has_room(sq) && + !test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) { + netif_tx_wake_queue(sq->txq); + sq->stats->wake++; + } +} + bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget) { struct mlx5e_sq_stats *stats; @@ -909,13 +920,7 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget)
netdev_tx_completed_queue(sq->txq, npkts, nbytes);
- if (netif_tx_queue_stopped(sq->txq) && - mlx5e_wqc_has_room_for(&sq->wq, sq->cc, sq->pc, sq->stop_room) && - mlx5e_ptpsq_fifo_has_room(sq) && - !test_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) { - netif_tx_wake_queue(sq->txq); - stats->wake++; - } + mlx5e_txqsq_wake(sq);
return (i == MLX5E_TX_CQ_POLL_BUDGET); }
From: Toke Høiland-Jørgensen toke@redhat.com
[ Upstream commit 4a48ef70b93b8c7ed5190adfca18849e76387b80 ]
The functions that register an XDP memory model take a struct xdp_rxq_info as parameter, but the RXQ is not actually used for anything other than pulling out the struct xdp_mem_info that it embeds. So refactor the register functions and export variants that just take a pointer to the xdp_mem_info.
This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a page_pool instance that is not connected to any network device.
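[ A rough usage sketch, not from the patch: a caller with no RXQ would use the new exports along these lines, assuming a previously created page_pool 'pp':

	struct xdp_mem_info mem = {};
	int err;

	err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp);
	if (err)
		return err;

	/* ... XDP_REDIRECT frames backed by 'pp' via this mem model ... */

	xdp_unreg_mem_model(&mem);
]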
Signed-off-by: Toke Høiland-Jørgensen toke@redhat.com Signed-off-by: Alexei Starovoitov ast@kernel.org Link: https://lore.kernel.org/bpf/20220103150812.87914-2-toke@redhat.com Stable-dep-of: 368d3cb406cd ("page_pool: fix inconsistency for page_pool_ring_[un]lock()") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/xdp.h | 3 ++ net/core/xdp.c | 92 +++++++++++++++++++++++++++++++---------------- 2 files changed, 65 insertions(+), 30 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h index ad5b02dcb6f4c..b2ac69cb30b3d 100644 --- a/include/net/xdp.h +++ b/include/net/xdp.h @@ -260,6 +260,9 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq); int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, enum xdp_mem_type type, void *allocator); void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq); +int xdp_reg_mem_model(struct xdp_mem_info *mem, + enum xdp_mem_type type, void *allocator); +void xdp_unreg_mem_model(struct xdp_mem_info *mem);
/* Drivers not supporting XDP metadata can use this helper, which * rejects any room expansion for metadata as a result. diff --git a/net/core/xdp.c b/net/core/xdp.c index cc92ccb384325..5ed2e9d5a3191 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -110,20 +110,15 @@ static void mem_allocator_disconnect(void *allocator) mutex_unlock(&mem_id_lock); }
-void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) +void xdp_unreg_mem_model(struct xdp_mem_info *mem) { struct xdp_mem_allocator *xa; - int type = xdp_rxq->mem.type; - int id = xdp_rxq->mem.id; + int type = mem->type; + int id = mem->id;
/* Reset mem info to defaults */ - xdp_rxq->mem.id = 0; - xdp_rxq->mem.type = 0; - - if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { - WARN(1, "Missing register, driver bug"); - return; - } + mem->id = 0; + mem->type = 0;
if (id == 0) return; @@ -135,6 +130,17 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) rcu_read_unlock(); } } +EXPORT_SYMBOL_GPL(xdp_unreg_mem_model); + +void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq) +{ + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { + WARN(1, "Missing register, driver bug"); + return; + } + + xdp_unreg_mem_model(&xdp_rxq->mem); +} EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg_mem_model);
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq) @@ -261,28 +267,24 @@ static bool __is_supported_mem_type(enum xdp_mem_type type) return true; }
-int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, - enum xdp_mem_type type, void *allocator) +static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem, + enum xdp_mem_type type, + void *allocator) { struct xdp_mem_allocator *xdp_alloc; gfp_t gfp = GFP_KERNEL; int id, errno, ret; void *ptr;
- if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { - WARN(1, "Missing register, driver bug"); - return -EFAULT; - } - if (!__is_supported_mem_type(type)) - return -EOPNOTSUPP; + return ERR_PTR(-EOPNOTSUPP);
- xdp_rxq->mem.type = type; + mem->type = type;
if (!allocator) { if (type == MEM_TYPE_PAGE_POOL) - return -EINVAL; /* Setup time check page_pool req */ - return 0; + return ERR_PTR(-EINVAL); /* Setup time check page_pool req */ + return NULL; }
/* Delay init of rhashtable to save memory if feature isn't used */ @@ -292,13 +294,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, mutex_unlock(&mem_id_lock); if (ret < 0) { WARN_ON(1); - return ret; + return ERR_PTR(ret); } }
xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp); if (!xdp_alloc) - return -ENOMEM; + return ERR_PTR(-ENOMEM);
mutex_lock(&mem_id_lock); id = __mem_id_cyclic_get(gfp); @@ -306,15 +308,15 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, errno = id; goto err; } - xdp_rxq->mem.id = id; - xdp_alloc->mem = xdp_rxq->mem; + mem->id = id; + xdp_alloc->mem = *mem; xdp_alloc->allocator = allocator;
/* Insert allocator into ID lookup table */ ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node); if (IS_ERR(ptr)) { - ida_simple_remove(&mem_id_pool, xdp_rxq->mem.id); - xdp_rxq->mem.id = 0; + ida_simple_remove(&mem_id_pool, mem->id); + mem->id = 0; errno = PTR_ERR(ptr); goto err; } @@ -324,13 +326,43 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
mutex_unlock(&mem_id_lock);
- trace_mem_connect(xdp_alloc, xdp_rxq); - return 0; + return xdp_alloc; err: mutex_unlock(&mem_id_lock); kfree(xdp_alloc); - return errno; + return ERR_PTR(errno); +} + +int xdp_reg_mem_model(struct xdp_mem_info *mem, + enum xdp_mem_type type, void *allocator) +{ + struct xdp_mem_allocator *xdp_alloc; + + xdp_alloc = __xdp_reg_mem_model(mem, type, allocator); + if (IS_ERR(xdp_alloc)) + return PTR_ERR(xdp_alloc); + return 0; +} +EXPORT_SYMBOL_GPL(xdp_reg_mem_model); + +int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, + enum xdp_mem_type type, void *allocator) +{ + struct xdp_mem_allocator *xdp_alloc; + + if (xdp_rxq->reg_state != REG_STATE_REGISTERED) { + WARN(1, "Missing register, driver bug"); + return -EFAULT; + } + + xdp_alloc = __xdp_reg_mem_model(&xdp_rxq->mem, type, allocator); + if (IS_ERR(xdp_alloc)) + return PTR_ERR(xdp_alloc); + + trace_mem_connect(xdp_alloc, xdp_rxq); + return 0; } + EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
/* XDP RX runs under NAPI protection, and in different delivery error
From: Qingfang DENG qingfang.deng@siflower.com.cn
[ Upstream commit 542bcea4be866b14b3a5c8e90773329066656c43 ]
We use BH context only for synchronization, so we don't care if it's actually serving softirq or not.
As a side note, in case of threaded NAPI, in_serving_softirq() will return false because it's in process context with BH off, making page_pool_recycle_in_cache() unreachable.
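For reference, the distinction comes down to the preempt count checks (as defined in include/linux/preempt.h):

	#define in_softirq()		(softirq_count())
	#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)

in_softirq() is also true when BH is merely disabled, which covers the threaded NAPI case of process context running under local_bh_disable().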
Signed-off-by: Qingfang DENG qingfang.deng@siflower.com.cn Tested-by: Felix Fietkau nbd@nbd.name Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 368d3cb406cd ("page_pool: fix inconsistency for page_pool_ring_[un]lock()") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/page_pool.h | 4 ++-- net/core/page_pool.c | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/net/page_pool.h b/include/net/page_pool.h index a4082406a0039..80d987419436e 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -285,7 +285,7 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) static inline void page_pool_ring_lock(struct page_pool *pool) __acquires(&pool->ring.producer_lock) { - if (in_serving_softirq()) + if (in_softirq()) spin_lock(&pool->ring.producer_lock); else spin_lock_bh(&pool->ring.producer_lock); @@ -294,7 +294,7 @@ static inline void page_pool_ring_lock(struct page_pool *pool) static inline void page_pool_ring_unlock(struct page_pool *pool) __releases(&pool->ring.producer_lock) { - if (in_serving_softirq()) + if (in_softirq()) spin_unlock(&pool->ring.producer_lock); else spin_unlock_bh(&pool->ring.producer_lock); diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 1a6978427d6c8..1d520fa1b98a8 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -390,8 +390,8 @@ static void page_pool_return_page(struct page_pool *pool, struct page *page) static bool page_pool_recycle_in_ring(struct page_pool *pool, struct page *page) { int ret; - /* BH protection not needed if current is serving softirq */ - if (in_serving_softirq()) + /* BH protection not needed if current is softirq */ + if (in_softirq()) ret = ptr_ring_produce(&pool->ring, page); else ret = ptr_ring_produce_bh(&pool->ring, page); @@ -446,7 +446,7 @@ __page_pool_put_page(struct page_pool *pool, struct page *page, page_pool_dma_sync_for_device(pool, page, dma_sync_size);
- if (allow_direct && in_serving_softirq() && + if (allow_direct && in_softirq() && page_pool_recycle_in_cache(page, pool)) return NULL;
From: Yunsheng Lin linyunsheng@huawei.com
[ Upstream commit 368d3cb406cdd074d1df2ad9ec06d1bfcb664882 ]
page_pool_ring_[un]lock() use in_softirq() to decide which spin lock variant to use. When they are called in a context where in_softirq() is false, page_pool_ring_lock() takes spin_lock_bh(); that call disables softirq, so in_softirq() is true by the time page_pool_ring_unlock() runs, and it calls plain spin_unlock(). The result is an inconsistent lock/unlock pairing.
This patch fixes it by returning the in_softirq state from page_pool_producer_lock() and using it to decide which spin lock variant to use in page_pool_producer_unlock().
As pool->ring has both a producer and a consumer lock, rename the helpers to page_pool_producer_[un]lock() to reflect the actual usage. Also move them to page_pool.c, as they are only used there, and drop the 'inline', as the compiler has a better idea of whether or not to inline them.
Fixes: 7886244736a4 ("net: page_pool: Add bulk support for ptr_ring") Signed-off-by: Yunsheng Lin linyunsheng@huawei.com Acked-by: Jesper Dangaard Brouer brouer@redhat.com Acked-by: Ilias Apalodimas ilias.apalodimas@linaro.org Link: https://lore.kernel.org/r/20230522031714.5089-1-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/page_pool.h | 18 ------------------ net/core/page_pool.c | 28 ++++++++++++++++++++++++++-- 2 files changed, 26 insertions(+), 20 deletions(-)
diff --git a/include/net/page_pool.h b/include/net/page_pool.h index 80d987419436e..edcc22605842e 100644 --- a/include/net/page_pool.h +++ b/include/net/page_pool.h @@ -282,22 +282,4 @@ static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid) page_pool_update_nid(pool, new_nid); }
-static inline void page_pool_ring_lock(struct page_pool *pool) - __acquires(&pool->ring.producer_lock) -{ - if (in_softirq()) - spin_lock(&pool->ring.producer_lock); - else - spin_lock_bh(&pool->ring.producer_lock); -} - -static inline void page_pool_ring_unlock(struct page_pool *pool) - __releases(&pool->ring.producer_lock) -{ - if (in_softirq()) - spin_unlock(&pool->ring.producer_lock); - else - spin_unlock_bh(&pool->ring.producer_lock); -} - #endif /* _NET_PAGE_POOL_H */ diff --git a/net/core/page_pool.c b/net/core/page_pool.c index 1d520fa1b98a8..069d6ba0e33fb 100644 --- a/net/core/page_pool.c +++ b/net/core/page_pool.c @@ -26,6 +26,29 @@
#define BIAS_MAX LONG_MAX
+static bool page_pool_producer_lock(struct page_pool *pool) + __acquires(&pool->ring.producer_lock) +{ + bool in_softirq = in_softirq(); + + if (in_softirq) + spin_lock(&pool->ring.producer_lock); + else + spin_lock_bh(&pool->ring.producer_lock); + + return in_softirq; +} + +static void page_pool_producer_unlock(struct page_pool *pool, + bool in_softirq) + __releases(&pool->ring.producer_lock) +{ + if (in_softirq) + spin_unlock(&pool->ring.producer_lock); + else + spin_unlock_bh(&pool->ring.producer_lock); +} + static int page_pool_init(struct page_pool *pool, const struct page_pool_params *params) { @@ -489,6 +512,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data, int count) { int i, bulk_len = 0; + bool in_softirq;
for (i = 0; i < count; i++) { struct page *page = virt_to_head_page(data[i]); @@ -503,12 +527,12 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data, return;
/* Bulk producer into ptr_ring page_pool cache */ - page_pool_ring_lock(pool); + in_softirq = page_pool_producer_lock(pool); for (i = 0; i < bulk_len; i++) { if (__ptr_ring_produce(&pool->ring, data[i])) break; /* ring full */ } - page_pool_ring_unlock(pool); + page_pool_producer_unlock(pool, in_softirq);
/* Hopefully all pages was return into ptr_ring */ if (likely(i == bulk_len))
From: Jiaxun Yang jiaxun.yang@flygoat.com
[ Upstream commit 2c6c9c049510163090b979ea5f92a68ae8d93c45 ]
When a GIC local interrupt is not routable, its vl_map is used to control some internal states of the core (providing the IPTI, IPPCI and IPFDC input signals to the core). Overriding it will interfere with the core's interrupt controller.
Do not touch vl_map if a local interrupt is not routable; we are not going to remap it.
Before dd098a0e0319 ("irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()"), if a local interrupt was not routable, it would not be requested from the GIC Local domain, and thus gic_all_vpes_irq_cpu_online would not be called for that particular interrupt.
Fixes: dd098a0e0319 ("irqchip/mips-gic: Get rid of the reliance on irq_cpu_online()") Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Reviewed-by: Serge Semin fancer.lancer@gmail.com Tested-by: Serge Semin fancer.lancer@gmail.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20230424103156.66753-2-jiaxun.yang@flygoat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-mips-gic.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/irqchip/irq-mips-gic.c b/drivers/irqchip/irq-mips-gic.c index 0d4515257c59c..c654fe22fcf33 100644 --- a/drivers/irqchip/irq-mips-gic.c +++ b/drivers/irqchip/irq-mips-gic.c @@ -399,6 +399,8 @@ static void gic_all_vpes_irq_cpu_online(void) unsigned int intr = local_intrs[i]; struct gic_all_vpes_chip_data *cd;
+ if (!gic_local_irq_is_routable(intr)) + continue; cd = &gic_all_vpes_chip_data[intr]; write_gic_vl_map(mips_gic_vx_map_reg(intr), cd->map); if (cd->mask)
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit e0ae713023a9d09d6e1b454bdc8e8c1dd32c586e ]
Since the commit mentioned below, __xdp_reg_mem_model() can return a NULL pointer. This pointer is dereferenced in trace_mem_connect(), which leads to a segfault.
The trace points (mem_connect + mem_disconnect) were put in place to pair connect/disconnect using the IDs. The ID is only assigned if __xdp_reg_mem_model() does not return NULL. That connect trace point is of no use if there is no ID.
Skip that connect trace point if xdp_alloc is NULL.
[ Toke Høiland-Jørgensen delivered the reasoning for skipping the trace point ]
Fixes: 4a48ef70b93b8 ("xdp: Allow registering memory model without rxq reference") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Acked-by: Toke Høiland-Jørgensen toke@redhat.com Link: https://lore.kernel.org/r/YikmmXsffE+QajTB@linutronix.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/xdp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/xdp.c b/net/core/xdp.c index 5ed2e9d5a3191..a3e3d2538a3a8 100644 --- a/net/core/xdp.c +++ b/net/core/xdp.c @@ -359,7 +359,8 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq, if (IS_ERR(xdp_alloc)) return PTR_ERR(xdp_alloc);
- trace_mem_connect(xdp_alloc, xdp_rxq); + if (trace_mem_connect_enabled() && xdp_alloc) + trace_mem_connect(xdp_alloc, xdp_rxq); return 0; }
From: Ruihan Li lrh2000@pku.edu.cn
commit 000c2fa2c144c499c881a101819cf1936a1f7cf2 upstream.
Previously, channel open messages were always sent to monitors on the first ioctl() call for unbound HCI sockets, even if the command and arguments were completely invalid. This can leave an exploitable hole with the abuse of invalid ioctl calls.
This commit hardens the ioctl processing logic by first checking if the command is valid, and immediately returning with an ENOIOCTLCMD error code if it is not. This ensures that ioctl calls with invalid commands are free of side effects, and increases the difficulty of further exploitation by forcing exploitation to find a way to pass a valid command first.
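[ A minimal userspace sketch of the hardened behavior; the bogus command value is an arbitrary example, and the -ENOIOCTLCMD return typically surfaces to userspace as ENOTTY:

	#include <errno.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>

	#define BTPROTO_HCI 1	/* normally from <bluetooth/bluetooth.h> */

	int main(void)
	{
		int fd = socket(AF_BLUETOOTH, SOCK_RAW, BTPROTO_HCI);

		if (fd < 0)
			return 1;
		/* Invalid cmd: now rejected up front, before any
		 * channel-open message can reach monitors. */
		if (ioctl(fd, 0xdead, 0) < 0)
			printf("rejected: %s\n", strerror(errno));
		return 0;
	}
]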
Signed-off-by: Ruihan Li lrh2000@pku.edu.cn Co-developed-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Marcel Holtmann marcel@holtmann.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Dragos-Marian Panait dragos.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hci_sock.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
--- a/net/bluetooth/hci_sock.c +++ b/net/bluetooth/hci_sock.c @@ -980,6 +980,34 @@ static int hci_sock_ioctl(struct socket
BT_DBG("cmd %x arg %lx", cmd, arg);
+ /* Make sure the cmd is valid before doing anything */ + switch (cmd) { + case HCIGETDEVLIST: + case HCIGETDEVINFO: + case HCIGETCONNLIST: + case HCIDEVUP: + case HCIDEVDOWN: + case HCIDEVRESET: + case HCIDEVRESTAT: + case HCISETSCAN: + case HCISETAUTH: + case HCISETENCRYPT: + case HCISETPTYPE: + case HCISETLINKPOL: + case HCISETLINKMODE: + case HCISETACLMTU: + case HCISETSCOMTU: + case HCIINQUIRY: + case HCISETRAW: + case HCIGETCONNINFO: + case HCIGETAUTHINFO: + case HCIBLOCKADDR: + case HCIUNBLOCKADDR: + break; + default: + return -ENOIOCTLCMD; + } + lock_sock(sk);
if (hci_pi(sk)->channel != HCI_CHANNEL_RAW) {
From: Carlos Llamas cmllamas@google.com
commit b15655b12ddca7ade09807f790bafb6fab61b50a upstream.
This reverts commit 44e602b4e52f70f04620bbbf4fe46ecb40170bde.
This caused a performance regression particularly when pages are getting reclaimed. We don't need to acquire the mmap_lock to determine when the binder buffer has been fully initialized. A subsequent patch will bring back the lockless approach for this.
[cmllamas: resolved trivial conflicts with renaming of alloc->mm]
Fixes: 44e602b4e52f ("binder_alloc: add missing mmap_lock calls when using the VMA") Cc: Liam Howlett liam.howlett@oracle.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas cmllamas@google.com Link: https://lore.kernel.org/r/20230502201220.1756319-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [cmllamas: revert of original commit 44e602b4e52f applied clean] Signed-off-by: Carlos Llamas cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder_alloc.c | 31 ++++++++++--------------------- 1 file changed, 10 insertions(+), 21 deletions(-)
--- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -394,15 +394,12 @@ static struct binder_buffer *binder_allo size_t size, data_offsets_size; int ret;
- mmap_read_lock(alloc->vma_vm_mm); if (!binder_alloc_get_vma(alloc)) { - mmap_read_unlock(alloc->vma_vm_mm); binder_alloc_debug(BINDER_DEBUG_USER_ERROR, "%d: binder_alloc_buf, no vma\n", alloc->pid); return ERR_PTR(-ESRCH); } - mmap_read_unlock(alloc->vma_vm_mm);
data_offsets_size = ALIGN(data_size, sizeof(void *)) + ALIGN(offsets_size, sizeof(void *)); @@ -930,25 +927,17 @@ void binder_alloc_print_pages(struct seq * Make sure the binder_alloc is fully initialized, otherwise we might * read inconsistent state. */ - - mmap_read_lock(alloc->vma_vm_mm); - if (binder_alloc_get_vma(alloc) == NULL) { - mmap_read_unlock(alloc->vma_vm_mm); - goto uninitialized; - } - - mmap_read_unlock(alloc->vma_vm_mm); - for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) { - page = &alloc->pages[i]; - if (!page->page_ptr) - free++; - else if (list_empty(&page->lru)) - active++; - else - lru++; + if (binder_alloc_get_vma(alloc) != NULL) { + for (i = 0; i < alloc->buffer_size / PAGE_SIZE; i++) { + page = &alloc->pages[i]; + if (!page->page_ptr) + free++; + else if (list_empty(&page->lru)) + active++; + else + lru++; + } } - -uninitialized: mutex_unlock(&alloc->mutex); seq_printf(m, " pages: %d:%d:%d\n", active, lru, free); seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);
From: Carlos Llamas cmllamas@google.com
commit c0fd2101781ef761b636769b2f445351f71c3626 upstream.
This reverts commit a43cfc87caaf46710c8027a8c23b8a55f1078f19.
This patch fixed an issue reported by syzkaller in [1]. However, this turned out to be only a band-aid in binder. The root cause, as bisected by syzkaller, was fixed by commit 5789151e48ac ("mm/mmap: undo ->mmap() when mas_preallocate() fails"). We no longer need the patch for binder.
Reverting that patch allows us to have lockless access to alloc->vma in specific cases where the mmap_lock is not required. This approach avoids the contention that caused a performance regression.
[1] https://lore.kernel.org/all/0000000000004a0dbe05e1d749e0@google.com
[cmllamas: resolved conflicts with rework of alloc->mm and removal of binder_alloc_set_vma() also fixed comment section]
Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA") Cc: Liam Howlett liam.howlett@oracle.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas cmllamas@google.com Link: https://lore.kernel.org/r/20230502201220.1756319-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [cmllamas: fixed merge conflict in binder_alloc_set_vma()] Signed-off-by: Carlos Llamas cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder_alloc.c | 27 +++++++++++++-------------- drivers/android/binder_alloc.h | 2 +- drivers/android/binder_alloc_selftest.c | 2 +- 3 files changed, 15 insertions(+), 16 deletions(-)
--- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -213,7 +213,7 @@ static int binder_update_page_range(stru
if (mm) { mmap_read_lock(mm); - vma = vma_lookup(mm, alloc->vma_addr); + vma = alloc->vma; }
if (!vma && need_mm) { @@ -313,14 +313,12 @@ err_no_vma: static inline void binder_alloc_set_vma(struct binder_alloc *alloc, struct vm_area_struct *vma) { - unsigned long vm_start = 0; - - if (vma) { - vm_start = vma->vm_start; - mmap_assert_write_locked(alloc->vma_vm_mm); - } - - alloc->vma_addr = vm_start; + /* + * If we see alloc->vma is not NULL, buffer data structures set up + * completely. Look at smp_rmb side binder_alloc_get_vma. + */ + smp_wmb(); + alloc->vma = vma; }
static inline struct vm_area_struct *binder_alloc_get_vma( @@ -328,9 +326,11 @@ static inline struct vm_area_struct *bin { struct vm_area_struct *vma = NULL;
- if (alloc->vma_addr) - vma = vma_lookup(alloc->vma_vm_mm, alloc->vma_addr); - + if (alloc->vma) { + /* Look at description in binder_alloc_set_vma */ + smp_rmb(); + vma = alloc->vma; + } return vma; }
@@ -819,8 +819,7 @@ void binder_alloc_deferred_release(struc
buffers = 0; mutex_lock(&alloc->mutex); - BUG_ON(alloc->vma_addr && - vma_lookup(alloc->vma_vm_mm, alloc->vma_addr)); + BUG_ON(alloc->vma);
while ((n = rb_first(&alloc->allocated_buffers))) { buffer = rb_entry(n, struct binder_buffer, rb_node); --- a/drivers/android/binder_alloc.h +++ b/drivers/android/binder_alloc.h @@ -100,7 +100,7 @@ struct binder_lru_page { */ struct binder_alloc { struct mutex mutex; - unsigned long vma_addr; + struct vm_area_struct *vma; struct mm_struct *vma_vm_mm; void __user *buffer; struct list_head buffers; --- a/drivers/android/binder_alloc_selftest.c +++ b/drivers/android/binder_alloc_selftest.c @@ -287,7 +287,7 @@ void binder_selftest_alloc(struct binder if (!binder_selftest_run) return; mutex_lock(&binder_selftest_lock); - if (!binder_selftest_run || !alloc->vma_addr) + if (!binder_selftest_run || !alloc->vma) goto done; pr_info("STARTED\n"); binder_selftest_alloc_offset(alloc, end_offset, 0);
From: Carlos Llamas cmllamas@google.com
commit 0fa53349c3acba0239369ba4cd133740a408d246 upstream.
Bring back the original lockless design in binder_alloc to determine whether the buffer setup has been completed by the ->mmap() handler. However, this time use smp_load_acquire() and smp_store_release() to wrap all the ordering in a single macro call.
Also, add comments to make it evident that binder uses alloc->vma to determine when the binder_alloc has been fully initialized. In these scenarios acquiring the mmap_lock is not required.
Fixes: a43cfc87caaf ("android: binder: stop saving a pointer to the VMA") Cc: Liam Howlett liam.howlett@oracle.com Cc: Suren Baghdasaryan surenb@google.com Cc: stable@vger.kernel.org Signed-off-by: Carlos Llamas cmllamas@google.com Link: https://lore.kernel.org/r/20230502201220.1756319-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org [cmllamas: fixed minor merge conflict in binder_alloc_set_vma()] Signed-off-by: Carlos Llamas cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder_alloc.c | 22 +++++++--------------- 1 file changed, 7 insertions(+), 15 deletions(-)
--- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -309,29 +309,18 @@ err_no_vma: return vma ? -ENOMEM : -ESRCH; }
- static inline void binder_alloc_set_vma(struct binder_alloc *alloc, struct vm_area_struct *vma) { - /* - * If we see alloc->vma is not NULL, buffer data structures set up - * completely. Look at smp_rmb side binder_alloc_get_vma. - */ - smp_wmb(); - alloc->vma = vma; + /* pairs with smp_load_acquire in binder_alloc_get_vma() */ + smp_store_release(&alloc->vma, vma); }
static inline struct vm_area_struct *binder_alloc_get_vma( struct binder_alloc *alloc) { - struct vm_area_struct *vma = NULL; - - if (alloc->vma) { - /* Look at description in binder_alloc_set_vma */ - smp_rmb(); - vma = alloc->vma; - } - return vma; + /* pairs with smp_store_release in binder_alloc_set_vma() */ + return smp_load_acquire(&alloc->vma); }
static bool debug_low_async_space_locked(struct binder_alloc *alloc, int pid) @@ -394,6 +383,7 @@ static struct binder_buffer *binder_allo size_t size, data_offsets_size; int ret;
+ /* Check binder_alloc is fully initialized */ if (!binder_alloc_get_vma(alloc)) { binder_alloc_debug(BINDER_DEBUG_USER_ERROR, "%d: binder_alloc_buf, no vma\n", @@ -789,6 +779,8 @@ int binder_alloc_mmap_handler(struct bin buffer->free = 1; binder_insert_free_buffer(alloc, buffer); alloc->free_async_space = alloc->buffer_size / 2; + + /* Signal binder_alloc is fully initialized */ binder_alloc_set_vma(alloc, vma);
return 0;
From: Carlos Llamas cmllamas@google.com
commit d1d8875c8c13517f6fd1ff8d4d3e1ac366a17e07 upstream.
[ cmllamas: clean forward port from commit 015ac18be7de ("binder: fix UAF of alloc->vma in race with munmap()") in 5.10 stable. It is needed in mainline after the revert of commit a43cfc87caaf ("android: binder: stop saving a pointer to the VMA") as pointed out by Liam. The commit log and tags have been tweaked to reflect this. ]
In commit 720c24192404 ("ANDROID: binder: change down_write to down_read") binder assumed the mmap read lock is sufficient to protect alloc->vma inside binder_update_page_range(). This used to be accurate until commit dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap"), which now downgrades the mmap_lock after detaching the vma from the rbtree in munmap(). Then it proceeds to teardown and free the vma with only the read lock held.
This means that accesses to alloc->vma in binder_update_page_range() now will race with vm_area_free() in munmap() and can cause a UAF as shown in the following KASAN trace:
==================================================================
BUG: KASAN: use-after-free in vm_insert_page+0x7c/0x1f0
Read of size 8 at addr ffff16204ad00600 by task server/558

CPU: 3 PID: 558 Comm: server Not tainted 5.10.150-00001-gdc8dcf942daa #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x2a0
 show_stack+0x18/0x2c
 dump_stack+0xf8/0x164
 print_address_description.constprop.0+0x9c/0x538
 kasan_report+0x120/0x200
 __asan_load8+0xa0/0xc4
 vm_insert_page+0x7c/0x1f0
 binder_update_page_range+0x278/0x50c
 binder_alloc_new_buf+0x3f0/0xba0
 binder_transaction+0x64c/0x3040
 binder_thread_write+0x924/0x2020
 binder_ioctl+0x1610/0x2e5c
 __arm64_sys_ioctl+0xd4/0x120
 el0_svc_common.constprop.0+0xac/0x270
 do_el0_svc+0x38/0xa0
 el0_svc+0x1c/0x2c
 el0_sync_handler+0xe8/0x114
 el0_sync+0x180/0x1c0

Allocated by task 559:
 kasan_save_stack+0x38/0x6c
 __kasan_kmalloc.constprop.0+0xe4/0xf0
 kasan_slab_alloc+0x18/0x2c
 kmem_cache_alloc+0x1b0/0x2d0
 vm_area_alloc+0x28/0x94
 mmap_region+0x378/0x920
 do_mmap+0x3f0/0x600
 vm_mmap_pgoff+0x150/0x17c
 ksys_mmap_pgoff+0x284/0x2dc
 __arm64_sys_mmap+0x84/0xa4
 el0_svc_common.constprop.0+0xac/0x270
 do_el0_svc+0x38/0xa0
 el0_svc+0x1c/0x2c
 el0_sync_handler+0xe8/0x114
 el0_sync+0x180/0x1c0

Freed by task 560:
 kasan_save_stack+0x38/0x6c
 kasan_set_track+0x28/0x40
 kasan_set_free_info+0x24/0x4c
 __kasan_slab_free+0x100/0x164
 kasan_slab_free+0x14/0x20
 kmem_cache_free+0xc4/0x34c
 vm_area_free+0x1c/0x2c
 remove_vma+0x7c/0x94
 __do_munmap+0x358/0x710
 __vm_munmap+0xbc/0x130
 __arm64_sys_munmap+0x4c/0x64
 el0_svc_common.constprop.0+0xac/0x270
 do_el0_svc+0x38/0xa0
 el0_svc+0x1c/0x2c
 el0_sync_handler+0xe8/0x114
 el0_sync+0x180/0x1c0

[...]
==================================================================
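[ Sketching the interleaving described above, for illustration:

	binder task			munmap() task
	-----------			-------------
					mmap_write_lock()
					detach vma from the tree
					downgrade to mmap_read_lock()
	mmap_read_lock()
	vma = alloc->vma	/* stale pointer */
					vm_area_free(vma)
	vm_insert_page(vma, ...)	/* UAF */
]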
To prevent the race above, revert back to taking the mmap write lock inside binder_update_page_range(). One might expect an increase in mmap lock contention. However, binder already serializes these calls via the top level alloc->mutex. Also, no performance impact was shown when running the binder benchmark tests.
Fixes: c0fd2101781e ("Revert "android: binder: stop saving a pointer to the VMA"") Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by: Jann Horn jannh@google.com Closes: https://lore.kernel.org/all/20230518144052.xkj6vmddccq4v66b@revolver Cc: stable@vger.kernel.org Cc: Minchan Kim minchan@kernel.org Cc: Yang Shi yang.shi@linux.alibaba.com Cc: Liam Howlett liam.howlett@oracle.com Signed-off-by: Carlos Llamas cmllamas@google.com Acked-by: Todd Kjos tkjos@google.com Link: https://lore.kernel.org/r/20230519195950.1775656-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Carlos Llamas cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/android/binder_alloc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -212,7 +212,7 @@ static int binder_update_page_range(stru mm = alloc->vma_vm_mm;
if (mm) { - mmap_read_lock(mm); + mmap_write_lock(mm); vma = alloc->vma; }
@@ -270,7 +270,7 @@ static int binder_update_page_range(stru trace_binder_alloc_page_end(alloc, index); } if (mm) { - mmap_read_unlock(mm); + mmap_write_unlock(mm); mmput(mm); } return 0; @@ -303,7 +303,7 @@ err_page_ptr_cleared: } err_no_vma: if (mm) { - mmap_read_unlock(mm); + mmap_write_unlock(mm); mmput(mm); } return vma ? -ENOMEM : -ESRCH;
From: Nicolas Dichtel nicolas.dichtel@6wind.com
commit 3632679d9e4f879f49949bb5b050e0de553e4739 upstream.
With a raw socket bound to IPPROTO_RAW (i.e. with hdrincl enabled), the protocol field of the flow structure, built by raw_sendmsg() / rawv6_sendmsg(), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector.
For ipv6, the sin6_port field from 'struct sockaddr_in6' can be used to specify the protocol; just accept all values for an IPPROTO_RAW socket.
For ipv4, the sin_port field of 'struct sockaddr_in' cannot be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that userland can specify which protocol is used.
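[ A usage sketch of the new control message; fd, pkt, pkt_len and dst are assumed to be set up elsewhere for an IPPROTO_RAW socket:

	int proto = IPPROTO_GRE;	/* real payload protocol */
	char cbuf[CMSG_SPACE(sizeof(int))] = { 0 };
	struct iovec iov = { .iov_base = pkt, .iov_len = pkt_len };
	struct msghdr msg = {
		.msg_name	= &dst,
		.msg_namelen	= sizeof(dst),
		.msg_iov	= &iov,
		.msg_iovlen	= 1,
		.msg_control	= cbuf,
		.msg_controllen	= sizeof(cbuf),
	};
	struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

	cmsg->cmsg_level = SOL_IP;
	cmsg->cmsg_type  = IP_PROTOCOL;
	cmsg->cmsg_len   = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &proto, sizeof(proto));

	sendmsg(fd, &msg, 0);	/* policy lookup now sees IPPROTO_GRE */
]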
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: stable@vger.kernel.org Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Link: https://lore.kernel.org/r/20230522120820.1319391-1-nicolas.dichtel@6wind.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Nicolas Dichtel nicolas.dichtel@6wind.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/ip.h | 2 ++ include/uapi/linux/in.h | 2 ++ net/ipv4/ip_sockglue.c | 12 +++++++++++- net/ipv4/raw.c | 5 ++++- net/ipv6/raw.c | 3 ++- 5 files changed, 21 insertions(+), 3 deletions(-)
--- a/include/net/ip.h +++ b/include/net/ip.h @@ -75,6 +75,7 @@ struct ipcm_cookie { __be32 addr; int oif; struct ip_options_rcu *opt; + __u8 protocol; __u8 ttl; __s16 tos; char priority; @@ -95,6 +96,7 @@ static inline void ipcm_init_sk(struct i ipcm->sockc.tsflags = inet->sk.sk_tsflags; ipcm->oif = inet->sk.sk_bound_dev_if; ipcm->addr = inet->inet_saddr; + ipcm->protocol = inet->inet_num; }
#define IPCB(skb) ((struct inet_skb_parm*)((skb)->cb)) --- a/include/uapi/linux/in.h +++ b/include/uapi/linux/in.h @@ -159,6 +159,8 @@ struct in_addr { #define MCAST_MSFILTER 48 #define IP_MULTICAST_ALL 49 #define IP_UNICAST_IF 50 +#define IP_LOCAL_PORT_RANGE 51 +#define IP_PROTOCOL 52
#define MCAST_EXCLUDE 0 #define MCAST_INCLUDE 1 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -317,7 +317,14 @@ int ip_cmsg_send(struct sock *sk, struct ipc->tos = val; ipc->priority = rt_tos2priority(ipc->tos); break; - + case IP_PROTOCOL: + if (cmsg->cmsg_len != CMSG_LEN(sizeof(int))) + return -EINVAL; + val = *(int *)CMSG_DATA(cmsg); + if (val < 1 || val > 255) + return -EINVAL; + ipc->protocol = val; + break; default: return -EINVAL; } @@ -1724,6 +1731,9 @@ static int do_ip_getsockopt(struct sock case IP_MINTTL: val = inet->min_ttl; break; + case IP_PROTOCOL: + val = inet_sk(sk)->inet_num; + break; default: release_sock(sk); return -ENOPROTOOPT; --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -559,6 +559,9 @@ static int raw_sendmsg(struct sock *sk, }
ipcm_init_sk(&ipc, inet); + /* Keep backward compat */ + if (hdrincl) + ipc.protocol = IPPROTO_RAW;
if (msg->msg_controllen) { err = ip_cmsg_send(sk, msg, &ipc, false); @@ -626,7 +629,7 @@ static int raw_sendmsg(struct sock *sk,
flowi4_init_output(&fl4, ipc.oif, ipc.sockc.mark, tos, RT_SCOPE_UNIVERSE, - hdrincl ? IPPROTO_RAW : sk->sk_protocol, + hdrincl ? ipc.protocol : sk->sk_protocol, inet_sk_flowi_flags(sk) | (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0), daddr, saddr, 0, 0, sk->sk_uid); --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -828,7 +828,8 @@ static int rawv6_sendmsg(struct sock *sk
if (!proto) proto = inet->inet_num; - else if (proto != inet->inet_num) + else if (proto != inet->inet_num && + inet->inet_num != IPPROTO_RAW) return -EINVAL;
if (proto > 255)
From: Paul Blakey paulb@nvidia.com
commit 9b7c68b3911aef84afa4cbfc31bce20f10570d51 upstream.
Currently, offloaded conntrack entries (flows) can only be deleted after they are removed from offload, which happens either via timeout, a TCP state change, or tc ct rule deletion. This can cause issues for users wishing to manually delete or flush existing entries.
Support deletion of offloaded conntrack entries.
Example usage:

 # Delete all offloaded (and non offloaded) conntrack entries
 # whose source address is 1.2.3.4
 $ conntrack -D -s 1.2.3.4

 # Delete all entries
 $ conntrack -F
Signed-off-by: Paul Blakey paulb@nvidia.com Reviewed-by: Simon Horman simon.horman@corigine.com Acked-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Florian Westphal fw@strlen.de Cc: Demi Marie Obenour demi@invisiblethingslab.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_conntrack_netlink.c | 8 -------- 1 file changed, 8 deletions(-)
--- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1546,9 +1546,6 @@ static const struct nla_policy ct_nla_po
static int ctnetlink_flush_iterate(struct nf_conn *ct, void *data) { - if (test_bit(IPS_OFFLOAD_BIT, &ct->status)) - return 0; - return ctnetlink_filter_match(ct, data); }
@@ -1612,11 +1609,6 @@ static int ctnetlink_del_conntrack(struc
ct = nf_ct_tuplehash_to_ctrack(h);
- if (test_bit(IPS_OFFLOAD_BIT, &ct->status)) { - nf_ct_put(ct); - return -EBUSY; - } - if (cda[CTA_ID]) { __be32 id = nla_get_be32(cda[CTA_ID]);
On Thu, 1 Jun 2023 at 18:54, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> This is the start of the stable review cycle for the 5.15.115 release. There are 42 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
> Responses should be made by Sat, 03 Jun 2023 13:19:19 +0000. Anything received after that time might be too late.
> The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.115-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
> thanks,
> greg k-h
The following build errors were noticed on 6.1 and 5.15.
drivers/dma/at_xdmac.c: In function 'atmel_xdmac_resume':
drivers/dma/at_xdmac.c:2049:9: error: implicit declaration of function 'pm_runtime_get_noresume' [-Werror=implicit-function-declaration]
 2049 |         pm_runtime_get_noresume(atxdmac->dev);
      |         ^~~~~~~~~~~~~~~~~~~~~~~
drivers/dma/at_xdmac.c:2049:40: error: 'struct at_xdmac' has no member named 'dev'
 2049 |         pm_runtime_get_noresume(atxdmac->dev);
      |                                        ^~
cc1: some warnings being treated as errors
reported link: https://lore.kernel.org/stable/CA+G9fYswtPyrYJbwcGFhc5o7mkRmWZEWCCeSjmR64M+N...
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
--
Linaro LKFT
https://lkft.linaro.org