The macro FAN_FROM_REG evaluates its arguments multiple times. When used
in lockless contexts involving shared driver data, this leads to
Time-of-Check to Time-of-Use (TOCTOU) race conditions, potentially
causing divide-by-zero errors.
Convert the macro to a static function. This guarantees that arguments
are evaluated only once (pass-by-value), preventing the race
conditions.
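As a minimal userspace sketch of the hazard (illustrative only, not
driver code; the macro mirrors the shape of the one above):

  #include <stdio.h>

  /* Same shape as the old macro: 'val' is evaluated up to three times. */
  #define FAN_FROM_REG(val, div) ((val) == 0 ? -1 : \
                                  ((val) == 255 ? 0 : \
                                   1350000 / ((val) * (div))))

  /* Imagine another thread updates this without locking. */
  volatile int shared_reg = 100;

  int main(void)
  {
          /*
           * The expansion reads shared_reg three times. If a writer
           * stores 0 after the '== 0' check but before the division,
           * the division is by zero. A static function such as
           * fan_from_reg(shared_reg, 2) snapshots the value exactly
           * once, closing that window.
           */
          printf("%d\n", FAN_FROM_REG(shared_reg, 2));
          return 0;
  }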
Additionally, in store_fan_div, move the calculation of the minimum
limit inside the update lock. This ensures that the read-modify-write
sequence operates on consistent data.
To keep the change minimal, only macros that evaluate their arguments
multiple times and are used in lockless contexts are converted.
Link: https://lore.kernel.org/all/CALbr=LYJ_ehtp53HXEVkSpYoub+XYSTU8Rg=o1xxMJ8=5z…
Fixes: 9873964d6eb2 ("[PATCH] HWMON: w83791d: New hardware monitoring driver for the Winbond W83791D")
Cc: stable(a)vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02(a)gmail.com>
---
Based on the discussion in the link, I will submit a series of patches to
address TOCTOU issues in the hwmon subsystem by converting macros to
functions or adjusting locking where appropriate.
---
drivers/hwmon/w83791d.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/hwmon/w83791d.c b/drivers/hwmon/w83791d.c
index ace854b370a0..996e36951f9d 100644
--- a/drivers/hwmon/w83791d.c
+++ b/drivers/hwmon/w83791d.c
@@ -218,9 +218,14 @@ static u8 fan_to_reg(long rpm, int div)
return clamp_val((1350000 + rpm * div / 2) / (rpm * div), 1, 254);
}
-#define FAN_FROM_REG(val, div) ((val) == 0 ? -1 : \
- ((val) == 255 ? 0 : \
- 1350000 / ((val) * (div))))
+static int fan_from_reg(int val, int div)
+{
+ if (val == 0)
+ return -1;
+ if (val == 255)
+ return 0;
+ return 1350000 / (val * div);
+}
/* for temp1 which is 8-bit resolution, LSB = 1 degree Celsius */
#define TEMP1_FROM_REG(val) ((val) * 1000)
@@ -521,7 +526,7 @@ static ssize_t show_##reg(struct device *dev, struct device_attribute *attr, \
struct w83791d_data *data = w83791d_update_device(dev); \
int nr = sensor_attr->index; \
return sprintf(buf, "%d\n", \
- FAN_FROM_REG(data->reg[nr], DIV_FROM_REG(data->fan_div[nr]))); \
+ fan_from_reg(data->reg[nr], DIV_FROM_REG(data->fan_div[nr]))); \
}
show_fan_reg(fan);
@@ -585,10 +590,10 @@ static ssize_t store_fan_div(struct device *dev, struct device_attribute *attr,
if (err)
return err;
+ mutex_lock(&data->update_lock);
/* Save fan_min */
- min = FAN_FROM_REG(data->fan_min[nr], DIV_FROM_REG(data->fan_div[nr]));
+ min = fan_from_reg(data->fan_min[nr], DIV_FROM_REG(data->fan_div[nr]));
- mutex_lock(&data->update_lock);
data->fan_div[nr] = div_to_reg(nr, val);
switch (nr) {
--
2.43.0
From: Sakari Ailus <sakari.ailus(a)linux.intel.com>
[ Upstream commit 5d010473cdeaabf6a2d3a9e2aed2186c1b73c213 ]
Calling fwnode_get_next_child_node() in the ACPI implementation of the
fwnode property API is somewhat problematic, as the ACPI implementation
is itself used to implement fwnode_get_next_child_node(). Instead of
using fwnode_get_next_child_node() in acpi_graph_get_next_endpoint(),
call acpi_get_next_subnode() directly.
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas(a)ideasonboard.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron(a)huawei.com>
Link: https://patch.msgid.link/20251001104320.1272752-3-sakari.ailus@linux.intel.…
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
LLM Generated explanations, may be completely bogus:
## Analysis
### 1. COMMIT MESSAGE ANALYSIS
The commit message states:
- Problem: `acpi_graph_get_next_endpoint()` calls
`fwnode_get_next_child_node()`, which dispatches back to ACPI code,
creating unnecessary indirection.
- Solution: Call `acpi_get_next_subnode()` directly instead.
No "Cc: stable(a)vger.kernel.org" tag, no "Fixes:" tag, no explicit bug
report link. The message says "somewhat problematic," indicating an
architectural issue rather than a critical bug.
### 2. CODE CHANGE ANALYSIS
The diff shows 4 replacements in `acpi_graph_get_next_endpoint()`:
- Line 1475: `fwnode_get_next_child_node(fwnode, port)` →
`acpi_get_next_subnode(fwnode, port)`
- Line 1493: `fwnode_get_next_child_node(port, prev)` →
`acpi_get_next_subnode(port, prev)`
- Line 1495: `fwnode_get_next_child_node(fwnode, port)` →
`acpi_get_next_subnode(fwnode, port)`
- Line 1499: `fwnode_get_next_child_node(port, NULL)` →
`acpi_get_next_subnode(port, NULL)`
Call chain:
1. `fwnode_get_next_child_node()` dispatches via `fwnode_call_ptr_op()`
to the fwnode-specific implementation.
2. For ACPI fwnodes, it calls `acpi_get_next_present_subnode()`
(registered at line 1747).
3. `acpi_get_next_present_subnode()` filters non-present device nodes
and calls `acpi_get_next_subnode()`.
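Condensed, the indirection being removed looks roughly like this (a
simplified sketch reconstructed from the call chain above, not verbatim
kernel code):

```c
/* Generic helper (drivers/base/property.c, simplified): dispatches
 * through the fwnode ops table. */
struct fwnode_handle *
fwnode_get_next_child_node(const struct fwnode_handle *fwnode,
			   struct fwnode_handle *child)
{
	return fwnode_call_ptr_op(fwnode, get_next_child_node, child);
}

/* For ACPI fwnodes the registered op is acpi_get_next_present_subnode(),
 * which filters non-present device nodes and then calls
 * acpi_get_next_subnode(). The patch lets the ACPI graph code call
 * acpi_get_next_subnode() directly, skipping the round trip. */
```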
Why the change is safe:
- Graph endpoints are ACPI data nodes (checked by `is_acpi_graph_node()`
at line 1448: `is_acpi_data_node(fwnode)`).
- `acpi_get_next_present_subnode()` only filters non-present device
nodes (lines 1407-1408), not data nodes.
- Therefore, for graph endpoints, `acpi_get_next_subnode()` and
`fwnode_get_next_child_node()` behave the same.
### 3. CLASSIFICATION
This is a bug fix addressing an architectural issue:
- Removes unnecessary indirection in ACPI-specific code.
- Avoids a circular dependency pattern (ACPI → generic → ACPI).
- Functionally equivalent for graph endpoints.
Not a feature addition, not a new API, not a refactor.
### 4. SCOPE AND RISK ASSESSMENT
- Scope: 4 lines changed in one function in one file.
- Risk: Very low — same behavior for graph endpoints, cleaner
architecture.
- Complexity: Low — direct function call replacement.
### 5. USER IMPACT
- Who is affected: Users of ACPI graph endpoints (e.g., camera/media
drivers, device tree-like ACPI usage).
- Severity: Low — architectural improvement, not a visible bug fix.
- Likelihood: The "somewhat problematic" wording suggests no immediate
user-visible issue.
### 6. STABILITY INDICATORS
- Reviewed-by: Laurent Pinchart, Jonathan Cameron
- Signed-off-by: Rafael J. Wysocki (ACPI maintainer)
- No "Tested-by:" tags
- Commit date: October 1, 2025 (recent)
### 7. DEPENDENCY CHECK
- `acpi_get_next_subnode()` exists in the same file and has been present
for years.
- No external dependencies introduced.
- Should apply cleanly to stable trees that have this code.
### 8. HISTORICAL CONTEXT
Related commits:
- `79389a83bc388`: Introduced `acpi_graph_get_next_endpoint()` with
`fwnode_get_next_child_node()` calls.
- `48698e6cf44c3`: Introduced `acpi_get_next_present_subnode()` to
filter non-present devices.
- `5d010473cdeaa` (this commit): Removes the indirection.
The pattern existed since the function was introduced; this commit
cleans it up.
### 9. STABLE KERNEL CRITERIA EVALUATION
- Obviously correct: Yes — direct call instead of indirection.
- Fixes a real bug: Yes — architectural issue that could cause problems.
- Important issue: Moderate — architectural improvement, not a critical
bug.
- Small and contained: Yes — 4 lines, single function.
- No new features: Yes — same behavior, cleaner code.
- Applies cleanly: Yes — should apply without conflicts.
### 10. RISK VS BENEFIT
Benefits:
- Removes unnecessary indirection.
- Avoids circular dependency pattern.
- Improves code clarity.
- No functional change for graph endpoints.
Risks:
- Very low — functionally equivalent change.
- No new code paths or logic changes.
### 11. CONCERNS AND CONSIDERATIONS
- No "Cc: stable" tag, but that alone doesn't disqualify.
- Recent commit (Oct 2025) — hasn't been in mainline long.
- No explicit bug report or user complaint mentioned.
- Architectural improvement rather than a critical fix.
### CONCLUSION
This is a small, correct fix that removes unnecessary indirection in
ACPI code. It fixes an architectural issue and is functionally
equivalent for graph endpoints. It meets stable kernel criteria:
correct, fixes a real issue, small scope, no new features, and should
apply cleanly.
However, it's an architectural improvement rather than a critical bug
fix, and there's no explicit stable tag or user-visible bug report. The
"somewhat problematic" wording suggests it may not cause immediate
visible problems.
Given the conservative nature of stable trees and the lack of evidence
of user-visible impact, this is borderline but leans toward acceptable
for stable backporting due to its correctness, small scope, and
architectural benefit.
**YES**
drivers/acpi/property.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/acpi/property.c b/drivers/acpi/property.c
index 43d5e457814e1..76158b1399029 100644
--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -1472,7 +1472,7 @@ static struct fwnode_handle *acpi_graph_get_next_endpoint(
if (!prev) {
do {
- port = fwnode_get_next_child_node(fwnode, port);
+ port = acpi_get_next_subnode(fwnode, port);
/*
* The names of the port nodes begin with "port@"
* followed by the number of the port node and they also
@@ -1490,13 +1490,13 @@ static struct fwnode_handle *acpi_graph_get_next_endpoint(
if (!port)
return NULL;
- endpoint = fwnode_get_next_child_node(port, prev);
+ endpoint = acpi_get_next_subnode(port, prev);
while (!endpoint) {
- port = fwnode_get_next_child_node(fwnode, port);
+ port = acpi_get_next_subnode(fwnode, port);
if (!port)
break;
if (is_acpi_graph_node(port, "port"))
- endpoint = fwnode_get_next_child_node(port, NULL);
+ endpoint = acpi_get_next_subnode(port, NULL);
}
/*
--
2.51.0
The local variable 'val' was never clamped to the [-75000, 180000] range
because the return value of clamp_val() was not used. Fix this by
assigning the clamped value back to 'val', and use clamp() instead of
clamp_val().
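A minimal userspace sketch of the difference (a stand-in clamp() is
assumed to have the same return-a-value semantics as the kernel helper):

  #include <stdio.h>

  /* Stand-in with clamp()-like semantics: returns the clamped value,
   * never modifies its argument. */
  #define clamp(v, lo, hi) ((v) < (lo) ? (lo) : ((v) > (hi) ? (hi) : (v)))

  int main(void)
  {
          long val = 250000;

          clamp(val, -75000L, 180000L);        /* result discarded: no-op */
          printf("%ld\n", val);                /* prints 250000 */

          val = clamp(val, -75000L, 180000L);  /* assigned back: effective */
          printf("%ld\n", val);                /* prints 180000 */
          return 0;
  }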
Cc: stable(a)vger.kernel.org
Fixes: a557a92e6881 ("net: phy: marvell-88q2xxx: add support for temperature sensor")
Signed-off-by: Thorsten Blum <thorsten.blum(a)linux.dev>
---
drivers/net/phy/marvell-88q2xxx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/marvell-88q2xxx.c b/drivers/net/phy/marvell-88q2xxx.c
index f3d83b04c953..201dee1a1698 100644
--- a/drivers/net/phy/marvell-88q2xxx.c
+++ b/drivers/net/phy/marvell-88q2xxx.c
@@ -698,7 +698,7 @@ static int mv88q2xxx_hwmon_write(struct device *dev,
switch (attr) {
case hwmon_temp_max:
- clamp_val(val, -75000, 180000);
+ val = clamp(val, -75000, 180000);
val = (val / 1000) + 75;
val = FIELD_PREP(MDIO_MMD_PCS_MV_TEMP_SENSOR3_INT_THRESH_MASK,
val);
--
From: Haotien Hsu <haotienh(a)nvidia.com>
The UTMIP sleepwalk programming sequence requires asserting both
LINEVAL_WALK_EN and WAKE_WALK_EN when enabling the sleepwalk logic.
However, the current code mistakenly clears WAKE_WALK_EN, which
prevents the sleepwalk trigger from operating correctly.
Fix this by asserting WAKE_WALK_EN together with LINEVAL_WALK_EN.
Fixes: 1f9cab6cc20c ("phy: tegra: xusb: Add wake/sleepwalk for Tegra186")
Cc: stable(a)vger.kernel.org
Signed-off-by: Haotien Hsu <haotienh(a)nvidia.com>
Signed-off-by: Wayne Chang <waynec(a)nvidia.com>
---
drivers/phy/tegra/xusb-tegra186.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/phy/tegra/xusb-tegra186.c b/drivers/phy/tegra/xusb-tegra186.c
index e818f6c3980e..b2a76710c0c4 100644
--- a/drivers/phy/tegra/xusb-tegra186.c
+++ b/drivers/phy/tegra/xusb-tegra186.c
@@ -401,8 +401,7 @@ static int tegra186_utmi_enable_phy_sleepwalk(struct tegra_xusb_lane *lane,
/* enable the trigger of the sleepwalk logic */
value = ao_readl(priv, XUSB_AO_UTMIP_SLEEPWALK_CFG(index));
- value |= LINEVAL_WALK_EN;
- value &= ~WAKE_WALK_EN;
+ value |= LINEVAL_WALK_EN | WAKE_WALK_EN;
ao_writel(priv, value, XUSB_AO_UTMIP_SLEEPWALK_CFG(index));
/* reset the walk pointer and clear the alarm of the sleepwalk logic,
--
2.25.1
From: Vladimir Oltean <vladimir.oltean(a)nxp.com>
[ Upstream commit 5f2b28b79d2d1946ee36ad8b3dc0066f73c90481 ]
There are actually 2 problems:
- deleting the last element doesn't require the memmove of elements
[i + 1, end) over it. Actually, element i+1 is out of bounds.
- The memmove itself should move size - i - 1 elements, because the last
element is out of bounds.
The out-of-bounds element still remains out of bounds after being
accessed, so the problem is only that we touch it, not that it comes
into active use. But I suppose it can lead to issues if the
out-of-bounds element is part of an unmapped page.
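The corrected pattern, as a standalone userspace sketch (fixed-size
elements assumed):

  #include <stdio.h>
  #include <string.h>

  /* Delete element i from an array of 'count' elements by shifting the
   * tail down one slot. */
  static void delete_entry(int *a, int count, int i)
  {
          /* When deleting the last element (i == count - 1) there is no
           * tail to move, and &a[i + 1] would be past the end. */
          if (i + 1 < count)
                  memmove(&a[i], &a[i + 1],
                          (count - i - 1) * sizeof(a[0]));
  }

  int main(void)
  {
          int a[] = { 10, 20, 30, 40 };

          delete_entry(a, 4, 1);  /* logical contents now 10, 30, 40 */
          printf("%d %d %d\n", a[0], a[1], a[2]);

          delete_entry(a, 3, 2);  /* last element: no memmove at all */
          return 0;
  }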
Fixes: 6666cebc5e30 ("net: dsa: sja1105: Add support for VLAN operations")
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://patch.msgid.link/20250318115716.2124395-4-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Chen Yu <xnguchen(a)sina.cn>
---
drivers/net/dsa/sja1105/sja1105_static_config.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_static_config.c b/drivers/net/dsa/sja1105/sja1105_static_config.c
index baba204ad62f..2ac91fe2a79b 100644
--- a/drivers/net/dsa/sja1105/sja1105_static_config.c
+++ b/drivers/net/dsa/sja1105/sja1105_static_config.c
@@ -1921,8 +1921,10 @@ int sja1105_table_delete_entry(struct sja1105_table *table, int i)
if (i > table->entry_count)
return -ERANGE;
- memmove(entries + i * entry_size, entries + (i + 1) * entry_size,
- (table->entry_count - i) * entry_size);
+ if (i + 1 < table->entry_count) {
+ memmove(entries + i * entry_size, entries + (i + 1) * entry_size,
+ (table->entry_count - i - 1) * entry_size);
+ }
table->entry_count--;
--
2.17.1
The get_user/put_user change didn't spend time in linux-next and
seems a bit too risky to rush. I'm keeping it in my tree
and we'll get it in the next cycle.
The following changes since commit ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d:
Linux 6.18-rc7 (2025-11-23 14:53:16 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 205dd7a5d6ad6f4c8e8fcd3c3b95a7c0e7067fee:
virtio_pci: drop kernel.h (2025-11-30 18:02:43 -0500)
----------------------------------------------------------------
virtio,vhost: fixes, cleanups
Just a bunch of fixes and cleanups, mostly very simple. Several
features are merged through net-next this time around.
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Alok Tiwari (3):
virtio_vdpa: fix misleading return in void function
vdpa/mlx5: Fix incorrect error code reporting in query_virtqueues
vdpa/pds: use %pe for ERR_PTR() in event handler registration
Kriish Sharma (1):
virtio: fix kernel-doc for mapping/free_coherent functions
Marco Crivellari (2):
virtio_balloon: add WQ_PERCPU to alloc_workqueue users
vduse: add WQ_PERCPU to alloc_workqueue users
Miaoqian Lin (1):
virtio: vdpa: Fix reference count leak in octep_sriov_enable()
Michael S. Tsirkin (11):
virtio: fix typo in virtio_device_ready() comment
virtio: fix whitespace in virtio_config_ops
virtio: fix grammar in virtio_queue_info docs
virtio: fix grammar in virtio_map_ops docs
virtio: standardize Returns documentation style
virtio: fix virtqueue_set_affinity() docs
virtio: fix map ops comment
virtio: clean up features qword/dword terms
vhost/test: add test specific macro for features
vhost: switch to arrays of feature bits
virtio_pci: drop kernel.h
Mike Christie (1):
vhost: Fix kthread worker cgroup failure handling
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 +-
drivers/vdpa/octeon_ep/octep_vdpa_main.c | 1 +
drivers/vdpa/pds/vdpa_dev.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 3 ++-
drivers/vhost/net.c | 29 +++++++++++-----------
drivers/vhost/scsi.c | 9 ++++---
drivers/vhost/test.c | 10 ++++++--
drivers/vhost/vhost.c | 4 ++-
drivers/vhost/vhost.h | 42 ++++++++++++++++++++++++++------
drivers/vhost/vsock.c | 10 +++++---
drivers/virtio/virtio.c | 12 ++++-----
drivers/virtio/virtio_balloon.c | 3 ++-
drivers/virtio/virtio_debug.c | 10 ++++----
drivers/virtio/virtio_pci_modern_dev.c | 6 ++---
drivers/virtio/virtio_ring.c | 7 +++---
drivers/virtio/virtio_vdpa.c | 2 +-
include/linux/virtio.h | 2 +-
include/linux/virtio_config.h | 24 +++++++++---------
include/linux/virtio_features.h | 29 +++++++++++-----------
include/linux/virtio_pci_modern.h | 8 +++---
include/uapi/linux/virtio_pci.h | 2 +-
21 files changed, 131 insertions(+), 86 deletions(-)
When a VM boots with one virtio-crypto PCI device and the builtin
backend, and an openssl benchmark runs with multiple processes, such as
openssl speed -evp aes-128-cbc -engine afalg -seconds 10 -multi 32
the openssl processes hang and an error like this is reported:
virtio_crypto virtio0: dataq.0:id 3 is not a head!
It seems that the data virtqueue needs protection while it is handled
in the virtio done notification path. With spinlock protection added
in virtcrypto_done_task(), the openssl benchmark with multiple
processes works well.
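Condensed, the resulting locking pattern in virtcrypto_done_task()
looks like this (a sketch of the diff below; the rationale for dropping
the lock around the callback is an assumption, namely that the callback
may submit new requests that take the same lock):

  spin_lock_irqsave(&data_vq->lock, flags);
  do {
          virtqueue_disable_cb(vq);
          while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) {
                  /* Drop the lock around the completion callback; it
                   * may queue further requests that take
                   * data_vq->lock themselves. */
                  spin_unlock_irqrestore(&data_vq->lock, flags);
                  if (vc_req->alg_cb)
                          vc_req->alg_cb(vc_req, len);
                  spin_lock_irqsave(&data_vq->lock, flags);
          }
  } while (!virtqueue_enable_cb(vq));
  spin_unlock_irqrestore(&data_vq->lock, flags);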
Fixes: fed93fb62e05 ("crypto: virtio - Handle dataq logic with tasklet")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bibo Mao <maobibo(a)loongson.cn>
---
drivers/crypto/virtio/virtio_crypto_core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/crypto/virtio/virtio_crypto_core.c b/drivers/crypto/virtio/virtio_crypto_core.c
index 3d241446099c..ccc6b5c1b24b 100644
--- a/drivers/crypto/virtio/virtio_crypto_core.c
+++ b/drivers/crypto/virtio/virtio_crypto_core.c
@@ -75,15 +75,20 @@ static void virtcrypto_done_task(unsigned long data)
struct data_queue *data_vq = (struct data_queue *)data;
struct virtqueue *vq = data_vq->vq;
struct virtio_crypto_request *vc_req;
+ unsigned long flags;
unsigned int len;
+ spin_lock_irqsave(&data_vq->lock, flags);
do {
virtqueue_disable_cb(vq);
while ((vc_req = virtqueue_get_buf(vq, &len)) != NULL) {
+ spin_unlock_irqrestore(&data_vq->lock, flags);
if (vc_req->alg_cb)
vc_req->alg_cb(vc_req, len);
+ spin_lock_irqsave(&data_vq->lock, flags);
}
} while (!virtqueue_enable_cb(vq));
+ spin_unlock_irqrestore(&data_vq->lock, flags);
}
static void virtcrypto_dataq_callback(struct virtqueue *vq)
--
2.39.3
The below "No resource for ep" warning appears when a StartTransfer
command is issued for bulk or interrupt endpoints in
`dwc3_gadget_ep_enable` while a previous StartTransfer on the same
endpoint is still in progress. Gadget function drivers can invoke
`usb_ep_enable` (which triggers a new StartTransfer command) before the
earlier transfer has completed. Because the previous StartTransfer is
still active, `dwc3_gadget_ep_disable` can skip the required
`EndTransfer` due to `DWC3_EP_DELAY_STOP`, leaving the endpoint
resources busy with the previous StartTransfer and triggering the
"No resource for ep" warning from the dwc3 driver.
To resolve this, `dwc3_gadget_ep_enable` now checks the
`DWC3_EP_TRANSFER_STARTED` flag before issuing a new
StartTransfer. By preventing a second StartTransfer on an already busy
endpoint, the resource conflict is eliminated, the warning disappears,
and potential kernel panics caused by `panic_on_warn` are avoided.
------------[ cut here ]------------
dwc3 13200000.dwc3: No resource for ep1out
WARNING: CPU: 0 PID: 700 at drivers/usb/dwc3/gadget.c:398 dwc3_send_gadget_ep_cmd+0x2f8/0x76c
Call trace:
dwc3_send_gadget_ep_cmd+0x2f8/0x76c
__dwc3_gadget_ep_enable+0x490/0x7c0
dwc3_gadget_ep_enable+0x6c/0xe4
usb_ep_enable+0x5c/0x15c
mp_eth_stop+0xd4/0x11c
__dev_close_many+0x160/0x1c8
__dev_change_flags+0xfc/0x220
dev_change_flags+0x24/0x70
devinet_ioctl+0x434/0x524
inet_ioctl+0xa8/0x224
sock_do_ioctl+0x74/0x128
sock_ioctl+0x3bc/0x468
__arm64_sys_ioctl+0xa8/0xe4
invoke_syscall+0x58/0x10c
el0_svc_common+0xa8/0xdc
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x88
el0t_64_sync_handler+0x70/0xbc
el0t_64_sync+0x1a8/0x1ac
Fixes: a97ea994605e ("usb: dwc3: gadget: offset Start Transfer latency for bulk EPs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Selvarasu Ganesan <selvarasu.g(a)samsung.com>
---
Changes in v2:
- Removed change-id.
- Updated commit message.
Link to v1: https://lore.kernel.org/linux-usb/20251117152812.622-1-selvarasu.g@samsung.…
---
drivers/usb/dwc3/gadget.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 1f67fb6aead5..8d3caa71ea12 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -963,8 +963,9 @@ static int __dwc3_gadget_ep_enable(struct dwc3_ep *dep, unsigned int action)
* Issue StartTransfer here with no-op TRB so we can always rely on No
* Response Update Transfer command.
*/
- if (usb_endpoint_xfer_bulk(desc) ||
- usb_endpoint_xfer_int(desc)) {
+ if ((usb_endpoint_xfer_bulk(desc) ||
+ usb_endpoint_xfer_int(desc)) &&
+ !(dep->flags & DWC3_EP_TRANSFER_STARTED)) {
struct dwc3_gadget_ep_cmd_params params;
struct dwc3_trb *trb;
dma_addr_t trb_dma;
--
2.34.1
Currently, kvfree_rcu_barrier() flushes RCU sheaves across all slab
caches when a cache is destroyed. This is unnecessary; only the RCU
sheaves belonging to the cache being destroyed need to be flushed.
As suggested by Vlastimil Babka, introduce a weaker form of
kvfree_rcu_barrier() that operates on a specific slab cache.
Factor out flush_rcu_sheaves_on_cache() from flush_all_rcu_sheaves() and
call it from flush_all_rcu_sheaves() and kvfree_rcu_barrier_on_cache().
Call kvfree_rcu_barrier_on_cache() instead of kvfree_rcu_barrier() on
cache destruction.
The performance benefit is evaluated on a 12 core 24 threads AMD Ryzen
5900X machine (1 socket), by loading slub_kunit module.
Before:
Total calls: 19
Average latency (us): 18127
Total time (us): 344414
After:
Total calls: 19
Average latency (us): 10066
Total time (us): 191264
Two performance regressions have been reported:
- stress module loader test's runtime increases by 50-60% (Daniel)
- internal graphics test's runtime on Tegra23 increases by 35% (Jon)
They are fixed by this change.
Suggested-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: ec66e0d59952 ("slab: add sheaf support for batching kfree_rcu() operations")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/linux-mm/1bda09da-93be-4737-aef0-d47f8c5c9301@suse.…
Reported-and-tested-by: Daniel Gomez <da.gomez(a)samsung.com>
Closes: https://lore.kernel.org/linux-mm/0406562e-2066-4cf8-9902-b2b0616dd742@kerne…
Reported-and-tested-by: Jon Hunter <jonathanh(a)nvidia.com>
Closes: https://lore.kernel.org/linux-mm/e988eff6-1287-425e-a06c-805af5bbf262@nvidi…
Signed-off-by: Harry Yoo <harry.yoo(a)oracle.com>
---
No code change, added proper tags and updated changelog.
include/linux/slab.h | 5 ++++
mm/slab.h | 1 +
mm/slab_common.c | 52 +++++++++++++++++++++++++++++------------
mm/slub.c | 55 ++++++++++++++++++++++++--------------------
4 files changed, 73 insertions(+), 40 deletions(-)
diff --git a/include/linux/slab.h b/include/linux/slab.h
index cf443f064a66..937c93d44e8c 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -1149,6 +1149,10 @@ static inline void kvfree_rcu_barrier(void)
{
rcu_barrier();
}
+static inline void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
+{
+ rcu_barrier();
+}
static inline void kfree_rcu_scheduler_running(void) { }
#else
@@ -1156,6 +1160,7 @@ void kvfree_rcu_barrier(void);
void kfree_rcu_scheduler_running(void);
#endif
+void kvfree_rcu_barrier_on_cache(struct kmem_cache *s);
/**
* kmalloc_size_roundup - Report allocation bucket size for the given size
diff --git a/mm/slab.h b/mm/slab.h
index f730e012553c..e767aa7e91b0 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -422,6 +422,7 @@ static inline bool is_kmalloc_normal(struct kmem_cache *s)
bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj);
void flush_all_rcu_sheaves(void);
+void flush_rcu_sheaves_on_cache(struct kmem_cache *s);
#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
SLAB_CACHE_DMA32 | SLAB_PANIC | \
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 84dfff4f7b1f..dd8a49d6f9cc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -492,7 +492,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
return;
/* in-flight kfree_rcu()'s may include objects from our cache */
- kvfree_rcu_barrier();
+ kvfree_rcu_barrier_on_cache(s);
if (IS_ENABLED(CONFIG_SLUB_RCU_DEBUG) &&
(s->flags & SLAB_TYPESAFE_BY_RCU)) {
@@ -2038,25 +2038,13 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
}
EXPORT_SYMBOL_GPL(kvfree_call_rcu);
-/**
- * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete.
- *
- * Note that a single argument of kvfree_rcu() call has a slow path that
- * triggers synchronize_rcu() following by freeing a pointer. It is done
- * before the return from the function. Therefore for any single-argument
- * call that will result in a kfree() to a cache that is to be destroyed
- * during module exit, it is developer's responsibility to ensure that all
- * such calls have returned before the call to kmem_cache_destroy().
- */
-void kvfree_rcu_barrier(void)
+static inline void __kvfree_rcu_barrier(void)
{
struct kfree_rcu_cpu_work *krwp;
struct kfree_rcu_cpu *krcp;
bool queued;
int i, cpu;
- flush_all_rcu_sheaves();
-
/*
* Firstly we detach objects and queue them over an RCU-batch
* for all CPUs. Finally queued works are flushed for each CPU.
@@ -2118,8 +2106,43 @@ void kvfree_rcu_barrier(void)
}
}
}
+
+/**
+ * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete.
+ *
+ * Note that a single argument of kvfree_rcu() call has a slow path that
+ * triggers synchronize_rcu() following by freeing a pointer. It is done
+ * before the return from the function. Therefore for any single-argument
+ * call that will result in a kfree() to a cache that is to be destroyed
+ * during module exit, it is developer's responsibility to ensure that all
+ * such calls have returned before the call to kmem_cache_destroy().
+ */
+void kvfree_rcu_barrier(void)
+{
+ flush_all_rcu_sheaves();
+ __kvfree_rcu_barrier();
+}
EXPORT_SYMBOL_GPL(kvfree_rcu_barrier);
+/**
+ * kvfree_rcu_barrier_on_cache - Wait for in-flight kvfree_rcu() calls on a
+ * specific slab cache.
+ * @s: slab cache to wait for
+ *
+ * See the description of kvfree_rcu_barrier() for details.
+ */
+void kvfree_rcu_barrier_on_cache(struct kmem_cache *s)
+{
+ if (s->cpu_sheaves)
+ flush_rcu_sheaves_on_cache(s);
+ /*
+ * TODO: Introduce a version of __kvfree_rcu_barrier() that works
+ * on a specific slab cache.
+ */
+ __kvfree_rcu_barrier();
+}
+EXPORT_SYMBOL_GPL(kvfree_rcu_barrier_on_cache);
+
static unsigned long
kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
{
@@ -2215,4 +2238,3 @@ void __init kvfree_rcu_init(void)
}
#endif /* CONFIG_KVFREE_RCU_BATCHED */
-
diff --git a/mm/slub.c b/mm/slub.c
index 785e25a14999..7cec2220712b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -4118,42 +4118,47 @@ static void flush_rcu_sheaf(struct work_struct *w)
/* needed for kvfree_rcu_barrier() */
-void flush_all_rcu_sheaves(void)
+void flush_rcu_sheaves_on_cache(struct kmem_cache *s)
{
struct slub_flush_work *sfw;
- struct kmem_cache *s;
unsigned int cpu;
- cpus_read_lock();
- mutex_lock(&slab_mutex);
+ mutex_lock(&flush_lock);
- list_for_each_entry(s, &slab_caches, list) {
- if (!s->cpu_sheaves)
- continue;
+ for_each_online_cpu(cpu) {
+ sfw = &per_cpu(slub_flush, cpu);
- mutex_lock(&flush_lock);
+ /*
+ * we don't check if rcu_free sheaf exists - racing
+ * __kfree_rcu_sheaf() might have just removed it.
+ * by executing flush_rcu_sheaf() on the cpu we make
+ * sure the __kfree_rcu_sheaf() finished its call_rcu()
+ */
- for_each_online_cpu(cpu) {
- sfw = &per_cpu(slub_flush, cpu);
+ INIT_WORK(&sfw->work, flush_rcu_sheaf);
+ sfw->s = s;
+ queue_work_on(cpu, flushwq, &sfw->work);
+ }
- /*
- * we don't check if rcu_free sheaf exists - racing
- * __kfree_rcu_sheaf() might have just removed it.
- * by executing flush_rcu_sheaf() on the cpu we make
- * sure the __kfree_rcu_sheaf() finished its call_rcu()
- */
+ for_each_online_cpu(cpu) {
+ sfw = &per_cpu(slub_flush, cpu);
+ flush_work(&sfw->work);
+ }
- INIT_WORK(&sfw->work, flush_rcu_sheaf);
- sfw->s = s;
- queue_work_on(cpu, flushwq, &sfw->work);
- }
+ mutex_unlock(&flush_lock);
+}
- for_each_online_cpu(cpu) {
- sfw = &per_cpu(slub_flush, cpu);
- flush_work(&sfw->work);
- }
+void flush_all_rcu_sheaves(void)
+{
+ struct kmem_cache *s;
+
+ cpus_read_lock();
+ mutex_lock(&slab_mutex);
- mutex_unlock(&flush_lock);
+ list_for_each_entry(s, &slab_caches, list) {
+ if (!s->cpu_sheaves)
+ continue;
+ flush_rcu_sheaves_on_cache(s);
}
mutex_unlock(&slab_mutex);
--
2.43.0