The DW UART may trigger the RX_TIMEOUT interrupt without data
present and remain stuck in this state indefinitely. The
dw8250_handle_irq() function detects this condition by checking
if the UART_LSR_DR bit is not set when RX_TIMEOUT occurs. When
detected, it performs a "dummy read" to recover the DW UART from
this state.
When the PSLVERR_RESP_EN parameter is set to 1, reading the UART_RX
while the FIFO is enabled and UART_LSR_DR is not set will generate a
PSLVERR error, which may lead to a system panic. There are two methods
to prevent PSLVERR: one is to check if UART_LSR_DR is set before reading
UART_RX when the FIFO is enabled, and the other is to read UART_RX when
the FIFO is disabled.
Given these two scenarios, the FIFO must be disabled before the
"dummy read" operation and re-enabled afterward to maintain normal
UART functionality.
Fixes: 424d79183af0 ("serial: 8250_dw: Avoid "too much work" from bogus rx timeout interrupt")
Signed-off-by: Yunhui Cui <cuiyunhui(a)bytedance.com>
Cc: stable(a)vger.kernel.org
---
drivers/tty/serial/8250/8250_dw.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c
index 1902f29444a1c..082b7fcf251db 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -297,9 +297,17 @@ static int dw8250_handle_irq(struct uart_port *p)
uart_port_lock_irqsave(p, &flags);
status = serial_lsr_in(up);
- if (!(status & (UART_LSR_DR | UART_LSR_BI)))
+ if (!(status & (UART_LSR_DR | UART_LSR_BI))) {
+ /* To avoid PSLVERR, disable the FIFO first. */
+ if (up->fcr & UART_FCR_ENABLE_FIFO)
+ serial_out(up, UART_FCR, 0);
+
serial_port_in(p, UART_RX);
+ if (up->fcr & UART_FCR_ENABLE_FIFO)
+ serial_out(up, UART_FCR, up->fcr);
+ }
+
uart_port_unlock_irqrestore(p, flags);
}
--
2.39.5
Hello,
This is to inform all that constant firmware crashes have been seen in
the "Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter",
which was shipped with the Dell Inspiron 5567 laptops. This affects
every kernel release, including the stable and the longterm ones.
All the logs have been taken after livebooting an Arch Linux ISO.
Every distro has been tried, and it has been confirmed that some error
of this kind is shown in every distro.
## Steps to reproduce the issue
1. Boot/liveboot any Linux ISO through this card (and possibly, this laptop).
2. Wi-Fi network interface appears.
3. Connect the Wi-Fi router to the computer.
4. A few moments/minutes after that, the touchpad stops working, and
the network interface cannot even access the Internet anymore (BUT,
the network interface might disappear, might not disappear).
## Affected distros and the necessary workarounds
This has been the pattern on every distro and their corresponding
kernels (LMDE, Linux Mint, Pop!_OS, Zorin, Kubuntu, KDE Neon,
elementaryOS, Fedora, and even Arch). The fix which made these distros
usable is to add two things:
- Adding "options ath10k_core skip_otp=y" to a new conf file in /etc/modprobe.d.
- Adding "pci=noaer" in GRUB kernel parameters so that the logs are
not flooded with Multiple Correctable Errors.
To defend my case (that it occurs in the other models of Inspiron 5567
too), I have recently contacted someone running Linux Mint on the same
model. The answer was the same: the touchpad and the Wi-Fi stop
simultaneously.
## Some of the limitations
The kernel was tainted, but the other things have been properly noted
in case they might provide some useful details. As stated,
investigating why IRQ #16 is disabled will probably give us the
answer.
## Logs provided
All the logs in a combined manner can be found here:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180
- Full dmesg: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- Hostnamectl: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- lspci: https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- Modinfo of the driver:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- Ping command:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- /proc/interrupts:
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
- IP addr command (Heavily Redacted):
https://gist.github.com/BandhanPramanik/ddb0cb23eca03ca2ea43a1d832a16180#fi…
Lastly, this issue on the GitHub repository of Pop!_OS 'might' be
relevant: https://github.com/pop-os/pop/issues/1470
It would be highly appreciated if the matter were looked into.
Thanks,
Bandhan Pramanik
The AMD IOMMU documentation seems pretty clear that the V2 table follows
the normal CPU expectation of sign extension. This is shown in
Figure 25: AMD64 Long Mode 4-Kbyte Page Address Translation
Where bits Sign-Extend [63:57] == [56]. This is typical for x86 which
would have three regions in the page table: lower, non-canonical, upper.
The manual describes that the V1 table does not sign extend in section
2.2.4 Sharing AMD64 Processor and IOMMU Page Tables GPA-to-SPA
Further, Vasant has checked this and indicates the HW has an addtional
behavior that the manual does not yet describe. The AMDv2 table does not
have the sign extended behavior when attached to PASID 0, which may
explain why this has gone unnoticed.
The iommu domain geometry does not directly support sign extended page
tables. The driver should report only one of the lower/upper spaces. Solve
this by removing the top VA bit from the geometry to use only the lower
space.
This will also make the iommu_domain work consistently on all PASID 0 and
PASID != 1.
Adjust dma_max_address() to remove the top VA bit. It now returns:
5 Level:
Before 0x1ffffffffffffff
After 0x0ffffffffffffff
4 Level:
Before 0xffffffffffff
After 0x7fffffffffff
Fixes: 11c439a19466 ("iommu/amd/pgtbl_v2: Fix domain max address")
Link: https://lore.kernel.org/all/8858d4d6-d360-4ef0-935c-bfd13ea54f42@amd.com/
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
---
drivers/iommu/amd/iommu.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
v2:
- Revise the commit message and comment with the new information
from Vasant.
v1: https://patch.msgid.link/r/0-v1-6925ece6b623+296-amdv2_geo_jgg@nvidia.com
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 3117d99cf83d0d..1baa9d3583f369 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -2526,8 +2526,21 @@ static inline u64 dma_max_address(enum protection_domain_mode pgtable)
if (pgtable == PD_MODE_V1)
return ~0ULL;
- /* V2 with 4/5 level page table */
- return ((1ULL << PM_LEVEL_SHIFT(amd_iommu_gpt_level)) - 1);
+ /*
+ * V2 with 4/5 level page table. Note that "2.2.6.5 AMD64 4-Kbyte Page
+ * Translation" shows that the V2 table sign extends the top of the
+ * address space creating a reserved region in the middle of the
+ * translation, just like the CPU does. Further Vasant says the docs are
+ * incomplete and this only applies to non-zero PASIDs. If the AMDv2
+ * page table is assigned to the 0 PASID then there is no sign extension
+ * check.
+ *
+ * Since the IOMMU must have a fixed geometry, and the core code does
+ * not understand sign extended addressing, we have to chop off the high
+ * bit to get consistent behavior with attachments of the domain to any
+ * PASID.
+ */
+ return ((1ULL << (PM_LEVEL_SHIFT(amd_iommu_gpt_level) - 1)) - 1);
}
static bool amd_iommu_hd_support(struct amd_iommu *iommu)
base-commit: eb328711b15b17987021dbb674f446b7b008dca5
--
2.43.0
When the PSLVERR_RESP_EN parameter is set to 1, the device generates
an error response if an attempt is made to read an empty RBR (Receive
Buffer Register) while the FIFO is enabled.
In serial8250_do_startup(), calling serial_port_out(port, UART_LCR,
UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes
dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter
function enables the FIFO via serial_out(p, UART_FCR, p->fcr).
Execution proceeds to the serial_port_in(port, UART_RX).
This satisfies the PSLVERR trigger condition.
When another CPU (e.g., using printk()) is accessing the UART (UART
is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) ==
(lcr & ~UART_LCR_SPAR) in dw8250_check_lcr(), causing it to enter
dw8250_force_idle().
Put serial_port_out(port, UART_LCR, UART_LCR_WLEN8) under the port->lock
to fix this issue.
Panic backtrace:
[ 0.442336] Oops - unknown exception [#1]
[ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a
[ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e
...
[ 0.442416] console_on_rootfs+0x26/0x70
Fixes: c49436b657d0 ("serial: 8250_dw: Improve unwritable LCR workaround")
Link: https://lore.kernel.org/all/84cydt5peu.fsf@jogness.linutronix.de/T/
Signed-off-by: Yunhui Cui <cuiyunhui(a)bytedance.com>
Cc: stable(a)vger.kernel.org
---
drivers/tty/serial/8250/8250_port.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 6d7b8c4667c9c..07fe818dffa34 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -2376,9 +2376,10 @@ int serial8250_do_startup(struct uart_port *port)
/*
* Now, initialize the UART
*/
- serial_port_out(port, UART_LCR, UART_LCR_WLEN8);
uart_port_lock_irqsave(port, &flags);
+ serial_port_out(port, UART_LCR, UART_LCR_WLEN8);
+
if (up->port.flags & UPF_FOURPORT) {
if (!up->port.irq)
up->port.mctrl |= TIOCM_OUT1;
--
2.39.5
From: Yang Xiwen <forbidden405(a)outlook.com>
Original logic only sets the return value but doesn't jump out of the
loop if the bus is kept active by a client. This is not expected. A
malicious or buggy i2c client can hang the kernel in this case and
should be avoided. This is observed during a long time test with a
PCA953x GPIO extender.
Fix it by changing the logic to not only sets the return value, but also
jumps out of the loop and return to the caller with -ETIMEDOUT.
Cc: stable(a)vger.kernel.org
Signed-off-by: Yang Xiwen <forbidden405(a)outlook.com>
---
drivers/i2c/busses/i2c-qup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
index 3a36d682ed57..5b053e51f4c9 100644
--- a/drivers/i2c/busses/i2c-qup.c
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -452,8 +452,10 @@ static int qup_i2c_bus_active(struct qup_i2c_dev *qup, int len)
if (!(status & I2C_STATUS_BUS_ACTIVE))
break;
- if (time_after(jiffies, timeout))
+ if (time_after(jiffies, timeout)) {
ret = -ETIMEDOUT;
+ break;
+ }
usleep_range(len, len * 2);
}
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250615-qca-i2c-d41bb61aa59e
Best regards,
--
Yang Xiwen <forbidden405(a)outlook.com>
Hi,
in the last week, after updating to 6.6.92, we’ve encountered a number of VMs reporting temporarily hung tasks blocking the whole system for a few minutes. They unblock by themselves and have similar tracebacks.
The IO PSIs show 100% pressure for that time, but the underlying devices are still processing read and write IO (well within their capacity). I’ve eliminated the underlying storage (Ceph) as the source of problems as I couldn’t find any latency outliers or significant queuing during that time.
I’ve seen somewhat similar reports on 6.6.88 and 6.6.77, but those might have been different outliers.
I’m attaching 3 logs - my intuition and the data so far leads me to consider this might be a kernel bug. I haven’t found a way to reproduce this, yet.
Regards,
Christian
--
Christian Theune · ct(a)flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
On Fri, Apr 11, 2025 at 10:14:44AM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello(a)amd.com>
>
> commit a5cfc9d65879c ("thunderbolt: Add wake on connect/disconnect
> on USB4 ports") introduced a sysfs file to control wake up policy
> for a given USB4 port that defaulted to disabled.
>
> However when testing commit 4bfeea6ec1c02 ("thunderbolt: Use wake
> on connect and disconnect over suspend") I found that it was working
> even without making changes to the power/wakeup file (which defaults
> to disabled). This is because of a logic error doing a bitwise or
> of the wake-on-connect flag with device_may_wakeup() which should
> have been a logical AND.
>
> Adjust the logic so that policy is only applied when wakeup is
> actually enabled.
>
> Fixes: a5cfc9d65879c ("thunderbolt: Add wake on connect/disconnect on USB4 ports")
> Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Hi! There have been a couple of reports of a Thunderbolt regression in
recent stable kernels, and one reporter has now bisected it to this
change:
• https://bugzilla.kernel.org/show_bug.cgi?id=220284
• https://github.com/NixOS/nixpkgs/issues/420730
Both reporters are CCed, and say it starts working after the module is
reloaded.
Link: https://lore.kernel.org/r/bug-220284-208809@https.bugzilla.kernel.org%2F/
(for regzbot)
> ---
> drivers/thunderbolt/usb4.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/thunderbolt/usb4.c b/drivers/thunderbolt/usb4.c
> index e51d01671d8e7..3e96f1afd4268 100644
> --- a/drivers/thunderbolt/usb4.c
> +++ b/drivers/thunderbolt/usb4.c
> @@ -440,10 +440,10 @@ int usb4_switch_set_wake(struct tb_switch *sw, unsigned int flags)
> bool configured = val & PORT_CS_19_PC;
> usb4 = port->usb4;
>
> - if (((flags & TB_WAKE_ON_CONNECT) |
> + if (((flags & TB_WAKE_ON_CONNECT) &&
> device_may_wakeup(&usb4->dev)) && !configured)
> val |= PORT_CS_19_WOC;
> - if (((flags & TB_WAKE_ON_DISCONNECT) |
> + if (((flags & TB_WAKE_ON_DISCONNECT) &&
> device_may_wakeup(&usb4->dev)) && configured)
> val |= PORT_CS_19_WOD;
> if ((flags & TB_WAKE_ON_USB4) && configured)
> --
> 2.43.0
>