On Mon, May 05, 2025 at 04:00:31PM +0200, Salvatore Bonaccorso wrote:
Hi Moritz,
On Mon, May 05, 2025 at 01:47:15PM +0200, Moritz Mühlenhoff wrote:
On Wed, Apr 30, 2025 at 05:55:20PM +0200, Salvatore Bonaccorso wrote:
Hi
We got a regression report in Debian after the update from 6.1.133 to 6.1.135. Melvin is reporting that discard/trim through a RAID10 array stalls indefinitely. The full report is inlined below and originates from https://bugs.debian.org/1104460 .
JFTR, we ran into the same problem with a few Wikimedia servers running 6.1.135 and RAID 10: The servers started to lock up once fstrim.service got started. Full oops messages are available at https://phabricator.wikimedia.org/P75746
Thanks for these additional data points. Assuming you won't be able to test the other stable series where the commit d05af90d6218 ("md/raid10: fix missing discard IO accounting") went in, might you at least be able to test the 6.1.y branch with the commit reverted again and manually trigger the issue?
If needed I can provide a test Debian package of 6.1.135 (or 6.1.137) with the patch reverted.
So one additional data point, as several Debian users were reporting back being affected: one user upgraded to 6.12.25 (where the commit was backported as well) and is not able to reproduce the issue there.
This suggests we might be missing some prerequisites in the 6.1.y series?
The user is now trying 6.1.135 with the patch reverted as well.
Regards, Salvatore