Hi all,
We've identified a regression affecting PCI passthrough / SR-IOV virtualization starting from Linux v6.12.35.
A user reported [1] that, beginning with this version, SR-IOV virtual functions fail to initialize properly inside the guest. The issue appears to be that some MMIO operations do not complete correctly in the guest.
[ 2.152320] i915 0000:07:00.0: [drm] *ERROR* GT0: GUC: mmio request 0x4509: failure 306/0
[ 2.152327] i915 0000:07:00.0: [drm] *ERROR* GuC initialization failed (-ENXIO)
[ 2.152330] i915 0000:07:00.0: [drm] *ERROR* GT0: Failed to initialize GPU, declaring it wedged!
Here is the git bisect log:
# bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
# good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
git bisect start 'HEAD' 'v6.12.33'
# bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
# bad: [b01a29a80cca28f0c7d0864e2d62fb9616051bfc] ACPI: bus: Bail out if acpi_kobj registration fails
git bisect bad b01a29a80cca28f0c7d0864e2d62fb9616051bfc
# good: [35f116a4658f787bea7e82fdd23e2e9789254f5e] drm/xe: Make xe_gt_freq part of the Documentation
git bisect good 35f116a4658f787bea7e82fdd23e2e9789254f5e
# good: [261f2a655b709e59a8d759ce9fa478778d9e84f4] crypto: qat - add shutdown handler to qat_c3xxx
git bisect good 261f2a655b709e59a8d759ce9fa478778d9e84f4
# good: [4d0686b53cc9342be3f8ce06336fd5ab0d206355] ata: ahci: Disallow LPM for Asus B550-F motherboard
git bisect good 4d0686b53cc9342be3f8ce06336fd5ab0d206355
# bad: [ce4ef0274cb66a4750000f33f2d316c0dbaf4515] KVM: s390: rename PROT_NONE to PROT_TYPE_DUMMY
git bisect bad ce4ef0274cb66a4750000f33f2d316c0dbaf4515
# bad: [8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9] parisc/unaligned: Fix hex output to show 8 hex chars
git bisect bad 8d0645b59b19d97a3b7c5a3fb8dae0c89e98cde9
# good: [fed611bd8c7b76b070aa407d0c7558e20d9e1f68] f2fs: fix to do sanity check on ino and xnid
git bisect good fed611bd8c7b76b070aa407d0c7558e20d9e1f68
# good: [8a008c89e5e5c5332e4c0a33d707db9ddd529f8a] net/sched: fix use-after-free in taprio_dev_notifier
git bisect good 8a008c89e5e5c5332e4c0a33d707db9ddd529f8a
# bad: [3f2098f4fba7718eb2501207ca6e99d22427f25a] fbdev: Fix do_register_framebuffer to prevent null-ptr-deref in fb_videomode_to_var
git bisect bad 3f2098f4fba7718eb2501207ca6e99d22427f25a
# bad: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
git bisect bad fb5873b779dd5858123c19bbd6959566771e2e83
# good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
# first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
commit fb5873b779dd5858123c19bbd6959566771e2e83
Author: Lu Baolu <baolu.lu@linux.intel.com>
Date:   Tue May 20 15:58:49 2025 +0800

    iommu/vt-d: Restore context entry setup order for aliased devices

    commit 320302baed05c6456164652541f23d2a96522c06 upstream.
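For anyone who wants to double-check a bisect result like the one above without reading the whole log, here is a minimal Python sketch that pulls the verdicts and the first bad commit out of `git bisect log` output. The function name `summarize_bisect_log` and the shortened `SAMPLE_LOG` are illustrative assumptions, not part of the original report; the sample reuses a few lines from the log above and assumes one entry per line, as git normally writes it.

```python
import re

# Comment lines written by git into the bisect log.
VERDICT_RE = re.compile(r"^# (good|bad): \[([0-9a-f]{40})\] (.*)$")
FIRST_BAD_RE = re.compile(r"^# first bad commit: \[([0-9a-f]{40})\] (.*)$")


def summarize_bisect_log(text):
    """Return (first_bad, verdicts) parsed from `git bisect log` text.

    first_bad is a (sha, subject) tuple, or None if the bisect did not finish.
    verdicts is a list of (verdict, sha, subject) tuples in log order.
    """
    first_bad = None
    verdicts = []
    for raw in text.splitlines():
        line = raw.strip()
        match = FIRST_BAD_RE.match(line)
        if match:
            first_bad = (match.group(1), match.group(2))
            continue
        match = VERDICT_RE.match(line)
        if match:
            verdicts.append(match.groups())
    return first_bad, verdicts


# Shortened excerpt of the log from this report (assumption: illustration only).
SAMPLE_LOG = """\
# bad: [fbad404f04d758c52bae79ca20d0e7fe5fef91d3] Linux 6.12.37
# good: [e03ced99c437f4a7992b8fa3d97d598f55453fd0] Linux 6.12.33
git bisect start 'HEAD' 'v6.12.33'
# good: [81c64c2f84ab581d1c45dbbbca941c13128faee6] net: ftgmac100: select FIXED_PHY
git bisect good 81c64c2f84ab581d1c45dbbbca941c13128faee6
# first bad commit: [fb5873b779dd5858123c19bbd6959566771e2e83] iommu/vt-d: Restore context entry setup order for aliased devices
"""

if __name__ == "__main__":
    first_bad, verdicts = summarize_bisect_log(SAMPLE_LOG)
    print("first bad:", first_bad)
    print("steps recorded:", len(verdicts))
```

The parser only reads the `#` comment lines, since the `git bisect good/bad` command lines repeat the same hashes.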
This commit was introduced in [2], and the issue only affects stable kernels prior to v6.15. Besides, the Ubuntu v6.14-series kernel used by Proxmox also appears to be affected [3].
Best regards,
Ban ZuoXiang
[1]: https://github.com/strongtz/i915-sriov-dkms/issues/320
[2]: https://lore.kernel.org/r/20250514060523.2862195-1-baolu.lu@linux.intel.com
On 7/21/25 17:59, Ban ZuoXiang wrote:
> This commit was introduced in [2], and the issue only affects stable kernels prior to v6.15. Besides, the Ubuntu v6.14-series kernel used by Proxmox also appears to be affected [3].
Thanks for reporting. Can this issue be reproduced with the latest mainline Linux kernel? Does it work if you simply revert this commit?
Thanks, baolu
Reverting this commit alone resolves the issue. Since Intel GPU SR-IOV currently depends on out-of-tree modules that are not yet compatible with the mainline kernel, I will test mainline later. I can confirm that the v6.15 stable series, which also includes a backport of this commit, is not affected.
regards, Ban Zuoxiang
Hi, baolu
The issue cannot be reproduced on the latest mainline kernel (6.16.0-rc7-1-mainline). The Ubuntu v6.14-series kernel, which also includes the commit, is not affected either. I think the issue only affects the v6.12 series in the linux-stable tree. Should I wait for the stable maintainers to solve it?
Thanks, Ban ZuoXiang
On Tue, Jul 22, 2025 at 09:14:08PM +0800, Ban ZuoXiang wrote:
> I think the issue only affects the v6.12 series in the linux-stable tree. Should I wait for the stable maintainers to solve it?
Nope! We need your help as you are the one that can reproduce it :)
Are we missing a backport? Did we get the backport incorrect? Should we just revert it?
thanks,
greg k-h
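
One way to approach the "are we missing a backport?" question is to look for later upstream commits whose `Fixes:` trailer points at the upstream commit (320302baed05c6456164652541f23d2a96522c06) and then check whether they exist in the 6.12.y branch. The Python sketch below shows only the trailer-matching step, under stated assumptions: `find_followup_fixes` is a hypothetical helper, and the caller is assumed to have already collected (sha, message) pairs, for example from `git log` on a mainline checkout. The commits in the usage example are made-up placeholders, not real kernel commits.

```python
import re

# Matches kernel-style trailers such as:
#   Fixes: 320302baed05 ("iommu/vt-d: Restore context entry setup order ...")
FIXES_RE = re.compile(r"^Fixes:\s*([0-9a-f]{7,40})\b", re.MULTILINE | re.IGNORECASE)


def find_followup_fixes(target_sha, commits):
    """Return the shas of commits whose Fixes: trailer references target_sha.

    target_sha: full 40-character hash of the commit under suspicion.
    commits: iterable of (sha, full_commit_message) pairs.
    A trailer matches when its (possibly abbreviated) hash is a prefix
    of target_sha.
    """
    target = target_sha.lower()
    hits = []
    for sha, message in commits:
        for ref in FIXES_RE.findall(message):
            if target.startswith(ref.lower()):
                hits.append(sha)
                break
    return hits


if __name__ == "__main__":
    # Hypothetical placeholder commits, for illustration only.
    commits = [
        ("1" * 40, 'example: follow-up fix\n\nFixes: 320302baed05 ("iommu/vt-d: '
                   'Restore context entry setup order for aliased devices")\n'),
        ("2" * 40, 'example: unrelated change\n\nFixes: deadbeefcafe ("something else")\n'),
    ]
    upstream = "320302baed05c6456164652541f23d2a96522c06"
    print(find_followup_fixes(upstream, commits))
```

An empty result would not rule out a missing prerequisite: a dependency that the backport silently relies on carries no `Fixes:` tag, so comparing the 6.12.y and 6.15.y versions of the touched code is still needed.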
linux-stable-mirror@lists.linaro.org