Customers have been reporting that the I/O is radically being
slowed down to HPE virtual USB ILO served DVD images during installation.
Lots of investigation by the Red Hat lab has found that the issue is
because MSI edge interrupts do not work properly for these
ILO USB devices.
We start fast and then drop to polling mode and its unusable.
The issue exists currently upstream on 5.13 as tested by Red Hat,
and reverting the mentioned patch corrects this upstream.
David Jeffery has this explanation:
The problem with the patch turning on MSI appears to be that the ehci
driver (and possibly other usb controller types too) wasn't written to
support edge-triggered interrupts.
The ehci_irq routine appears to be written in such a way that it will
be racy with multiple interrupt source bits.
With a level-triggered interrupt, it gets called another time and cleans
up other interrupt sources.
But with MSI edge, the interrupt state staying high results in no
new interrupt and ehci has to run based on polling.
static irqreturn_t ehci_irq (struct usb_hcd *hcd)
{
...
status = ehci_readl(ehci, &ehci->regs->status);
/* e.g. cardbus physical eject */
if (status == ~(u32) 0) {
ehci_dbg (ehci, "device removed\n");
goto dead;
}
/*
* We don't use STS_FLR, but some controllers don't like it to
* remain on, so mask it out along with the other status bits.
*/
masked_status = status & (INTR_MASK | STS_FLR);
/* Shared IRQ? */
if (!masked_status || unlikely(ehci->rh_state == EHCI_RH_HALTED)) {
spin_unlock_irqrestore(&ehci->lock, flags);
return IRQ_NONE;
}
/* clear (just) interrupts */
ehci_writel(ehci, masked_status, &ehci->regs->status);
...
ehci_irq() reads the interrupt status register and then writes the active
interrupt-related bits back out to ack the interrupt cause.
But with an edge interrupt, this is racy as another source of interrupt
could be raised by ehci between the read and the write reaching the
hardware.
e.g. If STS_IAA was set during the initial read, but some other bit like
STS_INT gets raised by the hardware between the read and the write to the
interrupt status register, the interrupt signal state won't drop.
The interrupt state says high, and since it is now edged triggered with
MSI, no new invocation of the interrupt handler gets triggered.
Suggested-by: David Jeffery <djeffery(a)redhat.com>
Suggested-by: Alexandros Panagiotou <apanagio(a)redhat.com>
Tested-by: Laurence Oberman <loberman(a)redhat.com>
Signed-off-by: Laurence Oberman <loberman(a)redhat.com>
Fixes: 306c54d0edb6ba94d39877524dddebaad7770cf2 PCI: Disable MSI for
Pericom PCIe-USB adapter
---
drivers/usb/core/hcd-pci.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
index d630ccc..522a179 100644
--- a/drivers/usb/core/hcd-pci.c
+++ b/drivers/usb/core/hcd-pci.c
@@ -195,21 +195,20 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct pci_device_id *id,
* make sure irq setup is not touched for xhci in generic hcd code
*/
if ((driver->flags & HCD_MASK) < HCD_USB3) {
- retval = pci_alloc_irq_vectors(dev, 1, 1, PCI_IRQ_LEGACY | PCI_IRQ_MSI);
- if (retval < 0) {
+ if (!dev->irq) {
dev_err(&dev->dev,
"Found HC with no IRQ. Check BIOS/PCI %s setup!\n",
pci_name(dev));
retval = -ENODEV;
goto disable_pci;
}
- hcd_irq = pci_irq_vector(dev, 0);
+ hcd_irq = dev->irq;
}
hcd = usb_create_hcd(driver, &dev->dev, pci_name(dev));
if (!hcd) {
retval = -ENOMEM;
- goto free_irq_vectors;
+ goto disable_pci;
}
hcd->amd_resume_bug = (usb_hcd_amd_remote_wakeup_quirk(dev) &&
@@ -288,9 +287,6 @@ int usb_hcd_pci_probe(struct pci_dev *dev, const struct pci_device_id *id,
put_hcd:
usb_put_hcd(hcd);
-free_irq_vectors:
- if ((driver->flags & HCD_MASK) < HCD_USB3)
- pci_free_irq_vectors(dev);
disable_pci:
pci_disable_device(dev);
dev_err(&dev->dev, "init %s fail, %d\n", pci_name(dev), retval);
@@ -352,8 +348,6 @@ void usb_hcd_pci_remove(struct pci_dev *dev)
up_read(&companions_rwsem);
}
usb_put_hcd(hcd);
- if ((hcd_driver_flags & HCD_MASK) < HCD_USB3)
- pci_free_irq_vectors(dev);
pci_disable_device(dev);
}
EXPORT_SYMBOL_GPL(usb_hcd_pci_remove);
@@ -465,7 +459,7 @@ static int suspend_common(struct device *dev, bool do_wakeup)
* synchronized here.
*/
if (!hcd->msix_enabled)
- synchronize_irq(pci_irq_vector(pci_dev, 0));
+ synchronize_irq(pci_dev->irq);
/* Downstream ports from this root hub should already be quiesced, so
* there will be no DMA activity. Now we can shut down the upstream
--
1.8.3.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
The conversion to ww mutexes failed to address the fence code which
already returns -EDEADLK when we run out of fences. Ww mutexes on
the other hand treat -EDEADLK as an internal errno value indicating
a need to restart the operation due to a deadlock. So now when the
fence code returns -EDEADLK the higher level code erroneously
restarts everything instead of returning the error to userspace
as is expected.
To remedy this let's switch the fence code to use a different errno
value for this. -ENOBUFS seems like a semi-reasonable unique choice.
Apart from igt the only user of this I could find is sna, and even
there all we do is dump the current fence registers from debugfs
into the X server log. So no user visible functionality is affected.
If we really cared about preserving this we could of course convert
back to -EDEADLK higher up, but doesn't seem like that's worth
the hassle here.
Not quite sure which commit specifically broke this, but I'll
just attribute it to the general gem ww mutex work.
Cc: stable(a)vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)intel.com>
Testcase: igt/gem_pread/exhaustion
Testcase: igt/gem_pwrite/basic-exhaustion
Testcase: igt/gem_fenced_exec_thrash/too-many-fences
Fixes: 80f0b679d6f0 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
index cac7f3f44642..f8948de72036 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c
@@ -348,7 +348,7 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt)
if (intel_has_pending_fb_unpin(ggtt->vm.i915))
return ERR_PTR(-EAGAIN);
- return ERR_PTR(-EDEADLK);
+ return ERR_PTR(-ENOBUFS);
}
int __i915_vma_pin_fence(struct i915_vma *vma)
--
2.31.1
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 949241ad55a9 - Linux 5.13.2-rc1
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: OK
Tests: FAILED
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
One or more kernel tests failed:
ppc64le:
❌ LTP
aarch64:
❌ Boot test
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
❌ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ LTP
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
❌ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage: swraid mdadm raid_module test
🚧 ❌ Podman system integration test - as root
🚧 ❌ Podman system integration test - as user
🚧 ✅ Storage blktests
🚧 ✅ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 2:
✅ Boot test
✅ Reboot test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ xfstests - cifsv3.11
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ ACPI table test
⚡⚡⚡ LTP
⚡⚡⚡ CIFS Connectathon
⚡⚡⚡ POSIX pjd-fstest suites
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ jvm - jcstress tests
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking cki netfilter test
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ xarray-idr-radixtree-test
🚧 ⚡⚡⚡ i2c: i2cdetect sanity
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ audit: audit testsuite test
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ xfstests - cifsv3.11
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ xfstests - cifsv3.11
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Check the allocation size before toggling kfence_allocation_gate.
This way allocations that can't be served by KFENCE will not result in
waiting for another CONFIG_KFENCE_SAMPLE_INTERVAL without allocating
anything.
Suggested-by: Marco Elver <elver(a)google.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org # 5.12+
Signed-off-by: Alexander Potapenko <glider(a)google.com>
Reviewed-by: Marco Elver <elver(a)google.com>
---
mm/kfence/core.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/kfence/core.c b/mm/kfence/core.c
index 4d21ac44d5d35..33bb20d91bf6a 100644
--- a/mm/kfence/core.c
+++ b/mm/kfence/core.c
@@ -733,6 +733,13 @@ void kfence_shutdown_cache(struct kmem_cache *s)
void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
{
+ /*
+ * Perform size check before switching kfence_allocation_gate, so that
+ * we don't disable KFENCE without making an allocation.
+ */
+ if (size > PAGE_SIZE)
+ return NULL;
+
/*
* allocation_gate only needs to become non-zero, so it doesn't make
* sense to continue writing to it and pay the associated contention
@@ -757,9 +764,6 @@ void *__kfence_alloc(struct kmem_cache *s, size_t size, gfp_t flags)
if (!READ_ONCE(kfence_enabled))
return NULL;
- if (size > PAGE_SIZE)
- return NULL;
-
return kfence_guarded_alloc(s, size, flags);
}
--
2.32.0.93.g670b81a890-goog
Affinity change can happen in both interrupt context and non-interrupt
context. The paranoia check for interrupt target cpu is not always true
in non-interrupt context.
For example:
Set affinity for irq to CPU#7
$ echo 80 > /proc/irq/default_smp_affinity
Open serial port on different CPU.
$taskset -c 4 agetty -L 115200 ttyS1
Will triger the following warning.
WARNING: CPU: 4 PID: 765 at arch/x86/kernel/apic/msi.c:75 msi_set_affinity+0x16f/0x190
Call Trace:
irq_do_set_affinity+0x57/0x1b0
irq_setup_affinity+0x136/0x190
irq_startup+0xaf/0xf0
__setup_irq+0x745/0x7e0
? kmem_cache_alloc_trace+0x2b5/0x450
request_threaded_irq+0x110/0x180
univ8250_setup_irq+0x1e2/0x220
serial8250_do_startup+0x481/0x760
serial8250_startup+0x1e/0x20
uart_startup.part.22+0xf8/0x260
uart_port_activate+0x41/0x60
tty_port_open+0x82/0xd0
? _raw_spin_unlock+0x16/0x30
uart_open+0x1e/0x30
tty_open+0x100/0x4d0
? cdev_purge+0x60/0x60
chrdev_open+0xa3/0x1c0
? cdev_put.part.3+0x20/0x20
do_dentry_open+0x21f/0x380
vfs_open+0x2f/0x40
path_openat+0xd67/0xf60
? page_add_file_rmap+0xad/0x140
? do_set_pte+0xd7/0x210
do_filp_open+0xc5/0x140
do_sys_openat2+0x239/0x300
? do_sys_openat2+0x239/0x300
do_sys_open+0x59/0x80
__x64_sys_openat+0x20/0x30
do_syscall_64+0x3f/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
With this patch, the warning is only reported in interrupt context and when
the target CPU is not local CPU.
Signed-off-by: Yongxin Liu <yongxin.liu(a)windriver.com>
---
arch/x86/kernel/apic/msi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 44ebe25e7703..c4c73f227eeb 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -72,7 +72,7 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
* Paranoia: Validate that the interrupt target is the local
* CPU.
*/
- if (WARN_ON_ONCE(cpu != smp_processor_id())) {
+ if (in_interrupt() && WARN_ON_ONCE(cpu != smp_processor_id())) {
irq_msi_update_msg(irqd, cfg);
return ret;
}
--
2.14.5
We skip filling out the pt with scratch entries if the va range covers
the entire pt, since we later have to fill it with the PTEs for the
object pages anyway. However this might leave open a small window where
the PTEs don't point to anything valid for the HW to consume.
When for example using 2M GTT pages this fill_px() showed up as being
quite significant in perf measurements, and ends up being completely
wasted since we ignore the pt and just use the pde directly.
Anyway, currently we have our PTE construction split between alloc and
insert, which is probably slightly iffy nowadays, since the alloc
doesn't actually allocate anything anymore, instead it just sets up the
page directories and points the PTEs at the scratch page. Later when we
do the insert step we re-program the PTEs again. Better might be to
squash the alloc and insert into a single step, then bringing back this
optimisation(along with some others) should be possible.
Fixes: 14826673247e ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Jon Bloomfield <jon.bloomfield(a)intel.com>
Cc: Chris Wilson <chris.p.wilson(a)intel.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: <stable(a)vger.kernel.org> # v4.15+
---
drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
index 3d02c726c746..6e0e52eeb87a 100644
--- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
@@ -303,10 +303,7 @@ static void __gen8_ppgtt_alloc(struct i915_address_space * const vm,
__i915_gem_object_pin_pages(pt->base);
i915_gem_object_make_unshrinkable(pt->base);
- if (lvl ||
- gen8_pt_count(*start, end) < I915_PDES ||
- intel_vgpu_active(vm->i915))
- fill_px(pt, vm->scratch[lvl]->encode);
+ fill_px(pt, vm->scratch[lvl]->encode);
spin_lock(&pd->lock);
if (likely(!pd->entry[idx])) {
--
2.26.3
Hi,
It looks like there are multiple use-after-free accesses in
j1939_session_deactivate()
static bool j1939_session_deactivate(struct j1939_session *session)
{
bool active;
j1939_session_list_lock(session->priv);
active = j1939_session_deactivate_locked(session); //session can be freed inside
j1939_session_list_unlock(session->priv); // It causes UAF read and write
return active;
}
session can be freed by
j1939_session_deactivate_locked->j1939_session_put->__j1939_session_release->j1939_session_destroy->kfree.
Therefore it makes the unlock function perform UAF access.
Best,
Xiaochen Zou
Department of Computer Science & Engineering
University of California, Riverside
The performance reporting driver added cpu hotplug
feature but it didn't add pmu migration call in cpu
offline function.
This can create an issue incase the current designated
cpu being used to collect fme pmu data got offline,
as based on current code we are not migrating fme pmu to
new target cpu. Because of that perf will still try to
fetch data from that offline cpu and hence we will not
get counter data.
Patch fixed this issue by adding pmu_migrate_context call
in fme_perf_offline_cpu function.
Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
Tested-by: Xu Yilun <yilun.xu(a)intel.com>
Signed-off-by: Kajol Jain <kjain(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org
---
drivers/fpga/dfl-fme-perf.c | 4 ++++
1 file changed, 4 insertions(+)
---
Changelog:
v1 -> v2:
- Add stable(a)vger.kernel.org in cc list
RFC -> PATCH v1
- Remove RFC tag
- Did nits changes on subject and commit message as suggested by Xu Yilun
- Added Tested-by tag
- Link to rfc patch: https://lkml.org/lkml/2021/6/28/112
---
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 4299145ef347..b9a54583e505 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -953,6 +953,10 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
return 0;
priv->cpu = target;
+
+ /* Migrate fme_perf pmu events to the new target cpu */
+ perf_pmu_migrate_context(&priv->pmu, cpu, target);
+
return 0;
}
--
2.31.1