From: Nirmal Patel <nirmal.patel@linux.intel.com>
[ Upstream commit 886e67100b904cb1b106ed1dfa8a60696aff519a ]
During the boot process all PCI devices, including VMD endpoint
devices, are assigned the default PCI-MSI IRQ domain. If interrupt
remapping is enabled by the IOMMU, every PCI device except VMD is
moved to the INTEL-IR-MSI IRQ domain. VMD, in turn, is supposed to
create and assign a separate VMD-MSI IRQ domain to its child devices
in order to support MSI-X remapping.
Now, when MSI-X remapping in VMD is disabled in order to improve
performance, VMD skips the VMD-MSI IRQ domain assignment for its
child devices. The devices behind VMD thus keep the default PCI-MSI
IRQ domain instead of the INTEL-IR-MSI IRQ domain when VMD creates
the root bus and configures its child devices.
As a result, the host OS fails to boot and DMAR errors are observed
when interrupt remapping is enabled on Intel Icelake CPUs. For instance:
DMAR: DRHD: handling fault status reg 2
DMAR: [INTR-REMAP] Request device [0xe2:0x00.0] fault index 0xa00 [fault reason 0x25] Blocked a compatibility format interrupt request
To fix this issue, rely on the dev_msi_info struct in the dev struct,
which maintains the correct IRQ domain. VMD will use this information
to assign the proper IRQ domain to its child devices when it does not
create a separate IRQ domain.
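In code terms, the fix is the fallback below in vmd_enable_domain()
(a condensed view of the diff that follows, with explanatory comments
added; the actual diff keeps the brace-less if/else):

  if (vmd->irq_domain) {
          /* MSI-X remapping enabled: children use VMD's own domain */
          dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
  } else {
          /*
           * MSI-X remapping bypassed: inherit whatever MSI domain the
           * IOMMU assigned to the VMD endpoint itself (INTEL-IR-MSI
           * when interrupt remapping is on) rather than leaving the
           * default PCI-MSI domain on the child bus.
           */
          dev_set_msi_domain(&vmd->bus->dev,
                             dev_get_msi_domain(&vmd->dev->dev));
  }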
Link: https://lore.kernel.org/r/20220511095707.25403-2-nirmal.patel@linux.intel.c…
Signed-off-by: Nirmal Patel <nirmal.patel@linux.intel.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
(cherry picked from commit 886e67100b904cb1b106ed1dfa8a60696aff519a)
[ This patch has already been backported to the Ubuntu 5.15 kernel
and fixes boot issues on Intel platforms with VMD rev 04,
confirmed on version 5.15.189. ]
Signed-off-by: Artur Piechocki <artur.piechocki@open-e.com>
---
drivers/pci/controller/vmd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
index 846590706a38..d0d18e9c6968 100644
--- a/drivers/pci/controller/vmd.c
+++ b/drivers/pci/controller/vmd.c
@@ -799,6 +799,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
vmd_attach_resources(vmd);
if (vmd->irq_domain)
dev_set_msi_domain(&vmd->bus->dev, vmd->irq_domain);
+ else
+ dev_set_msi_domain(&vmd->bus->dev,
+ dev_get_msi_domain(&vmd->dev->dev));
WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
"domain"), "Can't create symlink to domain\n");
--
2.50.1
The ring buffer size varies across VMBus channels. The size of the
sysfs node for the ring buffer is currently hardcoded to 4 MB.
Userspace clients either use fstat() or hardcode this size when
calling mmap().
To address this, make the sysfs node size dynamic so that it reflects
the actual ring buffer size of each channel. This ensures that
fstat() on the ring sysfs node always returns the correct size of the
ring buffer.
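As an illustration, a minimal userspace sketch that benefits from this
(the device GUID and channel number in the path are placeholders, and
error handling is trimmed):

  #include <fcntl.h>
  #include <sys/mman.h>
  #include <sys/stat.h>
  #include <unistd.h>

  int main(void)
  {
          /* Placeholder path; substitute a real device GUID/channel */
          const char *path =
                  "/sys/bus/vmbus/devices/<device-guid>/channels/<N>/ring";
          struct stat st;
          void *ring;
          int fd = open(path, O_RDWR);

          if (fd < 0 || fstat(fd, &st) < 0)
                  return 1;

          /* st_size now reflects the channel's actual ring buffer
           * size instead of a hardcoded 4 MB */
          ring = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
          if (ring == MAP_FAILED)
                  return 1;

          munmap(ring, st.st_size);
          close(fd);
          return 0;
  }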
This is a backport of the upstream commit
65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic")
with modifications, as the original patch has dependencies that are
missing from kernel v6.12.x. The struct attribute_group does not have
a bin_size field in the v6.12.x kernel, so the logic that configures
the size of the ring buffer's sysfs node has been moved to
vmbus_chan_bin_attr_is_visible().
The original change was not a fix, but it needs to be backported to
fix a size-related discrepancy caused by the commit mentioned in the
Fixes tag.
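For reference, the upstream commit expresses this through the newer
attribute-group bin_size callback, which v6.12.x lacks; a hedged
sketch of that upstream shape (callback name and signature
approximated from the newer sysfs core):

  /* Not applicable to v6.12.x: sketch of the upstream variant only */
  static size_t vmbus_chan_bin_size(struct kobject *kobj,
                                    const struct bin_attribute *bin_attr,
                                    int bin_id)
  {
          const struct vmbus_channel *channel =
                  container_of(kobj, struct vmbus_channel, kobj);

          return channel->ringbuffer_pagecount << PAGE_SHIFT;
  }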
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page")
Cc: <stable@vger.kernel.org> # 6.12.x
Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
---
This change won't apply on older kernels currently due to missing
dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has
changed completely while the functionality remains the same.
Requesting Michael, Dexuan, and Wei to please review again.
---
drivers/hv/vmbus_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 1f519e925f06..616e63fb2f15 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = {
.name = "ring",
.mode = 0600,
},
- .size = 2 * SZ_2M,
.mmap = hv_mmap_ring_buffer_wrapper,
};
static struct attribute *vmbus_chan_attrs[] = {
@@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj,
/* Hide ring attribute if channel's ring_sysfs_visible is set to false */
if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible)
return 0;
+ attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
return attr->attr.mode;
}
--
2.34.1
Ever since commit c2ff29e99a76 ("siw: Inline do_tcp_sendpages()"),
we have been doing this:
static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
size_t size)
[...]
/* Calculate the number of bytes we need to push, for this page
* specifically */
size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);
/* If we can't splice it, then copy it in, as normal */
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
/* Set the bvec pointing to the page, with len $bytes */
bvec_set_page(&bvec, page[i], bytes, offset);
/* Set the iter to $size, aka the size of the whole sendpages (!!!) */
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
try_page_again:
lock_sock(sk);
/* Sendmsg with $size size (!!!) */
rv = tcp_sendmsg_locked(sk, &msg, size);
This means we've been sending oversized iov_iters and oversized
tcp_sendmsg calls for a while. This has been a benign bug because
sendpage_ok() always returned true. With the recent slab allocator
changes being slowly introduced into next (which disallow sendpage on
large kmalloc allocations), we have recently hit out-of-bounds
crashes, due to slight differences in iov_iter behavior between the
MSG_SPLICE_PAGES and "regular" copy paths:
(MSG_SPLICE_PAGES)
skb_splice_from_iter
iov_iter_extract_pages
iov_iter_extract_bvec_pages
uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere
skb_splice_from_iter gets a "short" read
(!MSG_SPLICE_PAGES)
skb_copy_to_page_nocache copy=iov_iter_count
[...]
copy_from_iter
/* this doesn't help */
if (unlikely(iter->count < len))
len = iter->count;
iterate_bvec
... and we run off the bvecs
Fix this by properly setting the iov_iter's byte count and by passing
the same, correct byte count to tcp_sendmsg_locked().
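After the fix, the per-page send path is internally consistent (a
condensed, comment-annotated view of the diff below; locking and the
retry label are elided):

  size_t bytes = min_t(size_t, PAGE_SIZE - offset, size);

  if (!sendpage_ok(page[i]))
          msg.msg_flags &= ~MSG_SPLICE_PAGES;

  /* The iterator now advertises exactly one page fragment of $bytes */
  bvec_set_page(&bvec, page[i], bytes, offset);
  iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);

  /* ... and tcp_sendmsg_locked() is asked for the same $bytes */
  rv = tcp_sendmsg_locked(sk, &msg, bytes);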
Cc: stable@vger.kernel.org
Fixes: c2ff29e99a76 ("siw: Inline do_tcp_sendpages()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Pedro Falcato <pfalcato@suse.de>
---
v2:
- Add David Howells's Rb on the original patch
- Remove the offset increment, since it's dead code
drivers/infiniband/sw/siw/siw_qp_tx.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index 3a08f57d2211..f7dd32c6e5ba 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -340,18 +340,17 @@ static int siw_tcp_sendpages(struct socket *s, struct page **page, int offset,
if (!sendpage_ok(page[i]))
msg.msg_flags &= ~MSG_SPLICE_PAGES;
bvec_set_page(&bvec, page[i], bytes, offset);
- iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, bytes);
try_page_again:
lock_sock(sk);
- rv = tcp_sendmsg_locked(sk, &msg, size);
+ rv = tcp_sendmsg_locked(sk, &msg, bytes);
release_sock(sk);
if (rv > 0) {
size -= rv;
sent += rv;
if (rv != bytes) {
- offset += rv;
bytes -= rv;
goto try_page_again;
}
--
2.50.1