From: Vitaly Kuznetsov <vkuznets(a)redhat.com>
[ Upstream commit f0880e2cb7e1f8039a048fdd01ce45ab77247221 ]
Passed through PCI device sometimes misbehave on Gen1 VMs when Hyper-V
DRM driver is also loaded. Looking at IOMEM assignment, we can see e.g.
$ cat /proc/iomem
...
f8000000-fffbffff : PCI Bus 0000:00
f8000000-fbffffff : 0000:00:08.0
f8000000-f8001fff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
...
fe0000000-fffffffff : PCI Bus 0000:00
fe0000000-fe07fffff : bb8c4f33-2ba2-4808-9f7f-02f3b4da22fe
fe0000000-fe07fffff : 2ba2:00:02.0
fe0000000-fe07fffff : mlx4_core
the interesting part is the 'f8000000' region as it is actually the
VM's framebuffer:
$ lspci -v
...
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA (prog-if 00 [VGA controller])
Flags: bus master, fast devsel, latency 0, IRQ 11
Memory at f8000000 (32-bit, non-prefetchable) [size=64M]
...
hv_vmbus: registering driver hyperv_drm
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Synthvid Version major 3, minor 5
hyperv_drm 0000:00:08.0: vgaarb: deactivate vga console
hyperv_drm 0000:00:08.0: BAR 0: can't reserve [mem 0xf8000000-0xfbffffff]
hyperv_drm 5620e0c7-8062-4dce-aeb7-520c7ef76171: [drm] Cannot request framebuffer, boot fb still active?
Note: "Cannot request framebuffer" is not a fatal error in
hyperv_setup_gen1() as the code assumes there's some other framebuffer
device there but we actually have some other PCI device (mlx4 in this
case) config space there!
The problem appears to be that vmbus_allocate_mmio() can use dedicated
framebuffer region to serve any MMIO request from any device. The
semantics one might assume of a parameter named "fb_overlap_ok"
aren't implemented because !fb_overlap_ok essentially has no effect.
The existing semantics are really "prefer_fb_overlap". This patch
implements the expected and needed semantics, which is to not allocate
from the frame buffer space when !fb_overlap_ok.
Note, Gen2 VMs are usually unaffected by the issue because
framebuffer region is already taken by EFI fb (in case kernel supports
it) but Gen1 VMs may have this region unclaimed by the time Hyper-V PCI
pass-through driver tries allocating MMIO space if Hyper-V DRM/FB drivers
load after it. Devices can be brought up in any sequence so let's
resolve the issue by always ignoring 'fb_mmio' region for non-FB
requests, even if the region is unclaimed.
Reviewed-by: Michael Kelley <mikelley(a)microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Link: https://lore.kernel.org/r/20220827130345.1320254-4-vkuznets@redhat.com
Signed-off-by: Wei Liu <wei.liu(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/hv/vmbus_drv.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 547ae334e5cd..027029efb008 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -2309,7 +2309,7 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
bool fb_overlap_ok)
{
struct resource *iter, *shadow;
- resource_size_t range_min, range_max, start;
+ resource_size_t range_min, range_max, start, end;
const char *dev_n = dev_name(&device_obj->device);
int retval;
@@ -2344,6 +2344,14 @@ int vmbus_allocate_mmio(struct resource **new, struct hv_device *device_obj,
range_max = iter->end;
start = (range_min + align - 1) & ~(align - 1);
for (; start + size - 1 <= range_max; start += align) {
+ end = start + size - 1;
+
+ /* Skip the whole fb_mmio region if not fb_overlap_ok */
+ if (!fb_overlap_ok && fb_mmio &&
+ (((start >= fb_mmio->start) && (start <= fb_mmio->end)) ||
+ ((end >= fb_mmio->start) && (end <= fb_mmio->end))))
+ continue;
+
shadow = __request_region(iter, start, size, NULL,
IORESOURCE_BUSY);
if (!shadow)
--
2.35.1
The patch titled
Subject: mm/huge_memory: do not clobber swp_entry_t during THP split
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-do-not-clobber-swp_entry_t-during-thp-split.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm/huge_memory: do not clobber swp_entry_t during THP split
Date: Wed, 19 Oct 2022 14:41:56 +0100
The following has been observed when running stressng mmap since commit
b653db77350c ("mm: Clear page->private when splitting or migrating a page")
watchdog: BUG: soft lockup - CPU#75 stuck for 26s! [stress-ng:9546]
CPU: 75 PID: 9546 Comm: stress-ng Tainted: G E 6.0.0-revert-b653db77-fix+ #29 0357d79b60fb09775f678e4f3f64ef0579ad1374
Hardware name: SGI.COM C2112-4GP3/X10DRT-P-Series, BIOS 2.0a 05/09/2016
RIP: 0010:xas_descend+0x28/0x80
Code: cc cc 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
RSP: 0018:ffffbbf02a2236a8 EFLAGS: 00000246
RAX: ffff9cab7d6a0002 RBX: ffffe04b0af88040 RCX: 0000000000000002
RDX: 0000000000000030 RSI: ffff9cab60509b60 RDI: ffffbbf02a2236c0
RBP: 0000000000000000 R08: ffff9cab60509b60 R09: ffffbbf02a2236c0
R10: 0000000000000001 R11: ffffbbf02a223698 R12: 0000000000000000
R13: ffff9cab4e28da80 R14: 0000000000039c01 R15: ffff9cab4e28da88
FS: 00007fab89b85e40(0000) GS:ffff9cea3fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fab84e00000 CR3: 00000040b73a4003 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
xas_load+0x3a/0x50
__filemap_get_folio+0x80/0x370
? put_swap_page+0x163/0x360
pagecache_get_page+0x13/0x90
__try_to_reclaim_swap+0x50/0x190
scan_swap_map_slots+0x31e/0x670
get_swap_pages+0x226/0x3c0
folio_alloc_swap+0x1cc/0x240
add_to_swap+0x14/0x70
shrink_page_list+0x968/0xbc0
reclaim_page_list+0x70/0xf0
reclaim_pages+0xdd/0x120
madvise_cold_or_pageout_pte_range+0x814/0xf30
walk_pgd_range+0x637/0xa30
__walk_page_range+0x142/0x170
walk_page_range+0x146/0x170
madvise_pageout+0xb7/0x280
? asm_common_interrupt+0x22/0x40
madvise_vma_behavior+0x3b7/0xac0
? find_vma+0x4a/0x70
? find_vma+0x64/0x70
? madvise_vma_anon_name+0x40/0x40
madvise_walk_vmas+0xa6/0x130
do_madvise+0x2f4/0x360
__x64_sys_madvise+0x26/0x30
do_syscall_64+0x5b/0x80
? do_syscall_64+0x67/0x80
? syscall_exit_to_user_mode+0x17/0x40
? do_syscall_64+0x67/0x80
? syscall_exit_to_user_mode+0x17/0x40
? do_syscall_64+0x67/0x80
? do_syscall_64+0x67/0x80
? common_interrupt+0x8b/0xa0
entry_SYSCALL_64_after_hwframe+0x63/0xcd
The problem can be reproduced with the mmtests config
config-workload-stressng-mmap. It does not always happen and when it
triggers is variable but it has happened on multiple machines.
The intent of commit b653db77350c patch was to avoid the case where
PG_private is clear but folio->private is not-NULL. However, THP tail
pages uses page->private for "swp_entry_t if folio_test_swapcache()" as
stated in the documentation for struct folio. This patch only clobbers
page->private for tail pages if the head page was not in swapcache and
warns once if page->private had an unexpected value.
Link: https://lkml.kernel.org/r/20221019134156.zjyyn5aownakvztf@techsingularity.n…
Fixes: b653db77350c ("mm: Clear page->private when splitting or migrating a page")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Brian Foster <bfoster(a)redhat.com>
Cc: Dan Streetman <ddstreet(a)ieee.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: Seth Jennings <sjenning(a)redhat.com>
Cc: Vitaly Wool <vitaly.wool(a)konsulko.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-do-not-clobber-swp_entry_t-during-thp-split
+++ a/mm/huge_memory.c
@@ -2455,7 +2455,16 @@ static void __split_huge_page_tail(struc
page_tail);
page_tail->mapping = head->mapping;
page_tail->index = head->index + tail;
- page_tail->private = 0;
+
+ /*
+ * page->private should not be set in tail pages with the exception
+ * of swap cache pages that store the swp_entry_t in tail pages.
+ * Fix up and warn once if private is unexpectedly set.
+ */
+ if (!folio_test_swapcache(page_folio(head))) {
+ VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, head);
+ page_tail->private = 0;
+ }
/* Page flags must be visible before we make the page non-compound. */
smp_wmb();
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-huge_memory-do-not-clobber-swp_entry_t-during-thp-split.patch
The gadget driver may wait on the request completion when it sets the
USB_GADGET_DELAYED_STATUS. Make sure that the End Transfer command can
go through if the dwc->delayed_status is set so that the request can
complete. When the delayed_status is set, the Setup packet is already
processed, and the next phase should be either Data or Status. It's
unlikely that the host would cancel the control transfer and send a new
Setup packet during End Transfer command. But if that's the case, we can
try again when ep0state returns to EP0_SETUP_PHASE.
Fixes: e1ee843488d5 ("usb: dwc3: gadget: Force sending delayed status during soft disconnect")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
Changes in v2:
- Fix build issue due to cherry-picking...
drivers/usb/dwc3/gadget.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..dd8ecbe61bec 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1698,6 +1698,16 @@ static int __dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force, bool int
cmd |= DWC3_DEPCMD_PARAM(dep->resource_index);
memset(¶ms, 0, sizeof(params));
ret = dwc3_send_gadget_ep_cmd(dep, cmd, ¶ms);
+ /*
+ * If the End Transfer command was timed out while the device is
+ * not in SETUP phase, it's possible that an incoming Setup packet
+ * may prevent the command's completion. Let's retry when the
+ * ep0state returns to EP0_SETUP_PHASE.
+ */
+ if (ret == -ETIMEDOUT && dep->dwc->ep0state != EP0_SETUP_PHASE) {
+ dep->flags |= DWC3_EP_DELAY_STOP;
+ return 0;
+ }
WARN_ON_ONCE(ret);
dep->resource_index = 0;
@@ -3719,7 +3729,7 @@ void dwc3_stop_active_transfer(struct dwc3_ep *dep, bool force,
* timeout. Delay issuing the End Transfer command until the Setup TRB is
* prepared.
*/
- if (dwc->ep0state != EP0_SETUP_PHASE) {
+ if (dwc->ep0state != EP0_SETUP_PHASE && !dwc->delayed_status) {
dep->flags |= DWC3_EP_DELAY_STOP;
return;
}
base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
--
2.28.0