The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d4c972bff3129a9dd4c22a3999fd8eba1a81531a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042336-upscale-specks-c8aa@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d4c972bff312 ("drm/vmwgfx: Sort primary plane formats by order of preference")
8bb75aeb58bd ("drm/vmwgfx: validate the screen formats")
92f1d09ca4ed ("drm: Switch to %p4cc format modifier")
d8713d6684a4 ("drm/vmwgfx/vmwgfx_kms: Mark vmw_{cursor,primary}_plane_formats as __maybe_unused")
5fbd41d3bf12 ("Merge tag 'drm-misc-next-2020-11-27-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4c972bff3129a9dd4c22a3999fd8eba1a81531a Mon Sep 17 00:00:00 2001
From: Zack Rusin <zack.rusin(a)broadcom.com>
Date: Thu, 11 Apr 2024 22:55:11 -0400
Subject: [PATCH] drm/vmwgfx: Sort primary plane formats by order of preference
The table of primary plane formats wasn't sorted at all, leading to
applications picking our least desirable formats by defaults.
Sort the primary plane formats according to our order of preference.
Nice side-effect of this change is that it makes IGT's kms_atomic
plane-invalid-params pass because the test picks the first format
which for vmwgfx was DRM_FORMAT_XRGB1555 and uses fb's with odd sizes
which make Pixman, which IGT depends on assert due to the fact that our
16bpp formats aren't 32 bit aligned like Pixman requires all formats
to be.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 36cc79bc9077 ("drm/vmwgfx: Add universal plane support")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.12+
Acked-by: Pekka Paalanen <pekka.paalanen(a)collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-6-zack.r…
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index a94947b588e8..19a843da87b7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -243,10 +243,10 @@ struct vmw_framebuffer_bo {
static const uint32_t __maybe_unused vmw_primary_plane_formats[] = {
- DRM_FORMAT_XRGB1555,
- DRM_FORMAT_RGB565,
DRM_FORMAT_XRGB8888,
DRM_FORMAT_ARGB8888,
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_XRGB1555,
};
static const uint32_t __maybe_unused vmw_cursor_plane_formats[] = {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d4c972bff3129a9dd4c22a3999fd8eba1a81531a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042335-prowling-handwash-80c5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d4c972bff312 ("drm/vmwgfx: Sort primary plane formats by order of preference")
8bb75aeb58bd ("drm/vmwgfx: validate the screen formats")
92f1d09ca4ed ("drm: Switch to %p4cc format modifier")
d8713d6684a4 ("drm/vmwgfx/vmwgfx_kms: Mark vmw_{cursor,primary}_plane_formats as __maybe_unused")
5fbd41d3bf12 ("Merge tag 'drm-misc-next-2020-11-27-1' of git://anongit.freedesktop.org/drm/drm-misc into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d4c972bff3129a9dd4c22a3999fd8eba1a81531a Mon Sep 17 00:00:00 2001
From: Zack Rusin <zack.rusin(a)broadcom.com>
Date: Thu, 11 Apr 2024 22:55:11 -0400
Subject: [PATCH] drm/vmwgfx: Sort primary plane formats by order of preference
The table of primary plane formats wasn't sorted at all, leading to
applications picking our least desirable formats by defaults.
Sort the primary plane formats according to our order of preference.
Nice side-effect of this change is that it makes IGT's kms_atomic
plane-invalid-params pass because the test picks the first format
which for vmwgfx was DRM_FORMAT_XRGB1555 and uses fb's with odd sizes
which make Pixman, which IGT depends on assert due to the fact that our
16bpp formats aren't 32 bit aligned like Pixman requires all formats
to be.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 36cc79bc9077 ("drm/vmwgfx: Add universal plane support")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.12+
Acked-by: Pekka Paalanen <pekka.paalanen(a)collabora.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240412025511.78553-6-zack.r…
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
index a94947b588e8..19a843da87b7 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.h
@@ -243,10 +243,10 @@ struct vmw_framebuffer_bo {
static const uint32_t __maybe_unused vmw_primary_plane_formats[] = {
- DRM_FORMAT_XRGB1555,
- DRM_FORMAT_RGB565,
DRM_FORMAT_XRGB8888,
DRM_FORMAT_ARGB8888,
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_XRGB1555,
};
static const uint32_t __maybe_unused vmw_cursor_plane_formats[] = {
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x b6976f323a8687cc0d55bc92c2086fd934324ed5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042340-laborious-overact-587b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
b6976f323a86 ("drm/ttm: stop pooling cached NUMA pages v2")
fd37721803c6 ("mm, treewide: introduce NR_PAGE_ORDERS")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b6976f323a8687cc0d55bc92c2086fd934324ed5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <ckoenig.leichtzumerken(a)gmail.com>
Date: Mon, 15 Apr 2024 15:48:21 +0200
Subject: [PATCH] drm/ttm: stop pooling cached NUMA pages v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We only pool write combined and uncached allocations because they
require extra overhead on allocation and release.
If we also pool cached NUMA it not only means some extra unnecessary
overhead, but also that under memory pressure it can happen that
pages from the wrong NUMA node enters the pool and are re-used
over and over again.
This can lead to performance reduction after running into memory
pressure.
v2: restructure and cleanup the code a bit from the internal hack to
test this.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
Fixes: 4482d3c94d7f ("drm/ttm: add NUMA node id to the pool")
CC: stable(a)vger.kernel.org
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240415134821.1919-1-christi…
diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 112438d965ff..6e1fd6985ffc 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -288,17 +288,23 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool,
enum ttm_caching caching,
unsigned int order)
{
- if (pool->use_dma_alloc || pool->nid != NUMA_NO_NODE)
+ if (pool->use_dma_alloc)
return &pool->caching[caching].orders[order];
#ifdef CONFIG_X86
switch (caching) {
case ttm_write_combined:
+ if (pool->nid != NUMA_NO_NODE)
+ return &pool->caching[caching].orders[order];
+
if (pool->use_dma32)
return &global_dma32_write_combined[order];
return &global_write_combined[order];
case ttm_uncached:
+ if (pool->nid != NUMA_NO_NODE)
+ return &pool->caching[caching].orders[order];
+
if (pool->use_dma32)
return &global_dma32_uncached[order];
@@ -566,11 +572,17 @@ void ttm_pool_init(struct ttm_pool *pool, struct device *dev,
pool->use_dma_alloc = use_dma_alloc;
pool->use_dma32 = use_dma32;
- if (use_dma_alloc || nid != NUMA_NO_NODE) {
- for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
- for (j = 0; j < NR_PAGE_ORDERS; ++j)
- ttm_pool_type_init(&pool->caching[i].orders[j],
- pool, i, j);
+ for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
+ for (j = 0; j < NR_PAGE_ORDERS; ++j) {
+ struct ttm_pool_type *pt;
+
+ /* Initialize only pool types which are actually used */
+ pt = ttm_pool_select_type(pool, i, j);
+ if (pt != &pool->caching[i].orders[j])
+ continue;
+
+ ttm_pool_type_init(pt, pool, i, j);
+ }
}
}
EXPORT_SYMBOL(ttm_pool_init);
@@ -599,10 +611,16 @@ void ttm_pool_fini(struct ttm_pool *pool)
{
unsigned int i, j;
- if (pool->use_dma_alloc || pool->nid != NUMA_NO_NODE) {
- for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i)
- for (j = 0; j < NR_PAGE_ORDERS; ++j)
- ttm_pool_type_fini(&pool->caching[i].orders[j]);
+ for (i = 0; i < TTM_NUM_CACHING_TYPES; ++i) {
+ for (j = 0; j < NR_PAGE_ORDERS; ++j) {
+ struct ttm_pool_type *pt;
+
+ pt = ttm_pool_select_type(pool, i, j);
+ if (pt != &pool->caching[i].orders[j])
+ continue;
+
+ ttm_pool_type_fini(pt);
+ }
}
/* We removed the pool types from the LRU, but we need to also make sure
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 46dad3c1e57897ab9228332f03e1c14798d2d3b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042347-create-item-ebf8@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
46dad3c1e578 ("init/main.c: Fix potential static_command_line memory overflow")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46dad3c1e57897ab9228332f03e1c14798d2d3b9 Mon Sep 17 00:00:00 2001
From: Yuntao Wang <ytcoode(a)gmail.com>
Date: Fri, 12 Apr 2024 16:17:32 +0800
Subject: [PATCH] init/main.c: Fix potential static_command_line memory
overflow
We allocate memory of size 'xlen + strlen(boot_command_line) + 1' for
static_command_line, but the strings copied into static_command_line are
extra_command_line and command_line, rather than extra_command_line and
boot_command_line.
When strlen(command_line) > strlen(boot_command_line), static_command_line
will overflow.
This patch just recovers strlen(command_line) which was miss-consolidated
with strlen(boot_command_line) in the commit f5c7310ac73e ("init/main: add
checks for the return value of memblock_alloc*()")
Link: https://lore.kernel.org/all/20240412081733.35925-2-ytcoode@gmail.com/
Fixes: f5c7310ac73e ("init/main: add checks for the return value of memblock_alloc*()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yuntao Wang <ytcoode(a)gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/init/main.c b/init/main.c
index 881f6230ee59..5dcf5274c09c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -636,6 +636,8 @@ static void __init setup_command_line(char *command_line)
if (!saved_command_line)
panic("%s: Failed to allocate %zu bytes\n", __func__, len + ilen);
+ len = xlen + strlen(command_line) + 1;
+
static_command_line = memblock_alloc(len, SMP_CACHE_BYTES);
if (!static_command_line)
panic("%s: Failed to allocate %zu bytes\n", __func__, len);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 49ff3b4aec51e3abfc9369997cc603319b02af9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042325-rockstar-civic-9c1a@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
49ff3b4aec51 ("KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms")
a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ff3b4aec51e3abfc9369997cc603319b02af9a Mon Sep 17 00:00:00 2001
From: Sandipan Das <sandipan.das(a)amd.com>
Date: Fri, 5 Apr 2024 16:55:55 -0700
Subject: [PATCH] KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD
platforms
On AMD and Hygon platforms, the local APIC does not automatically set
the mask bit of the LVTPC register when handling a PMI and there is
no need to clear it in the kernel's PMI handler.
For guests, the mask bit is currently set by kvm_apic_local_deliver()
and unless it is cleared by the guest kernel's PMI handler, PMIs stop
arriving and break use-cases like sampling with perf record.
This does not affect non-PerfMonV2 guests because PMIs are handled in
the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
mask bit irrespective of the vendor.
Before:
$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]
After:
$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
Fixes: a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sandipan Das <sandipan.das(a)amd.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
[sean: use is_intel_compatible instead of !is_amd_or_hygon()]
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-ID: <20240405235603.1173076-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cf37586f0466..ebf41023be38 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2776,7 +2776,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
- if (r && lvt_type == APIC_LVTPC)
+ if (r && lvt_type == APIC_LVTPC &&
+ guest_cpuid_is_intel_compatible(apic->vcpu))
kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
return r;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 49ff3b4aec51e3abfc9369997cc603319b02af9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042320-deflected-pester-4998@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
49ff3b4aec51 ("KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms")
a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ff3b4aec51e3abfc9369997cc603319b02af9a Mon Sep 17 00:00:00 2001
From: Sandipan Das <sandipan.das(a)amd.com>
Date: Fri, 5 Apr 2024 16:55:55 -0700
Subject: [PATCH] KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD
platforms
On AMD and Hygon platforms, the local APIC does not automatically set
the mask bit of the LVTPC register when handling a PMI and there is
no need to clear it in the kernel's PMI handler.
For guests, the mask bit is currently set by kvm_apic_local_deliver()
and unless it is cleared by the guest kernel's PMI handler, PMIs stop
arriving and break use-cases like sampling with perf record.
This does not affect non-PerfMonV2 guests because PMIs are handled in
the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
mask bit irrespective of the vendor.
Before:
$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]
After:
$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
Fixes: a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sandipan Das <sandipan.das(a)amd.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
[sean: use is_intel_compatible instead of !is_amd_or_hygon()]
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-ID: <20240405235603.1173076-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cf37586f0466..ebf41023be38 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2776,7 +2776,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
- if (r && lvt_type == APIC_LVTPC)
+ if (r && lvt_type == APIC_LVTPC &&
+ guest_cpuid_is_intel_compatible(apic->vcpu))
kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
return r;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 49ff3b4aec51e3abfc9369997cc603319b02af9a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042306-overfeed-simmering-3d3c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
49ff3b4aec51 ("KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms")
a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 49ff3b4aec51e3abfc9369997cc603319b02af9a Mon Sep 17 00:00:00 2001
From: Sandipan Das <sandipan.das(a)amd.com>
Date: Fri, 5 Apr 2024 16:55:55 -0700
Subject: [PATCH] KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD
platforms
On AMD and Hygon platforms, the local APIC does not automatically set
the mask bit of the LVTPC register when handling a PMI and there is
no need to clear it in the kernel's PMI handler.
For guests, the mask bit is currently set by kvm_apic_local_deliver()
and unless it is cleared by the guest kernel's PMI handler, PMIs stop
arriving and break use-cases like sampling with perf record.
This does not affect non-PerfMonV2 guests because PMIs are handled in
the guest kernel by x86_pmu_handle_irq() which always clears the LVTPC
mask bit irrespective of the vendor.
Before:
$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (1 samples) ]
After:
$ perf record -e cycles:u true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (19 samples) ]
Fixes: a16eb25b09c0 ("KVM: x86: Mask LVTPC when handling a PMI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sandipan Das <sandipan.das(a)amd.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
[sean: use is_intel_compatible instead of !is_amd_or_hygon()]
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-ID: <20240405235603.1173076-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cf37586f0466..ebf41023be38 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2776,7 +2776,8 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
- if (r && lvt_type == APIC_LVTPC)
+ if (r && lvt_type == APIC_LVTPC &&
+ guest_cpuid_is_intel_compatible(apic->vcpu))
kvm_lapic_set_reg(apic, APIC_LVTPC, reg | APIC_LVT_MASKED);
return r;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 54c4ec5f8c471b7c1137a1f769648549c423c026
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042357-dropbox-unhitched-ffb8@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
54c4ec5f8c47 ("serial: mxs-auart: add spinlock around changing cts state")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54c4ec5f8c471b7c1137a1f769648549c423c026 Mon Sep 17 00:00:00 2001
From: Emil Kronborg <emil.kronborg(a)protonmail.com>
Date: Wed, 20 Mar 2024 12:15:36 +0000
Subject: [PATCH] serial: mxs-auart: add spinlock around changing cts state
The uart_handle_cts_change() function in serial_core expects the caller
to hold uport->lock. For example, I have seen the below kernel splat,
when the Bluetooth driver is loaded on an i.MX28 board.
[ 85.119255] ------------[ cut here ]------------
[ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec
[ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs
[ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1
[ 85.151396] Hardware name: Freescale MXS (Device Tree)
[ 85.156679] Workqueue: hci0 hci_power_on [bluetooth]
(...)
[ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4
[ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210
(...)
Cc: stable(a)vger.kernel.org
Fixes: 4d90bb147ef6 ("serial: core: Document and assert lock requirements for irq helpers")
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Emil Kronborg <emil.kronborg(a)protonmail.com>
Link: https://lore.kernel.org/r/20240320121530.11348-1-emil.kronborg@protonmail.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/mxs-auart.c b/drivers/tty/serial/mxs-auart.c
index 4749331fe618..1e8853eae504 100644
--- a/drivers/tty/serial/mxs-auart.c
+++ b/drivers/tty/serial/mxs-auart.c
@@ -1086,11 +1086,13 @@ static void mxs_auart_set_ldisc(struct uart_port *port,
static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
{
- u32 istat;
+ u32 istat, stat;
struct mxs_auart_port *s = context;
u32 mctrl_temp = s->mctrl_prev;
- u32 stat = mxs_read(s, REG_STAT);
+ uart_port_lock(&s->port);
+
+ stat = mxs_read(s, REG_STAT);
istat = mxs_read(s, REG_INTR);
/* ack irq */
@@ -1126,6 +1128,8 @@ static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
istat &= ~AUART_INTR_TXIS;
}
+ uart_port_unlock(&s->port);
+
return IRQ_HANDLED;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 54c4ec5f8c471b7c1137a1f769648549c423c026
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042355-imagines-such-7bf6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
54c4ec5f8c47 ("serial: mxs-auart: add spinlock around changing cts state")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54c4ec5f8c471b7c1137a1f769648549c423c026 Mon Sep 17 00:00:00 2001
From: Emil Kronborg <emil.kronborg(a)protonmail.com>
Date: Wed, 20 Mar 2024 12:15:36 +0000
Subject: [PATCH] serial: mxs-auart: add spinlock around changing cts state
The uart_handle_cts_change() function in serial_core expects the caller
to hold uport->lock. For example, I have seen the below kernel splat,
when the Bluetooth driver is loaded on an i.MX28 board.
[ 85.119255] ------------[ cut here ]------------
[ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec
[ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs
[ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1
[ 85.151396] Hardware name: Freescale MXS (Device Tree)
[ 85.156679] Workqueue: hci0 hci_power_on [bluetooth]
(...)
[ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4
[ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210
(...)
Cc: stable(a)vger.kernel.org
Fixes: 4d90bb147ef6 ("serial: core: Document and assert lock requirements for irq helpers")
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Emil Kronborg <emil.kronborg(a)protonmail.com>
Link: https://lore.kernel.org/r/20240320121530.11348-1-emil.kronborg@protonmail.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/mxs-auart.c b/drivers/tty/serial/mxs-auart.c
index 4749331fe618..1e8853eae504 100644
--- a/drivers/tty/serial/mxs-auart.c
+++ b/drivers/tty/serial/mxs-auart.c
@@ -1086,11 +1086,13 @@ static void mxs_auart_set_ldisc(struct uart_port *port,
static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
{
- u32 istat;
+ u32 istat, stat;
struct mxs_auart_port *s = context;
u32 mctrl_temp = s->mctrl_prev;
- u32 stat = mxs_read(s, REG_STAT);
+ uart_port_lock(&s->port);
+
+ stat = mxs_read(s, REG_STAT);
istat = mxs_read(s, REG_INTR);
/* ack irq */
@@ -1126,6 +1128,8 @@ static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
istat &= ~AUART_INTR_TXIS;
}
+ uart_port_unlock(&s->port);
+
return IRQ_HANDLED;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 54c4ec5f8c471b7c1137a1f769648549c423c026
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042354-childless-album-3181@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
54c4ec5f8c47 ("serial: mxs-auart: add spinlock around changing cts state")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54c4ec5f8c471b7c1137a1f769648549c423c026 Mon Sep 17 00:00:00 2001
From: Emil Kronborg <emil.kronborg(a)protonmail.com>
Date: Wed, 20 Mar 2024 12:15:36 +0000
Subject: [PATCH] serial: mxs-auart: add spinlock around changing cts state
The uart_handle_cts_change() function in serial_core expects the caller
to hold uport->lock. For example, I have seen the below kernel splat,
when the Bluetooth driver is loaded on an i.MX28 board.
[ 85.119255] ------------[ cut here ]------------
[ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec
[ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs
[ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1
[ 85.151396] Hardware name: Freescale MXS (Device Tree)
[ 85.156679] Workqueue: hci0 hci_power_on [bluetooth]
(...)
[ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4
[ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210
(...)
Cc: stable(a)vger.kernel.org
Fixes: 4d90bb147ef6 ("serial: core: Document and assert lock requirements for irq helpers")
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Emil Kronborg <emil.kronborg(a)protonmail.com>
Link: https://lore.kernel.org/r/20240320121530.11348-1-emil.kronborg@protonmail.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/mxs-auart.c b/drivers/tty/serial/mxs-auart.c
index 4749331fe618..1e8853eae504 100644
--- a/drivers/tty/serial/mxs-auart.c
+++ b/drivers/tty/serial/mxs-auart.c
@@ -1086,11 +1086,13 @@ static void mxs_auart_set_ldisc(struct uart_port *port,
static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
{
- u32 istat;
+ u32 istat, stat;
struct mxs_auart_port *s = context;
u32 mctrl_temp = s->mctrl_prev;
- u32 stat = mxs_read(s, REG_STAT);
+ uart_port_lock(&s->port);
+
+ stat = mxs_read(s, REG_STAT);
istat = mxs_read(s, REG_INTR);
/* ack irq */
@@ -1126,6 +1128,8 @@ static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
istat &= ~AUART_INTR_TXIS;
}
+ uart_port_unlock(&s->port);
+
return IRQ_HANDLED;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 54c4ec5f8c471b7c1137a1f769648549c423c026
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042353-mayday-porcupine-efb6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
54c4ec5f8c47 ("serial: mxs-auart: add spinlock around changing cts state")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54c4ec5f8c471b7c1137a1f769648549c423c026 Mon Sep 17 00:00:00 2001
From: Emil Kronborg <emil.kronborg(a)protonmail.com>
Date: Wed, 20 Mar 2024 12:15:36 +0000
Subject: [PATCH] serial: mxs-auart: add spinlock around changing cts state
The uart_handle_cts_change() function in serial_core expects the caller
to hold uport->lock. For example, I have seen the below kernel splat,
when the Bluetooth driver is loaded on an i.MX28 board.
[ 85.119255] ------------[ cut here ]------------
[ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec
[ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs
[ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1
[ 85.151396] Hardware name: Freescale MXS (Device Tree)
[ 85.156679] Workqueue: hci0 hci_power_on [bluetooth]
(...)
[ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4
[ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210
(...)
Cc: stable(a)vger.kernel.org
Fixes: 4d90bb147ef6 ("serial: core: Document and assert lock requirements for irq helpers")
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Signed-off-by: Emil Kronborg <emil.kronborg(a)protonmail.com>
Link: https://lore.kernel.org/r/20240320121530.11348-1-emil.kronborg@protonmail.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/mxs-auart.c b/drivers/tty/serial/mxs-auart.c
index 4749331fe618..1e8853eae504 100644
--- a/drivers/tty/serial/mxs-auart.c
+++ b/drivers/tty/serial/mxs-auart.c
@@ -1086,11 +1086,13 @@ static void mxs_auart_set_ldisc(struct uart_port *port,
static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
{
- u32 istat;
+ u32 istat, stat;
struct mxs_auart_port *s = context;
u32 mctrl_temp = s->mctrl_prev;
- u32 stat = mxs_read(s, REG_STAT);
+ uart_port_lock(&s->port);
+
+ stat = mxs_read(s, REG_STAT);
istat = mxs_read(s, REG_INTR);
/* ack irq */
@@ -1126,6 +1128,8 @@ static irqreturn_t mxs_auart_irq_handle(int irq, void *context)
istat &= ~AUART_INTR_TXIS;
}
+ uart_port_unlock(&s->port);
+
return IRQ_HANDLED;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 631426ba1d45a8672b177ee85ad4cabe760dd131
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042338-ideally-setting-9717@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
631426ba1d45 ("mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly")
0f20bba1688b ("mm/gup: explicitly define and check internal GUP flags, disallow FOLL_TOUCH")
6cd06ab12d1a ("gup: make the stack expansion warning a bit more targeted")
9471f1f2f502 ("Merge branch 'expand-stack'")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 631426ba1d45a8672b177ee85ad4cabe760dd131 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Thu, 14 Mar 2024 17:12:59 +0100
Subject: [PATCH] mm/madvise: make MADV_POPULATE_(READ|WRITE) handle
VM_FAULT_RETRY properly
Darrick reports that in some cases where pread() would fail with -EIO and
mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.
While the madvise() call can be interrupted by a signal, this is not the
desired behavior. MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave
like page faults in that case: fail and not retry forever.
A reproducer can be found at [1].
The reason is that __get_user_pages(), as called by
faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
it will simply return 0 when VM_FAULT_RETRY happened, making
madvise_populate()->faultin_vma_page_range() retry again and again, never
setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().
__get_user_pages_locked() does what we want, but duplicating that logic in
faultin_vma_page_range() feels wrong.
So let's use __get_user_pages_locked() instead, that will detect
VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT
from faultin_page() to __get_user_pages(), all the way to
madvise_populate().
But, there is an issue: __get_user_pages_locked() will end up re-taking
the MM lock and then __get_user_pages() will do another VMA lookup. In
the meantime, the VMA layout could have changed and we'd fail with
different error codes than we'd want to.
As __get_user_pages() will currently do a new VMA lookup either way, let
it do the VMA handling in a different way, controlled by a new
FOLL_MADV_POPULATE flag, effectively moving these checks from
madvise_populate() + faultin_page_range() in there.
With this change, Darricks reproducer properly fails with -EFAULT, as
documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.
[1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/
Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Darrick J. Wong <djwong(a)kernel.org>
Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index af8edadc05d1..1611e73b1121 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1206,6 +1206,22 @@ static long __get_user_pages(struct mm_struct *mm,
/* first iteration or cross vma bound */
if (!vma || start >= vma->vm_end) {
+ /*
+ * MADV_POPULATE_(READ|WRITE) wants to handle VMA
+ * lookups+error reporting differently.
+ */
+ if (gup_flags & FOLL_MADV_POPULATE) {
+ vma = vma_lookup(mm, start);
+ if (!vma) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (check_vma_flags(vma, gup_flags)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ goto retry;
+ }
vma = gup_vma_lookup(mm, start);
if (!vma && in_gate_area(mm, start)) {
ret = get_gate_page(mm, start & PAGE_MASK,
@@ -1685,35 +1701,35 @@ long populate_vma_page_range(struct vm_area_struct *vma,
}
/*
- * faultin_vma_page_range() - populate (prefault) page tables inside the
- * given VMA range readable/writable
+ * faultin_page_range() - populate (prefault) page tables inside the
+ * given range readable/writable
*
* This takes care of mlocking the pages, too, if VM_LOCKED is set.
*
- * @vma: target vma
+ * @mm: the mm to populate page tables in
* @start: start address
* @end: end address
* @write: whether to prefault readable or writable
* @locked: whether the mmap_lock is still held
*
- * Returns either number of processed pages in the vma, or a negative error
- * code on error (see __get_user_pages()).
+ * Returns either number of processed pages in the MM, or a negative error
+ * code on error (see __get_user_pages()). Note that this function reports
+ * errors related to VMAs, such as incompatible mappings, as expected by
+ * MADV_POPULATE_(READ|WRITE).
*
- * vma->vm_mm->mmap_lock must be held. The range must be page-aligned and
- * covered by the VMA. If it's released, *@locked will be set to 0.
+ * The range must be page-aligned.
+ *
+ * mm->mmap_lock must be held. If it's released, *@locked will be set to 0.
*/
-long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, bool write, int *locked)
+long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked)
{
- struct mm_struct *mm = vma->vm_mm;
unsigned long nr_pages = (end - start) / PAGE_SIZE;
int gup_flags;
long ret;
VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
- VM_BUG_ON_VMA(start < vma->vm_start, vma);
- VM_BUG_ON_VMA(end > vma->vm_end, vma);
mmap_assert_locked(mm);
/*
@@ -1725,19 +1741,13 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
* a poisoned page.
* !FOLL_FORCE: Require proper access permissions.
*/
- gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE;
+ gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE |
+ FOLL_MADV_POPULATE;
if (write)
gup_flags |= FOLL_WRITE;
- /*
- * We want to report -EINVAL instead of -EFAULT for any permission
- * problems or incompatible mappings.
- */
- if (check_vma_flags(vma, gup_flags))
- return -EINVAL;
-
- ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, locked);
+ ret = __get_user_pages_locked(mm, start, nr_pages, NULL, locked,
+ gup_flags);
lru_add_drain();
return ret;
}
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..07ad2675a88b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -686,9 +686,8 @@ struct anon_vma *folio_anon_vma(struct folio *folio);
void unmap_mapping_folio(struct folio *folio);
extern long populate_vma_page_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end, int *locked);
-extern long faultin_vma_page_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- bool write, int *locked);
+extern long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked);
extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
unsigned long bytes);
@@ -1127,10 +1126,13 @@ enum {
FOLL_FAST_ONLY = 1 << 20,
/* allow unlocking the mmap lock */
FOLL_UNLOCKABLE = 1 << 21,
+ /* VMA lookup+checks compatible with MADV_POPULATE_(READ|WRITE) */
+ FOLL_MADV_POPULATE = 1 << 22,
};
#define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \
- FOLL_FAST_ONLY | FOLL_UNLOCKABLE)
+ FOLL_FAST_ONLY | FOLL_UNLOCKABLE | \
+ FOLL_MADV_POPULATE)
/*
* Indicates for which pages that are write-protected in the page table,
diff --git a/mm/madvise.c b/mm/madvise.c
index 44a498c94158..1a073fcc4c0c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -908,27 +908,14 @@ static long madvise_populate(struct vm_area_struct *vma,
{
const bool write = behavior == MADV_POPULATE_WRITE;
struct mm_struct *mm = vma->vm_mm;
- unsigned long tmp_end;
int locked = 1;
long pages;
*prev = vma;
while (start < end) {
- /*
- * We might have temporarily dropped the lock. For example,
- * our VMA might have been split.
- */
- if (!vma || start >= vma->vm_end) {
- vma = vma_lookup(mm, start);
- if (!vma)
- return -ENOMEM;
- }
-
- tmp_end = min_t(unsigned long, end, vma->vm_end);
/* Populate (prefault) page tables readable/writable. */
- pages = faultin_vma_page_range(vma, start, tmp_end, write,
- &locked);
+ pages = faultin_page_range(mm, start, end, write, &locked);
if (!locked) {
mmap_read_lock(mm);
locked = 1;
@@ -949,7 +936,7 @@ static long madvise_populate(struct vm_area_struct *vma,
pr_warn_once("%s: unhandled return value: %ld\n",
__func__, pages);
fallthrough;
- case -ENOMEM:
+ case -ENOMEM: /* No VMA or out of memory. */
return -ENOMEM;
}
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 631426ba1d45a8672b177ee85ad4cabe760dd131
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042336-tinfoil-trusting-765e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
631426ba1d45 ("mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly")
0f20bba1688b ("mm/gup: explicitly define and check internal GUP flags, disallow FOLL_TOUCH")
6cd06ab12d1a ("gup: make the stack expansion warning a bit more targeted")
9471f1f2f502 ("Merge branch 'expand-stack'")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 631426ba1d45a8672b177ee85ad4cabe760dd131 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Thu, 14 Mar 2024 17:12:59 +0100
Subject: [PATCH] mm/madvise: make MADV_POPULATE_(READ|WRITE) handle
VM_FAULT_RETRY properly
Darrick reports that in some cases where pread() would fail with -EIO and
mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.
While the madvise() call can be interrupted by a signal, this is not the
desired behavior. MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave
like page faults in that case: fail and not retry forever.
A reproducer can be found at [1].
The reason is that __get_user_pages(), as called by
faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
it will simply return 0 when VM_FAULT_RETRY happened, making
madvise_populate()->faultin_vma_page_range() retry again and again, never
setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().
__get_user_pages_locked() does what we want, but duplicating that logic in
faultin_vma_page_range() feels wrong.
So let's use __get_user_pages_locked() instead, that will detect
VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT
from faultin_page() to __get_user_pages(), all the way to
madvise_populate().
But, there is an issue: __get_user_pages_locked() will end up re-taking
the MM lock and then __get_user_pages() will do another VMA lookup. In
the meantime, the VMA layout could have changed and we'd fail with
different error codes than we'd want to.
As __get_user_pages() will currently do a new VMA lookup either way, let
it do the VMA handling in a different way, controlled by a new
FOLL_MADV_POPULATE flag, effectively moving these checks from
madvise_populate() + faultin_page_range() in there.
With this change, Darricks reproducer properly fails with -EFAULT, as
documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.
[1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/
Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Darrick J. Wong <djwong(a)kernel.org>
Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index af8edadc05d1..1611e73b1121 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1206,6 +1206,22 @@ static long __get_user_pages(struct mm_struct *mm,
/* first iteration or cross vma bound */
if (!vma || start >= vma->vm_end) {
+ /*
+ * MADV_POPULATE_(READ|WRITE) wants to handle VMA
+ * lookups+error reporting differently.
+ */
+ if (gup_flags & FOLL_MADV_POPULATE) {
+ vma = vma_lookup(mm, start);
+ if (!vma) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (check_vma_flags(vma, gup_flags)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ goto retry;
+ }
vma = gup_vma_lookup(mm, start);
if (!vma && in_gate_area(mm, start)) {
ret = get_gate_page(mm, start & PAGE_MASK,
@@ -1685,35 +1701,35 @@ long populate_vma_page_range(struct vm_area_struct *vma,
}
/*
- * faultin_vma_page_range() - populate (prefault) page tables inside the
- * given VMA range readable/writable
+ * faultin_page_range() - populate (prefault) page tables inside the
+ * given range readable/writable
*
* This takes care of mlocking the pages, too, if VM_LOCKED is set.
*
- * @vma: target vma
+ * @mm: the mm to populate page tables in
* @start: start address
* @end: end address
* @write: whether to prefault readable or writable
* @locked: whether the mmap_lock is still held
*
- * Returns either number of processed pages in the vma, or a negative error
- * code on error (see __get_user_pages()).
+ * Returns either number of processed pages in the MM, or a negative error
+ * code on error (see __get_user_pages()). Note that this function reports
+ * errors related to VMAs, such as incompatible mappings, as expected by
+ * MADV_POPULATE_(READ|WRITE).
*
- * vma->vm_mm->mmap_lock must be held. The range must be page-aligned and
- * covered by the VMA. If it's released, *@locked will be set to 0.
+ * The range must be page-aligned.
+ *
+ * mm->mmap_lock must be held. If it's released, *@locked will be set to 0.
*/
-long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, bool write, int *locked)
+long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked)
{
- struct mm_struct *mm = vma->vm_mm;
unsigned long nr_pages = (end - start) / PAGE_SIZE;
int gup_flags;
long ret;
VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
- VM_BUG_ON_VMA(start < vma->vm_start, vma);
- VM_BUG_ON_VMA(end > vma->vm_end, vma);
mmap_assert_locked(mm);
/*
@@ -1725,19 +1741,13 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
* a poisoned page.
* !FOLL_FORCE: Require proper access permissions.
*/
- gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE;
+ gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE |
+ FOLL_MADV_POPULATE;
if (write)
gup_flags |= FOLL_WRITE;
- /*
- * We want to report -EINVAL instead of -EFAULT for any permission
- * problems or incompatible mappings.
- */
- if (check_vma_flags(vma, gup_flags))
- return -EINVAL;
-
- ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, locked);
+ ret = __get_user_pages_locked(mm, start, nr_pages, NULL, locked,
+ gup_flags);
lru_add_drain();
return ret;
}
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..07ad2675a88b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -686,9 +686,8 @@ struct anon_vma *folio_anon_vma(struct folio *folio);
void unmap_mapping_folio(struct folio *folio);
extern long populate_vma_page_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end, int *locked);
-extern long faultin_vma_page_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- bool write, int *locked);
+extern long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked);
extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
unsigned long bytes);
@@ -1127,10 +1126,13 @@ enum {
FOLL_FAST_ONLY = 1 << 20,
/* allow unlocking the mmap lock */
FOLL_UNLOCKABLE = 1 << 21,
+ /* VMA lookup+checks compatible with MADV_POPULATE_(READ|WRITE) */
+ FOLL_MADV_POPULATE = 1 << 22,
};
#define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \
- FOLL_FAST_ONLY | FOLL_UNLOCKABLE)
+ FOLL_FAST_ONLY | FOLL_UNLOCKABLE | \
+ FOLL_MADV_POPULATE)
/*
* Indicates for which pages that are write-protected in the page table,
diff --git a/mm/madvise.c b/mm/madvise.c
index 44a498c94158..1a073fcc4c0c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -908,27 +908,14 @@ static long madvise_populate(struct vm_area_struct *vma,
{
const bool write = behavior == MADV_POPULATE_WRITE;
struct mm_struct *mm = vma->vm_mm;
- unsigned long tmp_end;
int locked = 1;
long pages;
*prev = vma;
while (start < end) {
- /*
- * We might have temporarily dropped the lock. For example,
- * our VMA might have been split.
- */
- if (!vma || start >= vma->vm_end) {
- vma = vma_lookup(mm, start);
- if (!vma)
- return -ENOMEM;
- }
-
- tmp_end = min_t(unsigned long, end, vma->vm_end);
/* Populate (prefault) page tables readable/writable. */
- pages = faultin_vma_page_range(vma, start, tmp_end, write,
- &locked);
+ pages = faultin_page_range(mm, start, end, write, &locked);
if (!locked) {
mmap_read_lock(mm);
locked = 1;
@@ -949,7 +936,7 @@ static long madvise_populate(struct vm_area_struct *vma,
pr_warn_once("%s: unhandled return value: %ld\n",
__func__, pages);
fallthrough;
- case -ENOMEM:
+ case -ENOMEM: /* No VMA or out of memory. */
return -ENOMEM;
}
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 631426ba1d45a8672b177ee85ad4cabe760dd131
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042332-mop-broiling-603b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
631426ba1d45 ("mm/madvise: make MADV_POPULATE_(READ|WRITE) handle VM_FAULT_RETRY properly")
0f20bba1688b ("mm/gup: explicitly define and check internal GUP flags, disallow FOLL_TOUCH")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 631426ba1d45a8672b177ee85ad4cabe760dd131 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Thu, 14 Mar 2024 17:12:59 +0100
Subject: [PATCH] mm/madvise: make MADV_POPULATE_(READ|WRITE) handle
VM_FAULT_RETRY properly
Darrick reports that in some cases where pread() would fail with -EIO and
mmap()+access would generate a SIGBUS signal, MADV_POPULATE_READ /
MADV_POPULATE_WRITE will keep retrying forever and not fail with -EFAULT.
While the madvise() call can be interrupted by a signal, this is not the
desired behavior. MADV_POPULATE_READ / MADV_POPULATE_WRITE should behave
like page faults in that case: fail and not retry forever.
A reproducer can be found at [1].
The reason is that __get_user_pages(), as called by
faultin_vma_page_range(), will not handle VM_FAULT_RETRY in a proper way:
it will simply return 0 when VM_FAULT_RETRY happened, making
madvise_populate()->faultin_vma_page_range() retry again and again, never
setting FOLL_TRIED->FAULT_FLAG_TRIED for __get_user_pages().
__get_user_pages_locked() does what we want, but duplicating that logic in
faultin_vma_page_range() feels wrong.
So let's use __get_user_pages_locked() instead, that will detect
VM_FAULT_RETRY and set FOLL_TRIED when retrying, making the fault handler
return VM_FAULT_SIGBUS (VM_FAULT_ERROR) at some point, propagating -EFAULT
from faultin_page() to __get_user_pages(), all the way to
madvise_populate().
But, there is an issue: __get_user_pages_locked() will end up re-taking
the MM lock and then __get_user_pages() will do another VMA lookup. In
the meantime, the VMA layout could have changed and we'd fail with
different error codes than we'd want to.
As __get_user_pages() will currently do a new VMA lookup either way, let
it do the VMA handling in a different way, controlled by a new
FOLL_MADV_POPULATE flag, effectively moving these checks from
madvise_populate() + faultin_page_range() in there.
With this change, Darricks reproducer properly fails with -EFAULT, as
documented for MADV_POPULATE_READ / MADV_POPULATE_WRITE.
[1] https://lore.kernel.org/all/20240313171936.GN1927156@frogsfrogsfrogs/
Link: https://lkml.kernel.org/r/20240314161300.382526-1-david@redhat.com
Link: https://lkml.kernel.org/r/20240314161300.382526-2-david@redhat.com
Fixes: 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Darrick J. Wong <djwong(a)kernel.org>
Closes: https://lore.kernel.org/all/20240311223815.GW1927156@frogsfrogsfrogs/
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index af8edadc05d1..1611e73b1121 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1206,6 +1206,22 @@ static long __get_user_pages(struct mm_struct *mm,
/* first iteration or cross vma bound */
if (!vma || start >= vma->vm_end) {
+ /*
+ * MADV_POPULATE_(READ|WRITE) wants to handle VMA
+ * lookups+error reporting differently.
+ */
+ if (gup_flags & FOLL_MADV_POPULATE) {
+ vma = vma_lookup(mm, start);
+ if (!vma) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ if (check_vma_flags(vma, gup_flags)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ goto retry;
+ }
vma = gup_vma_lookup(mm, start);
if (!vma && in_gate_area(mm, start)) {
ret = get_gate_page(mm, start & PAGE_MASK,
@@ -1685,35 +1701,35 @@ long populate_vma_page_range(struct vm_area_struct *vma,
}
/*
- * faultin_vma_page_range() - populate (prefault) page tables inside the
- * given VMA range readable/writable
+ * faultin_page_range() - populate (prefault) page tables inside the
+ * given range readable/writable
*
* This takes care of mlocking the pages, too, if VM_LOCKED is set.
*
- * @vma: target vma
+ * @mm: the mm to populate page tables in
* @start: start address
* @end: end address
* @write: whether to prefault readable or writable
* @locked: whether the mmap_lock is still held
*
- * Returns either number of processed pages in the vma, or a negative error
- * code on error (see __get_user_pages()).
+ * Returns either number of processed pages in the MM, or a negative error
+ * code on error (see __get_user_pages()). Note that this function reports
+ * errors related to VMAs, such as incompatible mappings, as expected by
+ * MADV_POPULATE_(READ|WRITE).
*
- * vma->vm_mm->mmap_lock must be held. The range must be page-aligned and
- * covered by the VMA. If it's released, *@locked will be set to 0.
+ * The range must be page-aligned.
+ *
+ * mm->mmap_lock must be held. If it's released, *@locked will be set to 0.
*/
-long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
- unsigned long end, bool write, int *locked)
+long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked)
{
- struct mm_struct *mm = vma->vm_mm;
unsigned long nr_pages = (end - start) / PAGE_SIZE;
int gup_flags;
long ret;
VM_BUG_ON(!PAGE_ALIGNED(start));
VM_BUG_ON(!PAGE_ALIGNED(end));
- VM_BUG_ON_VMA(start < vma->vm_start, vma);
- VM_BUG_ON_VMA(end > vma->vm_end, vma);
mmap_assert_locked(mm);
/*
@@ -1725,19 +1741,13 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
* a poisoned page.
* !FOLL_FORCE: Require proper access permissions.
*/
- gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE;
+ gup_flags = FOLL_TOUCH | FOLL_HWPOISON | FOLL_UNLOCKABLE |
+ FOLL_MADV_POPULATE;
if (write)
gup_flags |= FOLL_WRITE;
- /*
- * We want to report -EINVAL instead of -EFAULT for any permission
- * problems or incompatible mappings.
- */
- if (check_vma_flags(vma, gup_flags))
- return -EINVAL;
-
- ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, locked);
+ ret = __get_user_pages_locked(mm, start, nr_pages, NULL, locked,
+ gup_flags);
lru_add_drain();
return ret;
}
diff --git a/mm/internal.h b/mm/internal.h
index 7e486f2c502c..07ad2675a88b 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -686,9 +686,8 @@ struct anon_vma *folio_anon_vma(struct folio *folio);
void unmap_mapping_folio(struct folio *folio);
extern long populate_vma_page_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end, int *locked);
-extern long faultin_vma_page_range(struct vm_area_struct *vma,
- unsigned long start, unsigned long end,
- bool write, int *locked);
+extern long faultin_page_range(struct mm_struct *mm, unsigned long start,
+ unsigned long end, bool write, int *locked);
extern bool mlock_future_ok(struct mm_struct *mm, unsigned long flags,
unsigned long bytes);
@@ -1127,10 +1126,13 @@ enum {
FOLL_FAST_ONLY = 1 << 20,
/* allow unlocking the mmap lock */
FOLL_UNLOCKABLE = 1 << 21,
+ /* VMA lookup+checks compatible with MADV_POPULATE_(READ|WRITE) */
+ FOLL_MADV_POPULATE = 1 << 22,
};
#define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \
- FOLL_FAST_ONLY | FOLL_UNLOCKABLE)
+ FOLL_FAST_ONLY | FOLL_UNLOCKABLE | \
+ FOLL_MADV_POPULATE)
/*
* Indicates for which pages that are write-protected in the page table,
diff --git a/mm/madvise.c b/mm/madvise.c
index 44a498c94158..1a073fcc4c0c 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -908,27 +908,14 @@ static long madvise_populate(struct vm_area_struct *vma,
{
const bool write = behavior == MADV_POPULATE_WRITE;
struct mm_struct *mm = vma->vm_mm;
- unsigned long tmp_end;
int locked = 1;
long pages;
*prev = vma;
while (start < end) {
- /*
- * We might have temporarily dropped the lock. For example,
- * our VMA might have been split.
- */
- if (!vma || start >= vma->vm_end) {
- vma = vma_lookup(mm, start);
- if (!vma)
- return -ENOMEM;
- }
-
- tmp_end = min_t(unsigned long, end, vma->vm_end);
/* Populate (prefault) page tables readable/writable. */
- pages = faultin_vma_page_range(vma, start, tmp_end, write,
- &locked);
+ pages = faultin_page_range(mm, start, end, write, &locked);
if (!locked) {
mmap_read_lock(mm);
locked = 1;
@@ -949,7 +936,7 @@ static long madvise_populate(struct vm_area_struct *vma,
pr_warn_once("%s: unhandled return value: %ld\n",
__func__, pages);
fallthrough;
- case -ENOMEM:
+ case -ENOMEM: /* No VMA or out of memory. */
return -ENOMEM;
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042301-anthill-recant-a287@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
1647b52757d5 ("KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing")
cbb359d81a26 ("KVM: x86/pmu: Move PMU reset logic to common x86 code")
73554b29bd70 ("KVM: x86/pmu: Synthesize at most one PMI per VM-exit")
b29a2acd36dd ("KVM: x86/pmu: Truncate counter value to allowed width on write")
4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
1c2bf8a6b045 ("KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap")
13afa29ae489 ("KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code")
c85cdc1cc1ea ("KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86")
30dab5c0b65e ("KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits")
8de18543dfe3 ("KVM: x86/pmu: Move reprogram_counters() to pmu.h")
53550b89220b ("KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask")
649bccd7fac9 ("KVM: x86/pmu: Rewrite reprogram_counters() to improve performance")
8bca8c5ce40b ("KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers")
cdd2fbf6360e ("KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()")
957d0f70e97b ("KVM: x86/pmu: Zero out LBR capabilities during PMU refresh")
3a6de51a437f ("KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run")
7e768ce8278b ("KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshed")
e33b6d79acac ("KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042357-unnamed-riverboat-0b22@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
1647b52757d5 ("KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing")
cbb359d81a26 ("KVM: x86/pmu: Move PMU reset logic to common x86 code")
73554b29bd70 ("KVM: x86/pmu: Synthesize at most one PMI per VM-exit")
b29a2acd36dd ("KVM: x86/pmu: Truncate counter value to allowed width on write")
4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
1c2bf8a6b045 ("KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap")
13afa29ae489 ("KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code")
c85cdc1cc1ea ("KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86")
30dab5c0b65e ("KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits")
8de18543dfe3 ("KVM: x86/pmu: Move reprogram_counters() to pmu.h")
53550b89220b ("KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask")
649bccd7fac9 ("KVM: x86/pmu: Rewrite reprogram_counters() to improve performance")
8bca8c5ce40b ("KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers")
cdd2fbf6360e ("KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()")
957d0f70e97b ("KVM: x86/pmu: Zero out LBR capabilities during PMU refresh")
3a6de51a437f ("KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run")
7e768ce8278b ("KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshed")
e33b6d79acac ("KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042354-dividers-trowel-ba4a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
1647b52757d5 ("KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing")
cbb359d81a26 ("KVM: x86/pmu: Move PMU reset logic to common x86 code")
73554b29bd70 ("KVM: x86/pmu: Synthesize at most one PMI per VM-exit")
b29a2acd36dd ("KVM: x86/pmu: Truncate counter value to allowed width on write")
4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
1c2bf8a6b045 ("KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap")
13afa29ae489 ("KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code")
c85cdc1cc1ea ("KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86")
30dab5c0b65e ("KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits")
8de18543dfe3 ("KVM: x86/pmu: Move reprogram_counters() to pmu.h")
53550b89220b ("KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask")
649bccd7fac9 ("KVM: x86/pmu: Rewrite reprogram_counters() to improve performance")
8bca8c5ce40b ("KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers")
cdd2fbf6360e ("KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()")
957d0f70e97b ("KVM: x86/pmu: Zero out LBR capabilities during PMU refresh")
3a6de51a437f ("KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run")
7e768ce8278b ("KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshed")
e33b6d79acac ("KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042350-unaware-very-3433@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
1647b52757d5 ("KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing")
cbb359d81a26 ("KVM: x86/pmu: Move PMU reset logic to common x86 code")
73554b29bd70 ("KVM: x86/pmu: Synthesize at most one PMI per VM-exit")
b29a2acd36dd ("KVM: x86/pmu: Truncate counter value to allowed width on write")
4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
1c2bf8a6b045 ("KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap")
13afa29ae489 ("KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code")
c85cdc1cc1ea ("KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86")
30dab5c0b65e ("KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits")
8de18543dfe3 ("KVM: x86/pmu: Move reprogram_counters() to pmu.h")
53550b89220b ("KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask")
649bccd7fac9 ("KVM: x86/pmu: Rewrite reprogram_counters() to improve performance")
8bca8c5ce40b ("KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers")
cdd2fbf6360e ("KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()")
957d0f70e97b ("KVM: x86/pmu: Zero out LBR capabilities during PMU refresh")
3a6de51a437f ("KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run")
7e768ce8278b ("KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshed")
e33b6d79acac ("KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042347-polar-gains-eb8f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
1647b52757d5 ("KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing")
cbb359d81a26 ("KVM: x86/pmu: Move PMU reset logic to common x86 code")
73554b29bd70 ("KVM: x86/pmu: Synthesize at most one PMI per VM-exit")
b29a2acd36dd ("KVM: x86/pmu: Truncate counter value to allowed width on write")
4a2771895ca6 ("KVM: x86/svm/pmu: Add AMD PerfMonV2 support")
1c2bf8a6b045 ("KVM: x86/pmu: Constrain the num of guest counters with kvm_pmu_cap")
13afa29ae489 ("KVM: x86/pmu: Provide Intel PMU's pmc_is_enabled() as generic x86 code")
c85cdc1cc1ea ("KVM: x86/pmu: Move handling PERF_GLOBAL_CTRL and friends to common x86")
30dab5c0b65e ("KVM: x86/pmu: Reject userspace attempts to set reserved GLOBAL_STATUS bits")
8de18543dfe3 ("KVM: x86/pmu: Move reprogram_counters() to pmu.h")
53550b89220b ("KVM: x86/pmu: Rename global_ovf_ctrl_mask to global_status_mask")
649bccd7fac9 ("KVM: x86/pmu: Rewrite reprogram_counters() to improve performance")
8bca8c5ce40b ("KVM: VMX: Refactor intel_pmu_{g,}set_msr() to align with other helpers")
cdd2fbf6360e ("KVM: x86/pmu: Rename pmc_is_enabled() to pmc_is_globally_enabled()")
957d0f70e97b ("KVM: x86/pmu: Zero out LBR capabilities during PMU refresh")
3a6de51a437f ("KVM: x86/pmu: WARN and bug the VM if PMU is refreshed after vCPU has run")
7e768ce8278b ("KVM: x86/pmu: Zero out pmu->all_valid_pmc_idx each time it's refreshed")
e33b6d79acac ("KVM: x86/pmu: Don't tell userspace to save MSRs for non-existent fixed PMCs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042343-imitate-divinity-9e38@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
1647b52757d5 ("KVM: x86/pmu: Reset the PMU, i.e. stop counters, before refreshing")
cbb359d81a26 ("KVM: x86/pmu: Move PMU reset logic to common x86 code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x de120e1d692d73c7eefa3278837b1eb68f90728a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042340-retrieval-unhook-57b9@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
de120e1d692d ("KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"")
f933b88e2015 ("KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From de120e1d692d73c7eefa3278837b1eb68f90728a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Fri, 8 Mar 2024 17:36:40 -0800
Subject: [PATCH] KVM: x86/pmu: Set enable bits for GP counters in
PERF_GLOBAL_CTRL at "RESET"
Set the enable bits for general purpose counters in IA32_PERF_GLOBAL_CTRL
when refreshing the PMU to emulate the MSR's architecturally defined
post-RESET behavior. Per Intel's SDM:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
and
Where "n" is the number of general-purpose counters available in the processor.
AMD also documents this behavior for PerfMonV2 CPUs in one of AMD's many
PPRs.
Do not set any PERF_GLOBAL_CTRL bits if there are no general purpose
counters, although a literal reading of the SDM would require the CPU to
set either bits 63:0 or 31:0. The intent of the behavior is to globally
enable all GP counters; honor the intent, if not the letter of the law.
Leaving PERF_GLOBAL_CTRL '0' effectively breaks PMU usage in guests that
haven't been updated to work with PMUs that support PERF_GLOBAL_CTRL.
This bug was recently exposed when KVM added supported for AMD's
PerfMonV2, i.e. when KVM started exposing a vPMU with PERF_GLOBAL_CTRL to
guest software that only knew how to program v1 PMUs (that don't support
PERF_GLOBAL_CTRL).
Failure to emulate the post-RESET behavior results in such guests
unknowingly leaving all general purpose counters globally disabled (the
entire reason the post-RESET value sets the GP counter enable bits is to
maintain backwards compatibility).
The bug has likely gone unnoticed because PERF_GLOBAL_CTRL has been
supported on Intel CPUs for as long as KVM has existed, i.e. hardly anyone
is running guest software that isn't aware of PERF_GLOBAL_CTRL on Intel
PMUs. And because up until v6.0, KVM _did_ emulate the behavior for Intel
CPUs, although the old behavior was likely dumb luck.
Because (a) that old code was also broken in its own way (the history of
this code is a comedy of errors), and (b) PERF_GLOBAL_CTRL was documented
as having a value of '0' post-RESET in all SDMs before March 2023.
Initial vPMU support in commit f5132b01386b ("KVM: Expose a version 2
architectural PMU to a guests") *almost* got it right (again likely by
dumb luck), but for some reason only set the bits if the guest PMU was
advertised as v1:
if (pmu->version == 1) {
pmu->global_ctrl = (1 << pmu->nr_arch_gp_counters) - 1;
return;
}
Commit f19a0c2c2e6a ("KVM: PMU emulation: GLOBAL_CTRL MSR should be
enabled on reset") then tried to remedy that goof, presumably because
guest PMUs were leaving PERF_GLOBAL_CTRL '0', i.e. weren't enabling
counters.
pmu->global_ctrl = ((1 << pmu->nr_arch_gp_counters) - 1) |
(((1ull << pmu->nr_arch_fixed_counters) - 1) << X86_PMC_IDX_FIXED);
pmu->global_ctrl_mask = ~pmu->global_ctrl;
That was KVM's behavior up until commit c49467a45fe0 ("KVM: x86/pmu:
Don't overwrite the pmu->global_ctrl when refreshing") removed
*everything*. However, it did so based on the behavior defined by the
SDM , which at the time stated that "Global Perf Counter Controls" is
'0' at Power-Up and RESET.
But then the March 2023 SDM (325462-079US), stealthily changed its
"IA-32 and Intel 64 Processor States Following Power-up, Reset, or INIT"
table to say:
IA32_PERF_GLOBAL_CTRL: Sets bits n-1:0 and clears the upper bits.
Note, kvm_pmu_refresh() can be invoked multiple times, i.e. it's not a
"pure" RESET flow. But it can only be called prior to the first KVM_RUN,
i.e. the guest will only ever observe the final value.
Note #2, KVM has always cleared global_ctrl during refresh (see commit
f5132b01386b ("KVM: Expose a version 2 architectural PMU to a guests")),
i.e. there is no danger of breaking existing setups by clobbering a value
set by userspace.
Reported-by: Babu Moger <babu.moger(a)amd.com>
Cc: Sandipan Das <sandipan.das(a)amd.com>
Cc: Like Xu <like.xu.linux(a)gmail.com>
Cc: Mingwei Zhang <mizhang(a)google.com>
Cc: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Tested-by: Dapeng Mi <dapeng1.mi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240309013641.1413400-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index c397b28e3d1b..a593b03c9aed 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -775,8 +775,20 @@ void kvm_pmu_refresh(struct kvm_vcpu *vcpu)
pmu->pebs_data_cfg_mask = ~0ull;
bitmap_zero(pmu->all_valid_pmc_idx, X86_PMC_IDX_MAX);
- if (vcpu->kvm->arch.enable_pmu)
- static_call(kvm_x86_pmu_refresh)(vcpu);
+ if (!vcpu->kvm->arch.enable_pmu)
+ return;
+
+ static_call(kvm_x86_pmu_refresh)(vcpu);
+
+ /*
+ * At RESET, both Intel and AMD CPUs set all enable bits for general
+ * purpose counters in IA32_PERF_GLOBAL_CTRL (so that software that
+ * was written for v1 PMUs don't unknowingly leave GP counters disabled
+ * in the global controls). Emulate that behavior when refreshing the
+ * PMU so that userspace doesn't need to manually set PERF_GLOBAL_CTRL.
+ */
+ if (kvm_pmu_has_perf_global_ctrl(pmu) && pmu->nr_arch_gp_counters)
+ pmu->global_ctrl = GENMASK_ULL(pmu->nr_arch_gp_counters - 1, 0);
}
void kvm_pmu_init(struct kvm_vcpu *vcpu)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 5bfc311dd6c376d350b39028b9000ad766ddc934
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042323-ransack-tightness-3eba@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
5bfc311dd6c3 ("usb: xhci: correct return value in case of STS_HCE")
84ac5e4fa517 ("xhci: move event processing for one interrupter to a separate function")
e30e9ad9ed66 ("xhci: update event ring dequeue pointer position to controller correctly")
143e64df1bda ("xhci: remove unnecessary event_ring_deq parameter from xhci_handle_event()")
becbd202af84 ("xhci: make isoc_bei_interval variable interrupter specific.")
4f022aad80dc ("xhci: Add interrupt pending autoclear flag to each interrupter")
c99b38c41234 ("xhci: add support to allocate several interrupters")
47f503cf5f79 ("xhci: split free interrupter into separate remove and free parts")
d1830364e963 ("xhci: Simplify event ring dequeue pointer update for port change events")
08cc5616798d ("xhci: Clean up ERST_PTR_MASK inversion")
28084d3fcc3c ("xhci: Use more than one Event Ring segment")
044818a6cd80 ("xhci: Set DESI bits in ERDP register correctly")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bfc311dd6c376d350b39028b9000ad766ddc934 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum(a)suse.com>
Date: Thu, 4 Apr 2024 15:11:05 +0300
Subject: [PATCH] usb: xhci: correct return value in case of STS_HCE
If we get STS_HCE we give up on the interrupt, but for the purpose
of IRQ handling that still counts as ours. We may return IRQ_NONE
only if we are positive that it wasn't ours. Hence correct the default.
Fixes: 2a25e66d676d ("xhci: print warning when HCE was set")
Cc: stable(a)vger.kernel.org # v6.2+
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240404121106.2842417-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 52278afea94b..575f0fd9c9f1 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3133,7 +3133,7 @@ static int xhci_handle_events(struct xhci_hcd *xhci, struct xhci_interrupter *ir
irqreturn_t xhci_irq(struct usb_hcd *hcd)
{
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
- irqreturn_t ret = IRQ_NONE;
+ irqreturn_t ret = IRQ_HANDLED;
u32 status;
spin_lock(&xhci->lock);
@@ -3141,12 +3141,13 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
status = readl(&xhci->op_regs->status);
if (status == ~(u32)0) {
xhci_hc_died(xhci);
- ret = IRQ_HANDLED;
goto out;
}
- if (!(status & STS_EINT))
+ if (!(status & STS_EINT)) {
+ ret = IRQ_NONE;
goto out;
+ }
if (status & STS_HCE) {
xhci_warn(xhci, "WARNING: Host Controller Error\n");
@@ -3156,7 +3157,6 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
if (status & STS_FATAL) {
xhci_warn(xhci, "WARNING: Host System Error\n");
xhci_halt(xhci);
- ret = IRQ_HANDLED;
goto out;
}
@@ -3167,7 +3167,6 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
*/
status |= STS_EINT;
writel(status, &xhci->op_regs->status);
- ret = IRQ_HANDLED;
/* This is the handler of the primary interrupter */
xhci_handle_events(xhci, xhci->interrupters[0]);
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x 5bfc311dd6c376d350b39028b9000ad766ddc934
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042322-lunchbox-flint-5a2b@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
5bfc311dd6c3 ("usb: xhci: correct return value in case of STS_HCE")
84ac5e4fa517 ("xhci: move event processing for one interrupter to a separate function")
e30e9ad9ed66 ("xhci: update event ring dequeue pointer position to controller correctly")
143e64df1bda ("xhci: remove unnecessary event_ring_deq parameter from xhci_handle_event()")
becbd202af84 ("xhci: make isoc_bei_interval variable interrupter specific.")
4f022aad80dc ("xhci: Add interrupt pending autoclear flag to each interrupter")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bfc311dd6c376d350b39028b9000ad766ddc934 Mon Sep 17 00:00:00 2001
From: Oliver Neukum <oneukum(a)suse.com>
Date: Thu, 4 Apr 2024 15:11:05 +0300
Subject: [PATCH] usb: xhci: correct return value in case of STS_HCE
If we get STS_HCE we give up on the interrupt, but for the purpose
of IRQ handling that still counts as ours. We may return IRQ_NONE
only if we are positive that it wasn't ours. Hence correct the default.
Fixes: 2a25e66d676d ("xhci: print warning when HCE was set")
Cc: stable(a)vger.kernel.org # v6.2+
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Link: https://lore.kernel.org/r/20240404121106.2842417-2-mathias.nyman@linux.inte…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 52278afea94b..575f0fd9c9f1 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -3133,7 +3133,7 @@ static int xhci_handle_events(struct xhci_hcd *xhci, struct xhci_interrupter *ir
irqreturn_t xhci_irq(struct usb_hcd *hcd)
{
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
- irqreturn_t ret = IRQ_NONE;
+ irqreturn_t ret = IRQ_HANDLED;
u32 status;
spin_lock(&xhci->lock);
@@ -3141,12 +3141,13 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
status = readl(&xhci->op_regs->status);
if (status == ~(u32)0) {
xhci_hc_died(xhci);
- ret = IRQ_HANDLED;
goto out;
}
- if (!(status & STS_EINT))
+ if (!(status & STS_EINT)) {
+ ret = IRQ_NONE;
goto out;
+ }
if (status & STS_HCE) {
xhci_warn(xhci, "WARNING: Host Controller Error\n");
@@ -3156,7 +3157,6 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
if (status & STS_FATAL) {
xhci_warn(xhci, "WARNING: Host System Error\n");
xhci_halt(xhci);
- ret = IRQ_HANDLED;
goto out;
}
@@ -3167,7 +3167,6 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
*/
status |= STS_EINT;
writel(status, &xhci->op_regs->status);
- ret = IRQ_HANDLED;
/* This is the handler of the primary interrupter */
xhci_handle_events(xhci, xhci->interrupters[0]);
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x 91f10a3d21f2313485178d49efef8a3ba02bd8c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042332-depose-frenzy-e027@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
91f10a3d21f2 ("Revert "drm/amd/display: fix USB-C flag update after enc10 feature init"")
7d1e9d0369e4 ("drm/amd/display: Check DP Alt mode DPCS state via DMUB")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 91f10a3d21f2313485178d49efef8a3ba02bd8c7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Fri, 29 Mar 2024 18:03:03 -0400
Subject: [PATCH] Revert "drm/amd/display: fix USB-C flag update after enc10
feature init"
This reverts commit b5abd7f983e14054593dc91d6df2aa5f8cc67652.
This change breaks DSC on 4k monitors at 144Hz over USB-C.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3254
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Muhammad Ahmed <ahmed.ahmed(a)amd.com>
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Charlene Liu <charlene.liu(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
index e224a028d68a..8a0460e86309 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_dio_link_encoder.c
@@ -248,14 +248,12 @@ void dcn32_link_encoder_construct(
enc10->base.hpd_source = init_data->hpd_source;
enc10->base.connector = init_data->connector;
+ if (enc10->base.connector.id == CONNECTOR_ID_USBC)
+ enc10->base.features.flags.bits.DP_IS_USB_C = 1;
+
enc10->base.preferred_engine = ENGINE_ID_UNKNOWN;
enc10->base.features = *enc_features;
- if (enc10->base.connector.id == CONNECTOR_ID_USBC)
- enc10->base.features.flags.bits.DP_IS_USB_C = 1;
-
- if (enc10->base.connector.id == CONNECTOR_ID_USBC)
- enc10->base.features.flags.bits.DP_IS_USB_C = 1;
enc10->base.transmitter = init_data->transmitter;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn35/dcn35_dio_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn35/dcn35_dio_link_encoder.c
index 81e349d5835b..da94e5309fba 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn35/dcn35_dio_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn35/dcn35_dio_link_encoder.c
@@ -184,6 +184,8 @@ void dcn35_link_encoder_construct(
enc10->base.hpd_source = init_data->hpd_source;
enc10->base.connector = init_data->connector;
+ if (enc10->base.connector.id == CONNECTOR_ID_USBC)
+ enc10->base.features.flags.bits.DP_IS_USB_C = 1;
enc10->base.preferred_engine = ENGINE_ID_UNKNOWN;
@@ -238,8 +240,6 @@ void dcn35_link_encoder_construct(
}
enc10->base.features.flags.bits.HDMI_6GB_EN = 1;
- if (enc10->base.connector.id == CONNECTOR_ID_USBC)
- enc10->base.features.flags.bits.DP_IS_USB_C = 1;
if (bp_funcs->get_connector_speed_cap_info)
result = bp_funcs->get_connector_speed_cap_info(enc10->base.ctx->dc_bios,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 13c785323b36b845300b256d0e5963c3727667d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042348-galleria-wielder-75a5@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
13c785323b36 ("serial: stm32: Return IRQ_NONE in the ISR if no handling happend")
c5d06662551c ("serial: stm32: Use port lock wrappers")
a01ae50d7eae ("serial: stm32: replace access to DMAR bit by dmaengine_pause/resume")
7f28bcea824e ("serial: stm32: group dma pause/resume error handling into single function")
00d1f9c6af0d ("serial: stm32: modify parameter and rename stm32_usart_rx_dma_enabled")
db89728abad5 ("serial: stm32: avoid clearing DMAT bit during transfer")
3f6c02fa712b ("serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler")
d7c76716169d ("serial: stm32: Use TC interrupt to deassert GPIO RTS in RS485 mode")
3bcea529b295 ("serial: stm32: Factor out GPIO RTS toggling into separate function")
037b91ec7729 ("serial: stm32: fix software flow control transfer")
d3d079bde07e ("serial: stm32: prevent TDR register overwrite when sending x_char")
195437d14fb4 ("serial: stm32: correct loop for dma error handling")
2a3bcfe03725 ("serial: stm32: fix flow control transfer in DMA mode")
9a135f16d228 ("serial: stm32: rework TX DMA state condition")
56a23f9319e8 ("serial: stm32: move tx dma terminate DMA to shutdown")
6333a4850621 ("serial: stm32: push DMA RX data before suspending")
6eeb348c8482 ("serial: stm32: terminate / restart DMA transfer at suspend / resume")
e0abc903deea ("serial: stm32: rework RX dma initialization and release")
d1ec8a2eabe9 ("serial: stm32: update throttle and unthrottle ops for dma mode")
33bb2f6ac308 ("serial: stm32: rework RX over DMA")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13c785323b36b845300b256d0e5963c3727667d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Wed, 17 Apr 2024 11:03:27 +0200
Subject: [PATCH] serial: stm32: Return IRQ_NONE in the ISR if no handling
happend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If there is a stuck irq that the handler doesn't address, returning
IRQ_HANDLED unconditionally makes it impossible for the irq core to
detect the problem and disable the irq. So only return IRQ_HANDLED if
an event was handled.
A stuck irq is still problematic, but with this change at least it only
makes the UART nonfunctional instead of occupying the (usually only) CPU
by 100% and so stall the whole machine.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/5f92603d0dfd8a5b8014b2b10a902d91e0bb881f.17133441…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 58d169e5c1db..d60cbac69194 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -861,6 +861,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
u32 sr;
unsigned int size;
+ irqreturn_t ret = IRQ_NONE;
sr = readl_relaxed(port->membase + ofs->isr);
@@ -869,11 +870,14 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
(sr & USART_SR_TC)) {
stm32_usart_tc_interrupt_disable(port);
stm32_usart_rs485_rts_disable(port);
+ ret = IRQ_HANDLED;
}
- if ((sr & USART_SR_RTOF) && ofs->icr != UNDEF_REG)
+ if ((sr & USART_SR_RTOF) && ofs->icr != UNDEF_REG) {
writel_relaxed(USART_ICR_RTOCF,
port->membase + ofs->icr);
+ ret = IRQ_HANDLED;
+ }
if ((sr & USART_SR_WUF) && ofs->icr != UNDEF_REG) {
/* Clear wake up flag and disable wake up interrupt */
@@ -882,6 +886,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_WUFIE);
if (irqd_is_wakeup_set(irq_get_irq_data(port->irq)))
pm_wakeup_event(tport->tty->dev, 0);
+ ret = IRQ_HANDLED;
}
/*
@@ -896,6 +901,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_unlock_and_check_sysrq(port);
if (size)
tty_flip_buffer_push(tport);
+ ret = IRQ_HANDLED;
}
}
@@ -903,6 +909,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_port_lock(port);
stm32_usart_transmit_chars(port);
uart_port_unlock(port);
+ ret = IRQ_HANDLED;
}
/* Receiver timeout irq for DMA RX */
@@ -912,9 +919,10 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_unlock_and_check_sysrq(port);
if (size)
tty_flip_buffer_push(tport);
+ ret = IRQ_HANDLED;
}
- return IRQ_HANDLED;
+ return ret;
}
static void stm32_usart_set_mctrl(struct uart_port *port, unsigned int mctrl)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 13c785323b36b845300b256d0e5963c3727667d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042346-correct-footgear-b44a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
13c785323b36 ("serial: stm32: Return IRQ_NONE in the ISR if no handling happend")
c5d06662551c ("serial: stm32: Use port lock wrappers")
a01ae50d7eae ("serial: stm32: replace access to DMAR bit by dmaengine_pause/resume")
7f28bcea824e ("serial: stm32: group dma pause/resume error handling into single function")
00d1f9c6af0d ("serial: stm32: modify parameter and rename stm32_usart_rx_dma_enabled")
db89728abad5 ("serial: stm32: avoid clearing DMAT bit during transfer")
3f6c02fa712b ("serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler")
d7c76716169d ("serial: stm32: Use TC interrupt to deassert GPIO RTS in RS485 mode")
3bcea529b295 ("serial: stm32: Factor out GPIO RTS toggling into separate function")
037b91ec7729 ("serial: stm32: fix software flow control transfer")
d3d079bde07e ("serial: stm32: prevent TDR register overwrite when sending x_char")
195437d14fb4 ("serial: stm32: correct loop for dma error handling")
2a3bcfe03725 ("serial: stm32: fix flow control transfer in DMA mode")
9a135f16d228 ("serial: stm32: rework TX DMA state condition")
56a23f9319e8 ("serial: stm32: move tx dma terminate DMA to shutdown")
6333a4850621 ("serial: stm32: push DMA RX data before suspending")
6eeb348c8482 ("serial: stm32: terminate / restart DMA transfer at suspend / resume")
e0abc903deea ("serial: stm32: rework RX dma initialization and release")
d1ec8a2eabe9 ("serial: stm32: update throttle and unthrottle ops for dma mode")
33bb2f6ac308 ("serial: stm32: rework RX over DMA")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13c785323b36b845300b256d0e5963c3727667d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Wed, 17 Apr 2024 11:03:27 +0200
Subject: [PATCH] serial: stm32: Return IRQ_NONE in the ISR if no handling
happend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If there is a stuck irq that the handler doesn't address, returning
IRQ_HANDLED unconditionally makes it impossible for the irq core to
detect the problem and disable the irq. So only return IRQ_HANDLED if
an event was handled.
A stuck irq is still problematic, but with this change at least it only
makes the UART nonfunctional instead of occupying the (usually only) CPU
by 100% and so stall the whole machine.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/5f92603d0dfd8a5b8014b2b10a902d91e0bb881f.17133441…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 58d169e5c1db..d60cbac69194 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -861,6 +861,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
u32 sr;
unsigned int size;
+ irqreturn_t ret = IRQ_NONE;
sr = readl_relaxed(port->membase + ofs->isr);
@@ -869,11 +870,14 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
(sr & USART_SR_TC)) {
stm32_usart_tc_interrupt_disable(port);
stm32_usart_rs485_rts_disable(port);
+ ret = IRQ_HANDLED;
}
- if ((sr & USART_SR_RTOF) && ofs->icr != UNDEF_REG)
+ if ((sr & USART_SR_RTOF) && ofs->icr != UNDEF_REG) {
writel_relaxed(USART_ICR_RTOCF,
port->membase + ofs->icr);
+ ret = IRQ_HANDLED;
+ }
if ((sr & USART_SR_WUF) && ofs->icr != UNDEF_REG) {
/* Clear wake up flag and disable wake up interrupt */
@@ -882,6 +886,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_WUFIE);
if (irqd_is_wakeup_set(irq_get_irq_data(port->irq)))
pm_wakeup_event(tport->tty->dev, 0);
+ ret = IRQ_HANDLED;
}
/*
@@ -896,6 +901,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_unlock_and_check_sysrq(port);
if (size)
tty_flip_buffer_push(tport);
+ ret = IRQ_HANDLED;
}
}
@@ -903,6 +909,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_port_lock(port);
stm32_usart_transmit_chars(port);
uart_port_unlock(port);
+ ret = IRQ_HANDLED;
}
/* Receiver timeout irq for DMA RX */
@@ -912,9 +919,10 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_unlock_and_check_sysrq(port);
if (size)
tty_flip_buffer_push(tport);
+ ret = IRQ_HANDLED;
}
- return IRQ_HANDLED;
+ return ret;
}
static void stm32_usart_set_mctrl(struct uart_port *port, unsigned int mctrl)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 13c785323b36b845300b256d0e5963c3727667d7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042345-brunette-striking-6f32@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
13c785323b36 ("serial: stm32: Return IRQ_NONE in the ISR if no handling happend")
c5d06662551c ("serial: stm32: Use port lock wrappers")
a01ae50d7eae ("serial: stm32: replace access to DMAR bit by dmaengine_pause/resume")
7f28bcea824e ("serial: stm32: group dma pause/resume error handling into single function")
00d1f9c6af0d ("serial: stm32: modify parameter and rename stm32_usart_rx_dma_enabled")
db89728abad5 ("serial: stm32: avoid clearing DMAT bit during transfer")
3f6c02fa712b ("serial: stm32: Merge hard IRQ and threaded IRQ handling into single IRQ handler")
d7c76716169d ("serial: stm32: Use TC interrupt to deassert GPIO RTS in RS485 mode")
3bcea529b295 ("serial: stm32: Factor out GPIO RTS toggling into separate function")
037b91ec7729 ("serial: stm32: fix software flow control transfer")
d3d079bde07e ("serial: stm32: prevent TDR register overwrite when sending x_char")
195437d14fb4 ("serial: stm32: correct loop for dma error handling")
2a3bcfe03725 ("serial: stm32: fix flow control transfer in DMA mode")
9a135f16d228 ("serial: stm32: rework TX DMA state condition")
56a23f9319e8 ("serial: stm32: move tx dma terminate DMA to shutdown")
6333a4850621 ("serial: stm32: push DMA RX data before suspending")
6eeb348c8482 ("serial: stm32: terminate / restart DMA transfer at suspend / resume")
e0abc903deea ("serial: stm32: rework RX dma initialization and release")
d1ec8a2eabe9 ("serial: stm32: update throttle and unthrottle ops for dma mode")
33bb2f6ac308 ("serial: stm32: rework RX over DMA")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 13c785323b36b845300b256d0e5963c3727667d7 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Uwe=20Kleine-K=C3=B6nig?= <u.kleine-koenig(a)pengutronix.de>
Date: Wed, 17 Apr 2024 11:03:27 +0200
Subject: [PATCH] serial: stm32: Return IRQ_NONE in the ISR if no handling
happend
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
If there is a stuck irq that the handler doesn't address, returning
IRQ_HANDLED unconditionally makes it impossible for the irq core to
detect the problem and disable the irq. So only return IRQ_HANDLED if
an event was handled.
A stuck irq is still problematic, but with this change at least it only
makes the UART nonfunctional instead of occupying the (usually only) CPU
by 100% and so stall the whole machine.
Fixes: 48a6092fb41f ("serial: stm32-usart: Add STM32 USART Driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/5f92603d0dfd8a5b8014b2b10a902d91e0bb881f.17133441…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 58d169e5c1db..d60cbac69194 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -861,6 +861,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
u32 sr;
unsigned int size;
+ irqreturn_t ret = IRQ_NONE;
sr = readl_relaxed(port->membase + ofs->isr);
@@ -869,11 +870,14 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
(sr & USART_SR_TC)) {
stm32_usart_tc_interrupt_disable(port);
stm32_usart_rs485_rts_disable(port);
+ ret = IRQ_HANDLED;
}
- if ((sr & USART_SR_RTOF) && ofs->icr != UNDEF_REG)
+ if ((sr & USART_SR_RTOF) && ofs->icr != UNDEF_REG) {
writel_relaxed(USART_ICR_RTOCF,
port->membase + ofs->icr);
+ ret = IRQ_HANDLED;
+ }
if ((sr & USART_SR_WUF) && ofs->icr != UNDEF_REG) {
/* Clear wake up flag and disable wake up interrupt */
@@ -882,6 +886,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_WUFIE);
if (irqd_is_wakeup_set(irq_get_irq_data(port->irq)))
pm_wakeup_event(tport->tty->dev, 0);
+ ret = IRQ_HANDLED;
}
/*
@@ -896,6 +901,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_unlock_and_check_sysrq(port);
if (size)
tty_flip_buffer_push(tport);
+ ret = IRQ_HANDLED;
}
}
@@ -903,6 +909,7 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_port_lock(port);
stm32_usart_transmit_chars(port);
uart_port_unlock(port);
+ ret = IRQ_HANDLED;
}
/* Receiver timeout irq for DMA RX */
@@ -912,9 +919,10 @@ static irqreturn_t stm32_usart_interrupt(int irq, void *ptr)
uart_unlock_and_check_sysrq(port);
if (size)
tty_flip_buffer_push(tport);
+ ret = IRQ_HANDLED;
}
- return IRQ_HANDLED;
+ return ret;
}
static void stm32_usart_set_mctrl(struct uart_port *port, unsigned int mctrl)
With deferred IO enabled, a page fault happens when data is written to the
framebuffer device. Then driver determines which page is being updated by
calculating the offset of the written virtual address within the virtual
memory area, and uses this offset to get the updated page within the
internal buffer. This page is later copied to hardware (thus the name
"deferred IO").
This calculation is only correct if the virtual memory area is mapped to
the beginning of the internal buffer. Otherwise this is wrong. For example,
if users do:
mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
Then the virtual memory area will mapped at offset 0xff000 within the
internal buffer. This offset 0xff000 is not accounted for, and wrong page
is updated. This will lead to wrong pixels being updated on the device.
However, it gets worse: if users do 2 mmap to the same virtual address, for
example:
int fd = open("/dev/fb0", O_RDWR, 0);
char *ptr = (char *) 0x20000000ul;
mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0xff000);
*ptr = 0; // write #1
mmap(ptr, 4096, PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
*ptr = 0; // write #2
In this case, both write #1 and write #2 apply to the same virtual address
(0x20000000ul), and the driver mistakenly thinks the same page is being
written to. When the second write happens, the driver thinks this is the
same page as the last time, and reuse the page from write #1. The driver
then lock_page() an incorrect page, and returns VM_FAULT_LOCKED with the
correct page unlocked. It is unclear what will happen with memory
management subsystem after that, but likely something terrible.
Fix this by taking the mapping offset into account.
Reported-and-tested-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
Closes: https://lore.kernel.org/linux-fbdev/271372d6-e665-4e7f-b088-dee5f4ab341a@or…
Fixes: 56c134f7f1b5 ("fbdev: Track deferred-I/O pages in pageref struct")
Cc: stable(a)vger.kernel.org
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
---
drivers/video/fbdev/core/fb_defio.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index dae96c9f61cf..d5d6cd9e8b29 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -196,7 +196,8 @@ static vm_fault_t fb_deferred_io_track_page(struct fb_info *info, unsigned long
*/
static vm_fault_t fb_deferred_io_page_mkwrite(struct fb_info *info, struct vm_fault *vmf)
{
- unsigned long offset = vmf->address - vmf->vma->vm_start;
+ unsigned long offset = vmf->address - vmf->vma->vm_start
+ + (vmf->vma->vm_pgoff << PAGE_SHIFT);
struct page *page = vmf->page;
file_update_time(vmf->vma->vm_file);
--
2.39.2
Inspired by a patch from [Justin][1] I took a closer look at kdb_read().
Despite Justin's patch being a (correct) one-line manipulation it was a
tough patch to review because the surrounding code was hard to read and
it looked like there were unfixed problems.
This series isn't enough to make kdb_read() beautiful but it does make
it shorter, easier to reason about and fixes two buffer overflows and a
screen redraw problem!
[1]: https://lore.kernel.org/all/20240403-strncpy-kernel-debug-kdb-kdb_io-c-v1-1…
Signed-off-by: Daniel Thompson <daniel.thompson(a)linaro.org>
---
Changes in v2:
- No code changes!
- I belatedly realized that one of the cleanups actually fixed a buffer
overflow so there are changes to Cc: (to add stable@...) and to one
of the patch descriptions.
- Link to v1: https://lore.kernel.org/r/20240416-kgdb_read_refactor-v1-0-b18c2d01076d@lin…
---
Daniel Thompson (7):
kdb: Fix buffer overflow during tab-complete
kdb: Use format-strings rather than '\0' injection in kdb_read()
kdb: Fix console handling when editing and tab-completing commands
kdb: Merge identical case statements in kdb_read()
kdb: Use format-specifiers rather than memset() for padding in kdb_read()
kdb: Replace double memcpy() with memmove() in kdb_read()
kdb: Simplify management of tmpbuffer in kdb_read()
kernel/debug/kdb/kdb_io.c | 133 ++++++++++++++++++++--------------------------
1 file changed, 58 insertions(+), 75 deletions(-)
---
base-commit: dccce9b8780618986962ba37c373668bcf426866
change-id: 20240415-kgdb_read_refactor-2ea2dfc15dbb
Best regards,
--
Daniel Thompson <daniel.thompson(a)linaro.org>
[ Upstream commit 9fb9eb4b59acc607e978288c96ac7efa917153d4 ]
systems, using igb driver, crash while executing poweroff command
as per following call stack:
crash> bt -a
PID: 62583 TASK: ffff97ebbf28dc40 CPU: 0 COMMAND: "poweroff"
#0 [ffffa7adcd64f8a0] machine_kexec at ffffffffa606c7c1
#1 [ffffa7adcd64f900] __crash_kexec at ffffffffa613bb52
#2 [ffffa7adcd64f9d0] panic at ffffffffa6099c45
#3 [ffffa7adcd64fa50] oops_end at ffffffffa603359a
#4 [ffffa7adcd64fa78] die at ffffffffa6033c32
#5 [ffffa7adcd64faa8] do_trap at ffffffffa60309a0
#6 [ffffa7adcd64faf8] do_error_trap at ffffffffa60311e7
#7 [ffffa7adcd64fbc0] do_invalid_op at ffffffffa6031320
#8 [ffffa7adcd64fbd0] invalid_op at ffffffffa6a01f2a
[exception RIP: free_msi_irqs+408]
RIP: ffffffffa645d248 RSP: ffffa7adcd64fc88 RFLAGS: 00010286
RAX: ffff97eb1396fe00 RBX: 0000000000000000 RCX: ffff97eb1396fe00
RDX: ffff97eb1396fe00 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa7adcd64fcb0 R8: 0000000000000002 R9: 000000000000fbff
R10: 0000000000000000 R11: 0000000000000000 R12: ffff98c047af4720
R13: ffff97eb87cd32a0 R14: ffff97eb87cd3000 R15: ffffa7adcd64fd57
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#9 [ffffa7adcd64fc80] free_msi_irqs at ffffffffa645d0fc
#10 [ffffa7adcd64fcb8] pci_disable_msix at ffffffffa645d896
#11 [ffffa7adcd64fce0] igb_reset_interrupt_capability at ffffffffc024f335 [igb]
#12 [ffffa7adcd64fd08] __igb_shutdown at ffffffffc0258ed7 [igb]
#13 [ffffa7adcd64fd48] igb_shutdown at ffffffffc025908b [igb]
#14 [ffffa7adcd64fd70] pci_device_shutdown at ffffffffa6441e3a
#15 [ffffa7adcd64fd98] device_shutdown at ffffffffa6570260
#16 [ffffa7adcd64fdc8] kernel_power_off at ffffffffa60c0725
#17 [ffffa7adcd64fdd8] SYSC_reboot at ffffffffa60c08f1
#18 [ffffa7adcd64ff18] sys_reboot at ffffffffa60c09ee
#19 [ffffa7adcd64ff28] do_syscall_64 at ffffffffa6003ca9
#20 [ffffa7adcd64ff50] entry_SYSCALL_64_after_hwframe at ffffffffa6a001b1
This happens because igb_shutdown has not yet freed up allocated irqs and
free_msi_irqs finds irq_has_action true for involved msi irqs here and this
condition triggers BUG_ON.
Freeing irqs before proceeding further in igb_clear_interrupt_scheme,
fixes this problem.
Signed-off-by: Imran Khan <imran.f.khan(a)oracle.com>
---
This issue does not happen in v5.17 or later kernel versions because
'commit 9fb9eb4b59ac ("PCI/MSI: Let core code free MSI descriptors")',
explicitly frees up MSI based irqs and hence indirectly fixes this issue
as well. Also this is why I have mentioned this commit as equivalent
upstream commit. But this upstream change itself is dependent on a bunch
of changes starting from 'commit 288c81ce4be7 ("PCI/MSI: Move code into a
separate directory")', which refactored msi driver into multiple parts.
So another way of fixing this issue would be to backport these patches and
get this issue implictly fixed.
Kindly let me know if my current patch is not acceptable and in that case
will it be fine if I backport the above mentioned msi driver refactoring
patches to LST.
drivers/net/ethernet/intel/igb/igb_main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 7c42a99be5065..5b59fb65231ba 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -107,6 +107,7 @@ static int igb_setup_all_rx_resources(struct igb_adapter *);
static void igb_free_all_tx_resources(struct igb_adapter *);
static void igb_free_all_rx_resources(struct igb_adapter *);
static void igb_setup_mrqc(struct igb_adapter *);
+static void igb_free_irq(struct igb_adapter *adapter);
static int igb_probe(struct pci_dev *, const struct pci_device_id *);
static void igb_remove(struct pci_dev *pdev);
static int igb_sw_init(struct igb_adapter *);
@@ -1080,6 +1081,7 @@ static void igb_free_q_vectors(struct igb_adapter *adapter)
*/
static void igb_clear_interrupt_scheme(struct igb_adapter *adapter)
{
+ igb_free_irq(adapter);
igb_free_q_vectors(adapter);
igb_reset_interrupt_capability(adapter);
}
--
2.34.1
From: Joakim Sindholt <opensource(a)zhasha.com>
[ Upstream commit cd25e15e57e68a6b18dc9323047fe9c68b99290b ]
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u.
Signed-off-by: Joakim Sindholt <opensource(a)zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 72b779bc09422..d1a0f36dcdd43 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -101,7 +101,7 @@ static int p9mode2perm(struct v9fs_session_info *v9ses,
int res;
int mode = stat->mode;
- res = mode & S_IALLUGO;
+ res = mode & 0777; /* S_IRWXUGO */
if (v9fs_proto_dotu(v9ses)) {
if ((mode & P9_DMSETUID) == P9_DMSETUID)
res |= S_ISUID;
--
2.43.0
From: Joakim Sindholt <opensource(a)zhasha.com>
[ Upstream commit cd25e15e57e68a6b18dc9323047fe9c68b99290b ]
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u.
Signed-off-by: Joakim Sindholt <opensource(a)zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0791480bf922b..88ca5015f987e 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -86,7 +86,7 @@ static int p9mode2perm(struct v9fs_session_info *v9ses,
int res;
int mode = stat->mode;
- res = mode & S_IALLUGO;
+ res = mode & 0777; /* S_IRWXUGO */
if (v9fs_proto_dotu(v9ses)) {
if ((mode & P9_DMSETUID) == P9_DMSETUID)
res |= S_ISUID;
--
2.43.0
From: Joakim Sindholt <opensource(a)zhasha.com>
[ Upstream commit cd25e15e57e68a6b18dc9323047fe9c68b99290b ]
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u.
Signed-off-by: Joakim Sindholt <opensource(a)zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 0d9b7d453a877..75907f77f9e38 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -87,7 +87,7 @@ static int p9mode2perm(struct v9fs_session_info *v9ses,
int res;
int mode = stat->mode;
- res = mode & S_IALLUGO;
+ res = mode & 0777; /* S_IRWXUGO */
if (v9fs_proto_dotu(v9ses)) {
if ((mode & P9_DMSETUID) == P9_DMSETUID)
res |= S_ISUID;
--
2.43.0
From: Joakim Sindholt <opensource(a)zhasha.com>
[ Upstream commit cd25e15e57e68a6b18dc9323047fe9c68b99290b ]
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u.
Signed-off-by: Joakim Sindholt <opensource(a)zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 5e2657c1dbbe6..a0c5a372dcf62 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -85,7 +85,7 @@ static int p9mode2perm(struct v9fs_session_info *v9ses,
int res;
int mode = stat->mode;
- res = mode & S_IALLUGO;
+ res = mode & 0777; /* S_IRWXUGO */
if (v9fs_proto_dotu(v9ses)) {
if ((mode & P9_DMSETUID) == P9_DMSETUID)
res |= S_ISUID;
--
2.43.0
From: Joakim Sindholt <opensource(a)zhasha.com>
[ Upstream commit cd25e15e57e68a6b18dc9323047fe9c68b99290b ]
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u.
Signed-off-by: Joakim Sindholt <opensource(a)zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index ea695c4a7a3fb..3bdf6df4b553e 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -83,7 +83,7 @@ static int p9mode2perm(struct v9fs_session_info *v9ses,
int res;
int mode = stat->mode;
- res = mode & S_IALLUGO;
+ res = mode & 0777; /* S_IRWXUGO */
if (v9fs_proto_dotu(v9ses)) {
if ((mode & P9_DMSETUID) == P9_DMSETUID)
res |= S_ISUID;
--
2.43.0
From: Joakim Sindholt <opensource(a)zhasha.com>
[ Upstream commit cd25e15e57e68a6b18dc9323047fe9c68b99290b ]
Garbage in plain 9P2000's perm bits is allowed through, which causes it
to be able to set (among others) the suid bit. This was presumably not
the intent since the unix extended bits are handled explicitly and
conditionally on .u.
Signed-off-by: Joakim Sindholt <opensource(a)zhasha.com>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/9p/vfs_inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/9p/vfs_inode.c b/fs/9p/vfs_inode.c
index 32572982f72e6..e337fec9b18e1 100644
--- a/fs/9p/vfs_inode.c
+++ b/fs/9p/vfs_inode.c
@@ -83,7 +83,7 @@ static int p9mode2perm(struct v9fs_session_info *v9ses,
int res;
int mode = stat->mode;
- res = mode & S_IALLUGO;
+ res = mode & 0777; /* S_IRWXUGO */
if (v9fs_proto_dotu(v9ses)) {
if ((mode & P9_DMSETUID) == P9_DMSETUID)
res |= S_ISUID;
--
2.43.0
Removes the output of this purely informational message from the
kernel buffer:
"ntfs3: Max link count 4000"
Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
Cc: stable(a)vger.kernel.org
---
fs/ntfs3/super.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index 9df7c20d066f..ac4722011140 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1804,8 +1804,6 @@ static int __init init_ntfs_fs(void)
{
int err;
- pr_info("ntfs3: Max link count %u\n", NTFS_LINK_MAX);
-
if (IS_ENABLED(CONFIG_NTFS3_FS_POSIX_ACL))
pr_info("ntfs3: Enabled Linux POSIX ACLs support\n");
if (IS_ENABLED(CONFIG_NTFS3_64BIT_CLUSTER))
--
2.34.1
When counting and checking hard links in an ntfs file record,
struct MFT_REC {
struct NTFS_RECORD_HEADER rhdr; // 'FILE'
__le16 seq; // 0x10: Sequence number for this record.
>> __le16 hard_links; // 0x12: The number of hard links to record.
__le16 attr_off; // 0x14: Offset to attributes.
...
the ntfs3 driver ignored short names (DOS names), causing the link count
to be reduced by 1 and messages to be output to dmesg.
For Windows, such a situation is a minor error, meaning chkdsk does not report
errors on such a volume, and in the case of using the /f switch, it silently
corrects them, reporting that no errors were found. This does not affect
the consistency of the file system.
Nevertheless, the behavior in the ntfs3 driver is incorrect and
changes the content of the file system. This patch should fix that.
PS: most likely, there has been a confusion of concepts
MFT_REC::hard_links and inode::__i_nlink.
Fixes: 82cae269cfa95 ("fs/ntfs3: Add initialization of super block")
Signed-off-by: Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com>
Cc: stable(a)vger.kernel.org
---
fs/ntfs3/inode.c | 7 ++++---
fs/ntfs3/record.c | 11 ++---------
2 files changed, 6 insertions(+), 12 deletions(-)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c
index eb7a8c9fba01..05f169018c4e 100644
--- a/fs/ntfs3/inode.c
+++ b/fs/ntfs3/inode.c
@@ -37,7 +37,7 @@ static struct inode *ntfs_read_mft(struct inode *inode,
bool is_dir;
unsigned long ino = inode->i_ino;
u32 rp_fa = 0, asize, t32;
- u16 roff, rsize, names = 0;
+ u16 roff, rsize, names = 0, links = 0;
const struct ATTR_FILE_NAME *fname = NULL;
const struct INDEX_ROOT *root;
struct REPARSE_DATA_BUFFER rp; // 0x18 bytes
@@ -200,11 +200,12 @@ static struct inode *ntfs_read_mft(struct inode *inode,
rsize < SIZEOF_ATTRIBUTE_FILENAME)
goto out;
+ names += 1;
fname = Add2Ptr(attr, roff);
if (fname->type == FILE_NAME_DOS)
goto next_attr;
- names += 1;
+ links += 1;
if (name && name->len == fname->name_len &&
!ntfs_cmp_names_cpu(name, (struct le_str *)&fname->name_len,
NULL, false))
@@ -429,7 +430,7 @@ static struct inode *ntfs_read_mft(struct inode *inode,
ni->mi.dirty = true;
}
- set_nlink(inode, names);
+ set_nlink(inode, links);
if (S_ISDIR(mode)) {
ni->std_fa |= FILE_ATTRIBUTE_DIRECTORY;
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c
index 6aa3a9d44df1..6c76503edc20 100644
--- a/fs/ntfs3/record.c
+++ b/fs/ntfs3/record.c
@@ -534,16 +534,9 @@ bool mi_remove_attr(struct ntfs_inode *ni, struct mft_inode *mi,
if (aoff + asize > used)
return false;
- if (ni && is_attr_indexed(attr)) {
+ if (ni && is_attr_indexed(attr) && attr->type == ATTR_NAME) {
u16 links = le16_to_cpu(ni->mi.mrec->hard_links);
- struct ATTR_FILE_NAME *fname =
- attr->type != ATTR_NAME ?
- NULL :
- resident_data_ex(attr,
- SIZEOF_ATTRIBUTE_FILENAME);
- if (fname && fname->type == FILE_NAME_DOS) {
- /* Do not decrease links count deleting DOS name. */
- } else if (!links) {
+ if (!links) {
/* minor error. Not critical. */
} else {
ni->mi.mrec->hard_links = cpu_to_le16(links - 1);
--
2.34.1
When KUnit tests are enabled, under very big kernel configurations
(e.g. `allyesconfig`), we can trigger a `rustdoc` ICE [1]:
RUSTDOC TK rust/kernel/lib.rs
error: the compiler unexpectedly panicked. this is a bug.
The reason is that this build step has a duplicated `@rustc_cfg` argument,
which contains the kernel configuration, and thus a lot of arguments. The
factor 2 happens to be enough to reach the ICE.
Thus remove the unneeded `@rustc_cfg`. By doing so, we clean up the
command and workaround the ICE.
The ICE has been fixed in the upcoming Rust 1.79 [2].
Cc: stable(a)vger.kernel.org
Fixes: a66d733da801 ("rust: support running Rust documentation tests as KUnit ones")
Link: https://github.com/rust-lang/rust/issues/122722 [1]
Link: https://github.com/rust-lang/rust/pull/122840 [2]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
rust/Makefile | 1 -
1 file changed, 1 deletion(-)
diff --git a/rust/Makefile b/rust/Makefile
index 846e6ab9d5a9..86a125c4243c 100644
--- a/rust/Makefile
+++ b/rust/Makefile
@@ -175,7 +175,6 @@ quiet_cmd_rustdoc_test_kernel = RUSTDOC TK $<
mkdir -p $(objtree)/$(obj)/test/doctests/kernel; \
OBJTREE=$(abspath $(objtree)) \
$(RUSTDOC) --test $(rust_flags) \
- @$(objtree)/include/generated/rustc_cfg \
-L$(objtree)/$(obj) --extern alloc --extern kernel \
--extern build_error --extern macros \
--extern bindings --extern uapi \
base-commit: 4cece764965020c22cff7665b18a012006359095
--
2.44.0
On Sun, Apr 21, 2024 at 01:11:19PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> configs/hardening: Fix disabling UBSAN configurations
>
> to the 6.8-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> configs-hardening-fix-disabling-ubsan-configurations.patch
> and it can be found in the queue-6.8 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit a54fba0bb1f52707b423c908e153d6429d08db58
> Author: Nathan Chancellor <nathan(a)kernel.org>
> Date: Thu Apr 11 11:11:06 2024 -0700
>
> configs/hardening: Fix disabling UBSAN configurations
>
> [ Upstream commit e048d668f2969cf2b76e0fa21882a1b3bb323eca ]
While I think backporting this makes sense, I don't know that
backporting 918327e9b7ff ("ubsan: Remove CONFIG_UBSAN_SANITIZE_ALL") to
resolve the conflict with 6.8 is entirely necessary (or beneficial, I
don't know how Kees feels about it though). I've attached a version that
applies cleanly to 6.8, in case it is desirable.
Cheers,
Nathan