This patch fixes an issue where a hugetlb uffd-wr-protected mapping can be
writable even with the uffd-wp bit set. It only happens when all of these
conditions are met: (1) hugetlb memory, (2) a private mapping, (3) the
original mapping was missing, and then (4) it was wr-protected (IOW, a pte
marker was installed). Then a write to the page triggers the issue.
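For reference, a minimal userspace reproducer sketch for the above
conditions (an illustration only, not taken from the report: it assumes a
v5.19+ kernel with hugetlb uffd-wp support, a reserved 2M hugepage, and
permission to use userfaultfd; error handling is trimmed):

  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>
  #include <linux/userfaultfd.h>

  int main(void)
  {
      size_t len = 2UL << 20;                 /* one 2M hugepage */
      int uffd = syscall(__NR_userfaultfd, O_CLOEXEC);
      struct uffdio_api api = { .api = UFFD_API };
      struct uffdio_register reg;
      struct uffdio_writeprotect wp;
      char *p;

      ioctl(uffd, UFFDIO_API, &api);

      /* (1) hugetlb memory, (2) private mapping, (3) never faulted in */
      p = mmap(NULL, len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

      reg.range.start = (unsigned long)p;
      reg.range.len = len;
      reg.mode = UFFDIO_REGISTER_MODE_WP;
      ioctl(uffd, UFFDIO_REGISTER, &reg);

      /* (4) wr-protect the still-missing range (installs pte markers) */
      wp.range.start = (unsigned long)p;
      wp.range.len = len;
      wp.mode = UFFDIO_WRITEPROTECT_MODE_WP;
      ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);

      /* (5) write; with the bug this succeeds and the wr-protection is
       * silently lost even though the range is still registered for wp */
      p[0] = 1;
      printf("write to wr-protected page went through\n");
      return 0;
  }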
The userfaultfd-wp trap for hugetlb was implemented in hugetlb_fault(),
before even reaching hugetlb_wp(), to avoid taking more locks that
userfault won't need. However, there is one CoW optimization path for a
missing hugetlb page that can trigger hugetlb_wp() inside
hugetlb_no_page(), and that path bypasses the userfaultfd-wp traps.
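As a rough sketch of the problematic call flow (condensed from the
description above, not a verbatim excerpt of mm/hugetlb.c):

  hugetlb_fault()              /* uffd-wp trap lives here */
    -> hugetlb_no_page()       /* missing page: allocate and map it */
         -> hugetlb_wp()       /* CoW optimization path: reached without
                                  going through the trap; before this patch
                                  the pte it installed dropped uffd-wp */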
A few ways to resolve this:
(1) Skip the CoW optimization for hugetlb private mappings, considering
that private mappings for hugetlb should be very rare, so it may not
really help major workloads. In the worst case we could skip the
optimization only when userfaultfd_wp(vma)==true, because uffd-wp needs
another fault anyway.
(2) Move the userfaultfd-wp handling for hugetlb from hugetlb_fault()
into hugetlb_wp(). The major con is that a bunch of locks are already
taken when hugetlb_wp() is called, which would make the changeset
unnecessarily complicated due to the lock operations.
(3) Carry over the uffd-wp bit in hugetlb_wp(), so uffd-wp privately
mapped pages will need to fault again.
This patch chose option (3), which has the minimum changeset (simplest to
backport) and also makes sure hugetlb_wp() itself will always be safe with
uffd-wp ptes even if called from elsewhere in the future.
This patch will be needed for v5.19+, hence copy stable.
Reported-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: linux-stable <stable(a)vger.kernel.org>
Fixes: 166f3ecc0daf ("mm/hugetlb: hook page faults for uffd write protection")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/hugetlb.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8bfd07f4c143..22337b191eae 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5478,7 +5478,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
struct folio *pagecache_folio, spinlock_t *ptl)
{
const bool unshare = flags & FAULT_FLAG_UNSHARE;
- pte_t pte;
+ pte_t pte, newpte;
struct hstate *h = hstate_vma(vma);
struct page *old_page;
struct folio *new_folio;
@@ -5622,8 +5622,10 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
mmu_notifier_invalidate_range(mm, range.start, range.end);
page_remove_rmap(old_page, vma, true);
hugepage_add_new_anon_rmap(new_folio, vma, haddr);
- set_huge_pte_at(mm, haddr, ptep,
- make_huge_pte(vma, &new_folio->page, !unshare));
+ newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+ if (huge_pte_uffd_wp(pte))
+ newpte = huge_pte_mkuffd_wp(newpte);
+ set_huge_pte_at(mm, haddr, ptep, newpte);
folio_set_hugetlb_migratable(new_folio);
/* Make the old page be freed below */
new_folio = page_folio(old_page);
--
2.39.1
The Hyper-V "EnlightenedNptTlb" enlightenment is always enabled when KVM
is running on top of Hyper-V and Hyper-V exposes support for it (which it
always does). On AMD CPUs this enlightenment results in ASID invalidations
not flushing TLB entries derived from the NPT. To force the underlying
(L0) hypervisor to rebuild its shadow page tables, an explicit hypercall
is needed.
The original KVM implementation of Hyper-V's "EnlightenedNptTlb" on SVM
only added remote TLB flush hooks. This worked out fine for a while, as
enough remote TLB flushes were being issued in KVM to mask the problem.
Since v5.17, changes in the TDP code reduced the number of flushes, and
the resulting out-of-sync TLB prevents guests from booting successfully.
Split svm_flush_tlb_current() into separate callbacks for the 3 cases
(guest/all/current), and issue the required Hyper-V hypercall when a
Hyper-V TLB flush is needed. The most important case where the TLB flush
was missing is when loading a new PGD, which is followed by what is now
svm_flush_tlb_current(). Since the hypercall acts on all CPUs, cache the
last flushed root in kvm_arch->hv_root_tdp. This prevents the shadow
NPTs from being unnecessarily rebuilt for multiple vcpus and when the
same root is flushed multiple times in a row on a single vcpu.
Cc: stable(a)vger.kernel.org # v5.17+
Fixes: 1e0c7d40758b ("KVM: SVM: hyper-v: Remote TLB flush for SVM")
Link: https://lore.kernel.org/lkml/43980946-7bbf-dcef-7e40-af904c456250@linux.mic…
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
---
arch/x86/kvm/kvm_onhyperv.c | 23 +++++++++++++++++++++++
arch/x86/kvm/kvm_onhyperv.h | 5 +++++
arch/x86/kvm/svm/svm.c | 18 +++++++++++++++---
3 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index 482d6639ef88..036e04c0a161 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -94,6 +94,29 @@ int hv_remote_flush_tlb(struct kvm *kvm)
}
EXPORT_SYMBOL_GPL(hv_remote_flush_tlb);
+void hv_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+ struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
+ hpa_t root_tdp = vcpu->arch.mmu->root.hpa;
+
+ if (kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb && VALID_PAGE(root_tdp)) {
+ spin_lock(&kvm_arch->hv_root_tdp_lock);
+ if (kvm_arch->hv_root_tdp != root_tdp) {
+ hyperv_flush_guest_mapping(root_tdp);
+ kvm_arch->hv_root_tdp = root_tdp;
+ }
+ spin_unlock(&kvm_arch->hv_root_tdp_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(hv_flush_tlb_current);
+
+void hv_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+ if (WARN_ON_ONCE(kvm_x86_ops.tlb_remote_flush == hv_remote_flush_tlb))
+ hv_remote_flush_tlb(vcpu->kvm);
+}
+EXPORT_SYMBOL_GPL(hv_flush_tlb_all);
+
void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
{
struct kvm_arch *kvm_arch = &vcpu->kvm->arch;
diff --git a/arch/x86/kvm/kvm_onhyperv.h b/arch/x86/kvm/kvm_onhyperv.h
index 287e98ef9df3..f24d0ca41d2b 100644
--- a/arch/x86/kvm/kvm_onhyperv.h
+++ b/arch/x86/kvm/kvm_onhyperv.h
@@ -10,11 +10,16 @@
int hv_remote_flush_tlb_with_range(struct kvm *kvm,
struct kvm_tlb_range *range);
int hv_remote_flush_tlb(struct kvm *kvm);
+void hv_flush_tlb_current(struct kvm_vcpu *vcpu);
+void hv_flush_tlb_all(struct kvm_vcpu *vcpu);
void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp);
#else /* !CONFIG_HYPERV */
static inline void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
{
}
+
+static inline void hv_flush_tlb_current(struct kvm_vcpu *vcpu) { }
+static inline void hv_flush_tlb_all(struct kvm_vcpu *vcpu) { }
#endif /* !CONFIG_HYPERV */
#endif
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 252e7f37e4e2..8da6740ef595 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3729,7 +3729,7 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
}
-static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+static void svm_flush_tlb_asid(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -3753,6 +3753,18 @@ static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
svm->current_vmcb->asid_generation--;
}
+static void svm_flush_tlb_current(struct kvm_vcpu *vcpu)
+{
+ hv_flush_tlb_current(vcpu);
+ svm_flush_tlb_asid(vcpu);
+}
+
+static void svm_flush_tlb_all(struct kvm_vcpu *vcpu)
+{
+ hv_flush_tlb_all(vcpu);
+ svm_flush_tlb_asid(vcpu);
+}
+
static void svm_flush_tlb_gva(struct kvm_vcpu *vcpu, gva_t gva)
{
struct vcpu_svm *svm = to_svm(vcpu);
@@ -4745,10 +4757,10 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
.set_rflags = svm_set_rflags,
.get_if_flag = svm_get_if_flag,
- .flush_tlb_all = svm_flush_tlb_current,
+ .flush_tlb_all = svm_flush_tlb_all,
.flush_tlb_current = svm_flush_tlb_current,
.flush_tlb_gva = svm_flush_tlb_gva,
- .flush_tlb_guest = svm_flush_tlb_current,
+ .flush_tlb_guest = svm_flush_tlb_asid,
.vcpu_pre_run = svm_vcpu_pre_run,
.vcpu_run = svm_vcpu_run,
--
2.37.2
On some platforms, platform devices are created with invalid names. For
example: "HID-SENSOR-INT-020b?.39.auto" instead of
"HID-SENSOR-INT-020b.39.auto".
Such a string includes invalid characters, so the driver that should
handle this custom sensor fails to load properly. It is also a problem
for some user space tools, which parse the device names from ftrace and
dmesg.
This is because the string real_usage is not NULL terminated, yet it is
printed with %s to form the device name.
To address this, initialize the real_usage string with zeros.
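As a rough userspace illustration of the failure mode (hypothetical code,
not from the kernel; the buffer size is arbitrary and the first printf()
deliberately reproduces the missing-terminator read that the patch avoids):

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
      char usage[32];               /* stands in for real_usage[] */
      char fixed[32] = { 0 };       /* zero-initialized, as in the patch */

      memcpy(usage, "INT-020b", 8); /* no terminating '\0' is written */
      memcpy(fixed, "INT-020b", 8);

      /* %s keeps reading until it hits a '\0': the first line may pick up
       * leftover stack bytes (like the stray '?' in the bad device names),
       * the second one cannot. */
      printf("HID-SENSOR-%s.39.auto\n", usage);
      printf("HID-SENSOR-%s.39.auto\n", fixed);
      return 0;
  }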
Reported-and-tested-by: Todd Brandt <todd.e.brandt(a)linux.intel.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217169
Fixes: 98c062e82451 ("HID: hid-sensor-custom: Allow more custom iio sensors")
Cc: stable(a)vger.kernel.org
Suggested-by: Philipp Jungkamp <p.jungkamp(a)gmx.net>
Signed-off-by: Philipp Jungkamp <p.jungkamp(a)gmx.net>
Signed-off-by: Todd Brandt <todd.e.brandt(a)intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
---
Changes in v4:
- add the Fixes line
- add patch version change list
Changes in v3:
- update the changelog
- add proper reviewed/signed/suggested links
Changes in v2:
- update the changelog
drivers/hid/hid-sensor-custom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/hid-sensor-custom.c b/drivers/hid/hid-sensor-custom.c
index 3e3f89e01d81..d85398721659 100644
--- a/drivers/hid/hid-sensor-custom.c
+++ b/drivers/hid/hid-sensor-custom.c
@@ -940,7 +940,7 @@ hid_sensor_register_platform_device(struct platform_device *pdev,
struct hid_sensor_hub_device *hsdev,
const struct hid_sensor_custom_match *match)
{
- char real_usage[HID_SENSOR_USAGE_LENGTH];
+ char real_usage[HID_SENSOR_USAGE_LENGTH] = { 0 };
struct platform_device *custom_pdev;
const char *dev_name;
char *c;
--
2.17.1
Read mmu_invalidate_seq before dropping the mmap_lock so that KVM can
detect if the results of vma_lookup() (e.g. vma_shift) become stale
before it acquires kvm->mmu_lock. This fixes a theoretical bug where a
VMA could be changed by userspace after vma_lookup() and before KVM
reads the mmu_invalidate_seq, causing KVM to install page table entries
based on a (possibly) no-longer-valid vma_shift.
Re-order the MMU cache top-up to earlier in kvm_riscv_gstage_map() so
that it is not done after KVM has read mmu_invalidate_seq (i.e. so as to
avoid inducing spurious fault retries).
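For context, a condensed sketch of the ordering that results in
kvm_riscv_gstage_map() after this change (the retry check at the end is
the pre-existing mmu_invalidate_retry() consumer of mmu_seq, shown here
only to illustrate why the read must happen before mmap_read_unlock()):

  mmap_read_lock(current->mm);
  vma = vma_lookup(current->mm, hva);
  /* ... derive vma_pagesize / gfn from the VMA ... */
  mmu_seq = kvm->mmu_invalidate_seq;    /* read before dropping mmap_lock */
  mmap_read_unlock(current->mm);        /* implies the needed smp_rmb() */

  hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);

  spin_lock(&kvm->mmu_lock);
  if (mmu_invalidate_retry(kvm, mmu_seq))
      /* the VMA (and thus vma_pagesize) may be stale: retry the fault */
      goto out_unlock;
  /* ... safe to install the mapping derived from vma_lookup() ... */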
It's unlikely that any sane userspace currently modifies VMAs in such a
way as to trigger this race. And even with directed testing I was unable
to reproduce it. But a sufficiently motivated host userspace might be
able to exploit this race.
Note KVM/ARM had the same bug and was fixed in a separate, near
identical patch (see Link).
Link: https://lore.kernel.org/kvm/20230313235454.2964067-1-dmatlack@google.com/
Fixes: 9955371cc014 ("RISC-V: KVM: Implement MMU notifiers")
Cc: stable(a)vger.kernel.org
Signed-off-by: David Matlack <dmatlack(a)google.com>
---
Note: Compile-tested only.
arch/riscv/kvm/mmu.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c
index 78211aed36fa..46d692995830 100644
--- a/arch/riscv/kvm/mmu.c
+++ b/arch/riscv/kvm/mmu.c
@@ -628,6 +628,13 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
unsigned long vma_pagesize, mmu_seq;
+ /* We need minimum second+third level pages */
+ ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
+ if (ret) {
+ kvm_err("Failed to topup G-stage cache\n");
+ return ret;
+ }
+
mmap_read_lock(current->mm);
vma = vma_lookup(current->mm, hva);
@@ -648,6 +655,15 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
if (vma_pagesize == PMD_SIZE || vma_pagesize == PUD_SIZE)
gfn = (gpa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+ /*
+ * Read mmu_invalidate_seq so that KVM can detect if the results of
+ * vma_lookup() or gfn_to_pfn_prot() become stale prior to acquiring
+ * kvm->mmu_lock.
+ *
+ * Rely on mmap_read_unlock() for an implicit smp_rmb(), which pairs
+ * with the smp_wmb() in kvm_mmu_invalidate_end().
+ */
+ mmu_seq = kvm->mmu_invalidate_seq;
mmap_read_unlock(current->mm);
if (vma_pagesize != PUD_SIZE &&
@@ -657,15 +673,6 @@ int kvm_riscv_gstage_map(struct kvm_vcpu *vcpu,
return -EFAULT;
}
- /* We need minimum second+third level pages */
- ret = kvm_mmu_topup_memory_cache(pcache, gstage_pgd_levels);
- if (ret) {
- kvm_err("Failed to topup G-stage cache\n");
- return ret;
- }
-
- mmu_seq = kvm->mmu_invalidate_seq;
-
hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writable);
if (hfn == KVM_PFN_ERR_HWPOISON) {
send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
base-commit: eeac8ede17557680855031c6f305ece2378af326
--
2.40.0.rc2.332.ga46443480c-goog
The crypto_unregister_alg() function expects callers to ensure that any
algorithm that is unregistered has a refcnt of exactly 1, and issues a
BUG_ON() if this is not the case. However, there are in fact drivers that
will call crypto_unregister_alg() without ensuring that the refcnt has been
lowered first, most notably on system shutdown. This causes the BUG_ON() to
trigger, which prevents a clean shutdown and hangs the system.
To avoid such hangs on shutdown, demote the BUG_ON() in
crypto_unregister_alg() to a WARN_ON() with an early return. Cc stable
because this problem was observed on a 6.2 kernel, cf. the link below.
Link: https://lore.kernel.org/r/87r0tyq8ph.fsf@toke.dk
Cc: stable(a)vger.kernel.org
Signed-off-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
---
v2:
- Return early if the WARN_ON() triggers
crypto/algapi.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/crypto/algapi.c b/crypto/algapi.c
index d08f864f08be..9de0677b3643 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -493,7 +493,9 @@ void crypto_unregister_alg(struct crypto_alg *alg)
if (WARN(ret, "Algorithm %s is not registered", alg->cra_driver_name))
return;
- BUG_ON(refcount_read(&alg->cra_refcnt) != 1);
+ if (WARN_ON(refcount_read(&alg->cra_refcnt) != 1))
+ return;
+
if (alg->cra_destroy)
alg->cra_destroy(alg);
--
2.39.2
Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail (note, the reporter *is not* CCed to this mail, see [1]).
Note, it's a stable regression, so it's a bit unclear who's responsible.
I decided to forward it nevertheless, as some of you might want to be
aware of this or might even have an idea what's wrong.
Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=217225 :
> runrin 2023-03-21 17:49:34 UTC
>
> I occasionally manage my fan speed manually by changing the `level' in
> `/proc/acpi/ibm/fan'. As of upgrading to 6.1.20, I am now getting the
> following error when attempting to change the fan speed:
>
> $ echo 'level auto' > /proc/acpi/ibm/fan
> echo: write error: invalid argument
>
> While troubleshooting, I tried double checking that fan_control had been
> set to 1 as noted in the documentation.
>
> I recently upgraded my kernel a few versions at once, so unfortunately i
> can't be sure when this bug originated. It was working with 6.1.15, and
> since upgrading to 6.1.20 it is no longer working.
See the ticket for more details. The reporter was already asked to
bisect; I'll ask them to also consider testing the latest mainline.
[TLDR for the rest of this mail: I'm adding this report to the list of
tracked Linux kernel regressions; the text you find below is based on a
few template paragraphs you might have encountered already in similar
form.]
BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:
#regzbot introduced: v6.1.15..v6.1.20
https://bugzilla.kernel.org/show_bug.cgi?id=217225
#regzbot title: platform: thinkpad: can no longer alter /proc/acpi/ibm/fan
#regzbot ignore-activity
This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.
Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (e.g. the bugzilla ticket and maybe this mail as well, if
this thread sees some discussion). See page linked in footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
From: Martin Leung <Martin.Leung(a)amd.com>
[Why & How]
While trying to fix a nullptr dereference on VMs, memory was accidentally
allocated twice for the non-VM case. Removed the extra link_srv creation,
since dc_construct_ctx() is called in both the VM and non-VM cases.
Also added a proper failure check in case kzalloc() fails.
Cc: stable(a)vger.kernel.org
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Leo Ma <Hanghong.Ma(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Martin Leung <Martin.Leung(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 40f2e174c524..52564b93f7eb 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -887,7 +887,10 @@ static bool dc_construct_ctx(struct dc *dc,
}
dc->ctx = dc_ctx;
+
dc->link_srv = link_create_link_service();
+ if (!dc->link_srv)
+ return false;
return true;
}
@@ -986,8 +989,6 @@ static bool dc_construct(struct dc *dc,
goto fail;
}
- dc->link_srv = link_create_link_service();
-
dc->res_pool = dc_create_resource_pool(dc, init_params, dc_ctx->dce_version);
if (!dc->res_pool)
goto fail;
--
2.34.1
From: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
[Why]
While scanning the top_pipe connections we can run into a case where
the bottom pipe is still connected to a top_pipe but with a NULL
plane_state.
[How]
Treat a NULL plane_state the same as an invisible plane in the pipe
cursor disable logic.
Cc: stable(a)vger.kernel.org
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Charlene Liu <Charlene.Liu(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
index 7f9cceb49f4e..46ca88741cb8 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_hw_sequencer.c
@@ -3385,7 +3385,9 @@ static bool dcn10_can_pipe_disable_cursor(struct pipe_ctx *pipe_ctx)
for (test_pipe = pipe_ctx->top_pipe; test_pipe;
test_pipe = test_pipe->top_pipe) {
// Skip invisible layer and pipe-split plane on same layer
- if (!test_pipe->plane_state->visible || test_pipe->plane_state->layer_index == cur_layer)
+ if (!test_pipe->plane_state ||
+ !test_pipe->plane_state->visible ||
+ test_pipe->plane_state->layer_index == cur_layer)
continue;
r2 = test_pipe->plane_res.scl_data.recout;
--
2.34.1