Since 5.16 and prior to 6.13 KVM can't be used with FSDAX
guest memory (PMD pages). To reproduce the issue you need to reserve
guest memory with `memmap=` cmdline, create and mount FS in DAX mode
(tested both XFS and ext4), see doc link below. ndctl command for test:
ndctl create-namespace -v -e namespace1.0 --map=dev --mode=fsdax -a 2M
Then pass memory object to qemu like:
-m 8G -object memory-backend-file,id=ram0,size=8G,\
mem-path=/mnt/pmem/guestmem,share=on,prealloc=on,dump=off,align=2097152 \
-numa node,memdev=ram0,cpus=0-1
QEMU fails to run guest with error: kvm run failed Bad address
and there are two warnings in dmesg:
WARN_ON_ONCE(!page_count(page)) in kvm_is_zone_device_page() and
WARN_ON_ONCE(folio_ref_count(folio) <= 0) in try_grab_folio() (v6.6.63)
It looks like in the past assumption was made that pfn won't change from
faultin_pfn() to release_pfn_clean(), e.g. see
commit 4cd071d13c5c ("KVM: x86/mmu: Move calls to thp_adjust() down a level")
But kvm_page_fault structure made pfn part of mutable state, so
now release_pfn_clean() can take hugepage-adjusted pfn.
And it works for all cases (/dev/shm, hugetlb, devdax) except fsdax.
Apparently in fsdax mode faultin-pfn and adjusted-pfn may refer to
different folios, so we're getting get_page/put_page imbalance.
To solve this preserve faultin pfn in separate local variable
and pass it in kvm_release_pfn_clean().
Patch tested for all mentioned guest memory backends with tdp_mmu={0,1}.
No bug in upstream as it was solved fundamentally by
commit 8dd861cc07e2 ("KVM: x86/mmu: Put refcounted pages instead of blindly releasing pfns")
and related patch series.
Link: https://nvdimm.docs.kernel.org/2mib_fs_dax.html
Fixes: 2f6305dd5676 ("KVM: MMU: change kvm_tdp_mmu_map() arguments to kvm_page_fault")
Co-developed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Nikolay Kuratov <kniv(a)yandex-team.ru>
---
v1 -> v2:
* Instead of new struct field prefer local variable to snapshot faultin pfn
as suggested by Sean Christopherson.
* Tested patch for 6.1 and 6.12
arch/x86/kvm/mmu/mmu.c | 10 ++++++++--
arch/x86/kvm/mmu/paging_tmpl.h | 5 ++++-
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 294775b7383b..ff85526a9d48 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4363,6 +4363,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
+ kvm_pfn_t orig_pfn;
int r;
/* Dummy roots are used only for shadowing bad guest roots. */
@@ -4384,6 +4385,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
if (r != RET_PF_CONTINUE)
return r;
+ orig_pfn = fault->pfn;
+
r = RET_PF_RETRY;
write_lock(&vcpu->kvm->mmu_lock);
@@ -4398,7 +4401,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
out_unlock:
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_pfn_clean(orig_pfn);
return r;
}
@@ -4447,6 +4450,7 @@ EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
struct kvm_page_fault *fault)
{
+ kvm_pfn_t orig_pfn;
int r;
if (page_fault_handle_page_track(vcpu, fault))
@@ -4464,6 +4468,8 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
if (r != RET_PF_CONTINUE)
return r;
+ orig_pfn = fault->pfn;
+
r = RET_PF_RETRY;
read_lock(&vcpu->kvm->mmu_lock);
@@ -4474,7 +4480,7 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
out_unlock:
read_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_pfn_clean(orig_pfn);
return r;
}
#endif
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index c85255073f67..c6b2c52aceac 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -777,6 +777,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
struct guest_walker walker;
+ kvm_pfn_t orig_pfn;
int r;
WARN_ON_ONCE(fault->is_tdp);
@@ -835,6 +836,8 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
walker.pte_access &= ~ACC_EXEC_MASK;
}
+ orig_pfn = fault->pfn;
+
r = RET_PF_RETRY;
write_lock(&vcpu->kvm->mmu_lock);
@@ -848,7 +851,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
out_unlock:
write_unlock(&vcpu->kvm->mmu_lock);
- kvm_release_pfn_clean(fault->pfn);
+ kvm_release_pfn_clean(orig_pfn);
return r;
}
--
2.34.1
From: Ard Biesheuvel <ardb(a)kernel.org>
Currently, LPA2 kernel support implies support for up to 52 bits of
physical addressing, and this is reflected in global definitions such as
PHYS_MASK_SHIFT and MAX_PHYSMEM_BITS.
This is potentially problematic, given that LPA2 hardware support is
modeled as a CPU feature which can be overridden, and with LPA2 hardware
support turned off, attempting to map physical regions with address bits
[51:48] set (which may exist on LPA2 capable systems booting with
arm64.nolva) will result in corrupted mappings with a truncated output
address and bogus shareability attributes.
This means that the accepted physical address range in the mapping
routines should be at most 48 bits wide when LPA2 support is configured
but not enabled at runtime.
Fixes: 352b0395b505 ("arm64: Enable 52-bit virtual addressing for 4k and 16k granule configs")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
arch/arm64/include/asm/pgtable-hwdef.h | 6 ------
arch/arm64/include/asm/pgtable-prot.h | 7 +++++++
arch/arm64/include/asm/sparsemem.h | 4 +++-
3 files changed, 10 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable-hwdef.h b/arch/arm64/include/asm/pgtable-hwdef.h
index c78a988cca93..a9136cc551cc 100644
--- a/arch/arm64/include/asm/pgtable-hwdef.h
+++ b/arch/arm64/include/asm/pgtable-hwdef.h
@@ -222,12 +222,6 @@
*/
#define S1_TABLE_AP (_AT(pmdval_t, 3) << 61)
-/*
- * Highest possible physical address supported.
- */
-#define PHYS_MASK_SHIFT (CONFIG_ARM64_PA_BITS)
-#define PHYS_MASK ((UL(1) << PHYS_MASK_SHIFT) - 1)
-
#define TTBR_CNP_BIT (UL(1) << 0)
/*
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 9f9cf13bbd95..a95f1f77bb39 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -81,6 +81,7 @@ extern unsigned long prot_ns_shared;
#define lpa2_is_enabled() false
#define PTE_MAYBE_SHARED PTE_SHARED
#define PMD_MAYBE_SHARED PMD_SECT_S
+#define PHYS_MASK_SHIFT (CONFIG_ARM64_PA_BITS)
#else
static inline bool __pure lpa2_is_enabled(void)
{
@@ -89,8 +90,14 @@ static inline bool __pure lpa2_is_enabled(void)
#define PTE_MAYBE_SHARED (lpa2_is_enabled() ? 0 : PTE_SHARED)
#define PMD_MAYBE_SHARED (lpa2_is_enabled() ? 0 : PMD_SECT_S)
+#define PHYS_MASK_SHIFT (lpa2_is_enabled() ? CONFIG_ARM64_PA_BITS : 48)
#endif
+/*
+ * Highest possible physical address supported.
+ */
+#define PHYS_MASK ((UL(1) << PHYS_MASK_SHIFT) - 1)
+
/*
* If we have userspace only BTI we don't want to mark kernel pages
* guarded even if the system does support BTI.
diff --git a/arch/arm64/include/asm/sparsemem.h b/arch/arm64/include/asm/sparsemem.h
index 8a8acc220371..035e0ca74e88 100644
--- a/arch/arm64/include/asm/sparsemem.h
+++ b/arch/arm64/include/asm/sparsemem.h
@@ -5,7 +5,9 @@
#ifndef __ASM_SPARSEMEM_H
#define __ASM_SPARSEMEM_H
-#define MAX_PHYSMEM_BITS CONFIG_ARM64_PA_BITS
+#include <asm/pgtable-prot.h>
+
+#define MAX_PHYSMEM_BITS PHYS_MASK_SHIFT
/*
* Section size must be at least 512MB for 64K base
--
2.47.0.338.g60cca15819-goog
On Mon, Dec 09 2024 at 06:32, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
> commit 57103d282a874ed716f489b7b336b8d833ba43b2
> Author: Thomas Gleixner <tglx(a)linutronix.de>
> Date: Wed Sep 11 15:17:43 2024 +0200
>
> ntp: Introduce struct ntp_data
>
> [ Upstream commit 68f66f97c5689825012877f58df65964056d4b5d ]
>
> All NTP data is held in static variables. That prevents the NTP code from
> being reuasble for non-system time timekeepers, e.g. per PTP clock
> timekeeping.
>
> Introduce struct ntp_data and move tick_usec into it for a start.
>
> No functional change.
>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Signed-off-by: Anna-Maria Behnsen <anna-maria(a)linutronix.de>
> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
> Acked-by: John Stultz <jstultz(a)google.com>
> Link: https://lore.kernel.org/all/20240911-devel-anna-maria-b4-timers-ptp-ntp-v1-…
> Stable-dep-of: f5807b0606da ("ntp: Remove invalid cast in time offset math")
I sent a backport of this change, which is a one liner:
https://lore.kernel.org/stable/878qssr16f.ffs@tglx/
There is no point to backport the whole data struct change series for
that.
Thanks,
tglx
File-scope "__pmic_glink_lock" mutex protects the file-scope
"__pmic_glink", thus reference to it should be obtained under the lock,
just like pmic_glink_rpmsg_remove() is doing. Otherwise we have a race
during if PMIC GLINK device removal: the pmic_glink_rpmsg_probe()
function could store local reference before mutex in driver removal is
acquired.
Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
Changes in v3:
1. None
Changes in v2:
1. None
---
drivers/soc/qcom/pmic_glink.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c
index caf3f63d940e..11e88053cc11 100644
--- a/drivers/soc/qcom/pmic_glink.c
+++ b/drivers/soc/qcom/pmic_glink.c
@@ -236,10 +236,11 @@ static void pmic_glink_pdr_callback(int state, char *svc_path, void *priv)
static int pmic_glink_rpmsg_probe(struct rpmsg_device *rpdev)
{
- struct pmic_glink *pg = __pmic_glink;
+ struct pmic_glink *pg;
int ret = 0;
mutex_lock(&__pmic_glink_lock);
+ pg = __pmic_glink;
if (!pg) {
ret = dev_err_probe(&rpdev->dev, -ENODEV, "no pmic_glink device to attach to\n");
goto out_unlock;
--
2.43.0
During mass manufacturing, we noticed the mmc_rx_crc_error counter,
as reported by "ethtool -S eth0 | grep mmc_rx_crc_error", to increase
above zero during nuttcp speedtests. Most of the time, this did not
affect the achieved speed, but it prompted this investigation.
Cycling through the rx_delay range on six boards (see table below) of
various ages shows that there is a large good region from 0x12 to 0x35
where we see zero crc errors on all tested boards.
The old rx_delay value (0x10) seems to have always been on the edge for
the KSZ9031RNX that is usually placed on Puma.
Choose "rx_delay = 0x23" to put us smack in the middle of the good
region. This works fine as well with the KSZ9131RNX PHY that was used
for a small number of boards during the COVID chip shortages.
Board S/N PHY rx_delay good region
--------- --- --------------------
Puma TT0069903 KSZ9031RNX 0x11 0x35
Puma TT0157733 KSZ9031RNX 0x11 0x35
Puma TT0681551 KSZ9031RNX 0x12 0x37
Puma TT0681156 KSZ9031RNX 0x10 0x38
Puma 17496030079 KSZ9031RNX 0x10 0x37 (Puma v1.2 from 2017)
Puma TT0681720 KSZ9131RNX 0x02 0x39 (alternative PHY used in very few boards)
Intersection of good regions = 0x12 0x35
Middle of good region = 0x23
Relates-to: PUMA-111
Fixes: 2c66fc34e945 ("arm64: dts: rockchip: add RK3399-Q7 (Puma) SoM")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Quentin Schulz <quentin.schulz(a)cherry.de>
Signed-off-by: Jakob Unterwurzacher <jakob.unterwurzacher(a)cherry.de>
---
v3: use rx_delay = 0x23 instead of 0x11, which was not enough.
v2: cc stable, add "Fixes:", add omitted "there" to commit msg,
add Reviewed-by.
arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi
index 9efcdce0f593..f9b4cd2d7daa 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi
@@ -181,7 +181,7 @@ &gmac {
snps,reset-active-low;
snps,reset-delays-us = <0 10000 50000>;
tx_delay = <0x10>;
- rx_delay = <0x10>;
+ rx_delay = <0x23>;
status = "okay";
};
--
2.39.5
Konrad pointed out during SM8750 review, that numbers are odd, so I
looked at datasheets and downstream DTS for all previous platforms.
Most numbers are odd.
Older platforms like SM8150, SM8250, SC7280, SC8180X seem fine. I could
not check few like SDX75 or SM6115, due to lack of access to datasheets.
SM8350, SM8450, SM8550 tested on hardware. Others not, but I don't
expect any failures because PAS drivers do not use the address space.
Which also explains why odd numbers did not result in any failures.
Best regards,
Krzysztof
---
Krzysztof Kozlowski (19):
arm64: dts: qcom: sm8350: Fix ADSP memory base and length
arm64: dts: qcom: sm8350: Fix CDSP memory base and length
arm64: dts: qcom: sm8350: Fix MPSS memory length
arm64: dts: qcom: sm8450: Fix ADSP memory base and length
arm64: dts: qcom: sm8450: Fix CDSP memory length
arm64: dts: qcom: sm8450: Fix MPSS memory length
arm64: dts: qcom: sm8550: Fix ADSP memory base and length
arm64: dts: qcom: sm8550: Fix CDSP memory length
arm64: dts: qcom: sm8550: Fix MPSS memory length
arm64: dts: qcom: sm8650: Fix ADSP memory base and length
arm64: dts: qcom: sm8650: Fix CDSP memory length
arm64: dts: qcom: sm8650: Fix MPSS memory length
arm64: dts: qcom: x1e80100: Fix ADSP memory base and length
arm64: dts: qcom: x1e80100: Fix CDSP memory length
arm64: dts: qcom: sm6350: Fix ADSP memory length
arm64: dts: qcom: sm6350: Fix MPSS memory length
arm64: dts: qcom: sm6375: Fix ADSP memory length
arm64: dts: qcom: sm6375: Fix CDSP memory base and length
arm64: dts: qcom: sm6375: Fix MPSS memory base and length
arch/arm64/boot/dts/qcom/sm6350.dtsi | 4 +-
arch/arm64/boot/dts/qcom/sm6375.dtsi | 10 +-
arch/arm64/boot/dts/qcom/sm8350.dtsi | 492 ++++++++++++++++-----------------
arch/arm64/boot/dts/qcom/sm8450.dtsi | 216 +++++++--------
arch/arm64/boot/dts/qcom/sm8550.dtsi | 266 +++++++++---------
arch/arm64/boot/dts/qcom/sm8650.dtsi | 300 ++++++++++----------
arch/arm64/boot/dts/qcom/x1e80100.dtsi | 276 +++++++++---------
7 files changed, 782 insertions(+), 782 deletions(-)
---
base-commit: c245a7a79602ccbee780c004c1e4abcda66aec32
change-id: 20241206-dts-qcom-cdsp-mpss-base-address-ae7c406ac9ac
Best regards,
--
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
During suspend/resume process all connectors are explicitly disabled and
then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
It doesn't make sense to check for the Writeback connected status (and
other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
Cc: stable(a)vger.kernel.org
Reported-by: Leonard Lausen <leonard(a)lausen.nl>
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
Tested-by: Leonard Lausen <leonard(a)lausen.nl> # on sc7180 lazor
Tested-by: György Kurucz <me(a)kuruczgy.com>
Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org>
Tested-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
---
Leonard Lausen reported an issue with suspend/resume of the sc7180
devices. Fix the WB atomic check, which caused the issue.
---
Changes in v3:
- Rebased on top of msm-fixes
- Link to v2: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-0-7eac9eb8e895@linaro.org
Changes in v2:
- Reworked the writeback to just drop the connector->status check.
- Expanded commit message for the debugging patch.
- Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
---
drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
index 16f144cbc0c986ee266412223d9e605b01f9fb8c..8ff496082902b1ee713e806140f39b4730ed256a 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
@@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
if (!conn_state || !conn_state->connector) {
DPU_ERROR("invalid connector state\n");
return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
- DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
- return -EINVAL;
}
crtc = conn_state->crtc;
---
base-commit: 86313a9cd152330c634b25d826a281c6a002eb77
change-id: 20240709-dpu-fix-wb-6cd57e3eb182
Best regards,
--
Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Backport this series to 6.1&6.6 because LoongArch gets build errors with
latest binutils which has commit 599df6e2db17d1c4 ("ld, LoongArch: print
error about linking without -fPIC or -fPIE flag in more detail").
CC .vmlinux.export.o
UPD include/generated/utsversion.h
CC init/version-timestamp.o
LD .tmp_vmlinux.kallsyms1
loongarch64-unknown-linux-gnu-ld: kernel/kallsyms.o:(.text+0): relocation R_LARCH_PCALA_HI20 against `kallsyms_markers` can not be used when making a PIE object; recompile with -fPIE
loongarch64-unknown-linux-gnu-ld: kernel/crash_core.o:(.init.text+0x984): relocation R_LARCH_PCALA_HI20 against `kallsyms_names` can not be used when making a PIE object; recompile with -fPIE
loongarch64-unknown-linux-gnu-ld: kernel/bpf/btf.o:(.text+0xcc7c): relocation R_LARCH_PCALA_HI20 against `__start_BTF` can not be used when making a PIE object; recompile with -fPIE
loongarch64-unknown-linux-gnu-ld: BFD (GNU Binutils) 2.43.50.20241126 assertion fail ../../bfd/elfnn-loongarch.c:2673
In theory 5.10&5.15 also need this, but since LoongArch get upstream at
5.19, so I just ignore them because there is no error report about other
archs now.
Weak external linkage is intended for cases where a symbol reference
can remain unsatisfied in the final link. Taking the address of such a
symbol should yield NULL if the reference was not satisfied.
Given that ordinary RIP or PC relative references cannot produce NULL,
some kind of indirection is always needed in such cases, and in position
independent code, this results in a GOT entry. In ordinary code, it is
arch specific but amounts to the same thing.
While unavoidable in some cases, weak references are currently also used
to declare symbols that are always defined in the final link, but not in
the first linker pass. This means we end up with worse codegen for no
good reason. So let's clean this up, by providing preliminary
definitions that are only used as a fallback.
Ard Biesheuvel (3):
kallsyms: Avoid weak references for kallsyms symbols
vmlinux: Avoid weak reference to notes section
btf: Avoid weak external references
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
include/asm-generic/vmlinux.lds.h | 28 ++++++++++++++++++
kernel/bpf/btf.c | 7 +++--
kernel/bpf/sysfs_btf.c | 6 ++--
kernel/kallsyms.c | 6 ----
kernel/kallsyms_internal.h | 30 ++++++++------------
kernel/ksysfs.c | 4 +--
lib/buildid.c | 4 +--
7 files changed, 52 insertions(+), 33 deletions(-)
---
2.27.0