When a device is matched via PRP0001, the driver's OF (DT) match table
must be used to obtain the device match data. If a driver provides both
an acpi_match_table and an of_match_table, the current
acpi_device_get_match_data() path consults the driver's acpi_match_table
and returns NULL (no ACPI ID matches).
Explicitly detect PRP0001 and fetch match data from the driver's
of_match_table via acpi_of_device_get_match_data().
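For illustration, a minimal sketch of the affected configuration (driver,
property and symbol names below are hypothetical, not taken from this
patch): a driver registering both tables, bound to a device enumerated
as PRP0001 with a "compatible" property, can only obtain its match data
through the of_match_table:

  #include <linux/acpi.h>
  #include <linux/mod_devicetable.h>
  #include <linux/module.h>
  #include <linux/platform_device.h>
  #include <linux/property.h>

  struct foo_match_data { int variant; };
  static const struct foo_match_data foo_data = { .variant = 1 };

  static const struct of_device_id foo_of_match[] = {
          { .compatible = "vendor,foo", .data = &foo_data },
          { /* sentinel */ }
  };

  static const struct acpi_device_id foo_acpi_match[] = {
          { "VEND0001", (kernel_ulong_t)&foo_data },
          { /* sentinel */ }
  };

  static int foo_probe(struct platform_device *pdev)
  {
          /* Resolves via acpi_device_get_match_data() for ACPI-enumerated
           * devices; before this change it returned NULL for PRP0001
           * because no entry in foo_acpi_match matches that HID.
           */
          const struct foo_match_data *data = device_get_match_data(&pdev->dev);

          return data ? 0 : -ENODEV;
  }

  static struct platform_driver foo_driver = {
          .driver = {
                  .name             = "foo",
                  .of_match_table   = foo_of_match,
                  .acpi_match_table = foo_acpi_match,
          },
          .probe = foo_probe,
  };
  module_platform_driver(foo_driver);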
Fixes: 886ca88be6b3 ("ACPI / bus: Respect PRP0001 when retrieving device match data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kartik Rajput <kkartik(a)nvidia.com>
---
Changes in v2:
* Fix build errors.
---
drivers/acpi/bus.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 5e110badac7b..6658c4339656 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1031,8 +1031,9 @@ const void *acpi_device_get_match_data(const struct device *dev)
{
const struct acpi_device_id *acpi_ids = dev->driver->acpi_match_table;
const struct acpi_device_id *match;
+ struct acpi_device *adev = ACPI_COMPANION(dev);
- if (!acpi_ids)
+ if (!strcmp(ACPI_DT_NAMESPACE_HID, acpi_device_hid(adev)))
return acpi_of_device_get_match_data(dev);
match = acpi_match_device(acpi_ids, dev);
--
2.43.0
When software issues a Cache Maintenance Operation (CMO) targeting a
dirty cache line, the CPU and DSU cluster may optimize the operation by
combining the CopyBack Write and CMO into a single combined CopyBack
Write plus CMO transaction presented to the interconnect (MCN).
For these combined transactions, the MCN splits the operation into two
separate transactions, one Write and one CMO, and then propagates the
write and optionally the CMO to the downstream memory system or external
Point of Serialization (PoS).
However, the MCN may return an early CompCMO response to the DSU cluster
before the corresponding Write and CMO transactions have completed at
the external PoS or downstream memory. As a result, stale data may be
observed by external observers that are directly connected to the
external PoS or downstream memory.
This erratum affects any system topology in which the following
conditions apply:
- The Point of Serialization (PoS) is located downstream of the
interconnect.
- A downstream observer accesses memory directly, bypassing the
interconnect.
Conditions:
This erratum occurs only when all of the following conditions are met:
1. Software executes a data cache maintenance operation, specifically,
a clean or invalidate by virtual address (DC CVAC, DC CIVAC, or DC
IVAC), that hits on unique dirty data in the CPU or DSU cache. This
results in a combined CopyBack and CMO being issued to the
interconnect.
2. The interconnect splits the combined transaction into separate Write
and CMO transactions and returns an early completion response to the
CPU or DSU before the write has completed at the downstream memory
or PoS.
3. A downstream observer accesses the affected memory address after the
early completion response is issued but before the actual memory
write has completed. This allows the observer to read stale data
that has not yet been updated at the PoS or downstream memory.
The workaround adds a second loop of CMOs over the same virtual address
range, for operations that meet the erratum conditions, so that the
routine waits until the cache data has been cleaned to the PoC.
Implementing it this way mitigates the performance penalty compared to
purely duplicating the original CMO.
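As a rough C-level illustration of this double-pass idea (the actual
workaround lives in assembly in dcache_by_myline_op and
__pi_dcache_inval_poc; the helper below is hypothetical and only meant
to show the shape of the loop):

  /* Run the by-VA maintenance loop twice over the same range when the
   * workaround is enabled; the second pass cannot complete before the
   * data written back by the first pass has reached the PoC.
   */
  static void foo_clean_inval_range(unsigned long start, unsigned long end,
                                    unsigned long linesz, bool workaround)
  {
          int passes = workaround ? 2 : 1;
          unsigned long addr;

          start &= ~(linesz - 1);
          while (passes--) {
                  for (addr = start; addr < end; addr += linesz)
                          asm volatile("dc civac, %0" : : "r" (addr) : "memory");
          }
          asm volatile("dsb sy" : : : "memory");
  }

The workaround stays off by default and is turned on at boot with the
arm_si_l1_workaround_4311569 kernel command-line parameter registered by
the early_param() below.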
Reported-by: kernel test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org # 6.12.x
Signed-off-by: Lucas Wei <lucaswei(a)google.com>
---
Changes in v2:
1. Fixed warning from kernel test robot by changing
arm_si_l1_workaround_4311569 to static
[Reported-by: kernel test robot <lkp(a)intel.com>]
---
Documentation/arch/arm64/silicon-errata.rst | 3 ++
arch/arm64/Kconfig | 19 +++++++++++++
arch/arm64/include/asm/assembler.h | 10 +++++++
arch/arm64/kernel/cpu_errata.c | 31 +++++++++++++++++++++
arch/arm64/mm/cache.S | 13 ++++++++-
arch/arm64/tools/cpucaps | 1 +
6 files changed, 76 insertions(+), 1 deletion(-)
diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst
index a7ec57060f64..98efdf528719 100644
--- a/Documentation/arch/arm64/silicon-errata.rst
+++ b/Documentation/arch/arm64/silicon-errata.rst
@@ -213,6 +213,9 @@ stable kernels.
| ARM | GIC-700 | #2941627 | ARM64_ERRATUM_2941627 |
+----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | SI L1 | #4311569 | ARM64_ERRATUM_4311569 |
++----------------+-----------------+-----------------+-----------------------------+
++----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_843419 |
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 65db12f66b8f..a834d30859cc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1153,6 +1153,25 @@ config ARM64_ERRATUM_3194386
If unsure, say Y.
+config ARM64_ERRATUM_4311569
+ bool "SI L1: 4311569: workaround for premature CMO completion erratum"
+ default y
+ help
+ This option adds the workaround for ARM SI L1 erratum 4311569.
+
+ The SI L1 erratum can cause an early response to a combined write
+ and cache maintenance operation (WR+CMO) before the operation is fully
+ completed to the Point of Serialization (POS).
+ This can result in a non-I/O coherent agent observing stale data,
+ potentially leading to system instability or incorrect behavior.
+
+ Enabling this option implements a software workaround by inserting a
+ second pass of Cache Maintenance Operations (CMOs) immediately after the
+ original CMO loop in the affected routines. This ensures that the data is
+ correctly serialized before the buffer is handed off to a non-coherent agent.
+
+ If unsure, say Y.
+
config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index f0ca7196f6fa..d3d46e5f7188 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -381,6 +381,9 @@ alternative_endif
.macro dcache_by_myline_op op, domain, start, end, linesz, tmp, fixup
sub \tmp, \linesz, #1
bic \start, \start, \tmp
+alternative_if ARM64_WORKAROUND_4311569
+ mov \tmp, \start
+alternative_else_nop_endif
.Ldcache_op\@:
.ifc \op, cvau
__dcache_op_workaround_clean_cache \op, \start
@@ -402,6 +405,13 @@ alternative_endif
add \start, \start, \linesz
cmp \start, \end
b.lo .Ldcache_op\@
+alternative_if ARM64_WORKAROUND_4311569
+ .ifnc \op, cvau
+ mov \start, \tmp
+ mov \tmp, xzr
+ cbnz \start, .Ldcache_op\@
+ .endif
+alternative_else_nop_endif
dsb \domain
_cond_uaccess_extable .Ldcache_op\@, \fixup
diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c
index 8cb3b575a031..5c0ab6bfd44a 100644
--- a/arch/arm64/kernel/cpu_errata.c
+++ b/arch/arm64/kernel/cpu_errata.c
@@ -141,6 +141,30 @@ has_mismatched_cache_type(const struct arm64_cpu_capabilities *entry,
return (ctr_real != sys) && (ctr_raw != sys);
}
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+static DEFINE_STATIC_KEY_FALSE(arm_si_l1_workaround_4311569);
+static int __init early_arm_si_l1_workaround_4311569_cfg(char *arg)
+{
+ static_branch_enable(&arm_si_l1_workaround_4311569);
+ pr_info("Enabling cache maintenance workaround for ARM SI-L1 erratum 4311569\n");
+
+ return 0;
+}
+early_param("arm_si_l1_workaround_4311569", early_arm_si_l1_workaround_4311569_cfg);
+
+/*
+ * Cache maintenance functions such as dcache_inval_poc() and dcache_clean_poc() are already
+ * called from head.S before the decision to enable this workaround can be made. Since the
+ * scope of this workaround is limited to non-coherent DMA agents, it is safe to keep the
+ * workaround off by default.
+ */
+static bool
+need_arm_si_l1_workaround_4311569(const struct arm64_cpu_capabilities *entry, int scope)
+{
+ return static_branch_unlikely(&arm_si_l1_workaround_4311569);
+}
+#endif
+
static void
cpu_enable_trap_ctr_access(const struct arm64_cpu_capabilities *cap)
{
@@ -870,6 +894,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = {
ERRATA_MIDR_RANGE_LIST(erratum_spec_ssbs_list),
},
#endif
+#ifdef CONFIG_ARM64_ERRATUM_4311569
+ {
+ .capability = ARM64_WORKAROUND_4311569,
+ .type = ARM64_CPUCAP_SYSTEM_FEATURE,
+ .matches = need_arm_si_l1_workaround_4311569,
+ },
+#endif
#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
{
.desc = "ARM errata 2966298, 3117295",
diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S
index 503567c864fd..ddf0097624ed 100644
--- a/arch/arm64/mm/cache.S
+++ b/arch/arm64/mm/cache.S
@@ -143,9 +143,14 @@ SYM_FUNC_END(dcache_clean_pou)
* - end - kernel end address of region
*/
SYM_FUNC_START(__pi_dcache_inval_poc)
+alternative_if ARM64_WORKAROUND_4311569
+ mov x4, x0
+ mov x5, x1
+ mov x6, #1
+alternative_else_nop_endif
dcache_line_size x2, x3
sub x3, x2, #1
- tst x1, x3 // end cache line aligned?
+again: tst x1, x3 // end cache line aligned?
bic x1, x1, x3
b.eq 1f
dc civac, x1 // clean & invalidate D / U line
@@ -158,6 +163,12 @@ SYM_FUNC_START(__pi_dcache_inval_poc)
3: add x0, x0, x2
cmp x0, x1
b.lo 2b
+alternative_if ARM64_WORKAROUND_4311569
+ mov x0, x4
+ mov x1, x5
+ sub x6, x6, #1
+ cbz x6, again
+alternative_else_nop_endif
dsb sy
ret
SYM_FUNC_END(__pi_dcache_inval_poc)
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 1b32c1232d28..3b18734f9744 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -101,6 +101,7 @@ WORKAROUND_2077057
WORKAROUND_2457168
WORKAROUND_2645198
WORKAROUND_2658417
+WORKAROUND_4311569
WORKAROUND_AMPERE_AC03_CPU_38
WORKAROUND_AMPERE_AC04_CPU_23
WORKAROUND_TRBE_OVERWRITE_FILL_MODE
base-commit: edde060637b92607f3522252c03d64ad06369933
--
2.52.0.358.g0dd7633a29-goog
The buffer freed in he_stop() does not use the same size as the one
allocated with dma_alloc_coherent() in he_init_tpdrq(): the free passes
the TBRQ size instead of the TPDRQ size.
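For context, the allocation side in he_init_tpdrq() is roughly the
following (paraphrased from the driver, not part of this diff):

  he_dev->tpdrq_base = dma_alloc_coherent(&he_dev->pci_dev->dev,
                                          CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq),
                                          &he_dev->tpdrq_phys, GFP_KERNEL);

so the matching dma_free_coherent() in he_stop() must pass
CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq) rather than the TBRQ size,
which is what this patch changes.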
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Thomas Fourier <fourier.thomas(a)gmail.com>
---
v1->v2:
- changed the Fixes: tag to point to a commit before the pci-consistent to dma-coherent conversion.
drivers/atm/he.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/atm/he.c b/drivers/atm/he.c
index ad91cc6a34fc..92a041d5387b 100644
--- a/drivers/atm/he.c
+++ b/drivers/atm/he.c
@@ -1587,7 +1587,8 @@ he_stop(struct he_dev *he_dev)
he_dev->tbrq_base, he_dev->tbrq_phys);
if (he_dev->tpdrq_base)
- dma_free_coherent(&he_dev->pci_dev->dev, CONFIG_TBRQ_SIZE * sizeof(struct he_tbrq),
+ dma_free_coherent(&he_dev->pci_dev->dev,
+ CONFIG_TPDRQ_SIZE * sizeof(struct he_tpdrq),
he_dev->tpdrq_base, he_dev->tpdrq_phys);
dma_pool_destroy(he_dev->tpd_pool);
--
2.43.0
When bnxt_init_one() fails during initialization (e.g.,
bnxt_init_int_mode returns -ENODEV), the error path calls
bnxt_free_hwrm_resources() which destroys the DMA pool and sets
bp->hwrm_dma_pool to NULL. Subsequently, bnxt_ptp_clear() is called,
which invokes ptp_clock_unregister().
Since commit a60fc3294a37 ("ptp: rework ptp_clock_unregister() to
disable events"), ptp_clock_unregister() now calls
ptp_disable_all_events(), which in turn invokes the driver's .enable()
callback (bnxt_ptp_enable()) to disable PTP events before completing the
unregistration.
bnxt_ptp_enable() attempts to send HWRM commands via bnxt_ptp_cfg_pin()
and bnxt_ptp_cfg_event(), both of which call hwrm_req_init(). This
function tries to allocate from bp->hwrm_dma_pool, causing a NULL
pointer dereference:
bnxt_en 0000:01:00.0 (unnamed net_device) (uninitialized): bnxt_init_int_mode err: ffffffed
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
Call Trace:
__hwrm_req_init (drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c:72)
bnxt_ptp_enable (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:323 drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:517)
ptp_disable_all_events (drivers/ptp/ptp_chardev.c:66)
ptp_clock_unregister (drivers/ptp/ptp_clock.c:518)
bnxt_ptp_clear (drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c:1134)
bnxt_init_one (drivers/net/ethernet/broadcom/bnxt/bnxt.c:16889)
Lines are against commit f8f9c1f4d0c7 ("Linux 6.19-rc3")
Fix this by clearing and unregistering ptp (bnxt_ptp_clear()) before
freeing HWRM resources.
Suggested-by: Pavan Chebbi <pavan.chebbi(a)broadcom.com>
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Fixes: a60fc3294a37 ("ptp: rework ptp_clock_unregister() to disable events")
Cc: stable(a)vger.kernel.org
---
Changes in v3:
- Moved bp->ptp_cfg to be closer to the kfree(). (Pavan/Jakub)
- Link to v2: https://patch.msgid.link/20260105-bnxt-v2-1-9ac69edef726@debian.org
Changes in v2:
- Instead of checking for HWRM resources in bnxt_ptp_enable(), call it
while HWRM resources are still available (Pavan Chebbi)
- Link to v1: https://patch.msgid.link/20251231-bnxt-v1-1-8f9cde6698b4@debian.org
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d160e54ac121..8419d1eb4035 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -16891,12 +16891,12 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
init_err_pci_clean:
bnxt_hwrm_func_drv_unrgtr(bp);
- bnxt_free_hwrm_resources(bp);
- bnxt_hwmon_uninit(bp);
- bnxt_ethtool_free(bp);
bnxt_ptp_clear(bp);
kfree(bp->ptp_cfg);
bp->ptp_cfg = NULL;
+ bnxt_free_hwrm_resources(bp);
+ bnxt_hwmon_uninit(bp);
+ bnxt_ethtool_free(bp);
kfree(bp->fw_health);
bp->fw_health = NULL;
bnxt_cleanup_pci(bp);
---
base-commit: e146b276a817807b8f4a94b5781bf80c6c00601b
change-id: 20251231-bnxt-c54d317d8bfe
Best regards,
--
Breno Leitao <leitao(a)debian.org>
From: Willem de Bruijn <willemb(a)google.com>
NULL pointer dereference fix.
msg_get_inq is an input field from caller to callee. Don't set it in
the callee, as the caller may not clear it on struct reuse.
The field exists only in the kernel-internal variant of struct msghdr,
and the only user does reinitialize it, so this is not critical for that
reason. But avoiding the write is more robust and gives slightly simpler
code. And it fixes a bug, see below.
Callers set msg_get_inq to request the input queue length to be
returned in msg_inq. This is equivalent to but independent from the
SO_INQ request to return that same info as a cmsg (tp->recvmsg_inq).
To reduce branching in the hot path, the SO_INQ path also sets msg_inq.
That is working as intended (WAI).
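For illustration, a kernel-internal caller is expected to use the field
roughly as follows (a hedged sketch, not taken from the existing user;
msg_get_inq is input-only, msg_inq is output-only):

  static int foo_recv_and_report_inq(struct socket *sock, void *buf, size_t len)
  {
          struct kvec iov = { .iov_base = buf, .iov_len = len };
          struct msghdr msg = { .msg_get_inq = 1 };   /* request queue length */
          int ret;

          ret = kernel_recvmsg(sock, &msg, &iov, 1, len, 0);
          if (ret >= 0)
                  pr_debug("received %d bytes, %d still queued\n", ret, msg.msg_inq);
          return ret;
  }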
This is a fix to commit 4d1442979e4a ("af_unix: don't post cmsg for
SO_INQ unless explicitly asked for"), which fixed the inverse.
Also avoid NULL pointer dereference in unix_stream_read_generic if
state->msg is NULL and msg->msg_get_inq is written. A NULL state->msg
can happen when splicing as of commit 2b514574f7e8 ("net: af_unix:
implement splice for stream af_unix sockets").
Also collapse two branches using a bitwise or.
Cc: stable(a)vger.kernel.org
Fixes: 4d1442979e4a ("af_unix: don't post cmsg for SO_INQ unless explicitly asked for")
Link: https://lore.kernel.org/netdev/willemdebruijn.kernel.24d8030f7a3de@gmail.co…
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
Jens, I dropped your Reviewed-by because of the commit message updates.
But code is unchanged.
changes from net-next (nn) v1 to net v1
- add Fixes tag and explain reason
- redirect to net
- s/caller/callee in subject line
nn v1: https://lore.kernel.org/netdev/20260105163338.3461512-1-willemdebruijn.kern…
---
net/ipv4/tcp.c | 8 +++-----
net/unix/af_unix.c | 8 +++-----
2 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index f035440c475a..d5319ebe2452 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2652,10 +2652,8 @@ static int tcp_recvmsg_locked(struct sock *sk, struct msghdr *msg, size_t len,
if (sk->sk_state == TCP_LISTEN)
goto out;
- if (tp->recvmsg_inq) {
+ if (tp->recvmsg_inq)
*cmsg_flags = TCP_CMSG_INQ;
- msg->msg_get_inq = 1;
- }
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
/* Urgent data needs to be handled specially. */
@@ -2929,10 +2927,10 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags,
ret = tcp_recvmsg_locked(sk, msg, len, flags, &tss, &cmsg_flags);
release_sock(sk);
- if ((cmsg_flags || msg->msg_get_inq) && ret >= 0) {
+ if ((cmsg_flags | msg->msg_get_inq) && ret >= 0) {
if (cmsg_flags & TCP_CMSG_TS)
tcp_recv_timestamp(msg, sk, &tss);
- if (msg->msg_get_inq) {
+ if ((cmsg_flags & TCP_CMSG_INQ) | msg->msg_get_inq) {
msg->msg_inq = tcp_inq_hint(sk);
if (cmsg_flags & TCP_CMSG_INQ)
put_cmsg(msg, SOL_TCP, TCP_CM_INQ,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index a7ca74653d94..d0511225799b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2904,7 +2904,6 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
unsigned int last_len;
struct unix_sock *u;
int copied = 0;
- bool do_cmsg;
int err = 0;
long timeo;
int target;
@@ -2930,9 +2929,6 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
u = unix_sk(sk);
- do_cmsg = READ_ONCE(u->recvmsg_inq);
- if (do_cmsg)
- msg->msg_get_inq = 1;
redo:
/* Lock the socket to prevent queue disordering
* while sleeps in memcpy_tomsg
@@ -3090,9 +3086,11 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state,
mutex_unlock(&u->iolock);
if (msg) {
+ bool do_cmsg = READ_ONCE(u->recvmsg_inq);
+
scm_recv_unix(sock, msg, &scm, flags);
- if (msg->msg_get_inq && (copied ?: err) >= 0) {
+ if ((do_cmsg | msg->msg_get_inq) && (copied ?: err) >= 0) {
msg->msg_inq = READ_ONCE(u->inq_len);
if (do_cmsg)
put_cmsg(msg, SOL_SOCKET, SCM_INQ,
--
2.52.0.351.gbe84eed79e-goog
From: Sean Christopherson <seanjc(a)google.com>
When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in
response to a guest WRMSR, clear XFD-disabled features in the saved (or to
be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for
features that are disabled via the guest's XFD. Because the kernel
executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1
will cause XRSTOR to #NM and panic the kernel.
E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848
Modules linked in: kvm_intel kvm irqbypass
CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
switch_fpu_return+0x4a/0xb0
kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1,
and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's
call to fpu_update_guest_xfd().
And if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE:
------------[ cut here ]------------
WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867
Modules linked in: kvm_intel kvm irqbypass
CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:exc_device_not_available+0x101/0x110
Call Trace:
<TASK>
asm_exc_device_not_available+0x1a/0x20
RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90
fpu_swap_kvm_fpstate+0x6b/0x120
kvm_load_guest_fpu+0x30/0x80 [kvm]
kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm]
kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x62/0x940
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
---[ end trace 0000000000000000 ]---
The new behavior is consistent with the AMX architecture. Per Intel's SDM,
XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD
(and non-compacted XSAVE saves the initial configuration of the state
component):
If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i,
the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1;
instead, it operates as if XINUSE[i] = 0 (and the state component was
in its initial state): it saves bit i of XSTATE_BV field of the XSAVE
header as 0; in addition, XSAVE saves the initial configuration of the
state component (the other instructions do not save state component i).
Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using
a constant XFD based on the set of enabled features when XSAVEing for
a struct fpu_guest. However, having XSTATE_BV[i]=1 for XFD-disabled
features can only happen in the above interrupt case, or in similar
scenarios involving preemption on preemptible kernels, because
fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the
outgoing FPU state with the current XFD; and that is (on all but the
first WRMSR to XFD) the guest XFD.
Therefore, XFD can only go out of sync with XSTATE_BV in those same
scenarios, and we can consider it (de facto) part of the KVM ABI that
KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.
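As a concrete illustration of the masking rule (hypothetical helper,
using the AMX tile data component from the WRMSR example above):

  /* Any xfeature disabled via the guest's XFD must be reported/restored
   * as being in its initial state; otherwise XRSTOR with the guest's XFD
   * set would #NM in the host.
   */
  static inline u64 mask_xfd_disabled_xfeatures(u64 xstate_bv, u64 guest_xfd)
  {
          /* e.g. guest_xfd = BIT_ULL(XFEATURE_XTILE_DATA) clears XSTATE_BV[18] */
          return xstate_bv & ~guest_xfd;
  }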
Reported-by: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 820a6ee944e7 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate
to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kernel/fpu/core.c | 32 +++++++++++++++++++++++++++++---
arch/x86/kvm/x86.c | 9 +++++++++
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index da233f20ae6f..166c380b0161 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -319,10 +319,29 @@ EXPORT_SYMBOL_FOR_KVM(fpu_enable_guest_xfd_features);
#ifdef CONFIG_X86_64
void fpu_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd)
{
+ struct fpstate *fpstate = guest_fpu->fpstate;
+
fpregs_lock();
- guest_fpu->fpstate->xfd = xfd;
- if (guest_fpu->fpstate->in_use)
- xfd_update_state(guest_fpu->fpstate);
+
+ /*
+ * KVM's guest ABI is that setting XFD[i]=1 *can* immediately revert
+ * the save state to initialized. Likewise, KVM_GET_XSAVE does the
+ * same as XSAVE and returns XSTATE_BV[i]=0 whenever XFD[i]=1.
+ *
+ * If the guest's FPU state is in hardware, just update XFD: the XSAVE
+ * in fpu_swap_kvm_fpstate will clear XSTATE_BV[i] whenever XFD[i]=1.
+ *
+ * If however the guest's FPU state is NOT resident in hardware, clear
+ * disabled components in XSTATE_BV now, or a subsequent XRSTOR will
+ * attempt to load disabled components and generate #NM _in the host_.
+ */
+ if (xfd && test_thread_flag(TIF_NEED_FPU_LOAD))
+ fpstate->regs.xsave.header.xfeatures &= ~xfd;
+
+ fpstate->xfd = xfd;
+ if (fpstate->in_use)
+ xfd_update_state(fpstate);
+
fpregs_unlock();
}
EXPORT_SYMBOL_FOR_KVM(fpu_update_guest_xfd);
@@ -430,6 +449,13 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
+ /*
+ * Disabled features must be in their initial state, otherwise XRSTOR
+ * causes an exception.
+ */
+ if (WARN_ON_ONCE(ustate->xsave.header.xfeatures & kstate->xfd))
+ return -EINVAL;
+
/*
* Nullify @vpkru to preserve its current value if PKRU's bit isn't set
* in the header. KVM's odd ABI is to leave PKRU untouched in this
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff8812f3a129..c0416f53b5f5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5807,9 +5807,18 @@ static int kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu,
static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu,
struct kvm_xsave *guest_xsave)
{
+ union fpregs_state *xstate = (union fpregs_state *)guest_xsave->region;
+
if (fpstate_is_confidential(&vcpu->arch.guest_fpu))
return vcpu->kvm->arch.has_protected_state ? -EINVAL : 0;
+ /*
+ * Do not reject non-initialized disabled features for backwards
+ * compatibility, but clear XSTATE_BV[i] whenever XFD[i]=1.
+ * Otherwise, XRSTOR would cause a #NM.
+ */
+ xstate->xsave.header.xfeatures &= ~vcpu->arch.guest_fpu.fpstate->xfd;
+
return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu,
guest_xsave->region,
kvm_caps.supported_xcr0,
--
2.52.0
Recently, while testing the UVC gadget function, I found that some YUYV
720p and 1080p streams cannot be output normally, whereas small-resolution
and MJPEG streams work fine. Patch #1 fixes this issue.
Patches #2 and #3 are small fixes and improvements.
Patch #4 is a workaround for a long-standing issue in videobuf2. With it,
many devices can work well instead of relying solely on the SG allocation
method.
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
---
Xu Yang (4):
usb: gadget: uvc: fix req_payload_size calculation
usb: gadget: uvc: fix interval_duration calculation
usb: gadget: uvc: improve error handling in uvcg_video_init()
usb: gadget: uvc: retry vb2_reqbufs() with vb_vmalloc_memops if use_sg fail
drivers/usb/gadget/function/f_uvc.c | 4 ++++
drivers/usb/gadget/function/uvc.h | 3 ++-
drivers/usb/gadget/function/uvc_queue.c | 23 +++++++++++++++++++----
drivers/usb/gadget/function/uvc_video.c | 14 +++++++-------
4 files changed, 32 insertions(+), 12 deletions(-)
---
base-commit: 56a512a9b4107079f68701e7d55da8507eb963d9
change-id: 20260108-uvc-gadget-fix-patch-aa5996332bb5
Best regards,
--
Xu Yang <xu.yang_2(a)nxp.com>