This reverts commit 2dc016599cfa9672a147528ca26d70c3654a5423.
Users are reporting regressions in regulatory domain detection and
channel availability.
The problem this was trying to resolve was fixed in firmware anyway:
QCA6174 hw3.0: sdio-4.4.1: add firmware.bin_WLAN.RMH.4.4.1-00042
https://github.com/kvalo/ath10k-firmware/commit/4d382787f0efa77dba40394e0bc…
Link: https://bbs.archlinux.org/viewtopic.php?id=254535
Link: http://lists.infradead.org/pipermail/ath10k/2020-April/014871.html
Link: http://lists.infradead.org/pipermail/ath10k/2020-May/015152.html
Fixes: 2dc016599cfa ("ath: add support for special 0x0 regulatory domain")
Cc: <stable(a)vger.kernel.org>
Cc: Wen Gong <wgong(a)codeaurora.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
drivers/net/wireless/ath/regd.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/wireless/ath/regd.c b/drivers/net/wireless/ath/regd.c
index bee9110b91f3..20f4f8ea9f89 100644
--- a/drivers/net/wireless/ath/regd.c
+++ b/drivers/net/wireless/ath/regd.c
@@ -666,14 +666,14 @@ ath_regd_init_wiphy(struct ath_regulatory *reg,
/*
* Some users have reported their EEPROM programmed with
- * 0x8000 or 0x0 set, this is not a supported regulatory
- * domain but since we have more than one user with it we
- * need a solution for them. We default to 0x64, which is
- * the default Atheros world regulatory domain.
+ * 0x8000 set, this is not a supported regulatory domain
+ * but since we have more than one user with it we need
+ * a solution for them. We default to 0x64, which is the
+ * default Atheros world regulatory domain.
*/
static void ath_regd_sanitize(struct ath_regulatory *reg)
{
- if (reg->current_rd != COUNTRY_ERD_FLAG && reg->current_rd != 0)
+ if (reg->current_rd != COUNTRY_ERD_FLAG)
return;
printk(KERN_DEBUG "ath: EEPROM regdomain sanitized\n");
reg->current_rd = 0x64;
--
2.27.0.rc0.183.gde8f92d652-goog
The print of the MTD partitions during boot are off-by-one for the size.
This patch fixes this issue and shows the real last offset.
Jani Nurminen (1):
mtd: mtdpart: Fix cosmetic print
drivers/mtd/mtdpart.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
base-commit: 072e51356cd5a4a1c12c1020bc054c99b98333df
--
2.37.2
Ampere Altra defines CPU clusters in the ACPI PPTT. They share a Snoop
Control Unit, but have no shared CPU-side last level cache.
cpu_coregroup_mask() will return a cpumask with weight 1, while
cpu_clustergroup_mask() will return a cpumask with weight 2.
As a result, build_sched_domain() will BUG() once per CPU with:
BUG: arch topology borken
the CLS domain not a subset of the MC domain
The MC level cpumask is then extended to that of the CLS child, and is
later removed entirely as redundant. This sched domain topology is an
improvement over previous topologies, or those built without
SCHED_CLUSTER, particularly for certain latency sensitive workloads.
With the current scheduler model and heuristics, this is a desirable
default topology for Ampere Altra and Altra Max system.
Rather than create a custom sched domains topology structure and
introduce new logic in arch/arm64 to detect these systems, update the
core_mask so coregroup is never a subset of clustergroup, extending it
to cluster_siblings if necessary. Only do this if CONFIG_SCHED_CLUSTER
is enabled to avoid also changing the topology (MC) when
CONFIG_SCHED_CLUSTER is disabled.
This has the added benefit over a custom topology of working for both
symmetric and asymmetric topologies. It does not address systems where
the CLUSTER topology is above a populated MC topology, but these are not
considered today and can be addressed separately if and when they
appear.
The final sched domain topology for a 2 socket Ampere Altra system is
unchanged with or without CONFIG_SCHED_CLUSTER, and the BUG is avoided:
For CPU0:
CONFIG_SCHED_CLUSTER=y
CLS [0-1]
DIE [0-79]
NUMA [0-159]
CONFIG_SCHED_CLUSTER is not set
DIE [0-79]
NUMA [0-159]
Signed-off-by: Darren Hart <darren(a)os.amperecomputing.com>
Suggested-by: Barry Song <song.bao.hua(a)hisilicon.com>
Reviewed-by: Barry Song <song.bao.hua(a)hisilicon.com>
Acked-by: Sudeep Holla <sudeep.holla(a)arm.com>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann(a)arm.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Vincent Guittot <vincent.guittot(a)linaro.org>
Cc: D. Scott Phillips <scott(a)os.amperecomputing.com>
Cc: Ilkka Koskinen <ilkka(a)os.amperecomputing.com>
Cc: <stable(a)vger.kernel.org> # 5.16.x
---
v1: Drop MC level if coregroup weight == 1
v2: New sd topo in arch/arm64/kernel/smp.c
v3: No new topo, extend core_mask to cluster_siblings
v4: Rebase on 5.18-rc1 for GregKH to pull. Add IS_ENABLED(CONFIG_SCHED_CLUSTER).
v5: Rebase on 5.18-rc2 for GregKH to pull. Add collected tags. No other changes.
drivers/base/arch_topology.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 1d6636ebaac5..5497c5ab7318 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -667,6 +667,15 @@ const struct cpumask *cpu_coregroup_mask(int cpu)
core_mask = &cpu_topology[cpu].llc_sibling;
}
+ /*
+ * For systems with no shared cpu-side LLC but with clusters defined,
+ * extend core_mask to cluster_siblings. The sched domain builder will
+ * then remove MC as redundant with CLS if SCHED_CLUSTER is enabled.
+ */
+ if (IS_ENABLED(CONFIG_SCHED_CLUSTER) &&
+ cpumask_subset(core_mask, &cpu_topology[cpu].cluster_sibling))
+ core_mask = &cpu_topology[cpu].cluster_sibling;
+
return core_mask;
}
--
2.34.1
Fix an issue with the Tyan Tomcat IV S1564D system, the BIOS of which
does not assign PCI buses beyond #2, where our resource reallocation
code preserves the reset default of an I/O BAR assignment outside its
upstream PCI-to-PCI bridge's I/O forwarding range for device 06:08.0 in
this log:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.0: BAR 4: assigned [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 3 try to assign unassigned res
pci 0000:00:11.0: resource 7 [io 0xe000-0xefff] released
[...]
pci 0000:06:08.1: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [io 0x2000-0x2fff]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0x1000-0x2fff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: resource 4 [io 0x0000-0xffff]
pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffff]
pci_bus 0000:01: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:01: resource 1 [mem 0xd8000000-0xdfffffff]
pci_bus 0000:01: resource 2 [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:02: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:02: resource 1 [mem 0xd8000000-0xd8bfffff]
pci_bus 0000:04: resource 0 [io 0x1000-0x1fff]
pci_bus 0000:04: resource 1 [mem 0xd8600000-0xd8afffff]
pci_bus 0000:05: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:05: resource 1 [mem 0xd8000000-0xd85fffff]
pci_bus 0000:06: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:06: resource 1 [mem 0xd8000000-0xd85fffff]
-- note that the assignment of 0xfce0-0xfcff is outside the range of
0x2000-0x2fff assigned to bus #6:
05:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: d8000000-d85fffff
Capabilities: [50] Power Management version 2
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
Capabilities: [80] #0d [0000]
Capabilities: [90] Express PCI/PCI-X Bridge IRQ 0
06:08.0 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at fce0 [size=32]
Capabilities: [80] Power Management version 2
06:08.1 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at 2000 [size=32]
Capabilities: [80] Power Management version 2
Since both 06:08.0 and 06:08.1 have the same reset defaults the latter
device escapes its fate and gets a good assignment owing to an address
conflict with the former device.
Consequently when the device driver tries to access 06:08.0 according to
its designated address range it pokes at an unassigned I/O location,
likely subtractively decoded by the southbridge and forwarded to ISA,
causing the driver to become confused and bail out:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host controller halted, very bad!
uhci_hcd 0000:06:08.0: HCRESET not completed yet!
uhci_hcd 0000:06:08.0: HC died; cleaning up
if good luck happens or if bad luck does, an infinite flood of messages:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
[...]
making the system virtually unusuable.
This is because we have code to deal with a situation from PR #16263,
where broken ACPI firmware reports the wrong address range for the host
bridge's decoding window and trying to adjust to the window causes more
breakage than leaving the BIOS assignments intact.
This may work for a device directly on the root bus decoded by the host
bridge only, but for a device behind one or more PCI-to-PCI (or CardBus)
bridges those bridges' forwarding windows have been standardised and
need to be respected, or leaving whatever has been there in a downstream
device's BAR will have no effect as cycles for the addresses recorded
there will have no chance to appear on the bus the device has been
immediately attached to.
Make sure then for a device behind a PCI-to-PCI bridge that any firmware
assignment is within the bridge's relevant forwarding window or do not
restore the assignment, fixing the system concerned as follows:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io 0xfce0-0xfcff]
[...]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io 0xfce0-0xfcff]
[...]
pci_bus 0000:00: No. 3 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:06:08.1: BAR 4: assigned [io 0x2020-0x203f]
and making device 06:08.0 work correctly.
Cf. <https://bugzilla.kernel.org/show_bug.cgi?id=16263>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 58c84eda0756 ("PCI: fall back to original BIOS BAR addresses")
Cc: stable(a)vger.kernel.org # v2.6.35+
---
Hi,
Resending this patch as it has gone into void. Patch re-verified against
5.17-rc2.
For the record the system's bus topology is as follows:
-[0000:00]-+-00.0
+-07.0
+-07.1
+-07.2
+-11.0-[0000:01-06]----00.0-[0000:02-06]--+-00.0-[0000:03]--
| +-01.0-[0000:04]--+-00.0
| | \-00.3
| \-02.0-[0000:05-06]----00.0-[0000:06]--+-05.0
| +-08.0
| +-08.1
| \-08.2
+-12.0
+-13.0
\-14.0
Maciej
Changes from v1:
- Do restore firmware BAR assignments behind a PCI-PCI bridge, but only if
within the bridge's forwarding window.
- Update the change description and heading accordingly (was: PCI: Do not
restore firmware BAR assignments behind a PCI-PCI bridge).
---
drivers/pci/setup-res.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
linux-pci-setup-res-fw-address-nobridge.diff
Index: linux-macro/drivers/pci/setup-res.c
===================================================================
--- linux-macro.orig/drivers/pci/setup-res.c
+++ linux-macro/drivers/pci/setup-res.c
@@ -212,9 +212,19 @@ static int pci_revert_fw_address(struct
res->end = res->start + size - 1;
res->flags &= ~IORESOURCE_UNSET;
+ /*
+ * If we're behind a P2P or CardBus bridge, make sure we're
+ * inside the relevant forwarding window, or otherwise the
+ * assignment must have been bogus and accesses intended for
+ * the range assigned would not reach the device anyway.
+ * On the root bus accept anything under the assumption the
+ * host bridge will let it through.
+ */
root = pci_find_parent_resource(dev, res);
if (!root) {
- if (res->flags & IORESOURCE_IO)
+ if (dev->bus->parent)
+ return -ENXIO;
+ else if (res->flags & IORESOURCE_IO)
root = &ioport_resource;
else
root = &iomem_resource;
It is not necessary to go through the process of validation, linking of
queues to mdev and vice versa and filtering the APQNs assigned to the
matrix mdev to build an AP configuration for a guest if an adapter or
domain being assigned is already assigned to the matrix mdev. Likewise, it
is not necessary to proceed through the process the unassignment of an
adapter, domain or control domain if it is not assigned to the matrix mdev.
Since it is not necessary to process assignment of a resource resource
already assigned or process unassignment of a resource that is been assigned,
this patch will bypass all assignment/unassignment operations for an adapter,
domain or control domain under these circumstances.
Not only is assignment of a duplicate adapter or domain unnecessary, it
will also cause a hang situation when removing the matrix mdev to which it is
assigned. The reason is because the same vfio_ap_queue objects with an
APQN containing the APID of the adapter or APQI of the domain being
assigned will get added multiple times to the hashtable that holds them.
This results in the pprev and next pointers of the hlist_node (mdev_qnode
field in the vfio_ap_queue object) pointing to the queue object itself
resulting in an interminable loop when the mdev is removed and the queue
table is iterated to reset the queues.
Cc: stable(a)vger.kernel.org
Fixes: 11cb2419fafe ("s390/vfio-ap: manage link between queue struct and matrix mdev")
Reported-by: Matthew Rosato <mjrosato(a)linux.ibm.com>
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 6c8c41fac4e1..ee82207b4e60 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -984,6 +984,11 @@ static ssize_t assign_adapter_store(struct device *dev,
goto done;
}
+ if (test_bit_inv(apid, matrix_mdev->matrix.apm)) {
+ ret = count;
+ goto done;
+ }
+
set_bit_inv(apid, matrix_mdev->matrix.apm);
ret = vfio_ap_mdev_validate_masks(matrix_mdev);
@@ -1109,6 +1114,11 @@ static ssize_t unassign_adapter_store(struct device *dev,
goto done;
}
+ if (!test_bit_inv(apid, matrix_mdev->matrix.apm)) {
+ ret = count;
+ goto done;
+ }
+
clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
ret = count;
@@ -1183,6 +1193,11 @@ static ssize_t assign_domain_store(struct device *dev,
goto done;
}
+ if (test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ ret = count;
+ goto done;
+ }
+
set_bit_inv(apqi, matrix_mdev->matrix.aqm);
ret = vfio_ap_mdev_validate_masks(matrix_mdev);
@@ -1286,6 +1301,11 @@ static ssize_t unassign_domain_store(struct device *dev,
goto done;
}
+ if (!test_bit_inv(apqi, matrix_mdev->matrix.aqm)) {
+ ret = count;
+ goto done;
+ }
+
clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
ret = count;
@@ -1329,6 +1349,11 @@ static ssize_t assign_control_domain_store(struct device *dev,
goto done;
}
+ if (test_bit_inv(id, matrix_mdev->matrix.adm)) {
+ ret = count;
+ goto done;
+ }
+
/* Set the bit in the ADM (bitmask) corresponding to the AP control
* domain number (id). The bits in the mask, from most significant to
* least significant, correspond to IDs 0 up to the one less than the
@@ -1378,6 +1403,11 @@ static ssize_t unassign_control_domain_store(struct device *dev,
goto done;
}
+ if (!test_bit_inv(domid, matrix_mdev->matrix.adm)) {
+ ret = count;
+ goto done;
+ }
+
clear_bit_inv(domid, matrix_mdev->matrix.adm);
if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
--
2.31.1
From: Kyle Huey <me(a)kylehuey.com>
When management of the PKRU register was moved away from XSTATE, emulation
of PKRU's existence in XSTATE was added for reading PKRU through ptrace,
but not for writing PKRU through ptrace. This can be seen by running gdb
and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected
kernels (5.14+) the write to the PKRU register (which gdb performs through
ptrace) is ignored.
There are three APIs that write PKRU: sigreturn, PTRACE_SETREGSET with
NT_X86_XSTATE, and KVM_SET_XSAVE. sigreturn still uses XRSTOR to write to
PKRU. KVM_SET_XSAVE has its own special handling to make PKRU writes take
effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into
copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE pass in
a pointer to the appropriate PKRU slot. copy_sigframe_from_user_to_xstate
depends on copy_uabi_to_xstate populating the PKRU field in the task's
XSTATE so that __fpu_restore_sig can do a XRSTOR from it, so continue doing
that.
This also adds code to initialize the PKRU value to the hardware init value
(namely 0) if the PKRU bit is not set in the XSTATE header provided to
ptrace, to match XRSTOR.
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
Signed-off-by: Kyle Huey <me(a)kylehuey.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.14+
---
arch/x86/kernel/fpu/core.c | 20 +++++++++-----------
arch/x86/kernel/fpu/regset.c | 2 +-
arch/x86/kernel/fpu/signal.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 25 ++++++++++++++++++++-----
arch/x86/kernel/fpu/xstate.h | 4 ++--
5 files changed, 33 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b28c5b25e12..c273669e8a00 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
{
struct fpstate *kstate = gfpu->fpstate;
const union fpregs_state *ustate = buf;
- struct pkru_state *xpkru;
- int ret;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
@@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
- ret = copy_uabi_from_kernel_to_xstate(kstate, ustate);
- if (ret)
- return ret;
+ /*
+ * Nullify @vpkru to preserve its current value if PKRU's bit isn't set
+ * in the header. KVM's odd ABI is to leave PKRU untouched in this
+ * case (all other components are eventually re-initialized).
+ * (Not clear that this is actually necessary for compat).
+ */
+ if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU))
+ vpkru = NULL;
- /* Retrieve PKRU if not in init state */
- if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
- xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
- *vpkru = xpkru->pkru;
- }
- return 0;
+ return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..6d056b68f4ed 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
}
fpu_force_restore(fpu);
- ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
+ ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru);
out:
vfree(tmpbuf);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 91d4b6de58ab..558076dbde5b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
fpregs = &fpu->fpstate->regs;
if (use_xsave() && !fx_only) {
- if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx))
+ if (copy_sigframe_from_user_to_xstate(tsk, buf_fx))
return false;
} else {
if (__copy_from_user(&fpregs->fxsave, buf_fx,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..8f14981a3936 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
- const void __user *ubuf)
+ const void __user *ubuf, u32 *pkru)
{
struct xregs_state *xsave = &fpstate->regs.xsave;
unsigned int offset, size;
@@ -1246,6 +1246,21 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
}
}
+ /*
+ * Update the user protection key storage. Allow KVM to
+ * pass in a NULL pkru pointer if the mask bit is unset
+ * for its legacy ABI behavior.
+ */
+ if (pkru)
+ *pkru = 0;
+
+ if (hdr.xfeatures & XFEATURE_MASK_PKRU) {
+ struct pkru_state *xpkru;
+
+ xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU);
+ *pkru = xpkru->pkru;
+ }
+
/*
* The state that came in from userspace was user-state only.
* Mask all the user states out of 'xfeatures':
@@ -1264,9 +1279,9 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
* Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
* format and copy to the target thread. Used by ptrace and KVM.
*/
-int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru)
{
- return copy_uabi_to_xstate(fpstate, kbuf, NULL);
+ return copy_uabi_to_xstate(fpstate, kbuf, NULL, pkru);
}
/*
@@ -1274,10 +1289,10 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
* XSAVE[S] format and copy to the target thread. This is called from the
* sigreturn() and rt_sigreturn() system calls.
*/
-int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate,
+int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(fpstate, NULL, ubuf);
+ return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
}
static bool validate_independent_components(u64 mask)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 5ad47031383b..a4ecb04d8d64 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
-extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
-extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
+extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
+extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);
extern void fpu__init_cpu_xstate(void);
--
2.37.2
Changelog since v5:
- Avoids a second copy from the uabi buffer as suggested.
- Preserves old KVM_SET_XSAVE behavior where leaving the PKRU bit in the
XSTATE header results in PKRU remaining unchanged instead of
reinitializing it.
- Fixed up patch metadata as requested.
Changelog since v4:
- Selftest additionally checks PKRU readbacks through ptrace.
- Selftest flips all PKRU bits (except the default key).
Changelog since v3:
- The v3 patch is now part 1 of 2.
- Adds a selftest in part 2 of 2.
Changelog since v2:
- Removed now unused variables in fpu_copy_uabi_to_guest_fpstate
Changelog since v1:
- Handles the error case of copy_to_buffer().