This loop requires explicit calls to of_node_put() upon early exits
(break, goto, return) to decrement the child refcounter and avoid memory
leaks if the child is not required out of the loop.
A more robust solution is using the scoped variant of the macro, which
automatically calls of_node_put() when the child goes out of scope.
Cc: stable(a)vger.kernel.org
Fixes: 979987371739 ("spmi: pmic-arb: Add multi bus support")
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
drivers/spmi/spmi-pmic-arb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/spmi/spmi-pmic-arb.c b/drivers/spmi/spmi-pmic-arb.c
index 9ba9495fcc4b..ea843159b745 100644
--- a/drivers/spmi/spmi-pmic-arb.c
+++ b/drivers/spmi/spmi-pmic-arb.c
@@ -1763,14 +1763,13 @@ static int spmi_pmic_arb_register_buses(struct spmi_pmic_arb *pmic_arb,
{
struct device *dev = &pdev->dev;
struct device_node *node = dev->of_node;
- struct device_node *child;
int ret;
/* legacy mode doesn't provide child node for the bus */
if (of_device_is_compatible(node, "qcom,spmi-pmic-arb"))
return spmi_pmic_arb_bus_init(pdev, node, pmic_arb);
- for_each_available_child_of_node(node, child) {
+ for_each_available_child_of_node_scoped(node, child) {
if (of_node_name_eq(child, "spmi")) {
ret = spmi_pmic_arb_bus_init(pdev, child, pmic_arb);
if (ret)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241001-spmi-pmic-arb-scoped-a4b90179edef
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
When multiple FREE_STATEIDs are sent for the same delegation stateid,
it can lead to a possible either use-after-tree or counter refcount
underflow errors.
In nfsd4_free_stateid() under the client lock we find a delegation
stateid, however the code drops the lock before calling nfs4_put_stid(),
that allows another FREE_STATE to find the stateid again. The first one
will proceed to then free the stateid which leads to either
use-after-free or decrementing already zerod counter.
CC: stable(a)vger.kernel.org
Signed-off-by: Olga Kornievskaia <okorniev(a)redhat.com>
---
fs/nfsd/nfs4state.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ac1859c7cc9d..56b261608af4 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7154,6 +7154,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
switch (s->sc_type) {
case SC_TYPE_DELEG:
if (s->sc_status & SC_STATUS_REVOKED) {
+ s->sc_status |= SC_STATUS_CLOSED;
spin_unlock(&s->sc_lock);
dp = delegstateid(s);
list_del_init(&dp->dl_recall_lru);
--
2.43.5
kthread_create_on_cpu() always requires format string to contain one
'%u' at the end, as it automatically adds the CPU ID when passing it
to kthread_create_on_node(). The former isn't marked as __printf()
as it's not printf-like itself, which effectively hides this from
the compiler.
If you convert this function to printf-like, you'll see the following:
In file included from drivers/firmware/psci/psci_checker.c:15:
drivers/firmware/psci/psci_checker.c: In function 'suspend_tests':
drivers/firmware/psci/psci_checker.c:401:48: warning: too many arguments for format [-Wformat-extra-args]
401 | "psci_suspend_test");
| ^~~~~~~~~~~~~~~~~~~
drivers/firmware/psci/psci_checker.c:400:32: warning: data argument not used by format string [-Wformat-extra-args]
400 | (void *)(long)cpu, cpu,
| ^
401 | "psci_suspend_test");
| ~~~~~~~~~~~~~~~~~~~
Add the missing format literal to fix this. Now the corresponding
kthread will be named as "psci_suspend_test-<cpuid>", as it's meant by
kthread_create_on_cpu().
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202408141012.KhvKaxoh-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202408141243.eQiEOQQe-lkp@intel.com
Fixes: ea8b1c4a6019 ("drivers: psci: PSCI checker module")
Cc: stable(a)vger.kernel.org # 4.10+
Signed-off-by: Alexander Lobakin <aleksander.lobakin(a)intel.com>
---
drivers/firmware/psci/psci_checker.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci_checker.c b/drivers/firmware/psci/psci_checker.c
index 116eb465cdb4..ecc511c745ce 100644
--- a/drivers/firmware/psci/psci_checker.c
+++ b/drivers/firmware/psci/psci_checker.c
@@ -398,7 +398,7 @@ static int suspend_tests(void)
thread = kthread_create_on_cpu(suspend_test_thread,
(void *)(long)cpu, cpu,
- "psci_suspend_test");
+ "psci_suspend_test-%u");
if (IS_ERR(thread))
pr_err("Failed to create kthread on CPU %d\n", cpu);
else
--
2.46.2
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 785bf1ab58aa1f89a5dfcb17b682b7089d69c34f
Gitweb: https://git.kernel.org/tip/785bf1ab58aa1f89a5dfcb17b682b7089d69c34f
Author: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
AuthorDate: Thu, 26 Sep 2024 09:10:31 -07:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Tue, 08 Oct 2024 15:19:21 -07:00
x86/bugs: Use code segment selector for VERW operand
Robert Gill reported below #GP in 32-bit mode when dosemu software was
executing vm86() system call:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
EIP: restore_all_switch_stack+0xbe/0xcf
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
Call Trace:
show_regs+0x70/0x78
die_addr+0x29/0x70
exc_general_protection+0x13c/0x348
exc_bounds+0x98/0x98
handle_exception+0x14d/0x14d
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
This only happens in 32-bit mode when VERW based mitigations like MDS/RFDS
are enabled. This is because segment registers with an arbitrary user value
can result in #GP when executing VERW. Intel SDM vol. 2C documents the
following behavior for VERW instruction:
#GP(0) - If a memory operand effective address is outside the CS, DS, ES,
FS, or GS segment limit.
CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
space. Use %cs selector to reference VERW operand. This ensures VERW will
not #GP for an arbitrary user %ds.
Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Reported-by: Robert Gill <rtgill82(a)gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com
Cc: stable(a)vger.kernel.org # 5.10+
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.i…
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Suggested-by: Brian Gerst <brgerst(a)gmail.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
arch/x86/include/asm/nospec-branch.h | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index ff5f1ec..96b410b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -323,7 +323,16 @@
* Note: Only the memory operand variant of VERW clears the CPU buffers.
*/
.macro CLEAR_CPU_BUFFERS
- ALTERNATIVE "", __stringify(verw _ASM_RIP(mds_verw_sel)), X86_FEATURE_CLEAR_CPU_BUF
+#ifdef CONFIG_X86_64
+ ALTERNATIVE "", "verw mds_verw_sel(%rip)", X86_FEATURE_CLEAR_CPU_BUF
+#else
+ /*
+ * In 32bit mode, the memory operand must be a %cs reference. The data
+ * segments may not be usable (vm86 mode), and the stack segment may not
+ * be flat (ESPFIX32).
+ */
+ ALTERNATIVE "", "verw %cs:mds_verw_sel", X86_FEATURE_CLEAR_CPU_BUF
+#endif
.endm
#ifdef CONFIG_X86_64
This reverts commit 35781d8356a2eecaa6074ceeb80ee22e252fcdae.
Hibernation is not supported on Qualcomm platforms with mainline
kernels yet a broken vendor implementation for the GENI serial driver
made it upstream.
This is effectively dead code that cannot be tested and should just be
removed, but if these paths were ever hit for an open non-console port
they would crash the machine as the driver would fail to enable clocks
during restore() (i.e. all ports would have to be closed by drivers and
user space before hibernating the system to avoid this as a comment in
the code hinted at).
The broken implementation also added a random call to enable the
receiver in the port setup code where it does not belong and which
enables the receiver prematurely for console ports.
Fixes: 35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
Cc: stable(a)vger.kernel.org # 6.2
Cc: Aniket Randive <quic_arandive(a)quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/tty/serial/qcom_geni_serial.c | 41 ++-------------------------
1 file changed, 2 insertions(+), 39 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index c237c9d107cd..2e4a5361f137 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1170,7 +1170,6 @@ static int qcom_geni_serial_port_setup(struct uart_port *uport)
false, true, true);
geni_se_init(&port->se, UART_RX_WM, port->rx_fifo_depth - 2);
geni_se_select_mode(&port->se, port->dev_data->mode);
- qcom_geni_serial_start_rx(uport);
port->setup = true;
return 0;
@@ -1799,38 +1798,6 @@ static int qcom_geni_serial_sys_resume(struct device *dev)
return ret;
}
-static int qcom_geni_serial_sys_hib_resume(struct device *dev)
-{
- int ret = 0;
- struct uart_port *uport;
- struct qcom_geni_private_data *private_data;
- struct qcom_geni_serial_port *port = dev_get_drvdata(dev);
-
- uport = &port->uport;
- private_data = uport->private_data;
-
- if (uart_console(uport)) {
- geni_icc_set_tag(&port->se, QCOM_ICC_TAG_ALWAYS);
- geni_icc_set_bw(&port->se);
- ret = uart_resume_port(private_data->drv, uport);
- /*
- * For hibernation usecase clients for
- * console UART won't call port setup during restore,
- * hence call port setup for console uart.
- */
- qcom_geni_serial_port_setup(uport);
- } else {
- /*
- * Peripheral register settings are lost during hibernation.
- * Update setup flag such that port setup happens again
- * during next session. Clients of HS-UART will close and
- * open the port during hibernation.
- */
- port->setup = false;
- }
- return ret;
-}
-
static const struct qcom_geni_device_data qcom_geni_console_data = {
.console = true,
.mode = GENI_SE_FIFO,
@@ -1842,12 +1809,8 @@ static const struct qcom_geni_device_data qcom_geni_uart_data = {
};
static const struct dev_pm_ops qcom_geni_serial_pm_ops = {
- .suspend = pm_sleep_ptr(qcom_geni_serial_sys_suspend),
- .resume = pm_sleep_ptr(qcom_geni_serial_sys_resume),
- .freeze = pm_sleep_ptr(qcom_geni_serial_sys_suspend),
- .poweroff = pm_sleep_ptr(qcom_geni_serial_sys_suspend),
- .restore = pm_sleep_ptr(qcom_geni_serial_sys_hib_resume),
- .thaw = pm_sleep_ptr(qcom_geni_serial_sys_hib_resume),
+ SYSTEM_SLEEP_PM_OPS(qcom_geni_serial_sys_suspend,
+ qcom_geni_serial_sys_resume)
};
static const struct of_device_id qcom_geni_serial_match_table[] = {
--
2.45.2
In mremap(), move_page_tables() looks at the type of the PMD entry and the
specified address range to figure out by which method the next chunk of
page table entries should be moved.
At that point, the mmap_lock is held in write mode, but no rmap locks are
held yet. For PMD entries that point to page tables and are fully covered
by the source address range, move_pgt_entry(NORMAL_PMD, ...) is called,
which first takes rmap locks, then does move_normal_pmd().
move_normal_pmd() takes the necessary page table locks at source and
destination, then moves an entire page table from the source to the
destination.
The problem is: The rmap locks, which protect against concurrent page table
removal by retract_page_tables() in the THP code, are only taken after the
PMD entry has been read and it has been decided how to move it.
So we can race as follows (with two processes that have mappings of the
same tmpfs file that is stored on a tmpfs mount with huge=advise); note
that process A accesses page tables through the MM while process B does it
through the file rmap:
process A process B
========= =========
mremap
mremap_to
move_vma
move_page_tables
get_old_pmd
alloc_new_pmd
*** PREEMPT ***
madvise(MADV_COLLAPSE)
do_madvise
madvise_walk_vmas
madvise_vma_behavior
madvise_collapse
hpage_collapse_scan_file
collapse_file
retract_page_tables
i_mmap_lock_read(mapping)
pmdp_collapse_flush
i_mmap_unlock_read(mapping)
move_pgt_entry(NORMAL_PMD, ...)
take_rmap_locks
move_normal_pmd
drop_rmap_locks
When this happens, move_normal_pmd() can end up creating bogus PMD entries
in the line `pmd_populate(mm, new_pmd, pmd_pgtable(pmd))`.
The effect depends on arch-specific and machine-specific details; on x86,
you can end up with physical page 0 mapped as a page table, which is likely
exploitable for user->kernel privilege escalation.
Fix the race by letting process B recheck that the PMD still points to a
page table after the rmap locks have been taken. Otherwise, we bail and let
the caller fall back to the PTE-level copying path, which will then bail
immediately at the pmd_none() check.
Bug reachability: Reaching this bug requires that you can create shmem/file
THP mappings - anonymous THP uses different code that doesn't zap stuff
under rmap locks. File THP is gated on an experimental config flag
(CONFIG_READ_ONLY_THP_FOR_FS), so on normal distro kernels you need shmem
THP to hit this bug. As far as I know, getting shmem THP normally requires
that you can mount your own tmpfs with the right mount flags, which would
require creating your own user+mount namespace; though I don't know if some
distros maybe enable shmem THP by default or something like that.
Bug impact: This issue can likely be used for user->kernel privilege
escalation when it is reachable.
Cc: stable(a)vger.kernel.org
Fixes: 1d65b771bc08 ("mm/khugepaged: retract_page_tables() without mmap or vma lock")
Closes: https://project-zero.issues.chromium.org/371047675
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
---
@David: please confirm we can add your Signed-off-by to this patch after
the Co-developed-by.
(Context: David basically wrote the entire patch except for the commit
message.)
@akpm: This replaces the previous "[PATCH] mm/mremap: Prevent racing
change of old pmd type".
---
mm/mremap.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 24712f8dbb6b..dda09e957a5d 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -238,6 +238,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
{
spinlock_t *old_ptl, *new_ptl;
struct mm_struct *mm = vma->vm_mm;
+ bool res = false;
pmd_t pmd;
if (!arch_supports_page_table_move())
@@ -277,19 +278,25 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
if (new_ptl != old_ptl)
spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING);
- /* Clear the pmd */
pmd = *old_pmd;
+
+ /* Racing with collapse? */
+ if (unlikely(!pmd_present(pmd) || pmd_leaf(pmd)))
+ goto out_unlock;
+ /* Clear the pmd */
pmd_clear(old_pmd);
+ res = true;
VM_BUG_ON(!pmd_none(*new_pmd));
pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+out_unlock:
if (new_ptl != old_ptl)
spin_unlock(new_ptl);
spin_unlock(old_ptl);
- return true;
+ return res;
}
#else
static inline bool move_normal_pmd(struct vm_area_struct *vma,
---
base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b
change-id: 20241007-move_normal_pmd-vs-collapse-fix-2-387e9a68c7d6
--
Jann Horn <jannh(a)google.com>
A user reported that commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi
device manage_system_start_stop") introduced a spin down + immediate spin
up of the disk both when entering and when resuming from hibernation.
This behavior was not there before, and causes an increased latency both
when when entering and when resuming from hibernation.
Hibernation is done by three consecutive PM events, in the following order:
1) PM_EVENT_FREEZE
2) PM_EVENT_THAW
3) PM_EVENT_HIBERNATE
Commit aa3998dbeb3a ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") modified ata_eh_handle_port_suspend() to call
ata_dev_power_set_standby() (which spins down the disk), for both event
PM_EVENT_FREEZE and event PM_EVENT_HIBERNATE.
Documentation/driver-api/pm/devices.rst, section "Entering Hibernation",
explicitly mentions that PM_EVENT_FREEZE does not have to be put the device
in a low-power state, and actually recommends not doing so. Thus, let's not
spin down the disk on PM_EVENT_FREEZE. (The disk will instead be spun down
during the subsequent PM_EVENT_HIBERNATE event.)
This way, PM_EVENT_FREEZE will behave as it did before commit aa3998dbeb3a
("ata: libata-scsi: Disable scsi device manage_system_start_stop"), while
PM_EVENT_HIBERNATE will continue to spin down the disk.
This will avoid the superfluous spin down + spin up when entering and
resuming from hibernation, while still making sure that the disk is spun
down before actually entering hibernation.
Cc: stable(a)vger.kernel.org # v6.6+
Fixes: aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
---
Changes since v1:
-Add stable to cc.
drivers/ata/libata-eh.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 3f0144e7dc80..fa41ea57a978 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -4099,10 +4099,20 @@ static void ata_eh_handle_port_suspend(struct ata_port *ap)
WARN_ON(ap->pflags & ATA_PFLAG_SUSPENDED);
- /* Set all devices attached to the port in standby mode */
- ata_for_each_link(link, ap, HOST_FIRST) {
- ata_for_each_dev(dev, link, ENABLED)
- ata_dev_power_set_standby(dev);
+ /*
+ * We will reach this point for all of the PM events:
+ * PM_EVENT_SUSPEND (if runtime pm, PM_EVENT_AUTO will also be set)
+ * PM_EVENT_FREEZE, and PM_EVENT_HIBERNATE.
+ *
+ * We do not want to perform disk spin down for PM_EVENT_FREEZE.
+ * (Spin down will be performed by the subsequent PM_EVENT_HIBERNATE.)
+ */
+ if (!(ap->pm_mesg.event & PM_EVENT_FREEZE)) {
+ /* Set all devices attached to the port in standby mode */
+ ata_for_each_link(link, ap, HOST_FIRST) {
+ ata_for_each_dev(dev, link, ENABLED)
+ ata_dev_power_set_standby(dev);
+ }
}
/*
--
2.46.2