There has been a lingering bug in LoongArch Linux systems causing some
GCC tests to intermittently fail (see Closes link). I've made a minimal
reproducer:
zsh% cat measure.s
.align 4
.globl _start
_start:
movfcsr2gr $a0, $fcsr0
bstrpick.w $a0, $a0, 16, 16
beqz $a0, .ok
break 0
.ok:
li.w $a7, 93
syscall 0
zsh% cc mesaure.s -o measure -nostdlib
zsh% echo $((1.0/3))
0.33333333333333331
zsh% while ./measure; do ; done
This while loop should not stop as POSIX is clear that execve must set
fenv to the default, where FCSR should be zero. But in fact it will
just stop after running for a while (normally less than 30 seconds).
Note that "$((1.0/3))" is needed to reproduce the issue because it
raises FE_INVALID and makes fcsr0 non-zero.
The problem is we are relying on SET_PERSONALITY2 to reset
current->thread.fpu.fcsr. But SET_PERSONALITY2 is executed before
start_thread which calls lose_fpu(0). We can see if kernel preempt is
enabled, we may switch to another thread after SET_PERSONALITY2 but
before lose_fpu(0). Then bad thing happens: during the thread switch
the value of the fcsr0 register is stored into current->thread.fpu.fcsr,
making it dirty again.
The issue can be fixed by setting current->thread.fpu.fcsr after
lose_fpu(0) because lose_fpu clears TIF_USEDFPU, then the thread
switch won't touch current->thread.fpu.fcsr.
The only other architecture setting FCSR in SET_PERSONALITY2 is MIPS.
They do this for supporting different FP flavors (NaN encodings etc).
which do not exist on LoongArch. I'm not sure how MIPS evades the issue
(or maybe it's just buggy too) as I don't have a running MIPS hardware
now.
So for LoongArch, just remove the current->thread.fpu.fcsr setting from
SET_PERSONALITY2 and do it in start_thread, after lose_fpu(0). And we
just set it to 0, instead of boot_cpu_data.fpu_csr0 (because we should
provide the userspace a consistent configuration, no matter how hardware
and firmware behave).
The while loop failing with the mainline kernel has survived one hour
after this change.
Closes: https://github.com/loongson-community/discussions/issues/7
Fixes: 803b0fc5c3f2 ("LoongArch: Add process management")
Cc: stable(a)vger.kernel.org
Signed-off-by: Xi Ruoyao <xry111(a)xry111.site>
---
arch/loongarch/include/asm/elf.h | 5 -----
arch/loongarch/kernel/elf.c | 5 -----
arch/loongarch/kernel/process.c | 1 +
3 files changed, 1 insertion(+), 10 deletions(-)
diff --git a/arch/loongarch/include/asm/elf.h b/arch/loongarch/include/asm/elf.h
index 9b16a3b8e706..f16bd42456e4 100644
--- a/arch/loongarch/include/asm/elf.h
+++ b/arch/loongarch/include/asm/elf.h
@@ -241,8 +241,6 @@ void loongarch_dump_regs64(u64 *uregs, const struct pt_regs *regs);
do { \
current->thread.vdso = &vdso_info; \
\
- loongarch_set_personality_fcsr(state); \
- \
if (personality(current->personality) != PER_LINUX) \
set_personality(PER_LINUX); \
} while (0)
@@ -259,7 +257,6 @@ do { \
clear_thread_flag(TIF_32BIT_ADDR); \
\
current->thread.vdso = &vdso_info; \
- loongarch_set_personality_fcsr(state); \
\
p = personality(current->personality); \
if (p != PER_LINUX32 && p != PER_LINUX) \
@@ -340,6 +337,4 @@ extern int arch_elf_pt_proc(void *ehdr, void *phdr, struct file *elf,
extern int arch_check_elf(void *ehdr, bool has_interpreter, void *interp_ehdr,
struct arch_elf_state *state);
-extern void loongarch_set_personality_fcsr(struct arch_elf_state *state);
-
#endif /* _ASM_ELF_H */
diff --git a/arch/loongarch/kernel/elf.c b/arch/loongarch/kernel/elf.c
index 183e94fc9c69..0fa81ced28dc 100644
--- a/arch/loongarch/kernel/elf.c
+++ b/arch/loongarch/kernel/elf.c
@@ -23,8 +23,3 @@ int arch_check_elf(void *_ehdr, bool has_interpreter, void *_interp_ehdr,
{
return 0;
}
-
-void loongarch_set_personality_fcsr(struct arch_elf_state *state)
-{
- current->thread.fpu.fcsr = boot_cpu_data.fpu_csr0;
-}
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 767d94cce0de..caed58770650 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -92,6 +92,7 @@ void start_thread(struct pt_regs *regs, unsigned long pc, unsigned long sp)
clear_used_math();
regs->csr_era = pc;
regs->regs[3] = sp;
+ current->thread.fpu.fcsr = 0;
}
void flush_thread(void)
--
2.43.0
VIA VT6306/6307/6308 provides PCI interface compliant to 1394 OHCI. When
the hardware is combined with Asmedia ASM1083/1085 PCIe-to-PCI bus bridge,
it appears that accesses to its 'Isochronous Cycle Timer' register (offset
0xf0 on PCI I/O space) often causes unexpected system reboot in any type
of AMD Ryzen machine (both 0x17 and 0x19 families). It does not appears in
the other type of machine (AMD pre-Ryzen machine, Intel machine, at least),
or in the other OHCI 1394 hardware (e.g. Texas Instruments).
The issue explicitly appears at a commit dcadfd7f7c74 ("firewire: core:
use union for callback of transaction completion") added to v6.5 kernel.
It changed 1394 OHCI driver to access to the register every time to
dispatch local asynchronous transaction. However, the issue exists in
older version of kernel as long as it runs in AMD Ryzen machine, since
the access to the register is required to maintain bus time. It is not
hard to imagine that users experience the unexpected system reboot when
generating bus reset by plugging any devices in, or reading the register
by time-aware application programs; e.g. audio sample processing.
Well, this commit suppresses the system reboot in the combination of
hardware. It avoids the access itself. As a result, the software stack can
not provide the hardware time anymore to unit drivers, userspace
applications, and nodes in the same IEEE 1394 bus. It brings apparent
disadvantage since time-aware application programs require it, while
time-unaware applications are available again; e.g. sbp2.
Cc: stable(a)vger.kernel.org
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1215436
Reported-by: Mario Limonciello <mario.limonciello(a)amd.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217994
Reported-by: Tobias Gruetzmacher <tobias-lists(a)23.gs>
Closes: https://sourceforge.net/p/linux1394/mailman/message/58711901/
Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2240973
Closes: https://bugs.launchpad.net/linux/+bug/2043905
Signed-off-by: Takashi Sakamoto <o-takashi(a)sakamocchi.jp>
---
drivers/firewire/ohci.c | 49 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 49 insertions(+)
diff --git a/drivers/firewire/ohci.c b/drivers/firewire/ohci.c
index 7e88fd489741..62af3fa39a70 100644
--- a/drivers/firewire/ohci.c
+++ b/drivers/firewire/ohci.c
@@ -279,6 +279,8 @@ static char ohci_driver_name[] = KBUILD_MODNAME;
#define QUIRK_TI_SLLZ059 0x20
#define QUIRK_IR_WAKE 0x40
+#define QUIRK_REBOOT_BY_CYCLE_TIMER_READ 0x80000000
+
/* In case of multiple matches in ohci_quirks[], only the first one is used. */
static const struct {
unsigned short vendor, device, revision, flags;
@@ -1724,6 +1726,11 @@ static u32 get_cycle_time(struct fw_ohci *ohci)
s32 diff01, diff12;
int i;
+#if IS_ENABLED(CONFIG_X86)
+ if (ohci->quirks & QUIRK_REBOOT_BY_CYCLE_TIMER_READ)
+ return 0;
+#endif
+
c2 = reg_read(ohci, OHCI1394_IsochronousCycleTimer);
if (ohci->quirks & QUIRK_CYCLE_TIMER) {
@@ -3527,6 +3534,45 @@ static const struct fw_card_driver ohci_driver = {
.stop_iso = ohci_stop_iso,
};
+// On PCI Express Root Complex in any type of AMD Ryzen machine, VIA VT6306/6307/6308 with Asmedia
+// ASM1083/1085 brings an inconvenience that read accesses to 'Isochronous Cycle Timer' register
+// (at offset 0xf0 in PCI I/O space) often causes unexpected system reboot. The mechanism is not
+// clear, since the read access to the other registers is enough safe; e.g. 'Node ID' register,
+// while it is probable due to detection of any type of PCIe error.
+#if IS_ENABLED(CONFIG_X86)
+
+#define PCI_DEVICE_ID_ASMEDIA_ASM108X 0x1080
+
+static bool detect_vt630x_with_asm1083_on_amd_ryzen_machine(const struct pci_dev *pdev,
+ struct fw_ohci *ohci)
+{
+ const struct pci_dev *pcie_to_pci_bridge;
+ const struct cpuinfo_x86 *cinfo = &cpu_data(0);
+
+ // Detect any type of AMD Ryzen machine.
+ if (cinfo->x86_vendor != X86_VENDOR_AMD || cinfo->x86 < 0x17)
+ return false;
+
+ // Detect VIA VT6306/6307/6308.
+ if (pdev->vendor != PCI_VENDOR_ID_VIA)
+ return false;
+ if (pdev->device != PCI_DEVICE_ID_VIA_VT630X)
+ return false;
+
+ // Detect Asmedia ASM1083/1085.
+ pcie_to_pci_bridge = pdev->bus->self;
+ if (pcie_to_pci_bridge->vendor != PCI_VENDOR_ID_ASMEDIA)
+ return false;
+ if (pcie_to_pci_bridge->device != PCI_DEVICE_ID_ASMEDIA_ASM108X)
+ return false;
+
+ return true;
+}
+
+#else
+#define detect_vt630x_with_asm1083_on_amd_ryzen_machine(pdev) false
+#endif
+
#ifdef CONFIG_PPC_PMAC
static void pmac_ohci_on(struct pci_dev *dev)
{
@@ -3630,6 +3676,9 @@ static int pci_probe(struct pci_dev *dev,
if (param_quirks)
ohci->quirks = param_quirks;
+ if (detect_vt630x_with_asm1083_on_amd_ryzen_machine(dev, ohci))
+ ohci->quirks |= QUIRK_REBOOT_BY_CYCLE_TIMER_READ;
+
/*
* Because dma_alloc_coherent() allocates at least one page,
* we save space by using a common buffer for the AR request/
--
2.39.2
Hi,
Memory corruption may occur if the location of the HFI memory buffer is not
restored when resuming from hibernation or suspend-to-memory.
During a normal boot, the kernel allocates a memory buffer and gives it to
the hardware for reporting updates in the HFI table. The same allocation
process is done by a restore kernel when resuming from suspend or
hibernation.
The location of the memory that the restore kernel allocates may differ
from that allocated by the image kernel. To prevent memory corruption (the
hardware keeps using the memory buffer from the restore kernel), it is
necessary to disable HFI before transferring control to the image kernel.
Once running, the image kernel must restore the location of the HFI memory
and enable HFI.
The patchset addresses the described bug on systems with one or more HFI
instances (i.e., packages) using CPU hotplug callbacks and a suspend
notifier.
I tested this patchset on Meteor Lake and Sapphire Rapids. The systems
completed 3500 (in two separate tests of 1500 and 2000 repeats) and
1000 hibernate-resume cycles, respectively. I tested it using Rafael's
testing branch as on 20th December 2023.
Thanks and BR,
Ricardo
Ricardo Neri (4):
thermal: intel: hfi: Refactor enabling code into helper functions
thermal: intel: hfi: Enable an HFI instance from its first online CPU
thermal: intel: hfi: Disable an HFI instance when all its CPUs go
offline
thermal: intel: hfi: Add a suspend notifier
drivers/thermal/intel/intel_hfi.c | 142 ++++++++++++++++++++++++------
1 file changed, 116 insertions(+), 26 deletions(-)
--
2.25.1
This patchset backport ksmbd patches between 6.6 and 6.7-rc5 kernel.
Bug fixes were not applied well due to clean-up and new feautre patches.
To facilitate backport, This patch-set included clean-up patches and
new feature patches of ksmbd for stable 6.6 kernel.
--
2.25.1