June 2023 - Linux-stable-mirror

[PATCH v2 2/5] ASoC: tegra: Fix ADX byte map

by Sameer Pujar

From: Sheetal <sheetal(a)nvidia.com> Byte mask for channel-1 of stream-1 is not getting enabled and this causes failures during ADX use cases. This happens because the byte map value 0 matches the byte map array and put() callback returns without enabling the corresponding bits in the byte mask. ADX supports 4 output streams and each stream can have a maximum of 16 channels. Each byte in the input frame is uniquely mapped to a byte in one of these 4 outputs. This mapping is done with the help of byte map array via user space control setting. The byte map array size in the driver is 16 and each array element is of size 4 bytes. This corresponds to 64 byte map values. Each byte in the byte map array can have any value between 0 to 255 to enable the corresponding bits in the byte mask. The value 256 is used as a way to disable the byte map. However the byte map array element cannot store this value. The put() callback disables the byte mask for 256 value and byte map value is reset to 0 for this case. This causes problems during subsequent runs since put() callback, for value of 0, just returns without enabling the byte mask. In short, the problem is coming because 0 and 256 control values are stored as 0 in the byte map array. Right now fix the put() callback by actually looking at the byte mask array state to identify if any change is needed and update the fields accordingly. The get() callback needs an update as well to return the correct control value that user has set before. Note that when user set 256, the value is stored as 0 and byte mask is disabled. So byte mask state is used to either return 256 or the value from byte map array. Given above, this looks bit complicated and all this happens because the byte map array is tightly packed and cannot actually store the 256 value. Right now the priority is to fix the existing failure and a TODO item is put to improve this logic. Fixes: 3c97881b8c8a ("ASoC: tegra: Fix kcontrol put callback in ADX") Cc: stable(a)vger.kernel.org Signed-off-by: Sheetal <sheetal(a)nvidia.com> Reviewed-by: Mohan Kumar D <mkumard(a)nvidia.com> Reviewed-by: Sameer Pujar <spujar(a)nvidia.com> --- sound/soc/tegra/tegra210_adx.c | 34 ++++++++++++++++++++++------------ 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/sound/soc/tegra/tegra210_adx.c b/sound/soc/tegra/tegra210_adx.c index bd0b10c..7d003f0 100644 --- a/sound/soc/tegra/tegra210_adx.c +++ b/sound/soc/tegra/tegra210_adx.c @@ -2,7 +2,7 @@ // // tegra210_adx.c - Tegra210 ADX driver // -// Copyright (c) 2021 NVIDIA CORPORATION. All rights reserved. +// Copyright (c) 2021-2023 NVIDIA CORPORATION. All rights reserved. #include <linux/clk.h> #include <linux/device.h> @@ -175,10 +175,20 @@ static int tegra210_adx_get_byte_map(struct snd_kcontrol *kcontrol, mc = (struct soc_mixer_control *)kcontrol->private_value; enabled = adx->byte_mask[mc->reg / 32] & (1 << (mc->reg % 32)); + /* + * TODO: Simplify this logic to just return from bytes_map[] + * + * Presently below is required since bytes_map[] is + * tightly packed and cannot store the control value of 256. + * Byte mask state is used to know if 256 needs to be returned. + * Note that for control value of 256, the put() call stores 0 + * in the bytes_map[] and disables the corresponding bit in + * byte_mask[]. + */ if (enabled) ucontrol->value.integer.value[0] = bytes_map[mc->reg]; else - ucontrol->value.integer.value[0] = 0; + ucontrol->value.integer.value[0] = 256; return 0; } @@ -192,19 +202,19 @@ static int tegra210_adx_put_byte_map(struct snd_kcontrol *kcontrol, int value = ucontrol->value.integer.value[0]; struct soc_mixer_control *mc = (struct soc_mixer_control *)kcontrol->private_value; + unsigned int mask_val = adx->byte_mask[mc->reg / 32]; - if (value == bytes_map[mc->reg]) + if (value >= 0 && value <= 255) + mask_val |= (1 << (mc->reg % 32)); + else + mask_val &= ~(1 << (mc->reg % 32)); + + if (mask_val == adx->byte_mask[mc->reg / 32]) return 0; - if (value >= 0 && value <= 255) { - /* update byte map and enable slot */ - bytes_map[mc->reg] = value; - adx->byte_mask[mc->reg / 32] |= (1 << (mc->reg % 32)); - } else { - /* reset byte map and disable slot */ - bytes_map[mc->reg] = 0; - adx->byte_mask[mc->reg / 32] &= ~(1 << (mc->reg % 32)); - } + /* Update byte map and slot */ + bytes_map[mc->reg] = value % 256; + adx->byte_mask[mc->reg / 32] = mask_val; return 1; } -- 2.7.4

2 years, 5 months

1
0
0 0

[PATCH v2 1/5] ASoC: tegra: Fix AMX byte map

by Sameer Pujar

From: Sheetal <sheetal(a)nvidia.com> Byte mask for channel-1 of stream-1 is not getting enabled and this causes failures during AMX use cases. This happens because the byte map value 0 matches the byte map array and put() callback returns without enabling the corresponding bits in the byte mask. AMX supports 4 input streams and each stream can take a maximum of 16 channels. Each byte in the output frame is uniquely mapped to a byte in one of these 4 inputs. This mapping is done with the help of byte map array via user space control setting. The byte map array size in the driver is 16 and each array element is of size 4 bytes. This corresponds to 64 byte map values. Each byte in the byte map array can have any value between 0 to 255 to enable the corresponding bits in the byte mask. The value 256 is used as a way to disable the byte map. However the byte map array element cannot store this value. The put() callback disables the byte mask for 256 value and byte map value is reset to 0 for this case. This causes problems during subsequent runs since put() callback, for value of 0, just returns without enabling the byte mask. In short, the problem is coming because 0 and 256 control values are stored as 0 in the byte map array. Right now fix the put() callback by actually looking at the byte mask array state to identify if any change is needed and update the fields accordingly. The get() callback needs an update as well to return the correct control value that user has set before. Note that when user sets 256, the value is stored as 0 and byte mask is disabled. So byte mask state is used to either return 256 or the value from byte map array. Given above, this looks bit complicated and all this happens because the byte map array is tightly packed and cannot actually store the 256 value. Right now the priority is to fix the existing failure and a TODO item is put to improve this logic. Fixes: 8db78ace1ba8 ("ASoC: tegra: Fix kcontrol put callback in AMX") Cc: stable(a)vger.kernel.org Signed-off-by: Sheetal <sheetal(a)nvidia.com> Reviewed-by: Mohan Kumar D <mkumard(a)nvidia.com> Reviewed-by: Sameer Pujar <spujar(a)nvidia.com> --- sound/soc/tegra/tegra210_amx.c | 40 ++++++++++++++++++++++------------------ 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/sound/soc/tegra/tegra210_amx.c b/sound/soc/tegra/tegra210_amx.c index 782a141..1798769 100644 --- a/sound/soc/tegra/tegra210_amx.c +++ b/sound/soc/tegra/tegra210_amx.c @@ -2,7 +2,7 @@ // // tegra210_amx.c - Tegra210 AMX driver // -// Copyright (c) 2021 NVIDIA CORPORATION. All rights reserved. +// Copyright (c) 2021-2023 NVIDIA CORPORATION. All rights reserved. #include <linux/clk.h> #include <linux/device.h> @@ -203,10 +203,20 @@ static int tegra210_amx_get_byte_map(struct snd_kcontrol *kcontrol, else enabled = amx->byte_mask[0] & (1 << reg); + /* + * TODO: Simplify this logic to just return from bytes_map[] + * + * Presently below is required since bytes_map[] is + * tightly packed and cannot store the control value of 256. + * Byte mask state is used to know if 256 needs to be returned. + * Note that for control value of 256, the put() call stores 0 + * in the bytes_map[] and disables the corresponding bit in + * byte_mask[]. + */ if (enabled) ucontrol->value.integer.value[0] = bytes_map[reg]; else - ucontrol->value.integer.value[0] = 0; + ucontrol->value.integer.value[0] = 256; return 0; } @@ -221,25 +231,19 @@ static int tegra210_amx_put_byte_map(struct snd_kcontrol *kcontrol, unsigned char *bytes_map = (unsigned char *)&amx->map; int reg = mc->reg; int value = ucontrol->value.integer.value[0]; + unsigned int mask_val = amx->byte_mask[reg / 32]; - if (value == bytes_map[reg]) + if (value >= 0 && value <= 255) + mask_val |= (1 << (reg % 32)); + else + mask_val &= ~(1 << (reg % 32)); + + if (mask_val == amx->byte_mask[reg / 32]) return 0; - if (value >= 0 && value <= 255) { - /* Update byte map and enable slot */ - bytes_map[reg] = value; - if (reg > 31) - amx->byte_mask[1] |= (1 << (reg - 32)); - else - amx->byte_mask[0] |= (1 << reg); - } else { - /* Reset byte map and disable slot */ - bytes_map[reg] = 0; - if (reg > 31) - amx->byte_mask[1] &= ~(1 << (reg - 32)); - else - amx->byte_mask[0] &= ~(1 << reg); - } + /* Update byte map and slot */ + bytes_map[reg] = value % 256; + amx->byte_mask[reg / 32] = mask_val; return 1; } -- 2.7.4

2 years, 5 months

1
0
0 0

[PATCH v4] block: add check that partition length needs to be aligned with block size

by Min Li

Before calling add partition or resize partition, there is no check on whether the length is aligned with the logical block size. If the logical block size of the disk is larger than 512 bytes, then the partition size maybe not the multiple of the logical block size, and when the last sector is read, bio_truncate() will adjust the bio size, resulting in an IO error if the size of the read command is smaller than the logical block size.If integrity data is supported, this will also result in a null pointer dereference when calling bio_integrity_free. Cc: stable(a)vger.kernel.org Signed-off-by: Min Li <min15.li(a)samsung.com> --- Changes from v1: - Add a space after /* and before */. - Move length alignment check before the "start = p.start >> SECTOR_SHIFT" - Move check for p.start being aligned together with this length alignment check. Changes from v2: - Add the assignment on the first line and merge the two lines into one. Changes from v3: - Change the blksz to unsigned int. - Add check if p.start and p.length are negative. --- block/ioctl.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/block/ioctl.c b/block/ioctl.c index 3be11941fb2d..a8061c2fcae0 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -16,9 +16,10 @@ static int blkpg_do_ioctl(struct block_device *bdev, struct blkpg_partition __user *upart, int op) { + unsigned int blksz = bdev_logical_block_size(bdev); struct gendisk *disk = bdev->bd_disk; struct blkpg_partition p; - long long start, length; + sector_t start, length; if (!capable(CAP_SYS_ADMIN)) return -EACCES; @@ -33,14 +34,17 @@ static int blkpg_do_ioctl(struct block_device *bdev, if (op == BLKPG_DEL_PARTITION) return bdev_del_partition(disk, p.pno); + if (p.start < 0 || p.length <= 0 || p.start + p.length < 0) + return -EINVAL; + /* Check that the partition is aligned to the block size */ + if (!IS_ALIGNED(p.start | p.length, blksz)) + return -EINVAL; + start = p.start >> SECTOR_SHIFT; length = p.length >> SECTOR_SHIFT; switch (op) { case BLKPG_ADD_PARTITION: - /* check if partition is aligned to blocksize */ - if (p.start & (bdev_logical_block_size(bdev) - 1)) - return -EINVAL; return bdev_add_partition(disk, p.pno, start, length); case BLKPG_RESIZE_PARTITION: return bdev_resize_partition(disk, p.pno, start, length); -- 2.34.1

2 years, 5 months

3
2
0 0

[PATCH v2] ceph: don't let check_caps skip sending responses for revoke msgs

by xiubli＠redhat.com

From: Xiubo Li <xiubli(a)redhat.com> If a client sends out a cap-update request with the old 'seq' just before a pending cap revoke request, then the MDS might miscalculate the 'seqs' and caps. It's therefore always a good idea to ack the cap revoke request with the bumped up 'seq'. Cc: stable(a)vger.kernel.org Cc: Patrick Donnelly <pdonnell(a)redhat.com> URL: https://tracker.ceph.com/issues/61782 Signed-off-by: Xiubo Li <xiubli(a)redhat.com> --- V2: - Rephrased the commit comment for better understanding from Milind fs/ceph/caps.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index 1052885025b3..eee2fbca3430 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -3737,6 +3737,15 @@ static void handle_cap_grant(struct inode *inode, } BUG_ON(cap->issued & ~cap->implemented); + /* don't let check_caps skip sending a response to MDS for revoke msgs */ + if (le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) { + cap->mds_wanted = 0; + if (cap == ci->i_auth_cap) + check_caps = 1; /* check auth cap only */ + else + check_caps = 2; /* check all caps */ + } + if (extra_info->inline_version > 0 && extra_info->inline_version >= ci->i_inline_version) { ci->i_inline_version = extra_info->inline_version; -- 2.40.1

2 years, 5 months

4
3
0 0

[PATCH 1/2] mpt3sas: Perform additional retries if Doorbell read returns 0

by Ranjan Kumar

Doorbell and Host diagnostic registers could return 0 even after 3 retries and that leads to occasional resets of the controllers, hence increased the retry count to thirty. Fixes: b899202901a8 ("mpt3sas: Add separate function for aero doorbell reads ") Cc: stable(a)vger.kernel.org Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com> --- drivers/scsi/mpt3sas/mpt3sas_base.c | 50 ++++++++++++++++------------- drivers/scsi/mpt3sas/mpt3sas_base.h | 4 ++- 2 files changed, 31 insertions(+), 23 deletions(-) diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c index 53f5492579cb..44e7ccb6f780 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -201,20 +201,20 @@ module_param_call(mpt3sas_fwfault_debug, _scsih_set_fwfault_debug, * while reading the system interface register. */ static inline u32 -_base_readl_aero(const volatile void __iomem *addr) +_base_readl_aero(const volatile void __iomem *addr, u8 retry_count) { u32 i = 0, ret_val; do { ret_val = readl(addr); i++; - } while (ret_val == 0 && i < 3); + } while (ret_val == 0 && i < retry_count); return ret_val; } static inline u32 -_base_readl(const volatile void __iomem *addr) +_base_readl(const volatile void __iomem *addr, u8 retry_count) { return readl(addr); } @@ -940,7 +940,7 @@ mpt3sas_halt_firmware(struct MPT3SAS_ADAPTER *ioc) dump_stack(); - doorbell = ioc->base_readl(&ioc->chip->Doorbell); + doorbell = ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY); if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) { mpt3sas_print_fault_code(ioc, doorbell & MPI2_DOORBELL_DATA_MASK); @@ -1617,10 +1617,10 @@ mpt3sas_base_mask_interrupts(struct MPT3SAS_ADAPTER *ioc) u32 him_register; ioc->mask_interrupts = 1; - him_register = ioc->base_readl(&ioc->chip->HostInterruptMask); + him_register = ioc->base_readl(&ioc->chip->HostInterruptMask, READL_RETRY_COUNT_OF_THREE); him_register |= MPI2_HIM_DIM + MPI2_HIM_RIM + MPI2_HIM_RESET_IRQ_MASK; writel(him_register, &ioc->chip->HostInterruptMask); - ioc->base_readl(&ioc->chip->HostInterruptMask); + ioc->base_readl(&ioc->chip->HostInterruptMask, READL_RETRY_COUNT_OF_THREE); } /** @@ -1634,7 +1634,7 @@ mpt3sas_base_unmask_interrupts(struct MPT3SAS_ADAPTER *ioc) { u32 him_register; - him_register = ioc->base_readl(&ioc->chip->HostInterruptMask); + him_register = ioc->base_readl(&ioc->chip->HostInterruptMask, READL_RETRY_COUNT_OF_THREE); him_register &= ~MPI2_HIM_RIM; writel(him_register, &ioc->chip->HostInterruptMask); ioc->mask_interrupts = 0; @@ -6686,7 +6686,7 @@ mpt3sas_base_get_iocstate(struct MPT3SAS_ADAPTER *ioc, int cooked) { u32 s, sc; - s = ioc->base_readl(&ioc->chip->Doorbell); + s = ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY); sc = s & MPI2_IOC_STATE_MASK; return cooked ? sc : s; } @@ -6760,7 +6760,8 @@ _base_wait_for_doorbell_int(struct MPT3SAS_ADAPTER *ioc, int timeout) count = 0; cntdn = 1000 * timeout; do { - int_status = ioc->base_readl(&ioc->chip->HostInterruptStatus); + int_status = ioc->base_readl(&ioc->chip->HostInterruptStatus, + READL_RETRY_COUNT_OF_THREE); if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { dhsprintk(ioc, ioc_info(ioc, "%s: successful count(%d), timeout(%d)\n", @@ -6786,7 +6787,8 @@ _base_spin_on_doorbell_int(struct MPT3SAS_ADAPTER *ioc, int timeout) count = 0; cntdn = 2000 * timeout; do { - int_status = ioc->base_readl(&ioc->chip->HostInterruptStatus); + int_status = ioc->base_readl(&ioc->chip->HostInterruptStatus, + READL_RETRY_COUNT_OF_THREE); if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { dhsprintk(ioc, ioc_info(ioc, "%s: successful count(%d), timeout(%d)\n", @@ -6824,14 +6826,16 @@ _base_wait_for_doorbell_ack(struct MPT3SAS_ADAPTER *ioc, int timeout) count = 0; cntdn = 1000 * timeout; do { - int_status = ioc->base_readl(&ioc->chip->HostInterruptStatus); + int_status = ioc->base_readl(&ioc->chip->HostInterruptStatus, + READL_RETRY_COUNT_OF_THREE); if (!(int_status & MPI2_HIS_SYS2IOC_DB_STATUS)) { dhsprintk(ioc, ioc_info(ioc, "%s: successful count(%d), timeout(%d)\n", __func__, count, timeout)); return 0; } else if (int_status & MPI2_HIS_IOC2SYS_DB_STATUS) { - doorbell = ioc->base_readl(&ioc->chip->Doorbell); + doorbell = ioc->base_readl(&ioc->chip->Doorbell, + READL_RETRY_COUNT_OF_THIRTY); if ((doorbell & MPI2_IOC_STATE_MASK) == MPI2_IOC_STATE_FAULT) { mpt3sas_print_fault_code(ioc, doorbell); @@ -6871,7 +6875,7 @@ _base_wait_for_doorbell_not_used(struct MPT3SAS_ADAPTER *ioc, int timeout) count = 0; cntdn = 1000 * timeout; do { - doorbell_reg = ioc->base_readl(&ioc->chip->Doorbell); + doorbell_reg = ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY); if (!(doorbell_reg & MPI2_DOORBELL_USED)) { dhsprintk(ioc, ioc_info(ioc, "%s: successful count(%d), timeout(%d)\n", @@ -7019,13 +7023,13 @@ _base_handshake_req_reply_wait(struct MPT3SAS_ADAPTER *ioc, int request_bytes, __le32 *mfp; /* make sure doorbell is not in use */ - if ((ioc->base_readl(&ioc->chip->Doorbell) & MPI2_DOORBELL_USED)) { + if ((ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY) & MPI2_DOORBELL_USED)) { ioc_err(ioc, "doorbell is in use (line=%d)\n", __LINE__); return -EFAULT; } /* clear pending doorbell interrupts from previous state changes */ - if (ioc->base_readl(&ioc->chip->HostInterruptStatus) & + if (ioc->base_readl(&ioc->chip->HostInterruptStatus, READL_RETRY_COUNT_OF_THREE) & MPI2_HIS_IOC2SYS_DB_STATUS) writel(0, &ioc->chip->HostInterruptStatus); @@ -7068,7 +7072,7 @@ _base_handshake_req_reply_wait(struct MPT3SAS_ADAPTER *ioc, int request_bytes, } /* read the first two 16-bits, it gives the total length of the reply */ - reply[0] = le16_to_cpu(ioc->base_readl(&ioc->chip->Doorbell) + reply[0] = le16_to_cpu(ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY) & MPI2_DOORBELL_DATA_MASK); writel(0, &ioc->chip->HostInterruptStatus); if ((_base_wait_for_doorbell_int(ioc, 5))) { @@ -7076,7 +7080,7 @@ _base_handshake_req_reply_wait(struct MPT3SAS_ADAPTER *ioc, int request_bytes, __LINE__); return -EFAULT; } - reply[1] = le16_to_cpu(ioc->base_readl(&ioc->chip->Doorbell) + reply[1] = le16_to_cpu(ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY) & MPI2_DOORBELL_DATA_MASK); writel(0, &ioc->chip->HostInterruptStatus); @@ -7087,10 +7091,10 @@ _base_handshake_req_reply_wait(struct MPT3SAS_ADAPTER *ioc, int request_bytes, return -EFAULT; } if (i >= reply_bytes/2) /* overflow case */ - ioc->base_readl(&ioc->chip->Doorbell); + ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY); else reply[i] = le16_to_cpu( - ioc->base_readl(&ioc->chip->Doorbell) + ioc->base_readl(&ioc->chip->Doorbell, READL_RETRY_COUNT_OF_THIRTY) & MPI2_DOORBELL_DATA_MASK); writel(0, &ioc->chip->HostInterruptStatus); } @@ -7949,14 +7953,15 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc) goto out; } - host_diagnostic = ioc->base_readl(&ioc->chip->HostDiagnostic); + host_diagnostic = ioc->base_readl(&ioc->chip->HostDiagnostic, + READL_RETRY_COUNT_OF_THIRTY); drsprintk(ioc, ioc_info(ioc, "wrote magic sequence: count(%d), host_diagnostic(0x%08x)\n", count, host_diagnostic)); } while ((host_diagnostic & MPI2_DIAG_DIAG_WRITE_ENABLE) == 0); - hcb_size = ioc->base_readl(&ioc->chip->HCBSize); + hcb_size = ioc->base_readl(&ioc->chip->HCBSize, READL_RETRY_COUNT_OF_THREE); drsprintk(ioc, ioc_info(ioc, "diag reset: issued\n")); writel(host_diagnostic | MPI2_DIAG_RESET_ADAPTER, @@ -7969,7 +7974,8 @@ _base_diag_reset(struct MPT3SAS_ADAPTER *ioc) for (count = 0; count < (300000000 / MPI2_HARD_RESET_PCIE_SECOND_READ_DELAY_MICRO_SEC); count++) { - host_diagnostic = ioc->base_readl(&ioc->chip->HostDiagnostic); + host_diagnostic = ioc->base_readl(&ioc->chip->HostDiagnostic, + READL_RETRY_COUNT_OF_THIRTY); if (host_diagnostic == 0xFFFFFFFF) { ioc_info(ioc, diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.h b/drivers/scsi/mpt3sas/mpt3sas_base.h index 05364aa15ecd..3b8ec4fd2d21 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -160,6 +160,8 @@ #define IOC_OPERATIONAL_WAIT_COUNT 10 +#define READL_RETRY_COUNT_OF_THIRTY 30 +#define READL_RETRY_COUNT_OF_THREE 3 /* * NVMe defines */ @@ -994,7 +996,7 @@ typedef void (*NVME_BUILD_PRP)(struct MPT3SAS_ADAPTER *ioc, u16 smid, typedef void (*PUT_SMID_IO_FP_HIP) (struct MPT3SAS_ADAPTER *ioc, u16 smid, u16 funcdep); typedef void (*PUT_SMID_DEFAULT) (struct MPT3SAS_ADAPTER *ioc, u16 smid); -typedef u32 (*BASE_READ_REG) (const volatile void __iomem *addr); +typedef u32 (*BASE_READ_REG) (const volatile void __iomem *addr, u8 retry_count); /* * To get high iops reply queue's msix index when high iops mode is enabled * else get the msix index of general reply queues. -- 2.31.1

2 years, 5 months

2
1
0 0

[PATCH 6.1.y] perf symbols: Symbol lookup with kcore can fail if multiple segments match stext

by Krister Johansen

commit 1c249565426e3a9940102c0ba9f63914f7cda73d upstream. This problem was encountered on an arm64 system with a lot of memory. Without kernel debug symbols installed, and with both kcore and kallsyms available, perf managed to get confused and returned "unknown" for all of the kernel symbols that it tried to look up. On this system, stext fell within the vmalloc segment. The kcore symbol matching code tries to find the first segment that contains stext and uses that to replace the segment generated from just the kallsyms information. In this case, however, there were two: a very large vmalloc segment, and the text segment. This caused perf to get confused because multiple overlapping segments were inserted into the RB tree that holds the discovered segments. However, that alone wasn't sufficient to cause the problem. Even when we could find the segment, the offsets were adjusted in such a way that the newly generated symbols didn't line up with the instruction addresses in the trace. The most obvious solution would be to consult which segment type is text from kcore, but this information is not exposed to users. Instead, select the smallest matching segment that contains stext instead of the first matching segment. This allows us to match the text segment instead of vmalloc, if one is contained within the other. Reviewed-by: Adrian Hunter <adrian.hunter(a)intel.com> Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com> Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com> Cc: David Reaver <me(a)davidreaver.com> Cc: Ian Rogers <irogers(a)google.com> Cc: Jiri Olsa <jolsa(a)kernel.org> Cc: Mark Rutland <mark.rutland(a)arm.com> Cc: Michael Petlan <mpetlan(a)redhat.com> Cc: Namhyung Kim <namhyung(a)kernel.org> Cc: Peter Zijlstra <peterz(a)infradead.org> Link: http://lore.kernel.org/lkml/20230125183418.GD1963@templeofstupid.com Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com> Signed-off-by: Krister Johansen <kjlx(a)templeofstupid.com> --- tools/perf/util/symbol.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index a3a165ae933a..98014f937568 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -1368,10 +1368,23 @@ static int dso__load_kcore(struct dso *dso, struct map *map, /* Find the kernel map using the '_stext' symbol */ if (!kallsyms__get_function_start(kallsyms_filename, "_stext", &stext)) { + u64 replacement_size = 0; + list_for_each_entry(new_map, &md.maps, node) { - if (stext >= new_map->start && stext < new_map->end) { + u64 new_size = new_map->end - new_map->start; + + if (!(stext >= new_map->start && stext < new_map->end)) + continue; + + /* + * On some architectures, ARM64 for example, the kernel + * text can get allocated inside of the vmalloc segment. + * Select the smallest matching segment, in case stext + * falls within more than one in the list. + */ + if (!replacement_map || new_size < replacement_size) { replacement_map = new_map; - break; + replacement_size = new_size; } } } -- 2.25.1

2 years, 5 months

1
0
0 0

[PATCH 3/3] drm/nouveau/disp: verify mode on atomic_check

by Karol Herbst

We have to verify the selected mode as Userspace might request one which we can't configure the GPU for. X with the modesetting DDX is adding a bunch of modes, some so far outside of hardware limits that things simply break. Sadly we can't fix X this way as on start it sticks to one mode and ignores any error and there is really nothing we can do about this, but at least this way we won't let the GPU run into any errors caused by a non supported display mode. However this does prevent X from switching to such a mode, which to be fair is an improvement as well. Seen on one of my Tesla GPUs with a connected 4K display. Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/199 Cc: Ben Skeggs <bskeggs(a)redhat.com> Cc: Lyude Paul <lyude(a)redhat.com> Cc: stable(a)vger.kernel.org # v6.1+ Signed-off-by: Karol Herbst <kherbst(a)redhat.com> --- drivers/gpu/drm/nouveau/nouveau_connector.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c index 22c42a5e184d..edf490c1490c 100644 --- a/drivers/gpu/drm/nouveau/nouveau_connector.c +++ b/drivers/gpu/drm/nouveau/nouveau_connector.c @@ -1114,6 +1114,25 @@ nouveau_connector_atomic_check(struct drm_connector *connector, struct drm_atomi struct drm_connector_state *conn_state = drm_atomic_get_new_connector_state(state, connector); + /* As we can get any random mode from Userspace, we have to make sure the to be set mode + * is valid and does not violate hardware constraints as we rely on it being sane. + */ + if (conn_state->crtc) { + struct drm_crtc_state *crtc_state = + drm_atomic_get_crtc_state(state, conn_state->crtc); + + if (IS_ERR(crtc_state)) + return PTR_ERR(crtc_state); + + if (crtc_state->enable && (crtc_state->mode_changed || + crtc_state->connectors_changed)) { + struct drm_display_mode *mode = &crtc_state->mode; + + if (connector->helper_private->mode_valid(connector, mode) != MODE_OK) + return -EINVAL; + } + } + if (!nv_conn->dp_encoder || !nv50_has_mst(nouveau_drm(connector->dev))) return 0; -- 2.41.0

2 years, 5 months

1
0
0 0

FAILED: patch "[PATCH] x86/smp: Cure kexec() vs. mwait_play_dead() breakage" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x d7893093a7417527c0d73c9832244e65c9d0114f # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062812-bloated-equal-cc64@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d7893093a7417527c0d73c9832244e65c9d0114f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner <tglx(a)linutronix.de> Date: Thu, 15 Jun 2023 22:33:57 +0200 Subject: [PATCH] x86/smp: Cure kexec() vs. mwait_play_dead() breakage TLDR: It's a mess. When kexec() is executed on a system with offline CPUs, which are parked in mwait_play_dead() it can end up in a triple fault during the bootup of the kexec kernel or cause hard to diagnose data corruption. The reason is that kexec() eventually overwrites the previous kernel's text, page tables, data and stack. If it writes to the cache line which is monitored by a previously offlined CPU, MWAIT resumes execution and ends up executing the wrong text, dereferencing overwritten page tables or corrupting the kexec kernels data. Cure this by bringing the offlined CPUs out of MWAIT into HLT. Write to the monitored cache line of each offline CPU, which makes MWAIT resume execution. The written control word tells the offlined CPUs to issue HLT, which does not have the MWAIT problem. That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as those make it come out of HLT. A follow up change will put them into INIT, which protects at least against NMI and SMI. Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <ashok.raj(a)intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Tested-by: Ashok Raj <ashok.raj(a)intel.com> Reviewed-by: Ashok Raj <ashok.raj(a)intel.com> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 4e91054c84be..d4ce5cb5c953 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu); int wbinvd_on_all_cpus(void); void cond_wakeup_cpu0(void); +void smp_kick_mwait_play_dead(void); + void native_smp_send_reschedule(int cpu); void native_send_call_func_ipi(const struct cpumask *mask); void native_send_call_func_single_ipi(int cpu); diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index d842875f986f..174d6232b87f 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -21,6 +21,7 @@ #include <linux/interrupt.h> #include <linux/cpu.h> #include <linux/gfp.h> +#include <linux/kexec.h> #include <asm/mtrr.h> #include <asm/tlbflush.h> @@ -157,6 +158,10 @@ static void native_stop_other_cpus(int wait) if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1) return; + /* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */ + if (kexec_in_progress) + smp_kick_mwait_play_dead(); + /* * 1) Send an IPI on the reboot vector to all other CPUs. * diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c5ac5d74cdd4..483df0427678 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -53,6 +53,7 @@ #include <linux/tboot.h> #include <linux/gfp.h> #include <linux/cpuidle.h> +#include <linux/kexec.h> #include <linux/numa.h> #include <linux/pgtable.h> #include <linux/overflow.h> @@ -106,6 +107,9 @@ struct mwait_cpu_dead { unsigned int status; }; +#define CPUDEAD_MWAIT_WAIT 0xDEADBEEF +#define CPUDEAD_MWAIT_KEXEC_HLT 0x4A17DEAD + /* * Cache line aligned data for mwait_play_dead(). Separate on purpose so * that it's unlikely to be touched by other CPUs. @@ -173,6 +177,10 @@ static void smp_callin(void) { int cpuid; + /* Mop up eventual mwait_play_dead() wreckage */ + this_cpu_write(mwait_cpu_dead.status, 0); + this_cpu_write(mwait_cpu_dead.control, 0); + /* * If waken up by an INIT in an 82489DX configuration * cpu_callout_mask guarantees we don't get here before @@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void) (highest_subcstate - 1); } + /* Set up state for the kexec() hack below */ + md->status = CPUDEAD_MWAIT_WAIT; + md->control = CPUDEAD_MWAIT_WAIT; + wbinvd(); while (1) { @@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void) mb(); __mwait(eax, 0); + if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) { + /* + * Kexec is about to happen. Don't go back into mwait() as + * the kexec kernel might overwrite text and data including + * page tables and stack. So mwait() would resume when the + * monitor cache line is written to and then the CPU goes + * south due to overwritten text, page tables and stack. + * + * Note: This does _NOT_ protect against a stray MCE, NMI, + * SMI. They will resume execution at the instruction + * following the HLT instruction and run into the problem + * which this is trying to prevent. + */ + WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT); + while(1) + native_halt(); + } + cond_wakeup_cpu0(); } } +/* + * Kick all "offline" CPUs out of mwait on kexec(). See comment in + * mwait_play_dead(). + */ +void smp_kick_mwait_play_dead(void) +{ + u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT; + struct mwait_cpu_dead *md; + unsigned int cpu, i; + + for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) { + md = per_cpu_ptr(&mwait_cpu_dead, cpu); + + /* Does it sit in mwait_play_dead() ? */ + if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT) + continue; + + /* Wait up to 5ms */ + for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) { + /* Bring it out of mwait */ + WRITE_ONCE(md->control, newstate); + udelay(5); + } + + if (READ_ONCE(md->status) != newstate) + pr_err_once("CPU%u is stuck in mwait_play_dead()\n", cpu); + } +} + void __noreturn hlt_play_dead(void) { if (__this_cpu_read(cpu_info.x86) >= 4)

2 years, 5 months

1
0
0 0

FAILED: patch "[PATCH] x86/smp: Cure kexec() vs. mwait_play_dead() breakage" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x d7893093a7417527c0d73c9832244e65c9d0114f # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062811-ambush-finishing-abd6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d7893093a7417527c0d73c9832244e65c9d0114f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner <tglx(a)linutronix.de> Date: Thu, 15 Jun 2023 22:33:57 +0200 Subject: [PATCH] x86/smp: Cure kexec() vs. mwait_play_dead() breakage TLDR: It's a mess. When kexec() is executed on a system with offline CPUs, which are parked in mwait_play_dead() it can end up in a triple fault during the bootup of the kexec kernel or cause hard to diagnose data corruption. The reason is that kexec() eventually overwrites the previous kernel's text, page tables, data and stack. If it writes to the cache line which is monitored by a previously offlined CPU, MWAIT resumes execution and ends up executing the wrong text, dereferencing overwritten page tables or corrupting the kexec kernels data. Cure this by bringing the offlined CPUs out of MWAIT into HLT. Write to the monitored cache line of each offline CPU, which makes MWAIT resume execution. The written control word tells the offlined CPUs to issue HLT, which does not have the MWAIT problem. That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as those make it come out of HLT. A follow up change will put them into INIT, which protects at least against NMI and SMI. Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <ashok.raj(a)intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Tested-by: Ashok Raj <ashok.raj(a)intel.com> Reviewed-by: Ashok Raj <ashok.raj(a)intel.com> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 4e91054c84be..d4ce5cb5c953 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu); int wbinvd_on_all_cpus(void); void cond_wakeup_cpu0(void); +void smp_kick_mwait_play_dead(void); + void native_smp_send_reschedule(int cpu); void native_send_call_func_ipi(const struct cpumask *mask); void native_send_call_func_single_ipi(int cpu); diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index d842875f986f..174d6232b87f 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -21,6 +21,7 @@ #include <linux/interrupt.h> #include <linux/cpu.h> #include <linux/gfp.h> +#include <linux/kexec.h> #include <asm/mtrr.h> #include <asm/tlbflush.h> @@ -157,6 +158,10 @@ static void native_stop_other_cpus(int wait) if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1) return; + /* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */ + if (kexec_in_progress) + smp_kick_mwait_play_dead(); + /* * 1) Send an IPI on the reboot vector to all other CPUs. * diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c5ac5d74cdd4..483df0427678 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -53,6 +53,7 @@ #include <linux/tboot.h> #include <linux/gfp.h> #include <linux/cpuidle.h> +#include <linux/kexec.h> #include <linux/numa.h> #include <linux/pgtable.h> #include <linux/overflow.h> @@ -106,6 +107,9 @@ struct mwait_cpu_dead { unsigned int status; }; +#define CPUDEAD_MWAIT_WAIT 0xDEADBEEF +#define CPUDEAD_MWAIT_KEXEC_HLT 0x4A17DEAD + /* * Cache line aligned data for mwait_play_dead(). Separate on purpose so * that it's unlikely to be touched by other CPUs. @@ -173,6 +177,10 @@ static void smp_callin(void) { int cpuid; + /* Mop up eventual mwait_play_dead() wreckage */ + this_cpu_write(mwait_cpu_dead.status, 0); + this_cpu_write(mwait_cpu_dead.control, 0); + /* * If waken up by an INIT in an 82489DX configuration * cpu_callout_mask guarantees we don't get here before @@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void) (highest_subcstate - 1); } + /* Set up state for the kexec() hack below */ + md->status = CPUDEAD_MWAIT_WAIT; + md->control = CPUDEAD_MWAIT_WAIT; + wbinvd(); while (1) { @@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void) mb(); __mwait(eax, 0); + if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) { + /* + * Kexec is about to happen. Don't go back into mwait() as + * the kexec kernel might overwrite text and data including + * page tables and stack. So mwait() would resume when the + * monitor cache line is written to and then the CPU goes + * south due to overwritten text, page tables and stack. + * + * Note: This does _NOT_ protect against a stray MCE, NMI, + * SMI. They will resume execution at the instruction + * following the HLT instruction and run into the problem + * which this is trying to prevent. + */ + WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT); + while(1) + native_halt(); + } + cond_wakeup_cpu0(); } } +/* + * Kick all "offline" CPUs out of mwait on kexec(). See comment in + * mwait_play_dead(). + */ +void smp_kick_mwait_play_dead(void) +{ + u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT; + struct mwait_cpu_dead *md; + unsigned int cpu, i; + + for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) { + md = per_cpu_ptr(&mwait_cpu_dead, cpu); + + /* Does it sit in mwait_play_dead() ? */ + if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT) + continue; + + /* Wait up to 5ms */ + for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) { + /* Bring it out of mwait */ + WRITE_ONCE(md->control, newstate); + udelay(5); + } + + if (READ_ONCE(md->status) != newstate) + pr_err_once("CPU%u is stuck in mwait_play_dead()\n", cpu); + } +} + void __noreturn hlt_play_dead(void) { if (__this_cpu_read(cpu_info.x86) >= 4)

2 years, 5 months

1
0
0 0

FAILED: patch "[PATCH] x86/smp: Cure kexec() vs. mwait_play_dead() breakage" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x d7893093a7417527c0d73c9832244e65c9d0114f # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023062809-ultimate-spider-e3a0@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d7893093a7417527c0d73c9832244e65c9d0114f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner <tglx(a)linutronix.de> Date: Thu, 15 Jun 2023 22:33:57 +0200 Subject: [PATCH] x86/smp: Cure kexec() vs. mwait_play_dead() breakage TLDR: It's a mess. When kexec() is executed on a system with offline CPUs, which are parked in mwait_play_dead() it can end up in a triple fault during the bootup of the kexec kernel or cause hard to diagnose data corruption. The reason is that kexec() eventually overwrites the previous kernel's text, page tables, data and stack. If it writes to the cache line which is monitored by a previously offlined CPU, MWAIT resumes execution and ends up executing the wrong text, dereferencing overwritten page tables or corrupting the kexec kernels data. Cure this by bringing the offlined CPUs out of MWAIT into HLT. Write to the monitored cache line of each offline CPU, which makes MWAIT resume execution. The written control word tells the offlined CPUs to issue HLT, which does not have the MWAIT problem. That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as those make it come out of HLT. A follow up change will put them into INIT, which protects at least against NMI and SMI. Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <ashok.raj(a)intel.com> Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de> Tested-by: Ashok Raj <ashok.raj(a)intel.com> Reviewed-by: Ashok Raj <ashok.raj(a)intel.com> Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 4e91054c84be..d4ce5cb5c953 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu); int wbinvd_on_all_cpus(void); void cond_wakeup_cpu0(void); +void smp_kick_mwait_play_dead(void); + void native_smp_send_reschedule(int cpu); void native_send_call_func_ipi(const struct cpumask *mask); void native_send_call_func_single_ipi(int cpu); diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index d842875f986f..174d6232b87f 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -21,6 +21,7 @@ #include <linux/interrupt.h> #include <linux/cpu.h> #include <linux/gfp.h> +#include <linux/kexec.h> #include <asm/mtrr.h> #include <asm/tlbflush.h> @@ -157,6 +158,10 @@ static void native_stop_other_cpus(int wait) if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1) return; + /* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */ + if (kexec_in_progress) + smp_kick_mwait_play_dead(); + /* * 1) Send an IPI on the reboot vector to all other CPUs. * diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index c5ac5d74cdd4..483df0427678 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -53,6 +53,7 @@ #include <linux/tboot.h> #include <linux/gfp.h> #include <linux/cpuidle.h> +#include <linux/kexec.h> #include <linux/numa.h> #include <linux/pgtable.h> #include <linux/overflow.h> @@ -106,6 +107,9 @@ struct mwait_cpu_dead { unsigned int status; }; +#define CPUDEAD_MWAIT_WAIT 0xDEADBEEF +#define CPUDEAD_MWAIT_KEXEC_HLT 0x4A17DEAD + /* * Cache line aligned data for mwait_play_dead(). Separate on purpose so * that it's unlikely to be touched by other CPUs. @@ -173,6 +177,10 @@ static void smp_callin(void) { int cpuid; + /* Mop up eventual mwait_play_dead() wreckage */ + this_cpu_write(mwait_cpu_dead.status, 0); + this_cpu_write(mwait_cpu_dead.control, 0); + /* * If waken up by an INIT in an 82489DX configuration * cpu_callout_mask guarantees we don't get here before @@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void) (highest_subcstate - 1); } + /* Set up state for the kexec() hack below */ + md->status = CPUDEAD_MWAIT_WAIT; + md->control = CPUDEAD_MWAIT_WAIT; + wbinvd(); while (1) { @@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void) mb(); __mwait(eax, 0); + if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) { + /* + * Kexec is about to happen. Don't go back into mwait() as + * the kexec kernel might overwrite text and data including + * page tables and stack. So mwait() would resume when the + * monitor cache line is written to and then the CPU goes + * south due to overwritten text, page tables and stack. + * + * Note: This does _NOT_ protect against a stray MCE, NMI, + * SMI. They will resume execution at the instruction + * following the HLT instruction and run into the problem + * which this is trying to prevent. + */ + WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT); + while(1) + native_halt(); + } + cond_wakeup_cpu0(); } } +/* + * Kick all "offline" CPUs out of mwait on kexec(). See comment in + * mwait_play_dead(). + */ +void smp_kick_mwait_play_dead(void) +{ + u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT; + struct mwait_cpu_dead *md; + unsigned int cpu, i; + + for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) { + md = per_cpu_ptr(&mwait_cpu_dead, cpu); + + /* Does it sit in mwait_play_dead() ? */ + if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT) + continue; + + /* Wait up to 5ms */ + for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) { + /* Bring it out of mwait */ + WRITE_ONCE(md->control, newstate); + udelay(5); + } + + if (READ_ONCE(md->status) != newstate) + pr_err_once("CPU%u is stuck in mwait_play_dead()\n", cpu); + } +} + void __noreturn hlt_play_dead(void) { if (__this_cpu_read(cpu_info.x86) >= 4)

2 years, 5 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror June 2023