After upgrading from 6.7.0 to 6.7.1 a couple of my systems with md
RAID-5 arrays started experiencing hangs. It starts with some
processes which write to the array getting stuck. The whole system
eventually becomes unresponsive and unclean shutdown must be performed
(poweroff and reboot don't work).
While trying to diagnose the issue, I noticed that the md0_raid5
kernel thread consumes 100% CPU after the issue occurs. No relevant
warnings or errors were found in dmesg.
On 6.7.1, I can reproduce the issue somewhat reliably by copying a
large amount of data to the array. I am unable to reproduce the issue
at all on 6.7.0. The bisection was a bit difficult since I don't have
a 100% reliable method to reproduce the problem, but with some
perseverence I eventually managed to whittle it down to commit
0de40f76d567 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in
raid5d"). After reverting that commit (i.e. reapplying the reverted
commit) on top of 6.7.1 I can no longer reproduce the problem at all.
Some details that might be relevant:
- Both systems are running MD RAID-5 with a journal device.
- mdadm in monitor mode is always running on both systems.
- Both systems were previously running 6.7.0 and earlier just fine.
- The older of the two systems has been running a raid5 array without
incident for many years (kernel going back to at least 5.1) -- this
is the first raid5 issue it has encountered.
Please let me know if there is any other helpful information that I
might be able to provide.
-- Dan
#regzbot introduced: 0de40f76d567133b871cd6ad46bb87afbce46983
Previously, patches have been added to limit the reported count of SATA
ports for asm1064 and asm1166 SATA controllers, as those controllers do
report more ports than physical having.
Unfortunately, this causes trouble for users, which are using SATA
controllers, which provide more ports through SATA PMP
(Port-MultiPlier) and are now not any more recognized.
This happens, as asm1064 and 1166 are handling SATA PMP transparently,
so all non-physical ports needs to be enabled to use that feature.
This patch reverts both patches for asm1064 and asm1166, so old
behavior is restored and SATA PMP will work again, so all physical and
non-physical ports will work again.
Fixes: 0077a504e1a4 ("ahci: asm1166: correct count of reported ports")
Fixes: 9815e3961754 ("ahci: asm1064: correct count of reported ports")
Cc: stable(a)vger.kernel.org
Reported-by: Matt <cryptearth(a)googlemail.com>
Signed-off-by: Conrad Kostecki <conikost(a)gentoo.org>
---
drivers/ata/ahci.c | 13 -------------
1 file changed, 13 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 78570684ff68..562302e2e57c 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -669,19 +669,6 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for mobile chipsets");
static void ahci_pci_save_initial_config(struct pci_dev *pdev,
struct ahci_host_priv *hpriv)
{
- if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA) {
- switch (pdev->device) {
- case 0x1166:
- dev_info(&pdev->dev, "ASM1166 has only six ports\n");
- hpriv->saved_port_map = 0x3f;
- break;
- case 0x1064:
- dev_info(&pdev->dev, "ASM1064 has only four ports\n");
- hpriv->saved_port_map = 0xf;
- break;
- }
- }
-
if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) {
dev_info(&pdev->dev, "JMB361 has only one port\n");
hpriv->saved_port_map = 1;
--
2.44.0
The mlx5 comp irq name scheme is changed a little bit between
commit 3663ad34bc70 ("net/mlx5: Shift control IRQ to the last index")
and commit 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation").
The index in the comp irq name used to start from 0 but now it starts
from 1. There is nothing critical here, but it's harmless to change
back to the old behavior, a.k.a starting from 0.
Fixes: 3354822cde5a ("net/mlx5: Use dynamic msix vectors allocation")
Reviewed-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Reviewed-by: Yuanyuan Zhong <yzhong(a)purestorage.com>
Signed-off-by: Michael Liang <mliang(a)purestorage.com>
---
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
index 4dcf995cb1a2..6bac8ad70ba6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c
@@ -19,6 +19,7 @@
#define MLX5_IRQ_CTRL_SF_MAX 8
/* min num of vectors for SFs to be enabled */
#define MLX5_IRQ_VEC_COMP_BASE_SF 2
+#define MLX5_IRQ_VEC_COMP_BASE 1
#define MLX5_EQ_SHARE_IRQ_MAX_COMP (8)
#define MLX5_EQ_SHARE_IRQ_MAX_CTRL (UINT_MAX)
@@ -246,6 +247,7 @@ static void irq_set_name(struct mlx5_irq_pool *pool, char *name, int vecidx)
return;
}
+ vecidx -= MLX5_IRQ_VEC_COMP_BASE;
snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", vecidx);
}
@@ -585,7 +587,7 @@ struct mlx5_irq *mlx5_irq_request_vector(struct mlx5_core_dev *dev, u16 cpu,
struct mlx5_irq_table *table = mlx5_irq_table_get(dev);
struct mlx5_irq_pool *pool = table->pcif_pool;
struct irq_affinity_desc af_desc;
- int offset = 1;
+ int offset = MLX5_IRQ_VEC_COMP_BASE;
if (!pool->xa_num_irqs.max)
offset = 0;
--
2.34.1
Framebuffer memory is allocated via vmalloc() from non-contiguous
physical pages. The physical framebuffer start address is therefore
meaningless. Do not set it.
The value is not used within the kernel and only exported to userspace
on dedicated ARM configs. No functional change is expected.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: a5b44c4adb16 ("drm/fbdev-generic: Always use shadow buffering")
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: Zack Rusin <zackr(a)vmware.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v6.4+
---
drivers/gpu/drm/drm_fbdev_generic.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c b/drivers/gpu/drm/drm_fbdev_generic.c
index d647d89764cb9..b4659cd6285ab 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -113,7 +113,6 @@ static int drm_fbdev_generic_helper_fb_probe(struct drm_fb_helper *fb_helper,
/* screen */
info->flags |= FBINFO_VIRTFB | FBINFO_READS_FAST;
info->screen_buffer = screen_buffer;
- info->fix.smem_start = page_to_phys(vmalloc_to_page(info->screen_buffer));
info->fix.smem_len = screen_size;
/* deferred I/O */
--
2.44.0
From: Ashish Kalra <ashish.kalra(a)amd.com>
Some BIOSes allow the end user to set the minimum SEV ASID value
(CPUID 0x8000001F_EDX) to be greater than the maximum number of
encrypted guests, or maximum SEV ASID value (CPUID 0x8000001F_ECX)
in order to dedicate all the SEV ASIDs to SEV-ES or SEV-SNP.
The SEV support, as coded, does not handle the case where the minimum
SEV ASID value can be greater than the maximum SEV ASID value.
As a result, the following confusing message is issued:
[ 30.715724] kvm_amd: SEV enabled (ASIDs 1007 - 1006)
Fix the support to properly handle this case.
Fixes: 916391a2d1dc ("KVM: SVM: Add support for SEV-ES capability in KVM")
Suggested-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
Cc: stable(a)vger.kernel.org
Acked-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Link: https://lore.kernel.org/r/20240104190520.62510-1-Ashish.Kalra@amd.com
Link: https://lore.kernel.org/r/20240131235609.4161407-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/svm/sev.c | 29 +++++++++++++++++++----------
1 file changed, 19 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index eeef43c795d8..5f8312edee36 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -144,10 +144,21 @@ static void sev_misc_cg_uncharge(struct kvm_sev_info *sev)
static int sev_asid_new(struct kvm_sev_info *sev)
{
- unsigned int asid, min_asid, max_asid;
+ /*
+ * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid.
+ * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1.
+ * Note: min ASID can end up larger than the max if basic SEV support is
+ * effectively disabled by disallowing use of ASIDs for SEV guests.
+ */
+ unsigned int min_asid = sev->es_active ? 1 : min_sev_asid;
+ unsigned int max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid;
+ unsigned int asid;
bool retry = true;
int ret;
+ if (min_asid > max_asid)
+ return -ENOTTY;
+
WARN_ON(sev->misc_cg);
sev->misc_cg = get_current_misc_cg();
ret = sev_misc_cg_try_charge(sev);
@@ -159,12 +170,6 @@ static int sev_asid_new(struct kvm_sev_info *sev)
mutex_lock(&sev_bitmap_lock);
- /*
- * SEV-enabled guests must use asid from min_sev_asid to max_sev_asid.
- * SEV-ES-enabled guest can use from 1 to min_sev_asid - 1.
- */
- min_asid = sev->es_active ? 1 : min_sev_asid;
- max_asid = sev->es_active ? min_sev_asid - 1 : max_sev_asid;
again:
asid = find_next_zero_bit(sev_asid_bitmap, max_asid + 1, min_asid);
if (asid > max_asid) {
@@ -2234,8 +2239,10 @@ void __init sev_hardware_setup(void)
goto out;
}
- sev_asid_count = max_sev_asid - min_sev_asid + 1;
- WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count));
+ if (min_sev_asid <= max_sev_asid) {
+ sev_asid_count = max_sev_asid - min_sev_asid + 1;
+ WARN_ON_ONCE(misc_cg_set_capacity(MISC_CG_RES_SEV, sev_asid_count));
+ }
sev_supported = true;
/* SEV-ES support requested? */
@@ -2266,7 +2273,9 @@ void __init sev_hardware_setup(void)
out:
if (boot_cpu_has(X86_FEATURE_SEV))
pr_info("SEV %s (ASIDs %u - %u)\n",
- sev_supported ? "enabled" : "disabled",
+ sev_supported ? min_sev_asid <= max_sev_asid ? "enabled" :
+ "unusable" :
+ "disabled",
min_sev_asid, max_sev_asid);
if (boot_cpu_has(X86_FEATURE_SEV_ES))
pr_info("SEV-ES %s (ASIDs %u - %u)\n",
--
2.43.0
Currently when we increase the device statistics, it would always lead
to an error message in the kernel log.
I would argue this behavior is not ideal:
- It would flood the dmesg and bury real important messages
One common scenario is scrub.
If scrub hit some errors, it would cause both scrub and
btrfs_dev_stat_inc_and_print() to print error messages.
And in that case, btrfs_dev_stat_inc_and_print() is completely
useless.
- The results of btrfs_dev_stat_inc_and_print() is mostly for history
monitoring, doesn't has enough details
If we trigger the errors during regular read, such messages from
btrfs_dev_stat_inc_and_print() won't help us to locate the cause
either.
The real usage for the btrfs device statistics is for some user space
daemon to check if there is any new errors, acting like some checks on
SMART, thus we don't really need/want those messages in dmesg.
This patch would reduce the log level to debug (disabled by default) for
btrfs_dev_stat_inc_and_print().
For users really want to utilize btrfs devices statistics, they should
go check "btrfs device stats" periodically, and we should focus the
kernel error messages to more important things.
CC: stable(a)vger.kernel.org
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/volumes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e49935a54da0..126145950ed3 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7828,7 +7828,7 @@ void btrfs_dev_stat_inc_and_print(struct btrfs_device *dev, int index)
if (!dev->dev_stats_valid)
return;
- btrfs_err_rl_in_rcu(dev->fs_info,
+ btrfs_debug_rl_in_rcu(dev->fs_info,
"bdev %s errs: wr %u, rd %u, flush %u, corrupt %u, gen %u",
btrfs_dev_name(dev),
btrfs_dev_stat_read(dev, BTRFS_DEV_STAT_WRITE_ERRS),
--
2.44.0
The patch titled
Subject: mm: cachestat: fix two shmem bugs
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-cachestat-fix-two-shmem-bugs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: cachestat: fix two shmem bugs
Date: Fri, 15 Mar 2024 05:55:56 -0400
When cachestat on shmem races with swapping and invalidation, there
are two possible bugs:
1) A swapin error can have resulted in a poisoned swap entry in the
shmem inode's xarray. Calling get_shadow_from_swap_cache() on it
will result in an out-of-bounds access to swapper_spaces[].
Validate the entry with non_swap_entry() before going further.
2) When we find a valid swap entry in the shmem's inode, the shadow
entry in the swapcache might not exist yet: swap IO is still in
progress and we're before __remove_mapping; swapin, invalidation,
or swapoff have removed the shadow from swapcache after we saw the
shmem swap entry.
This will send a NULL to workingset_test_recent(). The latter
purely operates on pointer bits, so it won't crash - node 0, memcg
ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a
bogus test. In theory that could result in a false "recently
evicted" count.
Such a false positive wouldn't be the end of the world. But for
code clarity and (future) robustness, be explicit about this case.
Bail on get_shadow_from_swap_cache() returning NULL.
Link: https://lkml.kernel.org/r/20240315095556.GC581298@cmpxchg.org
Fixes: cf264e1329fb ("cachestat: implement cachestat syscall")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Chengming Zhou <chengming.zhou(a)linux.dev> [Bug #1]
Reported-by: Jann Horn <jannh(a)google.com> [Bug #2]
Reviewed-by: Chengming Zhou <chengming.zhou(a)linux.dev>
Reviewed-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [v6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
--- a/mm/filemap.c~mm-cachestat-fix-two-shmem-bugs
+++ a/mm/filemap.c
@@ -4197,7 +4197,23 @@ static void filemap_cachestat(struct add
/* shmem file - in swap cache */
swp_entry_t swp = radix_to_swp_entry(folio);
+ /* swapin error results in poisoned entry */
+ if (non_swap_entry(swp))
+ goto resched;
+
+ /*
+ * Getting a swap entry from the shmem
+ * inode means we beat
+ * shmem_unuse(). rcu_read_lock()
+ * ensures swapoff waits for us before
+ * freeing the swapper space. However,
+ * we can race with swapping and
+ * invalidation, so there might not be
+ * a shadow in the swapcache (yet).
+ */
shadow = get_shadow_from_swap_cache(swp);
+ if (!shadow)
+ goto resched;
}
#endif
if (workingset_test_recent(shadow, true, &workingset))
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-cachestat-fix-two-shmem-bugs.patch
The patch titled
Subject: init: open /initrd.image with O_LARGEFILE
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
init-open-initrdimage-with-o_largefile.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Sperbeck <jsperbeck(a)google.com>
Subject: init: open /initrd.image with O_LARGEFILE
Date: Sun, 17 Mar 2024 15:15:22 -0700
If initrd data is larger than 2Gb, we'll eventually fail to write to the
/initrd.image file when we hit that limit, unless O_LARGEFILE is set.
Link: https://lkml.kernel.org/r/20240317221522.896040-1-jsperbeck@google.com
Signed-off-by: John Sperbeck <jsperbeck(a)google.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
init/initramfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/init/initramfs.c~init-open-initrdimage-with-o_largefile
+++ a/init/initramfs.c
@@ -682,7 +682,7 @@ static void __init populate_initrd_image
printk(KERN_INFO "rootfs image is not initramfs (%s); looks like an initrd\n",
err);
- file = filp_open("/initrd.image", O_WRONLY | O_CREAT, 0700);
+ file = filp_open("/initrd.image", O_WRONLY|O_CREAT|O_LARGEFILE, 0700);
if (IS_ERR(file))
return;
_
Patches currently in -mm which might be from jsperbeck(a)google.com are
init-open-initrdimage-with-o_largefile.patch