On Thu, Apr 17, 2025 at 08:47:05PM +0200, Jules Noirant wrote:
> Hi,
>
> Thanks for the review. Technically, that patch should at least be Co-authored by me since it's a slightly reworded rebase of a patch I submitted last year: https://lore.kernel.org/lkml/20240304195745.10195-1-jules.noirant@orange.fr…
This is just a copy of what went into Linus's tree, sorry.
greg k-h
The quilt patch titled
Subject: writeback: fix false warning in inode_to_wb()
has been removed from the -mm tree. Its filename was
writeback-fix-false-warning-in-inode_to_wb.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Andreas Gruenbacher <agruenba(a)redhat.com>
Subject: writeback: fix false warning in inode_to_wb()
Date: Sat, 12 Apr 2025 18:39:12 +0200
inode_to_wb() is used also for filesystems that don't support cgroup
writeback. For these filesystems inode->i_wb is stable during the
lifetime of the inode (it points to bdi->wb) and there's no need to hold
locks protecting the inode->i_wb dereference. Improve the warning in
inode_to_wb() to not trigger for these filesystems.
Link: https://lkml.kernel.org/r/20250412163914.3773459-3-agruenba@redhat.com
Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Reviewed-by: Andreas Gruenbacher <agruenba(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/backing-dev.h | 1 +
1 file changed, 1 insertion(+)
--- a/include/linux/backing-dev.h~writeback-fix-false-warning-in-inode_to_wb
+++ a/include/linux/backing-dev.h
@@ -249,6 +249,7 @@ static inline struct bdi_writeback *inod
{
#ifdef CONFIG_LOCKDEP
WARN_ON_ONCE(debug_locks &&
+ (inode->i_sb->s_iflags & SB_I_CGROUPWB) &&
(!lockdep_is_held(&inode->i_lock) &&
!lockdep_is_held(&inode->i_mapping->i_pages.xa_lock) &&
!lockdep_is_held(&inode->i_wb->list_lock)));
_
Patches currently in -mm which might be from agruenba(a)redhat.com are
The quilt patch titled
Subject: mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
has been removed from the -mm tree. Its filename was
mm-gup-fix-wrongly-calculated-returned-value-in-fault_in_safe_writeable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baoquan He <bhe(a)redhat.com>
Subject: mm/gup: fix wrongly calculated returned value in fault_in_safe_writeable()
Date: Thu, 10 Apr 2025 11:57:14 +0800
Not like fault_in_readable() or fault_in_writeable(), in
fault_in_safe_writeable() local variable 'start' is increased page by page
to loop till the whole address range is handled. However, it mistakenly
calculates the size of the handled range with 'uaddr - start'.
Fix it here.
Andreas said:
: In gfs2, fault_in_iov_iter_writeable() is used in
: gfs2_file_direct_read() and gfs2_file_read_iter(), so this potentially
: affects buffered as well as direct reads. This bug could cause those
: gfs2 functions to spin in a loop.
Link: https://lkml.kernel.org/r/20250410035717.473207-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20250410035717.473207-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Fixes: fe673d3f5bf1 ("mm: gup: make fault_in_safe_writeable() use fixup_user_fault()")
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andreas Gruenbacher <agruenba(a)redhat.com>
Cc: Yanjun.Zhu <yanjun.zhu(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/gup.c~mm-gup-fix-wrongly-calculated-returned-value-in-fault_in_safe_writeable
+++ a/mm/gup.c
@@ -2207,8 +2207,8 @@ size_t fault_in_safe_writeable(const cha
} while (start != end);
mmap_read_unlock(mm);
- if (size > (unsigned long)uaddr - start)
- return size - ((unsigned long)uaddr - start);
+ if (size > start - (unsigned long)uaddr)
+ return size - (start - (unsigned long)uaddr);
return 0;
}
EXPORT_SYMBOL(fault_in_safe_writeable);
_
Patches currently in -mm which might be from bhe(a)redhat.com are
mm-gup-remove-unneeded-checking-in-follow_page_pte.patch
mm-gup-remove-gup_fast_pgd_leaf-and-clean-up-the-relevant-codes.patch
mm-gup-clean-up-codes-in-fault_in_xxx-functions.patch
mm-gup-clean-up-codes-in-fault_in_xxx-functions-v5.patch
On the Lenovo ThinkPad X201, when Intel VT-d is enabled in the BIOS, the
kernel boots with errors related to DMAR, the graphical interface appeared
quite choppy, and the system resets erratically within a minute after it
booted:
DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Write NO_PASID] Request device [00:02.0] fault addr 0xb97ff000
[fault reason 0x05] PTE Write access is not set
Upon comparing boot logs with VT-d on/off, I found that the Intel Calpella
quirk (`quirk_calpella_no_shadow_gtt()') correctly applied the igfx IOMMU
disable/quirk correctly:
pci 0000:00:00.0: DMAR: BIOS has allocated no shadow GTT; disabling IOMMU
for graphics
Whereas with VT-d on, it went into the "else" branch, which then
triggered the DMAR handling fault above:
... else if (!disable_igfx_iommu) {
/* we have to ensure the gfx device is idle before we flush */
pci_info(dev, "Disabling batched IOTLB flush on Ironlake\n");
iommu_set_dma_strict();
}
Now, this is not exactly scientific, but moving 0x0044 to quirk_iommu_igfx
seems to have fixed the aforementioned issue. Running a few `git blame'
runs on the function, I have found that the quirk was originally
introduced as a fix specific to ThinkPad X201:
commit 9eecabcb9a92 ("intel-iommu: Abort IOMMU setup for igfx if BIOS gave
no shadow GTT space")
Which was later revised twice to the "else" branch we saw above:
- 2011: commit 6fbcfb3e467a ("intel-iommu: Workaround IOTLB hang on
Ironlake GPU")
- 2024: commit ba00196ca41c ("iommu/vt-d: Decouple igfx_off from graphic
identity mapping")
I'm uncertain whether further testings on this particular laptops were
done in 2011 and (honestly I'm not sure) 2024, but I would be happy to do
some distro-specific testing if that's what would be required to verify
this patch.
P.S., I also see IDs 0x0040, 0x0062, and 0x006a listed under the same
`quirk_calpella_no_shadow_gtt()' quirk, but I'm not sure how similar these
chipsets are (if they share the same issue with VT-d or even, indeed, if
this issue is specific to a bug in the Lenovo BIOS). With regards to
0x0062, it seems to be a Centrino wireless card, but not a chipset?
I have also listed a couple (distro and kernel) bug reports below as
references (some of them are from 7-8 years ago!), as they seem to be
similar issue found on different Westmere/Ironlake, Haswell, and Broadwell
hardware setups.
Cc: stable(a)vger.kernel.org
Fixes: 6fbcfb3e467a ("intel-iommu: Workaround IOTLB hang on Ironlake GPU")
Fixes: ba00196ca41c ("iommu/vt-d: Decouple igfx_off from graphic identity mapping")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197029
Link: https://groups.google.com/g/qubes-users/c/4NP4goUds2c?pli=1
Link: https://bugs.archlinux.org/task/65362
Link: https://bbs.archlinux.org/viewtopic.php?id=230323
Reported-by: Wenhao Sun <weiguangtwk(a)outlook.com>
Signed-off-by: Mingcong Bai <jeffbai(a)aosc.io>
---
drivers/iommu/intel/iommu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index b29da2d96d0b..64935e14b9ec 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4432,6 +4432,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e30, quirk_iommu_igfx);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e40, quirk_iommu_igfx);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e90, quirk_iommu_igfx);
+/* QM57/QS57 integrated gfx malfunctions with dmar */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_iommu_igfx);
+
/* Broadwell igfx malfunctions with dmar */
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1606, quirk_iommu_igfx);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x160B, quirk_iommu_igfx);
@@ -4509,7 +4512,6 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev *dev)
}
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_calpella_no_shadow_gtt);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0062, quirk_calpella_no_shadow_gtt);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x006a, quirk_calpella_no_shadow_gtt);
--
2.49.0
The workqueue used for the reset worker is marked as WQ_MEM_RECLAIM,
while the GSC one isn't (and can't be as we need to do memory
allocations in the gsc worker). Therefore, we can't flush the latter
from the former.
The reason why we had such a flush was to avoid interrupting either
the GSC FW load or in progress GSC proxy operations. GSC proxy
operations fall into 2 categories:
1) GSC proxy init: this only happen once immediately after GSC FW laod
and does not support being interrupted. The only way to recover from
an interruption of the proxy init is to do an FLR and re-load the GSC.
2) GSC proxy request: this can happen in response to a request that
the driver sends to the GSC. If this is interrupted, the GSC FW will
timeout and the driver request will be failed, but overall the GSC
will keep working fine.
Flushing the work allowed us to avoid interruption in both cases (unless
the hang came from the GSC engine itself, in which case we're toast
anyway). However, a failure on a proxy request is tolerable if we're in
a scenario where we're triggering a GT reset (i.e., something is already
gone pretty wrong), so what we really need to avoid is interrupting
the init flow, which we can do by polling on the register that reports
when the proxy init is complete (as that ensure us that all the load and
init operations have been completed).
Note that during suspend we still want to do a flush of the worker to
make sure it completes any operations involving the HW before the power
is cut.
Fixes: dd0e89e5edc2 ("drm/xe/gsc: GSC FW load")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio(a)intel.com>
Cc: John Harrison <John.C.Harrison(a)Intel.com>
Cc: Alan Previn <alan.previn.teres.alexis(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_gsc.c | 22 ++++++++++++++++++++++
drivers/gpu/drm/xe/xe_gsc.h | 1 +
drivers/gpu/drm/xe/xe_gsc_proxy.c | 11 +++++++++++
drivers/gpu/drm/xe/xe_gsc_proxy.h | 1 +
drivers/gpu/drm/xe/xe_gt.c | 2 +-
drivers/gpu/drm/xe/xe_uc.c | 8 +++++++-
drivers/gpu/drm/xe/xe_uc.h | 1 +
7 files changed, 44 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gsc.c b/drivers/gpu/drm/xe/xe_gsc.c
index fd41113f8572..64799358e51f 100644
--- a/drivers/gpu/drm/xe/xe_gsc.c
+++ b/drivers/gpu/drm/xe/xe_gsc.c
@@ -555,6 +555,28 @@ void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc)
flush_work(&gsc->work);
}
+void xe_gsc_stop_prepare(struct xe_gsc *gsc)
+{
+ struct xe_gt *gt = gsc_to_gt(gsc);
+ int ret;
+
+ if (!xe_uc_fw_is_loadable(&gsc->fw) || xe_uc_fw_is_in_error_state(&gsc->fw))
+ return;
+
+ xe_force_wake_assert_held(gt_to_fw(gt), XE_FW_GSC);
+
+ /*
+ * If the GSC FW load or the proxy init are interrupted, the only way
+ * to recover it is to do an FLR and reload the GSC from scratch.
+ * Therefore, let's wait for the init to complete before stopping
+ * operations. The proxy init is the last step, so we can just wait on
+ * that
+ */
+ ret = xe_gsc_wait_for_proxy_done(gsc);
+ if (ret)
+ xe_gt_err(gt, "failed to wait for GSC init completion before uc stop\n");
+}
+
/*
* wa_14015076503: if the GSC FW is loaded, we need to alert it before doing a
* GSC engine reset by writing a notification bit in the GS1 register and then
diff --git a/drivers/gpu/drm/xe/xe_gsc.h b/drivers/gpu/drm/xe/xe_gsc.h
index d99f66c38075..b8b8e0810ad9 100644
--- a/drivers/gpu/drm/xe/xe_gsc.h
+++ b/drivers/gpu/drm/xe/xe_gsc.h
@@ -16,6 +16,7 @@ struct xe_hw_engine;
int xe_gsc_init(struct xe_gsc *gsc);
int xe_gsc_init_post_hwconfig(struct xe_gsc *gsc);
void xe_gsc_wait_for_worker_completion(struct xe_gsc *gsc);
+void xe_gsc_stop_prepare(struct xe_gsc *gsc);
void xe_gsc_load_start(struct xe_gsc *gsc);
void xe_gsc_hwe_irq_handler(struct xe_hw_engine *hwe, u16 intr_vec);
diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.c b/drivers/gpu/drm/xe/xe_gsc_proxy.c
index 8cf70b228ff3..7ec81cb00b85 100644
--- a/drivers/gpu/drm/xe/xe_gsc_proxy.c
+++ b/drivers/gpu/drm/xe/xe_gsc_proxy.c
@@ -71,6 +71,17 @@ bool xe_gsc_proxy_init_done(struct xe_gsc *gsc)
HECI1_FWSTS1_PROXY_STATE_NORMAL;
}
+int xe_gsc_wait_for_proxy_done(struct xe_gsc *gsc)
+{
+ struct xe_gt *gt = gsc_to_gt(gsc);
+
+ /* Proxy init can take up to 500ms, so wait double that for safety */
+ return xe_mmio_wait32(>->mmio, HECI_FWSTS1(MTL_GSC_HECI1_BASE),
+ HECI1_FWSTS1_CURRENT_STATE,
+ HECI1_FWSTS1_PROXY_STATE_NORMAL,
+ USEC_PER_SEC, NULL, false);
+}
+
static void __gsc_proxy_irq_rmw(struct xe_gsc *gsc, u32 clr, u32 set)
{
struct xe_gt *gt = gsc_to_gt(gsc);
diff --git a/drivers/gpu/drm/xe/xe_gsc_proxy.h b/drivers/gpu/drm/xe/xe_gsc_proxy.h
index fdef56995cd4..da8c60362a1c 100644
--- a/drivers/gpu/drm/xe/xe_gsc_proxy.h
+++ b/drivers/gpu/drm/xe/xe_gsc_proxy.h
@@ -12,6 +12,7 @@ struct xe_gsc;
int xe_gsc_proxy_init(struct xe_gsc *gsc);
bool xe_gsc_proxy_init_done(struct xe_gsc *gsc);
+int xe_gsc_wait_for_proxy_done(struct xe_gsc *gsc);
int xe_gsc_proxy_start(struct xe_gsc *gsc);
int xe_gsc_proxy_request_handler(struct xe_gsc *gsc);
diff --git a/drivers/gpu/drm/xe/xe_gt.c b/drivers/gpu/drm/xe/xe_gt.c
index 8068b4bc0a09..0e5d243c9451 100644
--- a/drivers/gpu/drm/xe/xe_gt.c
+++ b/drivers/gpu/drm/xe/xe_gt.c
@@ -899,7 +899,7 @@ void xe_gt_suspend_prepare(struct xe_gt *gt)
fw_ref = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL);
- xe_uc_stop_prepare(>->uc);
+ xe_uc_suspend_prepare(>->uc);
xe_force_wake_put(gt_to_fw(gt), fw_ref);
}
diff --git a/drivers/gpu/drm/xe/xe_uc.c b/drivers/gpu/drm/xe/xe_uc.c
index c14bd2282044..3a8751a8b92d 100644
--- a/drivers/gpu/drm/xe/xe_uc.c
+++ b/drivers/gpu/drm/xe/xe_uc.c
@@ -244,7 +244,7 @@ void xe_uc_gucrc_disable(struct xe_uc *uc)
void xe_uc_stop_prepare(struct xe_uc *uc)
{
- xe_gsc_wait_for_worker_completion(&uc->gsc);
+ xe_gsc_stop_prepare(&uc->gsc);
xe_guc_stop_prepare(&uc->guc);
}
@@ -278,6 +278,12 @@ static void uc_reset_wait(struct xe_uc *uc)
goto again;
}
+void xe_uc_suspend_prepare(struct xe_uc *uc)
+{
+ xe_gsc_wait_for_worker_completion(&uc->gsc);
+ xe_guc_stop_prepare(&uc->guc);
+}
+
int xe_uc_suspend(struct xe_uc *uc)
{
/* GuC submission not enabled, nothing to do */
diff --git a/drivers/gpu/drm/xe/xe_uc.h b/drivers/gpu/drm/xe/xe_uc.h
index 3813c1ede450..c23e6f5e2514 100644
--- a/drivers/gpu/drm/xe/xe_uc.h
+++ b/drivers/gpu/drm/xe/xe_uc.h
@@ -18,6 +18,7 @@ int xe_uc_reset_prepare(struct xe_uc *uc);
void xe_uc_stop_prepare(struct xe_uc *uc);
void xe_uc_stop(struct xe_uc *uc);
int xe_uc_start(struct xe_uc *uc);
+void xe_uc_suspend_prepare(struct xe_uc *uc);
int xe_uc_suspend(struct xe_uc *uc);
int xe_uc_sanitize_reset(struct xe_uc *uc);
void xe_uc_declare_wedged(struct xe_uc *uc);
--
2.43.0
The patch titled
Subject: mm/mempolicy: fix memory leaks in weighted interleave sysfs
has been added to the -mm mm-new branch. Its filename is
mm-mempolicy-fix-memory-leaks-in-weighted-interleave-sysfs.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-new branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Rakie Kim <rakie.kim(a)sk.com>
Subject: mm/mempolicy: fix memory leaks in weighted interleave sysfs
Date: Thu, 17 Apr 2025 16:28:35 +0900
Patch series "Enhance sysfs handling for memory hotplug in weighted
interleave", v9.
The following patch series enhances the weighted interleave policy in the
memory management subsystem by improving sysfs handling, fixing memory
leaks, and introducing dynamic sysfs updates for memory hotplug support.
This patch (of 3):
Memory leaks occurred when removing sysfs attributes for weighted
interleave. Improper kobject deallocation led to unreleased memory when
initialization failed or when nodes were removed.
This patch resolves the issue by replacing unnecessary `kfree()` calls
with proper `kobject_del()` and `kobject_put()` sequences, ensuring
correct teardown and preventing memory leaks.
By explicitly calling `kobject_del()` before `kobject_put()`, the release
function is now invoked safely, and internal sysfs state is correctly
cleaned up. This guarantees that the memory associated with the kobject
is fully released and avoids resource leaks, thereby improving system
stability.
Additionally, sysfs_remove_file() is no longer called from the release
function to avoid accessing invalid sysfs state after kobject_del(). All
attribute removals are now done before kobject_del(), preventing WARN_ON()
in kernfs and ensuring safe and consistent cleanup of sysfs entries.
Link: https://lkml.kernel.org/r/20250417072839.711-1-rakie.kim@sk.com
Link: https://lkml.kernel.org/r/20250417072839.711-2-rakie.kim@sk.com
Fixes: dce41f5ae253 ("mm/mempolicy: implement the sysfs-based weighted_interleave interface")
Signed-off-by: Rakie Kim <rakie.kim(a)sk.com>
Reviewed-by: Gregory Price <gourry(a)gourry.net>
Reviewed-by: Joshua Hahn <joshua.hahnjy(a)gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Honggyu Kim <honggyu.kim(a)sk.com>
Cc: "Huang, Ying" <ying.huang(a)linux.alibaba.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Yunjeong Mun <yunjeong.mun(a)sk.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 121 ++++++++++++++++++++++-------------------------
1 file changed, 59 insertions(+), 62 deletions(-)
--- a/mm/mempolicy.c~mm-mempolicy-fix-memory-leaks-in-weighted-interleave-sysfs
+++ a/mm/mempolicy.c
@@ -3471,8 +3471,8 @@ static ssize_t node_store(struct kobject
static struct iw_node_attr **node_attrs;
-static void sysfs_wi_node_release(struct iw_node_attr *node_attr,
- struct kobject *parent)
+static void sysfs_wi_node_delete(struct iw_node_attr *node_attr,
+ struct kobject *parent)
{
if (!node_attr)
return;
@@ -3481,18 +3481,42 @@ static void sysfs_wi_node_release(struct
kfree(node_attr);
}
-static void sysfs_wi_release(struct kobject *wi_kobj)
+static void sysfs_wi_node_delete_all(struct kobject *wi_kobj)
{
- int i;
+ int nid;
- for (i = 0; i < nr_node_ids; i++)
- sysfs_wi_node_release(node_attrs[i], wi_kobj);
- kobject_put(wi_kobj);
+ for (nid = 0; nid < nr_node_ids; nid++)
+ sysfs_wi_node_delete(node_attrs[nid], wi_kobj);
+}
+
+static void iw_table_free(void)
+{
+ u8 *old;
+
+ mutex_lock(&iw_table_lock);
+ old = rcu_dereference_protected(iw_table,
+ lockdep_is_held(&iw_table_lock));
+ rcu_assign_pointer(iw_table, NULL);
+ mutex_unlock(&iw_table_lock);
+
+ synchronize_rcu();
+ kfree(old);
+}
+
+static void wi_cleanup(struct kobject *wi_kobj) {
+ sysfs_wi_node_delete_all(wi_kobj);
+ iw_table_free();
+ kfree(node_attrs);
+}
+
+static void wi_kobj_release(struct kobject *wi_kobj)
+{
+ kfree(wi_kobj);
}
static const struct kobj_type wi_ktype = {
.sysfs_ops = &kobj_sysfs_ops,
- .release = sysfs_wi_release,
+ .release = wi_kobj_release,
};
static int add_weight_node(int nid, struct kobject *wi_kobj)
@@ -3533,85 +3557,58 @@ static int add_weighted_interleave_group
struct kobject *wi_kobj;
int nid, err;
+ node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
+ GFP_KERNEL);
+ if (!node_attrs)
+ return -ENOMEM;
+
wi_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
- if (!wi_kobj)
+ if (!wi_kobj) {
+ kfree(node_attrs);
return -ENOMEM;
+ }
err = kobject_init_and_add(wi_kobj, &wi_ktype, root_kobj,
"weighted_interleave");
- if (err) {
- kfree(wi_kobj);
- return err;
- }
+ if (err)
+ goto err_put_kobj;
for_each_node_state(nid, N_POSSIBLE) {
err = add_weight_node(nid, wi_kobj);
if (err) {
pr_err("failed to add sysfs [node%d]\n", nid);
- break;
+ goto err_cleanup_kobj;
}
}
- if (err)
- kobject_put(wi_kobj);
- return 0;
-}
-static void mempolicy_kobj_release(struct kobject *kobj)
-{
- u8 *old;
+ return 0;
- mutex_lock(&iw_table_lock);
- old = rcu_dereference_protected(iw_table,
- lockdep_is_held(&iw_table_lock));
- rcu_assign_pointer(iw_table, NULL);
- mutex_unlock(&iw_table_lock);
- synchronize_rcu();
- kfree(old);
- kfree(node_attrs);
- kfree(kobj);
+err_cleanup_kobj:
+ wi_cleanup(wi_kobj);
+ kobject_del(wi_kobj);
+err_put_kobj:
+ kobject_put(wi_kobj);
+ return err;
}
-static const struct kobj_type mempolicy_ktype = {
- .release = mempolicy_kobj_release
-};
-
static int __init mempolicy_sysfs_init(void)
{
int err;
static struct kobject *mempolicy_kobj;
- mempolicy_kobj = kzalloc(sizeof(*mempolicy_kobj), GFP_KERNEL);
- if (!mempolicy_kobj) {
- err = -ENOMEM;
- goto err_out;
- }
-
- node_attrs = kcalloc(nr_node_ids, sizeof(struct iw_node_attr *),
- GFP_KERNEL);
- if (!node_attrs) {
- err = -ENOMEM;
- goto mempol_out;
- }
+ mempolicy_kobj = kobject_create_and_add("mempolicy", mm_kobj);
+ if (!mempolicy_kobj)
+ return -ENOMEM;
- err = kobject_init_and_add(mempolicy_kobj, &mempolicy_ktype, mm_kobj,
- "mempolicy");
+ err = add_weighted_interleave_group(mempolicy_kobj);
if (err)
- goto node_out;
+ goto err_kobj;
- err = add_weighted_interleave_group(mempolicy_kobj);
- if (err) {
- pr_err("mempolicy sysfs structure failed to initialize\n");
- kobject_put(mempolicy_kobj);
- return err;
- }
+ return 0;
- return err;
-node_out:
- kfree(node_attrs);
-mempol_out:
- kfree(mempolicy_kobj);
-err_out:
- pr_err("failed to add mempolicy kobject to the system\n");
+err_kobj:
+ kobject_del(mempolicy_kobj);
+ kobject_put(mempolicy_kobj);
return err;
}
_
Patches currently in -mm which might be from rakie.kim(a)sk.com are
mm-mempolicy-fix-memory-leaks-in-weighted-interleave-sysfs.patch
mm-mempolicy-prepare-weighted-interleave-sysfs-for-memory-hotplug.patch
mm-mempolicy-support-memory-hotplug-in-weighted-interleave.patch
The patch titled
Subject: ocfs2: fix panic in failed foilio allocation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
ocfs2-fix-panic-in-failed-foilio-allocation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mark Tinguely <mark.tinguely(a)oracle.com>
Subject: ocfs2: fix panic in failed foilio allocation
Date: Thu, 10 Apr 2025 14:56:11 -0500
In commit 7e119cff9d0a ("ocfs2: convert w_pages to w_folios") the chunk
page allocations became order 0 folio allocations. If an allocation
failed, the folio array entry should be NULL so the error path can skip
the entry. In the port it is -ENOMEM and the error path panics trying to
free this bad value.
Link: https://lkml.kernel.org/r/150746ad-32ae-415e-bf1d-6dfd195fbb65@oracle.com
Fixes: 7e119cff9d0a ("ocfs2: convert w_pages to w_folios")
Signed-off-by: Mark Tinguely <mark.tinguely(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/ocfs2/aops.c~ocfs2-fix-panic-in-failed-foilio-allocation
+++ a/fs/ocfs2/aops.c
@@ -1071,6 +1071,7 @@ static int ocfs2_grab_folios_for_write(s
if (IS_ERR(wc->w_folios[i])) {
ret = PTR_ERR(wc->w_folios[i]);
mlog_errno(ret);
+ wc->w_folios[i] = NULL;
goto out;
}
}
_
Patches currently in -mm which might be from mark.tinguely(a)oracle.com are
ocfs2-fix-panic-in-failed-foilio-allocation.patch
v2-ocfs2-fix-panic-in-failed-foilio-allocation.patch
From: Ard Biesheuvel <ardb(a)kernel.org>
Communicating with the hypervisor using the shared GHCB page requires
clearing the C bit in the mapping of that page. When executing in the
context of the EFI boot services, the page tables are owned by the
firmware, and this manipulation is not possible.
So switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.
For simplicity, also switch the memory acceptance carried out by the
decompressor when not booting via EFI - this only involves the
allocation for the decompressed kernel, and is generally only called
after kexec, as normal boot will jump straight into the kernel from the
EFI stub.
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dionna Amalie Glaze <dionnaglaze(a)google.com>
Cc: Kevin Loughlin <kevinloughlin(a)google.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
Co-developed-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
Changes since v3 [0]:
- work around the fact that sev_snp_enabled() does not work yet when the
EFI stub accepts misaligned chunks of memory while populating the E820
table
Changes since v2 [1]:
- avoid two separate acceptance APIs; instead, use MSR based page-by-page
acceptance for the decompressor as well
[0] https://lore.kernel.org/all/20250410132850.3708703-2-ardb+git@google.com/T/…
[1] https://lore.kernel.org/all/20250404082921.2767593-8-ardb+git@google.com/T/…
arch/x86/boot/compressed/mem.c | 5 +-
arch/x86/boot/compressed/sev.c | 67 +++++---------------
arch/x86/boot/compressed/sev.h | 2 +
3 files changed, 21 insertions(+), 53 deletions(-)
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c
index dbba332e4a12..f676156d9f3d 100644
--- a/arch/x86/boot/compressed/mem.c
+++ b/arch/x86/boot/compressed/mem.c
@@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void)
void arch_accept_memory(phys_addr_t start, phys_addr_t end)
{
+ static bool sevsnp;
+
/* Platform-specific memory-acceptance call goes here */
if (early_is_tdx_guest()) {
if (!tdx_accept_memory(start, end))
panic("TDX: Failed to accept memory\n");
- } else if (sev_snp_enabled()) {
+ } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) {
+ sevsnp = true;
snp_accept_memory(start, end);
} else {
error("Cannot accept memory: unknown platform\n");
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 6eadd790f4e5..478eca4f7180 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -169,10 +169,7 @@ bool sev_snp_enabled(void)
static void __page_state_change(unsigned long paddr, enum psc_op op)
{
- u64 val;
-
- if (!sev_snp_enabled())
- return;
+ u64 val, msr;
/*
* If private -> shared then invalidate the page before requesting the
@@ -181,6 +178,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if (op == SNP_PAGE_STATE_SHARED)
pvalidate_4k_page(paddr, paddr, false);
+ /* Save the current GHCB MSR value */
+ msr = sev_es_rd_ghcb_msr();
+
/* Issue VMGEXIT to change the page state in RMP table. */
sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
VMGEXIT();
@@ -190,6 +190,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
+ /* Restore the GHCB MSR value */
+ sev_es_wr_ghcb_msr(msr);
+
/*
* Now that page state is changed in the RMP table, validate it so that it is
* consistent with the RMP entry.
@@ -200,11 +203,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op)
void snp_set_page_private(unsigned long paddr)
{
+ if (!sev_snp_enabled())
+ return;
+
__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
}
void snp_set_page_shared(unsigned long paddr)
{
+ if (!sev_snp_enabled())
+ return;
+
__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
}
@@ -228,56 +237,10 @@ static bool early_setup_ghcb(void)
return true;
}
-static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
- phys_addr_t pa, phys_addr_t pa_end)
-{
- struct psc_hdr *hdr;
- struct psc_entry *e;
- unsigned int i;
-
- hdr = &desc->hdr;
- memset(hdr, 0, sizeof(*hdr));
-
- e = desc->entries;
-
- i = 0;
- while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) {
- hdr->end_entry = i;
-
- e->gfn = pa >> PAGE_SHIFT;
- e->operation = SNP_PAGE_STATE_PRIVATE;
- if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) {
- e->pagesize = RMP_PG_SIZE_2M;
- pa += PMD_SIZE;
- } else {
- e->pagesize = RMP_PG_SIZE_4K;
- pa += PAGE_SIZE;
- }
-
- e++;
- i++;
- }
-
- if (vmgexit_psc(boot_ghcb, desc))
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- pvalidate_pages(desc);
-
- return pa;
-}
-
void snp_accept_memory(phys_addr_t start, phys_addr_t end)
{
- struct snp_psc_desc desc = {};
- unsigned int i;
- phys_addr_t pa;
-
- if (!boot_ghcb && !early_setup_ghcb())
- sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
-
- pa = start;
- while (pa < end)
- pa = __snp_accept_memory(&desc, pa, end);
+ for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+ __page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
}
void sev_es_shutdown_ghcb(void)
diff --git a/arch/x86/boot/compressed/sev.h b/arch/x86/boot/compressed/sev.h
index fc725a981b09..4e463f33186d 100644
--- a/arch/x86/boot/compressed/sev.h
+++ b/arch/x86/boot/compressed/sev.h
@@ -12,11 +12,13 @@
bool sev_snp_enabled(void);
void snp_accept_memory(phys_addr_t start, phys_addr_t end);
+u64 sev_get_status(void);
#else
static inline bool sev_snp_enabled(void) { return false; }
static inline void snp_accept_memory(phys_addr_t start, phys_addr_t end) { }
+static inline u64 sev_get_status(void) { return 0; }
#endif
--
2.49.0.805.g082f7c87e0-goog
The kernel is affected since the following commit rather currently the first
commit in the repository.
d4dccf353db8 ("Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM")
Link: https://lore.kernel.org/stable/SN6PR02MB415791F29F01716CCB1A23FAD4B72@SN6PR…
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
v2: Add commit log
cve/published/2024/CVE-2024-36912.vulnerable | 1 +
1 file changed, 1 insertion(+)
create mode 100644 cve/published/2024/CVE-2024-36912.vulnerable
diff --git a/cve/published/2024/CVE-2024-36912.vulnerable b/cve/published/2024/CVE-2024-36912.vulnerable
new file mode 100644
index 000000000..ffedf3da8
--- /dev/null
+++ b/cve/published/2024/CVE-2024-36912.vulnerable
@@ -0,0 +1 @@
+d4dccf353db80e209f262e3973c834e6e48ba9a9
--
2.34.1