Changes since v1 [1]:
- Clarify the failing condition in patch 3 (Michal)
- Clarify how subsection collisions manifest in shipping systems
(Michal)
- Use zone_idx() (Michal)
- Move section_taint_zone_device() conditions to
move_pfn_range_to_zone() (Michal)
- Fix pfn_to_online_page() to account for pfn_valid() vs
pfn_section_valid() confusion (David)
[1]: http://lore.kernel.org/r/160990599013.2430134.11556277600719835946.stgit@dw…
---
Michal reminds that the discussion about how to ensure pfn-walkers do
not get confused by ZONE_DEVICE pages never resolved. A pfn-walker that
uses pfn_to_online_page() may inadvertently translate a pfn as online
and in the page allocator, when it is offline managed by a ZONE_DEVICE
mapping (details in Patch 3: ("mm: Teach pfn_to_online_page() about
ZONE_DEVICE section collisions")).
The 2 proposals under consideration are teach pfn_to_online_page() to be
precise in the presence of mixed-zone sections, or teach the memory-add
code to drop the System RAM associated with ZONE_DEVICE collisions. In
order to not regress memory capacity by a few 10s to 100s of MiB the
approach taken in this set is to add precision to pfn_to_online_page().
In the course of validating pfn_to_online_page() a couple other fixes
fell out:
1/ soft_offline_page() fails to drop the reference taken in the
madvise(..., MADV_SOFT_OFFLINE) case.
2/ The libnvdimm sysfs attribute visibility code was failing to publish
the resource base for memmap=ss!nn defined namespaces. This is needed
for the regression test for soft_offline_page().
---
Dan Williams (5):
mm: Move pfn_to_online_page() out of line
mm: Teach pfn_to_online_page() to consider subsection validity
mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions
mm: Fix page reference leak in soft_offline_page()
libnvdimm/namespace: Fix visibility of namespace resource attribute
drivers/nvdimm/namespace_devs.c | 10 +++---
include/linux/memory_hotplug.h | 17 +----------
include/linux/mmzone.h | 22 +++++++++-----
mm/memory-failure.c | 20 ++++++++++---
mm/memory_hotplug.c | 62 +++++++++++++++++++++++++++++++++++++++
5 files changed, 99 insertions(+), 32 deletions(-)
From: Eric Biggers <ebiggers(a)google.com>
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that i_state will contain
I_DIRTY_TIME during those.) Therefore, lazy timestamps aren't
persisted by sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Suggested-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/fs-writeback.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af23..c41cb887eb7d3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
--
2.30.0
This is a note to let you know that I've just added the patch titled
usb: gadget: aspeed: fix stop dma register setting.
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 4e0dcf62ab4cf917d0cbe751b8bf229a065248d4 Mon Sep 17 00:00:00 2001
From: Ryan Chen <ryan_chen(a)aspeedtech.com>
Date: Fri, 8 Jan 2021 16:12:38 +0800
Subject: usb: gadget: aspeed: fix stop dma register setting.
The vhub engine has two dma mode, one is descriptor list, another
is single stage DMA. Each mode has different stop register setting.
Descriptor list operation (bit2) : 0 disable reset, 1: enable reset
Single mode operation (bit0) : 0 : disable, 1: enable
Fixes: 7ecca2a4080c ("usb/gadget: Add driver for Aspeed SoC virtual hub")
Cc: stable <stable(a)vger.kernel.org>
Acked-by: Felipe Balbi <balbi(a)kernel.org>
Acked-by: Joel Stanley <joel(a)jms.id.au>
Signed-off-by: Ryan Chen <ryan_chen(a)aspeedtech.com>
Link: https://lore.kernel.org/r/20210108081238.10199-2-ryan_chen@aspeedtech.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/udc/aspeed-vhub/epn.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/epn.c b/drivers/usb/gadget/udc/aspeed-vhub/epn.c
index 0bd6b20435b8..02d8bfae58fb 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/epn.c
+++ b/drivers/usb/gadget/udc/aspeed-vhub/epn.c
@@ -420,7 +420,10 @@ static void ast_vhub_stop_active_req(struct ast_vhub_ep *ep,
u32 state, reg, loops;
/* Stop DMA activity */
- writel(0, ep->epn.regs + AST_VHUB_EP_DMA_CTLSTAT);
+ if (ep->epn.desc_mode)
+ writel(VHUB_EP_DMA_CTRL_RESET, ep->epn.regs + AST_VHUB_EP_DMA_CTLSTAT);
+ else
+ writel(0, ep->epn.regs + AST_VHUB_EP_DMA_CTLSTAT);
/* Wait for it to complete */
for (loops = 0; loops < 1000; loops++) {
--
2.30.0
This is a note to let you know that I've just added the patch titled
USB: ehci: fix an interrupt calltrace error
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 643a4df7fe3f6831d14536fd692be85f92670a52 Mon Sep 17 00:00:00 2001
From: Longfang Liu <liulongfang(a)huawei.com>
Date: Tue, 12 Jan 2021 09:57:27 +0800
Subject: USB: ehci: fix an interrupt calltrace error
The system that use Synopsys USB host controllers goes to suspend
when using USB audio player. This causes the USB host controller
continuous send interrupt signal to system, When the number of
interrupts exceeds 100000, the system will forcibly close the
interrupts and output a calltrace error.
When the system goes to suspend, the last interrupt is reported to
the driver. At this time, the system has set the state to suspend.
This causes the last interrupt to not be processed by the system and
not clear the interrupt flag. This uncleared interrupt flag constantly
triggers new interrupt event. This causing the driver to receive more
than 100,000 interrupts, which causes the system to forcibly close the
interrupt report and report the calltrace error.
so, when the driver goes to sleep and changes the system state to
suspend, the interrupt flag needs to be cleared.
Signed-off-by: Longfang Liu <liulongfang(a)huawei.com>
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Link: https://lore.kernel.org/r/1610416647-45774-1-git-send-email-liulongfang@hua…
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/ehci-hub.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/host/ehci-hub.c b/drivers/usb/host/ehci-hub.c
index 087402aec5cb..9f9ab5ccea88 100644
--- a/drivers/usb/host/ehci-hub.c
+++ b/drivers/usb/host/ehci-hub.c
@@ -345,6 +345,9 @@ static int ehci_bus_suspend (struct usb_hcd *hcd)
unlink_empty_async_suspended(ehci);
+ /* Some Synopsys controllers mistakenly leave IAA turned on */
+ ehci_writel(ehci, STS_IAA, &ehci->regs->status);
+
/* Any IAA cycle that started before the suspend is now invalid */
end_iaa_cycle(ehci);
ehci_handle_start_intr_unlinks(ehci);
--
2.30.0
This is a note to let you know that I've just added the patch titled
ehci: fix EHCI host controller initialization sequence
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 280a9045bb18833db921b316a5527d2b565e9f2e Mon Sep 17 00:00:00 2001
From: Eugene Korenevsky <ekorenevsky(a)astralinux.ru>
Date: Sun, 10 Jan 2021 20:36:09 +0300
Subject: ehci: fix EHCI host controller initialization sequence
According to EHCI spec, EHCI HC clears USBSTS.HCHalted whenever
USBCMD.RS=1.
However, it is a good practice to wait some time after setting USBCMD.RS
(approximately 100ms) until USBSTS.HCHalted become zero.
Without this waiting, VirtualBox's EHCI virtual HC accidentally hangs
(see BugLink).
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211095
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Eugene Korenevsky <ekorenevsky(a)astralinux.ru>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210110173609.GA17313@himera.home
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/ehci-hcd.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index e358ae17d51e..1926b328b6aa 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -574,6 +574,7 @@ static int ehci_run (struct usb_hcd *hcd)
struct ehci_hcd *ehci = hcd_to_ehci (hcd);
u32 temp;
u32 hcc_params;
+ int rc;
hcd->uses_new_polling = 1;
@@ -629,9 +630,20 @@ static int ehci_run (struct usb_hcd *hcd)
down_write(&ehci_cf_port_reset_rwsem);
ehci->rh_state = EHCI_RH_RUNNING;
ehci_writel(ehci, FLAG_CF, &ehci->regs->configured_flag);
+
+ /* Wait until HC become operational */
ehci_readl(ehci, &ehci->regs->command); /* unblock posted writes */
msleep(5);
+ rc = ehci_handshake(ehci, &ehci->regs->status, STS_HALT, 0, 100 * 1000);
+
up_write(&ehci_cf_port_reset_rwsem);
+
+ if (rc) {
+ ehci_err(ehci, "USB %x.%x, controller refused to start: %d\n",
+ ((ehci->sbrn & 0xf0)>>4), (ehci->sbrn & 0x0f), rc);
+ return rc;
+ }
+
ehci->last_periodic_enable = ktime_get_real();
temp = HC_VERSION(ehci, ehci_readl(ehci, &ehci->caps->hc_capbase));
--
2.30.0
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: ipu3-cio2: Fix mbus_code processing in cio2_subdev_set_fmt()
Author: Pavel Machek <pavel(a)denx.de>
Date: Wed Dec 30 13:55:50 2020 +0100
Loop was useless as it would always exit on the first iteration. Fix
it with right condition.
Signed-off-by: Pavel Machek (CIP) <pavel(a)denx.de>
Fixes: a86cf9b29e8b ("media: ipu3-cio2: Validate mbus format in setting subdev format")
Tested-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Cc: stable(a)vger.kernel.org # v4.16 and up
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/pci/intel/ipu3/ipu3-cio2.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
---
diff --git a/drivers/media/pci/intel/ipu3/ipu3-cio2.c b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
index 36e354ecf71e..e8ea69d30bfd 100644
--- a/drivers/media/pci/intel/ipu3/ipu3-cio2.c
+++ b/drivers/media/pci/intel/ipu3/ipu3-cio2.c
@@ -1269,7 +1269,7 @@ static int cio2_subdev_set_fmt(struct v4l2_subdev *sd,
fmt->format.code = formats[0].mbus_code;
for (i = 0; i < ARRAY_SIZE(formats); i++) {
- if (formats[i].mbus_code == fmt->format.code) {
+ if (formats[i].mbus_code == mbus_code) {
fmt->format.code = mbus_code;
break;
}
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: v4l: ioctl: Fix memory leak in video_usercopy
Author: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Date: Sat Dec 19 23:29:58 2020 +0100
When an IOCTL with argument size larger than 128 that also used array
arguments were handled, two memory allocations were made but alas, only
the latter one of them was released. This happened because there was only
a single local variable to hold such a temporary allocation.
Fix this by adding separate variables to hold the pointers to the
temporary allocations.
Reported-by: Arnd Bergmann <arnd(a)kernel.org>
Reported-by: syzbot+1115e79c8df6472c612b(a)syzkaller.appspotmail.com
Fixes: d14e6d76ebf7 ("[media] v4l: Add multi-planar ioctl handling code")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sakari Ailus <sakari.ailus(a)linux.intel.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Reviewed-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/v4l2-core/v4l2-ioctl.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)
---
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index 3198abdd538c..9906b41004e9 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -3283,7 +3283,7 @@ video_usercopy(struct file *file, unsigned int orig_cmd, unsigned long arg,
v4l2_kioctl func)
{
char sbuf[128];
- void *mbuf = NULL;
+ void *mbuf = NULL, *array_buf = NULL;
void *parg = (void *)arg;
long err = -EINVAL;
bool has_array_args;
@@ -3318,27 +3318,21 @@ video_usercopy(struct file *file, unsigned int orig_cmd, unsigned long arg,
has_array_args = err;
if (has_array_args) {
- /*
- * When adding new types of array args, make sure that the
- * parent argument to ioctl (which contains the pointer to the
- * array) fits into sbuf (so that mbuf will still remain
- * unused up to here).
- */
- mbuf = kvmalloc(array_size, GFP_KERNEL);
+ array_buf = kvmalloc(array_size, GFP_KERNEL);
err = -ENOMEM;
- if (NULL == mbuf)
+ if (array_buf == NULL)
goto out_array_args;
err = -EFAULT;
if (in_compat_syscall())
- err = v4l2_compat_get_array_args(file, mbuf, user_ptr,
- array_size, orig_cmd,
- parg);
+ err = v4l2_compat_get_array_args(file, array_buf,
+ user_ptr, array_size,
+ orig_cmd, parg);
else
- err = copy_from_user(mbuf, user_ptr, array_size) ?
+ err = copy_from_user(array_buf, user_ptr, array_size) ?
-EFAULT : 0;
if (err)
goto out_array_args;
- *kernel_ptr = mbuf;
+ *kernel_ptr = array_buf;
}
/* Handles IOCTL */
@@ -3360,12 +3354,13 @@ video_usercopy(struct file *file, unsigned int orig_cmd, unsigned long arg,
if (in_compat_syscall()) {
int put_err;
- put_err = v4l2_compat_put_array_args(file, user_ptr, mbuf,
- array_size, orig_cmd,
- parg);
+ put_err = v4l2_compat_put_array_args(file, user_ptr,
+ array_buf,
+ array_size,
+ orig_cmd, parg);
if (put_err)
err = put_err;
- } else if (copy_to_user(user_ptr, mbuf, array_size)) {
+ } else if (copy_to_user(user_ptr, array_buf, array_size)) {
err = -EFAULT;
}
goto out_array_args;
@@ -3381,6 +3376,7 @@ out_array_args:
if (video_put_user((void __user *)arg, parg, cmd, orig_cmd))
err = -EFAULT;
out:
+ kvfree(array_buf);
kvfree(mbuf);
return err;
}
This is a note to let you know that I've just added the patch titled
usb: cdns3: imx: fix can't create core device the second time issue
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 2ef02b846ee2526249a562a66d6dcb25fcbca9d8 Mon Sep 17 00:00:00 2001
From: Peter Chen <peter.chen(a)nxp.com>
Date: Thu, 10 Dec 2020 21:31:37 +0800
Subject: usb: cdns3: imx: fix can't create core device the second time issue
The cdns3 core device is populated by calling of_platform_populate,
the flag OF_POPULATED is set for core device node, if this flag
is not cleared, when calling of_platform_populate the second time
after loading parent module again, the OF code will not try to create
platform device for core device.
To fix it, it uses of_platform_depopulate to depopulate the core
device which the parent created, and the flag OF_POPULATED for
core device node will be cleared accordingly.
Cc: <stable(a)vger.kernel.org>
Fixes: 1e056efab993 ("usb: cdns3: add NXP imx8qm glue layer")
Signed-off-by: Peter Chen <peter.chen(a)nxp.com>
---
drivers/usb/cdns3/cdns3-imx.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/usb/cdns3/cdns3-imx.c b/drivers/usb/cdns3/cdns3-imx.c
index 4d3fedc86753..6b358e8be579 100644
--- a/drivers/usb/cdns3/cdns3-imx.c
+++ b/drivers/usb/cdns3/cdns3-imx.c
@@ -218,20 +218,11 @@ static int cdns_imx_probe(struct platform_device *pdev)
return ret;
}
-static int cdns_imx_remove_core(struct device *dev, void *data)
-{
- struct platform_device *pdev = to_platform_device(dev);
-
- platform_device_unregister(pdev);
-
- return 0;
-}
-
static int cdns_imx_remove(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
- device_for_each_child(dev, NULL, cdns_imx_remove_core);
+ of_platform_depopulate(dev);
platform_set_drvdata(pdev, NULL);
return 0;
--
2.30.0
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race windows is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list
when it is dissolved.
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: stable(a)vger.kernel.org
---
mm/hugetlb.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4741d60f8955..4a9011e12175 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1504,6 +1521,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1770,6 +1788,14 @@ int dissolve_free_huge_page(struct page *page)
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head)))
+ goto out;
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
--
2.11.0
From: Masahiro Yamada <masahiroy(a)kernel.org>
[ Upstream commit 0cfccb3c04934cdef42ae26042139f16e805b5f7 ]
The top-level boot_targets (uImage and uImage.*) should be phony
targets. They just let Kbuild descend into arch/arc/boot/ and create
files there.
If a file exists in the top directory with the same name, the boot
image will not be created.
You can confirm it by the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig all # vmlinux will be built
$ touch uImage.gz
$ make ARCH=arc uImage.gz
CALL scripts/atomic/check-atomics.sh
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
# arch/arc/boot/uImage.gz is not created
Specify the targets as PHONY to fix this.
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arc/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index 8f8d53f08141d..150656503c117 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -108,6 +108,7 @@ bootpImage: vmlinux
boot_targets += uImage uImage.bin uImage.gz
+PHONY += $(boot_targets)
$(boot_targets): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
--
2.27.0
From: Masahiro Yamada <masahiroy(a)kernel.org>
[ Upstream commit 0cfccb3c04934cdef42ae26042139f16e805b5f7 ]
The top-level boot_targets (uImage and uImage.*) should be phony
targets. They just let Kbuild descend into arch/arc/boot/ and create
files there.
If a file exists in the top directory with the same name, the boot
image will not be created.
You can confirm it by the following steps:
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig all # vmlinux will be built
$ touch uImage.gz
$ make ARCH=arc uImage.gz
CALL scripts/atomic/check-atomics.sh
CALL scripts/checksyscalls.sh
CHK include/generated/compile.h
# arch/arc/boot/uImage.gz is not created
Specify the targets as PHONY to fix this.
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arc/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index fd79faab78926..5dc2d73c64994 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -108,6 +108,7 @@ bootpImage: vmlinux
boot_targets += uImage uImage.bin uImage.gz
+PHONY += $(boot_targets)
$(boot_targets): vmlinux
$(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
--
2.27.0
From: Masahiro Yamada <masahiroy(a)kernel.org>
[ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ]
The deb-pkg builds for ARCH=arc fail.
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make ARCH=arc bindeb-pkg
SORTTAB vmlinux
SYSMAP System.map
MODPOST Module.symvers
make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg
sh ./scripts/package/builddeb
cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory
make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[3]: *** [Makefile:1527: intdeb-pkg] Error 2
make[2]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make: *** [Makefile:1527: bindeb-pkg] Error 2
The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as
the default image, but there is no rule to build it.
Remove the meaningless KBUILD_IMAGE assignment so it will fallback
to the default vmlinux. With this change, you can build the deb package.
I removed the 'bootpImage' target as well. At best, it provides
'make bootpImage' as an alias of 'make vmlinux', but I do not see
much sense in doing so.
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arc/Makefile | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index 2917f56f0ea43..98d31b701a97c 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -99,12 +99,6 @@ libs-y += arch/arc/lib/ $(LIBGCC)
boot := arch/arc/boot
-#default target for make without any arguments.
-KBUILD_IMAGE := $(boot)/bootpImage
-
-all: bootpImage
-bootpImage: vmlinux
-
boot_targets += uImage uImage.bin uImage.gz
$(boot_targets): vmlinux
--
2.27.0
From: Masahiro Yamada <masahiroy(a)kernel.org>
[ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ]
The deb-pkg builds for ARCH=arc fail.
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make ARCH=arc bindeb-pkg
SORTTAB vmlinux
SYSMAP System.map
MODPOST Module.symvers
make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg
sh ./scripts/package/builddeb
cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory
make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[3]: *** [Makefile:1527: intdeb-pkg] Error 2
make[2]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make: *** [Makefile:1527: bindeb-pkg] Error 2
The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as
the default image, but there is no rule to build it.
Remove the meaningless KBUILD_IMAGE assignment so it will fallback
to the default vmlinux. With this change, you can build the deb package.
I removed the 'bootpImage' target as well. At best, it provides
'make bootpImage' as an alias of 'make vmlinux', but I do not see
much sense in doing so.
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arc/Makefile | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index 16e6cc22e25cc..b07fdbdd8c836 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -91,12 +91,6 @@ libs-y += arch/arc/lib/ $(LIBGCC)
boot := arch/arc/boot
-#default target for make without any arguments.
-KBUILD_IMAGE := $(boot)/bootpImage
-
-all: bootpImage
-bootpImage: vmlinux
-
boot_targets += uImage uImage.bin uImage.gz
$(boot_targets): vmlinux
--
2.27.0
From: Masahiro Yamada <masahiroy(a)kernel.org>
[ Upstream commit 9836720911cfec25d3fbdead1c438bf87e0f2841 ]
The deb-pkg builds for ARCH=arc fail.
$ export CROSS_COMPILE=<your-arc-compiler-prefix>
$ make -s ARCH=arc defconfig
$ make ARCH=arc bindeb-pkg
SORTTAB vmlinux
SYSMAP System.map
MODPOST Module.symvers
make KERNELRELEASE=5.10.0-rc4 ARCH=arc KBUILD_BUILD_VERSION=2 -f ./Makefile intdeb-pkg
sh ./scripts/package/builddeb
cp: cannot stat 'arch/arc/boot/bootpImage': No such file or directory
make[4]: *** [scripts/Makefile.package:87: intdeb-pkg] Error 1
make[3]: *** [Makefile:1527: intdeb-pkg] Error 2
make[2]: *** [debian/rules:13: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make: *** [Makefile:1527: bindeb-pkg] Error 2
The reason is obvious; arch/arc/Makefile sets $(boot)/bootpImage as
the default image, but there is no rule to build it.
Remove the meaningless KBUILD_IMAGE assignment so it will fallback
to the default vmlinux. With this change, you can build the deb package.
I removed the 'bootpImage' target as well. At best, it provides
'make bootpImage' as an alias of 'make vmlinux', but I do not see
much sense in doing so.
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Vineet Gupta <vgupta(a)synopsys.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arc/Makefile | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/arc/Makefile b/arch/arc/Makefile
index f1c44cccf8d6c..5e5699acefef4 100644
--- a/arch/arc/Makefile
+++ b/arch/arc/Makefile
@@ -90,12 +90,6 @@ libs-y += arch/arc/lib/ $(LIBGCC)
boot := arch/arc/boot
-#default target for make without any arguments.
-KBUILD_IMAGE := $(boot)/bootpImage
-
-all: bootpImage
-bootpImage: vmlinux
-
boot_targets += uImage uImage.bin uImage.gz
$(boot_targets): vmlinux
--
2.27.0
From: Muhammed Fazal <mfazale(a)nvidia.com>
Ignore I2C_M_DMA_SAFE flag as it does not make a difference
for bpmp-i2c, but causes -EINVAL to be returned for valid
transactions.
Signed-off-by: Muhammed Fazal <mfazale(a)nvidia.com>
Cc: stable(a)vger.kernel.org # v4.19+
Signed-off-by: Mikko Perttunen <mperttunen(a)nvidia.com>
---
v2:
- Remove unnecessary check for if the bit is set
---
drivers/i2c/busses/i2c-tegra-bpmp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/i2c/busses/i2c-tegra-bpmp.c b/drivers/i2c/busses/i2c-tegra-bpmp.c
index ec7a7e917edd..aa6685cabde3 100644
--- a/drivers/i2c/busses/i2c-tegra-bpmp.c
+++ b/drivers/i2c/busses/i2c-tegra-bpmp.c
@@ -80,6 +80,8 @@ static int tegra_bpmp_xlate_flags(u16 flags, u16 *out)
flags &= ~I2C_M_RECV_LEN;
}
+ flags &= ~I2C_M_DMA_SAFE;
+
return (flags != 0) ? -EINVAL : 0;
}
--
2.30.0
If a new hugetlb page is allocated during fallocate it will not be
marked as active (set_page_huge_active) which will result in a later
isolate_huge_page failure when the page migration code would like to
move that page. Such a failure would be unexpected and wrong.
Only export set_page_huge_active, just leave clear_page_huge_active
as static. Because there are no external users.
Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: stable(a)vger.kernel.org
---
fs/hugetlbfs/inode.c | 3 ++-
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index b5c109703daa..21c20fd5f9ee 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ set_page_huge_active(page);
/*
* unlock_page because locked by add_to_page_cache()
- * page_put due to reference from alloc_huge_page()
+ * put_page() due to reference from alloc_huge_page()
*/
unlock_page(page);
put_page(page);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ebca2ef02212..b5807f23caf8 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,
}
#endif
+void set_page_huge_active(struct page *page);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1f3bf1710b66..4741d60f8955 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1348,7 +1348,7 @@ bool page_huge_active(struct page *page)
}
/* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
{
VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
SetPagePrivate(&page[1]);
--
2.11.0
This is the start of the stable review cycle for the 4.4.251 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 13 Jan 2021 13:00:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.251-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.251-rc1
Ying-Tsun Huang <ying-tsun.huang(a)amd.com>
x86/mtrr: Correct the range check before performing MTRR type lookups
Florian Westphal <fw(a)strlen.de>
netfilter: xt_RATEEST: reject non-null terminated string from userspace
Vasily Averin <vvs(a)virtuozzo.com>
netfilter: ipset: fix shift-out-of-bounds in htable_bits()
Bard Liao <yung-chuan.liao(a)linux.intel.com>
Revert "device property: Keep secondary firmware node secondary by type"
bo liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: add a new hda codec CX11970
Dan Williams <dan.j.williams(a)intel.com>
x86/mm: Fix leak of pmd ptlock
Johan Hovold <johan(a)kernel.org>
USB: serial: keyspan_pda: remove unused variable
Chandana Kishori Chiluveru <cchiluve(a)codeaurora.org>
usb: gadget: configfs: Preserve function ordering after bind failure
Sriharsha Allenki <sallenki(a)codeaurora.org>
usb: gadget: Fix spinlock lockup on usb_function_deactivate
Yang Yingliang <yangyingliang(a)huawei.com>
USB: gadget: legacy: fix return error code in acm_ms_bind()
Zqiang <qiang.zhang(a)windriver.com>
usb: gadget: function: printer: Fix a memory leak for interface descriptor
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: f_uac2: reset wMaxPacketSize
Arnd Bergmann <arnd(a)arndb.de>
usb: gadget: select CONFIG_CRC32
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks
Johan Hovold <johan(a)kernel.org>
USB: usblp: fix DMA to stack
Johan Hovold <johan(a)kernel.org>
USB: yurex: fix control-URB timeout handling
Daniel Palmer <daniel(a)0x0f.com>
USB: serial: option: add LongSung M5710 module support
Johan Hovold <johan(a)kernel.org>
USB: serial: iuu_phoenix: fix DMA from stack
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: uas: Add PNY USB Portable SSD to unusual_uas
Michael Grzeschik <m.grzeschik(a)pengutronix.de>
USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set
Yu Kuai <yukuai3(a)huawei.com>
usb: chipidea: ci_hdrc_imx: add missing put_device() call in usbmisc_get_init_data()
Sean Young <sean(a)mess.org>
USB: cdc-acm: blacklist another IR Droid device
taehyun.cho <taehyun.cho(a)samsung.com>
usb: gadget: enable super speed plus
Dexuan Cui <decui(a)microsoft.com>
video: hyperv_fb: Fix the mmap() regression for v5.4.y and older
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix use-after-free in ucc_geth_remove()
Jeff Dike <jdike(a)akamai.com>
virtio_net: Fix recursive call to cpus_read_lock()
Randy Dunlap <rdunlap(a)infradead.org>
net: sched: prevent invalid Scell_log shift count
Yunjian Wang <wangyunjian(a)huawei.com>
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
Roland Dreier <roland(a)kernel.org>
CDC-NCM: remove "connected" log message
Xie He <xie.he.0141(a)gmail.com>
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
Yunjian Wang <wangyunjian(a)huawei.com>
net: hns: fix return value check in __lb_other_process()
Guillaume Nault <gnault(a)redhat.com>
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Dan Carpenter <dan.carpenter(a)oracle.com>
atm: idt77252: call pci_disable_device() on error path
Linus Torvalds <torvalds(a)linux-foundation.org>
depmod: handle the case of /sbin/depmod without /sbin in PATH
Huang Shijie <sjhuang(a)iluvatar.ai>
lib/genalloc: fix the overflow when size is too big
Yunfeng Ye <yeyunfeng(a)huawei.com>
workqueue: Kick a worker based on the actual activation of delayed works
Dominique Martinet <asmadeus(a)codewreck.org>
kbuild: don't hardcode depmod path
-------------
Diffstat:
Makefile | 6 +--
arch/x86/kernel/cpu/mtrr/generic.c | 6 +--
arch/x86/mm/pgtable.c | 2 +
drivers/atm/idt77252.c | 2 +-
drivers/base/core.c | 2 +-
drivers/net/ethernet/freescale/ucc_geth.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 4 ++
drivers/net/usb/cdc_ncm.c | 3 --
drivers/net/virtio_net.c | 12 +++--
drivers/net/wan/hdlc_ppp.c | 7 +++
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++-
drivers/usb/class/cdc-acm.c | 4 ++
drivers/usb/class/usblp.c | 21 +++++++-
drivers/usb/gadget/Kconfig | 2 +
drivers/usb/gadget/composite.c | 10 +++-
drivers/usb/gadget/configfs.c | 8 +--
drivers/usb/gadget/function/f_printer.c | 1 +
drivers/usb/gadget/function/f_uac2.c | 69 +++++++++++++++++++-----
drivers/usb/gadget/legacy/acm_ms.c | 4 +-
drivers/usb/host/xhci.c | 24 ++++-----
drivers/usb/misc/yurex.c | 3 ++
drivers/usb/serial/iuu_phoenix.c | 20 +++++--
drivers/usb/serial/keyspan_pda.c | 2 -
drivers/usb/serial/option.c | 1 +
drivers/usb/storage/unusual_uas.h | 7 +++
drivers/vhost/net.c | 6 +--
drivers/video/fbdev/hyperv_fb.c | 6 +--
include/net/red.h | 4 +-
kernel/workqueue.c | 13 +++--
lib/genalloc.c | 25 ++++-----
net/dcb/dcbnl.c | 2 +
net/ipv4/fib_frontend.c | 2 +-
net/netfilter/ipset/ip_set_hash_gen.h | 20 ++-----
net/netfilter/xt_RATEEST.c | 3 ++
net/sched/sch_choke.c | 2 +-
net/sched/sch_gred.c | 2 +-
net/sched/sch_red.c | 2 +-
net/sched/sch_sfq.c | 2 +-
scripts/depmod.sh | 2 +
sound/pci/hda/patch_conexant.c | 1 +
sound/usb/midi.c | 4 ++
41 files changed, 221 insertions(+), 103 deletions(-)
This is the start of the stable review cycle for the 4.9.251 release.
There are 45 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 13 Jan 2021 13:00:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.251-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.251-rc1
Ying-Tsun Huang <ying-tsun.huang(a)amd.com>
x86/mtrr: Correct the range check before performing MTRR type lookups
Florian Westphal <fw(a)strlen.de>
netfilter: xt_RATEEST: reject non-null terminated string from userspace
Vasily Averin <vvs(a)virtuozzo.com>
netfilter: ipset: fix shift-out-of-bounds in htable_bits()
Bard Liao <yung-chuan.liao(a)linux.intel.com>
Revert "device property: Keep secondary firmware node secondary by type"
bo liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: add a new hda codec CX11970
Dan Williams <dan.j.williams(a)intel.com>
x86/mm: Fix leak of pmd ptlock
Johan Hovold <johan(a)kernel.org>
USB: serial: keyspan_pda: remove unused variable
Eddie Hung <eddie.hung(a)mediatek.com>
usb: gadget: configfs: Fix use-after-free issue with udc_name
Chandana Kishori Chiluveru <cchiluve(a)codeaurora.org>
usb: gadget: configfs: Preserve function ordering after bind failure
Sriharsha Allenki <sallenki(a)codeaurora.org>
usb: gadget: Fix spinlock lockup on usb_function_deactivate
Yang Yingliang <yangyingliang(a)huawei.com>
USB: gadget: legacy: fix return error code in acm_ms_bind()
Zqiang <qiang.zhang(a)windriver.com>
usb: gadget: function: printer: Fix a memory leak for interface descriptor
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: f_uac2: reset wMaxPacketSize
Arnd Bergmann <arnd(a)arndb.de>
usb: gadget: select CONFIG_CRC32
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks
Johan Hovold <johan(a)kernel.org>
USB: usblp: fix DMA to stack
Johan Hovold <johan(a)kernel.org>
USB: yurex: fix control-URB timeout handling
Daniel Palmer <daniel(a)0x0f.com>
USB: serial: option: add LongSung M5710 module support
Johan Hovold <johan(a)kernel.org>
USB: serial: iuu_phoenix: fix DMA from stack
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: uas: Add PNY USB Portable SSD to unusual_uas
Michael Grzeschik <m.grzeschik(a)pengutronix.de>
USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set
Yu Kuai <yukuai3(a)huawei.com>
usb: chipidea: ci_hdrc_imx: add missing put_device() call in usbmisc_get_init_data()
Sean Young <sean(a)mess.org>
USB: cdc-acm: blacklist another IR Droid device
taehyun.cho <taehyun.cho(a)samsung.com>
usb: gadget: enable super speed plus
Dexuan Cui <decui(a)microsoft.com>
video: hyperv_fb: Fix the mmap() regression for v5.4.y and older
Du Changbin <changbin.du(a)gmail.com>
scripts/gdb: fix lx-version string output
Leonard Crestez <leonard.crestez(a)nxp.com>
scripts/gdb: lx-dmesg: use explicit encoding=utf8 errors=replace
Leonard Crestez <leonard.crestez(a)nxp.com>
scripts/gdb: lx-dmesg: cast log_buf to void* for addr fetch
André Draszik <git(a)andred.net>
scripts/gdb: make lx-dmesg command work (reliably)
Jeff Dike <jdike(a)akamai.com>
virtio_net: Fix recursive call to cpus_read_lock()
Randy Dunlap <rdunlap(a)infradead.org>
net: sched: prevent invalid Scell_log shift count
Yunjian Wang <wangyunjian(a)huawei.com>
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
Roland Dreier <roland(a)kernel.org>
CDC-NCM: remove "connected" log message
Xie He <xie.he.0141(a)gmail.com>
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
Yunjian Wang <wangyunjian(a)huawei.com>
net: hns: fix return value check in __lb_other_process()
Guillaume Nault <gnault(a)redhat.com>
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
net: ethernet: Fix memleak in ethoc_probe
John Wang <wangzhiqiang.bj(a)bytedance.com>
net/ncsi: Use real net-device for response handler
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Dan Carpenter <dan.carpenter(a)oracle.com>
atm: idt77252: call pci_disable_device() on error path
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix use-after-free in ucc_geth_remove()
Linus Torvalds <torvalds(a)linux-foundation.org>
depmod: handle the case of /sbin/depmod without /sbin in PATH
Huang Shijie <sjhuang(a)iluvatar.ai>
lib/genalloc: fix the overflow when size is too big
Yunfeng Ye <yeyunfeng(a)huawei.com>
workqueue: Kick a worker based on the actual activation of delayed works
Dominique Martinet <asmadeus(a)codewreck.org>
kbuild: don't hardcode depmod path
-------------
Diffstat:
Makefile | 6 +--
arch/x86/kernel/cpu/mtrr/generic.c | 6 +--
arch/x86/mm/pgtable.c | 2 +
drivers/atm/idt77252.c | 2 +-
drivers/base/core.c | 2 +-
drivers/net/ethernet/ethoc.c | 3 +-
drivers/net/ethernet/freescale/ucc_geth.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 4 ++
drivers/net/usb/cdc_ncm.c | 3 --
drivers/net/virtio_net.c | 12 +++--
drivers/net/wan/hdlc_ppp.c | 7 +++
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++-
drivers/usb/class/cdc-acm.c | 4 ++
drivers/usb/class/usblp.c | 21 +++++++-
drivers/usb/gadget/Kconfig | 2 +
drivers/usb/gadget/composite.c | 10 +++-
drivers/usb/gadget/configfs.c | 19 ++++---
drivers/usb/gadget/function/f_printer.c | 1 +
drivers/usb/gadget/function/f_uac2.c | 69 +++++++++++++++++++-----
drivers/usb/gadget/legacy/acm_ms.c | 4 +-
drivers/usb/host/xhci.c | 24 ++++-----
drivers/usb/misc/yurex.c | 3 ++
drivers/usb/serial/iuu_phoenix.c | 20 +++++--
drivers/usb/serial/keyspan_pda.c | 2 -
drivers/usb/serial/option.c | 1 +
drivers/usb/storage/unusual_uas.h | 7 +++
drivers/vhost/net.c | 6 +--
drivers/video/fbdev/hyperv_fb.c | 6 +--
include/net/red.h | 4 +-
kernel/workqueue.c | 13 +++--
lib/genalloc.c | 25 ++++-----
net/dcb/dcbnl.c | 2 +
net/ipv4/fib_frontend.c | 2 +-
net/ncsi/ncsi-rsp.c | 2 +-
net/netfilter/ipset/ip_set_hash_gen.h | 20 ++-----
net/netfilter/xt_RATEEST.c | 3 ++
net/sched/sch_choke.c | 2 +-
net/sched/sch_gred.c | 2 +-
net/sched/sch_red.c | 2 +-
net/sched/sch_sfq.c | 2 +-
scripts/depmod.sh | 2 +
scripts/gdb/linux/dmesg.py | 22 +++++---
scripts/gdb/linux/proc.py | 2 +-
sound/pci/hda/patch_conexant.c | 1 +
sound/usb/midi.c | 4 ++
45 files changed, 249 insertions(+), 115 deletions(-)
This is the start of the stable review cycle for the 4.14.215 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 13 Jan 2021 13:00:19 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.215-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.215-rc1
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: x86: fix shift out of bounds reported by UBSAN
Ying-Tsun Huang <ying-tsun.huang(a)amd.com>
x86/mtrr: Correct the range check before performing MTRR type lookups
Florian Westphal <fw(a)strlen.de>
netfilter: xt_RATEEST: reject non-null terminated string from userspace
Vasily Averin <vvs(a)virtuozzo.com>
netfilter: ipset: fix shift-out-of-bounds in htable_bits()
Bard Liao <yung-chuan.liao(a)linux.intel.com>
Revert "device property: Keep secondary firmware node secondary by type"
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - Fix speaker volume control on Lenovo C940
bo liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: add a new hda codec CX11970
Dan Williams <dan.j.williams(a)intel.com>
x86/mm: Fix leak of pmd ptlock
Johan Hovold <johan(a)kernel.org>
USB: serial: keyspan_pda: remove unused variable
Eddie Hung <eddie.hung(a)mediatek.com>
usb: gadget: configfs: Fix use-after-free issue with udc_name
Chandana Kishori Chiluveru <cchiluve(a)codeaurora.org>
usb: gadget: configfs: Preserve function ordering after bind failure
Sriharsha Allenki <sallenki(a)codeaurora.org>
usb: gadget: Fix spinlock lockup on usb_function_deactivate
Yang Yingliang <yangyingliang(a)huawei.com>
USB: gadget: legacy: fix return error code in acm_ms_bind()
Zqiang <qiang.zhang(a)windriver.com>
usb: gadget: function: printer: Fix a memory leak for interface descriptor
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: f_uac2: reset wMaxPacketSize
Arnd Bergmann <arnd(a)arndb.de>
usb: gadget: select CONFIG_CRC32
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Fix UBSAN warnings for MIDI jacks
Johan Hovold <johan(a)kernel.org>
USB: usblp: fix DMA to stack
Johan Hovold <johan(a)kernel.org>
USB: yurex: fix control-URB timeout handling
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Quectel EM160R-GL
Daniel Palmer <daniel(a)0x0f.com>
USB: serial: option: add LongSung M5710 module support
Johan Hovold <johan(a)kernel.org>
USB: serial: iuu_phoenix: fix DMA from stack
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: uas: Add PNY USB Portable SSD to unusual_uas
Randy Dunlap <rdunlap(a)infradead.org>
usb: usbip: vhci_hcd: protect shift size
Michael Grzeschik <m.grzeschik(a)pengutronix.de>
USB: xhci: fix U1/U2 handling for hardware with XHCI_INTEL_HOST quirk set
Yu Kuai <yukuai3(a)huawei.com>
usb: chipidea: ci_hdrc_imx: add missing put_device() call in usbmisc_get_init_data()
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
usb: dwc3: ulpi: Use VStsDone to detect PHY regs access completion
Sean Young <sean(a)mess.org>
USB: cdc-acm: blacklist another IR Droid device
taehyun.cho <taehyun.cho(a)samsung.com>
usb: gadget: enable super speed plus
Ard Biesheuvel <ardb(a)kernel.org>
crypto: ecdh - avoid buffer overflow in ecdh_set_secret()
Dexuan Cui <decui(a)microsoft.com>
video: hyperv_fb: Fix the mmap() regression for v5.4.y and older
Florian Fainelli <f.fainelli(a)gmail.com>
net: systemport: set dev->max_mtu to UMAC_MAX_MTU_SIZE
Stefan Chulski <stefanc(a)marvell.com>
net: mvpp2: Fix GoP port 3 Networking Complex Control configurations
Antoine Tenart <atenart(a)kernel.org>
net-sysfs: take the rtnl lock when accessing xps_cpus_map and num_tc
Randy Dunlap <rdunlap(a)infradead.org>
net: sched: prevent invalid Scell_log shift count
Yunjian Wang <wangyunjian(a)huawei.com>
vhost_net: fix ubuf refcount incorrectly when sendmsg fails
Bjørn Mork <bjorn(a)mork.no>
net: usb: qmi_wwan: add Quectel EM160R-GL
Roland Dreier <roland(a)kernel.org>
CDC-NCM: remove "connected" log message
Xie He <xie.he.0141(a)gmail.com>
net: hdlc_ppp: Fix issues when mod_timer is called while timer is running
Yunjian Wang <wangyunjian(a)huawei.com>
net: hns: fix return value check in __lb_other_process()
Guillaume Nault <gnault(a)redhat.com>
ipv4: Ignore ECN bits for fib lookups in fib_compute_spec_dst()
Grygorii Strashko <grygorii.strashko(a)ti.com>
net: ethernet: ti: cpts: fix ethtool output when no ptp_clock registered
Antoine Tenart <atenart(a)kernel.org>
net-sysfs: take the rtnl lock when storing xps_cpus
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
net: ethernet: Fix memleak in ethoc_probe
John Wang <wangzhiqiang.bj(a)bytedance.com>
net/ncsi: Use real net-device for response handler
Petr Machata <me(a)pmachata.org>
net: dcb: Validate netlink message in DCB handler
Jeff Dike <jdike(a)akamai.com>
virtio_net: Fix recursive call to cpus_read_lock()
Manish Chopra <manishc(a)marvell.com>
qede: fix offload for IPIP tunnel packets
Dan Carpenter <dan.carpenter(a)oracle.com>
atm: idt77252: call pci_disable_device() on error path
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: set dev->max_mtu to 1518
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
ethernet: ucc_geth: fix use-after-free in ucc_geth_remove()
Linus Torvalds <torvalds(a)linux-foundation.org>
depmod: handle the case of /sbin/depmod without /sbin in PATH
Huang Shijie <sjhuang(a)iluvatar.ai>
lib/genalloc: fix the overflow when size is too big
Bart Van Assche <bvanassche(a)acm.org>
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
Adrian Hunter <adrian.hunter(a)intel.com>
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
Yunfeng Ye <yeyunfeng(a)huawei.com>
workqueue: Kick a worker based on the actual activation of delayed works
Dominique Martinet <asmadeus(a)codewreck.org>
kbuild: don't hardcode depmod path
-------------
Diffstat:
Makefile | 6 +--
arch/x86/kernel/cpu/mtrr/generic.c | 6 +--
arch/x86/kvm/mmu.h | 2 +-
arch/x86/mm/pgtable.c | 2 +
crypto/ecdh.c | 3 +-
drivers/atm/idt77252.c | 2 +-
drivers/base/core.c | 2 +-
drivers/ide/ide-atapi.c | 1 -
drivers/ide/ide-io.c | 5 --
drivers/net/ethernet/broadcom/bcmsysport.c | 1 +
drivers/net/ethernet/ethoc.c | 3 +-
drivers/net/ethernet/freescale/ucc_geth.c | 3 +-
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 4 ++
drivers/net/ethernet/marvell/mvpp2.c | 2 +-
drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 ++
drivers/net/ethernet/ti/cpts.c | 2 +
drivers/net/usb/cdc_ncm.c | 3 --
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/virtio_net.c | 12 +++--
drivers/net/wan/hdlc_ppp.c | 7 +++
drivers/scsi/ufs/ufshcd-pci.c | 34 +++++++++++-
drivers/usb/chipidea/ci_hdrc_imx.c | 6 ++-
drivers/usb/class/cdc-acm.c | 4 ++
drivers/usb/class/usblp.c | 21 +++++++-
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/ulpi.c | 2 +-
drivers/usb/gadget/Kconfig | 2 +
drivers/usb/gadget/composite.c | 10 +++-
drivers/usb/gadget/configfs.c | 19 ++++---
drivers/usb/gadget/function/f_printer.c | 1 +
drivers/usb/gadget/function/f_uac2.c | 69 +++++++++++++++++++-----
drivers/usb/gadget/legacy/acm_ms.c | 4 +-
drivers/usb/host/xhci.c | 24 ++++-----
drivers/usb/misc/yurex.c | 3 ++
drivers/usb/serial/iuu_phoenix.c | 20 +++++--
drivers/usb/serial/keyspan_pda.c | 2 -
drivers/usb/serial/option.c | 3 ++
drivers/usb/storage/unusual_uas.h | 7 +++
drivers/usb/usbip/vhci_hcd.c | 2 +
drivers/vhost/net.c | 6 +--
drivers/video/fbdev/hyperv_fb.c | 6 +--
include/net/red.h | 4 +-
kernel/workqueue.c | 13 +++--
lib/genalloc.c | 25 ++++-----
net/core/net-sysfs.c | 29 ++++++++--
net/dcb/dcbnl.c | 2 +
net/ipv4/fib_frontend.c | 2 +-
net/ncsi/ncsi-rsp.c | 2 +-
net/netfilter/ipset/ip_set_hash_gen.h | 20 ++-----
net/netfilter/xt_RATEEST.c | 3 ++
net/sched/sch_choke.c | 2 +-
net/sched/sch_gred.c | 2 +-
net/sched/sch_red.c | 2 +-
net/sched/sch_sfq.c | 2 +-
scripts/depmod.sh | 2 +
sound/pci/hda/patch_conexant.c | 1 +
sound/pci/hda/patch_realtek.c | 6 +++
sound/usb/midi.c | 4 ++
58 files changed, 315 insertions(+), 124 deletions(-)
Upon a communication error, the interrupt handler can call
tegra_i2c_disable_packet_mode. This causes a sleeping poll to happen
unless the current transaction was marked atomic. Fix this by
making the poll happen atomically if we are in an IRQ.
This matches the behavior prior to the patch mentioned
in the Fixes tag.
Fixes: ede2299f7101 ("i2c: tegra: Support atomic transfers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikko Perttunen <mperttunen(a)nvidia.com>
---
v2:
* Use in_irq() instead of passing a flag from the ISR.
Thanks to Dmitry for the suggestion.
* Update commit message.
---
drivers/i2c/busses/i2c-tegra.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-tegra.c b/drivers/i2c/busses/i2c-tegra.c
index 6f08c0c3238d..0727383f4940 100644
--- a/drivers/i2c/busses/i2c-tegra.c
+++ b/drivers/i2c/busses/i2c-tegra.c
@@ -533,7 +533,7 @@ static int tegra_i2c_poll_register(struct tegra_i2c_dev *i2c_dev,
void __iomem *addr = i2c_dev->base + tegra_i2c_reg_addr(i2c_dev, reg);
u32 val;
- if (!i2c_dev->atomic_mode)
+ if (!i2c_dev->atomic_mode && !in_irq())
return readl_relaxed_poll_timeout(addr, val, !(val & mask),
delay_us, timeout_us);
--
2.30.0
The patch titled
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
has been added to the -mm tree. Its filename is
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-remove-vm_bug_on_page-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-remove-vm_bug_on_page-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
The page_huge_active() can be called from scan_movable_pages() which
do not hold a reference count to the HugeTLB page. So when we call
page_huge_active() from scan_movable_pages(), the HugeTLB page can
be freed parallel. Then we will trigger a BUG_ON which is in the
page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the
VM_BUG_ON_PAGE.
Link: https://lkml.kernel.org/r/20210110124017.86750-7-songmuchun@bytedance.com
Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active
+++ a/mm/hugetlb.c
@@ -1361,8 +1361,7 @@ struct hstate *size_to_hstate(unsigned l
*/
bool page_huge_active(struct page *page)
{
- VM_BUG_ON_PAGE(!PageHuge(page), page);
- return PageHead(page) && PagePrivate(&page[1]);
+ return PageHeadHuge(page) && PagePrivate(&page[1]);
}
/* never called for tail page */
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-migrate-do-not-migrate-hugetlb-page-whose-refcount-is-one.patch
mm-hugetlb-add-return-eagain-for-dissolve_free_huge_page.patch
The patch titled
Subject: mm: hugetlb: fix a race between isolating and freeing page
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-race-between-iso…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-race-between-iso…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix a race between isolating and freeing page
There is a race between isolate_huge_page() and __free_huge_page().
CPU0: CPU1:
if (PageHuge(page))
put_page(page)
__free_huge_page(page)
spin_lock(&hugetlb_lock)
update_and_free_page(page)
set_compound_page_dtor(page,
NULL_COMPOUND_DTOR)
spin_unlock(&hugetlb_lock)
isolate_huge_page(page)
// trigger BUG_ON
VM_BUG_ON_PAGE(!PageHead(page), page)
spin_lock(&hugetlb_lock)
page_huge_active(page)
// trigger BUG_ON
VM_BUG_ON_PAGE(!PageHuge(page), page)
spin_unlock(&hugetlb_lock)
When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the
buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0. Because
it is already freed to the buddy allocator.
Link: https://lkml.kernel.org/r/20210110124017.86750-6-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-isolating-and-freeing-page
+++ a/mm/hugetlb.c
@@ -5581,9 +5581,9 @@ bool isolate_huge_page(struct page *page
{
bool ret = true;
- VM_BUG_ON_PAGE(!PageHead(page), page);
spin_lock(&hugetlb_lock);
- if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+ if (!PageHeadHuge(page) || !page_huge_active(page) ||
+ !get_page_unless_zero(page)) {
ret = false;
goto unlock;
}
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-migrate-do-not-migrate-hugetlb-page-whose-refcount-is-one.patch
mm-hugetlb-add-return-eagain-for-dissolve_free_huge_page.patch
The patch titled
Subject: mm: hugetlb: fix a race between freeing and dissolving the page
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-race-between-fre…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-race-between-fre…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlb: fix a race between freeing and dissolving the page
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race windows is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list
when it is dissolved.
Link: https://lkml.kernel.org/r/20210110124017.86750-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page
+++ a/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hst
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_no
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hs
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1771,6 +1789,14 @@ int dissolve_free_huge_page(struct page
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head)))
+ goto out;
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-migrate-do-not-migrate-hugetlb-page-whose-refcount-is-one.patch
mm-hugetlb-add-return-eagain-for-dissolve_free_huge_page.patch
The patch titled
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
has been added to the -mm tree. Its filename is
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlbfs-fix-cannot-migrate-t…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlbfs-fix-cannot-migrate-t…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
If a new hugetlb page is allocated during fallocate it will not be marked
as active (set_page_huge_active) which will result in a later
isolate_huge_page failure when the page migration code would like to move
that page. Such a failure would be unexpected and wrong.
Only export set_page_huge_active, just leave clear_page_huge_active
as static. Because there are no external users.
Link: https://lkml.kernel.org/r/20210110124017.86750-3-songmuchun@bytedance.com
Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate())
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/hugetlbfs/inode.c | 3 ++-
include/linux/hugetlb.h | 2 ++
mm/hugetlb.c | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/fs/hugetlbfs/inode.c
@@ -735,9 +735,10 @@ static long hugetlbfs_fallocate(struct f
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ set_page_huge_active(page);
/*
* unlock_page because locked by add_to_page_cache()
- * page_put due to reference from alloc_huge_page()
+ * put_page() due to reference from alloc_huge_page()
*/
unlock_page(page);
put_page(page);
--- a/include/linux/hugetlb.h~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/include/linux/hugetlb.h
@@ -770,6 +770,8 @@ static inline void huge_ptep_modify_prot
}
#endif
+void set_page_huge_active(struct page *page);
+
#else /* CONFIG_HUGETLB_PAGE */
struct hstate {};
--- a/mm/hugetlb.c~mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page
+++ a/mm/hugetlb.c
@@ -1349,7 +1349,7 @@ bool page_huge_active(struct page *page)
}
/* never called for tail page */
-static void set_page_huge_active(struct page *page)
+void set_page_huge_active(struct page *page)
{
VM_BUG_ON_PAGE(!PageHeadHuge(page), page);
SetPagePrivate(&page[1]);
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-hugetlbfs-fix-cannot-migrate-the-fallocated-hugetlb-page.patch
mm-hugetlb-fix-a-race-between-freeing-and-dissolving-the-page.patch
mm-hugetlb-fix-a-race-between-isolating-and-freeing-page.patch
mm-hugetlb-remove-vm_bug_on_page-from-page_huge_active.patch
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-migrate-do-not-migrate-hugetlb-page-whose-refcount-is-one.patch
mm-hugetlb-add-return-eagain-for-dissolve_free_huge_page.patch
The patch titled
Subject: mm: fix initialization of struct page for holes in memory layout
has been removed from the -mm tree. Its filename was
mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm: fix initialization of struct page for holes in memory layout
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using init_unavailable_mem() function
that iterated through PFNs in holes in memblock.memory and if there is a
struct page corresponding to a PFN, the fields if this page are set to
default values and it is marked as Reserved.
init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.
On a system that has firmware reserved holes in a zone above ZONE_DMA, for
instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link in
struct page) in the same pageblock.
Interleave initialization of pages that correspond to holes with the
initialization of memory map, so that zone and node information will be
properly set on such pages.
[akpm(a)linux-foundation.org: coding style fixes]
Link: https://lkml.kernel.org/r/20201209214304.6812-3-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather
that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 152 +++++++++++++++++++---------------------------
1 file changed, 65 insertions(+), 87 deletions(-)
--- a/mm/page_alloc.c~mm-fix-initialization-of-struct-page-for-holes-in-memory-layout
+++ a/mm/page_alloc.c
@@ -6254,24 +6254,85 @@ static void __meminit zone_init_free_lis
}
}
-void __meminit __weak memmap_init(unsigned long size, int nid,
- unsigned long zone,
- unsigned long range_start_pfn)
+#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+/*
+ * Only struct pages that are backed by physical memory available to the
+ * kernel are zeroed and initialized by memmap_init_zone().
+ * But, there are some struct pages that are either reserved by firmware or
+ * do not correspond to physical page frames because the actual memory bank
+ * is not a multiple of SECTION_SIZE.
+ * Fields of those struct pages may be accessed (for example page_to_pfn()
+ * on some configuration accesses page flags) so we must explicitly
+ * initialize those struct pages.
+ */
+static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
{
- unsigned long start_pfn, end_pfn;
+ unsigned long pfn;
+ u64 pgcnt = 0;
+
+ for (pfn = spfn; pfn < epfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+ pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ + pageblock_nr_pages - 1;
+ continue;
+ }
+ __init_single_page(pfn_to_page(pfn), pfn, zone, node);
+ __SetPageReserved(pfn_to_page(pfn));
+ pgcnt++;
+ }
+
+ return pgcnt;
+}
+#else
+static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
+{
+ return 0;
+}
+#endif
+
+void __init __weak memmap_init(unsigned long size, int nid,
+ unsigned long zone,
+ unsigned long range_start_pfn)
+{
+ unsigned long start_pfn, end_pfn, hole_start_pfn = 0;
unsigned long range_end_pfn = range_start_pfn + size;
+ u64 pgcnt = 0;
int i;
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
+ hole_start_pfn = clamp(hole_start_pfn, range_start_pfn,
+ range_end_pfn);
if (end_pfn > start_pfn) {
size = end_pfn - start_pfn;
memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
}
+
+ if (hole_start_pfn < start_pfn)
+ pgcnt += init_unavailable_range(hole_start_pfn,
+ start_pfn, zone, nid);
+ hole_start_pfn = end_pfn;
}
+
+ /*
+ * Early sections always have a fully populated memmap for the whole
+ * section - see pfn_valid(). If the last section has holes at the
+ * end and that section is marked "online", the memmap will be
+ * considered initialized. Make sure that memmap has a well defined
+ * state.
+ */
+ if (hole_start_pfn < range_end_pfn)
+ pgcnt += init_unavailable_range(hole_start_pfn, range_end_pfn,
+ zone, nid);
+
+ if (pgcnt)
+ pr_info("%s: Zeroed struct page in unavailable ranges: %lld\n",
+ zone_names[zone], pgcnt);
}
static int zone_batchsize(struct zone *zone)
@@ -7072,88 +7133,6 @@ void __init free_area_init_memoryless_no
free_area_init_node(nid);
}
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-/*
- * Initialize all valid struct pages in the range [spfn, epfn) and mark them
- * PageReserved(). Return the number of struct pages that were initialized.
- */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
-{
- unsigned long pfn;
- u64 pgcnt = 0;
-
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
- pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
- + pageblock_nr_pages - 1;
- continue;
- }
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
- __SetPageReserved(pfn_to_page(pfn));
- pgcnt++;
- }
-
- return pgcnt;
-}
-
-/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
- *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
- */
-static void __init init_unavailable_mem(void)
-{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
-
- /*
- * Loop through unavailable ranges not covered by memblock.memory.
- */
- pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
- if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
- next = end;
- }
-
- /*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
- */
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
-
- /*
- * Struct pages that do not have backing memory. This could be because
- * firmware is using some of this memory, or for some other reasons.
- */
- if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-}
-#else
-static inline void __init init_unavailable_mem(void)
-{
-}
-#endif /* !CONFIG_FLAT_NODE_MEM_MAP */
-
#if MAX_NUMNODES > 1
/*
* Figure out the number of possible node ids.
@@ -7584,7 +7563,6 @@ void __init free_area_init(unsigned long
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
- init_unavailable_mem();
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
free_area_init_node(nid);
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
mm-add-definition-of-pmd_page_order.patch
mmap-make-mlock_future_check-global.patch
set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch
set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix.patch
secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch
secretmem-add-memcg-accounting.patch
pm-hibernate-disable-when-there-are-active-secretmem-users.patch
arch-mm-wire-up-memfd_secret-system-call-were-relevant.patch
secretmem-test-add-basic-selftest-for-memfd_secret2.patch
The patch titled
Subject: mm: memblock: enforce overlap of memory.memblock and memory.reserved
has been removed from the -mm tree. Its filename was
mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Rapoport <rppt(a)linux.ibm.com>
Subject: mm: memblock: enforce overlap of memory.memblock and memory.reserved
Patch series "mm: fix initialization of struct page for holes in memory layout", v2.
Commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN") exposed several issues with the memory map
initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the probelm was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2] and after a long
discussion between us [3] I think these patches are the proper fix.
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun…
This patch (of 2):
memblock does not require that the reserved memory ranges will be a subset
of memblock.memory.
As a result there may be reserved pages that are not in the range of any
zone or node because zone and node boundaries are detected based on
memblock.memory and pages that only present in memblock.reserved are not
taken into account during zone/node size detection.
Make sure that all ranges in memblock.reserved are added to
memblock.memory before calculating node and zone boundaries.
Link: https://lkml.kernel.org/r/20201209214304.6812-1-rppt@kernel.org
Link: https://lkml.kernel.org/r/20201209214304.6812-2-rppt@kernel.org
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memblock.h | 1 +
mm/memblock.c | 24 ++++++++++++++++++++++++
mm/page_alloc.c | 7 +++++++
3 files changed, 32 insertions(+)
--- a/include/linux/memblock.h~mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved
+++ a/include/linux/memblock.h
@@ -120,6 +120,7 @@ int memblock_clear_nomap(phys_addr_t bas
unsigned long memblock_free_all(void);
void reset_node_managed_pages(pg_data_t *pgdat);
void reset_all_zones_managed_pages(void);
+void memblock_enforce_memory_reserved_overlap(void);
/* Low level functions */
void __next_mem_range(u64 *idx, int nid, enum memblock_flags flags,
--- a/mm/memblock.c~mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved
+++ a/mm/memblock.c
@@ -1860,6 +1860,30 @@ void __init_memblock memblock_trim_memor
}
}
+/**
+ * memblock_enforce_memory_reserved_overlap - make sure every range in
+ * @memblock.reserved is covered by @memblock.memory
+ *
+ * The data in @memblock.memory is used to detect zone and node boundaries
+ * during initialization of the memory map and the page allocator. Make
+ * sure that every memory range present in @memblock.reserved is also added
+ * to @memblock.memory even if the architecture specific memory
+ * initialization failed to do so
+ */
+void __init memblock_enforce_memory_reserved_overlap(void)
+{
+ phys_addr_t start, end;
+ int nid;
+ u64 i;
+
+ __for_each_mem_range(i, &memblock.reserved, &memblock.memory,
+ NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, &nid) {
+ pr_warn("memblock: reserved range [%pa-%pa] is not in memory\n",
+ &start, &end);
+ memblock_add_node(start, (end - start), nid);
+ }
+}
+
void __init_memblock memblock_set_current_limit(phys_addr_t limit)
{
memblock.current_limit = limit;
--- a/mm/page_alloc.c~mm-memblock-enforce-overlap-of-memorymemblock-and-memoryreserved
+++ a/mm/page_alloc.c
@@ -7513,6 +7513,13 @@ void __init free_area_init(unsigned long
memset(arch_zone_highest_possible_pfn, 0,
sizeof(arch_zone_highest_possible_pfn));
+ /*
+ * Some architectures (e.g. x86) have reserved pages outside of
+ * memblock.memory. Make sure these pages are taken into account
+ * when detecting zone and node boundaries
+ */
+ memblock_enforce_memory_reserved_overlap();
+
start_pfn = find_min_pfn_with_active_regions();
descending = arch_has_descending_max_zone_pfns();
_
Patches currently in -mm which might be from rppt(a)linux.ibm.com are
mm-fix-initialization-of-struct-page-for-holes-in-memory-layout.patch
mm-add-definition-of-pmd_page_order.patch
mmap-make-mlock_future_check-global.patch
set_memory-allow-set_direct_map__noflush-for-multiple-pages.patch
set_memory-allow-querying-whether-set_direct_map_-is-actually-enabled.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas.patch
mm-introduce-memfd_secret-system-call-to-create-secret-memory-areas-fix.patch
secretmem-use-pmd-size-pages-to-amortize-direct-map-fragmentation.patch
secretmem-add-memcg-accounting.patch
pm-hibernate-disable-when-there-are-active-secretmem-users.patch
arch-mm-wire-up-memfd_secret-system-call-were-relevant.patch
secretmem-test-add-basic-selftest-for-memfd_secret2.patch
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ae28d1aae48a1258bd09a6f707ebb4231d79a761 Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu(a)intel.com>
Date: Thu, 17 Dec 2020 14:31:18 -0800
Subject: [PATCH] x86/resctrl: Use an IPI instead of task_work_add() to update
PQR_ASSOC MSR
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.…
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Valentin Schneider <valentin.schneider(a)arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.16082431…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 29ffb95b25ff..1c6f8a60ac52 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,89 +525,63 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
kfree(rdtgrp);
}
-struct task_move_callback {
- struct callback_head work;
- struct rdtgroup *rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
{
- struct task_move_callback *callback;
- struct rdtgroup *rdtgrp;
-
- callback = container_of(head, struct task_move_callback, work);
- rdtgrp = callback->rdtgrp;
-
/*
- * If resource group was deleted before this task work callback
- * was invoked, then assign the task to root group and free the
- * resource group.
+ * If the task is still current on this CPU, update PQR_ASSOC MSR.
+ * Otherwise, the MSR is updated when the task is scheduled in.
*/
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- current->closid = 0;
- current->rmid = 0;
- rdtgroup_remove(rdtgrp);
- }
-
- if (unlikely(current->flags & PF_EXITING))
- goto out;
-
- preempt_disable();
- /* update PQR_ASSOC MSR to make resource group go into effect */
- resctrl_sched_in();
- preempt_enable();
+ if (task == current)
+ resctrl_sched_in();
+}
-out:
- kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
+ smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+ else
+ _update_task_closid_rmid(t);
}
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
- struct task_move_callback *callback;
- int ret;
-
- callback = kzalloc(sizeof(*callback), GFP_KERNEL);
- if (!callback)
- return -ENOMEM;
- callback->work.func = move_myself;
- callback->rdtgrp = rdtgrp;
-
/*
- * Take a refcount, so rdtgrp cannot be freed before the
- * callback has been invoked.
+ * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+ * updated by them.
+ *
+ * For ctrl_mon groups, move both closid and rmid.
+ * For monitor groups, can move the tasks only from
+ * their parent CTRL group.
*/
- atomic_inc(&rdtgrp->waitcount);
- ret = task_work_add(tsk, &callback->work, TWA_RESUME);
- if (ret) {
- /*
- * Task is exiting. Drop the refcount and free the callback.
- * No need to check the refcount as the group cannot be
- * deleted before the write function unlocks rdtgroup_mutex.
- */
- atomic_dec(&rdtgrp->waitcount);
- kfree(callback);
- rdt_last_cmd_puts("Task exited\n");
- } else {
- /*
- * For ctrl_mon groups move both closid and rmid.
- * For monitor groups, can move the tasks only from
- * their parent CTRL group.
- */
- if (rdtgrp->type == RDTCTRL_GROUP) {
- tsk->closid = rdtgrp->closid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP) {
+ tsk->closid = rdtgrp->closid;
+ tsk->rmid = rdtgrp->mon.rmid;
+ } else if (rdtgrp->type == RDTMON_GROUP) {
+ if (rdtgrp->mon.parent->closid == tsk->closid) {
tsk->rmid = rdtgrp->mon.rmid;
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- tsk->rmid = rdtgrp->mon.rmid;
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- ret = -EINVAL;
- }
+ } else {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}
}
- return ret;
+
+ /*
+ * Ensure the task's closid and rmid are written before determining if
+ * the task is current that will decide if it will be interrupted.
+ */
+ barrier();
+
+ /*
+ * By now, the task's closid and rmid are set. If the task is current
+ * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+ * group go into effect. If the task is not current, the MSR will be
+ * updated when the task is scheduled in.
+ */
+ update_task_closid_rmid(tsk);
+
+ return 0;
}
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ae28d1aae48a1258bd09a6f707ebb4231d79a761 Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu(a)intel.com>
Date: Thu, 17 Dec 2020 14:31:18 -0800
Subject: [PATCH] x86/resctrl: Use an IPI instead of task_work_add() to update
PQR_ASSOC MSR
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.…
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Valentin Schneider <valentin.schneider(a)arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.16082431…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 29ffb95b25ff..1c6f8a60ac52 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,89 +525,63 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
kfree(rdtgrp);
}
-struct task_move_callback {
- struct callback_head work;
- struct rdtgroup *rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
{
- struct task_move_callback *callback;
- struct rdtgroup *rdtgrp;
-
- callback = container_of(head, struct task_move_callback, work);
- rdtgrp = callback->rdtgrp;
-
/*
- * If resource group was deleted before this task work callback
- * was invoked, then assign the task to root group and free the
- * resource group.
+ * If the task is still current on this CPU, update PQR_ASSOC MSR.
+ * Otherwise, the MSR is updated when the task is scheduled in.
*/
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- current->closid = 0;
- current->rmid = 0;
- rdtgroup_remove(rdtgrp);
- }
-
- if (unlikely(current->flags & PF_EXITING))
- goto out;
-
- preempt_disable();
- /* update PQR_ASSOC MSR to make resource group go into effect */
- resctrl_sched_in();
- preempt_enable();
+ if (task == current)
+ resctrl_sched_in();
+}
-out:
- kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
+ smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+ else
+ _update_task_closid_rmid(t);
}
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
- struct task_move_callback *callback;
- int ret;
-
- callback = kzalloc(sizeof(*callback), GFP_KERNEL);
- if (!callback)
- return -ENOMEM;
- callback->work.func = move_myself;
- callback->rdtgrp = rdtgrp;
-
/*
- * Take a refcount, so rdtgrp cannot be freed before the
- * callback has been invoked.
+ * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+ * updated by them.
+ *
+ * For ctrl_mon groups, move both closid and rmid.
+ * For monitor groups, can move the tasks only from
+ * their parent CTRL group.
*/
- atomic_inc(&rdtgrp->waitcount);
- ret = task_work_add(tsk, &callback->work, TWA_RESUME);
- if (ret) {
- /*
- * Task is exiting. Drop the refcount and free the callback.
- * No need to check the refcount as the group cannot be
- * deleted before the write function unlocks rdtgroup_mutex.
- */
- atomic_dec(&rdtgrp->waitcount);
- kfree(callback);
- rdt_last_cmd_puts("Task exited\n");
- } else {
- /*
- * For ctrl_mon groups move both closid and rmid.
- * For monitor groups, can move the tasks only from
- * their parent CTRL group.
- */
- if (rdtgrp->type == RDTCTRL_GROUP) {
- tsk->closid = rdtgrp->closid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP) {
+ tsk->closid = rdtgrp->closid;
+ tsk->rmid = rdtgrp->mon.rmid;
+ } else if (rdtgrp->type == RDTMON_GROUP) {
+ if (rdtgrp->mon.parent->closid == tsk->closid) {
tsk->rmid = rdtgrp->mon.rmid;
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- tsk->rmid = rdtgrp->mon.rmid;
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- ret = -EINVAL;
- }
+ } else {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}
}
- return ret;
+
+ /*
+ * Ensure the task's closid and rmid are written before determining if
+ * the task is current that will decide if it will be interrupted.
+ */
+ barrier();
+
+ /*
+ * By now, the task's closid and rmid are set. If the task is current
+ * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+ * group go into effect. If the task is not current, the MSR will be
+ * updated when the task is scheduled in.
+ */
+ update_task_closid_rmid(tsk);
+
+ return 0;
}
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ae28d1aae48a1258bd09a6f707ebb4231d79a761 Mon Sep 17 00:00:00 2001
From: Fenghua Yu <fenghua.yu(a)intel.com>
Date: Thu, 17 Dec 2020 14:31:18 -0800
Subject: [PATCH] x86/resctrl: Use an IPI instead of task_work_add() to update
PQR_ASSOC MSR
Currently, when moving a task to a resource group the PQR_ASSOC MSR is
updated with the new closid and rmid in an added task callback. If the
task is running, the work is run as soon as possible. If the task is not
running, the work is executed later in the kernel exit path when the
kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when
the task is scheduled in) and causing system resource waste with the way
in which it is implemented: Work to update the PQR_ASSOC register is
queued every time the user writes a task id to the "tasks" file, even if
the task already belongs to the resource group.
This could result in multiple pending work items associated with a
single task even if they are all identical and even though only a single
update with most recent values is needed. Specifically, even if a task
is moved between different resource groups while it is sleeping then it
is only the last move that is relevant but yet a work item is queued
during each move.
This unnecessary queueing of work items could result in significant
system resource waste, especially on tasks sleeping for a long time.
For example, as demonstrated by Shakeel Butt in [1] writing the same
task id to the "tasks" file can quickly consume significant memory. The
same problem (wasted system resources) occurs when moving a task between
different resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue
with the way in which the queueing of work is done in that the task_struct
update is currently done after the work is queued, resulting in a race with
the register update possibly done before the data needed by the update is
available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing
is done since the PQR_ASSOC MSR will be updated next time the task is
scheduled. This is the same way used to update the register when tasks
are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3…
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.…
[ bp: Massage commit message and drop the two update_task_closid_rmid()
variants. ]
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Reported-by: Valentin Schneider <valentin.schneider(a)arm.com>
Signed-off-by: Fenghua Yu <fenghua.yu(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: James Morse <james.morse(a)arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider(a)arm.com>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/17aa2fb38fc12ce7bb710106b3e7c7b45acb9e94.16082431…
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 29ffb95b25ff..1c6f8a60ac52 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,89 +525,63 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
kfree(rdtgrp);
}
-struct task_move_callback {
- struct callback_head work;
- struct rdtgroup *rdtgrp;
-};
-
-static void move_myself(struct callback_head *head)
+static void _update_task_closid_rmid(void *task)
{
- struct task_move_callback *callback;
- struct rdtgroup *rdtgrp;
-
- callback = container_of(head, struct task_move_callback, work);
- rdtgrp = callback->rdtgrp;
-
/*
- * If resource group was deleted before this task work callback
- * was invoked, then assign the task to root group and free the
- * resource group.
+ * If the task is still current on this CPU, update PQR_ASSOC MSR.
+ * Otherwise, the MSR is updated when the task is scheduled in.
*/
- if (atomic_dec_and_test(&rdtgrp->waitcount) &&
- (rdtgrp->flags & RDT_DELETED)) {
- current->closid = 0;
- current->rmid = 0;
- rdtgroup_remove(rdtgrp);
- }
-
- if (unlikely(current->flags & PF_EXITING))
- goto out;
-
- preempt_disable();
- /* update PQR_ASSOC MSR to make resource group go into effect */
- resctrl_sched_in();
- preempt_enable();
+ if (task == current)
+ resctrl_sched_in();
+}
-out:
- kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+ if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
+ smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
+ else
+ _update_task_closid_rmid(t);
}
static int __rdtgroup_move_task(struct task_struct *tsk,
struct rdtgroup *rdtgrp)
{
- struct task_move_callback *callback;
- int ret;
-
- callback = kzalloc(sizeof(*callback), GFP_KERNEL);
- if (!callback)
- return -ENOMEM;
- callback->work.func = move_myself;
- callback->rdtgrp = rdtgrp;
-
/*
- * Take a refcount, so rdtgrp cannot be freed before the
- * callback has been invoked.
+ * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+ * updated by them.
+ *
+ * For ctrl_mon groups, move both closid and rmid.
+ * For monitor groups, can move the tasks only from
+ * their parent CTRL group.
*/
- atomic_inc(&rdtgrp->waitcount);
- ret = task_work_add(tsk, &callback->work, TWA_RESUME);
- if (ret) {
- /*
- * Task is exiting. Drop the refcount and free the callback.
- * No need to check the refcount as the group cannot be
- * deleted before the write function unlocks rdtgroup_mutex.
- */
- atomic_dec(&rdtgrp->waitcount);
- kfree(callback);
- rdt_last_cmd_puts("Task exited\n");
- } else {
- /*
- * For ctrl_mon groups move both closid and rmid.
- * For monitor groups, can move the tasks only from
- * their parent CTRL group.
- */
- if (rdtgrp->type == RDTCTRL_GROUP) {
- tsk->closid = rdtgrp->closid;
+
+ if (rdtgrp->type == RDTCTRL_GROUP) {
+ tsk->closid = rdtgrp->closid;
+ tsk->rmid = rdtgrp->mon.rmid;
+ } else if (rdtgrp->type == RDTMON_GROUP) {
+ if (rdtgrp->mon.parent->closid == tsk->closid) {
tsk->rmid = rdtgrp->mon.rmid;
- } else if (rdtgrp->type == RDTMON_GROUP) {
- if (rdtgrp->mon.parent->closid == tsk->closid) {
- tsk->rmid = rdtgrp->mon.rmid;
- } else {
- rdt_last_cmd_puts("Can't move task to different control group\n");
- ret = -EINVAL;
- }
+ } else {
+ rdt_last_cmd_puts("Can't move task to different control group\n");
+ return -EINVAL;
}
}
- return ret;
+
+ /*
+ * Ensure the task's closid and rmid are written before determining if
+ * the task is current that will decide if it will be interrupted.
+ */
+ barrier();
+
+ /*
+ * By now, the task's closid and rmid are set. If the task is current
+ * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+ * group go into effect. If the task is not current, the MSR will be
+ * updated when the task is scheduled in.
+ */
+ update_task_closid_rmid(tsk);
+
+ return 0;
}
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5626308bb94d9f930aa5f7c77327df4c6daa7759 Mon Sep 17 00:00:00 2001
From: Lukas Wunner <lukas(a)wunner.de>
Date: Mon, 7 Dec 2020 09:17:05 +0100
Subject: [PATCH] spi: pxa2xx: Fix use-after-free on unbind
pxa2xx_spi_remove() accesses the driver's private data after calling
spi_unregister_controller() even though that function releases the last
reference on the spi_controller and thereby frees the private data.
Fix by switching over to the new devm_spi_alloc_master/slave() helper
which keeps the private data accessible until the driver has unbound.
Fixes: 32e5b57232c0 ("spi: pxa2xx: Fix controller unregister order")
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Cc: <stable(a)vger.kernel.org> # v2.6.17+: 5e844cc37a5c: spi: Introduce device-managed SPI controller allocation
Cc: <stable(a)vger.kernel.org> # v2.6.17+: 32e5b57232c0: spi: pxa2xx: Fix controller unregister order
Cc: <stable(a)vger.kernel.org> # v2.6.17+
Link: https://lore.kernel.org/r/5764b04d4a6e43069ebb7808f64c2f774ac6f193.16072868…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index 62a0f0f86553..bd2354fd438d 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -1691,9 +1691,9 @@ static int pxa2xx_spi_probe(struct platform_device *pdev)
}
if (platform_info->is_slave)
- controller = spi_alloc_slave(dev, sizeof(struct driver_data));
+ controller = devm_spi_alloc_slave(dev, sizeof(*drv_data));
else
- controller = spi_alloc_master(dev, sizeof(struct driver_data));
+ controller = devm_spi_alloc_master(dev, sizeof(*drv_data));
if (!controller) {
dev_err(&pdev->dev, "cannot alloc spi_controller\n");
@@ -1916,7 +1916,6 @@ static int pxa2xx_spi_probe(struct platform_device *pdev)
free_irq(ssp->irq, drv_data);
out_error_controller_alloc:
- spi_controller_put(controller);
pxa_ssp_free(ssp);
return status;
}
We don't actually care about the value, since the kernel will panic
before that; but a value should nonetheless be returned, otherwise the
compiler will complain.
Fixes: 8112c4f140fa ("seccomp: remove 2-phase API")
Cc: stable(a)vger.kernel.org # 4.7+
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
kernel/seccomp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 952dc1c90229..63b40d12896b 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -1284,6 +1284,8 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
const bool recheck_after_trace)
{
BUG();
+
+ return -1;
}
#endif
--
2.29.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f9bce7a22a3f8ac9d885c9d75bc45569f24ac8b Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo(a)kernel.org>
Date: Sat, 14 Nov 2020 19:39:05 +0100
Subject: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
If we are using edge IRQs, new samples can arrive while processing
current interrupt since there are no hw guarantees the irq line
stays "low" long enough to properly detect the new interrupt.
In this case the new sample will be missed.
Polling FIFO status register in st_lsm6dsx_handler_thread routine
allow us to read new samples even if the interrupt arrives while
processing previous data and the timeslot where the line is "low"
is too short to be properly detected.
Fixes: 89ca88a7cdf2 ("iio: imu: st_lsm6dsx: support active-low interrupts")
Fixes: 290a6ce11d93 ("iio: imu: add support to lsm6dsx driver")
Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org>
Link: https://lore.kernel.org/r/5e93cda7dc1e665f5685c53ad8e9ea71dbae782d.16053788…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
index 467214e2e77c..7cedaab096a7 100644
--- a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
+++ b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
@@ -2069,19 +2069,35 @@ st_lsm6dsx_report_motion_event(struct st_lsm6dsx_hw *hw)
static irqreturn_t st_lsm6dsx_handler_thread(int irq, void *private)
{
struct st_lsm6dsx_hw *hw = private;
+ int fifo_len = 0, len;
bool event;
- int count;
event = st_lsm6dsx_report_motion_event(hw);
if (!hw->settings->fifo_ops.read_fifo)
return event ? IRQ_HANDLED : IRQ_NONE;
- mutex_lock(&hw->fifo_lock);
- count = hw->settings->fifo_ops.read_fifo(hw);
- mutex_unlock(&hw->fifo_lock);
+ /*
+ * If we are using edge IRQs, new samples can arrive while
+ * processing current interrupt since there are no hw
+ * guarantees the irq line stays "low" long enough to properly
+ * detect the new interrupt. In this case the new sample will
+ * be missed.
+ * Polling FIFO status register allow us to read new
+ * samples even if the interrupt arrives while processing
+ * previous data and the timeslot where the line is "low" is
+ * too short to be properly detected.
+ */
+ do {
+ mutex_lock(&hw->fifo_lock);
+ len = hw->settings->fifo_ops.read_fifo(hw);
+ mutex_unlock(&hw->fifo_lock);
+
+ if (len > 0)
+ fifo_len += len;
+ } while (len > 0);
- return count || event ? IRQ_HANDLED : IRQ_NONE;
+ return fifo_len || event ? IRQ_HANDLED : IRQ_NONE;
}
static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f9bce7a22a3f8ac9d885c9d75bc45569f24ac8b Mon Sep 17 00:00:00 2001
From: Lorenzo Bianconi <lorenzo(a)kernel.org>
Date: Sat, 14 Nov 2020 19:39:05 +0100
Subject: [PATCH] iio: imu: st_lsm6dsx: fix edge-trigger interrupts
If we are using edge IRQs, new samples can arrive while processing
current interrupt since there are no hw guarantees the irq line
stays "low" long enough to properly detect the new interrupt.
In this case the new sample will be missed.
Polling FIFO status register in st_lsm6dsx_handler_thread routine
allow us to read new samples even if the interrupt arrives while
processing previous data and the timeslot where the line is "low"
is too short to be properly detected.
Fixes: 89ca88a7cdf2 ("iio: imu: st_lsm6dsx: support active-low interrupts")
Fixes: 290a6ce11d93 ("iio: imu: add support to lsm6dsx driver")
Signed-off-by: Lorenzo Bianconi <lorenzo(a)kernel.org>
Link: https://lore.kernel.org/r/5e93cda7dc1e665f5685c53ad8e9ea71dbae782d.16053788…
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
index 467214e2e77c..7cedaab096a7 100644
--- a/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
+++ b/drivers/iio/imu/st_lsm6dsx/st_lsm6dsx_core.c
@@ -2069,19 +2069,35 @@ st_lsm6dsx_report_motion_event(struct st_lsm6dsx_hw *hw)
static irqreturn_t st_lsm6dsx_handler_thread(int irq, void *private)
{
struct st_lsm6dsx_hw *hw = private;
+ int fifo_len = 0, len;
bool event;
- int count;
event = st_lsm6dsx_report_motion_event(hw);
if (!hw->settings->fifo_ops.read_fifo)
return event ? IRQ_HANDLED : IRQ_NONE;
- mutex_lock(&hw->fifo_lock);
- count = hw->settings->fifo_ops.read_fifo(hw);
- mutex_unlock(&hw->fifo_lock);
+ /*
+ * If we are using edge IRQs, new samples can arrive while
+ * processing current interrupt since there are no hw
+ * guarantees the irq line stays "low" long enough to properly
+ * detect the new interrupt. In this case the new sample will
+ * be missed.
+ * Polling FIFO status register allow us to read new
+ * samples even if the interrupt arrives while processing
+ * previous data and the timeslot where the line is "low" is
+ * too short to be properly detected.
+ */
+ do {
+ mutex_lock(&hw->fifo_lock);
+ len = hw->settings->fifo_ops.read_fifo(hw);
+ mutex_unlock(&hw->fifo_lock);
+
+ if (len > 0)
+ fifo_len += len;
+ } while (len > 0);
- return count || event ? IRQ_HANDLED : IRQ_NONE;
+ return fifo_len || event ? IRQ_HANDLED : IRQ_NONE;
}
static int st_lsm6dsx_irq_setup(struct st_lsm6dsx_hw *hw)
The pch_get_backlight(), lpt_get_backlight(), and lpt_set_backlight()
functions operate directly on the hardware registers. If inverting the
value is needed, using intel_panel_compute_brightness(), it should only
be done in the interface between hardware registers and
panel->backlight.level.
The CPU mode takeover code added in commit 5b1ec9ac7ab5
("drm/i915/backlight: Fix backlight takeover on LPT, v3.") reads the
hardware register and converts to panel->backlight.level correctly,
however the value written back should remain in the hardware register
"domain".
This hasn't been an issue, because GM45 machines are the only known
users of i915.invert_brightness and the brightness invert quirk, and
without one of them no conversion is made. It's likely nobody's ever hit
the problem.
Fixes: 5b1ec9ac7ab5 ("drm/i915/backlight: Fix backlight takeover on LPT, v3.")
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Lyude Paul <lyude(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # v5.1+
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_panel.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_panel.c b/drivers/gpu/drm/i915/display/intel_panel.c
index 67f81ae995c4..7a4239d1c241 100644
--- a/drivers/gpu/drm/i915/display/intel_panel.c
+++ b/drivers/gpu/drm/i915/display/intel_panel.c
@@ -1649,16 +1649,13 @@ static int lpt_setup_backlight(struct intel_connector *connector, enum pipe unus
val = pch_get_backlight(connector);
else
val = lpt_get_backlight(connector);
- val = intel_panel_compute_brightness(connector, val);
- panel->backlight.level = clamp(val, panel->backlight.min,
- panel->backlight.max);
if (cpu_mode) {
drm_dbg_kms(&dev_priv->drm,
"CPU backlight register was enabled, switching to PCH override\n");
/* Write converted CPU PWM value to PCH override register */
- lpt_set_backlight(connector->base.state, panel->backlight.level);
+ lpt_set_backlight(connector->base.state, val);
intel_de_write(dev_priv, BLC_PWM_PCH_CTL1,
pch_ctl1 | BLM_PCH_OVERRIDE_ENABLE);
@@ -1666,6 +1663,10 @@ static int lpt_setup_backlight(struct intel_connector *connector, enum pipe unus
cpu_ctl2 & ~BLM_PWM_ENABLE);
}
+ val = intel_panel_compute_brightness(connector, val);
+ panel->backlight.level = clamp(val, panel->backlight.min,
+ panel->backlight.max);
+
return 0;
}
--
2.20.1
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
Commit 73a6e474cb37 ("mm: memmap_init: iterate over
memblock regions rather that check each PFN") exposed several issues with
the memory map initialization and these patches fix those issues.
Initially there were crashes during compaction that Qian Cai reported back
in April [1]. It seemed back then that the probelm was fixed, but a few
weeks ago Andrea Arcangeli hit the same bug [2] and after a long discussion
between us [3] I think these patches are the proper fix.
v2 changes:
* added patch that adds all regions in memblock.reserved that do not
overlap with memblock.memory to memblock.memory in the beginning of
free_area_init()
[1] https://lore.kernel.org/lkml/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
[2] https://lore.kernel.org/lkml/20201121194506.13464-1-aarcange@redhat.com
[3] https://lore.kernel.org/mm-commits/20201206005401.qKuAVgOXr%akpm@linux-foun…
Mike Rapoport (2):
mm: memblock: enforce overlap of memory.memblock and memory.reserved
mm: fix initialization of struct page for holes in memory layout
include/linux/memblock.h | 1 +
mm/memblock.c | 24 ++++++
mm/page_alloc.c | 159 ++++++++++++++++++---------------------
3 files changed, 97 insertions(+), 87 deletions(-)
--
2.28.0
This patch updates dma_direct_unmap_sg() to mark each scatter/gather
entry invalid, after it's unmapped. This fixes two issues:
1. It makes the unmapping code able to tolerate a double unmap.
2. It prevents the NVMe driver from erroneously treating an unmapped DMA
address as mapped.
The bug that motivated this patch was the following sequence, which
occurred within the NVMe driver, with the kernel flag `swiotlb=force`.
* NVMe driver calls dma_direct_map_sg()
* dma_direct_map_sg() fails part way through the scatter gather/list
* dma_direct_map_sg() calls dma_direct_unmap_sg() to unmap any entries
succeeded.
* NVMe driver calls dma_direct_unmap_sg(), redundantly, leading to a
double unmap, which is a bug.
With this patch, a hadoop workload running on a cluster of three AMD
SEV VMs, is able to succeed. Without the patch, the hadoop workload
suffers application-level and even VM-level failures.
Tested-by: Jianxiong Gao <jxgao(a)google.com>
Tested-by: Marc Orr <marcorr(a)google.com>
Reviewed-by: Jianxiong Gao <jxgao(a)google.com>
Signed-off-by: Marc Orr <marcorr(a)google.com>
---
kernel/dma/direct.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 0a4881e59aa7..3d9b17fe5771 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -374,9 +374,11 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
struct scatterlist *sg;
int i;
- for_each_sg(sgl, sg, nents, i)
+ for_each_sg(sgl, sg, nents, i) {
dma_direct_unmap_page(dev, sg->dma_address, sg_dma_len(sg), dir,
attrs);
+ sg->dma_address = DMA_MAPPING_ERROR;
+ }
}
EXPORT_SYMBOL(dma_direct_unmap_sg);
#endif
--
2.30.0.284.gd98b1dd5eaa7-goog
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 98bf2d3f4970179c702ef64db658e0553bc6ef3a Mon Sep 17 00:00:00 2001
From: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Date: Tue, 22 Dec 2020 07:11:18 +0000
Subject: [PATCH] powerpc/32s: Fix RTAS machine check with VMAP stack
When we have VMAP stack, exception prolog 1 sets r1, not r11.
When it is not an RTAS machine check, don't trash r1 because it is
needed by prolog 1.
Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU")
Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
[mpe: Squash in fixup for RTAS machine check from Christophe]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/bc77d61d1c18940e456a2dee464f1e2eda65a3f0.16086210…
diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 349bf3f0c3af..858fbc8b19f3 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -260,10 +260,19 @@ __secondary_hold_acknowledge:
MachineCheck:
EXCEPTION_PROLOG_0
#ifdef CONFIG_PPC_CHRP
+#ifdef CONFIG_VMAP_STACK
+ mtspr SPRN_SPRG_SCRATCH2,r1
+ mfspr r1, SPRN_SPRG_THREAD
+ lwz r1, RTAS_SP(r1)
+ cmpwi cr1, r1, 0
+ bne cr1, 7f
+ mfspr r1, SPRN_SPRG_SCRATCH2
+#else
mfspr r11, SPRN_SPRG_THREAD
lwz r11, RTAS_SP(r11)
cmpwi cr1, r11, 0
bne cr1, 7f
+#endif
#endif /* CONFIG_PPC_CHRP */
EXCEPTION_PROLOG_1 for_rtas=1
7: EXCEPTION_PROLOG_2
From: Muhammed Fazal <mfazale(a)nvidia.com>
Ignore I2C_M_DMA_SAFE flag as it does not make a difference
for bpmp-i2c, but causes -EINVAL to be returned for valid
transactions.
Signed-off-by: Muhammed Fazal <mfazale(a)nvidia.com>
Cc: stable(a)vger.kernel.org # v4.19+
Signed-off-by: Mikko Perttunen <mperttunen(a)nvidia.com>
---
This fixes failures seen with PMIC probing tools on
Tegra186+ boards.
drivers/i2c/busses/i2c-tegra-bpmp.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/i2c/busses/i2c-tegra-bpmp.c b/drivers/i2c/busses/i2c-tegra-bpmp.c
index ec7a7e917edd..998d4b21fb59 100644
--- a/drivers/i2c/busses/i2c-tegra-bpmp.c
+++ b/drivers/i2c/busses/i2c-tegra-bpmp.c
@@ -80,6 +80,9 @@ static int tegra_bpmp_xlate_flags(u16 flags, u16 *out)
flags &= ~I2C_M_RECV_LEN;
}
+ if (flags & I2C_M_DMA_SAFE)
+ flags &= ~I2C_M_DMA_SAFE;
+
return (flags != 0) ? -EINVAL : 0;
}
--
2.30.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5efc1f4b454c6179d35e7b0c3eda0ad5763a00fc Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Tue, 5 Jan 2021 11:42:08 -0500
Subject: [PATCH] Revert "drm/amd/display: Fix memory leaks in S3 resume"
This reverts commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362.
This leads to blank screens on some boards after replugging a
display. Revert until we understand the root cause and can
fix both the leak and the blank screen after replug.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211033
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1427
Cc: Stylon Wang <stylon.wang(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Cc: Andre Tomt <andre(a)tomt.net>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 3fb6baf9b0ba..146486071d01 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2386,8 +2386,7 @@ void amdgpu_dm_update_connector_after_detect(
drm_connector_update_edid_property(connector,
aconnector->edid);
- aconnector->num_modes = drm_add_edid_modes(connector, aconnector->edid);
- drm_connector_list_update(connector);
+ drm_add_edid_modes(connector, aconnector->edid);
if (aconnector->dc_link->aux_mode)
drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,
From: Eric Biggers <ebiggers(a)google.com>
When lazytime is enabled and an inode is being written due to its
in-memory updated timestamps having expired, either due to a sync() or
syncfs() system call or due to dirtytime_expire_interval having elapsed,
the VFS needs to inform the filesystem so that the filesystem can copy
the inode's timestamps out to the on-disk data structures.
This is done by __writeback_single_inode() calling
mark_inode_dirty_sync(), which then calls ->dirty_inode(I_DIRTY_SYNC).
However, this occurs after __writeback_single_inode() has already
cleared the dirty flags from ->i_state. This causes two bugs:
- mark_inode_dirty_sync() redirties the inode, causing it to remain
dirty. This wastefully causes the inode to be written twice. But
more importantly, it breaks cases where sync_filesystem() is expected
to clean dirty inodes. This includes the FS_IOC_REMOVE_ENCRYPTION_KEY
ioctl (as reported at
https://lore.kernel.org/r/20200306004555.GB225345@gmail.com), as well
as possibly filesystem freezing (freeze_super()).
- Since ->i_state doesn't contain I_DIRTY_TIME when ->dirty_inode() is
called from __writeback_single_inode() for lazytime expiration,
xfs_fs_dirty_inode() ignores the notification. (XFS only cares about
lazytime expirations, and it assumes that I_DIRTY_TIME will contain
i_state during those.) Therefore, lazy timestamps aren't persisted by
sync(), syncfs(), or dirtytime_expire_interval on XFS.
Fix this by moving the call to mark_inode_dirty_sync() to earlier in
__writeback_single_inode(), before the dirty flags are cleared from
i_state. This makes filesystems be properly notified of the timestamp
expiration, and it avoids incorrectly redirtying the inode.
This fixes xfstest generic/580 (which tests
FS_IOC_REMOVE_ENCRYPTION_KEY) when run on ext4 or f2fs with lazytime
enabled. It also fixes the new lazytime xfstest I've proposed, which
reproduces the above-mentioned XFS bug
(https://lore.kernel.org/r/20210105005818.92978-1-ebiggers@kernel.org).
Alternatively, we could call ->dirty_inode(I_DIRTY_SYNC) directly. But
due to the introduction of I_SYNC_QUEUED, mark_inode_dirty_sync() is the
right thing to do because mark_inode_dirty_sync() now knows not to move
the inode to a writeback list if it is currently queued for sync.
Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
Cc: stable(a)vger.kernel.org
Depends-on: 5afced3bf281 ("writeback: Avoid skipping inode writeback")
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/fs-writeback.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index acfb55834af23..c41cb887eb7d3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1474,21 +1474,25 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
}
/*
- * Some filesystems may redirty the inode during the writeback
- * due to delalloc, clear dirty metadata flags right before
- * write_inode()
+ * If the inode has dirty timestamps and we need to write them, call
+ * mark_inode_dirty_sync() to notify the filesystem about it and to
+ * change I_DIRTY_TIME into I_DIRTY_SYNC.
*/
- spin_lock(&inode->i_lock);
-
- dirty = inode->i_state & I_DIRTY;
if ((inode->i_state & I_DIRTY_TIME) &&
- ((dirty & I_DIRTY_INODE) ||
- wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
+ (wbc->sync_mode == WB_SYNC_ALL || wbc->for_sync ||
time_after(jiffies, inode->dirtied_time_when +
dirtytime_expire_interval * HZ))) {
- dirty |= I_DIRTY_TIME;
trace_writeback_lazytime(inode);
+ mark_inode_dirty_sync(inode);
}
+
+ /*
+ * Some filesystems may redirty the inode during the writeback
+ * due to delalloc, clear dirty metadata flags right before
+ * write_inode()
+ */
+ spin_lock(&inode->i_lock);
+ dirty = inode->i_state & I_DIRTY;
inode->i_state &= ~dirty;
/*
@@ -1509,8 +1513,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
spin_unlock(&inode->i_lock);
- if (dirty & I_DIRTY_TIME)
- mark_inode_dirty_sync(inode);
/* Don't write the inode if only I_DIRTY_PAGES was set */
if (dirty & ~I_DIRTY_PAGES) {
int err = write_inode(inode, wbc);
--
2.30.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5efc1f4b454c6179d35e7b0c3eda0ad5763a00fc Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Tue, 5 Jan 2021 11:42:08 -0500
Subject: [PATCH] Revert "drm/amd/display: Fix memory leaks in S3 resume"
This reverts commit a135a1b4c4db1f3b8cbed9676a40ede39feb3362.
This leads to blank screens on some boards after replugging a
display. Revert until we understand the root cause and can
fix both the leak and the blank screen after replug.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=211033
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1427
Cc: Stylon Wang <stylon.wang(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Cc: Andre Tomt <andre(a)tomt.net>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 3fb6baf9b0ba..146486071d01 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2386,8 +2386,7 @@ void amdgpu_dm_update_connector_after_detect(
drm_connector_update_edid_property(connector,
aconnector->edid);
- aconnector->num_modes = drm_add_edid_modes(connector, aconnector->edid);
- drm_connector_list_update(connector);
+ drm_add_edid_modes(connector, aconnector->edid);
if (aconnector->dc_link->aux_mode)
drm_dp_cec_set_edid(&aconnector->dm_dp_aux.aux,