The patch titled
Subject: mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
has been added to the -mm tree. Its filename is
mm-hugetlb-filter-out-hugetlb-pages-if-hugepage-migration-is-not-supported.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-filter-out-hugetlb-page…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-filter-out-hugetlb-page…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
When scanning for movable pages, filter out hugetlb pages if hugepage
migration is not supported.  Without this we hit an infinite loop in
__offline_pages() where we do
pfn = scan_movable_pages(start_pfn, end_pfn);
if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
goto repeat;
}
Fix this by checking hugepage_migration_supported() both in
has_unmovable_pages(), which is the primary backoff mechanism for page
offlining, and, for consistency, also in scan_movable_pages(), because it
makes no sense to return the pfn of a non-migratable huge page.
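For context, hugepage_migration_supported() is essentially an
architecture opt-in.  A simplified sketch of the helper (the exact set of
supported page-size shifts varies by architecture and kernel version):

	static inline bool hugepage_migration_supported(struct hstate *h)
	{
	#ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
		/* Only page-size levels the architecture can migrate. */
		return huge_page_shift(h) == PMD_SHIFT ||
		       huge_page_shift(h) == PUD_SHIFT ||
		       huge_page_shift(h) == PGDIR_SHIFT;
	#else
		/* No opt-in: no hugetlb page is migratable. */
		return false;
	#endif
	}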
This issue was revealed by, but not caused by, 72b39cfc4d75 ("mm,
memory_hotplug: do not fail offlining too early").
Link: http://lkml.kernel.org/r/20180824063314.21981-1-aneesh.kumar@linux.ibm.com
Fixes: 72b39cfc4d75 ("mm, memory_hotplug: do not fail offlining too early")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Haren Myneni <haren(a)linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory_hotplug.c | 3 ++-
mm/page_alloc.c | 4 ++++
2 files changed, 6 insertions(+), 1 deletion(-)
--- a/mm/memory_hotplug.c~mm-hugetlb-filter-out-hugetlb-pages-if-hugepage-migration-is-not-supported
+++ a/mm/memory_hotplug.c
@@ -1333,7 +1333,8 @@ static unsigned long scan_movable_pages(
if (__PageMovable(page))
return pfn;
if (PageHuge(page)) {
- if (page_huge_active(page))
+ if (hugepage_migration_supported(page_hstate(page)) &&
+ page_huge_active(page))
return pfn;
else
pfn = round_up(pfn + 1,
--- a/mm/page_alloc.c~mm-hugetlb-filter-out-hugetlb-pages-if-hugepage-migration-is-not-supported
+++ a/mm/page_alloc.c
@@ -7709,6 +7709,10 @@ bool has_unmovable_pages(struct zone *zo
* handle each tail page individually in migration.
*/
if (PageHuge(page)) {
+
+ if (!hugepage_migration_supported(page_hstate(page)))
+ goto unmovable;
+
iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
continue;
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-hugetlb-filter-out-hugetlb-pages-if-hugepage-migration-is-not-supported.patch
Hello,
4.4.147 is the last kernel of the 4.4.x series that boots for me. All
subsequent versions panic on boot. How do I report this bug?
If I'm supposed to use https://bugzilla.kernel.org/ I don't know what
to fill into the fields. I don't even know if the longterm kernel falls
under "Mainline" or some other tree.
MSB
--
Fun chemistry experiments:
Sodium chloride doubles its volume
when combined with an equal amount of table salt.
Hi Greg,
Arch Linux recently enabled building the kernel documentation in their
PKGBUILD, which turned my normally peaceful build into a spamfest of
unescaped-brace warnings from Perl.
These are fixed upstream with the following two patches:
701b3a3c0ac4 ("PATCH scripts/kernel-doc")
673bb2dfc364 ("scripts/kernel-doc: Escape all literal braces in regexes")
Could they please be applied to 4.18? They should be clean cherry-picks.
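For illustration, assuming a checkout of the 4.18 stable tree, that would
amount to:

  git cherry-pick 701b3a3c0ac4
  git cherry-pick 673bb2dfc364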
Thanks!
Nathan
Hi Greg, Ben, and all
Is https://www.kernel.org/category/releases.html up to date in terms of EOL?
Some news out of the Linaro conference [2] generated a lot of doubts and
questions, especially because the news stated that 3.16 wouldn't be active
anymore.  I'm not sure about the news myself, so I'd like confirmation from
you about the expected EOL.
[2] https://itsfoss.com/linux-lts-kernel-six-years/
Thanks in advance,
Rodrigo.
Inside start_xmit(), the check that the connection is up and the queueing
of packets for later transmission are not atomic.  This leaves a window in
which cm_rep_handler can run, set the connection up, dequeue pending
packets, and leave the packets subsequently queued by start_xmit() sitting
on neigh->queue until they're dropped when the connection is torn down.
This only applies to connected mode.  These dropped packets can really
upset TCP, for example, and cause multi-minute delays in transmission for
open connections.
Here's the code in start_xmit where we check to see if the connection
is up:
if (ipoib_cm_get(neigh)) {
if (ipoib_cm_up(neigh)) {
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
}
The race occurs if cm_rep_handler runs after the above connection check
(specifically, once it gets to the point where it acquires priv->lock to
dequeue pending skb's) but before the code snippet below in start_xmit,
where packets are queued.
if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
push_pseudo_header(skb, phdr->hwaddr);
spin_lock_irqsave(&priv->lock, flags);
__skb_queue_tail(&neigh->queue, skb);
spin_unlock_irqrestore(&priv->lock, flags);
} else {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
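To make the interleaving explicit (an illustrative timeline, not taken
from the patch):

    start_xmit (CPU 0)                  cm_rep_handler (CPU 1)
    ------------------                  ----------------------
    ipoib_cm_up(neigh) -> false
                                        mark connection up
                                        spin_lock(&priv->lock)
                                        drain neigh->queue
                                        spin_unlock(&priv->lock)
    spin_lock_irqsave(&priv->lock, ...)
    __skb_queue_tail(&neigh->queue, skb)   <-- skb is now stranded
    spin_unlock_irqrestore(...)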
The patch re-checks ipoib_cm_up with priv->lock held to avoid this race
condition.  Since the odds are that the connection is up most of the time,
we don't hold the lock for the first check, which avoids a slowdown from
unnecessary locking for the majority of the packets transmitted during the
connection's life.
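In outline, the transmit path after the patch follows the classic
double-checked pattern (a condensed sketch of the control flow, not the
full function):

	if (ipoib_cm_get(neigh)) {
		if (ipoib_cm_up(neigh)) {	/* unlocked fast path */
			ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
			goto unref;
		}
		spin_lock_irqsave(&priv->lock, flags);
		if (ipoib_cm_up(neigh)) {	/* re-check under the lock */
			spin_unlock_irqrestore(&priv->lock, flags);
			ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
		} else {
			/* Still down: queue under the same lock that
			 * cm_rep_handler takes to drain neigh->queue. */
			deferred_pkt = defer_neigh_skb(skb, dev, neigh, phdr);
			spin_unlock_irqrestore(&priv->lock, flags);
		}
		goto unref;
	}

The cheap unlocked test handles the common case, and the locked re-check
closes the window against cm_rep_handler's queue drain.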
Signed-off-by: Aaron Knister <aaron.s.knister(a)nasa.gov>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 46 ++++++++++++++++++++++++++-----
1 file changed, 39 insertions(+), 7 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 26cde95b..a950c916 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1093,6 +1093,21 @@ static void unicast_arp_send(struct sk_buff *skb, struct net_device *dev,
spin_unlock_irqrestore(&priv->lock, flags);
}
+static bool defer_neigh_skb(struct sk_buff *skb,
+ struct net_device *dev,
+ struct ipoib_neigh *neigh,
+ struct ipoib_pseudo_header *phdr)
+{
+ if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
+ push_pseudo_header(skb, phdr->hwaddr);
+ __skb_queue_tail(&neigh->queue, skb);
+ return true;
+ }
+
+ return false;
+}
+
+
static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct ipoib_dev_priv *priv = ipoib_priv(dev);
@@ -1101,6 +1116,7 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
struct ipoib_pseudo_header *phdr;
struct ipoib_header *header;
unsigned long flags;
+ bool deferred_pkt = true;
phdr = (struct ipoib_pseudo_header *) skb->data;
skb_pull(skb, sizeof(*phdr));
@@ -1160,6 +1176,23 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
goto unref;
}
+ /*
+ * Re-check ipoib_cm_up with priv->lock held to avoid
+ * race condition between start_xmit and skb_dequeue in
+ * cm_rep_handler. Since odds are the conn should be up
+ * most of the time, we don't hold the lock for the
+ * first check above
+ */
+ spin_lock_irqsave(&priv->lock, flags);
+ if (ipoib_cm_up(neigh)) {
+ spin_unlock_irqrestore(&priv->lock, flags);
+ ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+ } else {
+ deferred_pkt = defer_neigh_skb(skb, dev, neigh, phdr);
+ spin_unlock_irqrestore(&priv->lock, flags);
+ }
+
+ goto unref;
} else if (neigh->ah && neigh->ah->valid) {
neigh->ah->last_send = rn->send(dev, skb, neigh->ah->ah,
IPOIB_QPN(phdr->hwaddr));
@@ -1168,17 +1201,16 @@ static netdev_tx_t ipoib_start_xmit(struct sk_buff *skb, struct net_device *dev)
neigh_refresh_path(neigh, phdr->hwaddr, dev);
}
- if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
- push_pseudo_header(skb, phdr->hwaddr);
- spin_lock_irqsave(&priv->lock, flags);
- __skb_queue_tail(&neigh->queue, skb);
- spin_unlock_irqrestore(&priv->lock, flags);
- } else {
+ spin_lock_irqsave(&priv->lock, flags);
+ deferred_pkt = defer_neigh_skb(skb, dev, neigh, phdr);
+ spin_unlock_irqrestore(&priv->lock, flags);
+
+unref:
+ if (!deferred_pkt) {
++dev->stats.tx_dropped;
dev_kfree_skb_any(skb);
}
-unref:
ipoib_neigh_put(neigh);
return NETDEV_TX_OK;
--
2.12.3