On 2019/4/20 7:16 AM, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.0.8, v4.19.35, v4.14.112, v4.9.169, v4.4.178, v3.18.138.
>
> v5.0.8: Build OK!
> v4.19.35: Build OK!
> v4.14.112: Failed to apply! Possible dependencies:
> a728eacbbdd2 ("bcache: add journal statistic")
> c4dc2497d50d ("bcache: fix high CPU occupancy during journal")
>
> v4.9.169: Failed to apply! Possible dependencies:
> a728eacbbdd2 ("bcache: add journal statistic")
> c4dc2497d50d ("bcache: fix high CPU occupancy during journal")
>
> v4.4.178: Failed to apply! Possible dependencies:
> a728eacbbdd2 ("bcache: add journal statistic")
> c4dc2497d50d ("bcache: fix high CPU occupancy during journal")
>
> v3.18.138: Failed to apply! Possible dependencies:
> a728eacbbdd2 ("bcache: add journal statistic")
> c4dc2497d50d ("bcache: fix high CPU occupancy during journal")
>
>
> How should we proceed with this patch?
This patch will go into Linux v5.2. We can have it in stable after
it is upstream.
Thanks.
--
Coly Li
This is the start of the stable review cycle for the 4.9.170 release.
There are 50 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat Apr 20 16:03:22 UTC 2019.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.170-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.170-rc1
Lars Persson <lars.persson(a)axis.com>
net: stmmac: Set dma ring length before enabling the DMA
Arnaldo Carvalho de Melo <acme(a)redhat.com>
tools include: Adopt linux/bits.h
Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
tpm/tpm_crb: Avoid unaligned reads in crb_recv()
Pi-Hsun Shih <pihsun(a)chromium.org>
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
Stanislaw Gruszka <sgruszka(a)redhat.com>
lib/div64.c: off by one in shift
YueHaibing <yuehaibing(a)huawei.com>
appletalk: Fix use-after-free in atalk_proc_exit
Yang Shi <yang.shi(a)linaro.org>
ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
Christophe Leroy <christophe.leroy(a)c-s.fr>
lkdtm: Add tests for NULL pointer dereference
Dmitry Osipenko <digetx(a)gmail.com>
soc/tegra: pmc: Drop locking from tegra_powergate_is_powered()
Julia Cartwright <julia(a)ni.com>
iommu/dmar: Fix buffer overflow during PCI bus notification
Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
crypto: sha512/arm - fix crash bug in Thumb2 build
Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
crypto: sha256/arm - fix crash bug in Thumb2 build
Vitaly Kuznetsov <vkuznets(a)redhat.com>
kernel: hung_task.c: disable on suspend
Steve French <stfrench(a)microsoft.com>
cifs: fallback to older infolevels on findfirst queryinfo retry
Ronald Tschalär <ronald(a)innovation.ch>
ACPI / SBS: Fix GPE storm on recent MacBookPro's
Bartlomiej Zolnierkiewicz <b.zolnierkie(a)samsung.com>
ARM: samsung: Limit SAMSUNG_PM_CHECK config option to non-Exynos platforms
Julian Sax <jsbc(a)gmx.de>
HID: i2c-hid: override HID descriptors for certain devices
Michal Simek <michal.simek(a)xilinx.com>
serial: uartps: console_setup() can't be placed to init section
Chao Yu <yuchao0(a)huawei.com>
f2fs: fix to do sanity check with current segment number
Dinu-Razvan Chis-Serban <justcsdr(a)gmail.com>
9p locks: add mount option for lock retry interval
Gertjan Halkes <gertjan(a)google.com>
9p: do not trust pdu content for stat item size
Siva Rebbagondla <siva.rebbagondla(a)redpinesignals.com>
rsi: improve kernel thread handling to fix kernel panic
Robert Jarzmik <robert.jarzmik(a)free.fr>
gpio: pxa: handle corner case of unprobed device
Darrick J. Wong <darrick.wong(a)oracle.com>
ext4: prohibit fstrim in norecovery mode
Steve French <stfrench(a)microsoft.com>
fix incorrect error code mapping for OBJECTID_NOT_FOUND
Nathan Chancellor <natechancellor(a)gmail.com>
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
Lu Baolu <baolu.lu(a)linux.intel.com>
iommu/vt-d: Check capability before disabling protected memory
Matthew Whitehead <tedheadster(a)gmail.com>
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
Aditya Pakki <pakki001(a)umn.edu>
x86/hpet: Prevent potential NULL pointer dereference
Jianguo Chen <chenjianguo3(a)huawei.com>
irqchip/mbigen: Don't clear eventid when freeing an MSI
Changbin Du <changbin.du(a)gmail.com>
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
Changbin Du <changbin.du(a)gmail.com>
perf tests: Fix a memory leak of cpu_map object in the openat_syscall_event_on_all_cpus test
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf evsel: Free evsel->counts in perf_evsel__exit()
Changbin Du <changbin.du(a)gmail.com>
perf hist: Add missing map__put() in error case
Changbin Du <changbin.du(a)gmail.com>
perf top: Fix error handling in cmd_top()
Changbin Du <changbin.du(a)gmail.com>
perf build-id: Fix memory leak in print_sdt_events()
Changbin Du <changbin.du(a)gmail.com>
perf config: Fix a memory leak in collect_config()
Changbin Du <changbin.du(a)gmail.com>
perf config: Fix an error in the config template documentation
David Arcari <darcari(a)redhat.com>
tools/power turbostat: return the exit status of a command
Matthew Garrett <matthewgarrett(a)google.com>
thermal/int340x_thermal: fix mode setting
Matthew Garrett <matthewgarrett(a)google.com>
thermal/int340x_thermal: Add additional UUIDs
Colin Ian King <colin.king(a)canonical.com>
ALSA: opl3: fix mismatch between snd_opl3_drum_switch definition and declaration
Arnd Bergmann <arnd(a)arndb.de>
mmc: davinci: remove extraneous __init annotation
Jack Morgenstein <jackm(a)dev.mellanox.co.il>
IB/mlx4: Fix race condition between catas error reset and aliasguid flows
Kangjie Lu <kjlu(a)umn.edu>
ALSA: sb8: add a check for request_region
Kangjie Lu <kjlu(a)umn.edu>
ALSA: echoaudio: add a check for ioremap_nocache
Lukas Czerner <lczerner(a)redhat.com>
ext4: report real fs size after failed resize
Lukas Czerner <lczerner(a)redhat.com>
ext4: add missing brelse() in add_new_gdb_meta_bg()
Stephane Eranian <eranian(a)google.com>
perf/core: Restore mmap record type correctly
Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
ARC: u-boot args: check that magic number is correct
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/head.S | 1 +
arch/arc/kernel/setup.c | 8 +
arch/arm/crypto/sha256-armv4.pl | 3 +-
arch/arm/crypto/sha256-core.S_shipped | 3 +-
arch/arm/crypto/sha512-armv4.pl | 3 +-
arch/arm/crypto/sha512-core.S_shipped | 3 +-
arch/arm/kernel/patch.c | 6 +-
arch/arm/plat-samsung/Kconfig | 2 +-
arch/x86/kernel/cpu/cyrix.c | 14 +-
arch/x86/kernel/hpet.c | 2 +
arch/x86/kernel/hw_breakpoint.c | 1 +
drivers/acpi/sbs.c | 8 +-
drivers/char/tpm/tpm_crb.c | 22 +-
drivers/gpio/gpio-pxa.c | 6 +
drivers/hid/i2c-hid/Makefile | 3 +
drivers/hid/i2c-hid/{i2c-hid.c => i2c-hid-core.c} | 56 ++--
drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c | 376 ++++++++++++++++++++++
drivers/hid/i2c-hid/i2c-hid.h | 20 ++
drivers/infiniband/hw/mlx4/alias_GUID.c | 2 +-
drivers/iommu/dmar.c | 2 +-
drivers/iommu/intel-iommu.c | 3 +
drivers/irqchip/irq-mbigen.c | 3 +
drivers/misc/lkdtm.h | 2 +
drivers/misc/lkdtm_core.c | 2 +
drivers/misc/lkdtm_perms.c | 18 ++
drivers/mmc/host/davinci_mmc.c | 2 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +-
drivers/net/wireless/rsi/rsi_common.h | 1 -
drivers/soc/tegra/pmc.c | 8 +-
drivers/thermal/int340x_thermal/int3400_thermal.c | 21 +-
drivers/tty/serial/xilinx_uartps.c | 2 +-
fs/9p/v9fs.c | 21 ++
fs/9p/v9fs.h | 1 +
fs/9p/vfs_dir.c | 8 +-
fs/9p/vfs_file.c | 6 +-
fs/cifs/inode.c | 67 ++--
fs/cifs/smb2maperror.c | 3 +-
fs/ext4/ioctl.c | 7 +
fs/ext4/resize.c | 17 +-
fs/f2fs/super.c | 34 +-
include/linux/atalk.h | 2 +-
include/linux/swap.h | 4 +-
kernel/events/core.c | 2 +
kernel/hung_task.c | 30 +-
lib/div64.c | 4 +-
net/9p/protocol.c | 3 +-
net/appletalk/atalk_proc.c | 2 +-
net/appletalk/ddp.c | 37 ++-
net/appletalk/sysctl_net_atalk.c | 5 +-
sound/drivers/opl3/opl3_voice.h | 2 +-
sound/isa/sb/sb8.c | 4 +
sound/pci/echoaudio/echoaudio.c | 5 +
tools/include/linux/bitops.h | 6 +-
tools/include/linux/bits.h | 26 ++
tools/perf/Documentation/perf-config.txt | 2 +-
tools/perf/builtin-top.c | 5 +-
tools/perf/check-headers.sh | 1 +
tools/perf/tests/evsel-tp-sched.c | 1 +
tools/perf/tests/openat-syscall-all-cpus.c | 4 +-
tools/perf/util/build-id.c | 1 +
tools/perf/util/config.c | 3 +-
tools/perf/util/evsel.c | 1 +
tools/perf/util/hist.c | 4 +-
tools/perf/util/parse-events.c | 1 +
tools/power/x86/turbostat/turbostat.c | 3 +
66 files changed, 807 insertions(+), 132 deletions(-)
In the following commit, removing SIGKILL from each thread signal mask
and executing "goto fatal" directly will skip the call to
"trace_signal_deliver". At this point, the delivery tracking of the SIGKILL
signal will be inaccurate.
commit cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Therefore, we need to add trace_signal_deliver before "goto fatal"
after executing sigdelset.
Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
---
kernel/signal.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/signal.c b/kernel/signal.c
index 227ba170298e..439b742e3229 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -2441,6 +2441,8 @@ bool get_signal(struct ksignal *ksig)
if (signal_group_exit(signal)) {
ksig->info.si_signo = signr = SIGKILL;
sigdelset(&current->pending.signal, SIGKILL);
+ trace_signal_deliver(signr, &ksig->info,
+ &sighand->action[signr - 1]);
recalc_sigpending();
goto fatal;
}
--
2.14.1.windows.1
The patch titled
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
has been removed from the -mm tree. Its filename was
libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields.patch
This patch was dropped because it was withdrawn
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be
zero" fields of a 'pfn' info-block to be filled with indeterminate data.
While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like 'flags' and
'padding' it potentially means that future implementations cannot rely on
those fields being zero.
In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly initialized
to be guaranteed zero. Bump the minor version to indicate it is safe to
assume the 'padding' and 'flags' are zero. Otherwise, this corruption is
expected to be benign since all other critical fields are explicitly
initialized.
-stable note: It's not a problem unless a kernel implementation is
explicitly expecting those fields to be zero-initialized. I only
marked it for -stable in case some future kernel backports patch12.
Otherwise it's benign on older kernels that don't have patch12 since
all fields are indeed initialized.
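
The failure mode described above can be illustrated with a minimal
user-space sketch. This is not the kernel code: struct info_block,
validate(), init_block() and reserved_fields_are_zero() are hypothetical
stand-ins, and validate() only models the behaviour attributed to
nd_pfn_validate() here, namely overwriting the buffer with whatever is
currently on the media.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much smaller stand-in for an on-media info-block. */
struct info_block {
	unsigned char signature[16];
	unsigned int flags;		/* expected to be zero */
	unsigned char padding[32];	/* expected to be zero */
};

/*
 * Model of validation: the buffer is filled with the current media
 * contents before the signature is checked, so a buffer that was zeroed
 * at allocation no longer is, even when validation fails.
 */
int validate(struct info_block *buf, const struct info_block *media,
	     const char *sig)
{
	memcpy(buf, media, sizeof(*buf));
	return memcmp(buf->signature, sig, strlen(sig)) == 0 ? 0 : -1;
}

/*
 * Model of initialization after a failed validation. With clear_first
 * set, the whole buffer is zeroed before the explicit field writes,
 * which is the approach the patch takes with memset().
 */
void init_block(struct info_block *buf, const char *sig, int clear_first)
{
	size_t len = strlen(sig);

	assert(len < sizeof(buf->signature));
	if (clear_first)
		memset(buf, 0, sizeof(*buf));
	memset(buf->signature, 0, sizeof(buf->signature));
	memcpy(buf->signature, sig, len);
}

/* Returns 1 when the fields that are never explicitly written are zero. */
int reserved_fields_are_zero(const struct info_block *buf)
{
	size_t i;

	if (buf->flags != 0)
		return 0;
	for (i = 0; i < sizeof(buf->padding); i++)
		if (buf->padding[i] != 0)
			return 0;
	return 1;
}
```

In this model, calling init_block() with clear_first == 0 after a failed
validate() leaves stale media bytes in 'flags' and 'padding', while
clear_first == 1 guarantees they are zero.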
Link: http://lkml.kernel.org/r/155552639290.2015392.17304211251966796338.stgit@dw…
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Oscar Salvador <osalvador(a)suse.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/nvdimm/dax_devs.c | 2 +-
drivers/nvdimm/pfn.h | 1 +
drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
3 files changed, 17 insertions(+), 4 deletions(-)
--- a/drivers/nvdimm/dax_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/dax_devs.c
@@ -126,7 +126,7 @@ int nd_dax_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!dax_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
--- a/drivers/nvdimm/pfn_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn_devs.c
@@ -420,6 +420,15 @@ static int nd_pfn_clear_memmap_errors(st
return 0;
}
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
@@ -565,7 +574,7 @@ int nd_pfn_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!pfn_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -702,7 +711,7 @@ static int nd_pfn_init(struct nd_pfn *nd
u64 checksum;
int rc;
- pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
@@ -711,11 +720,14 @@ static int nd_pfn_init(struct nd_pfn *nd
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
/* no info block, do init */;
+ memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(&nd_pfn->dev,
@@ -768,7 +780,7 @@ static int nd_pfn_init(struct nd_pfn *nd
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
- pfn_sb->version_minor = cpu_to_le16(2);
+ pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
--- a/drivers/nvdimm/pfn.h~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn.h
@@ -36,6 +36,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+ /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
};
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
libnvdimm-pfn-stop-padding-pmem-namespaces-to-section-alignment.patch
mm-shuffle-initial-free-memory-to-improve-memory-side-cache-utilization.patch
mm-shuffle-initial-free-memory-to-improve-memory-side-cache-utilization-fix.patch
mm-move-buddy-list-manipulations-into-helpers.patch
mm-move-buddy-list-manipulations-into-helpers-fix.patch
mm-maintain-randomization-of-page-free-lists.patch
The patch titled
Subject: mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
has been added to the -mm tree. Its filename is
mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-do-not-boost-watermarks-to-avoi…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-do-not-boost-watermarks-to-avoi…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
Mikulas Patocka reported that 1c30844d2dfe ("mm: reclaim small amounts of
memory when an external fragmentation event occurs") "broke" memory
management on parisc. The machine is not NUMA but the DISCONTIG model
creates three pgdats even though it's a UMA machine for the following
ranges
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
Mikulas reported:
With the patch 1c30844d2, the kernel will incorrectly reclaim the
first zone when it fills up, ignoring the fact that there are two
completely free zones. Basically, it limits cache size to 1GiB.
For example, if I run:
# dd if=/dev/sda of=/dev/null bs=1M count=2048
- with the proper kernel, there should be "Buffers - 2GiB"
when this command finishes. With the patch 1c30844d2, buffers
will consume just 1GiB or slightly more, because the kernel was
incorrectly reclaiming them.
The page allocator and reclaim make assumptions that pgdats really
represent NUMA nodes and zones represent ranges and make decisions on
that basis. Watermark boosting for small pgdats leads to unexpected
results even though this would have behaved reasonably on SPARSEMEM.
DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM so there is no need to be fancy; this patch simply disables
watermark boosting by default on DISCONTIGMEM.
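
For reference, the boost factor that this patch zeroes on DISCONTIGMEM is
documented in units of 1/10,000 of the high watermark. The arithmetic can
be sketched as below. This is an illustration of the documented meaning
only, not a copy of the kernel implementation; max_boost_pages() and its
parameter names are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical helper: upper bound on the watermark boost, in pages,
 * for a given high watermark and boost factor. The factor is expressed
 * in fractions of 10,000, so 15000 means 150% and 0 disables boosting.
 * The quotient and remainder are scaled separately so that the
 * intermediate product stays small for realistic inputs.
 */
unsigned long long max_boost_pages(unsigned long long high_wmark_pages,
				   unsigned long long boost_factor)
{
	unsigned long long quot = high_wmark_pages / 10000;
	unsigned long long rem = high_wmark_pages % 10000;

	return quot * boost_factor + (rem * boost_factor) / 10000;
}
```

With a high watermark of 4096 pages, a factor of 15000 allows a boost of
up to 6144 pages, and a factor of 0 (the new default on DISCONTIGMEM)
allows none.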
Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reported-by: Mikulas Patocka <mpatocka(a)redhat.com>
Tested-by: Mikulas Patocka <mpatocka(a)redhat.com>
Cc: James Bottomley <James.Bottomley(a)hansenpartnership.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/sysctl/vm.txt | 16 ++++++++--------
mm/page_alloc.c | 13 +++++++++++++
2 files changed, 21 insertions(+), 8 deletions(-)
--- a/Documentation/sysctl/vm.txt~mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model
+++ a/Documentation/sysctl/vm.txt
@@ -866,14 +866,14 @@ The intent is that compaction has less w
increase the success rate of future high-order allocations such as SLUB
allocations, THP and hugetlbfs pages.
-To make it sensible with respect to the watermark_scale_factor parameter,
-the unit is in fractions of 10,000. The default value of 15,000 means
-that up to 150% of the high watermark will be reclaimed in the event of
-a pageblock being mixed due to fragmentation. The level of reclaim is
-determined by the number of fragmentation events that occurred in the
-recent past. If this value is smaller than a pageblock then a pageblocks
-worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor
-of 0 will disable the feature.
+To make it sensible with respect to the watermark_scale_factor
+parameter, the unit is in fractions of 10,000. The default value of
+15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
+watermark will be reclaimed in the event of a pageblock being mixed due
+to fragmentation. The level of reclaim is determined by the number of
+fragmentation events that occurred in the recent past. If this value is
+smaller than a pageblock then a pageblocks worth of pages will be reclaimed
+(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature.
=============================================================
--- a/mm/page_alloc.c~mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model
+++ a/mm/page_alloc.c
@@ -266,7 +266,20 @@ compound_page_dtor * const compound_page
int min_free_kbytes = 1024;
int user_min_free_kbytes = -1;
+#ifdef CONFIG_DISCONTIGMEM
+/*
+ * DiscontigMem defines memory ranges as separate pg_data_t even if the ranges
+ * are not on separate NUMA nodes. Functionally this works but with
+ * watermark_boost_factor, it can reclaim prematurely as the ranges can be
+ * quite small. By default, do not boost watermarks on discontigmem as in
+ * many cases very high-order allocations like THP are likely to be
+ * unsupported and the premature reclaim offsets the advantage of long-term
+ * fragmentation avoidance.
+ */
+int watermark_boost_factor __read_mostly;
+#else
int watermark_boost_factor __read_mostly = 15000;
+#endif
int watermark_scale_factor = 10;
static unsigned long nr_kernel_pages __initdata;
_
Patches currently in -mm which might be from mgorman(a)techsingularity.net are
mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model.patch