The patch titled
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
BUG at mm/page_alloc.c:1913
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone() the same
way as in move_freepages_block().
Link: http://lkml.kernel.org/r/1519988497-28941-1-git-send-email-neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 9 +++++++--
2 files changed, 13 insertions(+), 9 deletions(-)
diff -puN mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/memblock.c
--- a/mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_rang
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_n
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff -puN mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not as move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-memblock-hardcode-the-end_pfn-being-1.patch
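
The alignment arithmetic in the page_alloc.c hunk can be tried in isolation. The following is only a user-space sketch: PAGEBLOCK_NR_PAGES is a hypothetical stand-in for pageblock_nr_pages (512 is an assumed value, the real one depends on the configuration), and the helper names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for pageblock_nr_pages; 512 is an assumed value. */
#define PAGEBLOCK_NR_PAGES 512UL

/* Round pfn down to the first pfn of its pageblock. */
static inline unsigned long pageblock_start(unsigned long pfn)
{
	/* The mask trick only works for a power-of-two block size. */
	assert((PAGEBLOCK_NR_PAGES & (PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Value the patched loop assigns to pfn: one less than the aligned pfn,
 * so that the loop's own increment lands on the pageblock start.
 */
static inline unsigned long skip_target(unsigned long next_valid_pfn)
{
	return pageblock_start(next_valid_pfn) - 1;
}
```

With these assumed values, a next valid pfn of 1000 yields a skip target of 511, so initialization resumes at pfn 512 rather than at 1000, and the struct pages for the whole pageblock get initialized.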
The patch titled
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been added to the -mm tree. Its filename is
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-memmap_init_zone…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-memmap_init_zone…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
BUG at mm/page_alloc.c:1913
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone() the same
way as in move_freepages_block().
Link: http://lkml.kernel.org/r/1519988497-28941-1-git-send-email-neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 9 +++++++--
2 files changed, 13 insertions(+), 9 deletions(-)
diff -puN mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/memblock.c
--- a/mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_rang
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_n
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff -puN mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not as move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
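
The lookup done by the patched memblock_next_valid_pfn() can be modelled as a binary search over sorted regions. This is a simplified sketch, not the kernel function: struct pfn_region, NO_VALID_PFN and next_valid_pfn() are hypothetical names, regions are expressed in pfns instead of physical addresses, and the comparison against the region start is an assumption because that part of the function is not visible in the hunks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified region [start_pfn, end_pfn); array is sorted, non-overlapping. */
struct pfn_region {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

#define NO_VALID_PFN (~0UL)	/* plays the role of -1UL in the patch */

static inline unsigned long next_valid_pfn(const struct pfn_region *regions,
					   unsigned int cnt, unsigned long pfn)
{
	unsigned int left = 0, right = cnt, mid;

	assert(regions != NULL || cnt == 0);
	pfn++;	/* look for the first valid pfn strictly after the input */

	while (left < right) {
		mid = left + (right - left) / 2;
		if (pfn < regions[mid].start_pfn)
			right = mid;
		else if (pfn >= regions[mid].end_pfn)
			left = mid + 1;
		else
			return pfn;	/* pfn lies inside regions[mid] */
	}

	/* pfn is in a hole: right indexes the first region above it, if any. */
	if (right == cnt)
		return NO_VALID_PFN;
	return regions[right].start_pfn;
}
```

As in the patch, the sketch no longer clamps the result to a caller-supplied maximum; a caller that needs a bound has to apply it itself.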
This is the start of the stable review cycle for the 3.18.98 release.
There are 24 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Mar 4 08:42:22 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.98-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.98-rc1
Yangbo Lu <yangbo.lu(a)nxp.com>
net: gianfar_ptp: move set_fipers() to spinlock protecting area
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: make use of pre-calculated len
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix partial gntdev_mmap() cleanup
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix off-by-one error when unmapping with holes
Sergei Shtylyov <sergei.shtylyov(a)cogentembedded.com>
SolutionEngine771x: fix Ether platform data
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
mdio-sun4i: Fix a memory leak
Eduardo Otubo <otubo(a)redhat.com>
xen-netfront: enable device after manual module load
Xiongwei Song <sxwjean(a)gmail.com>
drm/ttm: check the return value of kzalloc
Tushar Dave <tushar.n.dave(a)oracle.com>
e1000: fix disabling already-disabled warning
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: check result of register_shrinker()
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: fix missed destroy of qi_tree_lock
Stefan Haberland <sth(a)linux.vnet.ibm.com>
s390/dasd: fix wrongly assigned configuration data
Matthieu CASTET <matthieu.castet(a)parrot.com>
led: core: Fix brightness setting when setting delay_off=0
Guilherme G. Piccoli <gpiccoli(a)linux.vnet.ibm.com>
bnx2x: Improve reliability in case of nested PCI errors
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Enable PHY reset in MTU change path for 5720
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Add workaround to restrict 5762 MRRS to 2048
Cathy Avery <cavery(a)redhat.com>
scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
Alexander Kochetkov <al.kochet(a)gmail.com>
net: arc_emac: fix arc_emac_rx() error paths
Radu Pirea <radu.pirea(a)microchip.com>
spi: atmel: fixed spin_lock usage inside atmel_spi_remove
Al Viro <viro(a)zeniv.linux.org.uk>
sget(): handle failures of register_shrinker()
Brendan McGrath <redmcg(a)redmandi.dyndns.org>
ipv6: icmp6: Allow icmp messages to be looped back
Sascha Hauer <s.hauer(a)pengutronix.de>
mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
Anna-Maria Gleixner <anna-maria(a)linutronix.de>
hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
Jakub Sitnicki <jkbs(a)redhat.com>
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
-------------
Diffstat:
Makefile | 4 +-
arch/sh/boards/mach-se/770x/setup.c | 10 ++++-
drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 +
drivers/leds/led-core.c | 2 +-
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 6 +--
drivers/net/ethernet/arc/emac_main.c | 53 ++++++++++++++----------
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 ++++++-
drivers/net/ethernet/broadcom/tg3.c | 13 +++++-
drivers/net/ethernet/broadcom/tg3.h | 4 ++
drivers/net/ethernet/freescale/gianfar_ptp.c | 3 +-
drivers/net/ethernet/intel/e1000/e1000.h | 3 +-
drivers/net/ethernet/intel/e1000/e1000_main.c | 27 +++++++++---
drivers/net/phy/mdio-sun4i.c | 6 ++-
drivers/net/xen-netfront.c | 1 +
drivers/s390/block/dasd_3990_erp.c | 10 +++++
drivers/scsi/storvsc_drv.c | 3 +-
drivers/spi/spi-atmel.c | 2 +-
drivers/xen/gntdev.c | 8 ++--
fs/super.c | 6 ++-
fs/xfs/xfs_qm.c | 46 +++++++++++++-------
kernel/time/hrtimer.c | 7 +++-
net/ipv6/ip6_output.c | 11 ++---
net/ipv6/route.c | 1 +
net/sctp/socket.c | 16 ++++---
25 files changed, 180 insertions(+), 82 deletions(-)
This is the start of the stable review cycle for the 4.9.86 release.
There are 56 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Mar 4 08:44:26 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.86-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.86-rc1
James Hogan <jhogan(a)kernel.org>
MIPS: Implement __multi3 for GCC7 MIPS64r6 builds
Punit Agrawal <punit.agrawal(a)arm.com>
KVM: arm/arm64: Fix check for hugepage size when allocating at Stage 2
Yangbo Lu <yangbo.lu(a)nxp.com>
net: gianfar_ptp: move set_fipers() to spinlock protecting area
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: make use of pre-calculated len
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix partial gntdev_mmap() cleanup
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix off-by-one error when unmapping with holes
Sergei Shtylyov <sergei.shtylyov(a)cogentembedded.com>
SolutionEngine771x: fix Ether platform data
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
mdio-sun4i: Fix a memory leak
Eduardo Otubo <otubo(a)redhat.com>
xen-netfront: enable device after manual module load
Venkat Duvvuru <venkatkumar.duvvuru(a)broadcom.com>
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
Luu An Phu <phu.luuan(a)nxp.com>
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
Johannes Berg <johannes.berg(a)intel.com>
mac80211: mesh: drop frames appearing to be from us
Hao Chen <flank3rsky(a)gmail.com>
nl80211: Check for the required netlink attribute presence
Alexander Duyck <alexander.h.duyck(a)intel.com>
i40e/i40evf: Account for frags split over multiple descriptors in check linearize
Felix Janda <felix.janda(a)posteo.de>
uapi libc compat: add fallback for unsupported libcs
Xiongwei Song <sxwjean(a)gmail.com>
drm/ttm: check the return value of kzalloc
SZ Lin (林上智) <sz.lin(a)moxa.com>
NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625
Tushar Dave <tushar.n.dave(a)oracle.com>
e1000: fix disabling already-disabled warning
Gao Feng <gfree.wind(a)vip.163.com>
macvlan: Fix one possible double free
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: check result of register_shrinker()
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: fix missed destroy of qi_tree_lock
Erez Shitrit <erezsh(a)mellanox.com>
IB/ipoib: Fix race condition in neigh creation
Leon Romanovsky <leonro(a)mellanox.com>
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Stefan Haberland <sth(a)linux.vnet.ibm.com>
s390/dasd: fix wrongly assigned configuration data
Guenter Roeck <linux(a)roeck-us.net>
genirq: Guard handle_bad_irq log messages
Nitzan Carmi <nitzanc(a)mellanox.com>
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
Matthieu CASTET <matthieu.castet(a)parrot.com>
led: core: Fix brightness setting when setting delay_off=0
Guilherme G. Piccoli <gpiccoli(a)linux.vnet.ibm.com>
bnx2x: Improve reliability in case of nested PCI errors
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Enable PHY reset in MTU change path for 5720
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Add workaround to restrict 5762 MRRS to 2048
Tommi Rantala <tommi.t.rantala(a)nokia.com>
tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
Tommi Rantala <tommi.t.rantala(a)nokia.com>
tipc: error path leak fixes in tipc_enable_bearer()
James Hogan <jhogan(a)kernel.org>
lib/mpi: Fix umul_ppmm() for MIPS64r6
Arnd Bergmann <arnd(a)arndb.de>
ARM: dts: ls1021a: fix incorrect clock references
Cathy Avery <cavery(a)redhat.com>
scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
Fredrik Hallenberg <megahallon(a)gmail.com>
net: stmmac: Fix TX timestamp calculation
Xin Long <lucien.xin(a)gmail.com>
ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
Alexander Kochetkov <al.kochet(a)gmail.com>
net: arc_emac: fix arc_emac_rx() error paths
Sean Wang <sean.wang(a)mediatek.com>
net: mediatek: setup proper state for disabled GMAC on the default
Abhijeet Kumar <abhijeet.kumar(a)intel.com>
ASoC: nau8825: fix issue that pop noise when start capture
Radu Pirea <radu.pirea(a)microchip.com>
spi: atmel: fixed spin_lock usage inside atmel_spi_remove
Jia-Ju Bai <baijiaju1990(a)163.com>
mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
Karol Herbst <kherbst(a)redhat.com>
drm/nouveau/pci: do a msi rearm on init
Alexey Khoroshilov <khoroshilov(a)ispras.ru>
net: phy: xgene: disable clk on error paths
Al Viro <viro(a)zeniv.linux.org.uk>
sget(): handle failures of register_shrinker()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
x86/asm: Allow again using asm.h when building for the 'bpf' clang target
Chunyan Zhang <zhang.lyra(a)gmail.com>
ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
Brendan McGrath <redmcg(a)redmandi.dyndns.org>
ipv6: icmp6: Allow icmp messages to be looped back
Albert Hsieh <wen.hsieh(a)broadcom.com>
mtd: nand: brcmnand: Zero bitflip is not an error
Sascha Hauer <s.hauer(a)pengutronix.de>
mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
Keith Busch <keith.busch(a)intel.com>
nvme: check hw sectors before setting chunk sectors
Andreas Platschek <andreas.platschek(a)opentech.at>
dmaengine: fsl-edma: disable clks on all error paths
Yunlei He <heyunlei(a)huawei.com>
f2fs: fix a bug caused by NULL extent tree
Ben Gardner <gardner.ben(a)gmail.com>
i2c: designware: must wait for enable
Anna-Maria Gleixner <anna-maria(a)linutronix.de>
hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/ls1021a-qds.dts | 2 +-
arch/arm/boot/dts/ls1021a-twr.dts | 2 +-
arch/arm/kvm/mmu.c | 2 +-
arch/arm/lib/csumpartialcopyuser.S | 4 ++
arch/mips/lib/Makefile | 3 +-
arch/mips/lib/libgcc.h | 17 +++++++
arch/mips/lib/multi3.c | 54 +++++++++++++++++++++
arch/sh/boards/mach-se/770x/setup.c | 10 +++-
arch/x86/include/asm/asm.h | 2 +
drivers/dma/fsl-edma.c | 28 +++++------
drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c | 7 +++
drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 +
drivers/i2c/busses/i2c-designware-core.c | 2 +-
drivers/infiniband/hw/mlx4/mr.c | 2 +-
drivers/infiniband/hw/mlx5/mr.c | 1 +
drivers/infiniband/ulp/ipoib/ipoib_main.c | 25 +++++++---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 5 +-
drivers/leds/led-core.c | 2 +-
drivers/mtd/nand/brcmnand/brcmnand.c | 2 +-
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 6 +--
drivers/net/can/flexcan.c | 2 +-
drivers/net/ethernet/arc/emac_main.c | 53 ++++++++++++---------
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 +++++-
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
drivers/net/ethernet/broadcom/tg3.c | 13 ++++-
drivers/net/ethernet/broadcom/tg3.h | 4 ++
drivers/net/ethernet/freescale/gianfar_ptp.c | 3 +-
drivers/net/ethernet/intel/e1000/e1000.h | 3 +-
drivers/net/ethernet/intel/e1000/e1000_main.c | 27 +++++++++--
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 ++++++++--
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 26 ++++++++--
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 11 +++--
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 6 ++-
drivers/net/macvlan.c | 7 ++-
drivers/net/phy/mdio-sun4i.c | 6 ++-
drivers/net/phy/mdio-xgene.c | 21 ++++++---
drivers/net/usb/qmi_wwan.c | 2 +
drivers/net/wireless/mac80211_hwsim.c | 2 +-
drivers/net/xen-netfront.c | 1 +
drivers/nvme/host/core.c | 3 +-
drivers/s390/block/dasd_3990_erp.c | 10 ++++
drivers/scsi/storvsc_drv.c | 3 +-
drivers/spi/spi-atmel.c | 2 +-
drivers/xen/gntdev.c | 8 ++--
fs/f2fs/extent_cache.c | 12 ++++-
fs/super.c | 6 ++-
fs/xfs/xfs_qm.c | 46 +++++++++++-------
include/uapi/linux/libc-compat.h | 55 +++++++++++++++++++++-
kernel/irq/debug.h | 5 ++
kernel/time/hrtimer.c | 7 ++-
lib/mpi/longlong.h | 18 ++++++-
net/ipv6/ip6_tunnel.c | 9 +++-
net/ipv6/route.c | 1 +
net/mac80211/rx.c | 2 +
net/sctp/socket.c | 16 ++++---
net/tipc/bearer.c | 5 +-
net/tipc/monitor.c | 6 ++-
net/wireless/nl80211.c | 3 +-
sound/soc/codecs/nau8825.c | 1 +
61 files changed, 498 insertions(+), 135 deletions(-)
The patch titled
Subject: mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
has been added to the -mm tree. Its filename is
mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-gup-teach-get_user_pages_unlock…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-gup-teach-get_user_pages_unlock…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
KVM is hanging during postcopy live migration with userfaultfd because
get_user_pages_unlocked is not capable of handling FOLL_NOWAIT.
Earlier FOLL_NOWAIT was only ever passed to get_user_pages.
Specifically faultin_page (the callee of get_user_pages_unlocked caller)
doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the page fault
flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't actually
released (even if nonblocking is not NULL). So it sets *nonblocking to
zero and the caller won't release the mmap_sem thinking it was already
released, but it wasn't because of FOLL_NOWAIT.
Link: http://lkml.kernel.org/r/20180302174343.5421-2-aarcange@redhat.com
Fixes: ce53053ce378c ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Dr. David Alan Gilbert <dgilbert(a)redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff -puN mm/gup.c~mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait mm/gup.c
--- a/mm/gup.c~mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait
+++ a/mm/gup.c
@@ -516,7 +516,7 @@ static int faultin_page(struct task_stru
}
if (ret & VM_FAULT_RETRY) {
- if (nonblocking)
+ if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
*nonblocking = 0;
return -EBUSY;
}
@@ -890,7 +890,10 @@ static __always_inline long __get_user_p
break;
}
if (*locked) {
- /* VM_FAULT_RETRY didn't trigger */
+ /*
+ * VM_FAULT_RETRY didn't trigger or it was a
+ * FOLL_NOWAIT.
+ */
if (!pages_done)
pages_done = ret;
break;
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait.patch
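
The lock bookkeeping described in the changelog above can be illustrated with a small user-space model. It is a sketch under stated assumptions: the MODEL_* flag values and both function names are hypothetical and are not the kernel's definitions; only the condition itself is taken from the faultin_page() hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's definitions. */
#define MODEL_FAULT_RETRY	0x1u	/* stands in for VM_FAULT_RETRY */
#define MODEL_FLAG_RETRY_NOWAIT	0x2u	/* stands in for FAULT_FLAG_RETRY_NOWAIT */

/* True only if a retry was requested and the lock was really released. */
static inline bool lock_was_dropped(unsigned int fault_ret, unsigned int fault_flags)
{
	return (fault_ret & MODEL_FAULT_RETRY) &&
	       !(fault_flags & MODEL_FLAG_RETRY_NOWAIT);
}

/*
 * Mirrors the patched check: clear *nonblocking only when the lock was
 * dropped. Returns true where the patched code would return -EBUSY.
 */
static inline bool model_faultin(unsigned int fault_ret, unsigned int fault_flags,
				 int *nonblocking)
{
	/* The caller is assumed to hold the lock on entry. */
	assert(!nonblocking || *nonblocking);

	if (!(fault_ret & MODEL_FAULT_RETRY))
		return false;
	if (nonblocking && lock_was_dropped(fault_ret, fault_flags))
		*nonblocking = 0;
	return true;
}
```

In this model a retry with the NOWAIT flag leaves the caller's "locked" variable at 1, so the caller still releases the lock itself, which is the behaviour the patch restores.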
On Fri, Mar 02, 2018 at 05:41:28PM +0000, Harsh Shandilya wrote:
> On Fri 2 Mar, 2018, 2:25 PM Greg Kroah-Hartman, <gregkh(a)linuxfoundation.org>
> wrote:
>
> > This is the start of the stable review cycle for the 3.18.98 release.
> > There are 24 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Mar 4 08:42:22 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >
> > https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.98-rc…
> > or in the git tree and branch at:
> > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
> > and the diffstat can be found below.
> >
>
> Builds and boots on the OnePlus 3T, no merge issues with CAF's msm-3.18
> tree.
Great, thanks for testing and letting me know.
greg k-h
When SPI transfers can be offloaded using DMA, the SPI core needs to
build a scatterlist to make sure that the buffer to be transferred is
dma-able.
This patch fixes the scatterlist entry size computation in the case
where the maximum acceptable scatterlist entry supported by the DMA
controller is less than PAGE_SIZE, when the buffer is vmalloced.
For each entry, the actual size is given by the minimum between the
desc_len (which is the max buffer size supported by the DMA controller)
and the remaining buffer length until we cross a page boundary.
Fixes: 65598c13fd66 ("spi: Fix per-page mapping of unaligned vmalloc-ed buffer")
Signed-off-by: Maxime Chevallier <maxime.chevallier(a)bootlin.com>
---
drivers/spi/spi.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index b33a727a0158..4153f959f28c 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -779,8 +779,14 @@ static int spi_map_buf(struct spi_controller *ctlr, struct device *dev,
for (i = 0; i < sgs; i++) {
if (vmalloced_buf || kmap_buf) {
- min = min_t(size_t,
- len, desc_len - offset_in_page(buf));
+ /*
+ * Next scatterlist entry size is the minimum between
+ * the desc_len and the remaining buffer length that
+ * fits in a page.
+ */
+ min = min_t(size_t, desc_len,
+ min_t(size_t, len,
+ PAGE_SIZE - offset_in_page(buf)));
if (vmalloced_buf)
vm_page = vmalloc_to_page(buf);
else
--
2.11.0
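
The entry size rule from the patch above can be checked with a few numbers in user space. This is a sketch only: MODEL_PAGE_SIZE is an assumed page size of 4096 bytes, and all helper names are hypothetical; the formula itself is the one from the spi_map_buf() hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this sketch */

static inline size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Offset of an address within its page, like the kernel's offset_in_page(). */
static inline size_t model_offset_in_page(uintptr_t addr)
{
	return (size_t)(addr & (MODEL_PAGE_SIZE - 1));
}

/* Size of the next scatterlist entry for a buffer at addr with len bytes left. */
static inline size_t next_entry_size(uintptr_t addr, size_t len, size_t desc_len)
{
	assert(desc_len > 0);
	return min_size(desc_len,
			min_size(len, MODEL_PAGE_SIZE - model_offset_in_page(addr)));
}

/* Number of entries needed to cover the whole buffer with the rule above. */
static inline size_t count_entries(uintptr_t addr, size_t len, size_t desc_len)
{
	size_t n = 0;

	while (len > 0) {
		size_t chunk = next_entry_size(addr, len, desc_len);

		addr += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```

For example, with a desc_len of 512 and a buffer starting 4000 bytes into a page, the first entry is limited to the 96 bytes left in that page, and later entries are limited to 512 bytes by the DMA controller.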
On 02/03/18 15:58, Simon Gaiser wrote:
> Juergen Gross:
>> On 20/02/18 05:56, Simon Gaiser wrote:
>>> Juergen Gross:
>>>> On 07/02/18 23:22, Simon Gaiser wrote:
>>>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>>>
>>>>> Before on an error response to XS_TRANSACTION_END
>>>>> xenbus_dev_request_and_reply() would not decrement the active
>>>>> transaction counter. But xenbus_transaction_end() has always counted the
>>>>> transaction as finished regardless of the response.
>>>>
>>>> Which is correct now. Xenstore will free all transaction related
>>>> data regardless of the response. A once failed transaction can't
>>>> be repaired, it has to be repeated completely.
>>>
>>> So if xenstore frees the transaction why should we keep it in the list
>>> with pending transaction in xenbus_dev_frontend? That's exactly what
>>> this patch fixes by always removing it from the list, not only on a
>>> successful response (See below for the EINVAL case).
>>
>> Aah, sorry, I seem to have misread my own coding. :-(
>>
>> Yes, you are right. Sorry for not seeing it before.
>>
>>>
>>> [...]
>>>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>>>> transaction twice corrupts the reference count. So fix this by also
>>>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>>>> regardless of the return code.
>>>>
>>>> A transaction in the list of transactions should not be considered to be
>>>> finished. Either it is not on the list or it is still pending.
>>>
>>> With "considering a transaction closed" I mean "take the code path which
>>> removes the transaction from the list with pending transactions".
>>>
>>> From the follow-up mail:
>>>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>>>> xenbus_transaction_end() will always count the transaction as finished
>>>>>> regardless of the response code (handled in xs_request_exit()).
>>>>>
>>>>> ENOENT should not decrement the transaction counter, while all
>>>>> other responses to XS_TRANSACTION_END should still do so.
>>>>
>>>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>>>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>>>> for ENOENT, though.
>>>>
>>>> What should be handled is EINVAL: this would happen if a user specified
>>>> a string different from "T" and "F".
>>>
>>> Ok, I will handle those cases in xs_request_exit(). Although I don't
>>> like that this depends on the internals of xenstore (At least to me it's
>>> not obvious why it should only return ENOENT or EINVAL in this case).
>>>
>>> In the xenbus_write_transaction() case checking the string before
>>> sending the transaction (like the transaction itself is verified) would
>>> avoid this problem.
>>
>> Right. I'd prefer this solution.
>>
>> Remains the only problem you tried to tackle with your second patch: a
>> kernel driver going crazy and ending transactions it never started (or
>> ending them multiple times). The EINVAL case can't happen here, but
>> ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
>> need to keep track of the transactions like in the user interface and
>> refuse ending an unknown transaction. Or you trust the kernel users.
>> Trying to fix the usage counter seems to be the wrong approach IMO.
>
> The point of the second patch was to detect such bugs. This would have
> saved quite some time to find this bug. I added the "fix" of the counter
> just because it was trivial after having the if there.
>
> Adding tracking seems to be a quite complex solution for a _potential_
> problem.
I agree.
> So I would go with checking ENOENT in xs_request_exit(). Should this be
> WARN_ON_ONCE()? Since this normally should not happen I would say yes.
Yes, having a WARN_ON_ONCE here will help.
> Should I keep the reference counter sanity check? And if yes, with the
> "fix" to the counter?
I'd drop it. This really should not happen and blowing up kernel size
with checks of impossible situations isn't the way to go.
In case you really want to do something here you can add something like
ASSERT(xs_state_users) before decrementing the counter.
Juergen
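
The accounting agreed on in this reply (skip the decrement on ENOENT and warn once, decrement for every other response) can be sketched in user space. Everything here is an illustrative assumption: struct xs_model and model_transaction_end_reply() are hypothetical, the warning is a plain fprintf() standing in for WARN_ON_ONCE(), and the real xs_request_exit() is not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state; the real driver keeps its own counter. */
struct xs_model {
	unsigned int users;	/* active transactions, models xs_state_users */
	bool warned;		/* emulates the once-only part of WARN_ON_ONCE() */
};

/*
 * Account for the reply to an XS_TRANSACTION_END request.
 * err is 0 on success or a positive errno value taken from the error reply.
 */
static inline void model_transaction_end_reply(struct xs_model *m, int err)
{
	if (err == ENOENT) {
		/* xenstore did not know the transaction: nothing to account for. */
		if (!m->warned) {
			fprintf(stderr, "warning: ending unknown transaction\n");
			m->warned = true;
		}
		return;
	}

	assert(m->users > 0);	/* the ASSERT(xs_state_users) idea from the mail */
	m->users--;
}
```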
On 20/02/18 05:56, Simon Gaiser wrote:
> Juergen Gross:
>> On 07/02/18 23:22, Simon Gaiser wrote:
>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>
>>> Before on an error response to XS_TRANSACTION_END
>>> xenbus_dev_request_and_reply() would not decrement the active
>>> transaction counter. But xenbus_transaction_end() has always counted the
>>> transaction as finished regardless of the response.
>>
>> Which is correct now. Xenstore will free all transaction related
>> data regardless of the response. A once failed transaction can't
>> be repaired, it has to be repeated completely.
>
> So if xenstore frees the transaction why should we keep it in the list
> with pending transaction in xenbus_dev_frontend? That's exactly what
> this patch fixes by always removing it from the list, not only on a
> successful response (See below for the EINVAL case).
Aah, sorry, I seem to have misread my own coding. :-(
Yes, you are right. Sorry for not seeing it before.
>
> [...]
>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>> transaction twice corrupts the reference count. So fix this by also
>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>> regardless of the return code.
>>
>> A transaction in the list of transactions should not be considered to be
>> finished. Either it is not on the list or it is still pending.
>
> With "considering a transaction closed" I mean "take the code path which
> removes the transaction from the list with pending transactions".
>
> From the follow-up mail:
>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>> xenbus_transaction_end() will always count the transaction as finished
>>>> regardless of the response code (handled in xs_request_exit()).
>>>
>>> ENOENT should not decrement the transaction counter, while all
>>> other responses to XS_TRANSACTION_END should still do so.
>>
>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>> for ENOENT, though.
>>
>> What should be handled is EINVAL: this would happen if a user specified
>> a string different from "T" and "F".
>
> Ok, I will handle those cases in xs_request_exit(). Although I don't
> like that this depends on the internals of xenstore (At least to me it's
> not obvious why it should only return ENOENT or EINVAL in this case).
>
> In the xenbus_write_transaction() case checking the string before
> sending the transaction (like the transaction itself is verified) would
> avoid this problem.
Right. I'd prefer this solution.
Remains the only problem you tried to tackle with your second patch: a
kernel driver going crazy and ending transactions it never started (or
ending them multiple times). The EINVAL case can't happen here, but
ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
need to keep track of the transactions like in the user interface and
refuse ending an unknown transaction. Or you trust the kernel users.
Trying to fix the usage counter seems to be the wrong approach IMO.
Juergen
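
The preferred way to avoid EINVAL, checking the string before the request is sent, could look roughly like the following. This is a sketch, not the driver code: check_transaction_end_payload() is a hypothetical helper, and it assumes the payload is a NUL-terminated string whose length includes the terminator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Validate the payload of an XS_TRANSACTION_END request before sending it.
 * Returns 0 for "T" or "F", -EINVAL for anything else.
 * Assumption: len counts the bytes of body including the trailing NUL.
 */
static inline int check_transaction_end_payload(const char *body, size_t len)
{
	assert(body != NULL);

	if (len != 2 || body[1] != '\0')
		return -EINVAL;
	if (body[0] != 'T' && body[0] != 'F')
		return -EINVAL;
	return 0;
}
```

Rejecting a bad payload locally means the request never reaches xenstore, so the transaction stays on the pending list and no error reply has to be interpreted.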