On Tue, Oct 10, 2017 at 10:31 AM, Julia Lawall <julia.lawall(a)lip6.fr> wrote:
>
>
> On Tue, 10 Oct 2017, Levin, Alexander (Sasha Levin) wrote:
>
>> (Cc'ed Julia)
>>
>> On Mon, Oct 09, 2017 at 09:33:01AM -0700, Laura Abbott wrote:
>> >On 10/06/2017 08:10 PM, Levin, Alexander (Sasha Levin) wrote:
>> >> We are experimenting with using neural network to aid with patch
>> >> selection for stable kernel trees. There are quite a few commits that
>> >> were not marked for stable, but are stable material, and we're trying
>> >> to get them into their appropriate kernel trees.
>> >>
>> >
>> >Apart from the practical which has been covered, I'd be interested
>> >in hearing about the details of how this works if you can share
>> >them.
>>
>> This work is based on Julia's work
>> (https://soarsmu.github.io/papers/icse12-patch.pdf) to identify
>> commits that fix bugs.
>>
>> Essentially, my approach to this is to extract as much information as
>> possbile form the commit, including things such as:
>>
>> - How many times a certain word appeared in the message
>> - Who is the author
>> - Code metrics
>> - etc
>>
>> In my case, I end up with about 30,000 of these "inputs", and train a
>> neural network based on whether a given commit was included in a
>> stable tree or not.
>>
>> This approach has a few drawbacks compared to the one Julia
>> described in her paper:
>>
>> - Not every bug fixing commit ends up in stable (some end up in -rc
>> fixing a bug from the current merge window).
>> - Same as above, but for commits we miss and fail to add to stable.
>> - Sometimes commits get added to stable even though they don't follow
>> the rules at all (security fixes are a simple example).
>>
>> But it does seem to be effective at finding bug fixing commits that
>> should be in stable.
>>
>> At this stage we are still trying to figure out what a "bug fixing"
>> commit really is. For example, an observation we recently made was
>> that the code metrics actually don't have much weight in determining
>> whether a commit should be in stable or not.
>>
>> As we just started, I'm still experimenting with a few approaches, and
>> I belive Julia is waiting for a new student to take over this, so we
>> don't have any big insights to share just yet :)
>
> That's a good summary of the current status. Thanks!
>
> julia
I just started noticing the AUTOSEL tags yesterday and I think that's
a great idea to tag patches, but was there any thought to also putting
something in the commit message this way they're easily identifiable
in the git logs? I think it would be useful if there was some metadata
in the commit message which identified that it was selected through
some automated system. That way if I find a regression and it
identifies one of these commits I can know that maybe it was chosen
incorrectly, and also would allow me to alert the owner of the
selection script to better help refine its selection process.
Otherwise I'd have to track back through the mailing lists to see how
it landed in the stable release.
Just a thought. Also, thank you for trying to improve the stable kernels!
--
Josh
This is the start of the stable review cycle for the 4.4.98 release.
There are 56 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Nov 15 12:55:32 UTC 2017.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.98-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.98-rc1
Colin Ian King <colin.king(a)canonical.com>
PKCS#7: fix unitialized boolean 'want'
Borislav Petkov <bp(a)suse.de>
x86/oprofile/ppro: Do not use __this_cpu*() in preemptible context
Richard Schütz <rschuetz(a)uni-koblenz.de>
can: c_can: don't indicate triple sampling support for D_CAN
Gerhard Bertelsmann <info(a)gerhard-bertelsmann.de>
can: sun4i: handle overrun in RX FIFO
Ilya Dryomov <idryomov(a)gmail.com>
rbd: use GFP_NOIO for parent stat and data requests
Sinclair Yeh <syeh(a)vmware.com>
drm/vmwgfx: Fix Ubuntu 17.10 Wayland black screen issue
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Input: elan_i2c - add ELAN060C to the ACPI table
Oswald Buddenhagen <oswald.buddenhagen(a)gmx.de>
MIPS: AR7: Ensure that serial ports are properly set up
Jonas Gorski <jonas.gorski(a)gmail.com>
MIPS: AR7: Defer registration of GPIO
Luis R. Rodriguez <mcgrof(a)kernel.org>
tools: firmware: check for distro fallback udev cancel rule
Luis R. Rodriguez <mcgrof(a)kernel.org>
selftests: firmware: send expected errors to /dev/null
Brian Norris <computersforpeace(a)gmail.com>
selftests: firmware: add empty string and async tests
Brian Norris <computersforpeace(a)gmail.com>
test: firmware_class: report errors properly on failure
Matt Redfearn <matt.redfearn(a)imgtec.com>
MIPS: SMP: Fix deadlock & online race
Matija Glavinic Pecotic <matija.glavinic-pecotic.ext(a)nokia.com>
MIPS: Fix race on setting and getting cpu_online_mask
Matt Redfearn <matt.redfearn(a)imgtec.com>
MIPS: SMP: Use a completion event to signal CPU up
Paul Burton <paul.burton(a)mips.com>
MIPS: Fix CM region target definitions
Gustavo A. R. Silva <garsilva(a)embeddedor.com>
MIPS: microMIPS: Fix incorrect mask in insn_table_MM
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: Avoid invalid lockdep class warning
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: Fix OSS sysex delivery in OSS emulation
Mark Rutland <mark.rutland(a)arm.com>
ARM: 8720/1: ensure dump_instr() checks addr_limit
Eric Biggers <ebiggers(a)google.com>
KEYS: fix NULL pointer dereference during ASN.1 parsing [ver #2]
Andrey Ryabinin <aryabinin(a)virtuozzo.com>
crypto: x86/sha1-mb - fix panic due to unaligned access
Li Bin <huawei.libin(a)huawei.com>
workqueue: Fix NULL pointer dereference
Peter Zijlstra <peterz(a)infradead.org>
x86/uaccess, sched/preempt: Verify access_ok() context
Carlo Caione <carlo(a)endlessm.com>
platform/x86: hp-wmi: Do not shadow error values
Carlo Caione <carlo(a)endlessm.com>
platform/x86: hp-wmi: Fix error value for hp_wmi_tablet_state
Eric Biggers <ebiggers(a)google.com>
KEYS: trusted: fix writing past end of buffer in trusted_read()
Eric Biggers <ebiggers(a)google.com>
KEYS: trusted: sanitize all key material
Enrico Mioso <mrkiko.rs(a)gmail.com>
cdc_ncm: Set NTB format again after altsetting switch for Huawei devices
Carlo Caione <carlo(a)endlessm.com>
platform/x86: hp-wmi: Fix detection for dock and tablet mode
Vivien Didelot <vivien.didelot(a)savoirfairelinux.com>
net: dsa: select NET_SWITCHDEV
Julian Wiedmann <jwi(a)linux.vnet.ibm.com>
s390/qeth: issue STARTLAN as first IPA command
Feras Daoud <ferasda(a)mellanox.com>
IB/ipoib: Change list_del to list_del_init in the tx object
Akinobu Mita <akinobu.mita(a)gmail.com>
Input: mpr121 - set missing event capability
Akinobu Mita <akinobu.mita(a)gmail.com>
Input: mpr121 - handle multiple bits change of status register
Gilad Ben-Yossef <gilad(a)benyossef.com>
IPsec: do not ignore crypto err in ah4 input
Liping Zhang <zlpnobody(a)gmail.com>
netfilter: nft_meta: deal with PACKET_LOOPBACK in netdev family
William wu <wulf(a)rock-chips.com>
usb: hcd: initialize hcd->flags to 0 when rm hcd
Laurent Pinchart <laurent.pinchart+renesas(a)ideasonboard.com>
serial: sh-sci: Fix register offsets for the IRDA serial port
Volodymyr Bendiuga <volodymyr.bendiuga(a)gmail.com>
phy: increase size of MII_BUS_ID_SIZE and bus_id
David Lechner <david(a)lechnology.com>
dt-bindings: Add vendor prefix for LEGO
David Lechner <david(a)lechnology.com>
dt-bindings: Add LEGO MINDSTORMS EV3 compatible specification
Alison Schofield <amsfield22(a)gmail.com>
iio: trigger: free trigger resource correctly
Li Zhong <zhong(a)linux.vnet.ibm.com>
crypto: vmx - disable preemption to enable vsx in aes_ctr.c
Tony Lindgren <tony(a)atomide.com>
ARM: omap2plus_defconfig: Fix probe errors on UARTs 5 and 6
Valentin Longchamp <valentin.longchamp(a)keymile.com>
powerpc/corenet: explicitly disable the SDHC controller on kmcoge4
Nate Watterson <nwatters(a)codeaurora.org>
iommu/arm-smmu-v3: Clear prior settings when updating STEs
Li Zhong <zhong(a)linux.vnet.ibm.com>
KVM: PPC: Book 3S: XICS: correct the real mode ICP rejecting counter
Noralf Trønnes <noralf(a)tronnes.org>
drm: drm_minor_register(): Clean up debugfs on failure
Harninder Rai <harninder.rai(a)nxp.com>
dt-bindings: clockgen: Add compatible string for LS1012A
Patrick Bruenn <p.bruenn(a)beckhoff.com>
ARM: dts: imx53-qsb-common: fix FEC pinmux config
Juergen Gross <jgross(a)suse.com>
xen/netback: set default upper limit of tx/rx queues to 8
Jason Gunthorpe <jgunthorpe(a)obsidianresearch.com>
PCI: mvebu: Handle changes to the bridge windows while enabled
Maciej W. Rozycki <macro(a)linux-mips.org>
video: fbdev: pmag-ba-fb: Remove bad `__init' annotation
Lars-Peter Clausen <lars(a)metafoo.de>
adv7604: Initialize drive strength to default when using DT
-------------
Diffstat:
Documentation/devicetree/bindings/arm/davinci.txt | 4 +
.../devicetree/bindings/clock/qoriq-clock.txt | 1 +
.../devicetree/bindings/vendor-prefixes.txt | 1 +
Makefile | 4 +-
arch/arm/boot/dts/imx53-qsb-common.dtsi | 20 ++--
arch/arm/configs/omap2plus_defconfig | 1 +
arch/arm/kernel/traps.c | 28 ++++--
arch/mips/ar7/platform.c | 5 +
arch/mips/ar7/prom.c | 2 -
arch/mips/include/asm/mips-cm.h | 4 +-
arch/mips/kernel/process.c | 4 +-
arch/mips/kernel/smp.c | 29 ++++--
arch/mips/mm/uasm-micromips.c | 2 +-
arch/powerpc/boot/dts/fsl/kmcoge4.dts | 4 +
arch/powerpc/kvm/book3s_hv_rm_xics.c | 5 +-
arch/sh/kernel/cpu/sh3/setup-sh770x.c | 1 -
arch/x86/crypto/sha-mb/sha1_mb_mgr_flush_avx2.S | 12 +--
arch/x86/include/asm/uaccess.h | 14 ++-
arch/x86/oprofile/op_model_ppro.c | 4 +-
crypto/asymmetric_keys/pkcs7_parser.c | 2 +-
drivers/block/rbd.c | 4 +-
drivers/crypto/vmx/aes_ctr.c | 6 ++
drivers/gpu/drm/drm_drv.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +-
drivers/iio/trigger/iio-trig-interrupt.c | 8 +-
drivers/iio/trigger/iio-trig-sysfs.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2 +-
drivers/input/keyboard/mpr121_touchkey.c | 24 +++--
drivers/input/mouse/elan_i2c_core.c | 1 +
drivers/iommu/arm-smmu-v3.c | 10 +-
drivers/media/i2c/adv7604.c | 3 +
drivers/net/can/c_can/c_can_pci.c | 1 -
drivers/net/can/c_can/c_can_platform.c | 1 -
drivers/net/can/sun4i_can.c | 12 ++-
drivers/net/usb/cdc_ncm.c | 28 ++++++
drivers/net/usb/huawei_cdc_ncm.c | 6 ++
drivers/net/xen-netback/netback.c | 6 +-
drivers/pci/host/pci-mvebu.c | 101 ++++++++++++---------
drivers/platform/x86/hp-wmi.c | 60 +++++++-----
drivers/s390/net/qeth_core.h | 1 -
drivers/s390/net/qeth_core_main.c | 21 ++++-
drivers/s390/net/qeth_l2_main.c | 15 ---
drivers/s390/net/qeth_l3_main.c | 15 ---
drivers/staging/iio/trigger/iio-trig-bfin-timer.c | 4 +-
drivers/tty/serial/sh-sci.c | 17 ++--
drivers/usb/core/hcd.c | 1 +
drivers/video/fbdev/pmag-ba-fb.c | 2 +-
include/linux/phy.h | 8 +-
include/linux/preempt.h | 21 +++--
include/linux/usb/cdc_ncm.h | 1 +
include/sound/seq_kernel.h | 3 +-
kernel/workqueue_internal.h | 3 +-
lib/asn1_decoder.c | 4 +-
lib/test_firmware.c | 11 ++-
net/dsa/Kconfig | 5 +-
net/ipv4/ah4.c | 3 +
net/netfilter/nft_meta.c | 28 +++++-
security/keys/trusted.c | 71 +++++++--------
sound/core/seq/oss/seq_oss_midi.c | 4 +-
sound/core/seq/oss/seq_oss_readq.c | 29 ++++++
sound/core/seq/oss/seq_oss_readq.h | 2 +
tools/testing/selftests/firmware/fw_filesystem.sh | 10 +-
tools/testing/selftests/firmware/fw_userhelper.sh | 28 +++++-
63 files changed, 468 insertions(+), 265 deletions(-)
I've cc'ed some folks in hopes to get this resolved upstream.
Either way, 4.1's EoL was previously moved to about 6 months from now,
so hopefully we'll have more than enough time to get this resolved.
On Sat, Nov 11, 2017 at 10:13:55PM +0000, Tuncer Ayaz wrote:
>The predicament I'm in on my machines is that ever since drm-intel has
>implemented atomic modesetting, there's a list regressions caused by
>those fundamental architecture changes and the code churn it implied.
>This means 4.1 is (from what I can tell) the last kernel before atomic
>modesetting was added and the only kernel free of all those issues
>which necessitate trying out various combinations of flags on the
>kernel cmdline.
>
>For instance, right now I'm trying 4.13.12 with these flags:
>video=SVIDEO-1:d
>i915.semaphores=1
>i915.enable_rc6=0
>i915.enable_psr=0
>intel_iommu=igfx_off
>
>PS: I'm kinda confused how anyone uses DMAR with VT-d when it's known
>to be buggy.
>
>The flags seem to decrease the chances of provoking the bugs, but after a
>day of running Xorg, it's possible to still hit the RCS0 GPU hangs.
>
>If you don't pass video=SVIDEO-1:d, then atomic's flip_done times out
>on boot or exit to VT console. It's good that other people have the same
>issues and have been following the bugzilla tickets, and con confirm
>the results.
>
>I'm kinda glad I don't have a machine that's newer than Sandybridge
>since that means I can use 4.1, though it's not a long-term solution,
>and the plan is for the reported bugzilla tickets to be resolved at
>some point, or me switching away from Intel GPUs, which might be
>doable if I save money and get an AMD APU laptop next summer and
>switch my desktop to a discrete GPU.
>
>For example:
>https://bugs.freedesktop.org/show_bug.cgi?id=101237
>https://bugs.freedesktop.org/show_bug.cgi?id=103076
>https://bbs.archlinux.org/viewtopic.php?id=218581&p=3
>https://bugs.archlinux.org/task/51703
>
>So, since 4.4, 4.9 and 4.12, drm-tip are still regressive,
>I wanted to ask if you considered pushing back 4.1's EOL.
>
>Given a look at bugzilla, I have the impression that those issues will
>need at least another year before they're fixed, since most of them
>have been sitting there for many, many months. I suspect the Intel DRM
>team doesn't have the bandwidth to address the issues in a timely
>fashion while still adding upbringing for new GPUs and features
>(fences, etc.).
>
>The generic modesetting DDX and Wayland are less susceptible to the
>GPU hangs, but can be made to provoke it if tried long enough.
>However, the modesetting DDX tears heavily and is about to gain atomic
>modesetting in the next Xorg release, so will suffer from the same
>easy GPU hang likelihood.
>
>Prior to SandyBridge there was zero tearing but beginning with
>SandyBridge xf86-video-intel's TearFree=TRUE is the only reliable way
>to fix Xorg tearing.
>
>I do appreciate you maintaining 4.1 so far and hate to admit that I'm
>reliant on it on more than two machines, before and after Sandybridge,
>exluding those machines which need a newer kernel. I also understand
>how much work this is and since I'm not using Linux professionally for
>a product, I can't offer compensation for your time. I can only offer
>to collect and point you at a list of DRM bugs for validation of my
>claims.
--
Thanks,
Sasha
The rps_resp buffer in ata_device is a DMA target, but it isn't
explicitly cacheline aligned. Due to this, adjacent fields can be
overwritten with stale data from memory on non-coherent architectures.
As a result, the kernel is sometimes unable to communicate with an
SATA device behind a SAS expander.
Fix this by ensuring that the rps_resp buffer is cacheline aligned.
This issue is similar to that fixed by Commit 84bda12af31f93 ("libata:
align ap->sector_buf") and Commit 4ee34ea3a12396f35b26 ("libata: Align
ata_device's id on a cacheline").
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
include/scsi/libsas.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h
index 0f9cbf9..6df6fe0 100644
--- a/include/scsi/libsas.h
+++ b/include/scsi/libsas.h
@@ -159,11 +159,11 @@ struct expander_device {
struct sata_device {
unsigned int class;
- struct smp_resp rps_resp; /* report_phy_sata_resp */
u8 port_no; /* port number, if this is a PM (Port) */
struct ata_port *ap;
struct ata_host ata_host;
+ struct smp_resp rps_resp ____cacheline_aligned; /* report_phy_sata_resp */
u8 fis[ATA_RESP_FIS_SIZE];
};
--
2.7.0
The patch titled
Subject: mm/page_ext.c: check if page_ext is not prepared
has been removed from the -mm tree. Its filename was
mm-page_ext-check-if-page_ext-is-not-prepared.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jaewon Kim <jaewon31.kim(a)samsung.com>
Subject: mm/page_ext.c: check if page_ext is not prepared
online_page_ext() and page_ext_init() allocate page_ext for each section,
but they do not allocate if the first PFN is !pfn_present(pfn) or
!pfn_valid(pfn). Then section->page_ext remains as NULL. lookup_page_ext
checks NULL only if CONFIG_DEBUG_VM is enabled. For a valid PFN,
__set_page_owner will try to get page_ext through lookup_page_ext.
Without CONFIG_DEBUG_VM lookup_page_ext will misuse NULL pointer as value
0. This incurrs invalid address access.
This is the panic example when PFN 0x100000 is not valid but PFN 0x13FC00
is being used for page_ext. section->page_ext is NULL, get_entry returned
invalid page_ext address as 0x1DFA000 for a PFN 0x13FC00.
To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
will be checked at all times.
<1>[ 11.618085] Unable to handle kernel paging request at virtual address 01dfa014
<1>[ 11.618140] pgd = ffffffc0c6dc9000
<1>[ 11.618174] [01dfa014] *pgd=0000000000000000, *pud=0000000000000000
<4>[ 11.618240] ------------[ cut here ]------------
<2>[ 11.618278] Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
<0>[ 11.618338] Internal error: Oops: 96000045 [#1] PREEMPT SMP
<4>[ 11.618381] Modules linked in:
<4>[ 11.618524] task: ffffffc0c6ec9180 task.stack: ffffffc0c6f40000
<4>[ 11.618569] PC is at __set_page_owner+0x48/0x78
<4>[ 11.618607] LR is at __set_page_owner+0x44/0x78
<4>[ 11.626025] [<ffffff80082371e0>] __set_page_owner+0x48/0x78
<4>[ 11.626071] [<ffffff80081df9f0>] get_page_from_freelist+0x880/0x8e8
<4>[ 11.626118] [<ffffff80081e00a4>] __alloc_pages_nodemask+0x14c/0xc48
<4>[ 11.626165] [<ffffff80081e610c>] __do_page_cache_readahead+0xdc/0x264
<4>[ 11.626214] [<ffffff80081d8824>] filemap_fault+0x2ac/0x550
<4>[ 11.626259] [<ffffff80082e5cf8>] ext4_filemap_fault+0x3c/0x58
<4>[ 11.626305] [<ffffff800820a2f8>] __do_fault+0x80/0x120
<4>[ 11.626347] [<ffffff800820eb4c>] handle_mm_fault+0x704/0xbb0
<4>[ 11.626393] [<ffffff800809ba70>] do_page_fault+0x2e8/0x394
<4>[ 11.626437] [<ffffff8008080be4>] do_mem_abort+0x88/0x124
Pre-4.7 kernels also need f86e427197 ("mm: check the return value of
lookup_page_ext for all call sites").
Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
Fixes: eefa864b701d ("mm/page_ext: resurrect struct page extending code for debugging")
Signed-off-by: Jaewon Kim <jaewon31.kim(a)samsung.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Joonsoo Kim <js1304(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [depends on f86e427197, see above]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_ext.c | 4 ----
1 file changed, 4 deletions(-)
diff -puN mm/page_ext.c~mm-page_ext-check-if-page_ext-is-not-prepared mm/page_ext.c
--- a/mm/page_ext.c~mm-page_ext-check-if-page_ext-is-not-prepared
+++ a/mm/page_ext.c
@@ -125,7 +125,6 @@ struct page_ext *lookup_page_ext(struct
struct page_ext *base;
base = NODE_DATA(page_to_nid(page))->node_page_ext;
-#if defined(CONFIG_DEBUG_VM)
/*
* The sanity checks the page allocator does upon freeing a
* page can reach here before the page_ext arrays are
@@ -134,7 +133,6 @@ struct page_ext *lookup_page_ext(struct
*/
if (unlikely(!base))
return NULL;
-#endif
index = pfn - round_down(node_start_pfn(page_to_nid(page)),
MAX_ORDER_NR_PAGES);
return get_entry(base, index);
@@ -199,7 +197,6 @@ struct page_ext *lookup_page_ext(struct
{
unsigned long pfn = page_to_pfn(page);
struct mem_section *section = __pfn_to_section(pfn);
-#if defined(CONFIG_DEBUG_VM)
/*
* The sanity checks the page allocator does upon freeing a
* page can reach here before the page_ext arrays are
@@ -208,7 +205,6 @@ struct page_ext *lookup_page_ext(struct
*/
if (!section->page_ext)
return NULL;
-#endif
return get_entry(section->page_ext, pfn);
}
_
Patches currently in -mm which might be from jaewon31.kim(a)samsung.com are
The patch titled
Subject: mm/page_alloc.c: broken deferred calculation
has been removed from the -mm tree. Its filename was
mm-broken-deferred-calculation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Subject: mm/page_alloc.c: broken deferred calculation
In reset_deferred_meminit() we determine number of pages that must not be
deferred. We initialize pages for at least 2G of memory, but also pages
for reserved memory in this node.
The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical addresses,
and returns size in bytes. However, reset_deferred_meminit() assumes that
that this function operates with pfns, and returns page count.
The result is that in the best case machine boots slower than expected due
to initializing more pages than needed in single thread, and in the worst
case panics because fewer than needed pages are initialized early.
Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mmzone.h | 3 ++-
mm/page_alloc.c | 27 ++++++++++++++++++---------
2 files changed, 20 insertions(+), 10 deletions(-)
diff -puN include/linux/mmzone.h~mm-broken-deferred-calculation include/linux/mmzone.h
--- a/include/linux/mmzone.h~mm-broken-deferred-calculation
+++ a/include/linux/mmzone.h
@@ -700,7 +700,8 @@ typedef struct pglist_data {
* is the first PFN that needs to be initialised.
*/
unsigned long first_deferred_pfn;
- unsigned long static_init_size;
+ /* Number of non-deferred pages */
+ unsigned long static_init_pgcnt;
#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
diff -puN mm/page_alloc.c~mm-broken-deferred-calculation mm/page_alloc.c
--- a/mm/page_alloc.c~mm-broken-deferred-calculation
+++ a/mm/page_alloc.c
@@ -291,28 +291,37 @@ EXPORT_SYMBOL(nr_online_nodes);
int page_group_by_mobility_disabled __read_mostly;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
+
+/*
+ * Determine how many pages need to be initialized durig early boot
+ * (non-deferred initialization).
+ * The value of first_deferred_pfn will be set later, once non-deferred pages
+ * are initialized, but for now set it ULONG_MAX.
+ */
static inline void reset_deferred_meminit(pg_data_t *pgdat)
{
- unsigned long max_initialise;
- unsigned long reserved_lowmem;
+ phys_addr_t start_addr, end_addr;
+ unsigned long max_pgcnt;
+ unsigned long reserved;
/*
* Initialise at least 2G of a node but also take into account that
* two large system hashes that can take up 1GB for 0.25TB/node.
*/
- max_initialise = max(2UL << (30 - PAGE_SHIFT),
- (pgdat->node_spanned_pages >> 8));
+ max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
+ (pgdat->node_spanned_pages >> 8));
/*
* Compensate the all the memblock reservations (e.g. crash kernel)
* from the initial estimation to make sure we will initialize enough
* memory to boot.
*/
- reserved_lowmem = memblock_reserved_memory_within(pgdat->node_start_pfn,
- pgdat->node_start_pfn + max_initialise);
- max_initialise += reserved_lowmem;
+ start_addr = PFN_PHYS(pgdat->node_start_pfn);
+ end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
+ reserved = memblock_reserved_memory_within(start_addr, end_addr);
+ max_pgcnt += PHYS_PFN(reserved);
- pgdat->static_init_size = min(max_initialise, pgdat->node_spanned_pages);
+ pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
pgdat->first_deferred_pfn = ULONG_MAX;
}
@@ -339,7 +348,7 @@ static inline bool update_defer_init(pg_
if (zone_end < pgdat_end_pfn(pgdat))
return true;
(*nr_initialised)++;
- if ((*nr_initialised > pgdat->static_init_size) &&
+ if ((*nr_initialised > pgdat->static_init_pgcnt) &&
(pfn & (PAGES_PER_SECTION - 1)) == 0) {
pgdat->first_deferred_pfn = pfn;
return false;
_
Patches currently in -mm which might be from pasha.tatashin(a)oracle.com are
sparc64-ng4-memset-32-bits-overflow.patch
The patch titled
Subject: mm, swap: fix false error message in __swp_swapcount()
has been removed from the -mm tree. Its filename was
mm-swap-fix-false-error-message-in-__swp_swapcount.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Huang Ying <huang.ying.caritas(a)gmail.com>
Subject: mm, swap: fix false error message in __swp_swapcount()
When a page fault occurs for a swap entry, the physical swap readahead
(not the VMA base swap readahead) may readahead several swap entries after
the fault swap entry. The readahead algorithm calculates some of the swap
entries to readahead via increasing the offset of the fault swap entry
without checking whether they are beyond the end of the swap device and it
relys on the __swp_swapcount() and swapcache_prepare() to check it.
Although __swp_swapcount() checks for the swap entry passed in, it will
complain with the error message as follow for the expected invalid swap
entry. This may make the end users confused.
swap_info_get: Bad swap offset entry 0200f8a7
To fix the false error message, the swap entry checking is added in
swapin_readahead() to avoid to pass the out-of-bound swap entries and the
swap entry reserved for the swap header to __swp_swapcount() and
swapcache_prepare().
Link: http://lkml.kernel.org/r/20171102054225.22897-1-ying.huang@intel.com
Fixes: e8c26ab60598 ("mm/swap: skip readahead for unreferenced swap slots")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reported-by: Christian Kujau <lists(a)nerdbynature.de>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Suggested-by: Minchan Kim <minchan(a)kernel.org>
Cc: Tim Chen <tim.c.chen(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.11+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swap_state.c | 3 +++
1 file changed, 3 insertions(+)
diff -puN mm/swap_state.c~mm-swap-fix-false-error-message-in-__swp_swapcount mm/swap_state.c
--- a/mm/swap_state.c~mm-swap-fix-false-error-message-in-__swp_swapcount
+++ a/mm/swap_state.c
@@ -559,6 +559,7 @@ struct page *swapin_readahead(swp_entry_
unsigned long offset = entry_offset;
unsigned long start_offset, end_offset;
unsigned long mask;
+ struct swap_info_struct *si = swp_swap_info(entry);
struct blk_plug plug;
bool do_poll = true, page_allocated;
@@ -572,6 +573,8 @@ struct page *swapin_readahead(swp_entry_
end_offset = offset | mask;
if (!start_offset) /* First page is swap header. */
start_offset++;
+ if (end_offset >= si->max)
+ end_offset = si->max - 1;
blk_start_plug(&plug);
for (offset = start_offset; offset <= end_offset ; offset++) {
_
Patches currently in -mm which might be from huang.ying.caritas(a)gmail.com are
The patch titled
Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr()
has been removed from the -mm tree. Its filename was
ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: alex chen <alex.chen(a)huawei.com>
Subject: ocfs2: should wait dio before inode lock in ocfs2_setattr()
we should wait dio requests to finish before inode lock in
ocfs2_setattr(), otherwise the following deadlock will happen:
process 1 process 2 process 3
truncate file 'A' end_io of writing file 'A' receiving the bast messages
ocfs2_setattr
ocfs2_inode_lock_tracker
ocfs2_inode_lock_full
inode_dio_wait
__inode_dio_wait
-->waiting for all dio
requests finish
dlm_proxy_ast_handler
dlm_do_local_bast
ocfs2_blocking_ast
ocfs2_generic_handle_bast
set OCFS2_LOCK_BLOCKED flag
dio_end_io
dio_bio_end_aio
dio_complete
ocfs2_dio_end_io
ocfs2_dio_end_io_write
ocfs2_inode_lock
__ocfs2_cluster_lock
ocfs2_wait_for_mask
-->waiting for OCFS2_LOCK_BLOCKED
flag to be cleared, that is waiting
for 'process 1' unlocking the inode lock
inode_dio_end
-->here dec the i_dio_count, but will never
be called, so a deadlock happened.
Link: http://lkml.kernel.org/r/59F81636.70508@huawei.com
Signed-off-by: Alex Chen <alex.chen(a)huawei.com>
Reviewed-by: Jun Piao <piaojun(a)huawei.com>
Reviewed-by: Joseph Qi <jiangqi903(a)gmail.com>
Acked-by: Changwei Ge <ge.changwei(a)h3c.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff -puN fs/ocfs2/file.c~ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~ocfs2-should-wait-dio-before-inode-lock-in-ocfs2_setattr
+++ a/fs/ocfs2/file.c
@@ -1161,6 +1161,13 @@ int ocfs2_setattr(struct dentry *dentry,
}
size_change = S_ISREG(inode->i_mode) && attr->ia_valid & ATTR_SIZE;
if (size_change) {
+ /*
+ * Here we should wait dio to finish before inode lock
+ * to avoid a deadlock between ocfs2_setattr() and
+ * ocfs2_dio_end_io_write()
+ */
+ inode_dio_wait(inode);
+
status = ocfs2_rw_lock(inode, 1);
if (status < 0) {
mlog_errno(status);
@@ -1200,8 +1207,6 @@ int ocfs2_setattr(struct dentry *dentry,
if (status)
goto bail_unlock;
- inode_dio_wait(inode);
-
if (i_size_read(inode) >= attr->ia_size) {
if (ocfs2_should_order_data(inode)) {
status = ocfs2_begin_ordered_truncate(inode,
_
Patches currently in -mm which might be from alex.chen(a)huawei.com are
The patch titled
Subject: ocfs2: fix cluster hang after a node dies
has been removed from the -mm tree. Its filename was
ocfs2-fix-cluster-hang-after-a-node-dies.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Changwei Ge <ge.changwei(a)h3c.com>
Subject: ocfs2: fix cluster hang after a node dies
When a node dies, other live nodes have to choose a new master for an
existed lock resource mastered by the dead node.
As for ocfs2/dlm implementation, this is done by function -
dlm_move_lockres_to_recovery_list which marks those lock rsources as
DLM_LOCK_RES_RECOVERING and manages them via a list from which DLM changes
lock resource's master later.
So without invoking dlm_move_lockres_to_recovery_list, no master will be
choosed after dlm recovery accomplishment since no lock resource can be
found through ::resource list.
What's worse is that if DLM_LOCK_RES_RECOVERING is not marked for lock
resources mastered a dead node, it will break up synchronization among
nodes.
So invoke dlm_move_lockres_to_recovery_list again.
Fixs: 'commit ee8f7fcbe638 ("ocfs2/dlm: continue to purge recovery lockres when recovery master goes down")'
Link: http://lkml.kernel.org/r/63ADC13FD55D6546B7DECE290D39E373CED6E0F9@H3CMLB14-…
Signed-off-by: Changwei Ge <ge.changwei(a)h3c.com>
Reported-by: Vitaly Mayatskih <v.mayatskih(a)gmail.com>
Tested-by: Vitaly Mayatskikh <v.mayatskih(a)gmail.com>
Cc: Mark Fasheh <mfasheh(a)versity.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Joseph Qi <jiangqi903(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/dlm/dlmrecovery.c | 1 +
1 file changed, 1 insertion(+)
diff -puN fs/ocfs2/dlm/dlmrecovery.c~ocfs2-fix-cluster-hang-after-a-node-dies fs/ocfs2/dlm/dlmrecovery.c
--- a/fs/ocfs2/dlm/dlmrecovery.c~ocfs2-fix-cluster-hang-after-a-node-dies
+++ a/fs/ocfs2/dlm/dlmrecovery.c
@@ -2419,6 +2419,7 @@ static void dlm_do_local_recovery_cleanu
dlm_lockres_put(res);
continue;
}
+ dlm_move_lockres_to_recovery_list(dlm, res);
} else if (res->owner == dlm->node_num) {
dlm_free_dead_locks(dlm, res, dead_node);
__dlm_lockres_calc_usage(dlm, res);
_
Patches currently in -mm which might be from ge.changwei(a)h3c.com are