From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm: do not boost watermarks to avoid fragmentation for the DISCONTIG memory model
Mikulas Patocka reported that 1c30844d2dfe ("mm: reclaim small amounts of
memory when an external fragmentation event occurs") "broke" memory
management on parisc. The machine is not NUMA, but the DISCONTIG model
creates three pgdats for the following ranges even though it is a UMA
machine:
0) Start 0x0000000000000000 End 0x000000003fffffff Size 1024 MB
1) Start 0x0000000100000000 End 0x00000001bfdfffff Size 3070 MB
2) Start 0x0000004040000000 End 0x00000040ffffffff Size 3072 MB
Mikulas reported:
With the patch 1c30844d2, the kernel will incorrectly reclaim the
first zone when it fills up, ignoring the fact that there are two
completely free zones. Basically, it limits cache size to 1GiB.
For example, if I run:
# dd if=/dev/sda of=/dev/null bs=1M count=2048
- with the proper kernel, there should be "Buffers - 2GiB"
when this command finishes. With the patch 1c30844d2, buffers
will consume just 1GiB or slightly more, because the kernel was
incorrectly reclaiming them.
The page allocator and reclaim make assumptions that pgdats really
represent NUMA nodes and zones represent ranges, and they make decisions
on that basis. Watermark boosting for small pgdats leads to unexpected
results even though this would have behaved reasonably on SPARSEMEM.
DISCONTIG is essentially deprecated and even parisc plans to move to
SPARSEMEM, so there is no need to be fancy: this patch simply disables
watermark boosting by default on DISCONTIGMEM.
Link: http://lkml.kernel.org/r/20190419094335.GJ18914@techsingularity.net
Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs")
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Reported-by: Mikulas Patocka <mpatocka(a)redhat.com>
Tested-by: Mikulas Patocka <mpatocka(a)redhat.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: James Bottomley <James.Bottomley(a)hansenpartnership.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/sysctl/vm.txt | 16 ++++++++--------
mm/page_alloc.c | 13 +++++++++++++
2 files changed, 21 insertions(+), 8 deletions(-)
--- a/Documentation/sysctl/vm.txt~mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model
+++ a/Documentation/sysctl/vm.txt
@@ -866,14 +866,14 @@ The intent is that compaction has less w
increase the success rate of future high-order allocations such as SLUB
allocations, THP and hugetlbfs pages.
-To make it sensible with respect to the watermark_scale_factor parameter,
-the unit is in fractions of 10,000. The default value of 15,000 means
-that up to 150% of the high watermark will be reclaimed in the event of
-a pageblock being mixed due to fragmentation. The level of reclaim is
-determined by the number of fragmentation events that occurred in the
-recent past. If this value is smaller than a pageblock then a pageblocks
-worth of pages will be reclaimed (e.g. 2MB on 64-bit x86). A boost factor
-of 0 will disable the feature.
+To make it sensible with respect to the watermark_scale_factor
+parameter, the unit is in fractions of 10,000. The default value of
+15,000 on !DISCONTIGMEM configurations means that up to 150% of the high
+watermark will be reclaimed in the event of a pageblock being mixed due
+to fragmentation. The level of reclaim is determined by the number of
+fragmentation events that occurred in the recent past. If this value is
+smaller than a pageblock then a pageblocks worth of pages will be reclaimed
+(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature.
=============================================================
--- a/mm/page_alloc.c~mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model
+++ a/mm/page_alloc.c
@@ -266,7 +266,20 @@ compound_page_dtor * const compound_page
int min_free_kbytes = 1024;
int user_min_free_kbytes = -1;
+#ifdef CONFIG_DISCONTIGMEM
+/*
+ * DiscontigMem defines memory ranges as separate pg_data_t even if the ranges
+ * are not on separate NUMA nodes. Functionally this works but with
+ * watermark_boost_factor, it can reclaim prematurely as the ranges can be
+ * quite small. By default, do not boost watermarks on discontigmem as in
+ * many cases very high-order allocations like THP are likely to be
+ * unsupported and the premature reclaim offsets the advantage of long-term
+ * fragmentation avoidance.
+ */
+int watermark_boost_factor __read_mostly;
+#else
int watermark_boost_factor __read_mostly = 15000;
+#endif
int watermark_scale_factor = 10;
static unsigned long nr_kernel_pages __initdata;
_
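The boost limit described in the vm.txt hunk above reduces to simple arithmetic. The following is a userspace sketch, not kernel code: the helper max_watermark_boost() and its parameters are hypothetical, and it models only the documented rules (the factor is in fractions of 10,000 of the high watermark, the limit is never below one pageblock, and a factor of 0, now the default on DISCONTIGMEM, disables boosting).

```c
/* watermark_boost_factor is expressed in fractions of 10,000. */
#define BOOST_FACTOR_UNIT 10000UL

/*
 * Hypothetical helper: upper limit (in pages) on how far the watermarks
 * may be boosted, following the rules documented in vm.txt.
 */
unsigned long max_watermark_boost(unsigned long high_wmark_pages,
				  unsigned long boost_factor,
				  unsigned long pageblock_pages)
{
	unsigned long max_boost;

	/* A boost factor of 0 disables the feature. */
	if (boost_factor == 0)
		return 0;

	/* e.g. 15000 / 10000 allows up to 150% of the high watermark. */
	max_boost = high_wmark_pages * boost_factor / BOOST_FACTOR_UNIT;

	/* If that is smaller than a pageblock, use a pageblock's worth. */
	if (max_boost < pageblock_pages)
		max_boost = pageblock_pages;

	return max_boost;
}
```

For example, with 512-page pageblocks (2MB with 4KB pages), max_watermark_boost(10000, 15000, 512) is 15,000 pages, max_watermark_boost(100, 15000, 512) is rounded up to one pageblock (512), and a factor of 0 yields 0.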
From: YueHaibing <yuehaibing(a)huawei.com>
Subject: lib/Kconfig.debug: fix build error without CONFIG_BLOCK
If CONFIG_TEST_KMOD is set to m while CONFIG_BLOCK is not set, XFS and
BTRFS cannot be compiled successfully.
Link: http://lkml.kernel.org/r/20190410075434.35220-1-yuehaibing@huawei.com
Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader")
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Cc: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Kconfig.debug | 1 +
1 file changed, 1 insertion(+)
--- a/lib/Kconfig.debug~lib-kconfigdebug-fix-build-error-without-config_block
+++ a/lib/Kconfig.debug
@@ -1929,6 +1929,7 @@ config TEST_KMOD
depends on m
depends on BLOCK && (64BIT || LBDAF) # for XFS, BTRFS
depends on NETDEVICES && NET_CORE && INET # for TUN
+ depends on BLOCK
select TEST_LKM
select XFS_FS
select TUN
_
From: Jérôme Glisse <jglisse(a)redhat.com>
Subject: zram: pass down the bvec we need to read into in the work struct
When scheduling a work item to read a page, we need to pass down the
proper bvec struct, which points to the page to read into. Before this
patch, an uninitialized bvec was used (only if PAGE_SIZE != 4096), which
is wrong.
Note that without this patch, on architectures/kernels where PAGE_SIZE !=
4096, userspace could read random memory through a zram block device
(though userspace probably would have no control over the address being
read).
Link: http://lkml.kernel.org/r/20190408183219.26377-1-jglisse@redhat.com
Signed-off-by: Jérôme Glisse <jglisse(a)redhat.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky(a)gmail.com>
Acked-by: Minchan Kim <minchan(a)kernel.org>
Cc: Nitin Gupta <ngupta(a)vflare.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/block/zram/zram_drv.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/block/zram/zram_drv.c~zram-pass-down-the-bvec-we-need-to-read-into-in-the-work-struct
+++ a/drivers/block/zram/zram_drv.c
@@ -774,18 +774,18 @@ struct zram_work {
struct zram *zram;
unsigned long entry;
struct bio *bio;
+ struct bio_vec bvec;
};
#if PAGE_SIZE != 4096
static void zram_sync_read(struct work_struct *work)
{
- struct bio_vec bvec;
struct zram_work *zw = container_of(work, struct zram_work, work);
struct zram *zram = zw->zram;
unsigned long entry = zw->entry;
struct bio *bio = zw->bio;
- read_from_bdev_async(zram, &bvec, entry, bio);
+ read_from_bdev_async(zram, &zw->bvec, entry, bio);
}
/*
@@ -798,6 +798,7 @@ static int read_from_bdev_sync(struct zr
{
struct zram_work work;
+ work.bvec = *bvec;
work.zram = zram;
work.entry = entry;
work.bio = bio;
_
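The bug pattern fixed here is general: a worker can only see data that travels inside its work item. The sketch below models the fixed zram_work in userspace C. struct fake_bvec, struct fake_work, backing_read() and read_sync() are hypothetical stand-ins for struct bio_vec, struct work_struct and the zram read path, and calling the work function directly stands in for queueing it and waiting for completion.

```c
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for struct bio_vec / work_struct. */
struct fake_bvec {
	unsigned char *page;	/* destination buffer to read into */
	size_t len;
};

struct fake_work {
	void (*func)(struct fake_work *work);
};

/* Mirrors the fixed struct zram_work: the bvec travels with the work. */
struct read_work {
	struct fake_work work;
	unsigned long entry;
	struct fake_bvec bvec;
};

/* Pretend backing device: entry N reads back as bytes of value N. */
static void backing_read(unsigned long entry, struct fake_bvec *bvec)
{
	memset(bvec->page, (int)(entry & 0xff), bvec->len);
}

/* Worker: uses the bvec stored in the work item, not an uninitialized local. */
static void sync_read_worker(struct fake_work *work)
{
	struct read_work *rw = container_of(work, struct read_work, work);

	backing_read(rw->entry, &rw->bvec);
}

/*
 * Caller: copies its bvec into the work item before "scheduling" it.
 * Running func directly stands in for queueing and flushing the work.
 */
void read_sync(unsigned long entry, const struct fake_bvec *bvec)
{
	struct read_work rw;

	rw.work.func = sync_read_worker;
	rw.entry = entry;
	rw.bvec = *bvec;	/* the fix: pass the real destination down */
	rw.work.func(&rw.work);
}
```

Because the bvec is copied by value, the worker still writes into the caller's page through the copied pointer.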
This is a note to let you know that I've just added the patch titled
mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lock
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From d2ab99403ee00d8014e651728a4702ea1ae5e52c Mon Sep 17 00:00:00 2001
From: zhong jiang <zhongjiang(a)huawei.com>
Date: Mon, 8 Apr 2019 12:07:17 +0800
Subject: mm/memory_hotplug: Do not unlock when fails to take the
device_hotplug_lock
When adding memory by probing a memory block through the sysfs interface,
there is an obvious issue: we unlock the device_hotplug_lock even when we
failed to take it.
That issue was introduced in commit 8df1d0e4a265
("mm/memory_hotplug: make add_memory() take the device_hotplug_lock").
We should bail out immediately when we fail to take the device_hotplug_lock.
Fixes: 8df1d0e4a265 ("mm/memory_hotplug: make add_memory() take the device_hotplug_lock")
Reported-by: Yang yingliang <yangyingliang(a)huawei.com>
Signed-off-by: zhong jiang <zhongjiang(a)huawei.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index cb8347500ce2..e49028a60429 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -506,7 +506,7 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
ret = lock_device_hotplug_sysfs();
if (ret)
- goto out;
+ return ret;
nid = memory_add_physaddr_to_nid(phys_addr);
ret = __add_memory(nid, phys_addr,
--
2.21.0
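The rule behind this fix is that an error path may only undo what was actually done. A minimal userspace sketch, assuming a hypothetical try-lock (fake_trylock()) in place of lock_device_hotplug_sysfs() and a hypothetical do_add_memory() in place of __add_memory(): a failed lock attempt returns immediately, and only paths that hold the lock reach the unlocking label.

```c
#include <errno.h>

/* Hypothetical lock that records misuse instead of corrupting state. */
struct fake_lock {
	int held;
	int bad_unlocks;	/* unlocks of a lock that nobody held */
};

int fake_trylock(struct fake_lock *l)
{
	if (l->held)
		return -EBUSY;
	l->held = 1;
	return 0;
}

void fake_unlock(struct fake_lock *l)
{
	if (!l->held)
		l->bad_unlocks++;
	l->held = 0;
}

/* Hypothetical stand-in for the work done under the lock. */
static int do_add_memory(unsigned long addr)
{
	return addr ? 0 : -EINVAL;
}

/* Shape of the fixed probe_store(): only lock holders reach "out". */
int probe(struct fake_lock *l, unsigned long addr)
{
	int ret;

	ret = fake_trylock(l);
	if (ret)
		return ret;	/* nothing to undo: we never took the lock */

	ret = do_add_memory(addr);
	if (ret)
		goto out;

	/* ... further steps would also jump to out on failure ... */
out:
	fake_unlock(l);
	return ret;
}
```

With the old goto out, a failed fake_trylock() would fall through to fake_unlock() and release a lock that another caller holds.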
This is the start of the stable review cycle for the 4.9.171 release.
There are 44 patches in this series; all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri 26 Apr 2019 05:07:31 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.171-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.171-rc1
Linus Torvalds <torvalds(a)linux-foundation.org>
i2c-hid: properly terminate i2c_hid_dmi_desc_override_table[] array
Matteo Croce <mcroce(a)redhat.com>
percpu: stop printing kernel addresses
Takashi Iwai <tiwai(a)suse.de>
ALSA: info: Fix racy addition/deletion of nodes
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
mm/vmstat.c: fix /proc/vmstat format for CONFIG_DEBUG_TLBFLUSH=y CONFIG_SMP=n
Jann Horn <jannh(a)google.com>
device_cgroup: fix RCU imbalance in error case
Phil Auld <pauld(a)redhat.com>
sched/fair: Limit sched_cfs_period_timer() loop to avoid hard lockup
Matthias Kaehlcke <mka(a)chromium.org>
Revert "kbuild: use -Oz instead of -Os when using clang"
Kim Phillips <kim.phillips(a)amd.com>
perf/x86/amd: Add event map for AMD Family 17h
Felix Fietkau <nbd(a)nbd.name>
mac80211: do not call driver wake_tx_queue op during reconfig
Vijayakumar Durai <vijayakumar.durai1(a)vivint.com>
rt2x00: do not increment sequence number while re-transmitting
Masami Hiramatsu <mhiramat(a)kernel.org>
kprobes: Fix error check when reusing optimized probes
Masami Hiramatsu <mhiramat(a)kernel.org>
kprobes: Mark ftrace mcount handler functions nokprobe
Masami Hiramatsu <mhiramat(a)kernel.org>
x86/kprobes: Verify stack frame on kretprobe
Nathan Chancellor <natechancellor(a)gmail.com>
arm64: futex: Restore oldval initialization to work around buggy compilers
Eric Biggers <ebiggers(a)google.com>
crypto: x86/poly1305 - fix overflow during partial reduction
Suthikulpanit, Suravee <Suravee.Suthikulpanit(a)amd.com>
Revert "svm: Fix AVIC incomplete IPI emulation"
Saurav Kashyap <skashyap(a)marvell.com>
Revert "scsi: fcoe: clear FC_RP_STARTED flags when receiving a LOGO"
Takashi Iwai <tiwai(a)suse.de>
ALSA: core: Fix card races between register and disconnect
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: ni_usb6501: Fix possible double-free of ->usb_rx_buf
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: ni_usb6501: Fix use of uninitialized mutex
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: vmk80xx: Fix possible double-free of ->usb_rx_buf
Ian Abbott <abbotti(a)mev.co.uk>
staging: comedi: vmk80xx: Fix use of uninitialized semaphore
he, bo <bo.he(a)intel.com>
io: accel: kxcjk1013: restore the range after resume.
Georg Ottinger <g.ottinger(a)abatec.at>
iio: adc: at91: disable adc channel interrupt in timeout case
Dragos Bogdan <dragos.bogdan(a)analog.com>
iio: ad_sigma_delta: select channel when reading register
Mike Looijmans <mike.looijmans(a)topic.nl>
iio/gyro/bmg160: Use millidegrees for temperature scale
Mircea Caprioru <mircea.caprioru(a)analog.com>
staging: iio: ad7192: Fix ad7193 channel address
Sean Christopherson <sean.j.christopherson(a)intel.com>
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
Aurelien Aptel <aaptel(a)suse.com>
CIFS: keep FileInfo handle live during oplock break
Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is incomplete
Masahiro Yamada <yamada.masahiro(a)socionext.com>
modpost: file2alias: check prototype of handler
Masahiro Yamada <yamada.masahiro(a)socionext.com>
modpost: file2alias: go back to simple devtable lookup
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: sdhci: Fix data command CRC error handling
Christian Lamparter <chunkeey(a)gmail.com>
crypto: crypto4xx - properly set IV after de- and encrypt
Eric Dumazet <edumazet(a)google.com>
ipv4: ensure rcu_read_lock() in ipv4_link_failure()
Stephen Suryaputra <ssuryaextr(a)gmail.com>
ipv4: recompile ip options in ipv4_link_failure
Jason Wang <jasowang(a)redhat.com>
vhost: reject zero size iova range
Hangbin Liu <liuhangbin(a)gmail.com>
team: set slave to promisc if team is already in promisc mode
Eric Dumazet <edumazet(a)google.com>
tcp: tcp_grow_window() needs to respect tcp_space()
Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
net: fou: do not use guehdr after iptunnel_pull_offloads in gue_udp_recv
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: multicast: use rcu to access port list from br_multicast_start_querier
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: fix per-port af_packet sockets
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
net: atm: Fix potential Spectre v1 vulnerabilities
Sabrina Dubroca <sd(a)queasysnail.net>
bonding: fix event handling for stacked bonds
-------------
Diffstat:
Makefile | 7 +-
arch/arm64/include/asm/futex.h | 2 +-
arch/x86/crypto/poly1305-avx2-x86_64.S | 14 ++-
arch/x86/crypto/poly1305-sse2-x86_64.S | 22 ++--
arch/x86/events/amd/core.c | 35 ++++--
arch/x86/kernel/kprobes/core.c | 26 +++++
arch/x86/kvm/emulate.c | 21 ++--
arch/x86/kvm/svm.c | 19 ++-
crypto/testmgr.h | 44 ++++++-
drivers/char/tpm/tpm_i2c_atmel.c | 10 +-
drivers/crypto/amcc/crypto4xx_alg.c | 3 +-
drivers/crypto/amcc/crypto4xx_core.c | 9 ++
drivers/hid/i2c-hid/i2c-hid-dmi-quirks.c | 3 +-
drivers/iio/accel/kxcjk-1013.c | 2 +
drivers/iio/adc/ad_sigma_delta.c | 1 +
drivers/iio/adc/at91_adc.c | 28 +++--
drivers/iio/gyro/bmg160_core.c | 6 +-
drivers/mmc/host/sdhci.c | 40 +++----
drivers/net/bonding/bond_main.c | 6 +-
drivers/net/team/team.c | 26 +++++
drivers/net/wireless/ralink/rt2x00/rt2x00.h | 1 -
drivers/net/wireless/ralink/rt2x00/rt2x00mac.c | 10 --
drivers/net/wireless/ralink/rt2x00/rt2x00queue.c | 15 ++-
drivers/scsi/libfc/fc_rport.c | 1 -
drivers/staging/comedi/drivers/ni_usb6501.c | 10 +-
drivers/staging/comedi/drivers/vmk80xx.c | 8 +-
drivers/staging/iio/adc/ad7192.c | 8 +-
drivers/vhost/vhost.c | 6 +-
fs/cifs/cifsglob.h | 2 +
fs/cifs/file.c | 30 ++++-
fs/cifs/misc.c | 25 +++-
fs/cifs/smb2misc.c | 6 +-
include/linux/kprobes.h | 1 +
kernel/kprobes.c | 6 +-
kernel/sched/fair.c | 25 ++++
kernel/trace/ftrace.c | 6 +-
mm/percpu.c | 8 +-
mm/vmstat.c | 5 -
net/atm/lec.c | 6 +-
net/bridge/br_input.c | 23 ++--
net/bridge/br_multicast.c | 4 +-
net/ipv4/fou.c | 4 +-
net/ipv4/route.c | 16 ++-
net/ipv4/tcp_input.c | 10 +-
net/mac80211/driver-ops.h | 3 +
scripts/mod/file2alias.c | 141 ++++++++---------------
security/device_cgroup.c | 2 +-
sound/core/info.c | 12 +-
sound/core/init.c | 18 +--
49 files changed, 470 insertions(+), 266 deletions(-)
Commit-ID: a860fa7b96e1a1c974556327aa1aee852d434c21
Gitweb: https://git.kernel.org/tip/a860fa7b96e1a1c974556327aa1aee852d434c21
Author: Xie XiuQi <xiexiuqi(a)huawei.com>
AuthorDate: Sat, 20 Apr 2019 16:34:16 +0800
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Thu, 25 Apr 2019 19:58:54 +0200
sched/numa: Fix a possible divide-by-zero
sched_clock_cpu() may not be consistent between CPUs. If a task
migrates to another CPU, then se.exec_start is set to that CPU's
rq_clock_task() by update_stats_curr_start(). Specifically, the new
value might be before the old value due to clock skew.
So then if in numa_get_avg_runtime() the expression:
'now - p->last_task_numa_placement'
ends up as -1, then the divisor '*period + 1' in task_numa_placement()
is 0 and things go bang. Similar to update_curr(), check if time goes
backwards to avoid this.
[ peterz: Wrote new changelog. ]
[ mingo: Tweaked the code comment. ]
Signed-off-by: Xie XiuQi <xiexiuqi(a)huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: cj.chengjian(a)huawei.com
Cc: <stable(a)vger.kernel.org>
Link: http://lkml.kernel.org/r/20190425080016.GX11158@hirez.programming.kicks-ass…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
kernel/sched/fair.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a4d9e14bf138..35f3ea375084 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2007,6 +2007,10 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period)
if (p->last_task_numa_placement) {
delta = runtime - p->last_sum_exec_runtime;
*period = now - p->last_task_numa_placement;
+
+ /* Avoid time going backwards, prevent potential divide error: */
+ if (unlikely((s64)*period < 0))
+ *period = 0;
} else {
delta = p->se.avg.load_sum;
*period = LOAD_AVG_MAX;
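The failure mode is plain unsigned arithmetic: if the clock moves back by one unit, now - last wraps to the maximum u64 value, and adding 1 for the divisor wraps again to 0. A userspace sketch of the clamp, assuming hypothetical helpers numa_period() and runtime_share(); the latter only illustrates a division by period + 1 and is not the scheduler's actual formula.

```c
#include <stdint.h>

/* Hypothetical model of the period computed in numa_get_avg_runtime(). */
uint64_t numa_period(uint64_t now, uint64_t last_placement)
{
	uint64_t period = now - last_placement;	/* wraps if time went back */

	/* Avoid time going backwards, as the fix does. */
	if ((int64_t)period < 0)
		period = 0;
	return period;
}

/* Illustrative division; without the clamp, period + 1 could be 0. */
uint64_t runtime_share(uint64_t runtime, uint64_t period)
{
	return runtime / (period + 1);
}
```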
This is a note to let you know that I've just added the patch titled
mm/memory_hotplug: Do not unlock when fails to take the device_hotplug_lock
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the driver-core-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
The ACEPC T8 and T11 mini PCs contain quite generic names in the sys_vendor
and product_name DMI strings. Without this patch, brcmfmac will try to load
"brcmfmac43455-sdio.Default string-Default string.txt" as its nvram file,
which is way too generic.
The DMI strings on which we are matching are somewhat generic too, but
"To be filled by O.E.M." is less common than "Default string", and the
system-sku and bios-version strings are pretty unique. Besides the DMI
strings, we also check the wifi-module chip-id and revision. I'm confident
that the combination of all this is unique.
Both the T8 and T11 use the same wifi-module; this commit adds DMI
quirks for both mini PCs pointing to brcmfmac43455-sdio.acepc-t8.txt.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1690852
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
.../broadcom/brcm80211/brcmfmac/dmi.c | 26 +++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/dmi.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/dmi.c
index 7535cb0d4ac0..9f1417e00073 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/dmi.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/dmi.c
@@ -31,6 +31,10 @@ struct brcmf_dmi_data {
/* NOTE: Please keep all entries sorted alphabetically */
+static const struct brcmf_dmi_data acepc_t8_data = {
+ BRCM_CC_4345_CHIP_ID, 6, "acepc-t8"
+};
+
static const struct brcmf_dmi_data gpd_win_pocket_data = {
BRCM_CC_4356_CHIP_ID, 2, "gpd-win-pocket"
};
@@ -48,6 +52,28 @@ static const struct brcmf_dmi_data pov_tab_p1006w_data = {
};
static const struct dmi_system_id dmi_platform_data[] = {
+ {
+ /* ACEPC T8 Cherry Trail Z8350 mini PC */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "To be filled by O.E.M."),
+ DMI_EXACT_MATCH(DMI_BOARD_NAME, "Cherry Trail CR"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "T8"),
+ /* also match on somewhat unique bios-version */
+ DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1.000"),
+ },
+ .driver_data = (void *)&acepc_t8_data,
+ },
+ {
+ /* ACEPC T11 Cherry Trail Z8350 mini PC, same wifi as the T8 */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_BOARD_VENDOR, "To be filled by O.E.M."),
+ DMI_EXACT_MATCH(DMI_BOARD_NAME, "Cherry Trail CR"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "T11"),
+ /* also match on somewhat unique bios-version */
+ DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1.000"),
+ },
+ .driver_data = (void *)&acepc_t8_data,
+ },
{
/* Match for the GPDwin which unfortunately uses somewhat
* generic dmi strings, which is why we test for 4 strings.
--
2.21.0
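The quirk table works because an entry applies only when every listed DMI string matches exactly, and brcmfmac additionally requires the chip id and revision stored in the entry's brcmf_dmi_data. A simplified userspace model of that lookup, with hypothetical types and an assumed value of 0x4345 for BRCM_CC_4345_CHIP_ID. In the kernel, the DMI part is done by dmi_first_match() over struct dmi_system_id; the sketch mimics "first DMI match wins, then check the chip".

```c
#include <stddef.h>
#include <string.h>

/* Assumed value of BRCM_CC_4345_CHIP_ID. */
#define CHIP_4345 0x4345u

/* Hypothetical, simplified stand-ins for the DMI fields matched above. */
enum dmi_field { BOARD_VENDOR, BOARD_NAME, PRODUCT_SKU, BIOS_VERSION, NR_FIELDS };

struct dmi_match {
	enum dmi_field field;
	const char *value;	/* exact match, like DMI_EXACT_MATCH */
};

#define NR_MATCHES 4

struct quirk {
	struct dmi_match matches[NR_MATCHES];
	unsigned int chip, chiprev;	/* like struct brcmf_dmi_data */
	const char *board_type;		/* nvram file name suffix */
};

static const struct quirk quirks[] = {
	{ { { BOARD_VENDOR, "To be filled by O.E.M." },
	    { BOARD_NAME, "Cherry Trail CR" },
	    { PRODUCT_SKU, "T8" },
	    { BIOS_VERSION, "1.000" } }, CHIP_4345, 6, "acepc-t8" },
	{ { { BOARD_VENDOR, "To be filled by O.E.M." },
	    { BOARD_NAME, "Cherry Trail CR" },
	    { PRODUCT_SKU, "T11" },
	    { BIOS_VERSION, "1.000" } }, CHIP_4345, 6, "acepc-t8" },
};

/*
 * Returns the board type for this system, or NULL. The first entry whose
 * DMI strings all match wins; its chip id and revision must match too.
 */
const char *find_board_type(const char *const system[NR_FIELDS],
			    unsigned int chip, unsigned int chiprev)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		size_t j;

		for (j = 0; j < NR_MATCHES; j++) {
			const struct dmi_match *m = &quirks[i].matches[j];

			if (strcmp(system[m->field], m->value) != 0)
				break;
		}
		if (j < NR_MATCHES)
			continue;	/* some DMI string differed */

		if (quirks[i].chip != chip || quirks[i].chiprev != chiprev)
			return NULL;
		return quirks[i].board_type;
	}
	return NULL;
}
```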
It was reported on the OpenWrt bug tracking system [1] that several users
are affected by an endless reboot loop of their routers if they configure
the 5GHz interface with channel 44 or 48.
The reboot loop is caused by the following excessive number of WARN_ON
messages:
WARNING: CPU: 0 PID: 0 at backports-4.19.23-1/net/mac80211/rx.c:4516
ieee80211_rx_napi+0x1fc/0xa54 [mac80211]
which are correctly emitted by the following guard:
case RX_ENC_LEGACY:
if (WARN_ON(status->rate_idx >= sband->n_bitrates))
because rate_idx is, in this case, erroneously set to 251 (0xfb). This
fix converts the previously used magic number into a proper constant and
guards against the subtraction that leads to the currently observed
underflow.
1. https://bugs.openwrt.org/index.php?do=details&task_id=2218
Fixes: 854783444bab ("mwl8k: properly set receive status rate index on 5 GHz receive")
Cc: <stable(a)vger.kernel.org>
Tested-by: Eubert Bao <bunnier(a)gmail.com>
Reported-by: Eubert Bao <bunnier(a)gmail.com>
Signed-off-by: Petr Štetiar <ynezz(a)true.cz>
---
drivers/net/wireless/marvell/mwl8k.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwl8k.c b/drivers/net/wireless/marvell/mwl8k.c
index 8e4e9b6..ffc565a 100644
--- a/drivers/net/wireless/marvell/mwl8k.c
+++ b/drivers/net/wireless/marvell/mwl8k.c
@@ -441,6 +441,9 @@ struct mwl8k_sta {
#define MWL8K_CMD_UPDATE_STADB 0x1123
#define MWL8K_CMD_BASTREAM 0x1125
+#define MWL8K_LEGACY_5G_RATE_OFFSET \
+ (ARRAY_SIZE(mwl8k_rates_24) - ARRAY_SIZE(mwl8k_rates_50))
+
static const char *mwl8k_cmd_name(__le16 cmd, char *buf, int bufsize)
{
u16 command = le16_to_cpu(cmd);
@@ -1016,8 +1019,9 @@ static void mwl8k_rxd_ap_refill(void *_rxd, dma_addr_t addr, int len)
if (rxd->channel > 14) {
status->band = NL80211_BAND_5GHZ;
- if (!(status->encoding == RX_ENC_HT))
- status->rate_idx -= 5;
+ if (!(status->encoding == RX_ENC_HT) &&
+ status->rate_idx >= MWL8K_LEGACY_5G_RATE_OFFSET)
+ status->rate_idx -= MWL8K_LEGACY_5G_RATE_OFFSET;
} else {
status->band = NL80211_BAND_2GHZ;
}
@@ -1124,8 +1128,9 @@ static void mwl8k_rxd_sta_refill(void *_rxd, dma_addr_t addr, int len)
if (rxd->channel > 14) {
status->band = NL80211_BAND_5GHZ;
- if (!(status->encoding == RX_ENC_HT))
- status->rate_idx -= 5;
+ if (!(status->encoding == RX_ENC_HT) &&
+ status->rate_idx >= MWL8K_LEGACY_5G_RATE_OFFSET)
+ status->rate_idx -= MWL8K_LEGACY_5G_RATE_OFFSET;
} else {
status->band = NL80211_BAND_2GHZ;
}
--
1.9.1
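The reported value of 251 (0xfb) is exactly what an 8-bit index gives when 5 is subtracted from 0. A small userspace sketch of the old and fixed adjustment; the rate tables are hypothetical stand-ins sized like the driver's mwl8k_rates_24 (13 entries) and mwl8k_rates_50 (8 entries), which is assumed to be where the magic number 5 came from.

```c
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical rate tables (rate codes in 500 kbit/s units): 13 entries
 * for 2.4GHz and the 8 OFDM rates usable on 5GHz.
 */
const uint16_t rates_24[] = { 2, 4, 11, 22, 44, 12, 18, 24, 36, 48, 72, 96, 108 };
const uint16_t rates_50[] = { 12, 18, 24, 36, 48, 72, 96, 108 };

#define LEGACY_5G_RATE_OFFSET (ARRAY_SIZE(rates_24) - ARRAY_SIZE(rates_50))

/* Old behaviour: unconditional subtraction wraps for indices below 5. */
uint8_t rate_idx_old(uint8_t rate_idx)
{
	rate_idx -= 5;
	return rate_idx;
}

/* Fixed behaviour: subtract only when the result stays in range. */
uint8_t rate_idx_fixed(uint8_t rate_idx)
{
	if (rate_idx >= LEGACY_5G_RATE_OFFSET)
		rate_idx -= LEGACY_5G_RATE_OFFSET;
	return rate_idx;
}
```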
When remounting with debug_want_extra_isize, we were not performing the
same checks that we do during a normal mount. That allowed us to set a
value for s_want_extra_isize that reached outside the s_inode_size.
Reported-by: syzbot+f584efa0ac7213c226b7(a)syzkaller.appspotmail.com
Signed-off-by: Barret Rhoden <brho(a)google.com>
Cc: stable(a)vger.kernel.org
---
- In the current code, it looks like someone could mount with want_extra_isize
with some value > 0 but less than the minimums in the s_es. If that's a bug,
I can submit a follow-on patch.
- Similarly, on a failed remount, sbi->s_want_extra_isize is changed to the
remounted value. I can fix that too if it's a problem.
- Is it OK to remount with a smaller s_want_extra_isize than the previous mount?
I thought it was, but figured I'd ask while I'm looking at it.
fs/ext4/super.c | 58 +++++++++++++++++++++++++++++--------------------
1 file changed, 34 insertions(+), 24 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 6ed4eb81e674..184944d4d8d1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3513,6 +3513,37 @@ int ext4_calculate_overhead(struct super_block *sb)
return 0;
}
+static void ext4_clamp_want_extra_isize(struct super_block *sb)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ struct ext4_super_block *es = sbi->s_es;
+
+ /* determine the minimum size of new large inodes, if present */
+ if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE &&
+ sbi->s_want_extra_isize == 0) {
+ sbi->s_want_extra_isize = sizeof(struct ext4_inode) -
+ EXT4_GOOD_OLD_INODE_SIZE;
+ if (ext4_has_feature_extra_isize(sb)) {
+ if (sbi->s_want_extra_isize <
+ le16_to_cpu(es->s_want_extra_isize))
+ sbi->s_want_extra_isize =
+ le16_to_cpu(es->s_want_extra_isize);
+ if (sbi->s_want_extra_isize <
+ le16_to_cpu(es->s_min_extra_isize))
+ sbi->s_want_extra_isize =
+ le16_to_cpu(es->s_min_extra_isize);
+ }
+ }
+ /* Check if enough inode space is available */
+ if (EXT4_GOOD_OLD_INODE_SIZE + sbi->s_want_extra_isize >
+ sbi->s_inode_size) {
+ sbi->s_want_extra_isize = sizeof(struct ext4_inode) -
+ EXT4_GOOD_OLD_INODE_SIZE;
+ ext4_msg(sb, KERN_INFO,
+ "required extra inode space not available");
+ }
+}
+
static void ext4_set_resv_clusters(struct super_block *sb)
{
ext4_fsblk_t resv_clusters;
@@ -4387,30 +4418,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
} else if (ret)
goto failed_mount4a;
- /* determine the minimum size of new large inodes, if present */
- if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE &&
- sbi->s_want_extra_isize == 0) {
- sbi->s_want_extra_isize = sizeof(struct ext4_inode) -
- EXT4_GOOD_OLD_INODE_SIZE;
- if (ext4_has_feature_extra_isize(sb)) {
- if (sbi->s_want_extra_isize <
- le16_to_cpu(es->s_want_extra_isize))
- sbi->s_want_extra_isize =
- le16_to_cpu(es->s_want_extra_isize);
- if (sbi->s_want_extra_isize <
- le16_to_cpu(es->s_min_extra_isize))
- sbi->s_want_extra_isize =
- le16_to_cpu(es->s_min_extra_isize);
- }
- }
- /* Check if enough inode space is available */
- if (EXT4_GOOD_OLD_INODE_SIZE + sbi->s_want_extra_isize >
- sbi->s_inode_size) {
- sbi->s_want_extra_isize = sizeof(struct ext4_inode) -
- EXT4_GOOD_OLD_INODE_SIZE;
- ext4_msg(sb, KERN_INFO, "required extra inode space not"
- "available");
- }
+ ext4_clamp_want_extra_isize(sb);
ext4_set_resv_clusters(sb);
@@ -5194,6 +5202,8 @@ static int ext4_remount(struct super_block *sb, int *flags, char *data)
goto restore_opts;
}
+ ext4_clamp_want_extra_isize(sb);
+
if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^
test_opt(sb, JOURNAL_CHECKSUM)) {
ext4_msg(sb, KERN_ERR, "changing journal_checksum "
--
2.21.0.392.gf8f6787159e-goog
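The logic of the new ext4_clamp_want_extra_isize() helper, restated as a userspace sketch so the effect of the final space check is easy to see. struct sb_params and clamp_want_extra_isize() are hypothetical, and the sizes are assumptions: 128 bytes for EXT4_GOOD_OLD_INODE_SIZE and 160 bytes for sizeof(struct ext4_inode), giving a default extra size of 32 bytes.

```c
#include <stdbool.h>

/* Assumed values: 128-byte original inode, 160-byte struct ext4_inode. */
#define GOOD_OLD_INODE_SIZE 128u
#define EXT4_INODE_STRUCT_SIZE 160u

/* Hypothetical bundle of the fields the helper reads. */
struct sb_params {
	unsigned int inode_size;		/* s_inode_size */
	unsigned int want_extra_isize;		/* mount option, 0 = unset */
	bool has_extra_isize_feature;
	unsigned int es_want_extra_isize;	/* superblock hint */
	unsigned int es_min_extra_isize;	/* superblock minimum */
};

unsigned int clamp_want_extra_isize(const struct sb_params *p)
{
	unsigned int want = p->want_extra_isize;

	/* Determine the minimum size of new large inodes, if present. */
	if (p->inode_size > GOOD_OLD_INODE_SIZE && want == 0) {
		want = EXT4_INODE_STRUCT_SIZE - GOOD_OLD_INODE_SIZE;
		if (p->has_extra_isize_feature) {
			if (want < p->es_want_extra_isize)
				want = p->es_want_extra_isize;
			if (want < p->es_min_extra_isize)
				want = p->es_min_extra_isize;
		}
	}
	/* Check if enough inode space is available; else fall back. */
	if (GOOD_OLD_INODE_SIZE + want > p->inode_size)
		want = EXT4_INODE_STRUCT_SIZE - GOOD_OLD_INODE_SIZE;
	return want;
}
```

With 256-byte inodes, a requested value of 200 does not fit (128 + 200 > 256) and falls back to 32, while 100 is kept. Calling the same helper from ext4_remount() is what closes the remount hole.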
This is a note to let you know that I've just added the patch titled
usb: dwc3: Allow building USB_DWC3_QCOM without EXTCON
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 77a4946516fe488b6a33390de6d749f934a243ba Mon Sep 17 00:00:00 2001
From: Marc Gonzalez <marc.w.gonzalez(a)free.fr>
Date: Wed, 24 Apr 2019 17:00:57 +0200
Subject: usb: dwc3: Allow building USB_DWC3_QCOM without EXTCON
Keep EXTCON support optional, as some platforms do not need it.
Do the same for USB_DWC3_OMAP while we're at it.
Fixes: 3def4031b3e3f ("usb: dwc3: add EXTCON dependency for qcom")
Signed-off-by: Marc Gonzalez <marc.w.gonzalez(a)free.fr>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/Kconfig | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/Kconfig b/drivers/usb/dwc3/Kconfig
index 2b1494460d0c..784309435916 100644
--- a/drivers/usb/dwc3/Kconfig
+++ b/drivers/usb/dwc3/Kconfig
@@ -54,7 +54,8 @@ comment "Platform Glue Driver Support"
config USB_DWC3_OMAP
tristate "Texas Instruments OMAP5 and similar Platforms"
- depends on EXTCON && (ARCH_OMAP2PLUS || COMPILE_TEST)
+ depends on ARCH_OMAP2PLUS || COMPILE_TEST
+ depends on EXTCON || !EXTCON
depends on OF
default USB_DWC3
help
@@ -115,7 +116,8 @@ config USB_DWC3_ST
config USB_DWC3_QCOM
tristate "Qualcomm Platform"
- depends on EXTCON && (ARCH_QCOM || COMPILE_TEST)
+ depends on ARCH_QCOM || COMPILE_TEST
+ depends on EXTCON || !EXTCON
depends on OF
default USB_DWC3
help
--
2.21.0
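The "depends on EXTCON || !EXTCON" idiom looks like a tautology, but Kconfig evaluates it over tristate values (n=0, m=1, y=2, with "!" computing 2 - x and "||" taking the maximum). The sketch below models that arithmetic in C; the function names are hypothetical. Its effect: when EXTCON is m, the glue drivers are limited to m, so a built-in driver never references a modular extcon; when EXTCON is n or y, there is no restriction.

```c
/* Kconfig tristate values, ordered n < m < y. */
enum tristate { N = 0, M = 1, Y = 2 };

/* "!" maps n <-> y and leaves m unchanged. */
enum tristate kc_not(enum tristate a)
{
	return (enum tristate)(Y - a);
}

/* "||" takes the larger of its operands. */
enum tristate kc_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Upper bound that "depends on EXTCON || !EXTCON" puts on a symbol. */
enum tristate extcon_dep_limit(enum tristate extcon)
{
	return kc_or(extcon, kc_not(extcon));
}
```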
This is a note to let you know that I've just added the patch titled
TTY: serial_core, add ->install
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 4cdd17ba1dff20ffc99fdbd2e6f0201fc7fe67df Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jslaby(a)suse.cz>
Date: Wed, 17 Apr 2019 10:58:53 +0200
Subject: TTY: serial_core, add ->install
We need to compute the uart state only on the first open. This is
usually what is done in the ->install hook. serial_core used to do this
in ->open on every open. So move it to ->install.
As a side effect, it ensures the state is set properly in the window
after tty_init_dev is called, but before uart_open. This fixes a bunch
of races between tty_open and flush_to_ldisc we were dealing with
recently.
Commit fedb5760648a (serial: fix race between flush_to_ldisc and
tty_open) attempted to fix one such bug, but it only covered a couple
of functions (uart_start and uart_unthrottle). I was able to reproduce
the crash on a SLE system, but in uart_write_room, which is also called
from flush_to_ldisc via process_echoes. I was *unable* to reproduce the
bug locally, because this patch has been sitting in my queue since
2012!
general protection fault: 0000 [#1] SMP KASAN PTI
CPU: 1 PID: 5 Comm: kworker/u4:0 Tainted: G L 4.12.14-396-default #1 SLE15-SP1 (unreleased)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
Workqueue: events_unbound flush_to_ldisc
task: ffff8800427d8040 task.stack: ffff8800427f0000
RIP: 0010:uart_write_room+0xc4/0x590
RSP: 0018:ffff8800427f7088 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 000000000000002f RSI: 00000000000000ee RDI: ffff88003888bd90
RBP: ffffffffb9545850 R08: 0000000000000001 R09: 0000000000000400
R10: ffff8800427d825c R11: 000000000000006e R12: 1ffff100084fee12
R13: ffffc900004c5000 R14: ffff88003888bb28 R15: 0000000000000178
FS: 0000000000000000(0000) GS:ffff880043300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000561da0794148 CR3: 000000000ebf4000 CR4: 00000000000006e0
Call Trace:
tty_write_room+0x6d/0xc0
__process_echoes+0x55/0x870
n_tty_receive_buf_common+0x105e/0x26d0
tty_ldisc_receive_buf+0xb7/0x1c0
tty_port_default_receive_buf+0x107/0x180
flush_to_ldisc+0x35d/0x5c0
...
RBX being 0 means tty->driver_data is NULL in uart_write_room. The code
at uart_write_room+0xc4 tries to dereference 0x178 (0x178 >> 3 is 0x2f
in RDX), and 0x178 is exactly &((struct uart_state *)NULL)->refcount,
as used by uart_port_lock() from uart_write_room.
So revert the upstream commit here as my local patch should fix the
whole family.
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Cc: Li RongQing <lirongqing(a)baidu.com>
Cc: Wang Li <wangli39(a)baidu.com>
Cc: Zhang Yu <zhangyu31(a)baidu.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/serial_core.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 69f48717546b..0decb0bf991d 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -130,9 +130,6 @@ static void uart_start(struct tty_struct *tty)
struct uart_port *port;
unsigned long flags;
- if (!state)
- return;
-
port = uart_port_lock(state, flags);
__uart_start(tty);
uart_port_unlock(port, flags);
@@ -730,9 +727,6 @@ static void uart_unthrottle(struct tty_struct *tty)
upstat_t mask = UPSTAT_SYNC_FIFO;
struct uart_port *port;
- if (!state)
- return;
-
port = uart_port_ref(state);
if (!port)
return;
@@ -1747,6 +1741,16 @@ static void uart_dtr_rts(struct tty_port *port, int raise)
uart_port_deref(uport);
}
+static int uart_install(struct tty_driver *driver, struct tty_struct *tty)
+{
+ struct uart_driver *drv = driver->driver_state;
+ struct uart_state *state = drv->state + tty->index;
+
+ tty->driver_data = state;
+
+ return tty_standard_install(driver, tty);
+}
+
/*
* Calls to uart_open are serialised by the tty_lock in
* drivers/tty/tty_io.c:tty_open()
@@ -1759,11 +1763,8 @@ static void uart_dtr_rts(struct tty_port *port, int raise)
*/
static int uart_open(struct tty_struct *tty, struct file *filp)
{
- struct uart_driver *drv = tty->driver->driver_state;
- int retval, line = tty->index;
- struct uart_state *state = drv->state + line;
-
- tty->driver_data = state;
+ struct uart_state *state = tty->driver_data;
+ int retval;
retval = tty_port_open(&state->port, tty, filp);
if (retval > 0)
@@ -2448,6 +2449,7 @@ static void uart_poll_put_char(struct tty_driver *driver, int line, char ch)
#endif
static const struct tty_operations uart_ops = {
+ .install = uart_install,
.open = uart_open,
.close = uart_close,
.write = uart_write,
--
2.21.0
On a very specific subset of ThinkPad P50 SKUs, particularly ones that
come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
seems to have a very nasty habit of not always resetting the secondary
Nvidia GPU between full reboots if the laptop is configured in Hybrid
Graphics mode. The reason for this happening is unknown, but the
following steps and possibly a good bit of patience will reproduce the
issue:
1. Boot up the laptop normally in Hybrid graphics mode
2. Make sure nouveau is loaded and that the GPU is awake
3. Allow the nvidia GPU to runtime suspend itself after being idle
4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
5. If nouveau loads up properly, reboot the machine again and go back to
step 2 until you reproduce the issue
This results in some very strange behavior: the GPU will
quite literally be left in exactly the same state it was in when the
previously booted kernel started the reboot. This has all sorts of bad
side effects: for starters, it completely breaks nouveau, starting with a
mysterious EVO channel failure that happens well before we've actually
used the EVO channel for anything:
nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000 00000002
Later on, this causes us to time out trying to bring up the GR ctx:
------------[ cut here ]------------
nouveau 0000:01:00.0: timeout
WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
Modules linked in: nouveau mxm_wmi i915 crc32c_intel ttm i2c_algo_bit serio_raw drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci drm xhci_hcd i2c_core wmi video
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc5Lyude-Test+ #29
Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 ) 12/18/2018
Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
RIP: 0010:gf100_grctx_generate+0x7b2/0x850 [nouveau]
Code: 85 d2 75 04 48 8b 57 10 48 89 95 28 ff ff ff e8 b4 37 0e e1 48 8b 95 28 ff ff ff 48 c7 c7 b1 97 57 a0 48 89 c6 e8 5a 38 c0 e0 <0f> 0b e9 b9 fd ff ff 48 8b 85 60 ff ff ff 48 8b 40 10 48 8b 78 10
RSP: 0018:ffffc900000b77f0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff888871af8000 RCX: 0000000000000000
RDX: ffff88887f41dfe0 RSI: ffff88887f415698 RDI: ffff88887f415698
RBP: ffffc900000b78c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888872118000
R13: 0000000000000000 R14: ffffffffa0551420 R15: ffffc900000b7818
FS: 0000000000000000(0000) GS:ffff88887f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005644d0556ca8 CR3: 0000000002214006 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
gf100_gr_init_ctxctl+0x27b/0x2d0 [nouveau]
gf100_gr_init+0x5bd/0x5e0 [nouveau]
gf100_gr_init_+0x61/0x70 [nouveau]
nvkm_gr_init+0x1d/0x20 [nouveau]
nvkm_engine_init+0xcb/0x210 [nouveau]
nvkm_subdev_init+0xd6/0x230 [nouveau]
nvkm_engine_ref.part.0+0x52/0x70 [nouveau]
nvkm_engine_ref+0x13/0x20 [nouveau]
nvkm_ioctl_new+0x12c/0x260 [nouveau]
? nvkm_fifo_chan_child_del+0xa0/0xa0 [nouveau]
? gf100_gr_dtor+0xe0/0xe0 [nouveau]
nvkm_ioctl+0xe2/0x180 [nouveau]
nvkm_client_ioctl+0x12/0x20 [nouveau]
nvif_object_ioctl+0x47/0x50 [nouveau]
nvif_object_init+0xc8/0x120 [nouveau]
nvc0_fbcon_accel_init+0x5c/0x960 [nouveau]
nouveau_fbcon_create+0x5a5/0x5d0 [nouveau]
? drm_setup_crtcs+0x27b/0xcb0 [drm_kms_helper]
? __lock_is_held+0x5e/0xa0
__drm_fb_helper_initial_config_and_unlock+0x27c/0x520 [drm_kms_helper]
drm_fb_helper_hotplug_event.part.29+0xae/0xc0 [drm_kms_helper]
drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
nouveau_fbcon_output_poll_changed+0xb8/0x110 [nouveau]
drm_kms_helper_hotplug_event+0x2a/0x40 [drm_kms_helper]
drm_dp_send_link_address+0x176/0x1c0 [drm_kms_helper]
drm_dp_check_and_send_link_address+0xa0/0xb0 [drm_kms_helper]
drm_dp_mst_link_probe_work+0xa4/0xc0 [drm_kms_helper]
process_one_work+0x22f/0x5c0
worker_thread+0x44/0x3a0
kthread+0x12b/0x150
? wq_pool_ids_show+0x140/0x140
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x3a/0x50
irq event stamp: 22490
hardirqs last enabled at (22489): [<ffffffff8113281d>] console_unlock+0x44d/0x5f0
hardirqs last disabled at (22490): [<ffffffff81001c03>] trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (22486): [<ffffffff81c00330>] __do_softirq+0x330/0x44d
softirqs last disabled at (22479): [<ffffffff810c3105>] irq_exit+0xe5/0xf0
WARNING: CPU: 0 PID: 12 at drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547 gf100_grctx_generate+0x7b2/0x850 [nouveau]
---[ end trace bf0976ed88b122a8 ]---
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine 00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000 unknown]
The GPU never manages to recover from this. Booting without nouveau
loaded causes issues as well, since the GPU starts sending spurious
interrupts that cause other devices' IRQs to get disabled by the kernel:
irq 16: nobody cared (try booting with the "irqpoll" option)
…
handlers:
[<000000007faa9e99>] i801_isr [i2c_i801]
Disabling IRQ #16
…
serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX register (-110).
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled interrupts!
This in turn causes the touchpad, and sometimes other devices as well,
to be disabled.
Since the GPU staying on causes problems even without nouveau's
intervention, we can't fix this problem from nouveau itself. We have to
fix it as early as possible in the boot sequence in order to make sure
that the GPU is in a clean state before it has a chance to spam us with
interrupts and break things.
To do this, we add a new PCI quirk using
DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
at boot finishes. From there, we check to make sure that this is indeed
the specific P50 variant of this GPU. We also make sure that the GPU PCI
device is advertising NoReset- in order to prevent us from trying to
reset the GPU when the machine is in Dedicated graphics mode (where the
GPU being initialized by the BIOS is normal and expected). Finally, we
try mapping the MMIO space for the GPU which should only work if the GPU
is actually active in D0 mode. We can then read the magic 0x2240c
register on the GPU, which will have bit 1 set if the GPU's firmware has
already been posted during a previous boot. Once we've confirmed all of
this, we reset the PCI device and re-disable it - bringing the GPU back
into a healthy state.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: nouveau(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Ben Skeggs <skeggsb(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/pci/quirks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b0a413f3f7ca..948492fda8bf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5117,3 +5117,68 @@ SWITCHTEC_QUIRK(0x8573); /* PFXI 48XG3 */
SWITCHTEC_QUIRK(0x8574); /* PFXI 64XG3 */
SWITCHTEC_QUIRK(0x8575); /* PFXI 80XG3 */
SWITCHTEC_QUIRK(0x8576); /* PFXI 96XG3 */
+
+/*
+ * On certain Lenovo ThinkPad P50 SKUs, specifically those with an Nvidia
+ * Quadro M1000M, the BIOS will occasionally make the mistake of not resetting
+ * the nvidia GPU between reboots if the system is configured to use hybrid
+ * graphics mode. This results in the GPU being left in whatever state it was
+ * in during the previous boot which causes spurious interrupts from the GPU,
+ * which in turn cause us to disable the wrong IRQs and end up breaking the
+ * touchpad. Unsurprisingly, this also completely breaks nouveau.
+ *
+ * Luckily, it seems a simple reset of the PCI device for the nvidia GPU
+ * manages to bring the GPU back into a clean state and fix all of these
+ * issues. Additionally since the GPU will report NoReset+ when the machine is
+ * configured in Dedicated display mode, we don't need to worry about
+ * accidentally resetting the GPU when it's supposed to already be
+ * initialized.
+ */
+static void
+quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot(struct pci_dev *pdev)
+{
+ void __iomem *map;
+ int ret;
+
+ if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
+ pdev->subsystem_device != 0x222e ||
+ !pdev->reset_fn)
+ return;
+
+ /*
+ * If we can't enable the device's mmio space, it's probably not even
+ * initialized. This is fine, and means we can just skip the quirk
+ * entirely.
+ */
+ if (pci_enable_device_mem(pdev)) {
+ pci_dbg(pdev, "Can't enable device mem, no reset needed\n");
+ return;
+ }
+
+ /* Taken from drivers/gpu/drm/nouveau/engine/device/base.c */
+ map = ioremap(pci_resource_start(pdev, 0), 0x102000);
+ if (!map) {
+ pci_err(pdev, "Can't map MMIO space, this is probably very bad\n");
+ goto out_disable;
+ }
+
+ /*
+ * Be extra careful, and make sure that the GPU firmware is posted
+ * before trying a reset
+ */
+ if (ioread32(map + 0x2240c) & 0x2) {
+ pci_info(pdev,
+ FW_BUG "GPU left initialized by EFI, resetting\n");
+ ret = pci_reset_function(pdev);
+ if (ret < 0)
+ pci_err(pdev, "Failed to reset GPU: %d\n", ret);
+ }
+
+ iounmap(map);
+out_disable:
+ pci_disable_device(pdev);
+}
+
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
+ PCI_CLASS_DISPLAY_VGA, 8,
+ quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot);
--
2.20.1