Backport commits from master that fix a boot failure on some Intel
machines.
I have only boot-tested this in a VM. Functional testing for v4.14 is
out of my scope, as the patches differ only by a trivial conflict from v4.19,
where I discovered and debugged the issue. Testing v4.14 stable on
affected nodes would require porting some other [local] patches,
which is beyond the time I have for this backport.
Hopefully, that's fine.
Cc: David Woodhouse <dwmw2(a)infradead.org>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: Lu Baolu <baolu.lu(a)linux.intel.com>
Dmitry Safonov (1):
iommu/vt-d: Don't queue_iova() if there is no flush queue
Joerg Roedel (1):
iommu/iova: Fix compilation error with !CONFIG_IOMMU_IOVA
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/iova.c | 18 ++++++++++++++----
include/linux/iova.h | 6 ++++++
3 files changed, 21 insertions(+), 5 deletions(-)
--
2.22.0
Hello Greg,
Can you please consider including the following patches in the stable
linux-4.14.y branch?
The following patch series attempts to address the issues raised by
Stan Hu around NFSv4 lookup revalidation.
The first patch in the series should, in principle, suffice to address
the exact issue raised by Stan. However, when looking at the
implementation of nfs4_lookup_revalidate(), it becomes clear that we're
not doing enough to revalidate the dentry itself when performing NFSv4.1
opens either.
Trond Myklebust (3):
NFS: Fix dentry revalidation on NFSv4 lookup
NFS: Refactor nfs_lookup_revalidate()
NFSv4: Fix lookup revalidate of regular files
zhangliguang (1):
NFS: Remove redundant semicolon
fs/nfs/dir.c | 295 ++++++++++++++++++++++++++++++------------------------
fs/nfs/nfs4proc.c | 15 ++-
2 files changed, 174 insertions(+), 136 deletions(-)
--
2.15.3.AMZN
From: Munehisa Kamata <kamatam(a)amazon.com>
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: linux-block(a)vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br(a)gmail.com>
Cc: nbd(a)other.debian.org
Cc: stable(a)vger.kernel.org
Cc: David Woodhouse <dwmw(a)amazon.com>
Signed-off-by: Munehisa Kamata <kamatam(a)amazon.com>
CR: https://code.amazon.com/reviews/CR-7629288
---
I reproduced this problem on the FAT file system.
Steps to reproduce:
1. Establish an nbd connection.
2. Run two threads: one does mount and umount, the other does the clear_sock ioctl.
3. The BUG_ON is then hit.
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 9bcde23..e21d2de 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1231,7 +1231,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
struct block_device *bdev)
{
sock_shutdown(nbd);
- kill_bdev(bdev);
+ __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
--
2.7.4
From: Will Deacon <will.deacon(a)arm.com>
[ Upstream commit 24951465cbd279f60b1fdc2421b3694405bcff42 ]
arch/arm/ defines a SIGMINSTKSZ of 2k, so we should use the same value
for compat tasks.
Cc: <stable(a)vger.kernel.org> # 4.9+
Cc: Aurelien Jarno <aurelien(a)aurel32.net>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Dominik Brodowski <linux(a)dominikbrodowski.net>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Dave Martin <Dave.Martin(a)arm.com>
Reported-by: Steve McIntyre <steve.mcintyre(a)arm.com>
Tested-by: Steve McIntyre <93sam(a)debian.org>
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
---
Aurelien points out that this didn't get selected for -stable despite its
counterpart (22839869f21a ("signal: Introduce COMPAT_SIGMINSTKSZ for use
in compat_sys_sigaltstack")) being backported to 4.9. Oops.
arch/arm64/include/asm/compat.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h
index 1a037b94eba1..cee28a05ee98 100644
--- a/arch/arm64/include/asm/compat.h
+++ b/arch/arm64/include/asm/compat.h
@@ -159,6 +159,7 @@ static inline compat_uptr_t ptr_to_compat(void __user *uptr)
}
#define compat_user_stack_pointer() (user_stack_pointer(task_pt_regs(current)))
+#define COMPAT_MINSIGSTKSZ 2048
static inline void __user *arch_compat_alloc_user_space(long len)
{
--
2.11.0
From: Todd Kjos <tkjos(a)android.com>
commit a370003cc301d4361bae20c9ef615f89bf8d1e8a upstream
There is a race between the binder driver cleaning
up a completed transaction via binder_free_transaction()
and a user calling binder_ioctl(BC_FREE_BUFFER) to
release a buffer. It doesn't matter which runs first, but
they need to be protected against running concurrently,
which can otherwise result in a use-after-free (UAF).
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Cc: stable <stable(a)vger.kernel.org> # 4.14 4.19
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/android/binder.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 5d67f5fec6c1b..2decb1a5a8e2f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1960,8 +1960,18 @@ static struct binder_thread *binder_get_txn_from_and_acq_inner(
static void binder_free_transaction(struct binder_transaction *t)
{
- if (t->buffer)
- t->buffer->transaction = NULL;
+ struct binder_proc *target_proc = t->to_proc;
+
+ if (target_proc) {
+ binder_inner_proc_lock(target_proc);
+ if (t->buffer)
+ t->buffer->transaction = NULL;
+ binder_inner_proc_unlock(target_proc);
+ }
+ /*
+ * If the transaction has no target_proc, then
+ * t->buffer->transaction has already been cleared.
+ */
kfree(t);
binder_stats_deleted(BINDER_STAT_TRANSACTION);
}
@@ -3484,10 +3494,12 @@ static int binder_thread_write(struct binder_proc *proc,
buffer->debug_id,
buffer->transaction ? "active" : "finished");
+ binder_inner_proc_lock(proc);
if (buffer->transaction) {
buffer->transaction->buffer = NULL;
buffer->transaction = NULL;
}
+ binder_inner_proc_unlock(proc);
if (buffer->async_transaction && buffer->target_node) {
struct binder_node *buf_node;
struct binder_work *w;
--
2.22.0.709.g102302147b-goog
From: allen yan <yanwei(a)marvell.com>
commit c737abc193d16e62e23e2fb585b8b7398ab380d8 upstream.
Armada-37xx UART0 registers are 0x200 bytes wide. Right next to them are
the UART1 registers that should not be declared in this node.
Update the example in DT bindings document accordingly.
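The overlap that the old size causes can be checked with plain interval arithmetic. The sketch below is a userspace illustration only: `struct reg_range` and `ranges_overlap()` are hypothetical names, and the UART1 base of 0x12200 is an assumption derived from the statement that UART1 sits directly after the 0x200-byte UART0 window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one "reg = <base size>" entry. */
struct reg_range {
	uint32_t base;
	uint32_t size;
};

/* True if the two half-open ranges [base, base + size) share any byte. */
static bool ranges_overlap(struct reg_range a, struct reg_range b)
{
	assert(a.size != 0 && b.size != 0);
	/* Widen to 64 bits so base + size cannot wrap. */
	return (uint64_t)a.base < (uint64_t)b.base + b.size &&
	       (uint64_t)b.base < (uint64_t)a.base + a.size;
}
```

With these assumptions, a UART0 window of <0x12000 0x400> overlaps a UART1 window starting at 0x12200, while <0x12000 0x200> does not.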
Signed-off-by: allen yan <yanwei(a)marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)free-electrons.com>
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
---
Cherry-picked from the lede/openwrt tree
https://git.lede-project.org/?p=source.git.
Build-tested for ARCH=arm64 + defconfig.
Applies cleanly on 4.9.y as well, but since
lede stopped supporting v4.9.y, I'm not
sure whether this patch was tested on v4.9.y at all.
Documentation/devicetree/bindings/serial/mvebu-uart.txt | 2 +-
arch/arm64/boot/dts/marvell/armada-37xx.dtsi | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/devicetree/bindings/serial/mvebu-uart.txt b/Documentation/devicetree/bindings/serial/mvebu-uart.txt
index 6087defd9f93..d37fabe17bd1 100644
--- a/Documentation/devicetree/bindings/serial/mvebu-uart.txt
+++ b/Documentation/devicetree/bindings/serial/mvebu-uart.txt
@@ -8,6 +8,6 @@ Required properties:
Example:
serial@12000 {
compatible = "marvell,armada-3700-uart";
- reg = <0x12000 0x400>;
+ reg = <0x12000 0x200>;
interrupts = <43>;
};
diff --git a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
index 8c0cf7efac65..b554cdaf5e53 100644
--- a/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-37xx.dtsi
@@ -134,7 +134,7 @@
uart0: serial@12000 {
compatible = "marvell,armada-3700-uart";
- reg = <0x12000 0x400>;
+ reg = <0x12000 0x200>;
interrupts = <GIC_SPI 11 IRQ_TYPE_LEVEL_HIGH>;
status = "disabled";
};
--
2.7.4
Commit df9bde015a72 ("xen/gntdev.c: convert to use vm_map_pages()")
breaks the gntdev driver. If vma->vm_pgoff > 0, vm_map_pages()
will:
- use map->pages starting at vma->vm_pgoff instead of 0
- verify map->count against vma_pages()+vma->vm_pgoff instead of just
vma_pages().
In practice, this breaks using a single gntdev FD for mapping multiple
grants.
relevant strace output:
[pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) =
0x777f1211b000
[pid 857] ioctl(7, IOCTL_GNTDEV_SET_UNMAP_NOTIFY, 0x7ffd3407b710) = 0
[pid 857] ioctl(7, IOCTL_GNTDEV_MAP_GRANT_REF, 0x7ffd3407b6d0) = 0
[pid 857] mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 7,
0x1000) = -1 ENXIO (No such device or address)
details here:
https://github.com/QubesOS/qubes-issues/issues/5199
The reason is (quoting Marek from the discussion):
vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
basically using this parameter for "which grant reference to map".
map struct returned by gntdev_find_map_index() describes just the pages
to be mapped. Specifically map->pages[0] should be mapped at
vma->vm_start, not vma->vm_start+vma->vm_pgoff*PAGE_SIZE.
When trying to map a grant with index (aka vma->vm_pgoff) > 0,
__vm_map_pages() will refuse to map it because it will expect map->count
to be at least vma_pages(vma)+vma->vm_pgoff, while it is exactly
vma_pages(vma).
Converting vm_map_pages() to use vm_map_pages_zero() will fix the
problem.
Marek has tested and confirmed the same.
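The difference between the two helpers comes down to which offset is fed into the bounds check. Below is a simplified userspace model of the check as described above, not the kernel implementation; `check_map_range()`, `model_map_pages()` and `model_map_pages_zero()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the bounds check described above.
 *   num     - entries in the page array (map->count)
 *   vma_len - pages covered by the vma (vma_pages(vma))
 *   offset  - starting index into the page array
 * Returns 0 if the mapping would be accepted, -ENXIO otherwise.
 */
static int check_map_range(unsigned long num, unsigned long vma_len,
			   unsigned long offset)
{
	assert(vma_len > 0);
	if (vma_len + offset > num)
		return -ENXIO;
	return 0;
}

/* Models vm_map_pages(): the offset is taken from vma->vm_pgoff. */
static int model_map_pages(unsigned long num, unsigned long vma_len,
			   unsigned long vm_pgoff)
{
	return check_map_range(num, vma_len, vm_pgoff);
}

/* Models vm_map_pages_zero(): vm_pgoff is ignored, the offset is 0. */
static int model_map_pages_zero(unsigned long num, unsigned long vma_len,
				unsigned long vm_pgoff)
{
	(void)vm_pgoff;
	return check_map_range(num, vma_len, 0);
}
```

For the failing mmap in the strace output (one page, offset 0x1000, so vm_pgoff is 1 and map->count is 1), the first model rejects the request and the second accepts it.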
Reported-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
Signed-off-by: Souptick Joarder <jrdr.linux(a)gmail.com>
Tested-by: Marek Marczykowski-Górecki <marmarek(a)invisiblethingslab.com>
---
drivers/xen/gntdev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 4c339c7..a446a72 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -1143,7 +1143,7 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
goto out_put_map;
if (!use_ptemod) {
- err = vm_map_pages(vma, map->pages, map->count);
+ err = vm_map_pages_zero(vma, map->pages, map->count);
if (err)
goto out_put_map;
} else {
--
1.9.1
Hi
As this goes to the 4.9 series according to
https://www.kernel.org/doc/html/latest/networking/netdev-FAQ.html#q-are-all…
I'm sending it primarily to stable(a)v.k.o but Cc'ing Dave and Xin Long.
Could you please apply 99253eb750fd ("ipv6: check sk sk_type and
protocol early in ip_mroute_set/getsockopt") to the 4.9 stable series?
While 5e1859fbcc3c was done back in 3.8-rc1, 99253eb750fd from
4.11-rc1 was not backported to older stable series itself, where it is
needed as well.
I only checked that it applies without changes to 4.9, but the fix should
probably go to 4.4 and 3.16 as well.
Regards,
Salvatore
A single 32-bit PSR2 training pattern field follows the sixteen element
array of PSR table entries in the VBT spec. But, we incorrectly define
this PSR2 field for each of the PSR table entries. As a result, the PSR1
training pattern duration for any panel_type != 0 will be parsed
incorrectly. Secondly, PSR2 training pattern durations for VBTs with bdb
version >= 226 will also be wrong.
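The layout problem can be made concrete with a small sketch. The structs below are deliberately simplified stand-ins (the real VBT entry has more fields), `NUM_PANELS`, `struct psr_entry`, `struct psr_entry_wrong`, `struct psr_block` and `psr2_wakeup_code()` are hypothetical names, and `__attribute__((packed))` assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PANELS 16

/* Simplified per-panel entry; the real table entry has more fields. */
struct psr_entry {
	uint16_t tp1_wakeup_time;
	uint16_t tp2_tp3_wakeup_time;
} __attribute__((packed));

/* Wrong layout: a 32-bit PSR2 field repeated inside every entry. */
struct psr_entry_wrong {
	uint16_t tp1_wakeup_time;
	uint16_t tp2_tp3_wakeup_time;
	uint32_t psr2_tp2_tp3_wakeup_time;
} __attribute__((packed));

/* Correct layout: one 32-bit PSR2 field after the 16-entry array. */
struct psr_block {
	struct psr_entry entries[NUM_PANELS];
	uint32_t psr2_tp2_tp3_wakeup_time;
} __attribute__((packed));

/* Extract the 2-bit PSR2 wakeup code for one panel from the packed field. */
static unsigned int psr2_wakeup_code(uint32_t packed, unsigned int panel_type)
{
	assert(panel_type < NUM_PANELS);
	return (packed >> (2 * panel_type)) & 0x3;
}
```

Because the wrong entry is twice as large in this model, entry N is read from offset N * 8 instead of N * 4, so only panel_type 0 lands on the right bytes.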
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: José Roberto de Souza <jose.souza(a)intel.com>
Cc: stable(a)vger.kernel.org
Cc: stable(a)vger.kernel.org #v5.2
Fixes: 88a0d9606aff ("drm/i915/vbt: Parse and use the new field with PSR2 TP2/3 wakeup time")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111088
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204183
Signed-off-by: Dhinakaran Pandiyan <dhinakaran.pandiyan(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: José Roberto de Souza <jose.souza(a)intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Tested-by: François Guerraz <kubrick(a)fgv6.net>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190717223451.2595-1-dhinaka…
(cherry picked from commit b5ea9c9337007d6e700280c8a60b4e10d070fb53)
---
drivers/gpu/drm/i915/intel_bios.c | 2 +-
drivers/gpu/drm/i915/intel_vbt_defs.h | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c
index 1dc8d03ff127..ee6fa75d65a2 100644
--- a/drivers/gpu/drm/i915/intel_bios.c
+++ b/drivers/gpu/drm/i915/intel_bios.c
@@ -762,7 +762,7 @@ parse_psr(struct drm_i915_private *dev_priv, const struct bdb_header *bdb)
}
if (bdb->version >= 226) {
- u32 wakeup_time = psr_table->psr2_tp2_tp3_wakeup_time;
+ u32 wakeup_time = psr->psr2_tp2_tp3_wakeup_time;
wakeup_time = (wakeup_time >> (2 * panel_type)) & 0x3;
switch (wakeup_time) {
diff --git a/drivers/gpu/drm/i915/intel_vbt_defs.h b/drivers/gpu/drm/i915/intel_vbt_defs.h
index fdbbb9a53804..796c070bbe6f 100644
--- a/drivers/gpu/drm/i915/intel_vbt_defs.h
+++ b/drivers/gpu/drm/i915/intel_vbt_defs.h
@@ -772,13 +772,13 @@ struct psr_table {
/* TP wake up time in multiple of 100 */
u16 tp1_wakeup_time;
u16 tp2_tp3_wakeup_time;
-
- /* PSR2 TP2/TP3 wakeup time for 16 panels */
- u32 psr2_tp2_tp3_wakeup_time;
} __packed;
struct bdb_psr {
struct psr_table psr_table[16];
+
+ /* PSR2 TP2/TP3 wakeup time for 16 panels */
+ u32 psr2_tp2_tp3_wakeup_time;
} __packed;
/*
--
2.17.1
The patch titled
Subject: mm/memcontrol.c: fix use after free in mem_cgroup_iter()
has been added to the -mm tree. Its filename is
mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-fix-use-after-free-i…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-fix-use-after-free-i…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Miles Chen <miles.chen(a)mediatek.com>
Subject: mm/memcontrol.c: fix use after free in mem_cgroup_iter()
This patch is sent to report a use after free in mem_cgroup_iter() after
merging commit be2657752e9e ("mm: memcg: fix use after free in
mem_cgroup_iter()").
I work with the Android kernel trees (4.9 & 4.14), and commit be2657752e9e
("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged into
those trees. However, I can still observe the use after free issues addressed
by commit be2657752e9e (on low-end devices, a few times this month).
backtrace:
css_tryget <- crash here
mem_cgroup_iter
shrink_node
shrink_zones
do_try_to_free_pages
try_to_free_pages
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
To debug, I poisoned mem_cgroup before freeing it:
static void __mem_cgroup_free(struct mem_cgroup *memcg)
{
int node;
for_each_node(node)
free_mem_cgroup_per_node_info(memcg, node);
free_percpu(memcg->stat);
+ /* poison memcg before freeing it */
+ memset(memcg, 0x78, sizeof(struct mem_cgroup));
kfree(memcg);
}
The coredump shows the position=0xdbbc2a00 is freed.
(gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
$13 = {position = 0xdbbc2a00, generation = 0x2efd}
0xdbbc2a00: 0xdbbc2e00 0x00000000 0xdbbc2800 0x00000100
0xdbbc2a10: 0x00000200 0x78787878 0x00026218 0x00000000
0xdbbc2a20: 0xdcad6000 0x00000001 0x78787800 0x00000000
0xdbbc2a30: 0x78780000 0x00000000 0x0068fb84 0x78787878
0xdbbc2a40: 0x78787878 0x78787878 0x78787878 0xe3fa5cc0
0xdbbc2a50: 0x78787878 0x78787878 0x00000000 0x00000000
0xdbbc2a60: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a70: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a80: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2a90: 0x00000001 0x00000000 0x00000000 0x00100000
0xdbbc2aa0: 0x00000001 0xdbbc2ac8 0x00000000 0x00000000
0xdbbc2ab0: 0x00000000 0x00000000 0x00000000 0x00000000
0xdbbc2ac0: 0x00000000 0x00000000 0xe5b02618 0x00001000
0xdbbc2ad0: 0x00000000 0x78787878 0x78787878 0x78787878
0xdbbc2ae0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2af0: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b00: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b10: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b20: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b30: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b40: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b50: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b60: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b70: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2b80: 0x78787878 0x78787878 0x00000000 0x78787878
0xdbbc2b90: 0x78787878 0x78787878 0x78787878 0x78787878
0xdbbc2ba0: 0x78787878 0x78787878 0x78787878 0x78787878
In the reclaim path, try_to_free_pages() does not set up
sc.target_mem_cgroup, and sc is passed to do_try_to_free_pages(), ...,
shrink_node().
In mem_cgroup_iter(), root is set to root_mem_cgroup because
sc->target_mem_cgroup is NULL. It is possible to assign a memcg to
root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter().
try_to_free_pages
struct scan_control sc = {...}, target_mem_cgroup is 0x0;
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup *root = sc->target_mem_cgroup;
memcg = mem_cgroup_iter(root, NULL, &reclaim);
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
css = css_next_descendant_pre(css, &root->css);
memcg = mem_cgroup_from_css(css);
cmpxchg(&iter->position, pos, memcg);
My device uses memcg non-hierarchical mode. When we release a memcg:
invalidate_reclaim_iterators() reaches only dead_memcg and its parents.
If non-hierarchical mode is used, invalidate_reclaim_iterators() never
reaches root_mem_cgroup.
static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
{
struct mem_cgroup *memcg = dead_memcg;
for (; memcg; memcg = parent_mem_cgroup(memcg))
...
}
So the use after free scenario looks like:
CPU1 CPU2
try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
css = css_next_descendant_pre(css, &root->css);
memcg = mem_cgroup_from_css(css);
cmpxchg(&iter->position, pos, memcg);
invalidate_reclaim_iterators(memcg);
...
__mem_cgroup_free()
kfree(memcg);
try_to_free_pages
do_try_to_free_pages
shrink_zones
shrink_node
mem_cgroup_iter()
if (!root)
root = root_mem_cgroup;
...
mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
iter = &mz->iter[reclaim->priority];
pos = READ_ONCE(iter->position);
css_tryget(&pos->css) <- use after free
To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter in
invalidate_reclaim_iterators().
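The idea can be sketched in userspace with a toy model. This is not the kernel code: `struct memcg`, `invalidate_one()` and `invalidate_iterators()` are hypothetical names, each group caches a single iterator position instead of one per node and priority, and the atomic cmpxchg() is replaced by a plain compare and store.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct mem_cgroup with one cached iterator position. */
struct memcg {
	struct memcg *parent;        /* NULL when there is no parent link */
	struct memcg *iter_position; /* last memcg handed out by the iterator */
};

/* Clear the cached position in 'from' if it points at the dying memcg. */
static void invalidate_one(struct memcg *from, struct memcg *dead)
{
	if (from->iter_position == dead)
		from->iter_position = NULL;
}

/* Walk dead and its parents, then handle root if the walk never got there. */
static void invalidate_iterators(struct memcg *root, struct memcg *dead)
{
	struct memcg *memcg = dead;
	struct memcg *last;

	assert(root != NULL && dead != NULL);
	do {
		invalidate_one(memcg, dead);
		last = memcg;
	} while ((memcg = memcg->parent) != NULL);

	/* Without a parent link to root, the loop above never visits it. */
	if (last != root)
		invalidate_one(root, dead);
}
```

In this model, a child with no parent link still has its stale pointer cleared from the root, which is the case the non-hierarchical setup needs.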
Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com
Fixes: 5ac8fb31ad2e ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Miles Chen <miles.chen(a)mediatek.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 39 +++++++++++++++++++++++++++++----------
1 file changed, 29 insertions(+), 10 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter
+++ a/mm/memcontrol.c
@@ -1130,26 +1130,45 @@ void mem_cgroup_iter_break(struct mem_cg
css_put(&prev->css);
}
-static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
+static void __invalidate_reclaim_iterators(struct mem_cgroup *from,
+ struct mem_cgroup *dead_memcg)
{
- struct mem_cgroup *memcg = dead_memcg;
struct mem_cgroup_reclaim_iter *iter;
struct mem_cgroup_per_node *mz;
int nid;
int i;
- for (; memcg; memcg = parent_mem_cgroup(memcg)) {
- for_each_node(nid) {
- mz = mem_cgroup_nodeinfo(memcg, nid);
- for (i = 0; i <= DEF_PRIORITY; i++) {
- iter = &mz->iter[i];
- cmpxchg(&iter->position,
- dead_memcg, NULL);
- }
+ for_each_node(nid) {
+ mz = mem_cgroup_nodeinfo(from, nid);
+ for (i = 0; i <= DEF_PRIORITY; i++) {
+ iter = &mz->iter[i];
+ cmpxchg(&iter->position,
+ dead_memcg, NULL);
}
}
}
+static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
+{
+ struct mem_cgroup *memcg = dead_memcg;
+ struct mem_cgroup *last;
+
+ do {
+ __invalidate_reclaim_iterators(memcg, dead_memcg);
+ last = memcg;
+ } while (memcg = parent_mem_cgroup(memcg));
+
+ /*
+ * When cgroup1 non-hierarchy mode is used,
+ * parent_mem_cgroup() does not walk all the way up to the
+ * cgroup root (root_mem_cgroup). So we have to handle
+ * dead_memcg from cgroup root separately.
+ */
+ if (last != root_mem_cgroup)
+ __invalidate_reclaim_iterators(root_mem_cgroup,
+ dead_memcg);
+}
+
/**
* mem_cgroup_scan_tasks - iterate over tasks of a memory cgroup hierarchy
* @memcg: hierarchy root
_
Patches currently in -mm which might be from miles.chen(a)mediatek.com are
mm-memcontrol-fix-use-after-free-in-mem_cgroup_iter.patch
Although SAS3 and SAS3.5 IT HBA controllers support
64-bit DMA addressing, by hardware design a DMA-able range
that contains the all-ones 64-bit address (0xFFFFFFFF-FFFFFFFF)
results in a firmware fault.
E.g. the SGE's start address is 0xFFFFFFFF-FFFFF000 and the
data length is 0x1000 bytes. When the HBA tries to DMA the data
at location 0xFFFFFFFF-FFFFFFFF, the HBA will
fault the firmware.
Fix:
The driver sets a 63-bit DMA mask to ensure the above address
will not be used.
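The effect of the narrower mask can be checked with simple arithmetic. The sketch below is a userspace model, not driver code: `DMA_BIT_MASK` mirrors the kernel macro of the same name, while `buffer_fits_mask()` is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(): the n low bits set. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper: true if every byte of [start, start + len)
 * is addressable under the given mask. len must be non-zero.
 */
static bool buffer_fits_mask(uint64_t start, uint64_t len, uint64_t mask)
{
	uint64_t last;

	assert(len != 0);
	last = start + (len - 1);

	/* Reject wrap-around and anything above the mask. */
	return last >= start && last <= mask;
}
```

A 0x1000-byte buffer at 0xFFFFFFFF-FFFFF000 fits a 64-bit mask but not a 63-bit one, so with the 63-bit mask such a buffer is never handed to the HBA.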
Cc: <stable(a)vger.kernel.org> # 5.1.20+
Signed-off-by: Suganath Prabu <suganath-prabu.subramani(a)broadcom.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
---
V1 Change: Added tag for stable tree
V2 Change: Updated patch description.
drivers/scsi/mpt3sas/mpt3sas_base.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 6846628..050c0f0 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -2703,6 +2703,8 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
{
u64 required_mask, coherent_mask;
struct sysinfo s;
+ /* Set 63 bit DMA mask for all SAS3 and SAS35 controllers */
+ int dma_mask = (ioc->hba_mpi_version_belonged > MPI2_VERSION) ? 63 : 64;
if (ioc->is_mcpu_endpoint)
goto try_32bit;
@@ -2712,17 +2714,17 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
goto try_32bit;
if (ioc->dma_mask)
- coherent_mask = DMA_BIT_MASK(64);
+ coherent_mask = DMA_BIT_MASK(dma_mask);
else
coherent_mask = DMA_BIT_MASK(32);
- if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(64)) ||
+ if (dma_set_mask(&pdev->dev, DMA_BIT_MASK(dma_mask)) ||
dma_set_coherent_mask(&pdev->dev, coherent_mask))
goto try_32bit;
ioc->base_add_sg_single = &_base_add_sg_single_64;
ioc->sge_size = sizeof(Mpi2SGESimple64_t);
- ioc->dma_mask = 64;
+ ioc->dma_mask = dma_mask;
goto out;
try_32bit:
@@ -2744,7 +2746,7 @@ static int
_base_change_consistent_dma_mask(struct MPT3SAS_ADAPTER *ioc,
struct pci_dev *pdev)
{
- if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64))) {
+ if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(ioc->dma_mask))) {
if (pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32)))
return -ENODEV;
}
@@ -4989,7 +4991,7 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER *ioc)
total_sz += sz;
} while (ioc->rdpq_array_enable && (++i < ioc->reply_queue_count));
- if (ioc->dma_mask == 64) {
+ if (ioc->dma_mask > 32) {
if (_base_change_consistent_dma_mask(ioc, ioc->pdev) != 0) {
ioc_warn(ioc, "no suitable consistent DMA mask for %s\n",
pci_name(ioc->pdev));
--
1.8.3.1
CLANG_FLAGS is initialized by the following line:
CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
..., which is run only when CROSS_COMPILE is set.
Some build targets (bindeb-pkg etc.) recurse to the top Makefile.
When you build the kernel with Clang but without CROSS_COMPILE,
the same compiler flags such as -no-integrated-as are accumulated
into CLANG_FLAGS.
If you run 'make CC=clang' and then 'make CC=clang bindeb-pkg',
Kbuild will recompile everything needlessly due to the build command
change.
Fix this by correctly initializing CLANG_FLAGS.
Fixes: 238bcbc4e07f ("kbuild: consolidate Clang compiler flags")
Cc: <stable(a)vger.kernel.org> # v4.20+
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
---
Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile
index fa0fbe7851ea..5ee6f6889869 100644
--- a/Makefile
+++ b/Makefile
@@ -472,6 +472,7 @@ KBUILD_CFLAGS_MODULE := -DMODULE
KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds
KBUILD_LDFLAGS :=
GCC_PLUGINS_CFLAGS :=
+CLANG_FLAGS :=
export ARCH SRCARCH CONFIG_SHELL HOSTCC KBUILD_HOSTCFLAGS CROSS_COMPILE AS LD CC
export CPP AR NM STRIP OBJCOPY OBJDUMP PAHOLE KBUILD_HOSTLDFLAGS KBUILD_HOSTLDLIBS
@@ -519,7 +520,7 @@ endif
ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
ifneq ($(CROSS_COMPILE),)
-CLANG_FLAGS := --target=$(notdir $(CROSS_COMPILE:%-=%))
+CLANG_FLAGS += --target=$(notdir $(CROSS_COMPILE:%-=%))
GCC_TOOLCHAIN_DIR := $(dir $(shell which $(CROSS_COMPILE)elfedit))
CLANG_FLAGS += --prefix=$(GCC_TOOLCHAIN_DIR)
GCC_TOOLCHAIN := $(realpath $(GCC_TOOLCHAIN_DIR)/..)
--
2.17.1
If a build rule fails, the .DELETE_ON_ERROR special target removes the
target, but does nothing for the .*.cmd file, which might be corrupted.
So, .*.cmd files should be included only when the corresponding targets
exist.
Commit 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd
files") failed to fix up this file.
Fixes: 392885ee82d3 ("kbuild: let fixdep directly write to .*.cmd files")
Cc: <stable(a)vger.kernel.org> # v5.0+
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
---
scripts/Makefile.modpost | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 6b19c1a4eae5..ad4b9829a456 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -145,10 +145,8 @@ FORCE:
# optimization, we don't need to read them if the target does not
# exist, we will rebuild anyway in that case.
-cmd_files := $(wildcard $(foreach f,$(sort $(targets)),$(dir $(f)).$(notdir $(f)).cmd))
+existing-targets := $(wildcard $(sort $(targets)))
-ifneq ($(cmd_files),)
- include $(cmd_files)
-endif
+-include $(foreach f,$(existing-targets),$(dir $(f)).$(notdir $(f)).cmd)
.PHONY: $(PHONY)
--
2.17.1
Now that -Wimplicit-fallthrough is passed to GCC by default, the
following warnings show up:
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_format_get_bpp’:
../drivers/gpu/drm/arm/malidp_hw.c:387:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
bpp = 30;
~~~~^~~~
../drivers/gpu/drm/arm/malidp_hw.c:388:3: note: here
case DRM_FORMAT_YUV420_10BIT:
^~~~
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_se_irq’:
../drivers/gpu/drm/arm/malidp_hw.c:1311:4: warning: this statement may fall
through [-Wimplicit-fallthrough=]
drm_writeback_signal_completion(&malidp->mw_connector, 0);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/arm/malidp_hw.c:1313:3: note: here
case MW_START:
^~~~
Add the missing 'break;' in malidp_format_get_bpp() and reword the
fall-through comment in malidp_se_irq() so that
the compiler doesn't warn about fall-through.
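Why the missing 'break;' matters beyond the warning can be seen in a small standalone sketch. The enum values and function names below are hypothetical stand-ins for the DRM format codes and the driver function, not the driver's own definitions.

```c
#include <assert.h>

/* Hypothetical format codes, standing in for the DRM_FORMAT_* values. */
enum fmt { FMT_VUY101010, FMT_YUV420_10BIT, FMT_OTHER };

/* Without the 'break', FMT_VUY101010 falls into the next case. */
static int bpp_without_break(enum fmt f)
{
	int bpp = 0;

	switch (f) {
	case FMT_VUY101010:
		bpp = 30;
		/* fall through - unintended, kept here for contrast */
	case FMT_YUV420_10BIT:
		bpp = 15;
		break;
	default:
		break;
	}
	return bpp;
}

/* With the 'break', each format keeps its own value. */
static int bpp_with_break(enum fmt f)
{
	int bpp = 0;

	switch (f) {
	case FMT_VUY101010:
		bpp = 30;
		break;
	case FMT_YUV420_10BIT:
		bpp = 15;
		break;
	default:
		break;
	}
	return bpp;
}
```

In the first variant the value 30 is overwritten by 15 before the function returns, so the warning points at a real behavioural difference and not only at style.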
Cc: stable(a)vger.kernel.org # v5.2+
Fixes: b8207562abdd ("drm/arm/malidp: Specified the rotation memory requirements for AFBC YUV formats")
Acked-by: Liviu Dudau <liviu.dudau(a)arm.com>
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
drivers/gpu/drm/arm/malidp_hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/arm/malidp_hw.c b/drivers/gpu/drm/arm/malidp_hw.c
index 50af399d7f6f..380be66d4c6e 100644
--- a/drivers/gpu/drm/arm/malidp_hw.c
+++ b/drivers/gpu/drm/arm/malidp_hw.c
@@ -385,6 +385,7 @@ int malidp_format_get_bpp(u32 fmt)
switch (fmt) {
case DRM_FORMAT_VUY101010:
bpp = 30;
+ break;
case DRM_FORMAT_YUV420_10BIT:
bpp = 15;
break;
@@ -1309,7 +1310,7 @@ static irqreturn_t malidp_se_irq(int irq, void *arg)
break;
case MW_RESTART:
drm_writeback_signal_completion(&malidp->mw_connector, 0);
- /* fall through to a new start */
+ /* fall through - to a new start */
case MW_START:
/* writeback started, need to emulate one-shot mode */
hw->disable_memwrite(hwdev);
--
2.20.1
On Tue, Jul 30, 2019 at 7:52 PM Marek Marczykowski-Górecki
<marmarek(a)invisiblethingslab.com> wrote:
>
> On Tue, Jul 30, 2019 at 10:05:42AM -0400, Boris Ostrovsky wrote:
> > On 7/30/19 2:03 AM, Souptick Joarder wrote:
> > > On Mon, Jul 29, 2019 at 7:06 PM Marek Marczykowski-Górecki
> > > <marmarek(a)invisiblethingslab.com> wrote:
> > >> On Mon, Jul 29, 2019 at 02:02:54PM +0530, Souptick Joarder wrote:
> > >>> On Mon, Jul 29, 2019 at 1:35 PM Souptick Joarder <jrdr.linux(a)gmail.com> wrote:
> > >>>> On Sun, Jul 28, 2019 at 11:36 PM Marek Marczykowski-Górecki
> > >>>> <marmarek(a)invisiblethingslab.com> wrote:
> > >>>>> On Fri, Feb 15, 2019 at 08:18:31AM +0530, Souptick Joarder wrote:
> > >>>>>> Convert to use vm_map_pages() to map range of kernel
> > >>>>>> memory to user vma.
> > >>>>>>
> > >>>>>> map->count is passed to vm_map_pages() and internal API
> > >>>>>> verify map->count against count ( count = vma_pages(vma))
> > >>>>>> for page array boundary overrun condition.
> > >>>>> This commit breaks gntdev driver. If vma->vm_pgoff > 0, vm_map_pages
> > >>>>> will:
> > >>>>> - use map->pages starting at vma->vm_pgoff instead of 0
> > >>>> The actual code ignores vma->vm_pgoff > 0 scenario and mapped
> > >>>> the entire map->pages[i]. Why the entire map->pages[i] needs to be mapped
> > >>>> if vma->vm_pgoff > 0 (in original code) ?
> > >> vma->vm_pgoff is used as index passed to gntdev_find_map_index. It's
> > >> basically (ab)using this parameter for "which grant reference to map".
> > >>
> > >>>> are you referring to set vma->vm_pgoff = 0 irrespective of value passed
> > >>>> from user space ? If yes, using vm_map_pages_zero() is an alternate
> > >>>> option.
> > >> Yes, that should work.
> > > I prefer to use vm_map_pages_zero() to resolve both the issues. Alternatively
> > > the patch can be reverted as you suggested. Let me know you opinion and wait
> > > for feedback from others.
> > >
> > > Boris, would you like to give any feedback ?
> >
> > vm_map_pages_zero() looks good to me. Marek, does it work for you?
>
> Yes, replacing vm_map_pages() with vm_map_pages_zero() fixes the
> problem for me.
Marek, I can send a patch for this if that is OK with you.
We need to Cc stable, as these changes are already in 5.2.4.
>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
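The difference discussed in this thread can be sketched in user space. This is only a model under stated assumptions: the function names are hypothetical, and the range check is how I understand the helpers to behave, not a copy of the kernel code. The point is that the offset-aware variant starts indexing the page array at vm_pgoff, while the "zero" variant always starts at index 0:

```c
#include <assert.h>
#include <errno.h>

/*
 * Model of mapping `count` pages out of an array of `num` entries,
 * starting at array index `pgoff`. Returns the first index used,
 * or -ENXIO if the request would run past the end of the array.
 */
static long model_map_pages(unsigned long num, unsigned long pgoff,
			    unsigned long count)
{
	if (pgoff >= num)
		return -ENXIO;
	if (count > num - pgoff)
		return -ENXIO;
	return (long)pgoff;
}

/* Same check, but the caller's offset is ignored and treated as 0. */
static long model_map_pages_zero(unsigned long num, unsigned long pgoff,
				 unsigned long count)
{
	(void)pgoff;
	return model_map_pages(num, 0, count);
}
```

A driver such as gntdev, which uses vm_pgoff as a lookup key and not as an offset into its page array, fails in the first variant whenever vm_pgoff is non-zero and works in the second.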
Some devices are not supposed to support on-die ECC, but experience
shows that the internal ECC machinery can actually be enabled through the
"SET FEATURE (EFh)" command, even if a read of the "READ ID Parameter
Tables" reports that it is not supported.
Currently, the driver checks the "READ ID Parameter" field directly
after having enabled the feature. If the check fails it returns
immediately but leaves the ECC on. When using buggy chips like
MT29F2G08ABAGA and MT29F2G08ABBGA, all future read/program cycles will
go through the on-die ECC, confusing the host controller which is
supposed to be the one handling correction.
To address this in a common way we need to turn off the on-die ECC
directly after reading the "READ ID Parameter" and before checking the
"ECC status".
Cc: stable(a)vger.kernel.org
Fixes: dbc44edbf833 ("mtd: rawnand: micron: Fix on-die ECC detection logic")
Signed-off-by: Marco Felsch <m.felsch(a)pengutronix.de>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
---
v2:
- adapt commit message according to Miquel's comments
- add fixes, stable tags
- add Boris rb-tag
drivers/mtd/nand/raw/nand_micron.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_micron.c b/drivers/mtd/nand/raw/nand_micron.c
index 1622d3145587..fb199ad2f1a6 100644
--- a/drivers/mtd/nand/raw/nand_micron.c
+++ b/drivers/mtd/nand/raw/nand_micron.c
@@ -390,6 +390,14 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
(chip->id.data[4] & MICRON_ID_INTERNAL_ECC_MASK) != 0x2)
return MICRON_ON_DIE_UNSUPPORTED;
+ /*
+ * It seems that there are devices which do not support ECC official.
+ * At least the MT29F2G08ABAGA / MT29F2G08ABBGA devices supports
+ * enabling the ECC feature but don't reflect that to the READ_ID table.
+ * So we have to guarantee that we disable the ECC feature directly
+ * after we did the READ_ID table command. Later we can evaluate the
+ * ECC_ENABLE support.
+ */
ret = micron_nand_on_die_ecc_setup(chip, true);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
@@ -398,13 +406,13 @@ static int micron_supports_on_die_ecc(struct nand_chip *chip)
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
- if (!(id[4] & MICRON_ID_ECC_ENABLED))
- return MICRON_ON_DIE_UNSUPPORTED;
-
ret = micron_nand_on_die_ecc_setup(chip, false);
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
+ if (!(id[4] & MICRON_ID_ECC_ENABLED))
+ return MICRON_ON_DIE_UNSUPPORTED;
+
ret = nand_readid_op(chip, 0, id, sizeof(id));
if (ret)
return MICRON_ON_DIE_UNSUPPORTED;
--
2.20.1
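The reordering in this patch can be sketched with a small user-space model. Everything here is hypothetical (struct fake_chip and the helper names are not driver API); it only shows why the early return in the old order leaves ECC enabled on a chip whose ID table never reports it:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a NAND chip, not the driver's data structure. */
struct fake_chip {
	bool ecc_on;         /* current state of the on-die ECC engine */
	bool id_reports_ecc; /* does READ ID reflect an enabled ECC? */
};

static void set_ecc(struct fake_chip *chip, bool on)
{
	chip->ecc_on = on;
}

static bool read_id_ecc_bit(const struct fake_chip *chip)
{
	return chip->ecc_on && chip->id_reports_ecc;
}

/* Old order: check the ID bit first; the early return skips the disable. */
static bool probe_old(struct fake_chip *chip)
{
	set_ecc(chip, true);
	if (!read_id_ecc_bit(chip))
		return false; /* ECC is still switched on here */
	set_ecc(chip, false);
	return true;
}

/* Patched order: always disable ECC, then evaluate what READ ID said. */
static bool probe_new(struct fake_chip *chip)
{
	bool reported;

	set_ecc(chip, true);
	reported = read_id_ecc_bit(chip);
	set_ecc(chip, false);
	return reported;
}
```

Both variants report the buggy chip as unsupported; only the patched order also leaves its ECC engine off.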
Synchronization is recommended before disabling the trace registers
to prevent any start or stop points being speculative at the point
of disabling the unit (section 7.3.77 of ARM IHI 0064D).
Synchronization is also recommended after programming the trace
registers to ensure all updates are committed prior to normal code
resuming (section 4.3.7 of ARM IHI 0064D).
Let's ensure these synchronization points are present in the code
and clearly commented.
Note that we could rely on the barriers in CS_LOCK and
coresight_disclaim_device_unlocked or the context switch to user
space - however coresight may be of use in the kernel.
On armv8 the mb macro is defined as dsb(sy) - Given that the etm4x is
only used on armv8 let's directly use dsb(sy) instead of mb(). This
removes some ambiguity and makes it easier to correlate the code with
the TRM.
Signed-off-by: Andrew Murray <andrew.murray(a)arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
CC: stable(a)vger.kernel.org
---
drivers/hwtracing/coresight/coresight-etm4x.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.c b/drivers/hwtracing/coresight/coresight-etm4x.c
index 7ad15651e069..ec9468880c71 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x.c
@@ -188,6 +188,13 @@ static int etm4_enable_hw(struct etmv4_drvdata *drvdata)
dev_err(etm_dev,
"timeout while waiting for Idle Trace Status\n");
+ /*
+ * As recommended by section 4.3.7 ("Synchronization when using the
+ * memory-mapped interface") of ARM IHI 0064D
+ */
+ dsb(sy);
+ isb();
+
done:
CS_LOCK(drvdata->base);
@@ -453,8 +460,12 @@ static void etm4_disable_hw(void *info)
/* EN, bit[0] Trace unit enable bit */
control &= ~0x1;
- /* make sure everything completes before disabling */
- mb();
+ /*
+ * Make sure everything completes before disabling, as recommended
+ * by section 7.3.77 ("TRCVICTLR, ViewInst Main Control Register,
+ * SSTATUS") of ARM IHI 0064D
+ */
+ dsb(sy);
isb();
writel_relaxed(control, drvdata->base + TRCPRGCTLR);
--
2.21.0
This is a note to let you know that I've just added the patch titled
driver core: platform: return -ENXIO for missing GpioInt
to my driver-core git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git
in the driver-core-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 46c42d844211ef5902e32aa507beac0817c585e9 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Mon, 29 Jul 2019 13:49:54 -0700
Subject: driver core: platform: return -ENXIO for missing GpioInt
Commit daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in
platform_get_irq()") broke the Embedded Controller driver on most LPC
Chromebooks (i.e., most x86 Chromebooks), because cros_ec_lpc expects
platform_get_irq() to return -ENXIO for non-existent IRQs.
Unfortunately, acpi_dev_gpio_irq_get() doesn't follow this convention
and returns -ENOENT instead. So we get this error from cros_ec_lpc:
couldn't retrieve IRQ number (-2)
I see a variety of drivers that treat -ENXIO specially, so rather than
fix all of them, let's fix up the API to restore its previous behavior.
I reported this on v2 of this patch:
https://lore.kernel.org/lkml/20190220180538.GA42642@google.com/
but apparently the patch had already been merged before v3 got sent out:
https://lore.kernel.org/lkml/20190221193429.161300-1-egranata@chromium.org/
and the result is that the bug landed and remains unfixed.
I differ from the v3 patch by:
* allowing for ret==0, even though acpi_dev_gpio_irq_get() specifically
documents (and enforces) that 0 is not a valid return value (noted on
the v3 review)
* adding a small comment
Reported-by: Brian Norris <briannorris(a)chromium.org>
Reported-by: Salvatore Bellizzi <salvatore.bellizzi(a)linux.seppia.net>
Cc: Enrico Granata <egranata(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Fixes: daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in platform_get_irq()")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Acked-by: Enrico Granata <egranata(a)google.com>
Link: https://lore.kernel.org/r/20190729204954.25510-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/platform.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 506a0175a5a7..ec974ba9c0c4 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -157,8 +157,13 @@ int platform_get_irq(struct platform_device *dev, unsigned int num)
* the device will only expose one IRQ, and this fallback
* allows a common code path across either kind of resource.
*/
- if (num == 0 && has_acpi_companion(&dev->dev))
- return acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
+ if (num == 0 && has_acpi_companion(&dev->dev)) {
+ int ret = acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
+
+ /* Our callers expect -ENXIO for missing IRQs. */
+ if (ret >= 0 || ret == -EPROBE_DEFER)
+ return ret;
+ }
return -ENXIO;
#endif
--
2.22.0
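The return-value handling added by this patch can be sketched as a standalone function. The name normalize_irq_result() is hypothetical; EPROBE_DEFER is defined locally because it is a kernel-internal errno that user-space headers do not provide:

```c
#include <assert.h>
#include <errno.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal value, absent from user-space errno.h */
#endif

/*
 * Pass through successful lookups and probe deferral unchanged;
 * collapse every other error into -ENXIO, which callers treat as
 * "no such IRQ".
 */
static int normalize_irq_result(int ret)
{
	if (ret >= 0 || ret == -EPROBE_DEFER)
		return ret;
	return -ENXIO;
}
```

With this mapping, the -ENOENT (-2) seen in the cros_ec_lpc error message would reach the caller as -ENXIO.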
With recent changes in AOSP, adb is now using asynchronous I/O.
While adb works well for the most part, the adb root/unroot commands
can cause adb to hang. The issue is caused by a request being queued
twice. A series of 3 patches from Felipe Balbi in the upstream tree
fixes this issue.
Felipe Balbi (3):
usb: dwc3: gadget: add dwc3_request status tracking
usb: dwc3: gadget: prevent dwc3_request from being queued twice
usb: dwc3: gadget: remove req->started flag
drivers/usb/dwc3/core.h | 11 +++++++++--
drivers/usb/dwc3/gadget.c | 9 ++++++++-
drivers/usb/dwc3/gadget.h | 4 ++--
3 files changed, 19 insertions(+), 5 deletions(-)
--
1.9.1
This is a note to let you know that I've just added the patch titled
usb: typec: tcpm: Add NULL check before dereferencing config
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
From 1957de95d425d1c06560069dc7277a73a8b28683 Mon Sep 17 00:00:00 2001
From: Guenter Roeck <linux(a)roeck-us.net>
Date: Wed, 24 Jul 2019 07:38:32 -0700
Subject: usb: typec: tcpm: Add NULL check before dereferencing config
When instantiating tcpm on an NXP OM 13588 board with NXP PTN5110,
the following crash is seen when writing into the 'preferred_role'
sysfs attribute.
Unable to handle kernel NULL pointer dereference at virtual address 00000028
pgd = f69149ad
[00000028] *pgd=00000000
Internal error: Oops: 5 [#1] THUMB2
Modules linked in: tcpci tcpm
CPU: 0 PID: 1882 Comm: bash Not tainted 5.1.18-sama5-armv7-r2 #4
Hardware name: Atmel SAMA5
PC is at tcpm_try_role+0x3a/0x4c [tcpm]
LR is at tcpm_try_role+0x15/0x4c [tcpm]
pc : [<bf8000e2>] lr : [<bf8000bd>] psr: 60030033
sp : dc1a1e88 ip : c03fb47d fp : 00000000
r10: dc216190 r9 : dc1a1f78 r8 : 00000001
r7 : df4ae044 r6 : dd032e90 r5 : dd1ce340 r4 : df4ae054
r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : df4ae044
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment none
Control: 50c53c7d Table: 3efec059 DAC: 00000051
Process bash (pid: 1882, stack limit = 0x6a6d4aa5)
Stack: (0xdc1a1e88 to 0xdc1a2000)
1e80: dd05d808 dd1ce340 00000001 00000007 dd1ce340 c03fb4a7
1ea0: 00000007 00000007 dc216180 00000000 00000000 c01e1e03 00000000 00000000
1ec0: c0907008 dee98b40 c01e1d5d c06106c4 00000000 00000000 00000007 c0194e8b
1ee0: 0000000a 00000400 00000000 c01a97db dc22bf00 ffffe000 df4b6a00 df745900
1f00: 00000001 00000001 000000dd c01a9c2f 7aeab3be c0907008 00000000 dc22bf00
1f20: c0907008 00000000 00000000 00000000 00000000 7aeab3be 00000007 dee98b40
1f40: 005dc318 dc1a1f78 00000000 00000000 00000007 c01969f7 0000000a c01a20cb
1f60: dee98b40 c0907008 dee98b40 005dc318 00000000 c0196b9b 00000000 00000000
1f80: dee98b40 7aeab3be 00000074 005dc318 b6f3bdb0 00000004 c0101224 dc1a0000
1fa0: 00000004 c0101001 00000074 005dc318 00000001 005dc318 00000007 00000000
1fc0: 00000074 005dc318 b6f3bdb0 00000004 00000007 00000007 00000000 00000000
1fe0: 00000004 be800880 b6ed35b3 b6e5c746 60030030 00000001 00000000 00000000
[<bf8000e2>] (tcpm_try_role [tcpm]) from [<c03fb4a7>] (preferred_role_store+0x2b/0x5c)
[<c03fb4a7>] (preferred_role_store) from [<c01e1e03>] (kernfs_fop_write+0xa7/0x150)
[<c01e1e03>] (kernfs_fop_write) from [<c0194e8b>] (__vfs_write+0x1f/0x104)
[<c0194e8b>] (__vfs_write) from [<c01969f7>] (vfs_write+0x6b/0x104)
[<c01969f7>] (vfs_write) from [<c0196b9b>] (ksys_write+0x43/0x94)
[<c0196b9b>] (ksys_write) from [<c0101001>] (ret_fast_syscall+0x1/0x62)
Since commit 96232cbc6c994 ("usb: typec: tcpm: support get typec and pd
config from device properties"), the 'config' pointer in struct tcpc_dev
is optional when registering a Type-C port. Since it is optional, we have
to check if it is NULL before dereferencing it.
Reported-by: Douglas Gilbert <dgilbert(a)interlog.com>
Cc: Douglas Gilbert <dgilbert(a)interlog.com>
Fixes: 96232cbc6c994 ("usb: typec: tcpm: support get typec and pd config from device properties")
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Jun Li <jun.li(a)nxp.com>
Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Link: https://lore.kernel.org/r/1563979112-22483-1-git-send-email-linux@roeck-us.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/typec/tcpm/tcpm.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index fba32d84e578..77f71f602f73 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -379,7 +379,8 @@ static enum tcpm_state tcpm_default_state(struct tcpm_port *port)
return SNK_UNATTACHED;
else if (port->try_role == TYPEC_SOURCE)
return SRC_UNATTACHED;
- else if (port->tcpc->config->default_role == TYPEC_SINK)
+ else if (port->tcpc->config &&
+ port->tcpc->config->default_role == TYPEC_SINK)
return SNK_UNATTACHED;
/* Fall through to return SRC_UNATTACHED */
} else if (port->port_type == TYPEC_PORT_SNK) {
@@ -4114,7 +4115,7 @@ static int tcpm_try_role(const struct typec_capability *cap, int role)
mutex_lock(&port->lock);
if (tcpc->try_role)
ret = tcpc->try_role(tcpc, role);
- if (!ret && !tcpc->config->try_role_hw)
+ if (!ret && (!tcpc->config || !tcpc->config->try_role_hw))
port->try_role = role;
port->try_src_count = 0;
port->try_snk_count = 0;
@@ -4701,7 +4702,7 @@ static int tcpm_copy_caps(struct tcpm_port *port,
port->typec_caps.prefer_role = tcfg->default_role;
port->typec_caps.type = tcfg->type;
port->typec_caps.data = tcfg->data;
- port->self_powered = port->tcpc->config->self_powered;
+ port->self_powered = tcfg->self_powered;
return 0;
}
--
2.22.0
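The condition introduced in tcpm_try_role() can be sketched outside the kernel. The struct and function names below are hypothetical models; the only point is that an optional pointer has to be tested before any field behind it is read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical models of the two structures involved. */
struct model_config {
	bool try_role_hw;
};

struct model_tcpc {
	const struct model_config *config; /* optional, may be NULL */
};

/*
 * Record the requested role in software unless the hardware handles it.
 * A missing config is treated the same as try_role_hw == false.
 */
static bool should_record_try_role(const struct model_tcpc *tcpc, int ret)
{
	return !ret && (!tcpc->config || !tcpc->config->try_role_hw);
}
```

Because || short-circuits, tcpc->config->try_role_hw is only evaluated when config is non-NULL, which is what avoids the oops shown above.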
The source code of this driver itself carries the author's comment:
/*
* Even if we have an I2C bus, we can't assume that the cable
* is disconnected if drm_probe_ddc fails. Some cables don't
* wire the DDC pins, or the I2C bus might not be working at
* all.
*/
That's true. DDC and VGA channels are independent, and therefore
we cannot decide whether the monitor is connected or not,
depending on the information from the DDC.
So the monitor should always be considered connected.
Thus there is no reason to use connector detect callback for this
driver: DRM sub-system considers monitor always connected if there
is no detect() callback registered with drm_connector_init().
How to reproduce the bug:
* setup: i.MX8QXP, LCDIF video module + gpu/drm/mxsfb driver,
adv712x VGA DAC + dumb-vga-dac driver, VGA-connector w/o DDC;
* try to use drivers chain mxsfb-drm + dumb-vga-dac;
* any DRM application considers the monitor disconnected:
===========
$ weston-start
$ cat /var/log/weston.log
...
DRM: head 'VGA-1' found, connector 32 is disconnected.
...
$ cat /sys/devices/platform/5a180000.lcdif/drm/card0/card0-VGA-1/status
unknown
===========
Oleksandr Suvorov (1):
drm/bridge: vga-dac: Fix detect of monitor connection
drivers/gpu/drm/bridge/dumb-vga-dac.c | 18 ------------------
1 file changed, 18 deletions(-)
--
2.20.1
Hello Greg,
Can you please consider including the following patches in the stable
linux-4.14.y branch?
An NFS server accepts only a limited number of concurrent v4.1+ mounts. Once
that limit is reached, mount.nfs on the affected client appears to hang, as it
keeps reissuing CREATE_SESSION calls until one of them succeeds. These patches
raise the limit and also return a smaller ca_maxrequests as the limit
approaches, instead of waiting until CREATE_SESSION has to fail completely.
44d8660d3bb0("nfsd: increase DRC cache limit")
de766e570413("nfsd: give out fewer session slots as limit approaches")
c54f24e338ed("nfsd: fix performance-limiting session calculation")
Thanks,
Qian Lu
Hi,
I forgot to mark a few patches for io_uring as stable. In order
of how to apply, can you add the following commits for 5.2?
f7b76ac9d17e16e44feebb6d2749fec92bfd6dd4
c0e48f9dea9129aa11bec3ed13803bcc26e96e49
bd11b3a391e3df6fa958facbe4b3f9f4cca9bd49
36703247d5f52a679df9da51192b6950fe81689f
Thanks!
--
Jens Axboe
The constraint from the zpool use of z3fold_destroy_pool() is there are no
outstanding handles to memory (so no active allocations), but it is possible
for there to be outstanding work on either of the two wqs in the pool.
If there is work queued on pool->compact_wq when it is called,
z3fold_destroy_pool() will do:
z3fold_destroy_pool()
  destroy_workqueue(pool->release_wq)
  destroy_workqueue(pool->compact_wq)
    drain_workqueue(pool->compact_wq)
      do_compact_page(zhdr)
        kref_put(&zhdr->refcount)
          __release_z3fold_page(zhdr, ...)
            queue_work_on(pool->release_wq, &pool->work) *BOOM*
So compact_wq needs to be destroyed before release_wq.
Fixes: 5d03a6613957 ("mm/z3fold.c: use kref to prevent page free/compact race")
Signed-off-by: Henry Burns <henryburns(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
mm/z3fold.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/z3fold.c b/mm/z3fold.c
index 1a029a7432ee..43de92f52961 100644
--- a/mm/z3fold.c
+++ b/mm/z3fold.c
@@ -818,8 +818,15 @@ static void z3fold_destroy_pool(struct z3fold_pool *pool)
{
kmem_cache_destroy(pool->c_handle);
z3fold_unregister_migration(pool);
- destroy_workqueue(pool->release_wq);
+
+ /*
+ * We need to destroy pool->compact_wq before pool->release_wq,
+ * as any pending work on pool->compact_wq will call
+ * queue_work(pool->release_wq, &pool->work).
+ */
+
destroy_workqueue(pool->compact_wq);
+ destroy_workqueue(pool->release_wq);
kfree(pool);
}
--
2.22.0.709.g102302147b-goog
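The ordering constraint described above can be sketched with a toy single-threaded model. The names are hypothetical and this is not the kernel workqueue API; it only shows that draining the compaction queue may queue work on the release queue, so the release queue must outlive it:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy work queue: a liveness flag and a count of pending items. */
struct model_wq {
	bool alive;
	int pending;
};

/* Queueing on a destroyed queue is the "BOOM" case; report it as -1. */
static int model_queue_work(struct model_wq *wq)
{
	if (!wq->alive)
		return -1;
	wq->pending++;
	return 0;
}

/* Drain compact work; each item queues one item on the release queue. */
static int model_destroy_compact(struct model_wq *compact,
				 struct model_wq *release)
{
	int err = 0;

	while (compact->pending > 0) {
		compact->pending--;
		if (model_queue_work(release))
			err = -1;
	}
	compact->alive = false;
	return err;
}

/* Run whatever is pending on the release queue, then tear it down. */
static void model_destroy_release(struct model_wq *release)
{
	release->pending = 0;
	release->alive = false;
}
```

Calling model_destroy_release() before model_destroy_compact() reproduces the failure in the model; the reverse order, as in the patch, does not.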
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee006eb00a00198d21dad60696318fd443a59f88 Mon Sep 17 00:00:00 2001
From: Lyude Paul <lyude(a)redhat.com>
Date: Thu, 20 Jun 2019 19:21:26 -0400
Subject: [PATCH] drm/amdgpu: Don't skip display settings in hwmgr_resume()
I'm not entirely sure why this is, but for some reason:
921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed")
Breaks runtime PM resume on the Radeon PRO WX 3100 (Lexa) in one of the
pre-production laptops I have. The issue manifests as the following
messages in dmesg:
[drm] UVD and UVD ENC initialized successfully.
amdgpu 0000:3b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vce1 test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vce_v3_0> failed -110
[drm:amdgpu_device_resume [amdgpu]] *ERROR* amdgpu_device_ip_resume failed (-110).
This happens after about 6-10 runtime PM suspend/resume cycles (sometimes
sooner, if you're lucky!). Unfortunately I can't seem to pin down
precisely which part of psm_adjust_power_state_dynamic is causing
the issue, but not skipping the display setting setup seems to fix it.
Hopefully if there is a better fix for this, this patch will spark
discussion around it.
Fixes: 921935dc6404 ("drm/amd/powerplay: enforce display related settings only on needed")
Cc: Evan Quan <evan.quan(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Huang Rui <ray.huang(a)amd.com>
Cc: Rex Zhu <Rex.Zhu(a)amd.com>
Cc: Likun Gao <Likun.Gao(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v5.1+
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
index 0b0d83c2a678..a24beaa4fb01 100644
--- a/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
+++ b/drivers/gpu/drm/amd/powerplay/hwmgr/hwmgr.c
@@ -327,7 +327,7 @@ int hwmgr_resume(struct pp_hwmgr *hwmgr)
if (ret)
return ret;
- ret = psm_adjust_power_state_dynamic(hwmgr, true, NULL);
+ ret = psm_adjust_power_state_dynamic(hwmgr, false, NULL);
return ret;
}
Hi Greg,
Second attempt on this topic [1]:
"""
Upstream commit 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") was back ported in v4.4.178
(bdf3c006b9a2). For -rt we definitely need the bugfix f01f17d3705b
("mm, vmstat: make quiet_vmstat lighter") as well.
Since the offending patch was back ported to v4.4 stable only, the
other stable branches don't need an update (offending patch and bug
fix are already in).
"""
However, I missed a dependency, as Jon noted[2]. The missing patch is
587198ba5206 ("vmstat: Remove BUG_ON from vmstat_update"). I've tested
this on a Tegra K1 board, which exposed the bug. With this patch
applied it should be fine.
While at it, I looked at all relevant changes for
vmstat_updated(). These two patches are the only relevant changes
which are missing. It seems almost all changes from mainline have made
it back to v4.
Could you please queue the above patches for v4.4.y?
Thanks,
Daniel
[1] https://lore.kernel.org/stable/20190513061237.4915-1-wagi@monom.org
[2] https://lore.kernel.org/stable/f32de22f-c928-2eaa-ee3f-d2b26c184dd4@nvidia.…
Christoph Lameter (1):
vmstat: Remove BUG_ON from vmstat_update
Michal Hocko (1):
mm, vmstat: make quiet_vmstat lighter
mm/vmstat.c | 80 +++++++++++++++++++++++++++++++----------------------
1 file changed, 47 insertions(+), 33 deletions(-)
--
2.20.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
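For readers following the locking argument in the commit message above, here is a minimal userspace sketch of the same idea, not the kernel code itself. It assumes POSIX threads, and every name in it (dev_lock, bus_lock, idle_mu, idle_cv, wait_probe_idle(), run_probe_idle_demo()) is hypothetical. One thread plays the probe path that needs the namespace lock; the other plays uuid_store(), which holds that lock and waits for probing to finish. Because the waiter drops both locks before sleeping, the probe can make progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names come from the kernel. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* device_lock(dev) */
static pthread_mutex_t bus_lock = PTHREAD_MUTEX_INITIALIZER; /* nvdimm_bus_lock(dev) */
static pthread_mutex_t idle_mu = PTHREAD_MUTEX_INITIALIZER;  /* guards probe_active */
static pthread_cond_t idle_cv = PTHREAD_COND_INITIALIZER;    /* nvdimm_bus->wait */
static int probe_active;
static int uuid_updates;

/* Caller holds dev_lock, then bus_lock. Both are dropped while sleeping. */
static void wait_probe_idle(void)
{
	for (;;) {
		pthread_mutex_lock(&idle_mu);
		bool idle = (probe_active == 0);
		pthread_mutex_unlock(&idle_mu);
		if (idle)
			break;

		/* Release in reverse order so the probe path can take dev_lock. */
		pthread_mutex_unlock(&bus_lock);
		pthread_mutex_unlock(&dev_lock);

		pthread_mutex_lock(&idle_mu);
		while (probe_active != 0)
			pthread_cond_wait(&idle_cv, &idle_mu);
		pthread_mutex_unlock(&idle_mu);

		/* Reacquire in the original order: device lock, then bus lock. */
		pthread_mutex_lock(&dev_lock);
		pthread_mutex_lock(&bus_lock);
	}
}

/* Models the probe path: it needs the namespace lock to finish. */
static void *probe_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_lock);
	pthread_mutex_unlock(&dev_lock);

	pthread_mutex_lock(&idle_mu);
	probe_active--;
	pthread_cond_broadcast(&idle_cv);
	pthread_mutex_unlock(&idle_mu);
	return NULL;
}

/* Models uuid_store(): takes the lock, then flushes the probe path. */
static void *uuid_store_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_lock);
	pthread_mutex_lock(&bus_lock);
	wait_probe_idle();
	uuid_updates++;
	pthread_mutex_unlock(&bus_lock);
	pthread_mutex_unlock(&dev_lock);
	return NULL;
}

/* Returns the number of completed uuid updates, or -1 on thread setup failure. */
int run_probe_idle_demo(void)
{
	pthread_t probe, writer;

	uuid_updates = 0;
	probe_active = 1; /* a probe is in flight before either thread runs */

	if (pthread_create(&writer, NULL, uuid_store_thread, NULL) != 0)
		return -1;
	if (pthread_create(&probe, NULL, probe_thread, NULL) != 0) {
		/* No probe will run: clear the flag so the writer is not stranded. */
		pthread_mutex_lock(&idle_mu);
		probe_active = 0;
		pthread_cond_broadcast(&idle_cv);
		pthread_mutex_unlock(&idle_mu);
		pthread_join(writer, NULL);
		return -1;
	}
	pthread_join(writer, NULL);
	pthread_join(probe, NULL);
	return uuid_updates;
}
```

Calling run_probe_idle_demo() from a main() should return once both threads finish. If wait_probe_idle() kept dev_lock while sleeping, the probe thread could block on dev_lock forever, which is the ABBA pattern the patch removes. One simplification to note: the sketch checks probe_active under its own mutex, whereas the kernel checks it under nvdimm_bus_lock().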
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ca6bf264f6d856f959c4239cda1047b587745c67 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:08:21 -0700
Subject: [PATCH] libnvdimm/bus: Fix wait_nvdimm_bus_probe_idle() ABBA deadlock
A multithreaded namespace creation/destruction stress test currently
deadlocks with the following lockup signature:
INFO: task ndctl:2924 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2924 1176 0x00000000
Call Trace:
? __schedule+0x27e/0x780
schedule+0x30/0xb0
wait_nvdimm_bus_probe_idle+0x8a/0xd0 [libnvdimm]
? finish_wait+0x80/0x80
uuid_store+0xe6/0x2e0 [libnvdimm]
kernfs_fop_write+0xf0/0x1a0
vfs_write+0xb7/0x1b0
ksys_write+0x5c/0xd0
do_syscall_64+0x60/0x240
INFO: task ndctl:2923 blocked for more than 122 seconds.
Tainted: G OE 5.2.0-rc4+ #3382
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ndctl D 0 2923 1175 0x00000000
Call Trace:
? __schedule+0x27e/0x780
? __mutex_lock+0x489/0x910
schedule+0x30/0xb0
schedule_preempt_disabled+0x11/0x20
__mutex_lock+0x48e/0x910
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
? __lock_acquire+0x23f/0x1710
? nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
nvdimm_namespace_common_probe+0x95/0x4d0 [libnvdimm]
__dax_pmem_probe+0x5e/0x210 [dax_pmem_core]
? nvdimm_bus_probe+0x1d0/0x2c0 [libnvdimm]
dax_pmem_probe+0xc/0x20 [dax_pmem]
nvdimm_bus_probe+0x90/0x2c0 [libnvdimm]
really_probe+0xef/0x390
driver_probe_device+0xb4/0x100
In this sequence an 'nd_dax' device is being probed and trying to take
the lock on its backing namespace to validate that the 'nd_dax' device
indeed has exclusive access to the backing namespace. Meanwhile, another
thread is trying to update the uuid property of that same backing
namespace. So one thread is in the probe path trying to acquire the
lock, and the other thread has acquired the lock and tries to flush the
probe path.
Fix this deadlock by not holding the namespace device_lock over the
wait_nvdimm_bus_probe_idle() synchronization step. In turn this requires
the device_lock to be held on entry to wait_nvdimm_bus_probe_idle() and
subsequently dropped internally to wait_nvdimm_bus_probe_idle().
Cc: <stable(a)vger.kernel.org>
Fixes: bf9bccc14c05 ("libnvdimm: pmem label sets and namespace instantiation")
Cc: Vishal Verma <vishal.l.verma(a)intel.com>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341210094.292348.2384694131126767789.stgit@dwi…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index a38572bf486b..df41f3571dc9 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -887,10 +887,12 @@ void wait_nvdimm_bus_probe_idle(struct device *dev)
do {
if (nvdimm_bus->probe_active == 0)
break;
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
wait_event(nvdimm_bus->wait,
nvdimm_bus->probe_active == 0);
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
} while (true);
}
@@ -1016,7 +1018,7 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
case ND_CMD_ARS_START:
case ND_CMD_CLEAR_ERROR:
case ND_CMD_CALL:
- dev_dbg(&nvdimm_bus->dev, "'%s' command while read-only.\n",
+ dev_dbg(dev, "'%s' command while read-only.\n",
nvdimm ? nvdimm_cmd_name(cmd)
: nvdimm_bus_cmd_name(cmd));
return -EPERM;
@@ -1105,7 +1107,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
goto out;
}
- nvdimm_bus_lock(&nvdimm_bus->dev);
+ device_lock(dev);
+ nvdimm_bus_lock(dev);
rc = nd_cmd_clear_to_send(nvdimm_bus, nvdimm, func, buf);
if (rc)
goto out_unlock;
@@ -1125,7 +1128,8 @@ static int __nd_ioctl(struct nvdimm_bus *nvdimm_bus, struct nvdimm *nvdimm,
rc = -EFAULT;
out_unlock:
- nvdimm_bus_unlock(&nvdimm_bus->dev);
+ nvdimm_bus_unlock(dev);
+ device_unlock(dev);
out:
kfree(in_env);
kfree(out_env);
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index 4fed9ce9c2fe..a15276cdec7d 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -422,10 +422,12 @@ static ssize_t available_size_show(struct device *dev,
* memory nvdimm_bus_lock() is dropped, but that's userspace's
* problem to not race itself.
*/
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_available_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
@@ -437,10 +439,12 @@ static ssize_t max_available_extent_show(struct device *dev,
struct nd_region *nd_region = to_nd_region(dev);
unsigned long long available = 0;
+ device_lock(dev);
nvdimm_bus_lock(dev);
wait_nvdimm_bus_probe_idle(dev);
available = nd_region_allocatable_dpa(nd_region);
nvdimm_bus_unlock(dev);
+ device_unlock(dev);
return sprintf(buf, "%llu\n", available);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
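The race that this patch closes can be illustrated with a small userspace model, assuming POSIX threads. This is only a sketch: struct fake_device, fake_kill_device(), unregister_thread() and count_teardowns() are hypothetical names, and the teardown counter stands in for device_unregister(). Several threads request unregistration at once, and only the one that claims the dead flag under the lock performs the teardown.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of a device; not the kernel's struct device. */
struct fake_device {
	pthread_mutex_t lock; /* stands in for device_lock() */
	bool dead;            /* stands in for dev->p->dead */
	int teardowns;        /* how often the unregister analog ran */
};

/* Caller must hold dev->lock. Returns true only for the first claimant. */
static bool fake_kill_device(struct fake_device *dev)
{
	if (dev->dead)
		return false;
	dev->dead = true;
	return true;
}

/* Mirrors the ND_SYNC shape: lock, claim the kill state, unlock, then act. */
static void *unregister_thread(void *arg)
{
	struct fake_device *dev = arg;
	bool killed;

	pthread_mutex_lock(&dev->lock);
	killed = fake_kill_device(dev);
	pthread_mutex_unlock(&dev->lock);

	if (!killed)
		return NULL; /* someone else already owns the teardown */

	/* Only the single winner reaches this line, so no lock is needed. */
	dev->teardowns++;
	return NULL;
}

/* Races up to 16 threads and reports how many teardowns happened. */
int count_teardowns(int nthreads)
{
	struct fake_device dev = { .dead = false, .teardowns = 0 };
	pthread_t tids[16];
	int n, i;

	if (nthreads > 16)
		nthreads = 16;
	pthread_mutex_init(&dev.lock, NULL);

	for (n = 0; n < nthreads; n++)
		if (pthread_create(&tids[n], NULL, unregister_thread, &dev) != 0)
			break;
	for (i = 0; i < n; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_destroy(&dev.lock);
	return dev.teardowns;
}
```

Without the claim step every thread would run the teardown, which corresponds to the duplicate device_unregister() calls seen in the traces above.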
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8aac0e2338916e273ccbd438a2b7a1e8c61749f5 Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:58 -0700
Subject: [PATCH] libnvdimm/bus: Prevent duplicate device_unregister() calls
A multithreaded namespace creation/destruction stress test currently
fails with signatures like the following:
sysfs group 'power' not found for kobject 'dax1.1'
RIP: 0010:sysfs_remove_group+0x76/0x80
Call Trace:
device_del+0x73/0x370
device_unregister+0x16/0x50
nd_async_device_unregister+0x1e/0x30 [libnvdimm]
async_run_entry_fn+0x39/0x160
process_one_work+0x23c/0x5e0
worker_thread+0x3c/0x390
BUG: kernel NULL pointer dereference, address: 0000000000000020
RIP: 0010:klist_put+0x1b/0x6c
Call Trace:
klist_del+0xe/0x10
device_del+0x8a/0x2c9
? __switch_to_asm+0x34/0x70
? __switch_to_asm+0x40/0x70
device_unregister+0x44/0x4f
nd_async_device_unregister+0x22/0x2d [libnvdimm]
async_run_entry_fn+0x47/0x15a
process_one_work+0x1a2/0x2eb
worker_thread+0x1b8/0x26e
Use the kill_device() helper to atomically resolve the race of multiple
threads issuing kill, device_unregister(), requests.
Reported-by: Jane Chu <jane.chu(a)oracle.com>
Reported-by: Erwin Tsaur <erwin.tsaur(a)oracle.com>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Link: https://github.com/pmem/ndctl/issues/96
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Link: https://lore.kernel.org/r/156341207846.292348.10435719262819764054.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 2dca3034fee0..42713b210f51 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -547,13 +547,38 @@ EXPORT_SYMBOL(nd_device_register);
void nd_device_unregister(struct device *dev, enum nd_async_mode mode)
{
+ bool killed;
+
switch (mode) {
case ND_ASYNC:
+ /*
+ * In the async case this is being triggered with the
+ * device lock held and the unregistration work needs to
+ * be moved out of line iff this thread has won the
+ * race to schedule the deletion.
+ */
+ if (!kill_device(dev))
+ return;
+
get_device(dev);
async_schedule_domain(nd_async_device_unregister, dev,
&nd_async_domain);
break;
case ND_SYNC:
+ /*
+ * In the sync case the device is being unregistered due
+ * to a state change of the parent. Claim the kill state
+ * to synchronize against other unregistration requests,
+ * or otherwise let the async path handle it if the
+ * unregistration was already queued.
+ */
+ device_lock(dev);
+ killed = kill_device(dev);
+ device_unlock(dev);
+
+ if (!killed)
+ return;
+
nd_synchronize();
device_unregister(dev);
break;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device, libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
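
As a rough illustration of the "first caller wins" semantics described
above, here is a userspace sketch in C. It is not kernel code:
struct fake_device, fake_kill_device() and fake_try_delete() are
hypothetical stand-ins, and a pthread mutex plays the role of the device
lock. Only the flag logic is meant to mirror kill_device().

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct device; not the kernel type. */
struct fake_device {
	pthread_mutex_t lock;	/* plays the role of the device lock */
	bool dead;		/* plays the role of dev->p->dead */
};

/*
 * Mirrors the flag logic of kill_device() in the patch. The caller must
 * hold dev->lock. Returns true only for the caller that flips the flag.
 */
static bool fake_kill_device(struct fake_device *dev)
{
	if (dev->dead)
		return false;
	dev->dead = true;
	return true;
}

/*
 * Hypothetical caller: take the lock, try to mark the device dead, and
 * report whether this caller is the one that should go on to unregister.
 */
static bool fake_try_delete(struct fake_device *dev)
{
	bool won;

	pthread_mutex_lock(&dev->lock);
	won = fake_kill_device(dev);
	pthread_mutex_unlock(&dev->lock);
	return won;
}
```

With two threads racing through fake_try_delete(), only one sees true
and proceeds to the unregister step; the other sees false and backs off.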
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device, libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 00289cd87676e14913d2d8492d1ce05c4baafdae Mon Sep 17 00:00:00 2001
From: Dan Williams <dan.j.williams(a)intel.com>
Date: Wed, 17 Jul 2019 18:07:53 -0700
Subject: [PATCH] drivers/base: Introduce kill_device()
The libnvdimm subsystem arranges for devices to be destroyed as a result
of a sysfs operation. Since device_unregister() cannot be called from
an actively running sysfs attribute of the same device, libnvdimm
arranges for device_unregister() to be performed in an out-of-line async
context.
The driver core maintains a 'dead' state for coordinating its own racing
async registration / de-registration requests. Rather than add local
'dead' state tracking infrastructure to libnvdimm device objects, export
the existing state tracking via a new kill_device() helper.
The kill_device() helper simply marks the device as dead, i.e. that it
is on its way to device_del(), or returns that the device was already
dead. This can be used in advance of calling device_unregister() for
subsystems like libnvdimm that might need to handle multiple user
threads racing to delete a device.
This refactoring does not change any behavior, but it is a pre-requisite
for follow-on fixes and therefore marked for -stable.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Fixes: 4d88a97aa9e8 ("libnvdimm, nvdimm: dimm driver and base libnvdimm device-driver...")
Cc: <stable(a)vger.kernel.org>
Tested-by: Jane Chu <jane.chu(a)oracle.com>
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/156341207332.292348.14959761496009347574.stgit@dw…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..eaf3aa0cb803 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2211,6 +2211,24 @@ void put_device(struct device *dev)
}
EXPORT_SYMBOL_GPL(put_device);
+bool kill_device(struct device *dev)
+{
+ /*
+ * Require the device lock and set the "dead" flag to guarantee that
+ * the update behavior is consistent with the other bitfields near
+ * it and that we cannot have an asynchronous probe routine trying
+ * to run while we are tearing out the bus/class/sysfs from
+ * underneath the device.
+ */
+ lockdep_assert_held(&dev->mutex);
+
+ if (dev->p->dead)
+ return false;
+ dev->p->dead = true;
+ return true;
+}
+EXPORT_SYMBOL_GPL(kill_device);
+
/**
* device_del - delete device from system.
* @dev: device.
@@ -2230,15 +2248,8 @@ void device_del(struct device *dev)
struct kobject *glue_dir = NULL;
struct class_interface *class_intf;
- /*
- * Hold the device lock and set the "dead" flag to guarantee that
- * the update behavior is consistent with the other bitfields near
- * it and that we cannot have an asynchronous probe routine trying
- * to run while we are tearing out the bus/class/sysfs from
- * underneath the device.
- */
device_lock(dev);
- dev->p->dead = true;
+ kill_device(dev);
device_unlock(dev);
/* Notify clients of device removal. This call must come
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..0da5c67f6be1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -1373,6 +1373,7 @@ extern int (*platform_notify_remove)(struct device *dev);
*/
extern struct device *get_device(struct device *dev);
extern void put_device(struct device *dev);
+extern bool kill_device(struct device *dev);
#ifdef CONFIG_DEVTMPFS
extern int devtmpfs_create_node(struct device *dev);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9eed17d37c77171cf5ffb95c4257f87df3cd4c8f Mon Sep 17 00:00:00 2001
From: Chris Wilson <chris(a)chris-wilson.co.uk>
Date: Sat, 20 Jul 2019 19:08:48 +0100
Subject: [PATCH] iommu/iova: Remove stale cached32_node
Since the cached32_node is allowed to be advanced above dma_32bit_pfn
(to provide a shortcut into the limited range), we need to be careful to
remove the to-be-freed node if it is the cached32_node.
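
To make the condition concrete, here is a simplified userspace model in
C. It is only a sketch under stated assumptions: a sorted singly linked
list stands in for the rbtree (so `next` plays the role of rb_next()),
and struct range, struct domain and cache_delete_update() are
hypothetical names, not the kernel's.

```c
#include <stddef.h>

/* Hypothetical simplified model of an allocated IOVA range. */
struct range {
	unsigned long pfn_lo;
	unsigned long pfn_hi;
	struct range *next;	/* stand-in for rb_next() */
};

struct domain {
	unsigned long dma_32bit_pfn;	/* upper bound of the 32-bit range */
	struct range *cached32;		/* cached shortcut node */
};

/*
 * Call before freeing `victim`. The first clause is the point of the
 * fix: if the victim is the cached node itself, move the cache on even
 * when the victim lies above dma_32bit_pfn, so the cache never points
 * at freed memory.
 */
static void cache_delete_update(struct domain *d, struct range *victim)
{
	struct range *cached = d->cached32;

	if (!cached)
		return;

	if (victim == cached ||
	    (victim->pfn_hi < d->dma_32bit_pfn &&
	     victim->pfn_lo >= cached->pfn_lo))
		d->cached32 = victim->next;
}
```

Without the `victim == cached` clause, a cached node that had been
advanced above dma_32bit_pfn would fail the range test and stay cached
after being freed.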
[ 48.477773] BUG: KASAN: use-after-free in __cached_rbnode_delete_update+0x68/0x110
[ 48.477812] Read of size 8 at addr ffff88870fc19020 by task kworker/u8:1/37
[ 48.477843]
[ 48.477879] CPU: 1 PID: 37 Comm: kworker/u8:1 Tainted: G U 5.2.0+ #735
[ 48.477915] Hardware name: Intel Corporation NUC7i5BNK/NUC7i5BNB, BIOS BNKBL357.86A.0052.2017.0918.1346 09/18/2017
[ 48.478047] Workqueue: i915 __i915_gem_free_work [i915]
[ 48.478075] Call Trace:
[ 48.478111] dump_stack+0x5b/0x90
[ 48.478137] print_address_description+0x67/0x237
[ 48.478178] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478212] __kasan_report.cold.3+0x1c/0x38
[ 48.478240] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478280] ? __cached_rbnode_delete_update+0x68/0x110
[ 48.478308] __cached_rbnode_delete_update+0x68/0x110
[ 48.478344] private_free_iova+0x2b/0x60
[ 48.478378] iova_magazine_free_pfns+0x46/0xa0
[ 48.478403] free_iova_fast+0x277/0x340
[ 48.478443] fq_ring_free+0x15a/0x1a0
[ 48.478473] queue_iova+0x19c/0x1f0
[ 48.478597] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[ 48.478712] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[ 48.478826] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[ 48.478940] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[ 48.479053] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[ 48.479081] ? __sg_free_table+0x9e/0xf0
[ 48.479116] ? kfree+0x7f/0x150
[ 48.479234] i915_vma_unbind+0x1e2/0x240 [i915]
[ 48.479352] i915_vma_destroy+0x3a/0x280 [i915]
[ 48.479465] __i915_gem_free_objects+0xf0/0x2d0 [i915]
[ 48.479579] __i915_gem_free_work+0x41/0xa0 [i915]
[ 48.479607] process_one_work+0x495/0x710
[ 48.479642] worker_thread+0x4c7/0x6f0
[ 48.479687] ? process_one_work+0x710/0x710
[ 48.479724] kthread+0x1b2/0x1d0
[ 48.479774] ? kthread_create_worker_on_cpu+0xa0/0xa0
[ 48.479820] ret_from_fork+0x1f/0x30
[ 48.479864]
[ 48.479907] Allocated by task 631:
[ 48.479944] save_stack+0x19/0x80
[ 48.479994] __kasan_kmalloc.constprop.6+0xc1/0xd0
[ 48.480038] kmem_cache_alloc+0x91/0xf0
[ 48.480082] alloc_iova+0x2b/0x1e0
[ 48.480125] alloc_iova_fast+0x58/0x376
[ 48.480166] intel_alloc_iova+0x90/0xc0
[ 48.480214] intel_map_sg+0xde/0x1f0
[ 48.480343] i915_gem_gtt_prepare_pages+0xb8/0x170 [i915]
[ 48.480465] huge_get_pages+0x232/0x2b0 [i915]
[ 48.480590] ____i915_gem_object_get_pages+0x40/0xb0 [i915]
[ 48.480712] __i915_gem_object_get_pages+0x90/0xa0 [i915]
[ 48.480834] i915_gem_object_prepare_write+0x2d6/0x330 [i915]
[ 48.480955] create_test_object.isra.54+0x1a9/0x3e0 [i915]
[ 48.481075] igt_shared_ctx_exec+0x365/0x3c0 [i915]
[ 48.481210] __i915_subtests.cold.4+0x30/0x92 [i915]
[ 48.481341] __run_selftests.cold.3+0xa9/0x119 [i915]
[ 48.481466] i915_live_selftests+0x3c/0x70 [i915]
[ 48.481583] i915_pci_probe+0xe7/0x220 [i915]
[ 48.481620] pci_device_probe+0xe0/0x180
[ 48.481665] really_probe+0x163/0x4e0
[ 48.481710] device_driver_attach+0x85/0x90
[ 48.481750] __driver_attach+0xa5/0x180
[ 48.481796] bus_for_each_dev+0xda/0x130
[ 48.481831] bus_add_driver+0x205/0x2e0
[ 48.481882] driver_register+0xca/0x140
[ 48.481927] do_one_initcall+0x6c/0x1af
[ 48.481970] do_init_module+0x106/0x350
[ 48.482010] load_module+0x3d2c/0x3ea0
[ 48.482058] __do_sys_finit_module+0x110/0x180
[ 48.482102] do_syscall_64+0x62/0x1f0
[ 48.482147] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 48.482190]
[ 48.482224] Freed by task 37:
[ 48.482273] save_stack+0x19/0x80
[ 48.482318] __kasan_slab_free+0x12e/0x180
[ 48.482363] kmem_cache_free+0x70/0x140
[ 48.482406] __free_iova+0x1d/0x30
[ 48.482445] fq_ring_free+0x15a/0x1a0
[ 48.482490] queue_iova+0x19c/0x1f0
[ 48.482624] cleanup_page_dma.isra.64+0x62/0xb0 [i915]
[ 48.482749] __gen8_ppgtt_cleanup+0x63/0x80 [i915]
[ 48.482873] __gen8_ppgtt_cleanup+0x42/0x80 [i915]
[ 48.482999] __gen8_ppgtt_clear+0x433/0x4b0 [i915]
[ 48.483123] __gen8_ppgtt_clear+0x462/0x4b0 [i915]
[ 48.483250] i915_vma_unbind+0x1e2/0x240 [i915]
[ 48.483378] i915_vma_destroy+0x3a/0x280 [i915]
[ 48.483500] __i915_gem_free_objects+0xf0/0x2d0 [i915]
[ 48.483622] __i915_gem_free_work+0x41/0xa0 [i915]
[ 48.483659] process_one_work+0x495/0x710
[ 48.483704] worker_thread+0x4c7/0x6f0
[ 48.483748] kthread+0x1b2/0x1d0
[ 48.483787] ret_from_fork+0x1f/0x30
[ 48.483831]
[ 48.483868] The buggy address belongs to the object at ffff88870fc19000
[ 48.483868] which belongs to the cache iommu_iova of size 40
[ 48.483920] The buggy address is located 32 bytes inside of
[ 48.483920] 40-byte region [ffff88870fc19000, ffff88870fc19028)
[ 48.483964] The buggy address belongs to the page:
[ 48.484006] page:ffffea001c3f0600 refcount:1 mapcount:0 mapping:ffff8888181a91c0 index:0x0 compound_mapcount: 0
[ 48.484045] flags: 0x8000000000010200(slab|head)
[ 48.484096] raw: 8000000000010200 ffffea001c421a08 ffffea001c447e88 ffff8888181a91c0
[ 48.484141] raw: 0000000000000000 0000000000120012 00000001ffffffff 0000000000000000
[ 48.484188] page dumped because: kasan: bad access detected
[ 48.484230]
[ 48.484265] Memory state around the buggy address:
[ 48.484314] ffff88870fc18f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484361] ffff88870fc18f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484406] >ffff88870fc19000: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
[ 48.484451] ^
[ 48.484494] ffff88870fc19080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 48.484530] ffff88870fc19100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108602
Fixes: e60aa7b53845 ("iommu/iova: Extend rbtree node caching")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: <stable(a)vger.kernel.org> # v4.15+
Reviewed-by: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 8413ae54904a..3e1a8a675572 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -137,8 +137,9 @@ __cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free)
struct iova *cached_iova;
cached_iova = rb_entry(iovad->cached32_node, struct iova, node);
- if (free->pfn_hi < iovad->dma_32bit_pfn &&
- free->pfn_lo >= cached_iova->pfn_lo) {
+ if (free == cached_iova ||
+ (free->pfn_hi < iovad->dma_32bit_pfn &&
+ free->pfn_lo >= cached_iova->pfn_lo)) {
iovad->cached32_node = rb_next(&free->node);
iovad->max32_alloc_size = iovad->dma_32bit_pfn;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From da0ef93310e67ae6902efded60b6724dab27a5d1 Mon Sep 17 00:00:00 2001
From: Suraj Jitindar Singh <sjitindarsingh(a)gmail.com>
Date: Wed, 10 Jul 2019 15:20:18 +1000
Subject: [PATCH] powerpc/mm: Limit rma_size to 1TB when running without HV
mode
The virtual real mode addressing (VRMA) mechanism is used when a
partition is using HPT (Hash Page Table) translation and performs real
mode accesses (MSR[IR|DR] = 0) in non-hypervisor mode. In this mode
effective address bits 0:23 are treated as zero (i.e. the access is
aliased to 0) and the access is performed using an implicit 1TB SLB
entry.
The size of the RMA (Real Memory Area) is communicated to the guest as
the size of the first memory region in the device tree. Because of
the mechanism described above, it can be expected not to exceed 1TB. In
the event that the host erroneously represents the RMA as being larger
than 1TB, guest accesses in real mode to memory addresses above 1TB
will be aliased down to below 1TB. This means that a memory access
performed in real mode may differ from one performed in virtual mode for
the same memory address, which would likely have unintended
consequences.
To avoid this outcome, have the guest explicitly limit the size of the
RMA to the current maximum, which is 1TB. This means that even if the
first memory block is larger than 1TB, only the first 1TB should be
accessed in real mode.
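
The clamping and the aliasing it guards against can be sketched in
standalone C. This is illustrative only: clamp_rma_size(), vrma_alias()
and the two limit macros are hypothetical names, and "1TB" is assumed
to mean 2^40 bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits for illustration: 1 GiB (0x40000000) and 2^40 bytes. */
#define RMA_LIMIT_PRE_POWER9	(UINT64_C(1) << 30)
#define RMA_LIMIT_1TB		(UINT64_C(1) << 40)

/* Clamp the advertised first-memblock size to what real mode can reach. */
static uint64_t clamp_rma_size(uint64_t first_memblock_size, bool is_power9)
{
	uint64_t limit = is_power9 ? RMA_LIMIT_1TB : RMA_LIMIT_PRE_POWER9;

	return first_memblock_size < limit ? first_memblock_size : limit;
}

/*
 * Model of the aliasing: with the top 24 effective address bits treated
 * as zero, an address is effectively taken modulo 1TB.
 */
static uint64_t vrma_alias(uint64_t addr)
{
	return addr & (RMA_LIMIT_1TB - 1);
}
```

For example, an address 0x1000 bytes past the 1TB mark aliases to
0x1000, which is why an RMA larger than 1TB cannot be fully addressed in
real mode.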
Fixes: c610d65c0ad0 ("powerpc/pseries: lift RTAS limit for hash")
Cc: stable(a)vger.kernel.org # v4.16+
Signed-off-by: Suraj Jitindar Singh <sjitindarsingh(a)gmail.com>
Tested-by: Satheesh Rajendran <sathnaga(a)linux.vnet.ibm.com>
Reviewed-by: David Gibson <david(a)gibson.dropbear.id.au>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20190710052018.14628-1-sjitindarsingh@gmail.com
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 9a5963e07a82..b8ad14bb1170 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -1899,11 +1899,20 @@ void hash__setup_initial_memory_limit(phys_addr_t first_memblock_base,
*
* For guests on platforms before POWER9, we clamp the it limit to 1G
* to avoid some funky things such as RTAS bugs etc...
+ *
+ * On POWER9 we limit to 1TB in case the host erroneously told us that
+ * the RMA was >1TB. Effective address bits 0:23 are treated as zero
+ * (meaning the access is aliased to zero i.e. addr = addr % 1TB)
+ * for virtual real mode addressing and so it doesn't make sense to
+ * have an area larger than 1TB as it can't be addressed.
*/
if (!early_cpu_has_feature(CPU_FTR_HVMODE)) {
ppc64_rma_size = first_memblock_size;
if (!early_cpu_has_feature(CPU_FTR_ARCH_300))
ppc64_rma_size = min_t(u64, ppc64_rma_size, 0x40000000);
+ else
+ ppc64_rma_size = min_t(u64, ppc64_rma_size,
+ 1UL << SID_SHIFT_1T);
/* Finally limit subsequent allocations */
memblock_set_current_limit(ppc64_rma_size);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42c16da6d684391db83788eb680accd84f6c2083 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 1 Jul 2019 05:12:46 +0000
Subject: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
As btrfs(5) specifies:
Note
If nodatacow or nodatasum are enabled, compression is disabled.
If NODATASUM or NODATACOW is set, we should not compress the extent.
Normally NODATACOW is detected properly in run_delalloc_range() so
compression won't happen for NODATACOW.
However, for NODATASUM we don't have any check, and it can cause a
compressed extent without csum pretty easily, just by:
mkfs.btrfs -f $dev
mount $dev $mnt -o nodatasum
touch $mnt/foobar
mount -o remount,datasum,compress $mnt
xfs_io -f -c "pwrite 0 128K" $mnt/foobar
And in fact, we have a bug report about a corrupted compressed extent
without a proper data checksum, so even RAID1 can't recover the corruption.
(https://bugzilla.kernel.org/show_bug.cgi?id=199707)
Running compression without a proper checksum could cause more damage when
corruption happens, as compressed data could make the whole extent
unreadable, so there is no need to allow compression for
NODATASUM.
The fix will refactor the inode compression check into two parts:
- inode_can_compress()
As the hard requirement, checked at btrfs_run_delalloc_range(), so no
compression will happen for NODATASUM inode at all.
- inode_need_compress()
As the soft requirement, checked at btrfs_run_delalloc_range() and
compress_file_range().
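
The split between the hard and the soft requirement can be sketched in
standalone C. Everything here is a hypothetical model: the SKETCH_*
flag values are assumed for illustration, and the `wants_compress`
field stands in for the mount options, properties and heuristics that
the real inode_need_compress() consults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; names echo the patch, values are assumed. */
#define SKETCH_NODATASUM	(1u << 0)
#define SKETCH_NODATACOW	(1u << 1)

/* Hypothetical stand-in for the relevant inode state. */
struct sketch_inode {
	uint32_t flags;
	bool wants_compress;	/* mount options, properties, heuristics */
};

/* Hard requirement: the inode flags must be compatible with compression. */
static bool sketch_can_compress(const struct sketch_inode *inode)
{
	return !(inode->flags & (SKETCH_NODATASUM | SKETCH_NODATACOW));
}

/* Soft requirement: only meaningful once the hard requirement holds. */
static bool sketch_need_compress(const struct sketch_inode *inode)
{
	if (!sketch_can_compress(inode))
		return false;
	return inode->wants_compress;
}

/* Decision analogous to the one taken in the delalloc path. */
static bool sketch_use_compressed_path(const struct sketch_inode *inode)
{
	return sketch_can_compress(inode) && sketch_need_compress(inode);
}
```

The redundant check inside sketch_need_compress() echoes the patch,
where the soft check also refuses (and warns under debug) if it is ever
reached for an inode that fails the hard check.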
Reported-by: James Harvey <jamespharvey20(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af069a9a0c7..ee582a36653d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -395,10 +395,31 @@ static noinline int add_async_extent(struct async_chunk *cow,
return 0;
}
+/*
+ * Check if the inode has flags compatible with compression
+ */
+static inline bool inode_can_compress(struct inode *inode)
+{
+ if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
+ BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
+ return false;
+ return true;
+}
+
+/*
+ * Check if the inode needs to be submitted to compression, based on mount
+ * options, defragmentation, properties or heuristics.
+ */
static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+ if (!inode_can_compress(inode)) {
+ WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+ KERN_ERR "BTRFS: unexpected compression for ino %llu\n",
+ btrfs_ino(BTRFS_I(inode)));
+ return 0;
+ }
/* force compress */
if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
return 1;
@@ -1631,7 +1652,8 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page,
} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
ret = run_delalloc_nocow(inode, locked_page, start, end,
page_started, 0, nr_written);
- } else if (!inode_need_compress(inode, start, end)) {
+ } else if (!inode_can_compress(inode) ||
+ !inode_need_compress(inode, start, end)) {
ret = cow_file_range(inode, locked_page, start, end, end,
page_started, nr_written, 1, NULL);
} else {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42c16da6d684391db83788eb680accd84f6c2083 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu(a)suse.com>
Date: Mon, 1 Jul 2019 05:12:46 +0000
Subject: [PATCH] btrfs: inode: Don't compress if NODATASUM or NODATACOW set
As btrfs(5) specifies:
Note
If nodatacow or nodatasum are enabled, compression is disabled.
If NODATASUM or NODATACOW is set, we should not compress the extent.
Normally NODATACOW is detected properly in run_delalloc_range() so
compression won't happen for NODATACOW.
However, for NODATASUM we don't have any check, and it can cause a
compressed extent without csum pretty easily, just by:
mkfs.btrfs -f $dev
mount $dev $mnt -o nodatasum
touch $mnt/foobar
mount -o remount,datasum,compress $mnt
xfs_io -f -c "pwrite 0 128K" $mnt/foobar
And in fact, we have a bug report about a corrupted compressed extent
without a proper data checksum, so even RAID1 can't recover the corruption.
(https://bugzilla.kernel.org/show_bug.cgi?id=199707)
Running compression without a proper checksum could cause more damage when
corruption happens, as compressed data could make the whole extent
unreadable, so there is no need to allow compression for
NODATASUM.
The fix will refactor the inode compression check into two parts:
- inode_can_compress()
As the hard requirement, checked at btrfs_run_delalloc_range(), so no
compression will happen for NODATASUM inode at all.
- inode_need_compress()
As the soft requirement, checked at btrfs_run_delalloc_range() and
compress_file_range().
Reported-by: James Harvey <jamespharvey20(a)gmail.com>
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 1af069a9a0c7..ee582a36653d 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -395,10 +395,31 @@ static noinline int add_async_extent(struct async_chunk *cow,
return 0;
}
+/*
+ * Check if the inode has flags compatible with compression
+ */
+static inline bool inode_can_compress(struct inode *inode)
+{
+ if (BTRFS_I(inode)->flags & BTRFS_INODE_NODATACOW ||
+ BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)
+ return false;
+ return true;
+}
+
+/*
+ * Check if the inode needs to be submitted to compression, based on mount
+ * options, defragmentation, properties or heuristics.
+ */
static inline int inode_need_compress(struct inode *inode, u64 start, u64 end)
{
struct btrfs_fs_info *fs_info = btrfs_sb(inode->i_sb);
+ if (!inode_can_compress(inode)) {
+ WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG),
+ KERN_ERR "BTRFS: unexpected compression for ino %llu\n",
+ btrfs_ino(BTRFS_I(inode)));
+ return 0;
+ }
/* force compress */
if (btrfs_test_opt(fs_info, FORCE_COMPRESS))
return 1;
@@ -1631,7 +1652,8 @@ int btrfs_run_delalloc_range(struct inode *inode, struct page *locked_page,
} else if (BTRFS_I(inode)->flags & BTRFS_INODE_PREALLOC && !force_cow) {
ret = run_delalloc_nocow(inode, locked_page, start, end,
page_started, 0, nr_written);
- } else if (!inode_need_compress(inode, start, end)) {
+ } else if (!inode_can_compress(inode) ||
+ !inode_need_compress(inode, start, end)) {
ret = cow_file_range(inode, locked_page, start, end, end,
page_started, nr_written, 1, NULL);
} else {
A few patches were recently marked for stable@, but the commits are not
backportable as-is and require a few tweaks. Here is the 4.19 stable backport.
Jan Kiszka (1):
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Paolo Bonzini (1):
KVM: nVMX: do not use dangling shadow VMCS after guest reset
arch/x86/kvm/vmx.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--
2.20.1
Ever since the conversion of DAX to the Xarray a RocksDB benchmark has
been encountering intermittent lockups. In the failing case a thread
that is taking a PMD-fault is awaiting a wakeup while holding the
'mmap_sem' for read. As soon as the next mmap() event occurs that tries
to take the 'mmap_sem' for write it causes ps(1) and any new 'mmap_sem'
reader to block.
Debug shows that there are no outstanding Xarray entry-lock holders in
the hang state which indicates that a PTE lock-holder thread caused a
PMD thread to wait. When the PTE index-lock is released it may wake the
wrong waitqueue depending on how the index hashes. Brute-force fix this
by arranging for PTE-aligned indices within a PMD-span to hash to the
same waitqueue as the PMD-index.
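
The effect of PMD-aligning the index before hashing can be sketched in
standalone C. This is a model under stated assumptions, not the fs/dax.c
code: SKETCH_PMD_COLOUR assumes 512 PTE-sized pages per PMD (4 KiB pages
with 2 MiB PMDs), the multiplicative hash and the 256-bucket table are
hypothetical, and the real code also hashes the xarray pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 512 PTE-sized pages per PMD. */
#define SKETCH_PMD_COLOUR	UINT64_C(511)

/*
 * Map a page index to one of 256 hypothetical wait buckets. Masking off
 * the low bits first means every PTE index inside a PMD span lands in
 * the same bucket as the PMD-aligned index itself.
 */
static size_t sketch_wait_bucket(uint64_t index)
{
	index &= ~SKETCH_PMD_COLOUR;	/* PMD-align the index */
	return (size_t)((index * UINT64_C(0x9E3779B97F4A7C15)) >> 56);
}
```

A waker releasing a PTE entry at index 513 and a waiter sleeping on the
PMD entry at index 512 now pick the same bucket, which is the property
the fix relies on.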
This fix may increase waitqueue contention, but a fix for that is saved
for a larger rework. In the meantime this fix is suitable for -stable
backports.
Link: https://lore.kernel.org/linux-fsdevel/CAPcyv4hwHpX-MkUEqxwdTj7wCCZCN4RV-L4j…>
Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray")
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Boaz Harrosh <openosd(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Reported-by: Robert Barror <robert.barror(a)intel.com>
Reported-by: Seema Pandit <seema.pandit(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
fs/dax.c | 34 ++++++++++++----------------------
1 file changed, 12 insertions(+), 22 deletions(-)
diff --git a/fs/dax.c b/fs/dax.c
index 9fd908f3df32..592944c522b8 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -144,19 +144,14 @@ struct wait_exceptional_entry_queue {
struct exceptional_entry_key key;
};
-static wait_queue_head_t *dax_entry_waitqueue(struct xa_state *xas,
- void *entry, struct exceptional_entry_key *key)
+static wait_queue_head_t *dax_index_waitqueue(struct xa_state *xas,
+ struct exceptional_entry_key *key)
{
unsigned long hash;
unsigned long index = xas->xa_index;
- /*
- * If 'entry' is a PMD, align the 'index' that we use for the wait
- * queue to the start of that PMD. This ensures that all offsets in
- * the range covered by the PMD map to the same bit lock.
- */
- if (dax_is_pmd_entry(entry))
- index &= ~PG_PMD_COLOUR;
+ /* PMD-align the index to ensure PTE events wakeup PMD waiters */
+ index &= ~PG_PMD_COLOUR;
key->xa = xas->xa;
key->entry_start = index;
@@ -177,17 +172,12 @@ static int wake_exceptional_entry_func(wait_queue_entry_t *wait,
return autoremove_wake_function(wait, mode, sync, NULL);
}
-/*
- * @entry may no longer be the entry at the index in the mapping.
- * The important information it's conveying is whether the entry at
- * this index used to be a PMD entry.
- */
-static void dax_wake_entry(struct xa_state *xas, void *entry, bool wake_all)
+static void dax_wake_index(struct xa_state *xas, bool wake_all)
{
struct exceptional_entry_key key;
wait_queue_head_t *wq;
- wq = dax_entry_waitqueue(xas, entry, &key);
+ wq = dax_index_waitqueue(xas, &key);
/*
* Checking for locked entry and prepare_to_wait_exclusive() happens
@@ -222,7 +212,7 @@ static void *get_unlocked_entry(struct xa_state *xas)
!dax_is_locked(entry))
return entry;
- wq = dax_entry_waitqueue(xas, entry, &ewait.key);
+ wq = dax_index_waitqueue(xas, &ewait.key);
prepare_to_wait_exclusive(wq, &ewait.wait,
TASK_UNINTERRUPTIBLE);
xas_unlock_irq(xas);
@@ -246,7 +236,7 @@ static void wait_entry_unlocked(struct xa_state *xas, void *entry)
init_wait(&ewait.wait);
ewait.wait.func = wake_exceptional_entry_func;
- wq = dax_entry_waitqueue(xas, entry, &ewait.key);
+ wq = dax_index_waitqueue(xas, &ewait.key);
/*
* Unlike get_unlocked_entry() there is no guarantee that this
* path ever successfully retrieves an unlocked entry before an
@@ -263,7 +253,7 @@ static void put_unlocked_entry(struct xa_state *xas, void *entry)
{
/* If we were the only waiter woken, wake the next one */
if (entry)
- dax_wake_entry(xas, entry, false);
+ dax_wake_index(xas, false);
}
/*
@@ -281,7 +271,7 @@ static void dax_unlock_entry(struct xa_state *xas, void *entry)
old = xas_store(xas, entry);
xas_unlock_irq(xas);
BUG_ON(!dax_is_locked(old));
- dax_wake_entry(xas, entry, false);
+ dax_wake_index(xas, false);
}
/*
@@ -522,7 +512,7 @@ static void *grab_mapping_entry(struct xa_state *xas,
dax_disassociate_entry(entry, mapping, false);
xas_store(xas, NULL); /* undo the PMD join */
- dax_wake_entry(xas, entry, true);
+ dax_wake_index(xas, true);
mapping->nrexceptional--;
entry = NULL;
xas_set(xas, index);
@@ -915,7 +905,7 @@ static int dax_writeback_one(struct xa_state *xas, struct dax_device *dax_dev,
xas_lock_irq(xas);
xas_store(xas, entry);
xas_clear_mark(xas, PAGECACHE_TAG_DIRTY);
- dax_wake_entry(xas, entry, false);
+ dax_wake_index(xas, false);
trace_dax_writeback_one(mapping->host, index, count);
return ret;
This is the start of the stable review cycle for the 5.2.4 release.
There are 66 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 28 Jul 2019 03:21:13 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.4-rc1.…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.2.4-rc1
Damien Le Moal <damien.lemoal(a)wdc.com>
block: Limit zone array allocation size
Damien Le Moal <damien.lemoal(a)wdc.com>
sd_zbc: Fix report zones buffer allocation
Paolo Bonzini <pbonzini(a)redhat.com>
Revert "kvm: x86: Use task structs fpu field for user"
Jan Kiszka <jan.kiszka(a)siemens.com>
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: nVMX: do not use dangling shadow VMCS after guest reset
Theodore Ts'o <tytso(a)mit.edu>
ext4: allow directory holes
Ross Zwisler <zwisler(a)chromium.org>
ext4: use jbd2_inode dirty range scoping
Ross Zwisler <zwisler(a)chromium.org>
jbd2: introduce jbd2_inode dirty range scoping
Ross Zwisler <zwisler(a)chromium.org>
mm: add filemap_fdatawait_range_keep_errors()
Theodore Ts'o <tytso(a)mit.edu>
ext4: enforce the immutable flag on open files
Darrick J. Wong <darrick.wong(a)oracle.com>
ext4: don't allow any modifications to an immutable file
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix race between close() and fork()
Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
perf/core: Fix exclusive events' grouping
Song Liu <songliubraving(a)fb.com>
perf script: Assume native_arch for pipe mode
Paul Cercueil <paul(a)crapouillou.net>
MIPS: lb60: Fix pin mappings
Keerthy <j-keerthy(a)ti.com>
gpio: davinci: silence error prints in case of EPROBE_DEFER
Nishka Dasgupta <nishkadg.linux(a)gmail.com>
gpiolib: of: fix a memory leak in of_gpio_flags_quirks()
Linus Walleij <linus.walleij(a)linaro.org>
Revert "gpio/spi: Fix spi-gpio regression on active high CS"
Chris Wilson <chris(a)chris-wilson.co.uk>
dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
Jérôme Glisse <jglisse(a)redhat.com>
dma-buf: balance refcount inbalance
Ido Schimmel <idosch(a)mellanox.com>
mlxsw: spectrum: Do not process learned records with a dummy FID
Maor Gottlieb <maorg(a)mellanox.com>
net/mlx5: E-Switch, Fix default encap mode
Petr Machata <petrm(a)mellanox.com>
mlxsw: spectrum_dcb: Configure DSCP map as the last rule is removed
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix VNIC accounting when enabling aRFS on 57500 chips.
Aya Levin <ayal(a)mellanox.com>
net/mlx5e: Fix error flow in tx reporter diagnose
Aya Levin <ayal(a)mellanox.com>
net/mlx5e: Fix return value from timeout recover function
Saeed Mahameed <saeedm(a)mellanox.com>
net/mlx5e: Rx, Fix checksum calculation for new hardware
Eli Britstein <elibr(a)mellanox.com>
net/mlx5e: Fix port tunnel GRE entropy control
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: reject offload of TLS 1.3
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: fix poll ignoring partially copied records
Frank de Brabander <debrabander(a)gmail.com>
selftests: txring_overwrite: fix incorrect test of mmap() return value
Cong Wang <xiyou.wangcong(a)gmail.com>
netrom: hold sock when setting skb->destructor
Cong Wang <xiyou.wangcong(a)gmail.com>
netrom: fix a memory leak in nr_rx_frame()
Andreas Steinmetz <ast(a)domdv.de>
macsec: fix checksumming after decryption
Andreas Steinmetz <ast(a)domdv.de>
macsec: fix use-after-free of skb during RX
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: stp: don't cache eth dest pointer before skb pull
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: don't cache ether dest pointer on input
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
Aya Levin <ayal(a)mellanox.com>
net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
Peter Kosyh <p.kosyh(a)gmail.com>
vrf: make sure skb->data contains ip header to make routing
Christoph Paasch <cpaasch(a)apple.com>
tcp: Reset bytes_acked and bytes_received when disconnecting
Eric Dumazet <edumazet(a)google.com>
tcp: fix tcp_set_congestion_control() use from bpf hook
Eric Dumazet <edumazet(a)google.com>
tcp: be more careful in tcp_fragment()
Takashi Iwai <tiwai(a)suse.de>
sky2: Disable MSI on ASUS P6T
Xin Long <lucien.xin(a)gmail.com>
sctp: not bind the socket in sctp_connect
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: fix error handling on stream scheduler initialization
David Howells <dhowells(a)redhat.com>
rxrpc: Fix send on a connected, but unbound socket
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
Yang Wei <albin_yang(a)163.com>
nfc: fix potential illegal memory access
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: make sure offload also gets the keys wiped
Jose Abreu <Jose.Abreu(a)synopsys.com>
net: stmmac: Re-work the queue selection for TSO packets
Cong Wang <xiyou.wangcong(a)gmail.com>
net_sched: unset TCQ_F_CAN_BYPASS when adding filters
Andrew Lunn <andrew(a)lunn.ch>
net: phy: sfp: hwmon: Fix scaling of RX power
John Hurley <john.hurley(a)netronome.com>
net: openvswitch: fix csum updates for MPLS actions
Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
net: neigh: fix multiple neigh timer scheduling
Florian Westphal <fw(a)strlen.de>
net: make skb_dst_force return true when dst is refcounted
Baruch Siach <baruch(a)tkos.co.il>
net: dsa: mv88e6xxx: wait after reset deactivation
Justin Chen <justinpopo6(a)gmail.com>
net: bcmgenet: use promisc for unsupported filters
Ido Schimmel <idosch(a)mellanox.com>
ipv6: Unlink sibling route in case of failure
David Ahern <dsahern(a)gmail.com>
ipv6: rt6_check should return NULL if 'from' is NULL
Matteo Croce <mcroce(a)redhat.com>
ipv4: don't set IPv6 only flags to IPv4 addresses
Eric Dumazet <edumazet(a)google.com>
igmp: fix memory leak in igmpv3_del_delrec()
Haiyang Zhang <haiyangz(a)microsoft.com>
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
Taehee Yoo <ap420073(a)gmail.com>
caif-hsi: fix possible deadlock in cfhsi_exit_module()
Brian King <brking(a)linux.vnet.ibm.com>
bnx2x: Prevent load reordering in tx completion processing
-------------
Diffstat:
Makefile | 4 +-
arch/mips/jz4740/board-qi_lb60.c | 16 +--
arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/kvm/vmx/nested.c | 10 +-
arch/x86/kvm/x86.c | 4 +-
block/blk-zoned.c | 46 ++++---
drivers/dma-buf/dma-buf.c | 1 +
drivers/dma-buf/reservation.c | 4 +
drivers/gpio/gpio-davinci.c | 5 +-
drivers/gpio/gpiolib-of.c | 10 +-
drivers/net/caif/caif_hsi.c | 2 +-
drivers/net/dsa/mv88e6xxx/chip.c | 2 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 7 +-
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 57 ++++-----
drivers/net/ethernet/marvell/sky2.c | 7 ++
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
.../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 10 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 +-
drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 -
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 7 ++
.../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 9 +-
.../net/ethernet/mellanox/mlx5/core/lib/port_tun.c | 23 +---
drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 1 +
drivers/net/ethernet/mellanox/mlxsw/spectrum_dcb.c | 16 +--
drivers/net/ethernet/mellanox/mlxsw/spectrum_fid.c | 10 ++
.../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 6 +
drivers/net/ethernet/realtek/r8169.c | 137 +++++++++++++++++++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 28 +++--
drivers/net/hyperv/netvsc_drv.c | 1 -
drivers/net/macsec.c | 6 +-
drivers/net/phy/sfp.c | 2 +-
drivers/net/vrf.c | 58 +++++----
drivers/scsi/sd_zbc.c | 104 +++++++++++-----
fs/ext4/dir.c | 19 ++-
fs/ext4/ext4_jbd2.h | 12 +-
fs/ext4/file.c | 4 +
fs/ext4/inode.c | 24 +++-
fs/ext4/ioctl.c | 46 ++++++-
fs/ext4/move_extent.c | 3 +-
fs/ext4/namei.c | 45 +++++--
fs/jbd2/commit.c | 23 +++-
fs/jbd2/journal.c | 4 +
fs/jbd2/transaction.c | 49 ++++----
include/linux/blkdev.h | 5 +
include/linux/fs.h | 2 +
include/linux/jbd2.h | 22 ++++
include/linux/mlx5/mlx5_ifc.h | 3 +-
include/linux/perf_event.h | 5 +
include/net/dst.h | 5 +-
include/net/tcp.h | 8 +-
include/net/tls.h | 1 +
kernel/events/core.c | 83 ++++++++++---
mm/filemap.c | 22 ++++
net/bridge/br_input.c | 8 +-
net/bridge/br_multicast.c | 23 ++--
net/bridge/br_stp_bpdu.c | 3 +-
net/core/filter.c | 2 +-
net/core/neighbour.c | 2 +
net/ipv4/devinet.c | 8 ++
net/ipv4/igmp.c | 8 +-
net/ipv4/tcp.c | 6 +-
net/ipv4/tcp_cong.c | 6 +-
net/ipv4/tcp_output.c | 13 +-
net/ipv6/ip6_fib.c | 18 ++-
net/ipv6/route.c | 2 +-
net/netfilter/nf_queue.c | 6 +-
net/netrom/af_netrom.c | 4 +-
net/nfc/nci/data.c | 2 +-
net/openvswitch/actions.c | 6 +-
net/rxrpc/af_rxrpc.c | 4 +-
net/sched/cls_api.c | 1 +
net/sched/sch_fq_codel.c | 2 -
net/sched/sch_sfq.c | 2 -
net/sctp/socket.c | 24 +---
net/sctp/stream.c | 9 +-
net/tls/tls_device.c | 10 +-
net/tls/tls_main.c | 4 +-
net/tls/tls_sw.c | 3 +-
tools/perf/builtin-script.c | 3 +-
tools/testing/selftests/net/txring_overwrite.c | 2 +-
82 files changed, 850 insertions(+), 335 deletions(-)
In case of an AEAD decryption verification error we were using the
wrong value to zero out the plaintext buffer, leaving the end of
the buffer holding the false plaintext.
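The failure mode can be illustrated with a small userspace sketch. It
is not the ccree driver code: sketch_finish_decrypt() and its
parameters are hypothetical, and a flat buffer stands in for the
scatterlist. The point is only that the zeroing length must cover the
whole output, otherwise the tail still holds unauthenticated plaintext.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical completion helper. On a tag mismatch it zeroes
 * 'zero_len' bytes of the output and reports -EBADMSG. A real
 * implementation would use a wipe that the compiler cannot elide.
 */
static int sketch_finish_decrypt(unsigned char *dst, size_t zero_len,
				 int tag_ok)
{
	if (!tag_ok) {
		memset(dst, 0, zero_len);
		return -EBADMSG;
	}
	return 0;
}

/* Count the bytes in 'buf' that are still non-zero after the wipe. */
static size_t sketch_count_leaked(const unsigned char *buf, size_t len)
{
	size_t i, leaked = 0;

	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			leaked++;
	return leaked;
}
```

Passing a length shorter than the output leaves the remaining bytes
readable by the caller, which is the kind of leak the one-line fix
closes.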
Signed-off-by: Gilad Ben-Yossef <gilad(a)benyossef.com>
Fixes: ff27e85a85bb ("crypto: ccree - add AEAD support")
CC: stable(a)vger.kernel.org # v4.17+
---
drivers/crypto/ccree/cc_aead.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/ccree/cc_aead.c b/drivers/crypto/ccree/cc_aead.c
index 19abb872329c..8a6c825d40e8 100644
--- a/drivers/crypto/ccree/cc_aead.c
+++ b/drivers/crypto/ccree/cc_aead.c
@@ -268,7 +268,7 @@ static void cc_aead_complete(struct device *dev, void *cc_req, int err)
/* In case of payload authentication failure, MUST NOT
* revealed the decrypted message --> zero its memory.
*/
- cc_zero_sgl(areq->dst, areq_ctx->cryptlen);
+ cc_zero_sgl(areq->dst, areq->cryptlen);
err = -EBADMSG;
}
/*ENCRYPT*/
--
2.21.0
When fall-through warnings were enabled by default in commit d93512ef0f0e
("Makefile: Globally enable fall-through warning"), the following
warnings started to show up:
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_format_get_bpp’:
../drivers/gpu/drm/arm/malidp_hw.c:387:8: warning: this statement may fall
through [-Wimplicit-fallthrough=]
bpp = 30;
~~~~^~~~
../drivers/gpu/drm/arm/malidp_hw.c:388:3: note: here
case DRM_FORMAT_YUV420_10BIT:
^~~~
../drivers/gpu/drm/arm/malidp_hw.c: In function ‘malidp_se_irq’:
../drivers/gpu/drm/arm/malidp_hw.c:1311:4: warning: this statement may fall
through [-Wimplicit-fallthrough=]
drm_writeback_signal_completion(&malidp->mw_connector, 0);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../drivers/gpu/drm/arm/malidp_hw.c:1313:3: note: here
case MW_START:
^~~~
Rework to add a 'break;' in a case that didn't have it so that
the compiler doesn't warn about fall-through.
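The two situations the warning distinguishes can be sketched in plain
C. The enum and function names below are hypothetical and only mirror
the shape of the driver code: one case that needs a 'break;' and one
case where running on into the next label is intended and marked with
a comment the compiler recognises.

```c
enum sketch_fmt { SKETCH_FMT_30BPP, SKETCH_FMT_15BPP, SKETCH_FMT_UNKNOWN };
enum sketch_state { SKETCH_RESTART, SKETCH_START, SKETCH_IDLE };

/* Each format needs its own value, so every case ends in 'break;'. */
static int sketch_get_bpp(enum sketch_fmt fmt)
{
	int bpp = 0;

	switch (fmt) {
	case SKETCH_FMT_30BPP:
		bpp = 30;
		break;	/* without this, execution runs on and bpp becomes 15 */
	case SKETCH_FMT_15BPP:
		bpp = 15;
		break;
	default:
		break;
	}
	return bpp;
}

/* A restart does its own step and then everything a start does. */
static int sketch_handle(enum sketch_state state)
{
	int steps = 0;

	switch (state) {
	case SKETCH_RESTART:
		steps++;	/* finish the previous job first */
		/* fall through */
	case SKETCH_START:
		steps++;	/* begin the new job */
		break;
	default:
		break;
	}
	return steps;
}
```

In the first function a missing 'break;' is a functional bug, not just
a warning: the 30 would be silently overwritten by 15.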
Cc: stable(a)vger.kernel.org # v4.9+
Fixes: b8207562abdd ("drm/arm/malidp: Specified the rotation memory requirements for AFBC YUV formats")
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
drivers/gpu/drm/arm/malidp_hw.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/arm/malidp_hw.c b/drivers/gpu/drm/arm/malidp_hw.c
index 50af399d7f6f..dc5fff9af338 100644
--- a/drivers/gpu/drm/arm/malidp_hw.c
+++ b/drivers/gpu/drm/arm/malidp_hw.c
@@ -385,6 +385,7 @@ int malidp_format_get_bpp(u32 fmt)
switch (fmt) {
case DRM_FORMAT_VUY101010:
bpp = 30;
+ break;
case DRM_FORMAT_YUV420_10BIT:
bpp = 15;
break;
@@ -1309,7 +1310,7 @@ static irqreturn_t malidp_se_irq(int irq, void *arg)
break;
case MW_RESTART:
drm_writeback_signal_completion(&malidp->mw_connector, 0);
- /* fall through to a new start */
+ /* fall through */
case MW_START:
/* writeback started, need to emulate one-shot mode */
hw->disable_memwrite(hwdev);
--
2.20.1
Note, this will be the LAST 5.1.y kernel release. Everyone should move
to the 5.2.y series at this point in time.
This is the start of the stable review cycle for the 5.1.21 release.
There are 62 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 28 Jul 2019 03:21:13 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.1.21-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.1.21-rc1
Kuo-Hsin Yang <vovoy(a)chromium.org>
mm: vmscan: scan anonymous pages on file refaults
Damien Le Moal <damien.lemoal(a)wdc.com>
block: Limit zone array allocation size
Damien Le Moal <damien.lemoal(a)wdc.com>
sd_zbc: Fix report zones buffer allocation
Paolo Bonzini <pbonzini(a)redhat.com>
Revert "kvm: x86: Use task structs fpu field for user"
Jan Kiszka <jan.kiszka(a)siemens.com>
KVM: nVMX: Clear pending KVM_REQ_GET_VMCS12_PAGES when leaving nested
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: nVMX: do not use dangling shadow VMCS after guest reset
Theodore Ts'o <tytso(a)mit.edu>
ext4: allow directory holes
Ross Zwisler <zwisler(a)chromium.org>
ext4: use jbd2_inode dirty range scoping
Ross Zwisler <zwisler(a)chromium.org>
jbd2: introduce jbd2_inode dirty range scoping
Ross Zwisler <zwisler(a)chromium.org>
mm: add filemap_fdatawait_range_keep_errors()
Theodore Ts'o <tytso(a)mit.edu>
ext4: enforce the immutable flag on open files
Darrick J. Wong <darrick.wong(a)oracle.com>
ext4: don't allow any modifications to an immutable file
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix race between close() and fork()
Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
perf/core: Fix exclusive events' grouping
Song Liu <songliubraving(a)fb.com>
perf script: Assume native_arch for pipe mode
Paul Cercueil <paul(a)crapouillou.net>
MIPS: lb60: Fix pin mappings
Keerthy <j-keerthy(a)ti.com>
gpio: davinci: silence error prints in case of EPROBE_DEFER
Nishka Dasgupta <nishkadg.linux(a)gmail.com>
gpiolib: of: fix a memory leak in of_gpio_flags_quirks()
Chris Wilson <chris(a)chris-wilson.co.uk>
dma-buf: Discard old fence_excl on retrying get_fences_rcu for realloc
Jérôme Glisse <jglisse(a)redhat.com>
dma-buf: balance refcount inbalance
Aya Levin <ayal(a)mellanox.com>
net/mlx5e: Fix error flow in tx reporter diagnose
Aya Levin <ayal(a)mellanox.com>
net/mlx5e: Fix return value from timeout recover function
Saeed Mahameed <saeedm(a)mellanox.com>
net/mlx5e: Rx, Fix checksum calculation for new hardware
Eli Britstein <elibr(a)mellanox.com>
net/mlx5e: Fix port tunnel GRE entropy control
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: reject offload of TLS 1.3
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: fix poll ignoring partially copied records
Frank de Brabander <debrabander(a)gmail.com>
selftests: txring_overwrite: fix incorrect test of mmap() return value
Cong Wang <xiyou.wangcong(a)gmail.com>
netrom: hold sock when setting skb->destructor
Cong Wang <xiyou.wangcong(a)gmail.com>
netrom: fix a memory leak in nr_rx_frame()
Andreas Steinmetz <ast(a)domdv.de>
macsec: fix checksumming after decryption
Andreas Steinmetz <ast(a)domdv.de>
macsec: fix use-after-free of skb during RX
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: stp: don't cache eth dest pointer before skb pull
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: don't cache ether dest pointer on input
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: mcast: fix stale ipv6 hdr pointer when handling v6 query
Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
net: bridge: mcast: fix stale nsrcs pointer in igmp3/mld2 report handling
Aya Levin <ayal(a)mellanox.com>
net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn
Peter Kosyh <p.kosyh(a)gmail.com>
vrf: make sure skb->data contains ip header to make routing
Christoph Paasch <cpaasch(a)apple.com>
tcp: Reset bytes_acked and bytes_received when disconnecting
Eric Dumazet <edumazet(a)google.com>
tcp: fix tcp_set_congestion_control() use from bpf hook
Eric Dumazet <edumazet(a)google.com>
tcp: be more careful in tcp_fragment()
Takashi Iwai <tiwai(a)suse.de>
sky2: Disable MSI on ASUS P6T
Xin Long <lucien.xin(a)gmail.com>
sctp: not bind the socket in sctp_connect
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: fix error handling on stream scheduler initialization
David Howells <dhowells(a)redhat.com>
rxrpc: Fix send on a connected, but unbound socket
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix issue with confused RX unit after PHY power-down on RTL8411b
Yang Wei <albin_yang(a)163.com>
nfc: fix potential illegal memory access
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: make sure offload also gets the keys wiped
Jose Abreu <Jose.Abreu(a)synopsys.com>
net: stmmac: Re-work the queue selection for TSO packets
Cong Wang <xiyou.wangcong(a)gmail.com>
net_sched: unset TCQ_F_CAN_BYPASS when adding filters
Andrew Lunn <andrew(a)lunn.ch>
net: phy: sfp: hwmon: Fix scaling of RX power
John Hurley <john.hurley(a)netronome.com>
net: openvswitch: fix csum updates for MPLS actions
Lorenzo Bianconi <lorenzo.bianconi(a)redhat.com>
net: neigh: fix multiple neigh timer scheduling
Florian Westphal <fw(a)strlen.de>
net: make skb_dst_force return true when dst is refcounted
Baruch Siach <baruch(a)tkos.co.il>
net: dsa: mv88e6xxx: wait after reset deactivation
Justin Chen <justinpopo6(a)gmail.com>
net: bcmgenet: use promisc for unsupported filters
Ido Schimmel <idosch(a)mellanox.com>
ipv6: Unlink sibling route in case of failure
David Ahern <dsahern(a)gmail.com>
ipv6: rt6_check should return NULL if 'from' is NULL
Matteo Croce <mcroce(a)redhat.com>
ipv4: don't set IPv6 only flags to IPv4 addresses
Eric Dumazet <edumazet(a)google.com>
igmp: fix memory leak in igmpv3_del_delrec()
Haiyang Zhang <haiyangz(a)microsoft.com>
hv_netvsc: Fix extra rcu_read_unlock in netvsc_recv_callback()
Taehee Yoo <ap420073(a)gmail.com>
caif-hsi: fix possible deadlock in cfhsi_exit_module()
Brian King <brking(a)linux.vnet.ibm.com>
bnx2x: Prevent load reordering in tx completion processing
-------------
Diffstat:
Makefile | 4 +-
arch/mips/jz4740/board-qi_lb60.c | 16 +--
arch/x86/include/asm/kvm_host.h | 7 +-
arch/x86/kvm/vmx/nested.c | 10 +-
arch/x86/kvm/x86.c | 4 +-
block/blk-zoned.c | 46 ++++---
drivers/dma-buf/dma-buf.c | 1 +
drivers/dma-buf/reservation.c | 4 +
drivers/gpio/gpio-davinci.c | 5 +-
drivers/gpio/gpiolib-of.c | 1 +
drivers/net/caif/caif_hsi.c | 2 +-
drivers/net/dsa/mv88e6xxx/chip.c | 2 +
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 3 +
drivers/net/ethernet/broadcom/genet/bcmgenet.c | 57 ++++-----
drivers/net/ethernet/marvell/sky2.c | 7 ++
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
.../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 10 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 7 +-
.../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 9 +-
.../net/ethernet/mellanox/mlx5/core/lib/port_tun.c | 23 +---
drivers/net/ethernet/realtek/r8169.c | 137 +++++++++++++++++++++
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 29 +++--
drivers/net/hyperv/netvsc_drv.c | 1 -
drivers/net/macsec.c | 6 +-
drivers/net/phy/sfp.c | 2 +-
drivers/net/vrf.c | 58 +++++----
drivers/scsi/sd_zbc.c | 104 +++++++++++-----
fs/ext4/dir.c | 19 ++-
fs/ext4/ext4_jbd2.h | 12 +-
fs/ext4/file.c | 4 +
fs/ext4/inode.c | 24 +++-
fs/ext4/ioctl.c | 46 ++++++-
fs/ext4/move_extent.c | 3 +-
fs/ext4/namei.c | 45 +++++--
fs/jbd2/commit.c | 23 +++-
fs/jbd2/journal.c | 4 +
fs/jbd2/transaction.c | 49 ++++----
include/linux/blkdev.h | 5 +
include/linux/fs.h | 2 +
include/linux/jbd2.h | 22 ++++
include/linux/mlx5/mlx5_ifc.h | 3 +-
include/linux/perf_event.h | 5 +
include/net/dst.h | 5 +-
include/net/tcp.h | 8 +-
include/net/tls.h | 1 +
kernel/events/core.c | 83 ++++++++++---
mm/filemap.c | 22 ++++
mm/vmscan.c | 6 +-
net/bridge/br_input.c | 8 +-
net/bridge/br_multicast.c | 23 ++--
net/bridge/br_stp_bpdu.c | 3 +-
net/core/filter.c | 2 +-
net/core/neighbour.c | 2 +
net/ipv4/devinet.c | 8 ++
net/ipv4/igmp.c | 8 +-
net/ipv4/tcp.c | 6 +-
net/ipv4/tcp_cong.c | 6 +-
net/ipv4/tcp_output.c | 13 +-
net/ipv6/ip6_fib.c | 18 ++-
net/ipv6/route.c | 2 +-
net/netfilter/nf_queue.c | 6 +-
net/netrom/af_netrom.c | 4 +-
net/nfc/nci/data.c | 2 +-
net/openvswitch/actions.c | 6 +-
net/rxrpc/af_rxrpc.c | 4 +-
net/sched/cls_api.c | 1 +
net/sched/sch_fq_codel.c | 2 -
net/sched/sch_sfq.c | 2 -
net/sctp/socket.c | 24 +---
net/sctp/stream.c | 9 +-
net/tls/tls_device.c | 10 +-
net/tls/tls_main.c | 4 +-
net/tls/tls_sw.c | 3 +-
tools/perf/builtin-script.c | 3 +-
tools/testing/selftests/net/txring_overwrite.c | 2 +-
76 files changed, 816 insertions(+), 315 deletions(-)