Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: 277e4ab7d530 SUNRPC: Simplify TCP receive code by switching to using iterators.
The bot has tested the following trees: v4.20.0,
v4.20.0: Build failed! Errors:
net/sunrpc/xprtsock.c:396:24: error: ‘struct bio_vec’ has no member named ‘page’; did you mean ‘bv_page’?
How should we proceed with this patch?
--
Thanks,
Sasha
On Tue, Dec 11, 2018 at 03:29:31PM -0500, Dietmar May wrote:
>Sasha,
>
>I've verified that 4.9.143 no longer exhibits this problem.
Thanks for confirming!
>The revert hasn't shown up in 4.4 yet; but I'll verify once merged there.
There was no 4.4 release yet; it's coming, I promise :)
--
Thanks,
Sasha
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0568e82dbe2510fc1fa664f58e5c997d3f1e649e Mon Sep 17 00:00:00 2001
From: Josef Bacik <jbacik(a)fb.com>
Date: Fri, 30 Nov 2018 11:52:14 -0500
Subject: [PATCH] btrfs: run delayed items before dropping the snapshot
With my delayed refs patches in place we started seeing a large amount
of aborts in __btrfs_free_extent:
BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0
Call Trace:
? btrfs_merge_delayed_refs+0xaf/0x340
__btrfs_run_delayed_refs+0x6ea/0xfc0
? btrfs_set_path_blocking+0x31/0x60
btrfs_run_delayed_refs+0xeb/0x180
btrfs_commit_transaction+0x179/0x7f0
? btrfs_check_space_for_delayed_refs+0x30/0x50
? should_end_transaction.isra.19+0xe/0x40
btrfs_drop_snapshot+0x41c/0x7c0
btrfs_clean_one_deleted_snapshot+0xb5/0xd0
cleaner_kthread+0xf6/0x120
kthread+0xf8/0x130
? btree_invalidatepage+0x90/0x90
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
This was because btrfs_drop_snapshot depends on the root not being
modified while it's dropping the snapshot. It will unlock the root node
(and really every node) as it walks down the tree, only to re-lock it
when it needs to do something. This is a problem because if we modify
the tree we could cow a block in our path, which frees our reference to
that block. Then once we get back to that shared block we'll free our
reference to it again, and get ENOENT when trying to lookup our extent
reference to that block in __btrfs_free_extent.
This is ultimately happening because we have delayed items left to be
processed for our deleted snapshot _after_ all of the inodes are closed
for the snapshot. We only run the delayed inode item if we're deleting
the inode, and even then we do not run the delayed insertions or delayed
removals. These can be run at any point after our final inode does its
last iput, which is what triggers the snapshot deletion. We can end up
with the snapshot deletion happening and then have the delayed items run
on that file system, resulting in the above problem.
This problem has existed forever, however my patches made it much easier
to hit as I wake up the cleaner much more often to deal with delayed
iputs, which made us more likely to start the snapshot dropping work
before the transaction commits, which is when the delayed items would
generally be run. Before, generally speaking, we would run the delayed
items, commit the transaction, and wakeup the cleaner thread to start
deleting snapshots, which means we were less likely to hit this problem.
You could still hit it if you had multiple snapshots to be deleted and
ended up with lots of delayed items, but it was definitely harder.
Fix for now by simply running all the delayed items before starting to
drop the snapshot. We could make this smarter in the future by making
the delayed items per-root, and then simply drop any delayed items for
roots that we are going to delete. But for now just a quick and easy
solution is the safest.
CC: stable(a)vger.kernel.org # 4.4+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index de21e0c93eb6..8a9ce33dfdbc 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -9272,6 +9272,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
goto out_free;
}
+ err = btrfs_run_delayed_items(trans);
+ if (err)
+ goto out_end_trans;
+
if (block_rsv)
trans->block_rsv = block_rsv;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 41bd60676923822de1df2c50b3f9a10171f4338a Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 28 Nov 2018 14:54:28 +0000
Subject: [PATCH] Btrfs: fix fsync of files with multiple hard links in new
directories
The log tree has a long standing problem that when a file is fsync'ed we
only check for new ancestors, created in the current transaction, by
following only the hard link for which the fsync was issued. We follow the
ancestors using the VFS' dget_parent() API. This means that if we create a
new link for a file in a directory that is new (or in an any other new
ancestor directory) and then fsync the file using an old hard link, we end
up not logging the new ancestor, and on log replay that new hard link and
ancestor do not exist. In some cases, involving renames, the file will not
exist at all.
Example:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt
mkdir /mnt/A
touch /mnt/foo
ln /mnt/foo /mnt/A/bar
xfs_io -c fsync /mnt/foo
<power failure>
In this example after log replay only the hard link named 'foo' exists
and directory A does not exist, which is unexpected. In other major linux
filesystems, such as ext4, xfs and f2fs for example, both hard links exist
and so does directory A after mounting again the filesystem.
Checking if any new ancestors are new and need to be logged was added in
2009 by commit 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes"),
however only for the ancestors of the hard link (dentry) for which the
fsync was issued, instead of checking for all ancestors for all of the
inode's hard links.
So fix this by tracking the id of the last transaction where a hard link
was created for an inode and then on fsync fallback to a full transaction
commit when an inode has more than one hard link and at least one new hard
link was created in the current transaction. This is the simplest solution
since this is not a common use case (adding frequently hard links for
which there's an ancestor created in the current transaction and then
fsync the file). In case it ever becomes a common use case, a solution
that consists of iterating the fs/subvol btree for each hard link and
check if any ancestor is new, could be implemented.
This solves many unexpected scenarios reported by Jayashree Mohan and
Vijay Chidambaram, and for which there is a new test case for fstests
under review.
Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
CC: stable(a)vger.kernel.org # 4.4+
Reported-by: Vijay Chidambaram <vvijay03(a)gmail.com>
Reported-by: Jayashree Mohan <jayashree2912(a)gmail.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/btrfs_inode.h b/fs/btrfs/btrfs_inode.h
index fc25607304f2..6f5d07415dab 100644
--- a/fs/btrfs/btrfs_inode.h
+++ b/fs/btrfs/btrfs_inode.h
@@ -147,6 +147,12 @@ struct btrfs_inode {
*/
u64 last_unlink_trans;
+ /*
+ * Track the transaction id of the last transaction used to create a
+ * hard link for the inode. This is used by the log tree (fsync).
+ */
+ u64 last_link_trans;
+
/*
* Number of bytes outstanding that are going to need csums. This is
* used in ENOSPC accounting.
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index d54bdef16d8d..b4129d9072ec 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -3658,6 +3658,21 @@ static int btrfs_read_locked_inode(struct inode *inode,
* inode is not a directory, logging its parent unnecessarily.
*/
BTRFS_I(inode)->last_unlink_trans = BTRFS_I(inode)->last_trans;
+ /*
+ * Similar reasoning for last_link_trans, needs to be set otherwise
+ * for a case like the following:
+ *
+ * mkdir A
+ * touch foo
+ * ln foo A/bar
+ * echo 2 > /proc/sys/vm/drop_caches
+ * fsync foo
+ * <power failure>
+ *
+ * Would result in link bar and directory A not existing after the power
+ * failure.
+ */
+ BTRFS_I(inode)->last_link_trans = BTRFS_I(inode)->last_trans;
path->slots[0]++;
if (inode->i_nlink != 1 ||
@@ -6597,6 +6612,7 @@ static int btrfs_link(struct dentry *old_dentry, struct inode *dir,
if (err)
goto fail;
}
+ BTRFS_I(inode)->last_link_trans = trans->transid;
d_instantiate(dentry, inode);
ret = btrfs_log_new_name(trans, BTRFS_I(inode), NULL, parent,
true, NULL);
@@ -9123,6 +9139,7 @@ struct inode *btrfs_alloc_inode(struct super_block *sb)
ei->index_cnt = (u64)-1;
ei->dir_index = 0;
ei->last_unlink_trans = 0;
+ ei->last_link_trans = 0;
ei->last_log_commit = 0;
spin_lock_init(&ei->lock);
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 013d0abcd46b..5baad9bebc62 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -5758,6 +5758,22 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
goto end_trans;
}
+ /*
+ * If a new hard link was added to the inode in the current transaction
+ * and its link count is now greater than 1, we need to fallback to a
+ * transaction commit, otherwise we can end up not logging all its new
+ * parents for all the hard links. Here just from the dentry used to
+ * fsync, we can not visit the ancestor inodes for all the other hard
+ * links to figure out if any is new, so we fallback to a transaction
+ * commit (instead of adding a lot of complexity of scanning a btree,
+ * since this scenario is not a common use case).
+ */
+ if (inode->vfs_inode.i_nlink > 1 &&
+ inode->last_link_trans > last_committed) {
+ ret = -EMLINK;
+ goto end_trans;
+ }
+
while (1) {
if (!parent || d_really_is_negative(parent) || sb != parent->d_sb)
break;
The patch below does not apply to the 4.20-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9349e23907be1954ccdf6d771d640e2788da1643 Mon Sep 17 00:00:00 2001
From: "Dmitry V. Levin" <ldv(a)altlinux.org>
Date: Thu, 1 Nov 2018 14:03:08 +0300
Subject: [PATCH] uapi: fix linux/kfd_ioctl.h userspace compilation errors
Consistently use types provided by <linux/types.h> via <drm/drm.h>
to fix the following linux/kfd_ioctl.h userspace compilation errors:
/usr/include/linux/kfd_ioctl.h:250:2: error: unknown type name 'uint32_t'
uint32_t reset_type;
/usr/include/linux/kfd_ioctl.h:251:2: error: unknown type name 'uint32_t'
uint32_t reset_cause;
/usr/include/linux/kfd_ioctl.h:252:2: error: unknown type name 'uint32_t'
uint32_t memory_lost;
/usr/include/linux/kfd_ioctl.h:253:2: error: unknown type name 'uint32_t'
uint32_t gpu_id;
Fixes: 0c119abad7f0d ("drm/amd: Add kfd ioctl defines for hw_exception event")
Cc: <stable(a)vger.kernel.org> # v4.19
Signed-off-by: Dmitry V. Levin <ldv(a)altlinux.org>
Reviewed-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f5ff8a76e208..dae897f38e59 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -255,10 +255,10 @@ struct kfd_hsa_memory_exception_data {
/* hw exception data */
struct kfd_hsa_hw_exception_data {
- uint32_t reset_type;
- uint32_t reset_cause;
- uint32_t memory_lost;
- uint32_t gpu_id;
+ __u32 reset_type;
+ __u32 reset_cause;
+ __u32 memory_lost;
+ __u32 gpu_id;
};
/* Event data */
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9349e23907be1954ccdf6d771d640e2788da1643 Mon Sep 17 00:00:00 2001
From: "Dmitry V. Levin" <ldv(a)altlinux.org>
Date: Thu, 1 Nov 2018 14:03:08 +0300
Subject: [PATCH] uapi: fix linux/kfd_ioctl.h userspace compilation errors
Consistently use types provided by <linux/types.h> via <drm/drm.h>
to fix the following linux/kfd_ioctl.h userspace compilation errors:
/usr/include/linux/kfd_ioctl.h:250:2: error: unknown type name 'uint32_t'
uint32_t reset_type;
/usr/include/linux/kfd_ioctl.h:251:2: error: unknown type name 'uint32_t'
uint32_t reset_cause;
/usr/include/linux/kfd_ioctl.h:252:2: error: unknown type name 'uint32_t'
uint32_t memory_lost;
/usr/include/linux/kfd_ioctl.h:253:2: error: unknown type name 'uint32_t'
uint32_t gpu_id;
Fixes: 0c119abad7f0d ("drm/amd: Add kfd ioctl defines for hw_exception event")
Cc: <stable(a)vger.kernel.org> # v4.19
Signed-off-by: Dmitry V. Levin <ldv(a)altlinux.org>
Reviewed-by: Felix Kuehling <Felix.Kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f5ff8a76e208..dae897f38e59 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -255,10 +255,10 @@ struct kfd_hsa_memory_exception_data {
/* hw exception data */
struct kfd_hsa_hw_exception_data {
- uint32_t reset_type;
- uint32_t reset_cause;
- uint32_t memory_lost;
- uint32_t gpu_id;
+ __u32 reset_type;
+ __u32 reset_cause;
+ __u32 memory_lost;
+ __u32 gpu_id;
};
/* Event data */
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2cd4bd192ee94848695c1c052d87913260e10f36 Mon Sep 17 00:00:00 2001
From: Ram Pai <linuxram(a)us.ibm.com>
Date: Thu, 20 Dec 2018 12:03:30 -0800
Subject: [PATCH] powerpc/pkeys: Fix handling of pkey state across fork()
Protection key tracking information is not copied over to the
mm_struct of the child during fork(). This can cause the child to
erroneously allocate keys that were already allocated. Any allocated
execute-only key is lost aswell.
Add code; called by dup_mmap(), to copy the pkey state from parent to
child explicitly.
This problem was originally found by Dave Hansen on x86, which turns
out to be a problem on powerpc aswell.
Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Cc: stable(a)vger.kernel.org # v4.16+
Reviewed-by: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Signed-off-by: Ram Pai <linuxram(a)us.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index c05efd2e8736..e687ed31d85a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -217,12 +217,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
#endif
}
-static inline int arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
-{
- return 0;
-}
-
#ifdef CONFIG_PPC_BOOK3E_64
static inline void arch_exit_mmap(struct mm_struct *mm)
{
@@ -247,6 +241,7 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
#ifdef CONFIG_PPC_MEM_KEYS
bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
bool execute, bool foreign);
+void arch_dup_pkeys(struct mm_struct *oldmm, struct mm_struct *mm);
#else /* CONFIG_PPC_MEM_KEYS */
static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
bool write, bool execute, bool foreign)
@@ -259,6 +254,7 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
#define thread_pkey_regs_save(thread)
#define thread_pkey_regs_restore(new_thread, old_thread)
#define thread_pkey_regs_init(thread)
+#define arch_dup_pkeys(oldmm, mm)
static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
{
@@ -267,5 +263,12 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
#endif /* CONFIG_PPC_MEM_KEYS */
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+ struct mm_struct *mm)
+{
+ arch_dup_pkeys(oldmm, mm);
+ return 0;
+}
+
#endif /* __KERNEL__ */
#endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 04b60a8f6e69..587807763737 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -415,3 +415,13 @@ bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
return pkey_access_permitted(vma_pkey(vma), write, execute);
}
+
+void arch_dup_pkeys(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return;
+
+ /* Duplicate the oldmm pkey state in mm: */
+ mm_pkey_allocation_map(mm) = mm_pkey_allocation_map(oldmm);
+ mm->context.execute_only_pkey = oldmm->context.execute_only_pkey;
+}
The patch below does not apply to the 4.20-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 2cd4bd192ee94848695c1c052d87913260e10f36 Mon Sep 17 00:00:00 2001
From: Ram Pai <linuxram(a)us.ibm.com>
Date: Thu, 20 Dec 2018 12:03:30 -0800
Subject: [PATCH] powerpc/pkeys: Fix handling of pkey state across fork()
Protection key tracking information is not copied over to the
mm_struct of the child during fork(). This can cause the child to
erroneously allocate keys that were already allocated. Any allocated
execute-only key is lost aswell.
Add code; called by dup_mmap(), to copy the pkey state from parent to
child explicitly.
This problem was originally found by Dave Hansen on x86, which turns
out to be a problem on powerpc aswell.
Fixes: cf43d3b26452 ("powerpc: Enable pkey subsystem")
Cc: stable(a)vger.kernel.org # v4.16+
Reviewed-by: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Signed-off-by: Ram Pai <linuxram(a)us.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index c05efd2e8736..e687ed31d85a 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -217,12 +217,6 @@ static inline void enter_lazy_tlb(struct mm_struct *mm,
#endif
}
-static inline int arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
-{
- return 0;
-}
-
#ifdef CONFIG_PPC_BOOK3E_64
static inline void arch_exit_mmap(struct mm_struct *mm)
{
@@ -247,6 +241,7 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
#ifdef CONFIG_PPC_MEM_KEYS
bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
bool execute, bool foreign);
+void arch_dup_pkeys(struct mm_struct *oldmm, struct mm_struct *mm);
#else /* CONFIG_PPC_MEM_KEYS */
static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
bool write, bool execute, bool foreign)
@@ -259,6 +254,7 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
#define thread_pkey_regs_save(thread)
#define thread_pkey_regs_restore(new_thread, old_thread)
#define thread_pkey_regs_init(thread)
+#define arch_dup_pkeys(oldmm, mm)
static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
{
@@ -267,5 +263,12 @@ static inline u64 pte_to_hpte_pkey_bits(u64 pteflags)
#endif /* CONFIG_PPC_MEM_KEYS */
+static inline int arch_dup_mmap(struct mm_struct *oldmm,
+ struct mm_struct *mm)
+{
+ arch_dup_pkeys(oldmm, mm);
+ return 0;
+}
+
#endif /* __KERNEL__ */
#endif /* __ASM_POWERPC_MMU_CONTEXT_H */
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 04b60a8f6e69..587807763737 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -415,3 +415,13 @@ bool arch_vma_access_permitted(struct vm_area_struct *vma, bool write,
return pkey_access_permitted(vma_pkey(vma), write, execute);
}
+
+void arch_dup_pkeys(struct mm_struct *oldmm, struct mm_struct *mm)
+{
+ if (static_branch_likely(&pkey_disabled))
+ return;
+
+ /* Duplicate the oldmm pkey state in mm: */
+ mm_pkey_allocation_map(mm) = mm_pkey_allocation_map(oldmm);
+ mm->context.execute_only_pkey = oldmm->context.execute_only_pkey;
+}
The patch below does not apply to the 3.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6f5b9f018f4c7686fd944d920209d1382d320e4e Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Mon, 26 Nov 2018 18:12:00 -0200
Subject: [PATCH] powerpc/tm: Unset MSR[TS] if not recheckpointing
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
There is a TM Bad Thing bug that can be caused when you return from a
signal context in a suspended transaction but with ucontext MSR[TS] unset.
This forces regs->msr[TS] to be set at syscall entrance (since the CPU
state is transactional). It also calls treclaim() to flush the transaction
state, which is done based on the live (mfmsr) MSR state.
Since user context MSR[TS] is not set, then restore_tm_sigcontexts() is not
called, thus, not executing recheckpoint, keeping the CPU state as not
transactional. When calling rfid, SRR1 will have MSR[TS] set, but the CPU
state is non transactional, causing the TM Bad Thing with the following
stack:
[ 33.862316] Bad kernel stack pointer 3fffd9dce3e0 at c00000000000c47c
cpu 0x8: Vector: 700 (Program Check) at [c00000003ff7fd40]
pc: c00000000000c47c: fast_exception_return+0xac/0xb4
lr: 00003fff865f442c
sp: 3fffd9dce3e0
msr: 8000000102a03031
current = 0xc00000041f68b700
paca = 0xc00000000fb84800 softe: 0 irq_happened: 0x01
pid = 1721, comm = tm-signal-sigre
Linux version 4.9.0-3-powerpc64le (debian-kernel(a)lists.debian.org) (gcc version 6.3.0 20170516 (Debian 6.3.0-18) ) #1 SMP Debian 4.9.30-2+deb9u2 (2017-06-26)
WARNING: exception is not recoverable, can't continue
The same problem happens on 32-bits signal handler, and the fix is very
similar, if tm_recheckpoint() is not executed, then regs->msr[TS] should be
zeroed.
This patch also fixes a sparse warning related to lack of indentation when
CONFIG_PPC_TRANSACTIONAL_MEM is set.
Fixes: 2b0a576d15e0e ("powerpc: Add new transactional memory state to the signal context")
CC: Stable <stable(a)vger.kernel.org> # 3.10+
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Tested-by: Michal Suchánek <msuchanek(a)suse.de>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/kernel/signal_32.c b/arch/powerpc/kernel/signal_32.c
index 7484f43493d3..2d47cc79e5b3 100644
--- a/arch/powerpc/kernel/signal_32.c
+++ b/arch/powerpc/kernel/signal_32.c
@@ -1158,11 +1158,11 @@ SYSCALL_DEFINE0(rt_sigreturn)
{
struct rt_sigframe __user *rt_sf;
struct pt_regs *regs = current_pt_regs();
+ int tm_restore = 0;
#ifdef CONFIG_PPC_TRANSACTIONAL_MEM
struct ucontext __user *uc_transact;
unsigned long msr_hi;
unsigned long tmp;
- int tm_restore = 0;
#endif
/* Always make any pending restarted system calls return -EINTR */
current->restart_block.fn = do_no_restart_syscall;
@@ -1210,11 +1210,19 @@ SYSCALL_DEFINE0(rt_sigreturn)
goto bad;
}
}
- if (!tm_restore)
- /* Fall through, for non-TM restore */
+ if (!tm_restore) {
+ /*
+ * Unset regs->msr because ucontext MSR TS is not
+ * set, and recheckpoint was not called. This avoid
+ * hitting a TM Bad thing at RFID
+ */
+ regs->msr &= ~MSR_TS_MASK;
+ }
+ /* Fall through, for non-TM restore */
#endif
- if (do_setcontext(&rt_sf->uc, regs, 1))
- goto bad;
+ if (!tm_restore)
+ if (do_setcontext(&rt_sf->uc, regs, 1))
+ goto bad;
/*
* It's not clear whether or why it is desirable to save the
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index ba093ec5a21f..0935fe6c282a 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -757,11 +757,23 @@ SYSCALL_DEFINE0(rt_sigreturn)
&uc_transact->uc_mcontext))
goto badframe;
}
- else
- /* Fall through, for non-TM restore */
#endif
- if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
- goto badframe;
+ /* Fall through, for non-TM restore */
+ if (!MSR_TM_ACTIVE(msr)) {
+ /*
+ * Unset MSR[TS] on the thread regs since MSR from user
+ * context does not have MSR active, and recheckpoint was
+ * not called since restore_tm_sigcontexts() was not called
+ * also.
+ *
+ * If not unsetting it, the code can RFID to userspace with
+ * MSR[TS] set, but without CPU in the proper state,
+ * causing a TM bad thing.
+ */
+ current->thread.regs->msr &= ~MSR_TS_MASK;
+ if (restore_sigcontext(current, NULL, 1, &uc->uc_mcontext))
+ goto badframe;
+ }
if (restore_altstack(&uc->uc_stack))
goto badframe;