Hi x86 maintainers,
This is an important fix that I believe needs to be merged for 4.21. Without it, applications calling fork() can potentially double-allocate a protection key, causing lots of strange problems.
Thomas's Reviewed-by is on the actual fix, but not the selftest.
I would also be happy to send this as a pull request if you would prefer.
Cc: x86@kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org
From: Dave Hansen <dave.hansen@linux.intel.com>
Memory protection key behavior should be the same in a child as it was in the parent before a fork. But, there is a bug that resets the state in the child at fork instead of preserving it.
Our creation of new mm's is a bit convoluted. At fork(), the code does:
 1. memcpy() the parent mm to initialize child
 2. mm_init() to initialize some select stuff
 3. dup_mmap() to create true copies that memcpy() did not do right.
For pkeys, we need to preserve two pieces of state across a fork: 'execute_only_pkey' and 'pkey_allocation_map'. Those are preserved by the memcpy(), which I thought did the right thing. But mm_init() calls init_new_context(), which I thought was *only* for execve()-time, and which overwrites 'execute_only_pkey' and 'pkey_allocation_map' with "new" values. Alas, init_new_context() is used at both execve() and fork().
The result is that, after a fork(), the child's pkey state ends up looking like it does after an execve(), which is totally wrong. pkeys that are already allocated can be allocated again, for instance.
To fix this, add code called by dup_mmap() to copy the pkey state from parent to child explicitly. Also add a comment above init_new_context() to make it more clear to the next poor sod what this code is used for.
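The sequence above can be sketched as a small userspace model. This is only an illustration, not kernel code: 'struct toy_mm' and every toy_* function are hypothetical stand-ins, and the reset values used here (only key 0 allocated, execute-only key of -1) are assumptions of the model.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NR_PKEYS 16

/* Hypothetical stand-in for the pkey fields of the per-mm context. */
struct toy_mm {
	uint16_t pkey_allocation_map;	/* bit n set => pkey n is allocated */
	int execute_only_pkey;		/* -1 => none chosen yet */
};

/* Models init_new_context(): resets pkey state to "fresh" values. */
static void toy_init_new_context(struct toy_mm *mm)
{
	mm->pkey_allocation_map = 0x1;	/* assumption: only default key 0 */
	mm->execute_only_pkey = -1;
}

/* Models arch_dup_pkeys(): copy the parent's pkey state to the child. */
static void toy_dup_pkeys(const struct toy_mm *oldmm, struct toy_mm *mm)
{
	mm->pkey_allocation_map = oldmm->pkey_allocation_map;
	mm->execute_only_pkey = oldmm->execute_only_pkey;
}

/* Allocate the lowest free key, or return -1 if all are in use. */
static int toy_pkey_alloc(struct toy_mm *mm)
{
	for (int key = 0; key < TOY_NR_PKEYS; key++) {
		if (!(mm->pkey_allocation_map & (1u << key))) {
			mm->pkey_allocation_map |= (uint16_t)(1u << key);
			return key;
		}
	}
	return -1;
}

/* Models the three fork() steps; 'with_fix' enables the copy in step 3. */
static void toy_fork(const struct toy_mm *parent, struct toy_mm *child,
		     int with_fix)
{
	memcpy(child, parent, sizeof(*child));	/* 1. memcpy() the parent mm */
	toy_init_new_context(child);		/* 2. mm_init() clobbers pkeys */
	if (with_fix)
		toy_dup_pkeys(parent, child);	/* 3. dup_mmap() restores them */
}

/* Returns 1 if the child's first allocation repeats a parent's key. */
static int toy_child_collides(int with_fix)
{
	struct toy_mm parent, child;
	int parent_key, child_key;

	toy_init_new_context(&parent);		/* parent starts fresh */
	parent_key = toy_pkey_alloc(&parent);	/* parent takes a key */
	toy_fork(&parent, &child, with_fix);
	child_key = toy_pkey_alloc(&child);
	return child_key == parent_key;
}
```

In this model, toy_child_collides(0) reports a collision because the child hands out the key the parent already holds, while toy_child_collides(1) does not.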
Fixes: e8c24d3a23a ("x86/pkeys: Allocation/free syscalls")
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org
---
 b/arch/x86/include/asm/mmu_context.h |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff -puN arch/x86/include/asm/mmu_context.h~x86-pkeys-no-init-at-fork arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~x86-pkeys-no-init-at-fork	2019-01-02 13:53:53.217951966 -0800
+++ b/arch/x86/include/asm/mmu_context.h	2019-01-02 13:53:53.221951966 -0800
@@ -178,6 +178,10 @@ static inline void switch_ldt(struct mm_
 
 void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk);
 
+/*
+ * Init a new mm. Used on mm copies, like at fork()
+ * and on mm's that are brand-new, like at execve().
+ */
 static inline int init_new_context(struct task_struct *tsk,
 				   struct mm_struct *mm)
 {
@@ -228,8 +232,22 @@ do {						\
 } while (0)
 #endif
 
+static inline void arch_dup_pkeys(struct mm_struct *oldmm,
+				  struct mm_struct *mm)
+{
+#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
+	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
+		return;
+
+	/* Duplicate the oldmm pkey state in mm: */
+	mm->context.pkey_allocation_map = oldmm->context.pkey_allocation_map;
+	mm->context.execute_only_pkey = oldmm->context.execute_only_pkey;
+#endif
+}
+
 static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm)
 {
+	arch_dup_pkeys(oldmm, mm);
 	paravirt_arch_dup_mmap(oldmm, mm);
 	return ldt_dup_context(oldmm, mm);
 }
_
From: Dave Hansen <dave.hansen@linux.intel.com>
There was a bug where the per-mm pkey state was not being preserved across fork() in the child. fork() is performed in the pkey selftests, but all of our pkey activity is performed in the parent. The child does not perform any actions sensitive to pkey state.
To make the test more sensitive to these kinds of bugs, add a fork() where we let the parent exit, and continue execution in the child.
This patch removes an early 'break;' on the first allocation failure, making the test sensitive to mis-allowed allocations after fork(). However, this means that the loop always runs to completion and we must remove the test ensuring the loop never completes.
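Why the early 'break;' hid the bug can be sketched with a toy version of the exhaust loop. This is a simplified model and not the selftest itself: toy_alloc(), toy_exhaust() and the global toy_map are hypothetical, the model starts with only key 0 allocated, and a buggy fork() is simulated by resetting the map at the fork point.

```c
#include <stdint.h>

#define TOY_NR_PKEYS 16

/* Hypothetical allocator state: bit n set => key n is allocated. */
static uint16_t toy_map;

/* Allocate the lowest free key; -1 models the ENOSPC failure. */
static int toy_alloc(void)
{
	for (int key = 0; key < TOY_NR_PKEYS; key++) {
		if (!(toy_map & (1u << key))) {
			toy_map |= (uint16_t)(1u << key);
			return key;
		}
	}
	return -1;
}

/*
 * Run the exhaust loop and return the number of successful allocations.
 * 'early_break' models the old loop that stopped at the first failure;
 * 'buggy_fork' models a kernel that resets the map at the fork point.
 */
static int toy_exhaust(int early_break, int buggy_fork)
{
	int successes = 0;

	toy_map = 0x1;	/* assumption: only the default key 0 is taken */
	for (int i = 0; i < TOY_NR_PKEYS * 3; i++) {
		if (toy_alloc() >= 0)
			successes++;
		else if (early_break)
			break;

		/* fork point: a buggy kernel forgets what was allocated */
		if (i == TOY_NR_PKEYS * 2 && buggy_fork)
			toy_map = 0x1;
	}
	return successes;
}
```

With the early break, the loop stops long before the fork point, so the success count is the same with or without the simulated bug. Without the break, the buggy case yields more successes than there are keys, which is the condition the pkey_assert() inside the loop is meant to catch.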
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: stable@vger.kernel.org
---
 b/tools/testing/selftests/x86/protection_keys.c |   41 ++++++++++++++++++------
 1 file changed, 31 insertions(+), 10 deletions(-)
diff -puN tools/testing/selftests/x86/protection_keys.c~x86-pkeys-no-init-at-fork-selftests tools/testing/selftests/x86/protection_keys.c
--- a/tools/testing/selftests/x86/protection_keys.c~x86-pkeys-no-init-at-fork-selftests	2019-01-02 13:53:53.721951964 -0800
+++ b/tools/testing/selftests/x86/protection_keys.c	2019-01-02 13:53:53.724951964 -0800
@@ -1133,6 +1133,21 @@ void test_pkey_syscalls_bad_args(int *pt
 	pkey_assert(err);
 }
 
+void become_child(void)
+{
+	pid_t forkret;
+
+	forkret = fork();
+	pkey_assert(forkret >= 0);
+	dprintf3("[%d] fork() ret: %d\n", getpid(), forkret);
+
+	if (!forkret) {
+		/* in the child */
+		return;
+	}
+	exit(0);
+}
+
 /* Assumes that all pkeys other than 'pkey' are unallocated */
 void test_pkey_alloc_exhaust(int *ptr, u16 pkey)
 {
@@ -1141,7 +1156,7 @@ void test_pkey_alloc_exhaust(int *ptr, u
 	int nr_allocated_pkeys = 0;
 	int i;
 
-	for (i = 0; i < NR_PKEYS*2; i++) {
+	for (i = 0; i < NR_PKEYS*3; i++) {
 		int new_pkey;
 		dprintf1("%s() alloc loop: %d\n", __func__, i);
 		new_pkey = alloc_pkey();
@@ -1152,21 +1167,27 @@ void test_pkey_alloc_exhaust(int *ptr, u
 		if ((new_pkey == -1) && (errno == ENOSPC)) {
 			dprintf2("%s() failed to allocate pkey after %d tries\n",
 				__func__, nr_allocated_pkeys);
-			break;
+		} else {
+			/*
+			 * Ensure the number of successes never
+			 * exceeds the number of keys supported
+			 * in the hardware.
+			 */
+			pkey_assert(nr_allocated_pkeys < NR_PKEYS);
+			allocated_pkeys[nr_allocated_pkeys++] = new_pkey;
 		}
-		pkey_assert(nr_allocated_pkeys < NR_PKEYS);
-		allocated_pkeys[nr_allocated_pkeys++] = new_pkey;
+
+		/*
+		 * Make sure that allocation state is properly
+		 * preserved across fork().
+		 */
+		if (i == NR_PKEYS*2)
+			become_child();
 	}
 
 	dprintf3("%s()::%d\n", __func__, __LINE__);
 
 	/*
-	 * ensure it did not reach the end of the loop without
-	 * failure:
-	 */
-	pkey_assert(i < NR_PKEYS*2);
-
-	/*
 	 * There are 16 pkeys supported in hardware. Three are
 	 * allocated by the time we get here:
 	 * 1. The default key (0)
_