From: "Mike Rapoport (Microsoft)" rppt@kernel.org
Hi,
Following Peter's comments [1] these patches rework handling of ROX caches for module text allocations.
Instead of using a writable copy that really complicates alternatives patching, temporarily remap parts of a large ROX page as RW for the time of module formation and then restore it's ROX protections when the module is ready.
To keep the ROX memory mapped with large pages, make set_memory code capable of restoring large pages (more details are in patch 3).
The patches also available in git https://git.kernel.org/rppt/h/execmem/x86-rox/v8
[1] https://lore.kernel.org/all/20241209083818.GK8562@noisy.programming.kicks-as...
Kirill A. Shutemov (1): x86/mm/pat: Restore large pages after fragmentation
Mike Rapoport (Microsoft) (7): x86/mm/pat: cpa-test: fix length for CPA_ARRAY test x86/mm/pat: drop duplicate variable in cpa_flush() execmem: add API for temporal remapping as RW and restoring ROX afterwards module: introduce MODULE_STATE_GONE modules: switch to execmem API for remapping as RW and restoring ROX Revert "x86/module: prepare module loading for ROX allocations of text" module: drop unused module_writable_address()
arch/um/kernel/um_arch.c | 11 +- arch/x86/entry/vdso/vma.c | 3 +- arch/x86/include/asm/alternative.h | 14 +- arch/x86/include/asm/pgtable_types.h | 2 + arch/x86/kernel/alternative.c | 181 ++++++--------- arch/x86/kernel/ftrace.c | 30 ++- arch/x86/kernel/module.c | 45 ++-- arch/x86/mm/pat/cpa-test.c | 2 +- arch/x86/mm/pat/set_memory.c | 216 +++++++++++++++++- include/linux/execmem.h | 31 +++ include/linux/module.h | 21 +- include/linux/moduleloader.h | 4 - include/linux/vm_event_item.h | 2 + kernel/module/kallsyms.c | 8 +- kernel/module/kdb.c | 2 +- kernel/module/main.c | 86 ++----- kernel/module/procfs.c | 2 +- kernel/module/strict_rwx.c | 9 +- kernel/tracepoint.c | 2 + lib/kunit/test.c | 2 + mm/execmem.c | 118 ++++++++-- mm/vmstat.c | 2 + samples/livepatch/livepatch-callbacks-demo.c | 1 + .../test_modules/test_klp_callbacks_demo.c | 1 + .../test_modules/test_klp_callbacks_demo2.c | 1 + .../livepatch/test_modules/test_klp_state.c | 1 + .../livepatch/test_modules/test_klp_state2.c | 1 + 27 files changed, 511 insertions(+), 287 deletions(-)
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
The CPA_ARRAY test always uses len[1] as numpages argument to change_page_attr_set() although the addresses array is different each iteration of the test loop.
Replace len[1] with len[i] to have numpages matching the addresses array.
Fixes: ecc729f1f471 ("x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests") Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- arch/x86/mm/pat/cpa-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/mm/pat/cpa-test.c b/arch/x86/mm/pat/cpa-test.c index 3d2f7f0a6ed1..ad3c1feec990 100644 --- a/arch/x86/mm/pat/cpa-test.c +++ b/arch/x86/mm/pat/cpa-test.c @@ -183,7 +183,7 @@ static int pageattr_test(void) break;
case 1: - err = change_page_attr_set(addrs, len[1], PAGE_CPA_TEST, 1); + err = change_page_attr_set(addrs, len[i], PAGE_CPA_TEST, 1); break;
case 2:
On Fri, Dec 27, 2024 at 09:28:18AM +0200, Mike Rapoport wrote:
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
The CPA_ARRAY test always uses len[1] as numpages argument to change_page_attr_set() although the addresses array is different each iteration of the test loop.
Replace len[1] with len[i] to have numpages matching the addresses array.
D'oh..
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
There is a 'struct cpa_data *data' parameter in cpa_flush() that is assigned to a local 'struct cpa_data *cpa' variable.
Rename the parameter from 'data' to 'cpa' and drop declaration of the local 'cpa' variable.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- arch/x86/mm/pat/set_memory.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index 95bc50a8541c..d43b919b11ae 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -396,9 +396,8 @@ static void __cpa_flush_tlb(void *data) flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i))); }
-static void cpa_flush(struct cpa_data *data, int cache) +static void cpa_flush(struct cpa_data *cpa, int cache) { - struct cpa_data *cpa = data; unsigned int i;
BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
From: "Kirill A. Shutemov" kirill.shutemov@linux.intel.com
Change of attributes of the pages may lead to fragmentation of direct mapping over time and performance degradation as result.
With current code it's one way road: kernel tries to avoid splitting large pages, but it doesn't restore them back even if page attributes got compatible again.
Any change to the mapping may potentially allow to restore large page.
Hook up into cpa_flush() path to check if there's any pages to be recovered in PUD_SIZE range around pages we've just touched.
CPUs don't like[1] to have to have TLB entries of different size for the same memory, but looks like it's okay as long as these entries have matching attributes[2]. Therefore it's critical to flush TLB before any following changes to the mapping.
Note that we already allow for multiple TLB entries of different sizes for the same memory now in split_large_page() path. It's not a new situation.
set_memory_4k() provides a way to use 4k pages on purpose. Kernel must not remap such pages as large. Re-use one of software PTE bits to indicate such pages.
[1] See Erratum 383 of AMD Family 10h Processors [2] https://lore.kernel.org/linux-mm/1da1b025-cabc-6f04-bde5-e50830d1ecf0@amd.co...
[rppt@kernel.org: * s/restore/collapse/ * update formatting per peterz * use 'struct ptdesc' instead of 'struct page' for list of page tables to be freed * try to collapse PMD first and if it succeeds move on to PUD as peterz suggested * flush TLB twice: for changes done in the original CPA call and after collapsing of large pages ]
Link: https://lore.kernel.org/all/20200416213229.19174-1-kirill.shutemov@linux.int... Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Co-developed-by: Mike Rapoport (Microsoft) rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- arch/x86/include/asm/pgtable_types.h | 2 + arch/x86/mm/pat/set_memory.c | 213 ++++++++++++++++++++++++++- include/linux/vm_event_item.h | 2 + mm/vmstat.c | 2 + 4 files changed, 216 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 4b804531b03c..c90e9c51edb7 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -33,6 +33,7 @@ #define _PAGE_BIT_CPA_TEST _PAGE_BIT_SOFTW1 #define _PAGE_BIT_UFFD_WP _PAGE_BIT_SOFTW2 /* userfaultfd wrprotected */ #define _PAGE_BIT_SOFT_DIRTY _PAGE_BIT_SOFTW3 /* software dirty tracking */ +#define _PAGE_BIT_KERNEL_4K _PAGE_BIT_SOFTW3 /* page must not be converted to large */ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4
#ifdef CONFIG_X86_64 @@ -64,6 +65,7 @@ #define _PAGE_PAT_LARGE (_AT(pteval_t, 1) << _PAGE_BIT_PAT_LARGE) #define _PAGE_SPECIAL (_AT(pteval_t, 1) << _PAGE_BIT_SPECIAL) #define _PAGE_CPA_TEST (_AT(pteval_t, 1) << _PAGE_BIT_CPA_TEST) +#define _PAGE_KERNEL_4K (_AT(pteval_t, 1) << _PAGE_BIT_KERNEL_4K) #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS #define _PAGE_PKEY_BIT0 (_AT(pteval_t, 1) << _PAGE_BIT_PKEY_BIT0) #define _PAGE_PKEY_BIT1 (_AT(pteval_t, 1) << _PAGE_BIT_PKEY_BIT1) diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index d43b919b11ae..89cd8176c99f 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -107,6 +107,18 @@ static void split_page_count(int level) direct_pages_count[level - 1] += PTRS_PER_PTE; }
+static void collapse_page_count(int level) +{ + direct_pages_count[level]++; + if (system_state == SYSTEM_RUNNING) { + if (level == PG_LEVEL_2M) + count_vm_event(DIRECT_MAP_LEVEL2_COLLAPSE); + else if (level == PG_LEVEL_1G) + count_vm_event(DIRECT_MAP_LEVEL3_COLLAPSE); + } + direct_pages_count[level - 1] -= PTRS_PER_PTE; +} + void arch_report_meminfo(struct seq_file *m) { seq_printf(m, "DirectMap4k: %8lu kB\n", @@ -124,6 +136,7 @@ void arch_report_meminfo(struct seq_file *m) } #else static inline void split_page_count(int level) { } +static inline void collapse_page_count(int level) { } #endif
#ifdef CONFIG_X86_CPA_STATISTICS @@ -396,15 +409,50 @@ static void __cpa_flush_tlb(void *data) flush_tlb_one_kernel(fix_addr(__cpa_addr(cpa, i))); }
+static int collapse_large_pages(unsigned long addr, struct list_head *pgtables); + +static void cpa_collapse_large_pages(struct cpa_data *cpa) +{ + unsigned long start, addr, end; + struct ptdesc *ptdesc, *tmp; + LIST_HEAD(pgtables); + int collapsed = 0; + int i; + + if (cpa->flags & (CPA_PAGES_ARRAY | CPA_ARRAY)) { + for (i = 0; i < cpa->numpages; i++) + collapsed += collapse_large_pages(__cpa_addr(cpa, i), + &pgtables); + } else { + addr = __cpa_addr(cpa, 0); + start = addr & PMD_MASK; + end = addr + PAGE_SIZE * cpa->numpages; + + for (addr = start; within(addr, start, end); addr += PMD_SIZE) + collapsed += collapse_large_pages(addr, &pgtables); + } + + if (!collapsed) + return; + + flush_tlb_all(); + + list_for_each_entry_safe(ptdesc, tmp, &pgtables, pt_list) { + list_del(&ptdesc->pt_list); + __free_page(ptdesc_page(ptdesc)); + } +} + static void cpa_flush(struct cpa_data *cpa, int cache) { + LIST_HEAD(pgtables); unsigned int i;
BUG_ON(irqs_disabled() && !early_boot_irqs_disabled);
if (cache && !static_cpu_has(X86_FEATURE_CLFLUSH)) { cpa_flush_all(cache); - return; + goto collapse_large_pages; }
if (cpa->force_flush_all || cpa->numpages > tlb_single_page_flush_ceiling) @@ -413,7 +461,7 @@ static void cpa_flush(struct cpa_data *cpa, int cache) on_each_cpu(__cpa_flush_tlb, cpa, 1);
if (!cache) - return; + goto collapse_large_pages;
mb(); for (i = 0; i < cpa->numpages; i++) { @@ -429,6 +477,9 @@ static void cpa_flush(struct cpa_data *cpa, int cache) clflush_cache_range_opt((void *)fix_addr(addr), PAGE_SIZE); } mb(); + +collapse_large_pages: + cpa_collapse_large_pages(cpa); }
static bool overlaps(unsigned long r1_start, unsigned long r1_end, @@ -1198,6 +1249,161 @@ static int split_large_page(struct cpa_data *cpa, pte_t *kpte, return 0; }
+static int collapse_pmd_page(pmd_t *pmd, unsigned long addr, + struct list_head *pgtables) +{ + pmd_t _pmd, old_pmd; + pte_t *pte, first; + unsigned long pfn; + pgprot_t pgprot; + int i = 0; + + addr &= PMD_MASK; + pte = pte_offset_kernel(pmd, addr); + first = *pte; + pfn = pte_pfn(first); + + /* Make sure alignment is suitable */ + if (PFN_PHYS(pfn) & ~PMD_MASK) + return 0; + + /* The page is 4k intentionally */ + if (pte_flags(first) & _PAGE_KERNEL_4K) + return 0; + + /* Check that the rest of PTEs are compatible with the first one */ + for (i = 1, pte++; i < PTRS_PER_PTE; i++, pte++) { + pte_t entry = *pte; + + if (!pte_present(entry)) + return 0; + if (pte_flags(entry) != pte_flags(first)) + return 0; + if (pte_pfn(entry) != pte_pfn(first) + i) + return 0; + } + + old_pmd = *pmd; + + /* Success: set up a large page */ + pgprot = pgprot_4k_2_large(pte_pgprot(first)); + pgprot_val(pgprot) |= _PAGE_PSE; + _pmd = pfn_pmd(pfn, pgprot); + set_pmd(pmd, _pmd); + + /* Queue the page table to be freed after TLB flush */ + list_add(&page_ptdesc(pmd_page(old_pmd))->pt_list, pgtables); + + if (IS_ENABLED(CONFIG_X86_32) && !SHARED_KERNEL_PMD) { + struct page *page; + + /* Update all PGD tables to use the same large page */ + list_for_each_entry(page, &pgd_list, lru) { + pgd_t *pgd = (pgd_t *)page_address(page) + pgd_index(addr); + p4d_t *p4d = p4d_offset(pgd, addr); + pud_t *pud = pud_offset(p4d, addr); + pmd_t *pmd = pmd_offset(pud, addr); + /* Something is wrong if entries doesn't match */ + if (WARN_ON(pmd_val(old_pmd) != pmd_val(*pmd))) + continue; + set_pmd(pmd, _pmd); + } + } + + if (virt_addr_valid(addr) && pfn_range_is_mapped(pfn, pfn + 1)) + collapse_page_count(PG_LEVEL_2M); + + return 1; +} + +static int collapse_pud_page(pud_t *pud, unsigned long addr, + struct list_head *pgtables) +{ + unsigned long pfn; + pmd_t *pmd, first; + int i; + + if (!direct_gbpages) + return 0; + + addr &= PUD_MASK; + pmd = pmd_offset(pud, addr); + first = *pmd; + + /* + * To restore PUD page all PMD entries must be large and + * have suitable alignment + */ + pfn = pmd_pfn(first); + if (!pmd_leaf(first) || (PFN_PHYS(pfn) & ~PUD_MASK)) + return 0; + + /* + * To restore PUD page, all following PMDs must be compatible with the + * first one. + */ + for (i = 1, pmd++; i < PTRS_PER_PMD; i++, pmd++) { + pmd_t entry = *pmd; + + if (!pmd_present(entry) || !pmd_leaf(entry)) + return 0; + if (pmd_flags(entry) != pmd_flags(first)) + return 0; + if (pmd_pfn(entry) != pmd_pfn(first) + i * PTRS_PER_PTE) + return 0; + } + + /* Restore PUD page and queue page table to be freed after TLB flush */ + list_add(&page_ptdesc(pud_page(*pud))->pt_list, pgtables); + set_pud(pud, pfn_pud(pfn, pmd_pgprot(first))); + + if (virt_addr_valid(addr) && pfn_range_is_mapped(pfn, pfn + 1)) + collapse_page_count(PG_LEVEL_1G); + + return 1; +} + +/* + * Collapse PMD and PUD pages in the kernel mapping around the address where + * possible. + * + * Caller must flush TLB and free page tables queued on the list before + * touching the new entries. CPU must not see TLB entries of different size + * with different attributes. + */ +static int collapse_large_pages(unsigned long addr, struct list_head *pgtables) +{ + int collapsed = 0; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + addr &= PMD_MASK; + + spin_lock(&pgd_lock); + pgd = pgd_offset_k(addr); + if (pgd_none(*pgd)) + goto out; + p4d = p4d_offset(pgd, addr); + if (p4d_none(*p4d)) + goto out; + pud = pud_offset(p4d, addr); + if (!pud_present(*pud) || pud_leaf(*pud)) + goto out; + pmd = pmd_offset(pud, addr); + if (!pmd_present(*pmd) || pmd_leaf(*pmd)) + goto out; + + collapsed = collapse_pmd_page(pmd, addr, pgtables); + if (collapsed) + collapsed += collapse_pud_page(pud, addr, pgtables); + +out: + spin_unlock(&pgd_lock); + return collapsed; +} + static bool try_to_free_pte_page(pte_t *pte) { int i; @@ -2148,7 +2354,8 @@ int set_memory_p(unsigned long addr, int numpages)
int set_memory_4k(unsigned long addr, int numpages) { - return change_page_attr_set_clr(&addr, numpages, __pgprot(0), + return change_page_attr_set_clr(&addr, numpages, + __pgprot(_PAGE_KERNEL_4K), __pgprot(0), 1, 0, NULL); }
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index f70d0958095c..5a37cb2b6f93 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -151,6 +151,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, DIRECT_MAP_LEVEL3_SPLIT, + DIRECT_MAP_LEVEL2_COLLAPSE, + DIRECT_MAP_LEVEL3_COLLAPSE, #endif #ifdef CONFIG_PER_VMA_LOCK_STATS VMA_LOCK_SUCCESS, diff --git a/mm/vmstat.c b/mm/vmstat.c index 4d016314a56c..d5e1d7f34f3c 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1435,6 +1435,8 @@ const char * const vmstat_text[] = { #ifdef CONFIG_X86 "direct_map_level2_splits", "direct_map_level3_splits", + "direct_map_level2_collapses", + "direct_map_level3_collapses", #endif #ifdef CONFIG_PER_VMA_LOCK_STATS "vma_lock_success",
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
Using a writable copy for ROX memory is cumbersome and error prone.
Add API that allow temporarily remapping of ranges in the ROX cache as writable and then restoring their read-only-execute permissions.
This API will be later used in modules code and will allow removing nasty games with writable copy in alternatives patching on x86.
The restoring of the ROX permissions relies on the ability of architecture to reconstruct large pages in its set_memory_rox() method.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- include/linux/execmem.h | 31 +++++++++++ mm/execmem.c | 118 +++++++++++++++++++++++++++++++++------- 2 files changed, 130 insertions(+), 19 deletions(-)
diff --git a/include/linux/execmem.h b/include/linux/execmem.h index 64130ae19690..65655a5d1be2 100644 --- a/include/linux/execmem.h +++ b/include/linux/execmem.h @@ -65,6 +65,37 @@ enum execmem_range_flags { * Architectures that use EXECMEM_ROX_CACHE must implement this. */ void execmem_fill_trapping_insns(void *ptr, size_t size, bool writable); + +/** + * execmem_make_temp_rw - temporarily remap region with read-write + * permissions + * @ptr: address of the region to remap + * @size: size of the region to remap + * + * Remaps a part of the cached large page in the ROX cache in the range + * [@ptr, @ptr + @size) as writable and not executable. The caller must + * have exclusive ownership of this range and ensure nothing will try to + * execute code in this range. + * + * Return: 0 on success or negative error code on failure. + */ +int execmem_make_temp_rw(void *ptr, size_t size); + +/** + * execmem_restore_rox - restore read-only-execute permissions + * @ptr: address of the region to remap + * @size: size of the region to remap + * + * Restores read-only-execute permissions on a range [@ptr, @ptr + @size) + * after it was temporarily remapped as writable. Relies on architecture + * implementation of set_memory_rox() to restore mapping using large pages. + * + * Return: 0 on success or negative error code on failure. + */ +int execmem_restore_rox(void *ptr, size_t size); +#else +static inline int execmem_make_temp_rw(void *ptr, size_t size) { return 0; } +static inline int execmem_restore_rox(void *ptr, size_t size) { return 0; } #endif
/** diff --git a/mm/execmem.c b/mm/execmem.c index 317b6a8d35be..be6b234c032e 100644 --- a/mm/execmem.c +++ b/mm/execmem.c @@ -89,6 +89,12 @@ static void *execmem_vmalloc(struct execmem_range *range, size_t size, #endif /* CONFIG_MMU */
#ifdef CONFIG_ARCH_HAS_EXECMEM_ROX +struct execmem_area { + struct vm_struct *vm; + unsigned int rw_mappings; + size_t size; +}; + struct execmem_cache { struct mutex mutex; struct maple_tree busy_areas; @@ -135,7 +141,7 @@ static void execmem_cache_clean(struct work_struct *work) struct maple_tree *free_areas = &execmem_cache.free_areas; struct mutex *mutex = &execmem_cache.mutex; MA_STATE(mas, free_areas, 0, ULONG_MAX); - void *area; + struct execmem_area *area;
mutex_lock(mutex); mas_for_each(&mas, area, ULONG_MAX) { @@ -143,11 +149,12 @@ static void execmem_cache_clean(struct work_struct *work)
if (IS_ALIGNED(size, PMD_SIZE) && IS_ALIGNED(mas.index, PMD_SIZE)) { - struct vm_struct *vm = find_vm_area(area); + struct vm_struct *vm = area->vm;
execmem_set_direct_map_valid(vm, true); mas_store_gfp(&mas, NULL, GFP_KERNEL); - vfree(area); + vfree(vm->addr); + kfree(area); } } mutex_unlock(mutex); @@ -155,30 +162,31 @@ static void execmem_cache_clean(struct work_struct *work)
static DECLARE_WORK(execmem_cache_clean_work, execmem_cache_clean);
-static int execmem_cache_add(void *ptr, size_t size) +static int execmem_cache_add(void *ptr, size_t size, struct execmem_area *area) { struct maple_tree *free_areas = &execmem_cache.free_areas; struct mutex *mutex = &execmem_cache.mutex; unsigned long addr = (unsigned long)ptr; MA_STATE(mas, free_areas, addr - 1, addr + 1); + struct execmem_area *lower_area = NULL; + struct execmem_area *upper_area = NULL; unsigned long lower, upper; - void *area = NULL; int err;
lower = addr; upper = addr + size - 1;
mutex_lock(mutex); - area = mas_walk(&mas); - if (area && mas.last == addr - 1) + lower_area = mas_walk(&mas); + if (lower_area && lower_area == area && mas.last == addr - 1) lower = mas.index;
- area = mas_next(&mas, ULONG_MAX); - if (area && mas.index == addr + size) + upper_area = mas_next(&mas, ULONG_MAX); + if (upper_area && upper_area == area && mas.index == addr + size) upper = mas.last;
mas_set_range(&mas, lower, upper); - err = mas_store_gfp(&mas, (void *)lower, GFP_KERNEL); + err = mas_store_gfp(&mas, area, GFP_KERNEL); mutex_unlock(mutex); if (err) return err; @@ -209,7 +217,8 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size) MA_STATE(mas_busy, busy_areas, 0, ULONG_MAX); struct mutex *mutex = &execmem_cache.mutex; unsigned long addr, last, area_size = 0; - void *area, *ptr = NULL; + struct execmem_area *area; + void *ptr = NULL; int err;
mutex_lock(mutex); @@ -228,20 +237,18 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size)
/* insert allocated size to busy_areas at range [addr, addr + size) */ mas_set_range(&mas_busy, addr, addr + size - 1); - err = mas_store_gfp(&mas_busy, (void *)addr, GFP_KERNEL); + err = mas_store_gfp(&mas_busy, area, GFP_KERNEL); if (err) goto out_unlock;
mas_store_gfp(&mas_free, NULL, GFP_KERNEL); if (area_size > size) { - void *ptr = (void *)(addr + size); - /* * re-insert remaining free size to free_areas at range * [addr + size, last] */ mas_set_range(&mas_free, addr + size, last); - err = mas_store_gfp(&mas_free, ptr, GFP_KERNEL); + err = mas_store_gfp(&mas_free, area, GFP_KERNEL); if (err) { mas_store_gfp(&mas_busy, NULL, GFP_KERNEL); goto out_unlock; @@ -257,16 +264,21 @@ static void *__execmem_cache_alloc(struct execmem_range *range, size_t size) static int execmem_cache_populate(struct execmem_range *range, size_t size) { unsigned long vm_flags = VM_ALLOW_HUGE_VMAP; + struct execmem_area *area; unsigned long start, end; struct vm_struct *vm; size_t alloc_size; int err = -ENOMEM; void *p;
+ area = kzalloc(sizeof(*area), GFP_KERNEL); + if (!area) + return err; + alloc_size = round_up(size, PMD_SIZE); p = execmem_vmalloc(range, alloc_size, PAGE_KERNEL, vm_flags); if (!p) - return err; + goto err_free_area;
vm = find_vm_area(p); if (!vm) @@ -289,7 +301,9 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size) if (err) goto err_free_mem;
- err = execmem_cache_add(p, alloc_size); + area->size = alloc_size; + area->vm = vm; + err = execmem_cache_add(p, alloc_size, area); if (err) goto err_free_mem;
@@ -297,6 +311,8 @@ static int execmem_cache_populate(struct execmem_range *range, size_t size)
err_free_mem: vfree(p); +err_free_area: + kfree(area); return err; }
@@ -305,6 +321,9 @@ static void *execmem_cache_alloc(struct execmem_range *range, size_t size) void *p; int err;
+ /* make sure everything in the cache is page aligned */ + size = PAGE_ALIGN(size); + p = __execmem_cache_alloc(range, size); if (p) return p; @@ -322,8 +341,8 @@ static bool execmem_cache_free(void *ptr) struct mutex *mutex = &execmem_cache.mutex; unsigned long addr = (unsigned long)ptr; MA_STATE(mas, busy_areas, addr, addr); + struct execmem_area *area; size_t size; - void *area;
mutex_lock(mutex); area = mas_walk(&mas); @@ -338,12 +357,73 @@ static bool execmem_cache_free(void *ptr)
execmem_fill_trapping_insns(ptr, size, /* writable = */ false);
- execmem_cache_add(ptr, size); + execmem_cache_add(ptr, size, area);
schedule_work(&execmem_cache_clean_work);
return true; } + +int execmem_make_temp_rw(void *ptr, size_t size) +{ + struct maple_tree *busy_areas = &execmem_cache.busy_areas; + unsigned int nr = PAGE_ALIGN(size) >> PAGE_SHIFT; + struct mutex *mutex = &execmem_cache.mutex; + unsigned long addr = (unsigned long)ptr; + MA_STATE(mas, busy_areas, addr, addr); + struct execmem_area *area; + int ret = -ENOMEM; + + mutex_lock(mutex); + area = mas_walk(&mas); + if (!area) + goto out; + + ret = set_memory_nx(addr, nr); + if (ret) + goto out; + + /* + * If a split of large page was required, it already happened when + * we marked the pages NX which guarantees that this call won't + * fail + */ + set_memory_rw(addr, nr); + area->rw_mappings++; + +out: + mutex_unlock(mutex); + return ret; +} + +int execmem_restore_rox(void *ptr, size_t size) +{ + struct maple_tree *busy_areas = &execmem_cache.busy_areas; + struct mutex *mutex = &execmem_cache.mutex; + unsigned long addr = (unsigned long)ptr; + MA_STATE(mas, busy_areas, addr, addr); + struct execmem_area *area; + int err = 0; + + size = PAGE_ALIGN(size); + + mutex_lock(mutex); + mas_for_each(&mas, area, addr + size - 1) { + area->rw_mappings--; + if (!area->rw_mappings) { + unsigned int nr = area->size >> PAGE_SHIFT; + + addr = (unsigned long)area->vm->addr; + err = set_memory_rox(addr, nr); + if (err) + break; + } + } + mutex_unlock(mutex); + + return err; +} + #else /* CONFIG_ARCH_HAS_EXECMEM_ROX */ static void *execmem_cache_alloc(struct execmem_range *range, size_t size) {
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
In order to use execmem's API for temporal remapping of the memory allocated from ROX cache as writable, there is a need to distinguish between the state when the module is being formed and the state when it is deconstructed and freed so that when module_memory_free() is called from error paths during module loading it could restore ROX mappings.
Replace open coded checks for MODULE_STATE_UNFORMED with a helper function module_is_formed() and add a new MODULE_STATE_GONE that will be set when the module is deconstructed and freed.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- include/linux/module.h | 6 ++++++ kernel/module/kallsyms.c | 8 ++++---- kernel/module/kdb.c | 2 +- kernel/module/main.c | 19 +++++++++---------- kernel/module/procfs.c | 2 +- kernel/tracepoint.c | 2 ++ lib/kunit/test.c | 2 ++ samples/livepatch/livepatch-callbacks-demo.c | 1 + .../test_modules/test_klp_callbacks_demo.c | 1 + .../test_modules/test_klp_callbacks_demo2.c | 1 + .../livepatch/test_modules/test_klp_state.c | 1 + .../livepatch/test_modules/test_klp_state2.c | 1 + 12 files changed, 30 insertions(+), 16 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h index 94acbacdcdf1..bd8cf93d32c8 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -320,6 +320,7 @@ enum module_state { MODULE_STATE_COMING, /* Full formed, running module_init. */ MODULE_STATE_GOING, /* Going away. */ MODULE_STATE_UNFORMED, /* Still setting it up. */ + MODULE_STATE_GONE, /* Deconstructing and freeing. */ };
struct mod_tree_node { @@ -620,6 +621,11 @@ static inline bool module_is_coming(struct module *mod) return mod->state == MODULE_STATE_COMING; }
+static inline bool module_is_formed(struct module *mod) +{ + return mod->state < MODULE_STATE_UNFORMED; +} + struct module *__module_text_address(unsigned long addr); struct module *__module_address(unsigned long addr); bool is_module_address(unsigned long addr); diff --git a/kernel/module/kallsyms.c b/kernel/module/kallsyms.c index bf65e0c3c86f..daf9a9b3740f 100644 --- a/kernel/module/kallsyms.c +++ b/kernel/module/kallsyms.c @@ -361,7 +361,7 @@ int lookup_module_symbol_name(unsigned long addr, char *symname)
preempt_disable(); list_for_each_entry_rcu(mod, &modules, list) { - if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue; if (within_module(addr, mod)) { const char *sym; @@ -389,7 +389,7 @@ int module_get_kallsym(unsigned int symnum, unsigned long *value, char *type, list_for_each_entry_rcu(mod, &modules, list) { struct mod_kallsyms *kallsyms;
- if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue; kallsyms = rcu_dereference_sched(mod->kallsyms); if (symnum < kallsyms->num_symtab) { @@ -441,7 +441,7 @@ static unsigned long __module_kallsyms_lookup_name(const char *name) list_for_each_entry_rcu(mod, &modules, list) { unsigned long ret;
- if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue; ret = __find_kallsyms_symbol_value(mod, name); if (ret) @@ -484,7 +484,7 @@ int module_kallsyms_on_each_symbol(const char *modname, list_for_each_entry(mod, &modules, list) { struct mod_kallsyms *kallsyms;
- if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue;
if (modname && strcmp(modname, mod->name)) diff --git a/kernel/module/kdb.c b/kernel/module/kdb.c index 995c32d3698f..14f14700ffc2 100644 --- a/kernel/module/kdb.c +++ b/kernel/module/kdb.c @@ -23,7 +23,7 @@ int kdb_lsmod(int argc, const char **argv)
kdb_printf("Module Size modstruct Used by\n"); list_for_each_entry(mod, &modules, list) { - if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue;
kdb_printf("%-20s%8u", mod->name, mod->mem[MOD_TEXT].size); diff --git a/kernel/module/main.c b/kernel/module/main.c index 5399c182b3cb..ad8ef20c120f 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -153,7 +153,7 @@ EXPORT_SYMBOL(unregister_module_notifier); */ static inline int strong_try_module_get(struct module *mod) { - BUG_ON(mod && mod->state == MODULE_STATE_UNFORMED); + BUG_ON(mod && !module_is_formed(mod)); if (mod && mod->state == MODULE_STATE_COMING) return -EBUSY; if (try_module_get(mod)) @@ -361,7 +361,7 @@ bool find_symbol(struct find_symbol_arg *fsa) GPL_ONLY }, };
- if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue;
for (i = 0; i < ARRAY_SIZE(arr); i++) @@ -386,7 +386,7 @@ struct module *find_module_all(const char *name, size_t len,
list_for_each_entry_rcu(mod, &modules, list, lockdep_is_held(&module_mutex)) { - if (!even_unformed && mod->state == MODULE_STATE_UNFORMED) + if (!even_unformed && !module_is_formed(mod)) continue; if (strlen(mod->name) == len && !memcmp(mod->name, name, len)) return mod; @@ -457,7 +457,7 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr) preempt_disable();
list_for_each_entry_rcu(mod, &modules, list) { - if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue; if (!mod->percpu_size) continue; @@ -1326,7 +1326,7 @@ static void free_module(struct module *mod) * that noone uses it while it's being deconstructed. */ mutex_lock(&module_mutex); - mod->state = MODULE_STATE_UNFORMED; + mod->state = MODULE_STATE_GONE; mutex_unlock(&module_mutex);
/* Arch-specific cleanup. */ @@ -3048,8 +3048,7 @@ static int module_patient_check_exists(const char *name, if (old == NULL) return 0;
- if (old->state == MODULE_STATE_COMING || - old->state == MODULE_STATE_UNFORMED) { + if (old->state == MODULE_STATE_COMING || !module_is_formed(old)) { /* Wait in case it fails to load. */ mutex_unlock(&module_mutex); err = wait_event_interruptible(module_wq, @@ -3608,7 +3607,7 @@ char *module_flags(struct module *mod, char *buf, bool show_state) { int bx = 0;
- BUG_ON(mod->state == MODULE_STATE_UNFORMED); + BUG_ON(!module_is_formed(mod)); if (!mod->taints && !show_state) goto out; if (mod->taints || @@ -3702,7 +3701,7 @@ struct module *__module_address(unsigned long addr) mod = mod_find(addr, &mod_tree); if (mod) { BUG_ON(!within_module(addr, mod)); - if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) mod = NULL; } return mod; @@ -3756,7 +3755,7 @@ void print_modules(void) /* Most callers should already have preempt disabled, but make sure */ preempt_disable(); list_for_each_entry_rcu(mod, &modules, list) { - if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) continue; pr_cont(" %s%s", mod->name, module_flags(mod, buf, true)); } diff --git a/kernel/module/procfs.c b/kernel/module/procfs.c index 0a4841e88adb..2c617e6f8bc0 100644 --- a/kernel/module/procfs.c +++ b/kernel/module/procfs.c @@ -79,7 +79,7 @@ static int m_show(struct seq_file *m, void *p) unsigned int size;
/* We always ignore unformed modules. */ - if (mod->state == MODULE_STATE_UNFORMED) + if (!module_is_formed(mod)) return 0;
size = module_total_size(mod); diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 1848ce7e2976..e94247afb2c6 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -668,6 +668,8 @@ static int tracepoint_module_notify(struct notifier_block *self, break; case MODULE_STATE_UNFORMED: break; + case MODULE_STATE_GONE: + break; } return notifier_from_errno(ret); } diff --git a/lib/kunit/test.c b/lib/kunit/test.c index 089c832e3cdb..54eaed92a2d3 100644 --- a/lib/kunit/test.c +++ b/lib/kunit/test.c @@ -836,6 +836,8 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val, break; case MODULE_STATE_UNFORMED: break; + case MODULE_STATE_GONE: + break; }
return 0; diff --git a/samples/livepatch/livepatch-callbacks-demo.c b/samples/livepatch/livepatch-callbacks-demo.c index 11c3f4357812..324bddaef9a6 100644 --- a/samples/livepatch/livepatch-callbacks-demo.c +++ b/samples/livepatch/livepatch-callbacks-demo.c @@ -93,6 +93,7 @@ static const char *const module_state[] = { [MODULE_STATE_COMING] = "[MODULE_STATE_COMING] Full formed, running module_init", [MODULE_STATE_GOING] = "[MODULE_STATE_GOING] Going away", [MODULE_STATE_UNFORMED] = "[MODULE_STATE_UNFORMED] Still setting it up", + [MODULE_STATE_GONE] = "[MODULE_STATE_GONE] Deconstructing and freeing", };
static void callback_info(const char *callback, struct klp_object *obj) diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c index 3fd8fe1cd1cc..8435e3254f85 100644 --- a/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo.c @@ -16,6 +16,7 @@ static const char *const module_state[] = { [MODULE_STATE_COMING] = "[MODULE_STATE_COMING] Full formed, running module_init", [MODULE_STATE_GOING] = "[MODULE_STATE_GOING] Going away", [MODULE_STATE_UNFORMED] = "[MODULE_STATE_UNFORMED] Still setting it up", + [MODULE_STATE_GONE] = "[MODULE_STATE_GONE] Deconstructing and freeing", };
static void callback_info(const char *callback, struct klp_object *obj) diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c index 5417573e80af..78c1fff5d977 100644 --- a/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_callbacks_demo2.c @@ -16,6 +16,7 @@ static const char *const module_state[] = { [MODULE_STATE_COMING] = "[MODULE_STATE_COMING] Full formed, running module_init", [MODULE_STATE_GOING] = "[MODULE_STATE_GOING] Going away", [MODULE_STATE_UNFORMED] = "[MODULE_STATE_UNFORMED] Still setting it up", + [MODULE_STATE_GONE] = "[MODULE_STATE_GONE] Deconstructing and freeing", };
static void callback_info(const char *callback, struct klp_object *obj) diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_state.c b/tools/testing/selftests/livepatch/test_modules/test_klp_state.c index 57a4253acb01..bdebf1d24c98 100644 --- a/tools/testing/selftests/livepatch/test_modules/test_klp_state.c +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_state.c @@ -18,6 +18,7 @@ static const char *const module_state[] = { [MODULE_STATE_COMING] = "[MODULE_STATE_COMING] Full formed, running module_init", [MODULE_STATE_GOING] = "[MODULE_STATE_GOING] Going away", [MODULE_STATE_UNFORMED] = "[MODULE_STATE_UNFORMED] Still setting it up", + [MODULE_STATE_GONE] = "[MODULE_STATE_GONE] Deconstructing and freeing", };
static void callback_info(const char *callback, struct klp_object *obj) diff --git a/tools/testing/selftests/livepatch/test_modules/test_klp_state2.c b/tools/testing/selftests/livepatch/test_modules/test_klp_state2.c index c978ea4d5e67..1a55f84a8eb3 100644 --- a/tools/testing/selftests/livepatch/test_modules/test_klp_state2.c +++ b/tools/testing/selftests/livepatch/test_modules/test_klp_state2.c @@ -18,6 +18,7 @@ static const char *const module_state[] = { [MODULE_STATE_COMING] = "[MODULE_STATE_COMING] Full formed, running module_init", [MODULE_STATE_GOING] = "[MODULE_STATE_GOING] Going away", [MODULE_STATE_UNFORMED] = "[MODULE_STATE_UNFORMED] Still setting it up", + [MODULE_STATE_GONE] = "[MODULE_STATE_GONE] Deconstructing and freeing", };
static void callback_info(const char *callback, struct klp_object *obj)
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
Instead of using writable copy for module text sections, temporarily remap the memory allocated from execmem's ROX cache as writable and restore its ROX permissions after the module is formed.
This will allow removing nasty games with writable copy in alternatives patching on x86.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- include/linux/module.h | 7 +--- include/linux/moduleloader.h | 4 --- kernel/module/main.c | 67 ++++++------------------------------ kernel/module/strict_rwx.c | 9 ++--- 4 files changed, 17 insertions(+), 70 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h index bd8cf93d32c8..e9fc9d1fa476 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -368,7 +368,6 @@ enum mod_mem_type {
struct module_memory { void *base; - void *rw_copy; bool is_rox; unsigned int size;
@@ -775,13 +774,9 @@ static inline bool is_livepatch_module(struct module *mod)
void set_module_sig_enforced(void);
-void *__module_writable_address(struct module *mod, void *loc); - static inline void *module_writable_address(struct module *mod, void *loc) { - if (!IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX) || !mod) - return loc; - return __module_writable_address(mod, loc); + return loc; }
#else /* !CONFIG_MODULES... */ diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h index 1f5507ba5a12..e395461d59e5 100644 --- a/include/linux/moduleloader.h +++ b/include/linux/moduleloader.h @@ -108,10 +108,6 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod);
-int module_post_finalize(const Elf_Ehdr *hdr, - const Elf_Shdr *sechdrs, - struct module *mod); - #ifdef CONFIG_MODULES void flush_module_init_free_work(void); #else diff --git a/kernel/module/main.c b/kernel/module/main.c index ad8ef20c120f..ee6b46e753a0 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -1221,18 +1221,6 @@ void __weak module_arch_freeing_init(struct module *mod) { }
-void *__module_writable_address(struct module *mod, void *loc) -{ - for_class_mod_mem_type(type, text) { - struct module_memory *mem = &mod->mem[type]; - - if (loc >= mem->base && loc < mem->base + mem->size) - return loc + (mem->rw_copy - mem->base); - } - - return loc; -} - static int module_memory_alloc(struct module *mod, enum mod_mem_type type) { unsigned int size = PAGE_ALIGN(mod->mem[type].size); @@ -1250,21 +1238,15 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type) if (!ptr) return -ENOMEM;
- mod->mem[type].base = ptr; - if (execmem_is_rox(execmem_type)) { - ptr = vzalloc(size); + int err = execmem_make_temp_rw(ptr, size);
- if (!ptr) { - execmem_free(mod->mem[type].base); + if (err) { + execmem_free(ptr); return -ENOMEM; }
- mod->mem[type].rw_copy = ptr; mod->mem[type].is_rox = true; - } else { - mod->mem[type].rw_copy = mod->mem[type].base; - memset(mod->mem[type].base, 0, size); }
/* @@ -1280,6 +1262,9 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type) */ kmemleak_not_leak(ptr);
+ memset(ptr, 0, size); + mod->mem[type].base = ptr; + return 0; }
@@ -1287,8 +1272,8 @@ static void module_memory_free(struct module *mod, enum mod_mem_type type) { struct module_memory *mem = &mod->mem[type];
- if (mem->is_rox) - vfree(mem->rw_copy); + if (mod->state == MODULE_STATE_UNFORMED && mem->is_rox) + execmem_restore_rox(mem->base, mem->size);
execmem_free(mem->base); } @@ -2561,7 +2546,6 @@ static int move_module(struct module *mod, struct load_info *info) for_each_mod_mem_type(type) { if (!mod->mem[type].size) { mod->mem[type].base = NULL; - mod->mem[type].rw_copy = NULL; continue; }
@@ -2578,7 +2562,6 @@ static int move_module(struct module *mod, struct load_info *info) void *dest; Elf_Shdr *shdr = &info->sechdrs[i]; const char *sname; - unsigned long addr;
if (!(shdr->sh_flags & SHF_ALLOC)) continue; @@ -2599,14 +2582,12 @@ static int move_module(struct module *mod, struct load_info *info) ret = PTR_ERR(dest); goto out_err; } - addr = (unsigned long)dest; codetag_section_found = true; } else { enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT; unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
- addr = (unsigned long)mod->mem[type].base + offset; - dest = mod->mem[type].rw_copy + offset; + dest = mod->mem[type].base + offset; }
if (shdr->sh_type != SHT_NOBITS) { @@ -2629,7 +2610,7 @@ static int move_module(struct module *mod, struct load_info *info) * users of info can keep taking advantage and using the newly * minted official memory area. */ - shdr->sh_addr = addr; + shdr->sh_addr = (unsigned long)dest; pr_debug("\t0x%lx 0x%.8lx %s\n", (long)shdr->sh_addr, (long)shdr->sh_size, info->secstrings + shdr->sh_name); } @@ -2782,17 +2763,8 @@ int __weak module_finalize(const Elf_Ehdr *hdr, return 0; }
-int __weak module_post_finalize(const Elf_Ehdr *hdr, - const Elf_Shdr *sechdrs, - struct module *me) -{ - return 0; -} - static int post_relocation(struct module *mod, const struct load_info *info) { - int ret; - /* Sort exception table now relocations are done. */ sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -2804,24 +2776,7 @@ static int post_relocation(struct module *mod, const struct load_info *info) add_kallsyms(mod, info);
/* Arch-specific module finalizing. */ - ret = module_finalize(info->hdr, info->sechdrs, mod); - if (ret) - return ret; - - for_each_mod_mem_type(type) { - struct module_memory *mem = &mod->mem[type]; - - if (mem->is_rox) { - if (!execmem_update_copy(mem->base, mem->rw_copy, - mem->size)) - return -ENOMEM; - - vfree(mem->rw_copy); - mem->rw_copy = NULL; - } - } - - return module_post_finalize(info->hdr, info->sechdrs, mod); + return module_finalize(info->hdr, info->sechdrs, mod); }
/* Call module constructors. */ diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c index 239e5013359d..ce47b6346f27 100644 --- a/kernel/module/strict_rwx.c +++ b/kernel/module/strict_rwx.c @@ -9,6 +9,7 @@ #include <linux/mm.h> #include <linux/vmalloc.h> #include <linux/set_memory.h> +#include <linux/execmem.h> #include "internal.h"
static int module_set_memory(const struct module *mod, enum mod_mem_type type, @@ -32,12 +33,12 @@ static int module_set_memory(const struct module *mod, enum mod_mem_type type, int module_enable_text_rox(const struct module *mod) { for_class_mod_mem_type(type, text) { + const struct module_memory *mem = &mod->mem[type]; int ret;
- if (mod->mem[type].is_rox) - continue; - - if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX)) + if (mem->is_rox) + ret = execmem_restore_rox(mem->base, mem->size); + else if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX)) ret = module_set_memory(mod, type, set_memory_rox); else ret = module_set_memory(mod, type, set_memory_x);
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
It seems to be having an unhandled kernel page fault on exec, so I guess not actually making it exec somehow?
It is a pretty standard intel machine (arch linux config, slightly scaled down via make localmodconfig to make it easier to quickly rebuild kernels). Can give more details if needed.
Here is the (vaguely annotated-ish) splat:
:: running early hook [udev] Starting systemd-udevd version 257.1-1-arch :: running hook [udev] :: Triggering uevents... [ 4.231005] usb 1-11: new high-speed USB device number 3 using xhci_hcd [ 4.261646] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Guessing the execmem didn't make it exec somehow?
[ 4.269136] BUG: unable to handle page fault for address: ffffffffc0641010 [ 4.276009] #PF: supervisor instruction fetch in kernel mode [ 4.281666] #PF: error_code(0x0011) - permissions violation [ 4.287241] PGD 3027067 P4D 3027067 PUD 3029067 PMD 12a29e063 PTE 8000000129441163 [ 4.294818] Oops: Oops: 0011 [#1] PREEMPT SMP NOPTI [ 4.299695] CPU: 31 UID: 0 PID: 331 Comm: (udev-worker) Tainted: G W 6.13.0-rc4-1-custom #1 5815a74228bcc318a2a967d947d4ca4aa132a5ed [ 4.312992] Tainted: [W]=WARN [ 4.315961] Hardware name: Gigabyte Technology Co., Ltd. Z790 AORUS ELITE X WIFI7/Z790 AORUS ELITE X WIFI7, BIOS F7 09/27/2024 [ 4.327349] RIP: 0010:crc32c_intel_mod_init+0x0/0xff0 [crc32c_intel]
My addr2line suggests:
/home/lorenzo/kerndev/kernels/mm/./arch/x86/include/asm/processor.h:43
DECLARE_PER_CPU_FIRST(struct fixed_percpu_data, fixed_percpu_data) __visible;
Which seems... odd :)
[ 4.333702] Code: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 0f 1e fa 0f 1f 44 00 00 48 c7 c7 00 e0 8a c0 e8 1b aa a8 c0 48 [ 4.352466] RSP: 0018:ffffc90004223a58 EFLAGS: 00010246 [ 4.357693] RAX: 0000000000000000 RBX: ffffffffc0641010 RCX: 0000000000000000 [ 4.364827] RDX: 0000000000000000 RSI: ffffffffc0641010 RDI: ffffc90004223a40 [ 4.371963] RBP: 0000000000000000 R08: 0000000000000020 R09: ffff88811adeeb00 [ 4.379097] R10: ffffc90004223ad0 R11: 0000000000000000 R12: ffff8881278c3600 [ 4.385530] usb 1-11: New USB device found, idVendor=05e3, idProduct=0608, bcdDevice=60.90 [ 4.386230] R13: ffffc90004223a60 R14: ffff88811ae3ac00 R15: ffff88811f251b80 [ 4.394497] usb 1-11: New USB device strings: Mfr=0, Product=1, SerialNumber=0 [ 4.401628] FS: 00007f92de124880(0000) GS:ffff88afffd80000(0000) knlGS:0000000000000000 [ 4.401630] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4.408850] usb 1-11: Product: USB2.0 Hub [ 4.416938] CR2: ffffffffc0641010 CR3: 00000001279f6006 CR4: 0000000000f70ef0 [ 4.416939] PKRU: 55555554 [ 4.423588] hub 1-11:1.0: USB hub found [ 4.426696] Call Trace: [ 4.426697] <TASK> [ 4.426699] ? __die_body.cold+0x19/0x27 [ 4.434112] hub 1-11:1.0: 4 ports detected [ 4.436547] ? __pfx_crc32c_intel_mod_init+0x10/0x10 [crc32c_intel 0c38f5da3bf5b6f9ab94fb009bb43215a1542e3a] [ 4.462776] ? page_fault_oops+0x15a/0x2d0 [ 4.466874] ? disable_mmiotrace.cold+0x6a/0xaa [ 4.471404] ? __pfx_crc32c_intel_mod_init+0x10/0x10 [crc32c_intel 0c38f5da3bf5b6f9ab94fb009bb43215a1542e3a] [ 4.481229] ? exc_page_fault+0x18a/0x190 [ 4.485240] ? asm_exc_page_fault+0x26/0x30 [ 4.489422] ? __pfx_crc32c_intel_mod_init+0x10/0x10 [crc32c_intel 0c38f5da3bf5b6f9ab94fb009bb43215a1542e3a] [ 4.499248] ? __pfx_crc32c_intel_mod_init+0x10/0x10 [crc32c_intel 0c38f5da3bf5b6f9ab94fb009bb43215a1542e3a] [ 4.509073] ? __pfx_crc32c_intel_mod_init+0x10/0x10 [crc32c_intel 0c38f5da3bf5b6f9ab94fb009bb43215a1542e3a] [ 4.518900] do_one_initcall+0x58/0x310 [ 4.522736] do_init_module+0x82/0x270 [ 4.526483] init_module_from_file+0x86/0xc0 [ 4.530756] idempotent_init_module+0x115/0x310 [ 4.535286] __x64_sys_finit_module+0x65/0xc0 [ 4.539642] do_syscall_64+0x82/0x190 [ 4.543305] ? do_syscall_64+0x8e/0x190 [ 4.547142] ? filemap_map_pages+0x51f/0x670 [ 4.551412] ? do_filp_open+0xd8/0x190 [ 4.555163] ? __pfx_page_put_link+0x10/0x10 [ 4.559433] ? do_sys_openat2+0x9c/0xe0 [ 4.563268] ? syscall_exit_to_user_mode+0x37/0x1c0 [ 4.568147] ? syscall_exit_to_user_mode+0x37/0x1c0 [ 4.573024] ? do_syscall_64+0x8e/0x190 [ 4.576861] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 4.581003] usb 1-12: new high-speed USB device number 4 using xhci_hcd [ 4.581914] RIP: 0033:0x7f92de91b1fd [ 4.592102] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 fa 0c 00 f7 d8 64 89 01 48 [ 4.610867] RSP: 002b:00007ffcf33fa0b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 [ 4.618434] RAX: ffffffffffffffda RBX: 0000555fbf68a870 RCX: 00007f92de91b1fd [ 4.625570] RDX: 0000000000000000 RSI: 00007f92de11e05d RDI: 0000000000000014 [ 4.632704] RBP: 00007ffcf33fa170 R08: 0000000000000002 R09: 00007ffcf33fa120 [ 4.639839] R10: 0000000000000007 R11: 0000000000000246 R12: 00007f92de11e05d [ 4.646973] R13: 0000000000020000 R14: 0000555fbf682a70 R15: 0000555fbf682fd0 [ 4.654109] </TASK> [ 4.656296] Modules linked in: crc32c_intel(+) spi_intel_pci drm_display_helper spi_intel nvme_auth cec video wmi
If I revert _just this_ commit it breaks the boot by not being able to find the disk (!) so guess you need to revert everything after it too which kinda makes sense :)
On Fri, Dec 27, 2024 at 09:28:23AM +0200, Mike Rapoport wrote:
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
Instead of using writable copy for module text sections, temporarily remap the memory allocated from execmem's ROX cache as writable and restore its ROX permissions after the module is formed.
This will allow removing nasty games with writable copy in alternatives patching on x86.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org
include/linux/module.h | 7 +--- include/linux/moduleloader.h | 4 --- kernel/module/main.c | 67 ++++++------------------------------ kernel/module/strict_rwx.c | 9 ++--- 4 files changed, 17 insertions(+), 70 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h index bd8cf93d32c8..e9fc9d1fa476 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -368,7 +368,6 @@ enum mod_mem_type {
struct module_memory { void *base;
- void *rw_copy; bool is_rox; unsigned int size;
@@ -775,13 +774,9 @@ static inline bool is_livepatch_module(struct module *mod)
void set_module_sig_enforced(void);
-void *__module_writable_address(struct module *mod, void *loc);
static inline void *module_writable_address(struct module *mod, void *loc) {
- if (!IS_ENABLED(CONFIG_ARCH_HAS_EXECMEM_ROX) || !mod)
return loc;
- return __module_writable_address(mod, loc);
- return loc;
}
#else /* !CONFIG_MODULES... */ diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h index 1f5507ba5a12..e395461d59e5 100644 --- a/include/linux/moduleloader.h +++ b/include/linux/moduleloader.h @@ -108,10 +108,6 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod);
-int module_post_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *mod);
#ifdef CONFIG_MODULES void flush_module_init_free_work(void); #else diff --git a/kernel/module/main.c b/kernel/module/main.c index ad8ef20c120f..ee6b46e753a0 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -1221,18 +1221,6 @@ void __weak module_arch_freeing_init(struct module *mod) { }
-void *__module_writable_address(struct module *mod, void *loc) -{
- for_class_mod_mem_type(type, text) {
struct module_memory *mem = &mod->mem[type];
if (loc >= mem->base && loc < mem->base + mem->size)
return loc + (mem->rw_copy - mem->base);
- }
- return loc;
-}
static int module_memory_alloc(struct module *mod, enum mod_mem_type type) { unsigned int size = PAGE_ALIGN(mod->mem[type].size); @@ -1250,21 +1238,15 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type) if (!ptr) return -ENOMEM;
- mod->mem[type].base = ptr;
- if (execmem_is_rox(execmem_type)) {
ptr = vzalloc(size);
int err = execmem_make_temp_rw(ptr, size);
if (!ptr) {
execmem_free(mod->mem[type].base);
if (err) {
}execmem_free(ptr); return -ENOMEM;
mod->mem[type].rw_copy = ptr;
mod->mem[type].is_rox = true;
} else {
mod->mem[type].rw_copy = mod->mem[type].base;
memset(mod->mem[type].base, 0, size);
}
/*
@@ -1280,6 +1262,9 @@ static int module_memory_alloc(struct module *mod, enum mod_mem_type type) */ kmemleak_not_leak(ptr);
- memset(ptr, 0, size);
- mod->mem[type].base = ptr;
- return 0;
}
@@ -1287,8 +1272,8 @@ static void module_memory_free(struct module *mod, enum mod_mem_type type) { struct module_memory *mem = &mod->mem[type];
- if (mem->is_rox)
vfree(mem->rw_copy);
if (mod->state == MODULE_STATE_UNFORMED && mem->is_rox)
execmem_restore_rox(mem->base, mem->size);
execmem_free(mem->base);
} @@ -2561,7 +2546,6 @@ static int move_module(struct module *mod, struct load_info *info) for_each_mod_mem_type(type) { if (!mod->mem[type].size) { mod->mem[type].base = NULL;
}mod->mem[type].rw_copy = NULL; continue;
@@ -2578,7 +2562,6 @@ static int move_module(struct module *mod, struct load_info *info) void *dest; Elf_Shdr *shdr = &info->sechdrs[i]; const char *sname;
unsigned long addr;
if (!(shdr->sh_flags & SHF_ALLOC)) continue;
@@ -2599,14 +2582,12 @@ static int move_module(struct module *mod, struct load_info *info) ret = PTR_ERR(dest); goto out_err; }
addr = (unsigned long)dest; codetag_section_found = true;
} else { enum mod_mem_type type = shdr->sh_entsize >> SH_ENTSIZE_TYPE_SHIFT; unsigned long offset = shdr->sh_entsize & SH_ENTSIZE_OFFSET_MASK;
addr = (unsigned long)mod->mem[type].base + offset;
dest = mod->mem[type].rw_copy + offset;
dest = mod->mem[type].base + offset;
}
if (shdr->sh_type != SHT_NOBITS) {
@@ -2629,7 +2610,7 @@ static int move_module(struct module *mod, struct load_info *info) * users of info can keep taking advantage and using the newly * minted official memory area. */
shdr->sh_addr = addr;
pr_debug("\t0x%lx 0x%.8lx %s\n", (long)shdr->sh_addr, (long)shdr->sh_size, info->secstrings + shdr->sh_name); }shdr->sh_addr = (unsigned long)dest;
@@ -2782,17 +2763,8 @@ int __weak module_finalize(const Elf_Ehdr *hdr, return 0; }
-int __weak module_post_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *me)
-{
- return 0;
-}
static int post_relocation(struct module *mod, const struct load_info *info) {
- int ret;
- /* Sort exception table now relocations are done. */ sort_extable(mod->extable, mod->extable + mod->num_exentries);
@@ -2804,24 +2776,7 @@ static int post_relocation(struct module *mod, const struct load_info *info) add_kallsyms(mod, info);
/* Arch-specific module finalizing. */
- ret = module_finalize(info->hdr, info->sechdrs, mod);
- if (ret)
return ret;
- for_each_mod_mem_type(type) {
struct module_memory *mem = &mod->mem[type];
if (mem->is_rox) {
if (!execmem_update_copy(mem->base, mem->rw_copy,
mem->size))
return -ENOMEM;
vfree(mem->rw_copy);
mem->rw_copy = NULL;
}
- }
- return module_post_finalize(info->hdr, info->sechdrs, mod);
- return module_finalize(info->hdr, info->sechdrs, mod);
}
/* Call module constructors. */ diff --git a/kernel/module/strict_rwx.c b/kernel/module/strict_rwx.c index 239e5013359d..ce47b6346f27 100644 --- a/kernel/module/strict_rwx.c +++ b/kernel/module/strict_rwx.c @@ -9,6 +9,7 @@ #include <linux/mm.h> #include <linux/vmalloc.h> #include <linux/set_memory.h> +#include <linux/execmem.h> #include "internal.h"
static int module_set_memory(const struct module *mod, enum mod_mem_type type, @@ -32,12 +33,12 @@ static int module_set_memory(const struct module *mod, enum mod_mem_type type, int module_enable_text_rox(const struct module *mod) { for_class_mod_mem_type(type, text) {
int ret;const struct module_memory *mem = &mod->mem[type];
if (mod->mem[type].is_rox)
continue;
if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
if (mem->is_rox)
ret = execmem_restore_rox(mem->base, mem->size);
else ret = module_set_memory(mod, type, set_memory_x);else if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX)) ret = module_set_memory(mod, type, set_memory_rox);
-- 2.45.2
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
For what it's worth, we've found the same under Xen too.
There's one concrete bug in the series, failing to cope with the absence of superpages (fix in https://lore.kernel.org/xen-devel/6bb03333-74ca-4c2c-85a8-72549b85a5b4@suse.... but not formally posted yet AFAICT).
The rest of the thread then found a crash looking to be the same as reported here, but you've made better progress narrowing it down than we have.
~Andrew
On Fri, 3 Jan 2025 02:06:10 +0000 Andrew Cooper andrew.cooper3@citrix.com wrote:
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
For what it's worth, we've found the same under Xen too.
There's one concrete bug in the series, failing to cope with the absence of superpages (fix in https://lore.kernel.org/xen-devel/6bb03333-74ca-4c2c-85a8-72549b85a5b4@suse.... but not formally posted yet AFAICT).
The rest of the thread then found a crash looking to be the same as reported here, but you've made better progress narrowing it down than we have.
Thanks. I removed this series from mm.git while this is worked on.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
On Thu, Jan 02, 2025 at 09:57:14PM -0800, Andrew Morton wrote:
On Fri, 3 Jan 2025 02:06:10 +0000 Andrew Cooper andrew.cooper3@citrix.com wrote:
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
For what it's worth, we've found the same under Xen too.
There's one concrete bug in the series, failing to cope with the absence of superpages (fix in https://lore.kernel.org/xen-devel/6bb03333-74ca-4c2c-85a8-72549b85a5b4@suse.... but not formally posted yet AFAICT).
The rest of the thread then found a crash looking to be the same as reported here, but you've made better progress narrowing it down than we have.
Thanks. I removed this series from mm.git while this is worked on.
The issue under Xen that Andrew linked happens on 6.13-rc5 which AFAICT doesn't this series yet. So, it's probably a different issue.
- -- Best Regards, Marek Marczykowski-Górecki Invisible Things Lab
On 03.01.25 03:06, Andrew Cooper wrote:
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
For what it's worth, we've found the same under Xen too.
There's one concrete bug in the series, failing to cope with the absence of superpages (fix in https://lore.kernel.org/xen-devel/6bb03333-74ca-4c2c-85a8-72549b85a5b4@suse.... but not formally posted yet AFAICT).
Now sent out:
https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u
Juergen
On Fri, Jan 03, 2025 at 07:58:13AM +0100, Jürgen Groß wrote:
On 03.01.25 03:06, Andrew Cooper wrote:
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
For what it's worth, we've found the same under Xen too.
There's one concrete bug in the series, failing to cope with the absence of superpages (fix in https://lore.kernel.org/xen-devel/6bb03333-74ca-4c2c-85a8-72549b85a5b4@suse.... but not formally posted yet AFAICT).
Now sent out:
https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u
Thanks,
Marek, Adam, can you try this patch? Although the reply here is for another future series being worked on the fix is for commit 2e45474ab14f ("execmem: add support for cache of large ROX pages").
Luis
On Sat, Jan 4, 2025 at 3:07 AM Luis Chamberlain mcgrof@kernel.org wrote:
On Fri, Jan 03, 2025 at 07:58:13AM +0100, Jürgen Groß wrote:
On 03.01.25 03:06, Andrew Cooper wrote:
Hi Mike,
This commit is making my intel box not boot in mm-unstable :>) I bisected it to this commit.
For what it's worth, we've found the same under Xen too.
There's one concrete bug in the series, failing to cope with the absence of superpages (fix in https://lore.kernel.org/xen-devel/6bb03333-74ca-4c2c-85a8-72549b85a5b4@suse.... but not formally posted yet AFAICT).
Now sent out:
https://lore.kernel.org/lkml/20250103065631.26459-1-jgross@suse.com/T/#u
Thanks,
Marek, Adam, can you try this patch? Although the reply here is for another future series being worked on the fix is for commit 2e45474ab14f ("execmem: add support for cache of large ROX pages").
Luis
Hi Luis,
I suppose you're referring to the issue described here https://lore.kernel.org/linux-mm/CAGcaFA2hdThQV6mjD_1_U+GNHThv84+MQvMWLgEuX+... Unfortnuetly this patch didn't help.
Best, Marek
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
The module code does not create a writable copy of the executable memory anymore so there is no need to handle it in module relocation and alternatives patching.
This reverts commit 9bfc4824fd4836c16bb44f922bfaffba5da3e4f3.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- arch/um/kernel/um_arch.c | 11 +- arch/x86/entry/vdso/vma.c | 3 +- arch/x86/include/asm/alternative.h | 14 +-- arch/x86/kernel/alternative.c | 181 ++++++++++++----------------- arch/x86/kernel/ftrace.c | 30 +++-- arch/x86/kernel/module.c | 45 +++---- 6 files changed, 117 insertions(+), 167 deletions(-)
diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c index 8037a967225d..d2cc2c69a8c4 100644 --- a/arch/um/kernel/um_arch.c +++ b/arch/um/kernel/um_arch.c @@ -440,25 +440,24 @@ void __init arch_cpu_finalize_init(void) os_check_bugs(); }
-void apply_seal_endbr(s32 *start, s32 *end, struct module *mod) +void apply_seal_endbr(s32 *start, s32 *end) { }
-void apply_retpolines(s32 *start, s32 *end, struct module *mod) +void apply_retpolines(s32 *start, s32 *end) { }
-void apply_returns(s32 *start, s32 *end, struct module *mod) +void apply_returns(s32 *start, s32 *end) { }
void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, - s32 *start_cfi, s32 *end_cfi, struct module *mod) + s32 *start_cfi, s32 *end_cfi) { }
-void apply_alternatives(struct alt_instr *start, struct alt_instr *end, - struct module *mod) +void apply_alternatives(struct alt_instr *start, struct alt_instr *end) { }
diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 39e6efc1a9ca..bfc7cabf4017 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -48,8 +48,7 @@ int __init init_vdso_image(const struct vdso_image *image)
apply_alternatives((struct alt_instr *)(image->data + image->alt), (struct alt_instr *)(image->data + image->alt + - image->alt_len), - NULL); + image->alt_len));
return 0; } diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index dc03a647776d..ca9ae606aab9 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -96,16 +96,16 @@ extern struct alt_instr __alt_instructions[], __alt_instructions_end[]; * instructions were patched in already: */ extern int alternatives_patched; -struct module;
extern void alternative_instructions(void); -extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end, - struct module *mod); -extern void apply_retpolines(s32 *start, s32 *end, struct module *mod); -extern void apply_returns(s32 *start, s32 *end, struct module *mod); -extern void apply_seal_endbr(s32 *start, s32 *end, struct module *mod); +extern void apply_alternatives(struct alt_instr *start, struct alt_instr *end); +extern void apply_retpolines(s32 *start, s32 *end); +extern void apply_returns(s32 *start, s32 *end); +extern void apply_seal_endbr(s32 *start, s32 *end); extern void apply_fineibt(s32 *start_retpoline, s32 *end_retpoine, - s32 *start_cfi, s32 *end_cfi, struct module *mod); + s32 *start_cfi, s32 *end_cfi); + +struct module;
struct callthunk_sites { s32 *call_start, *call_end; diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 243843e44e89..d17518ca19b8 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -392,10 +392,8 @@ EXPORT_SYMBOL(BUG_func); * Rewrite the "call BUG_func" replacement to point to the target of the * indirect pv_ops call "call *disp(%ip)". */ -static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a, - struct module *mod) +static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a) { - u8 *wr_instr = module_writable_address(mod, instr); void *target, *bug = &BUG_func; s32 disp;
@@ -405,14 +403,14 @@ static int alt_replace_call(u8 *instr, u8 *insn_buff, struct alt_instr *a, }
if (a->instrlen != 6 || - wr_instr[0] != CALL_RIP_REL_OPCODE || - wr_instr[1] != CALL_RIP_REL_MODRM) { + instr[0] != CALL_RIP_REL_OPCODE || + instr[1] != CALL_RIP_REL_MODRM) { pr_err("ALT_FLAG_DIRECT_CALL set for unrecognized indirect call\n"); BUG(); }
/* Skip CALL_RIP_REL_OPCODE and CALL_RIP_REL_MODRM */ - disp = *(s32 *)(wr_instr + 2); + disp = *(s32 *)(instr + 2); #ifdef CONFIG_X86_64 /* ff 15 00 00 00 00 call *0x0(%rip) */ /* target address is stored at "next instruction + disp". */ @@ -450,8 +448,7 @@ static inline u8 * instr_va(struct alt_instr *i) * to refetch changed I$ lines. */ void __init_or_module noinline apply_alternatives(struct alt_instr *start, - struct alt_instr *end, - struct module *mod) + struct alt_instr *end) { u8 insn_buff[MAX_PATCH_LEN]; u8 *instr, *replacement; @@ -480,7 +477,6 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, */ for (a = start; a < end; a++) { int insn_buff_sz = 0; - u8 *wr_instr, *wr_replacement;
/* * In case of nested ALTERNATIVE()s the outer alternative might @@ -494,11 +490,7 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, }
instr = instr_va(a); - wr_instr = module_writable_address(mod, instr); - replacement = (u8 *)&a->repl_offset + a->repl_offset; - wr_replacement = module_writable_address(mod, replacement); - BUG_ON(a->instrlen > sizeof(insn_buff)); BUG_ON(a->cpuid >= (NCAPINTS + NBUGINTS) * 32);
@@ -509,9 +501,9 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, * patch if feature is *NOT* present. */ if (!boot_cpu_has(a->cpuid) == !(a->flags & ALT_FLAG_NOT)) { - memcpy(insn_buff, wr_instr, a->instrlen); + memcpy(insn_buff, instr, a->instrlen); optimize_nops(instr, insn_buff, a->instrlen); - text_poke_early(wr_instr, insn_buff, a->instrlen); + text_poke_early(instr, insn_buff, a->instrlen); continue; }
@@ -521,12 +513,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start, instr, instr, a->instrlen, replacement, a->replacementlen, a->flags);
- memcpy(insn_buff, wr_replacement, a->replacementlen); + memcpy(insn_buff, replacement, a->replacementlen); insn_buff_sz = a->replacementlen;
if (a->flags & ALT_FLAG_DIRECT_CALL) { - insn_buff_sz = alt_replace_call(instr, insn_buff, a, - mod); + insn_buff_sz = alt_replace_call(instr, insn_buff, a); if (insn_buff_sz < 0) continue; } @@ -536,11 +527,11 @@ void __init_or_module noinline apply_alternatives(struct alt_instr *start,
apply_relocation(insn_buff, instr, a->instrlen, replacement, a->replacementlen);
- DUMP_BYTES(ALT, wr_instr, a->instrlen, "%px: old_insn: ", instr); + DUMP_BYTES(ALT, instr, a->instrlen, "%px: old_insn: ", instr); DUMP_BYTES(ALT, replacement, a->replacementlen, "%px: rpl_insn: ", replacement); DUMP_BYTES(ALT, insn_buff, insn_buff_sz, "%px: final_insn: ", instr);
- text_poke_early(wr_instr, insn_buff, insn_buff_sz); + text_poke_early(instr, insn_buff, insn_buff_sz); }
kasan_enable_current(); @@ -731,20 +722,18 @@ static int patch_retpoline(void *addr, struct insn *insn, u8 *bytes) /* * Generated by 'objtool --retpoline'. */ -void __init_or_module noinline apply_retpolines(s32 *start, s32 *end, - struct module *mod) +void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr = module_writable_address(mod, addr); struct insn insn; int len, ret; u8 bytes[16]; u8 op1, op2;
- ret = insn_decode_kernel(&insn, wr_addr); + ret = insn_decode_kernel(&insn, addr); if (WARN_ON_ONCE(ret < 0)) continue;
@@ -772,9 +761,9 @@ void __init_or_module noinline apply_retpolines(s32 *start, s32 *end, len = patch_retpoline(addr, &insn, bytes); if (len == insn.length) { optimize_nops(addr, bytes, len); - DUMP_BYTES(RETPOLINE, ((u8*)wr_addr), len, "%px: orig: ", addr); + DUMP_BYTES(RETPOLINE, ((u8*)addr), len, "%px: orig: ", addr); DUMP_BYTES(RETPOLINE, ((u8*)bytes), len, "%px: repl: ", addr); - text_poke_early(wr_addr, bytes, len); + text_poke_early(addr, bytes, len); } } } @@ -810,8 +799,7 @@ static int patch_return(void *addr, struct insn *insn, u8 *bytes) return i; }
-void __init_or_module noinline apply_returns(s32 *start, s32 *end, - struct module *mod) +void __init_or_module noinline apply_returns(s32 *start, s32 *end) { s32 *s;
@@ -820,13 +808,12 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end,
for (s = start; s < end; s++) { void *dest = NULL, *addr = (void *)s + *s; - void *wr_addr = module_writable_address(mod, addr); struct insn insn; int len, ret; u8 bytes[16]; u8 op;
- ret = insn_decode_kernel(&insn, wr_addr); + ret = insn_decode_kernel(&insn, addr); if (WARN_ON_ONCE(ret < 0)) continue;
@@ -846,35 +833,32 @@ void __init_or_module noinline apply_returns(s32 *start, s32 *end,
len = patch_return(addr, &insn, bytes); if (len == insn.length) { - DUMP_BYTES(RET, ((u8*)wr_addr), len, "%px: orig: ", addr); + DUMP_BYTES(RET, ((u8*)addr), len, "%px: orig: ", addr); DUMP_BYTES(RET, ((u8*)bytes), len, "%px: repl: ", addr); - text_poke_early(wr_addr, bytes, len); + text_poke_early(addr, bytes, len); } } } #else -void __init_or_module noinline apply_returns(s32 *start, s32 *end, - struct module *mod) { } +void __init_or_module noinline apply_returns(s32 *start, s32 *end) { } #endif /* CONFIG_MITIGATION_RETHUNK */
#else /* !CONFIG_MITIGATION_RETPOLINE || !CONFIG_OBJTOOL */
-void __init_or_module noinline apply_retpolines(s32 *start, s32 *end, - struct module *mod) { } -void __init_or_module noinline apply_returns(s32 *start, s32 *end, - struct module *mod) { } +void __init_or_module noinline apply_retpolines(s32 *start, s32 *end) { } +void __init_or_module noinline apply_returns(s32 *start, s32 *end) { }
#endif /* CONFIG_MITIGATION_RETPOLINE && CONFIG_OBJTOOL */
#ifdef CONFIG_X86_KERNEL_IBT
-static void poison_cfi(void *addr, void *wr_addr); +static void poison_cfi(void *addr);
-static void __init_or_module poison_endbr(void *addr, void *wr_addr, bool warn) +static void __init_or_module poison_endbr(void *addr, bool warn) { u32 endbr, poison = gen_endbr_poison();
- if (WARN_ON_ONCE(get_kernel_nofault(endbr, wr_addr))) + if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr))) return;
if (!is_endbr(endbr)) { @@ -889,7 +873,7 @@ static void __init_or_module poison_endbr(void *addr, void *wr_addr, bool warn) */ DUMP_BYTES(ENDBR, ((u8*)addr), 4, "%px: orig: ", addr); DUMP_BYTES(ENDBR, ((u8*)&poison), 4, "%px: repl: ", addr); - text_poke_early(wr_addr, &poison, 4); + text_poke_early(addr, &poison, 4); }
/* @@ -898,23 +882,22 @@ static void __init_or_module poison_endbr(void *addr, void *wr_addr, bool warn) * Seal the functions for indirect calls by clobbering the ENDBR instructions * and the kCFI hash value. */ -void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end, struct module *mod) +void __init_or_module noinline apply_seal_endbr(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr = module_writable_address(mod, addr);
- poison_endbr(addr, wr_addr, true); + poison_endbr(addr, true); if (IS_ENABLED(CONFIG_FINEIBT)) - poison_cfi(addr - 16, wr_addr - 16); + poison_cfi(addr - 16); } }
#else
-void __init_or_module apply_seal_endbr(s32 *start, s32 *end, struct module *mod) { } +void __init_or_module apply_seal_endbr(s32 *start, s32 *end) { }
#endif /* CONFIG_X86_KERNEL_IBT */
@@ -1136,7 +1119,7 @@ static u32 decode_caller_hash(void *addr) }
/* .retpoline_sites */ -static int cfi_disable_callers(s32 *start, s32 *end, struct module *mod) +static int cfi_disable_callers(s32 *start, s32 *end) { /* * Disable kCFI by patching in a JMP.d8, this leaves the hash immediate @@ -1148,23 +1131,20 @@ static int cfi_disable_callers(s32 *start, s32 *end, struct module *mod)
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr; u32 hash;
addr -= fineibt_caller_size; - wr_addr = module_writable_address(mod, addr); - hash = decode_caller_hash(wr_addr); - + hash = decode_caller_hash(addr); if (!hash) /* nocfi callers */ continue;
- text_poke_early(wr_addr, jmp, 2); + text_poke_early(addr, jmp, 2); }
return 0; }
-static int cfi_enable_callers(s32 *start, s32 *end, struct module *mod) +static int cfi_enable_callers(s32 *start, s32 *end) { /* * Re-enable kCFI, undo what cfi_disable_callers() did. @@ -1174,115 +1154,106 @@ static int cfi_enable_callers(s32 *start, s32 *end, struct module *mod)
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr; u32 hash;
addr -= fineibt_caller_size; - wr_addr = module_writable_address(mod, addr); - hash = decode_caller_hash(wr_addr); + hash = decode_caller_hash(addr); if (!hash) /* nocfi callers */ continue;
- text_poke_early(wr_addr, mov, 2); + text_poke_early(addr, mov, 2); }
return 0; }
/* .cfi_sites */ -static int cfi_rand_preamble(s32 *start, s32 *end, struct module *mod) +static int cfi_rand_preamble(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr = module_writable_address(mod, addr); u32 hash;
- hash = decode_preamble_hash(wr_addr); + hash = decode_preamble_hash(addr); if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n", addr, addr, 5, addr)) return -EINVAL;
hash = cfi_rehash(hash); - text_poke_early(wr_addr + 1, &hash, 4); + text_poke_early(addr + 1, &hash, 4); }
return 0; }
-static int cfi_rewrite_preamble(s32 *start, s32 *end, struct module *mod) +static int cfi_rewrite_preamble(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr = module_writable_address(mod, addr); u32 hash;
- hash = decode_preamble_hash(wr_addr); + hash = decode_preamble_hash(addr); if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n", addr, addr, 5, addr)) return -EINVAL;
- text_poke_early(wr_addr, fineibt_preamble_start, fineibt_preamble_size); - WARN_ON(*(u32 *)(wr_addr + fineibt_preamble_hash) != 0x12345678); - text_poke_early(wr_addr + fineibt_preamble_hash, &hash, 4); + text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size); + WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678); + text_poke_early(addr + fineibt_preamble_hash, &hash, 4); }
return 0; }
-static void cfi_rewrite_endbr(s32 *start, s32 *end, struct module *mod) +static void cfi_rewrite_endbr(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr = module_writable_address(mod, addr);
- poison_endbr(addr + 16, wr_addr + 16, false); + poison_endbr(addr+16, false); } }
/* .retpoline_sites */ -static int cfi_rand_callers(s32 *start, s32 *end, struct module *mod) +static int cfi_rand_callers(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr; u32 hash;
addr -= fineibt_caller_size; - wr_addr = module_writable_address(mod, addr); - hash = decode_caller_hash(wr_addr); + hash = decode_caller_hash(addr); if (hash) { hash = -cfi_rehash(hash); - text_poke_early(wr_addr + 2, &hash, 4); + text_poke_early(addr + 2, &hash, 4); } }
return 0; }
-static int cfi_rewrite_callers(s32 *start, s32 *end, struct module *mod) +static int cfi_rewrite_callers(s32 *start, s32 *end) { s32 *s;
for (s = start; s < end; s++) { void *addr = (void *)s + *s; - void *wr_addr; u32 hash;
addr -= fineibt_caller_size; - wr_addr = module_writable_address(mod, addr); - hash = decode_caller_hash(wr_addr); + hash = decode_caller_hash(addr); if (hash) { - text_poke_early(wr_addr, fineibt_caller_start, fineibt_caller_size); - WARN_ON(*(u32 *)(wr_addr + fineibt_caller_hash) != 0x12345678); - text_poke_early(wr_addr + fineibt_caller_hash, &hash, 4); + text_poke_early(addr, fineibt_caller_start, fineibt_caller_size); + WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678); + text_poke_early(addr + fineibt_caller_hash, &hash, 4); } /* rely on apply_retpolines() */ } @@ -1291,9 +1262,8 @@ static int cfi_rewrite_callers(s32 *start, s32 *end, struct module *mod) }
static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, - s32 *start_cfi, s32 *end_cfi, struct module *mod) + s32 *start_cfi, s32 *end_cfi, bool builtin) { - bool builtin = mod ? false : true; int ret;
if (WARN_ONCE(fineibt_preamble_size != 16, @@ -1311,7 +1281,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, * rewrite them. This disables all CFI. If this succeeds but any of the * later stages fails, we're without CFI. */ - ret = cfi_disable_callers(start_retpoline, end_retpoline, mod); + ret = cfi_disable_callers(start_retpoline, end_retpoline); if (ret) goto err;
@@ -1322,11 +1292,11 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, cfi_bpf_subprog_hash = cfi_rehash(cfi_bpf_subprog_hash); }
- ret = cfi_rand_preamble(start_cfi, end_cfi, mod); + ret = cfi_rand_preamble(start_cfi, end_cfi); if (ret) goto err;
- ret = cfi_rand_callers(start_retpoline, end_retpoline, mod); + ret = cfi_rand_callers(start_retpoline, end_retpoline); if (ret) goto err; } @@ -1338,7 +1308,7 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, return;
case CFI_KCFI: - ret = cfi_enable_callers(start_retpoline, end_retpoline, mod); + ret = cfi_enable_callers(start_retpoline, end_retpoline); if (ret) goto err;
@@ -1348,17 +1318,17 @@ static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
case CFI_FINEIBT: /* place the FineIBT preamble at func()-16 */ - ret = cfi_rewrite_preamble(start_cfi, end_cfi, mod); + ret = cfi_rewrite_preamble(start_cfi, end_cfi); if (ret) goto err;
/* rewrite the callers to target func()-16 */ - ret = cfi_rewrite_callers(start_retpoline, end_retpoline, mod); + ret = cfi_rewrite_callers(start_retpoline, end_retpoline); if (ret) goto err;
/* now that nobody targets func()+0, remove ENDBR there */ - cfi_rewrite_endbr(start_cfi, end_cfi, mod); + cfi_rewrite_endbr(start_cfi, end_cfi);
if (builtin) pr_info("Using FineIBT CFI\n"); @@ -1377,7 +1347,7 @@ static inline void poison_hash(void *addr) *(u32 *)addr = 0; }
-static void poison_cfi(void *addr, void *wr_addr) +static void poison_cfi(void *addr) { switch (cfi_mode) { case CFI_FINEIBT: @@ -1389,8 +1359,8 @@ static void poison_cfi(void *addr, void *wr_addr) * ud2 * 1: nop */ - poison_endbr(addr, wr_addr, false); - poison_hash(wr_addr + fineibt_preamble_hash); + poison_endbr(addr, false); + poison_hash(addr + fineibt_preamble_hash); break;
case CFI_KCFI: @@ -1399,7 +1369,7 @@ static void poison_cfi(void *addr, void *wr_addr) * movl $0, %eax * .skip 11, 0x90 */ - poison_hash(wr_addr + 1); + poison_hash(addr + 1); break;
default: @@ -1410,21 +1380,22 @@ static void poison_cfi(void *addr, void *wr_addr) #else
static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, - s32 *start_cfi, s32 *end_cfi, struct module *mod) + s32 *start_cfi, s32 *end_cfi, bool builtin) { }
#ifdef CONFIG_X86_KERNEL_IBT -static void poison_cfi(void *addr, void *wr_addr) { } +static void poison_cfi(void *addr) { } #endif
#endif
void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline, - s32 *start_cfi, s32 *end_cfi, struct module *mod) + s32 *start_cfi, s32 *end_cfi) { return __apply_fineibt(start_retpoline, end_retpoline, - start_cfi, end_cfi, mod); + start_cfi, end_cfi, + /* .builtin = */ false); }
#ifdef CONFIG_SMP @@ -1721,16 +1692,16 @@ void __init alternative_instructions(void) paravirt_set_cap();
__apply_fineibt(__retpoline_sites, __retpoline_sites_end, - __cfi_sites, __cfi_sites_end, NULL); + __cfi_sites, __cfi_sites_end, true);
/* * Rewrite the retpolines, must be done before alternatives since * those can rewrite the retpoline thunks. */ - apply_retpolines(__retpoline_sites, __retpoline_sites_end, NULL); - apply_returns(__return_sites, __return_sites_end, NULL); + apply_retpolines(__retpoline_sites, __retpoline_sites_end); + apply_returns(__return_sites, __return_sites_end);
- apply_alternatives(__alt_instructions, __alt_instructions_end, NULL); + apply_alternatives(__alt_instructions, __alt_instructions_end);
/* * Now all calls are established. Apply the call thunks if @@ -1741,7 +1712,7 @@ void __init alternative_instructions(void) /* * Seal all functions that do not have their address taken. */ - apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end, NULL); + apply_seal_endbr(__ibt_endbr_seal, __ibt_endbr_seal_end);
#ifdef CONFIG_SMP /* Patch to UP if other cpus not imminent. */ diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c index 4dd0ad6c94d6..adb09f78edb2 100644 --- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -118,13 +118,10 @@ ftrace_modify_code_direct(unsigned long ip, const char *old_code, return ret;
/* replace the text with the new text */ - if (ftrace_poke_late) { + if (ftrace_poke_late) text_poke_queue((void *)ip, new_code, MCOUNT_INSN_SIZE, NULL); - } else { - mutex_lock(&text_mutex); - text_poke((void *)ip, new_code, MCOUNT_INSN_SIZE); - mutex_unlock(&text_mutex); - } + else + text_poke_early((void *)ip, new_code, MCOUNT_INSN_SIZE); return 0; }
@@ -321,7 +318,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) unsigned const char op_ref[] = { 0x48, 0x8b, 0x15 }; unsigned const char retq[] = { RET_INSN_OPCODE, INT3_INSN_OPCODE }; union ftrace_op_code_union op_ptr; - void *ret; + int ret;
if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) { start_offset = (unsigned long)ftrace_regs_caller; @@ -352,15 +349,15 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) npages = DIV_ROUND_UP(*tramp_size, PAGE_SIZE);
/* Copy ftrace_caller onto the trampoline memory */ - ret = text_poke_copy(trampoline, (void *)start_offset, size); - if (WARN_ON(!ret)) + ret = copy_from_kernel_nofault(trampoline, (void *)start_offset, size); + if (WARN_ON(ret < 0)) goto fail;
ip = trampoline + size; if (cpu_feature_enabled(X86_FEATURE_RETHUNK)) __text_gen_insn(ip, JMP32_INSN_OPCODE, ip, x86_return_thunk, JMP32_INSN_SIZE); else - text_poke_copy(ip, retq, sizeof(retq)); + memcpy(ip, retq, sizeof(retq));
/* No need to test direct calls on created trampolines */ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) { @@ -368,7 +365,8 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) ip = trampoline + (jmp_offset - start_offset); if (WARN_ON(*(char *)ip != 0x75)) goto fail; - if (!text_poke_copy(ip, x86_nops[2], 2)) + ret = copy_from_kernel_nofault(ip, x86_nops[2], 2); + if (ret < 0) goto fail; }
@@ -381,7 +379,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) */
ptr = (unsigned long *)(trampoline + size + RET_SIZE); - text_poke_copy(ptr, &ops, sizeof(unsigned long)); + *ptr = (unsigned long)ops;
op_offset -= start_offset; memcpy(&op_ptr, trampoline + op_offset, OP_REF_SIZE); @@ -397,7 +395,7 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) op_ptr.offset = offset;
/* put in the new offset to the ftrace_ops */ - text_poke_copy(trampoline + op_offset, &op_ptr, OP_REF_SIZE); + memcpy(trampoline + op_offset, &op_ptr, OP_REF_SIZE);
/* put in the call to the function */ mutex_lock(&text_mutex); @@ -407,9 +405,9 @@ create_trampoline(struct ftrace_ops *ops, unsigned int *tramp_size) * the depth accounting before the call already. */ dest = ftrace_ops_get_func(ops); - text_poke_copy_locked(trampoline + call_offset, - text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest), - CALL_INSN_SIZE, false); + memcpy(trampoline + call_offset, + text_gen_insn(CALL_INSN_OPCODE, trampoline + call_offset, dest), + CALL_INSN_SIZE); mutex_unlock(&text_mutex);
/* ALLOC_TRAMP flags lets us know we created it */ diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c index 8984abd91c00..837450b6e882 100644 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@ -146,21 +146,18 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs, }
if (apply) { - void *wr_loc = module_writable_address(me, loc); - - if (memcmp(wr_loc, &zero, size)) { + if (memcmp(loc, &zero, size)) { pr_err("x86/modules: Invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n", (int)ELF64_R_TYPE(rel[i].r_info), loc, val); return -ENOEXEC; } - write(wr_loc, &val, size); + write(loc, &val, size); } else { if (memcmp(loc, &val, size)) { pr_warn("x86/modules: Invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n", (int)ELF64_R_TYPE(rel[i].r_info), loc, val); return -ENOEXEC; } - /* FIXME: needs care for ROX module allocations */ write(loc, &zero, size); } } @@ -227,7 +224,7 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *me) { - const Elf_Shdr *s, *alt = NULL, + const Elf_Shdr *s, *alt = NULL, *locks = NULL, *orc = NULL, *orc_ip = NULL, *retpolines = NULL, *returns = NULL, *ibt_endbr = NULL, *calls = NULL, *cfi = NULL; @@ -236,6 +233,8 @@ int module_finalize(const Elf_Ehdr *hdr, for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { if (!strcmp(".altinstructions", secstrings + s->sh_name)) alt = s; + if (!strcmp(".smp_locks", secstrings + s->sh_name)) + locks = s; if (!strcmp(".orc_unwind", secstrings + s->sh_name)) orc = s; if (!strcmp(".orc_unwind_ip", secstrings + s->sh_name)) @@ -266,20 +265,20 @@ int module_finalize(const Elf_Ehdr *hdr, csize = cfi->sh_size; }
- apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize, me); + apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize); } if (retpolines) { void *rseg = (void *)retpolines->sh_addr; - apply_retpolines(rseg, rseg + retpolines->sh_size, me); + apply_retpolines(rseg, rseg + retpolines->sh_size); } if (returns) { void *rseg = (void *)returns->sh_addr; - apply_returns(rseg, rseg + returns->sh_size, me); + apply_returns(rseg, rseg + returns->sh_size); } if (alt) { /* patch .altinstructions */ void *aseg = (void *)alt->sh_addr; - apply_alternatives(aseg, aseg + alt->sh_size, me); + apply_alternatives(aseg, aseg + alt->sh_size); } if (calls || alt) { struct callthunk_sites cs = {}; @@ -298,28 +297,8 @@ int module_finalize(const Elf_Ehdr *hdr, } if (ibt_endbr) { void *iseg = (void *)ibt_endbr->sh_addr; - apply_seal_endbr(iseg, iseg + ibt_endbr->sh_size, me); + apply_seal_endbr(iseg, iseg + ibt_endbr->sh_size); } - - if (orc && orc_ip) - unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size, - (void *)orc->sh_addr, orc->sh_size); - - return 0; -} - -int module_post_finalize(const Elf_Ehdr *hdr, - const Elf_Shdr *sechdrs, - struct module *me) -{ - const Elf_Shdr *s, *locks = NULL; - char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset; - - for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { - if (!strcmp(".smp_locks", secstrings + s->sh_name)) - locks = s; - } - if (locks) { void *lseg = (void *)locks->sh_addr; void *text = me->mem[MOD_TEXT].base; @@ -329,6 +308,10 @@ int module_post_finalize(const Elf_Ehdr *hdr, text, text_end); }
+ if (orc && orc_ip) + unwind_module_init(me, (void *)orc_ip->sh_addr, orc_ip->sh_size, + (void *)orc->sh_addr, orc->sh_size); + return 0; }
From: "Mike Rapoport (Microsoft)" rppt@kernel.org
module_writable_address() is unused and can be removed.
Signed-off-by: Mike Rapoport (Microsoft) rppt@kernel.org --- include/linux/module.h | 10 ---------- 1 file changed, 10 deletions(-)
diff --git a/include/linux/module.h b/include/linux/module.h index e9fc9d1fa476..222099bb07cf 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -774,11 +774,6 @@ static inline bool is_livepatch_module(struct module *mod)
void set_module_sig_enforced(void);
-static inline void *module_writable_address(struct module *mod, void *loc) -{ - return loc; -} - #else /* !CONFIG_MODULES... */
static inline struct module *__module_address(unsigned long addr) @@ -886,11 +881,6 @@ static inline bool module_is_coming(struct module *mod) { return false; } - -static inline void *module_writable_address(struct module *mod, void *loc) -{ - return loc; -} #endif /* CONFIG_MODULES */
#ifdef CONFIG_SYSFS
linux-kselftest-mirror@lists.linaro.org