On 11/11/25 10:07 am, Dev Jain wrote:
On 11/11/25 9:47 am, Yang Shi wrote:
On 11/10/25 7:39 PM, Dev Jain wrote:
On 05/11/25 9:27 am, Dev Jain wrote:
On 04/11/25 6:26 pm, Will Deacon wrote:
On Tue, Nov 04, 2025 at 09:06:12AM +0530, Dev Jain wrote:
On 04/11/25 12:15 am, Yang Shi wrote:
> On 11/3/25 7:16 AM, Will Deacon wrote:
>> On Mon, Nov 03, 2025 at 11:43:06AM +0530, Dev Jain wrote:
>>> Post a166563e7ec3 ("arm64: mm: support large block mapping when
>>> rodata=full"), __change_memory_common has a real chance of failing
>>> due to split failure. Before that commit, this line was introduced
>>> in c55191e96caa, still having a chance of failing if it needs to
>>> allocate pagetable memory in apply_to_page_range, although that has
>>> never been observed to be true. In general, we should always
>>> propagate the return value to the caller.
>>>
>>> Cc: stable@vger.kernel.org
>>> Fixes: c55191e96caa ("arm64: mm: apply r/o permissions of VM areas to its linear alias as well")
>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>> ---
>>> Based on Linux 6.18-rc4.
>>>
>>>  arch/arm64/mm/pageattr.c | 5 ++++-
>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>> index 5135f2d66958..b4ea86cd3a71 100644
>>> --- a/arch/arm64/mm/pageattr.c
>>> +++ b/arch/arm64/mm/pageattr.c
>>> @@ -148,6 +148,7 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>  	unsigned long size = PAGE_SIZE * numpages;
>>>  	unsigned long end = start + size;
>>>  	struct vm_struct *area;
>>> +	int ret;
>>>  	int i;
>>>
>>>  	if (!PAGE_ALIGNED(addr)) {
>>> @@ -185,8 +186,10 @@ static int change_memory_common(unsigned long addr, int numpages,
>>>  	if (rodata_full && (pgprot_val(set_mask) == PTE_RDONLY ||
>>>  			    pgprot_val(clear_mask) == PTE_RDONLY)) {
>>>  		for (i = 0; i < area->nr_pages; i++) {
>>> -			__change_memory_common((u64)page_address(area->pages[i]),
>>> +			ret = __change_memory_common((u64)page_address(area->pages[i]),
>>>  					       PAGE_SIZE, set_mask, clear_mask);
>>> +			if (ret)
>>> +				return ret;
>>
>> Hmm, this means we can return failure half-way through the operation.
>> Is that something callers are expecting to handle? If so, how can they
>> tell how far we got?
>
> IIUC the callers don't have to know whether it is half-way or not
> because the callers will change the permission back (e.g. to RW) for
> the whole range when freeing memory.

Yes, it is the caller's responsibility to set the VM_FLUSH_RESET_PERMS
flag. Upon vfree(), it will change the direct map permissions back to RW.
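For reference, the reset path in question looks roughly like this (a
simplified sketch modelled on vm_reset_perms()/set_area_direct_map() in
mm/vmalloc.c; huge page handling and the exact flush range computation
are abbreviated, and note that neither set_direct_map_*() return value
is checked):

	/* Simplified sketch; not the verbatim mm/vmalloc.c code. */
	static void set_area_direct_map(const struct vm_struct *area,
					int (*set_direct_map)(struct page *page))
	{
		int i;

		for (i = 0; i < area->nr_pages; i++)
			if (page_address(area->pages[i]))
				set_direct_map(area->pages[i]);	/* return value dropped */
	}

	static void vm_reset_perms(struct vm_struct *area)
	{
		unsigned long start, end;	/* in the real code, computed over area->pages */

		/*
		 * Invalidate the linear alias so stale entries cannot be
		 * cached across the change, flush the TLB, then restore the
		 * default (RW) permissions. Neither set_area_direct_map()
		 * call is checked for failure.
		 */
		set_area_direct_map(area, set_direct_map_invalid_noflush);
		flush_tlb_kernel_range(start, end);
		set_area_direct_map(area, set_direct_map_default_noflush);
	}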
Ok, but vfree() ends up using update_range_prot() to do that and if we need to worry about that failing (as per your commit message), then we're in trouble because the calls to set_area_direct_map() are unchecked.
In other words, this patch is either not necessary or it is incomplete.
Here is the relevant email, in the discussion between Ryan and Yang:
https://lore.kernel.org/all/fe52a1d8-5211-4962-afc8-c3f9caf64119@os.ampereco...
We had concluded that all callers of set_memory_ro() or set_memory_rox() (which require the linear map permissions to be changed back to default upon vfree()) call it for the entire region (vm_struct). So, by the time we do the set_direct_map_invalid_noflush(), the region is guaranteed to have already been split, and this call cannot fail. The typical caller pattern looks roughly like the sketch below.
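(Illustrative only; real call sites such as the BPF JIT differ in detail:)

	void *buf = __vmalloc(size, GFP_KERNEL);

	set_vm_flush_reset_perms(buf);		/* sets VM_FLUSH_RESET_PERMS on the area */
	set_memory_rox((unsigned long)buf,	/* always covers the whole region,  */
		       size >> PAGE_SHIFT);	/* so the whole alias gets split    */

	/* ... use the memory ... */

	vfree(buf);	/* resets the linear alias of the whole region back to RW */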
https://lore.kernel.org/all/f8898c87-8f49-4ef2-86ae-b60bcf67658c@os.ampereco...
This email notes that there is some code doing set_memory_rw() and unnecessarily setting the VM_FLUSH_RESET_PERMS flag, but in that case we don't care about the set_direct_map_invalid_noflush() call failing, because the protections are already RW.
That said, we had also observed that all of this is fragile and depends on the caller doing the correct thing. The real solution would be to somehow get rid of the BBM-style invalidation; Ryan had proposed some methods in that email thread.
One solution I had thought of: observe that we are doing overkill by setting the linear map to invalid and then back to default for the *entire* region. What we can do instead is iterate over the linear map alias of the vm_struct *area and only change permissions back to RW for the pages which are *not* RW. Those mappings are guaranteed to have been split, because they were changed from RW to non-RW in the first place. Roughly the sketch below.
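(Purely a sketch of the idea; linear_map_pte_is_rw() is a hypothetical
helper that would inspect the page's linear map PTE, it does not exist
in the tree today:)

	static void reset_nonrw_pages_only(struct vm_struct *area)
	{
		int i;

		for (i = 0; i < area->nr_pages; i++) {
			struct page *page = area->pages[i];

			/* hypothetical helper, not an existing API */
			if (!linear_map_pte_is_rw(page))
				/*
				 * Cannot fail: this mapping was already split
				 * when it was changed from RW to non-RW.
				 */
				set_direct_map_default_noflush(page);
		}
	}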
@Yang and Ryan,
I saw Yang's patch here: https://lore.kernel.org/all/20251023204428.477531-1-yang@os.amperecomputing....
and realized that currently we are splitting away the linear map alias of the *entire* region.
Shouldn't this then imply that set_direct_map_invalid_noflush will never fail, since even
a set_memory_rox() call on a single page will split the linear map for the entire region,
and thus none of the fragility we were discussing exists here? I may be
forgetting something; this linear map stuff is confusing enough already.
It may still fail due to page table allocation failure when doing the split. But that is still fine. We may run into three cases:

1. set_memory_rox succeeds in splitting the whole range; then
   set_direct_map_invalid_noflush() will succeed too.
2. set_memory_rox fails to split, for example it changes only part of
   the range's permissions due to page table allocation failure; then
   set_direct_map_invalid_noflush() may:
   a. successfully change the permission back to default up to the point
      where set_memory_rox failed, since that range has been successfully
      split. This is ok, since the remaining range was never changed to
      ro by set_memory_rox in the first place.
   b. successfully change the permission back to default for the whole
      range (for example, the memory pressure was mitigated by the time
      set_direct_map_invalid_noflush() is called). This is definitely
      fine as well.
Correct. What I mean to imply here is: won't your patch break this? If set_memory_* is applied on [x, y], your patch changes the linear map alias only from x to y, whereas set_direct_map_invalid_noflush() operates on the whole region [0, size - 1], where 0 <= x <= y <= size - 1. So it may encounter a -ENOMEM on the [0, x) range while invalidating, and that is *not* okay, because we must reset [0, x) back to default?

Okay, I realize this is wrong. Please read the last bit as "and this is okay because [0, x) is default already" (those pages were never changed away from RW, so there is nothing to reset), and therefore your patch is correct.
Let me send a patch documenting this so we don't get confused in the future; it is easy to forget all of this.
Hopefully I don't miss anything.
Thanks, Yang
Will