On Tue, May 26, 2020 at 07:27:15AM -0700, Dave Hansen wrote:
On 5/25/20 8:08 AM, Kirill A. Shutemov wrote:
- if (not_addressable) {
	pr_err("%lldGB of physical memory is not addressable in the paging mode\n",
	       not_addressable >> 30);
	if (!pgtable_l5_enabled())
		pr_err("Consider enabling 5-level paging\n");
Could this happen at all when l5 is enabled? Does it mean we need kmap() for 64-bit?
It's future-proofing. Who knows what paging modes we will have in the future.
Future-proofing and firmware-proofing. :)
In any case, are we *really* limited to 52 bits of physical memory with 5-level paging?
Yes. It's architectural. SDM says "MAXPHYADDR is at most 52" (Vol 3A, 4.1.4).
I guess it can be extended with an opt-in feature and relevant changes to page table structure. But as of today there's no such thing.
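For reference, a minimal sketch of how the enumerated width relates to the architectural ceiling. It assumes the value being decoded is EAX from CPUID leaf 0x80000008, where EAX[7:0] is the physical address width and EAX[15:8] is the linear address width. The struct, the helper names and ARCH_MAX_PHYADDR are hypothetical and are not kernel identifiers.

```c
#include <stdint.h>

/* Hypothetical container for the widths reported in CPUID.80000008H:EAX. */
struct addr_widths {
	unsigned int phys_bits;	/* EAX[7:0]:  physical address width (MAXPHYADDR) */
	unsigned int virt_bits;	/* EAX[15:8]: linear address width */
};

/* Decode a raw EAX value; the caller is assumed to have executed CPUID. */
static struct addr_widths decode_addr_widths(uint32_t eax)
{
	struct addr_widths w = {
		.phys_bits = eax & 0xff,
		.virt_bits = (eax >> 8) & 0xff,
	};
	return w;
}

/* Architectural ceiling quoted from the SDM: "MAXPHYADDR is at most 52". */
#define ARCH_MAX_PHYADDR 52

/* Returns nonzero if the enumerated width fits the current architecture. */
static int phys_bits_within_arch_limit(uint32_t eax)
{
	return decode_addr_widths(eax).phys_bits <= ARCH_MAX_PHYADDR;
}
```

A firmware-proofing check along these lines would treat any enumerated width above 52 as something the current page table format cannot express.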
Previously, we said we were limited to 46 bits, and now we're saying that the limit is 52 with 5-level paging:
#define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 52 : 46)
The 46 was fine with the 48 bits of address space on 4-level paging systems since we need 1/2 of the address space for userspace, 1/4 for the direct map and 1/4 for the vmalloc-and-friends area. At 46 bits of address space, we fill up the direct map.
The hardware designers know this and never enumerated a MAXPHYADDR from CPUID which was higher than what we could cover with 46 bits. It was nice and convenient that these two separate things matched:
- The amount of physical address space addressable in a direct map consuming 1/4 of the virtual address space.
- The CPU-enumerated MAXPHYADDR which among other things dictates how much physical address space is addressable in a PTE.
But, with 5-level paging, things are a little different. The limit in addressable memory because of running out of the direct map actually happens at 55 bits (57-2=55, analogous to the 4-level 48-2=46).
So shouldn't it technically be this:
#define MAX_PHYSMEM_BITS (pgtable_l5_enabled() ? 55 : 46)
?
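The two limits being compared can be sketched as follows. This only restates the arithmetic above (a direct map taking 1/4 of the virtual address space, so virtual bits minus 2) next to the CPU-enumerated MAXPHYADDR. The function names are made up for illustration, and the sketch does not settle which value MAX_PHYSMEM_BITS should use.

```c
/* The direct map gets 1/4 of the virtual address space: virt_bits - 2. */
static unsigned int direct_map_bits(unsigned int virt_bits)
{
	return virt_bits - 2;
}

/*
 * Physical memory the kernel can actually reach is bounded by both the
 * direct map and what a PTE can encode (MAXPHYADDR), so take the smaller.
 */
static unsigned int usable_phys_bits(unsigned int virt_bits,
				     unsigned int maxphyaddr)
{
	unsigned int dm = direct_map_bits(virt_bits);

	return dm < maxphyaddr ? dm : maxphyaddr;
}
```

With 4-level paging (48 virtual bits) the direct map is the tighter bound at 46 bits. With 5-level paging (57 virtual bits) the direct map would allow 55 bits, so a MAXPHYADDR of 52 becomes the tighter bound.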
Bits above 52 are ignored in the page table entries and accessible to software. Some of them got claimed by HW features (XD-bit, protection keys), but such features require explicit opt-in on software side.
The kernel could claim bits 53-55 for the physical address, but it doesn't get us anything: if future HW provided such a feature, it would require opt-in. On the other hand, claiming them now means we cannot use them for other purposes as SW bits. I don't see the point.
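To make the bit layout concrete, here is a rough sketch of a 4KB PTE with MAXPHYADDR = 52, using the SDM's zero-based bit numbering (the address field is bits 51:12). The macro and function names are hypothetical and are not the kernel's own definitions. The protection key field is only interpreted by hardware once software opts in; otherwise those bits are ignored as well.

```c
#include <stdint.h>

/* Build a mask covering bits hi..lo inclusive (zero-based, hi <= 63). */
#define PTE_BITMASK(hi, lo) \
	((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))

#define PTE_ADDR_MASK_52	PTE_BITMASK(51, 12)	/* physical address bits */
#define PTE_IGNORED_MASK	PTE_BITMASK(58, 52)	/* ignored by HW, free for SW */
#define PTE_PKEY_MASK		PTE_BITMASK(62, 59)	/* protection key, opt-in */
#define PTE_XD_MASK		PTE_BITMASK(63, 63)	/* execute-disable, opt-in */

/* Extract the physical address of the mapped page from a raw PTE value. */
static uint64_t pte_phys_addr(uint64_t pte)
{
	return pte & PTE_ADDR_MASK_52;
}
```

Widening PTE_ADDR_MASK_52 by three bits would take those bits out of PTE_IGNORED_MASK, which is the trade-off described above.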