A 5-level paging capable machine can have memory above 46-bit in the physical address space. This memory is only addressable in the 5-level paging mode: we don't have enough virtual address space to create direct mapping for such memory in the 4-level paging mode.
Currently, we fail boot completely: NULL pointer dereference in subsection_map_init().
Skip creating a memblock for such memory instead and notify user that some memory is not addressable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org # v4.14
---
Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
---
 arch/x86/kernel/e820.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index c5399e80c59c..022fe1de8940 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1307,7 +1307,14 @@ void __init e820__memblock_setup(void)
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;

-		memblock_add(entry->addr, entry->size);
+		if (entry->addr >= MAXMEM || end >= MAXMEM)
+			pr_err_once("Some physical memory is not addressable in the paging mode.\n");
+
+		if (entry->addr >= MAXMEM)
+			continue;
+
+		end = min_t(u64, end, MAXMEM - 1);
+		memblock_add(entry->addr, end - entry->addr);
 	}

 	/* Throw away partial pages: */
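For anyone who wants to poke at the clamping outside the kernel tree, here is a rough userspace model of what the hunk above does to a single entry. It is only a sketch: struct sketch_entry, sketch_addressable_size() and SKETCH_MAXMEM_4LEVEL are made-up names, and the 1 << 46 limit is an assumption standing in for MAXMEM in the 4-level paging mode. It mirrors the patch as posted, including the clamp to MAXMEM - 1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for MAXMEM with 4-level paging (2^46 bytes, 64 TiB). */
#define SKETCH_MAXMEM_4LEVEL (UINT64_C(1) << 46)

/* Hypothetical stand-in for one usable e820 entry. */
struct sketch_entry {
	uint64_t addr;	/* start of the range */
	uint64_t size;	/* length of the range in bytes */
};

/*
 * Return the size that would be handed to memblock_add() for this entry,
 * or 0 if the entry starts at or above maxmem and is skipped entirely.
 */
static uint64_t sketch_addressable_size(const struct sketch_entry *e,
					uint64_t maxmem)
{
	uint64_t end;

	/* The model assumes addr + size does not wrap around. */
	assert(e->size <= UINT64_MAX - e->addr);
	end = e->addr + e->size;

	if (e->addr >= maxmem)
		return 0;

	/* Same clamp as the patch: end = min(end, MAXMEM - 1). */
	if (end > maxmem - 1)
		end = maxmem - 1;

	return end - e->addr;
}
```

An entry that lies fully below the limit keeps its size, an entry that straddles the limit is cut short, and an entry that starts at or above the limit contributes nothing.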
On 5/11/20 9:37 AM, Kirill A. Shutemov wrote:
> -		memblock_add(entry->addr, entry->size);
> +		if (entry->addr >= MAXMEM || end >= MAXMEM)
> +			pr_err_once("Some physical memory is not addressable in the paging mode.\n");
Hi Kirill,
Thanks for fixing this!
Could we make the pr_err() a bit more informative, though? It would be nice to print out how much memory (or which addresses at least) are being thrown away.
I was also thinking that it would be handy to tell folks how to rectify the situation. Should we perhaps dump out the runtime status of X86_FEATURE_LA57?
On Mon, May 11, 2020 at 09:43:30AM -0700, Dave Hansen wrote:
> On 5/11/20 9:37 AM, Kirill A. Shutemov wrote:
> > -		memblock_add(entry->addr, entry->size);
> > +		if (entry->addr >= MAXMEM || end >= MAXMEM)
> > +			pr_err_once("Some physical memory is not addressable in the paging mode.\n");
> Hi Kirill,
> Thanks for fixing this!
> Could we make the pr_err() a bit more informative, though? It would be nice to print out how much memory (or which addresses at least) are being thrown away.
> I was also thinking that it would be handy to tell folks how to rectify the situation. Should we perhaps dump out the runtime status of X86_FEATURE_LA57?
Something like this (incremental patch)?
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index 022fe1de8940..172b4244069f 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -1280,8 +1280,8 @@ void __init e820__memory_setup(void)

 void __init e820__memblock_setup(void)
 {
+	u64 size, end, not_addressable = 0;
 	int i;
-	u64 end;

 	/*
 	 * The bootstrap memblock region count maximum is 128 entries
@@ -1307,16 +1307,24 @@ void __init e820__memblock_setup(void)
 		if (entry->type != E820_TYPE_RAM && entry->type != E820_TYPE_RESERVED_KERN)
 			continue;

-		if (entry->addr >= MAXMEM || end >= MAXMEM)
-			pr_err_once("Some physical memory is not addressable in the paging mode.\n");
-
-		if (entry->addr >= MAXMEM)
+		if (entry->addr >= MAXMEM) {
+			not_addressable += entry->size;
 			continue;
+		}

 		end = min_t(u64, end, MAXMEM - 1);
+		size = end - entry->addr;
+		not_addressable += entry->size - size;
 		memblock_add(entry->addr, end - entry->addr);
 	}

+	if (not_addressable) {
+		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
+		       not_addressable >> 20);
+		if (!pgtable_l5_enabled())
+			pr_err("Consider enabling 5-level paging\n");
+	}
+
 	/* Throw away partial pages: */
 	memblock_trim_memory(PAGE_SIZE);
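The accounting in the incremental patch can be tried out in userspace with something like the following. Again this is only a sketch under assumptions: struct sketch_entry and sketch_not_addressable() are hypothetical names, the caller passes the limit that MAXMEM would have, and every entry is treated as usable RAM (the type check from the kernel loop is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one usable e820 entry. */
struct sketch_entry {
	uint64_t addr;	/* start of the range */
	uint64_t size;	/* length of the range in bytes */
};

/*
 * Sum up the bytes that would not be handed to memblock_add(), following
 * the loop in the incremental patch. Assumes addr + size never wraps.
 */
static uint64_t sketch_not_addressable(const struct sketch_entry *entries,
				       size_t n, uint64_t maxmem)
{
	uint64_t not_addressable = 0;
	size_t i;

	assert(maxmem > 0);

	for (i = 0; i < n; i++) {
		uint64_t end = entries[i].addr + entries[i].size;
		uint64_t size;

		/* Entry is entirely above the limit: all of it is lost. */
		if (entries[i].addr >= maxmem) {
			not_addressable += entries[i].size;
			continue;
		}

		/* Same clamp as the patch: end = min(end, MAXMEM - 1). */
		if (end > maxmem - 1)
			end = maxmem - 1;

		size = end - entries[i].addr;
		not_addressable += entries[i].size - size;
	}

	return not_addressable;
}
```

Because the clamp is to maxmem - 1 rather than maxmem, an entry that crosses the limit is charged one byte more than the part that sticks out. That extra byte disappears once the total is shifted down to MB for printing.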
On 5/11/20 10:04 AM, Kirill A. Shutemov wrote:
> +	if (not_addressable) {
> +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> +		       not_addressable >> 20);
> +		if (!pgtable_l5_enabled())
> +			pr_err("Consider enabling 5-level paging\n");
> +	}
Looks sane to me. Definitely good enough until we get the first bug reports from an end user about how they screwed this up in practice.
For the aggregate patch:
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
BTW, it's a shame that 0day and friends can't find stuff like this. I have the feeling we have more bugs like this coming.
> +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> +		       not_addressable >> 20);
Is "MB" the right unit for this? The problem seems to happen for systems with >64TB ... I doubt the unaddressable memory is just a couple of MBytes.
-Tony
On Mon, May 11, 2020 at 06:10:12PM +0000, Luck, Tony wrote:
> > +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> > +		       not_addressable >> 20);
> Is "MB" the right unit for this? The problem seems to happen for systems with >64TB ... I doubt the unaddressable memory is just a couple of MBytes.
Change it to GB?
> > > +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> > > +		       not_addressable >> 20);
> > Is "MB" the right unit for this? The problem seems to happen for systems with >64TB ... I doubt the unaddressable memory is just a couple of MBytes.
> Change it to GB?
I think it would be more readable.
[Maybe Linux needs a magic %p{something} that does auto-sizing to print in the most appropriate out of KB, MB, GB, TB, PB?]
-Tony
On Mon, May 11, 2020 at 06:58:21PM +0000, Luck, Tony wrote:
> > > > +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> > > > +		       not_addressable >> 20);
> > > Is "MB" the right unit for this? The problem seems to happen for systems with >64TB ... I doubt the unaddressable memory is just a couple of MBytes.
> > Change it to GB?
> I think it would be more readable.
> [Maybe Linux needs a magic %p{something} that does auto-sizing to print in the most appropriate out of KB, MB, GB, TB, PB?]
We have one in string_helpers.c.
On Tue, May 12, 2020 at 12:19:25AM +0300, Andy Shevchenko wrote:
> On Mon, May 11, 2020 at 06:58:21PM +0000, Luck, Tony wrote:
> > > > > +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> > > > > +		       not_addressable >> 20);
> > > > Is "MB" the right unit for this? The problem seems to happen for systems with >64TB ... I doubt the unaddressable memory is just a couple of MBytes.
> > > Change it to GB?
> > I think it would be more readable.
> > [Maybe Linux needs a magic %p{something} that does auto-sizing to print in the most appropriate out of KB, MB, GB, TB, PB?]
> We have one in string_helpers.c.
Ah, nice. So:
#include <linux/string_helpers.h>
char tmp[10]; /* Bother, no #define for this, just a comment in string_helpers.c */
string_get_size(not_addressable, 1, STRING_UNITS_2, tmp, sizeof(tmp));
pr_err("%s of physical memory is not addressable in the paging mode\n", tmp);
-Tony
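To make the auto-sizing idea concrete for readers without a kernel tree at hand, here is a much simplified userspace sketch of picking a power-of-two unit. It is not the kernel helper: sketch_format_size() is a hypothetical name, it rounds down to whole units, and the kernel's string_get_size() also prints a fractional part that this sketch omits.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a byte count using the largest power-of-two unit that keeps the
 * number below 1024, rounding down to a whole number of units.
 * Returns what snprintf() returns: the length the full string needs.
 */
static int sketch_format_size(uint64_t bytes, char *buf, size_t len)
{
	static const char *const units[] = {
		"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"
	};
	size_t i = 0;

	assert(buf != NULL && len > 0);

	/* Step up one unit per factor of 1024, stopping at the last unit. */
	while (bytes >= 1024 && i + 1 < sizeof(units) / sizeof(units[0])) {
		bytes >>= 10;
		i++;
	}

	return snprintf(buf, len, "%" PRIu64 " %s", bytes, units[i]);
}
```

With this, a 2^46 byte total comes out as "64 TiB" instead of a seven-digit MB count, which is the readability gain being discussed.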
On Mon, May 11, 2020 at 05:50:01PM -0700, Luck, Tony wrote:
> On Tue, May 12, 2020 at 12:19:25AM +0300, Andy Shevchenko wrote:
> > On Mon, May 11, 2020 at 06:58:21PM +0000, Luck, Tony wrote:
> > > > > > +		pr_err("%lldMB of physical memory is not addressable in the paging mode\n",
> > > > > > +		       not_addressable >> 20);
> > > > > Is "MB" the right unit for this? The problem seems to happen for systems with >64TB ... I doubt the unaddressable memory is just a couple of MBytes.
> > > > Change it to GB?
> > > I think it would be more readable.
> > > [Maybe Linux needs a magic %p{something} that does auto-sizing to print in the most appropriate out of KB, MB, GB, TB, PB?]
> > We have one in string_helpers.c.
> Ah, nice. So:
> #include <linux/string_helpers.h>
> char tmp[10]; /* Bother, no #define for this, just a comment in string_helpers.c */
> string_get_size(not_addressable, 1, STRING_UNITS_2, tmp, sizeof(tmp));
> pr_err("%s of physical memory is not addressable in the paging mode\n", tmp);
I've already submitted the patch upstream. I'll add it if a new revision is needed.
On Mon, May 11, 2020 at 07:37:06PM +0300, Kirill A. Shutemov wrote:
> A 5-level paging capable machine can have memory above 46-bit in the physical address space. This memory is only addressable in the 5-level paging mode: we don't have enough virtual address space to create direct mapping for such memory in the 4-level paging mode.
> Currently, we fail boot completely: NULL pointer dereference in subsection_map_init().
> Skip creating a memblock for such memory instead and notify user that some memory is not addressable.
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: stable@vger.kernel.org # v4.14
> Tested with a hacked QEMU: https://gist.github.com/kiryl/d45eb54110944ff95e544972d8bdac1d
BTW, I was only able to boot with legacy SeaBIOS, not with OVMF. No idea why.