[adding the people involved in developing and applying the culprit to the list of recipients]
FWIW, thread starts here: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjoseph...
On 02.03.24 09:24, Pavin Joseph wrote:
On 3/1/24 20:15, Linux regression tracking (Thorsten Leemhuis) wrote:
Does mainline show the same problem? The answer determines who later will have to look into this.
Yes, I reproduced the issue on mainline and the latest stable version 6.7.7 using your excellent guide.
Thx for testing and glad to hear. Still: if you have any feedback how to make that guide even better, please let me know!
With a bit of luck somebody might have heard about problems like yours. But if nobody comes up with an idea within a few days, we almost certainly need a bisection to get down to the root of the problem.
Full bisection done, culprit identified, and validated by reverting commit on mainline.
I assume the latter meant "reverting the culprit on mainline fixed the problem"; if you meant something else, please let us know.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
Attached bisection log and config used.
Bisection final results:
7143c5f4cf2073193eb27c9cdb84fd4655d1802d is the first bad commit
commit 7143c5f4cf2073193eb27c9cdb84fd4655d1802d
Author: Steve Wahl <steve.wahl@hpe.com>
Date:   Fri Jan 26 10:48:41 2024 -0600
x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
commit d794734c9bbfe22f86686dc2909c25f5ffe1a572 upstream.
When ident_pud_init() uses only gbpages to create identity maps, large ranges of addresses not actually requested can be included in the resulting table; a 4K request will map a full GB. On UV systems, this ends up including regions that will cause hardware to halt the system if accessed (these are marked "reserved" by BIOS). Even processor speculation into these regions is enough to trigger the system halt.
Only use gbpages when map creation requests include the full GB page of space. Fall back to using smaller 2M pages when only portions of a GB page are included in the request.
No attempt is made to coalesce mapping requests. If a request requires a map entry at the 2M (pmd) level, subsequent mapping requests within the same 1G region will also be at the pmd level, even if adjacent or overlapping such requests could have been combined to map a full gbpage. Existing usage starts with larger regions and then adds smaller regions, so this should not have any great consequence.
[ dhansen: fix up comment formatting, simplify changelog ]
Signed-off-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240126164841.170866-1-steve.wahl%40hpe.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 arch/x86/mm/ident_map.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)
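For anyone reading along without the patch at hand, here is a minimal userspace sketch of the rule the changelog above describes: a 1 GiB page is used only when the requested range covers the whole 1 GiB region. This is not the kernel code. GB_SIZE, GB_MASK, chunk_end(), may_use_gbpage() and sketch_examples() are names made up for illustration, and the sketch ignores already existing mappings and address overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants, not the kernel's PUD_SIZE/PUD_MASK definitions. */
#define GB_SIZE (1ULL << 30)      /* bytes covered by one gbpage */
#define GB_MASK (~(GB_SIZE - 1))  /* clears the offset inside a 1 GiB region */

/* Hypothetical helper: end of the part of [addr, end) that lies in addr's 1 GiB region. */
static uint64_t chunk_end(uint64_t addr, uint64_t end)
{
	uint64_t next = (addr & GB_MASK) + GB_SIZE;

	return next > end ? end : next;
}

/*
 * Hypothetical helper: may the chunk starting at addr be mapped with a single
 * gbpage? Only if gbpages are allowed at all and the chunk both starts and
 * ends on a 1 GiB boundary; otherwise the caller would fall back to 2M pages.
 */
static bool may_use_gbpage(uint64_t addr, uint64_t end, bool gbpages_allowed)
{
	uint64_t next;

	if (addr >= end)
		return false;  /* empty request, nothing to map */

	next = chunk_end(addr, end);

	return gbpages_allowed &&
	       (addr & ~GB_MASK) == 0 &&
	       (next & ~GB_MASK) == 0;
}

/* Usage examples for the cases mentioned in the changelog. */
void sketch_examples(void)
{
	/* A 4K request must not pull in a full GB. */
	assert(!may_use_gbpage(0x1000, 0x2000, true));
	/* A request covering exactly one aligned GB may use a gbpage. */
	assert(may_use_gbpage(GB_SIZE, 2 * GB_SIZE, true));
	/* The 4K tail past the second GB falls back to smaller pages. */
	assert(!may_use_gbpage(2 * GB_SIZE, 2 * GB_SIZE + 0x1000, true));
}
```

Whether a fallback to 2M pages along these lines is what changes the outcome for kexec on the affected machine is not established in this thread; the sketch only illustrates the mapping rule itself.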
Btw, the issue appears on LTS kernel 6.6.18 as well. I didn't build and test this one from source, but installed it a while back from the openSUSE Tumbleweed repos, where "kernel-longterm" is a new addition and is being actively tested.
P.S.:
#regzbot introduced d794734c9bbfe22f86686dc2909c25f5ffe1a572
#regzbot title x86/mm/ident_map: kexec now leads to reboot