Currently, load_microcode_amd() iterates over all NUMA nodes, retrieves their CPU masks and unconditonally accesses per-CPU data for the first CPU of each mask.
According to Documentation/admin-guide/mm/numaperf.rst: "Some memory may share the same node as a CPU, and others are provided as memory only nodes." Therefore, some node CPU masks may be empty and wouldn't have a "first CPU".
On a machine with far memory (and therefore CPU-less NUMA nodes): - cpumask_of_node(nid) is 0 - cpumask_first(0) is CONFIG_NR_CPUS - cpu_data(CONFIG_NR_CPUS) accesses the cpu_info per-CPU array at an index that is 1 out of bounds
This does not have any security implications since flashing microcode is a privileged operation but I believe this has reliability implications by potentially corrupting memory while flashing a microcode update.
When booting with CONFIG_UBSAN_BOUNDS=y on an AMD machine that flashes a microcode update. I get the following splat:
UBSAN: array-index-out-of-bounds in arch/x86/kernel/cpu/microcode/amd.c:X:Y index 512 is out of range for type 'unsigned long[512]' [...] Call Trace: dump_stack+0xdb/0x143 __ubsan_handle_out_of_bounds+0xf5/0x120 load_microcode_amd+0x58f/0x6b0 request_microcode_amd+0x17c/0x250 reload_store+0x174/0x2b0 kernfs_fop_write_iter+0x227/0x2d0 vfs_write+0x322/0x510 ksys_write+0xb5/0x160 do_syscall_64+0x6b/0xa0 entry_SYSCALL_64_after_hwframe+0x67/0xd1
This patch checks that a NUMA node has CPUs before attempting to update its first CPU's microcode.
Fixes: 7ff6edf4fef3 ("x86/microcode/AMD: Fix mixed steppings support") Signed-off-by: Florent Revest revest@chromium.org Cc: stable@vger.kernel.org --- arch/x86/kernel/cpu/microcode/amd.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c index 95ac1c6a84fbe..7c06425edc1b5 100644 --- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -1059,6 +1059,7 @@ static enum ucode_state _load_microcode_amd(u8 family, const u8 *data, size_t si
static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t size) { + const struct cpumask *mask; struct cpuinfo_x86 *c; unsigned int nid, cpu; struct ucode_patch *p; @@ -1069,7 +1070,11 @@ static enum ucode_state load_microcode_amd(u8 family, const u8 *data, size_t siz return ret;
for_each_node(nid) { - cpu = cpumask_first(cpumask_of_node(nid)); + mask = cpumask_of_node(nid); + if (cpumask_empty(mask)) + continue; + + cpu = cpumask_first(mask); c = &cpu_data(cpu);
p = find_patch(cpu);
On Fri, Mar 7, 2025 at 3:55 PM Dave Hansen dave.hansen@intel.com wrote:
On 3/7/25 05:12, Florent Revest wrote:
for_each_node(nid) {
cpu = cpumask_first(cpumask_of_node(nid));
mask = cpumask_of_node(nid);
if (cpumask_empty(mask))
continue;
cpu = cpumask_first(mask);
Would for_each_node_with_cpus() trim this down a bit?
Oh nice, I didn't notice this macro, thanks for pointing it out! :) I'm happy to respin a v2 using for_each_node_with_cpus(), I'll just leave a bit more time in case there are other comments.
One thing I'm not entirely sure about is that for_each_node_with_cpus() is implemented on top of for_each_online_node(). This differs from the current code which uses for_each_node(). I can't tell if iterating over offline nodes is a bug or a feature of load_microcode_amd() so this would be an extra change to the business logic which I can't really explain/justify.
On 3/7/25 07:58, Florent Revest wrote:
One thing I'm not entirely sure about is that for_each_node_with_cpus() is implemented on top of for_each_online_node(). This differs from the current code which uses for_each_node(). I can't tell if iterating over offline nodes is a bug or a feature of load_microcode_amd() so this would be an extra change to the business logic which I can't really explain/justify.
Actually, the per-node caches seem to have gone away at some point too. Boris would know the history. This might need a a cleanup like Boris alluded to in 05e91e7211383. This might not even need a nid loop.
On Fri, Mar 07, 2025 at 08:32:20AM -0800, Dave Hansen wrote:
On 3/7/25 07:58, Florent Revest wrote:
One thing I'm not entirely sure about is that for_each_node_with_cpus() is implemented on top of for_each_online_node(). This differs from the current code which uses for_each_node(). I can't tell if iterating over offline nodes is a bug
You better not have offlined nodes when applying microcode. The path you're landing in here has already hotplug disabled, tho.
or a feature of load_microcode_amd() so this would be an extra change to the business logic which I can't really explain/justify.
Actually, the per-node caches seem to have gone away at some point too. Boris would know the history. This might need a a cleanup like Boris alluded to in 05e91e7211383. This might not even need a nid loop.
Nah, the cache is still there. For now...
for_each_node_with_cpus() should simply work unless I'm missing some other angle...
On Fri, Mar 7, 2025 at 5:44 PM Borislav Petkov bp@alien8.de wrote:
On Fri, Mar 07, 2025 at 08:32:20AM -0800, Dave Hansen wrote:
On 3/7/25 07:58, Florent Revest wrote:
One thing I'm not entirely sure about is that for_each_node_with_cpus() is implemented on top of for_each_online_node(). This differs from the current code which uses for_each_node(). I can't tell if iterating over offline nodes is a bug
You better not have offlined nodes when applying microcode. The path you're landing in here has already hotplug disabled, tho.
or a feature of load_microcode_amd() so this would be an extra change to the business logic which I can't really explain/justify.
Actually, the per-node caches seem to have gone away at some point too. Boris would know the history. This might need a a cleanup like Boris alluded to in 05e91e7211383. This might not even need a nid loop.
Nah, the cache is still there. For now...
for_each_node_with_cpus() should simply work unless I'm missing some other angle...
Awesome - thank you both! I'll send a v2 using for_each_node_with_cpus() ... On Monday :) Have a good weekend!
…
This patch checks that a NUMA node …
Please improve such a change description another bit.
See also: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/...
Regards, Markus
On Fri, Mar 07, 2025 at 08:18:09PM +0100, Markus Elfring wrote:
…
This patch checks that a NUMA node …
Please improve such a change description another bit.
See also: https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/...
Regards, Markus
Hi,
This is the semi-friendly patch-bot of Greg Kroah-Hartman.
Markus, you seem to have sent a nonsensical or otherwise pointless review comment to a patch submission on a Linux kernel developer mailing list. I strongly suggest that you not do this anymore. Please do not bother developers who are actively working to produce patches and features with comments that, in the end, are a waste of time.
Patch submitter, please ignore Markus's suggestion; you do not need to follow it at all. The person/bot/AI that sent it is being ignored by almost all Linux kernel maintainers for having a persistent pattern of behavior of producing distracting and pointless commentary, and inability to adapt to feedback. Please feel free to also ignore emails from them.
thanks,
greg k-h's patch email bot
linux-stable-mirror@lists.linaro.org