Hi,
This is v3 of a patchset to set the number of cache leaves independently for each CPU. v1 and v2 can be found here [1] and here [2].
Changes since v2: * This version uncovered a NULL-pointer dereference in recent changes to cacheinfo[3]. This dereference is observed when the system does not configure cacheinfo early during boot nor makes corrections later during CPU hotplug; as is the case in x86. Patch 1 fixes this issue.
Changes since v1: * Dave Hansen suggested to use the existing per-CPU ci_cpu_cacheinfo variable. Now the global variable num_cache_leaves became useless. * While here, I noticed that init_cache_level() also became useless: x86 does not need ci_cpu_cacheinfo::num_levels.
[1]. https://lore.kernel.org/lkml/20230314231658.30169-1-ricardo.neri-calderon@li... [2]. https://lore.kernel.org/all/20230424001956.21434-1-ricardo.neri-calderon@lin... [3]. https://lore.kernel.org/all/20230412185759.755408-1-rrendec@redhat.com/
Ricardo Neri (3): cacheinfo: Allocate memory for memory if not done from the primary CPU x86/cacheinfo: Delete global num_cache_leaves x86/cacheinfo: Clean out init_cache_level()
arch/x86/kernel/cpu/cacheinfo.c | 50 ++++++++++++++++----------------- drivers/base/cacheinfo.c | 6 +++- 2 files changed, 30 insertions(+), 26 deletions(-)
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed. --- Changes since v2: * Introduced this patch.
Changes since v1: * N/A --- drivers/base/cacheinfo.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index cbae8be1fe52..461a77ece4b0 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -554,7 +554,11 @@ static inline int init_level_allocate_ci(unsigned int cpu) */ ci_cacheinfo(cpu)->early_ci_levels = false;
- if (cache_leaves(cpu) <= early_leaves) + /* + * Some architectures (e.g., x86) do not use early initialization. + * Allocate memory now in such case. + */ + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu)) return 0;
kfree(per_cpu_cacheinfo(cpu));
On Fri, 2023-08-04 at 18:24 -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Changes since v2: * Introduced this patch.
Changes since v1: * N/A
drivers/base/cacheinfo.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index cbae8be1fe52..461a77ece4b0 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -554,7 +554,11 @@ static inline int init_level_allocate_ci(unsigned int cpu) */ ci_cacheinfo(cpu)->early_ci_levels = false; - if (cache_leaves(cpu) <= early_leaves) + /* + * Some architectures (e.g., x86) do not use early initialization. + * Allocate memory now in such case. + */ + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu)) return 0; kfree(per_cpu_cacheinfo(cpu));
For this patch only:
Reviewed-by: Radu Rendec rrendec@redhat.com
Thanks for submitting!
Best regards, Radu
On Sat, Aug 05, 2023 at 10:28:30AM -0400, Radu Rendec wrote:
On Fri, 2023-08-04 at 18:24 -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Changes since v2: * Introduced this patch.
Changes since v1: * N/A
drivers/base/cacheinfo.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index cbae8be1fe52..461a77ece4b0 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -554,7 +554,11 @@ static inline int init_level_allocate_ci(unsigned int cpu) */ ci_cacheinfo(cpu)->early_ci_levels = false; - if (cache_leaves(cpu) <= early_leaves) + /* + * Some architectures (e.g., x86) do not use early initialization. + * Allocate memory now in such case. + */ + if (cache_leaves(cpu) <= early_leaves && per_cpu_cacheinfo(cpu)) return 0; kfree(per_cpu_cacheinfo(cpu));
For this patch only:
Reviewed-by: Radu Rendec rrendec@redhat.com
Thanks for submitting!
Thank you!
On Fri, Aug 04, 2023 at 06:24:19PM -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Not sure if we strictly need this(details below), but I am fine either way.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Right, this is the information I have been asking in the previous versions. This clarifies a lot. The trigger is in the patch 2/3 which is why it didn't make complete sense to me without it when you posted this patch independently. Thanks for posting it together and sorry for the delay(both reviewing this and in understanding the issue).
Given the trigger for NULL pointer dereference is in 2/3, I am not sure if it is really worth applying this to all the stable kernels with the commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU"). That is the reason why I asked to drop fixes tag if you agree with me. It is simple fix, so I am OK if you prefer to see that in the stable kernels as well.
Since there are x86 changes and patch 2/3 triggers NULL pointer dereference without this patch, I prefer you route all 3 via x86. So,
Reviewed-by: Sudeep Holla sudeep.holla@arm.com
On Wed, 2023-08-30 at 12:49 +0100, Sudeep Holla wrote:
On Fri, Aug 04, 2023 at 06:24:19PM -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Not sure if we strictly need this(details below), but I am fine either way.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Right, this is the information I have been asking in the previous versions. This clarifies a lot. The trigger is in the patch 2/3 which is why it didn't make complete sense to me without it when you posted this patch independently. Thanks for posting it together and sorry for the delay(both reviewing this and in understanding the issue).
Given the trigger for NULL pointer dereference is in 2/3, I am not sure if it is really worth applying this to all the stable kernels with the commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU"). That is the reason why I asked to drop fixes tag if you agree with me. It is simple fix, so I am OK if you prefer to see that in the stable kernels as well.
Thanks for reviewing, Sudeep. Since my previous commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") opens a door for the NULL pointer dereference, I would sleep better at night if the fix was included in the stable kernels :) But seriously, I am concerned that with the fix applied in mainline and not in stable, something else could be backported to the stable in the future, that could trigger the NULL pointer dereference there. Ricardo's patch 2/3 is one way to trigger it, but you never know what other patch lands in mainline in the future that assumes it's safe to set the cache leaves earlier.
Since there are x86 changes and patch 2/3 triggers NULL pointer dereference without this patch, I prefer you route all 3 via x86. So,
Reviewed-by: Sudeep Holla sudeep.holla@arm.com
-- Regards, Radu
On Wed, Aug 30, 2023 at 08:13:09AM -0400, Radu Rendec wrote:
On Wed, 2023-08-30 at 12:49 +0100, Sudeep Holla wrote:
On Fri, Aug 04, 2023 at 06:24:19PM -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Not sure if we strictly need this(details below), but I am fine either way.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Right, this is the information I have been asking in the previous versions. This clarifies a lot. The trigger is in the patch 2/3 which is why it didn't make complete sense to me without it when you posted this patch independently. Thanks for posting it together and sorry for the delay(both reviewing this and in understanding the issue).
Given the trigger for NULL pointer dereference is in 2/3, I am not sure if it is really worth applying this to all the stable kernels with the commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU"). That is the reason why I asked to drop fixes tag if you agree with me. It is simple fix, so I am OK if you prefer to see that in the stable kernels as well.
Thanks for reviewing, Sudeep. Since my previous commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") opens a door for the NULL pointer dereference, I would sleep better at night if the fix was included in the stable kernels :) But seriously, I am concerned that with the fix applied in mainline and not in stable, something else could be backported to the stable in the future, that could trigger the NULL pointer dereference there. Ricardo's patch 2/3 is one way to trigger it, but you never know what other patch lands in mainline in the future that assumes it's safe to set the cache leaves earlier.
Fair enough. I agree with you, so please retain the fixes tag as is. Please work with x86 maintainers to get it merged along with other patches. Let me know if you have other plans.
On Wed, 2023-08-30 at 16:47 +0100, Sudeep Holla wrote:
On Wed, Aug 30, 2023 at 08:13:09AM -0400, Radu Rendec wrote:
On Wed, 2023-08-30 at 12:49 +0100, Sudeep Holla wrote:
On Fri, Aug 04, 2023 at 06:24:19PM -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Not sure if we strictly need this(details below), but I am fine either way.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Right, this is the information I have been asking in the previous versions. This clarifies a lot. The trigger is in the patch 2/3 which is why it didn't make complete sense to me without it when you posted this patch independently. Thanks for posting it together and sorry for the delay(both reviewing this and in understanding the issue).
Given the trigger for NULL pointer dereference is in 2/3, I am not sure if it is really worth applying this to all the stable kernels with the commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU"). That is the reason why I asked to drop fixes tag if you agree with me. It is simple fix, so I am OK if you prefer to see that in the stable kernels as well.
Thanks for reviewing, Sudeep. Since my previous commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") opens a door for the NULL pointer dereference, I would sleep better at night if the fix was included in the stable kernels :) But seriously, I am concerned that with the fix applied in mainline and not in stable, something else could be backported to the stable in the future, that could trigger the NULL pointer dereference there. Ricardo's patch 2/3 is one way to trigger it, but you never know what other patch lands in mainline in the future that assumes it's safe to set the cache leaves earlier.
Fair enough. I agree with you, so please retain the fixes tag as is. Please work with x86 maintainers to get it merged along with other patches. Let me know if you have other plans.
Thanks, Sudeep. Technically, these are Ricardo's patches, so I will let him engage with the x86 maintainers and drive the integration work. But the plan looks good to me, and I will stand by and offer any support may be needed for the fix patch.
-- Regards, Radu
On Wed, Aug 30, 2023 at 12:45:22PM -0400, Radu Rendec wrote:
On Wed, 2023-08-30 at 16:47 +0100, Sudeep Holla wrote:
On Wed, Aug 30, 2023 at 08:13:09AM -0400, Radu Rendec wrote:
On Wed, 2023-08-30 at 12:49 +0100, Sudeep Holla wrote:
On Fri, Aug 04, 2023 at 06:24:19PM -0700, Ricardo Neri wrote:
Commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") adds functionality that architectures can use to optionally allocate and build cacheinfo early during boot. Commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") lets secondary CPUs correct (and reallocate memory) cacheinfo data if needed.
If the early build functionality is not used and cacheinfo does not need correction, memory for cacheinfo is never allocated. x86 does not use the early build functionality. Consequently, during the cacheinfo CPU hotplug callback, last_level_cache_is_valid() attempts to dereference a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not present page PGD 0 P4D 0 Oops: 0000 [#1] PREEPMT SMP NOPTI CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1 RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback if not done earlier.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Acked-by: Len Brown len.brown@intel.com Fixes: 6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
Not sure if we strictly need this(details below), but I am fine either way.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com
The motivation for commit 5944ce092b97 was to prevent a BUG splat in PREEMPT_RT kernels during memory allocation. This splat is not observed on x86 because the memory allocation for cacheinfo happens in detect_cache_attributes() from the cacheinfo CPU hotplug callback.
The dereference of a NULL pointer is not observed today because cache_leaves(cpu) is zero until after init_cache_level() is called (also during the CPU hotplug callback). Patch2 will set it earlier and the NULL- pointer dereference will be observed.
Right, this is the information I have been asking in the previous versions. This clarifies a lot. The trigger is in the patch 2/3 which is why it didn't make complete sense to me without it when you posted this patch independently. Thanks for posting it together and sorry for the delay(both reviewing this and in understanding the issue).
Given the trigger for NULL pointer dereference is in 2/3, I am not sure if it is really worth applying this to all the stable kernels with the commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU"). That is the reason why I asked to drop fixes tag if you agree with me. It is simple fix, so I am OK if you prefer to see that in the stable kernels as well.
Thanks for reviewing, Sudeep. Since my previous commit 6539cffa9495 ("cacheinfo: Add arch specific early level initializer") opens a door for the NULL pointer dereference, I would sleep better at night if the fix was included in the stable kernels :) But seriously, I am concerned that with the fix applied in mainline and not in stable, something else could be backported to the stable in the future, that could trigger the NULL pointer dereference there. Ricardo's patch 2/3 is one way to trigger it, but you never know what other patch lands in mainline in the future that assumes it's safe to set the cache leaves earlier.
Fair enough. I agree with you, so please retain the fixes tag as is. Please work with x86 maintainers to get it merged along with other patches. Let me know if you have other plans.
Thanks, Sudeep. Technically, these are Ricardo's patches, so I will let him engage with the x86 maintainers and drive the integration work. But the plan looks good to me, and I will stand by and offer any support may be needed for the fix patch.
Thank you very much Sudeep and Radu for your feedback and review! The x86 maintainers are in the To: field of this patchset.
The patches apply cleanly on top of the latest tip/master, but not on the latest rework of the topology evaluation from Thomas. Then I am not sure when/if this patchset will be merged.
Thanks and BR, Ricardo
Linux remembers cpu_cachinfo::num_leaves per CPU, but x86 initializes all CPUs from the same global "num_cache_leaves".
This is erroneous on systems like Meteor Lake, which has different num_leaves per CPU. Delete the global "num_cache_leaves" and initialize num_leaves accurately on each CPU.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Reviewed-by: Len Brown len.brown@intel.com Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- After this change, all CPUs will traverse CPUID leaf 0x4 when booted for the first time. On systems with symmetric cache topologies this is useless work.
Creating a list of processor models that have asymmetric cache topologies was considered. The burden of maintaining such list would outweigh the performance benefit of skipping this extra step. --- Changes since v2: * None
Changes since v1: * Do not make num_cache_leaves a per-CPU variable. Instead, reuse the existing per-CPU ci_cpu_cacheinfo variable. (Dave Hansen) --- arch/x86/kernel/cpu/cacheinfo.c | 45 ++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index 8f86eacf69f7..b4334c529231 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -178,7 +178,16 @@ struct _cpuid4_info_regs { struct amd_northbridge *nb; };
-static unsigned short num_cache_leaves; +static inline unsigned int get_num_cache_leaves(unsigned int cpu) +{ + return get_cpu_cacheinfo(cpu)->num_leaves; +} + +static inline void +set_num_cache_leaves(unsigned int nr_leaves, unsigned int cpu) +{ + get_cpu_cacheinfo(cpu)->num_leaves = nr_leaves; +}
/* AMD doesn't have CPUID4. Emulate it here to report the same information to the user. This makes some assumptions about the machine: @@ -718,19 +727,21 @@ void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu) void init_amd_cacheinfo(struct cpuinfo_x86 *c) {
+ unsigned int cpu = c->cpu_index; + if (boot_cpu_has(X86_FEATURE_TOPOEXT)) { - num_cache_leaves = find_num_cache_leaves(c); + set_num_cache_leaves(find_num_cache_leaves(c), cpu); } else if (c->extended_cpuid_level >= 0x80000006) { if (cpuid_edx(0x80000006) & 0xf000) - num_cache_leaves = 4; + set_num_cache_leaves(4, cpu); else - num_cache_leaves = 3; + set_num_cache_leaves(3, cpu); } }
void init_hygon_cacheinfo(struct cpuinfo_x86 *c) { - num_cache_leaves = find_num_cache_leaves(c); + set_num_cache_leaves(find_num_cache_leaves(c), c->cpu_index); }
void init_intel_cacheinfo(struct cpuinfo_x86 *c) @@ -740,24 +751,21 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) unsigned int new_l1d = 0, new_l1i = 0; /* Cache sizes from cpuid(4) */ unsigned int new_l2 = 0, new_l3 = 0, i; /* Cache sizes from cpuid(4) */ unsigned int l2_id = 0, l3_id = 0, num_threads_sharing, index_msb; -#ifdef CONFIG_SMP unsigned int cpu = c->cpu_index; -#endif
if (c->cpuid_level > 3) { - static int is_initialized; - - if (is_initialized == 0) { - /* Init num_cache_leaves from boot CPU */ - num_cache_leaves = find_num_cache_leaves(c); - is_initialized++; - } + /* + * There should be at least one leaf. A non-zero value means + * that the number of leaves has been initialized. + */ + if (!get_num_cache_leaves(cpu)) + set_num_cache_leaves(find_num_cache_leaves(c), cpu);
/* * Whenever possible use cpuid(4), deterministic cache * parameters cpuid leaf to find the cache details */ - for (i = 0; i < num_cache_leaves; i++) { + for (i = 0; i < get_num_cache_leaves(cpu); i++) { struct _cpuid4_info_regs this_leaf = {}; int retval;
@@ -793,14 +801,14 @@ void init_intel_cacheinfo(struct cpuinfo_x86 *c) * Don't use cpuid2 if cpuid4 is supported. For P4, we use cpuid2 for * trace cache */ - if ((num_cache_leaves == 0 || c->x86 == 15) && c->cpuid_level > 1) { + if ((!get_num_cache_leaves(cpu) || c->x86 == 15) && c->cpuid_level > 1) { /* supports eax=2 call */ int j, n; unsigned int regs[4]; unsigned char *dp = (unsigned char *)regs; int only_trace = 0;
- if (num_cache_leaves != 0 && c->x86 == 15) + if (get_num_cache_leaves(cpu) && c->x86 == 15) only_trace = 1;
/* Number of times to iterate */ @@ -1002,12 +1010,9 @@ int init_cache_level(unsigned int cpu) { struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu);
- if (!num_cache_leaves) - return -ENOENT; if (!this_cpu_ci) return -EINVAL; this_cpu_ci->num_levels = 3; - this_cpu_ci->num_leaves = num_cache_leaves; return 0; }
init_cache_level() no longer has a purpose on x86. It no longer needs to set num_leaves, and it never had to set num_levels, which was unnecessary on x86.
Replace it with "return 0" simply to override the weak function, which would return an error.
Cc: Andreas Herrmann aherrmann@suse.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Chen Yu yu.c.chen@intel.com Cc: Len Brown len.brown@intel.com Cc: Radu Rendec rrendec@redhat.com Cc: Pierre Gondois Pierre.Gondois@arm.com Cc: Pu Wen puwen@hygon.cn Cc: "Rafael J. Wysocki" rafael.j.wysocki@intel.com Cc: Sudeep Holla sudeep.holla@arm.com Cc: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: Will Deacon will@kernel.org Cc: Zhang Rui rui.zhang@intel.com Cc: linux-arm-kernel@lists.infradead.org Cc: stable@vger.kernel.org Reviewed-by: Len Brown len.brown@intel.com Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com --- Changes since v2: * None
Changes since v1: * Introduced this patch. --- arch/x86/kernel/cpu/cacheinfo.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c index b4334c529231..46a4a14fc96a 100644 --- a/arch/x86/kernel/cpu/cacheinfo.c +++ b/arch/x86/kernel/cpu/cacheinfo.c @@ -1008,11 +1008,6 @@ static void ci_leaf_init(struct cacheinfo *this_leaf,
int init_cache_level(unsigned int cpu) { - struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); - - if (!this_cpu_ci) - return -EINVAL; - this_cpu_ci->num_levels = 3; return 0; }
On Fri, Aug 04, 2023 at 06:24:18PM -0700, Ricardo Neri wrote:
Hi,
Hi Ricardo,
This is v3 of a patchset to set the number of cache leaves independently for each CPU. v1 and v2 can be found here [1] and here [2].
I am on CC of your patch set and glanced through it. Long ago I've touched related code but now I am not really up-to-date to do a qualified review in this area. First, I would have to look into documentation to refresh my memory etc. pp.
I've not seen (or it escaped me) information that this was tested on a variety of machines that might be affected by this change. And there are no Tested-by-tags.
Even if changes look simple and reasonable they can cause issues.
Thus from my POV it would be good to have some information what tests were done. I am not asking to test on all possible systems but just knowing which system(s) was (were) used for functional testing is of value.
Changes since v2:
- This version uncovered a NULL-pointer dereference in recent changes to cacheinfo[3]. This dereference is observed when the system does not configure cacheinfo early during boot nor makes corrections later during CPU hotplug; as is the case in x86. Patch 1 fixes this issue.
Changes since v1:
- Dave Hansen suggested to use the existing per-CPU ci_cpu_cacheinfo variable. Now the global variable num_cache_leaves became useless.
- While here, I noticed that init_cache_level() also became useless: x86 does not need ci_cpu_cacheinfo::num_levels.
[1]. https://lore.kernel.org/lkml/20230314231658.30169-1-ricardo.neri-calderon@li... [2]. https://lore.kernel.org/all/20230424001956.21434-1-ricardo.neri-calderon@lin... [3]. https://lore.kernel.org/all/20230412185759.755408-1-rrendec@redhat.com/
Ricardo Neri (3): cacheinfo: Allocate memory for memory if not done from the primary CPU x86/cacheinfo: Delete global num_cache_leaves x86/cacheinfo: Clean out init_cache_level()
arch/x86/kernel/cpu/cacheinfo.c | 50 ++++++++++++++++----------------- drivers/base/cacheinfo.c | 6 +++- 2 files changed, 30 insertions(+), 26 deletions(-)
-- 2.25.1
On Fri, Sep 01, 2023 at 08:50:31AM +0200, Andreas Herrmann wrote:
On Fri, Aug 04, 2023 at 06:24:18PM -0700, Ricardo Neri wrote:
Hi,
Hi Ricardo,
This is v3 of a patchset to set the number of cache leaves independently for each CPU. v1 and v2 can be found here [1] and here [2].
I am on CC of your patch set and glanced through it. Long ago I've touched related code but now I am not really up-to-date to do a qualified review in this area. First, I would have to look into documentation to refresh my memory etc. pp.
I've not seen (or it escaped me) information that this was tested on a variety of machines that might be affected by this change. And there are no Tested-by-tags.
Even if changes look simple and reasonable they can cause issues.
Thus from my POV it would be good to have some information what tests were done. I am not asking to test on all possible systems but just knowing which system(s) was (were) used for functional testing is of value.
Doing a good review -- trying to catch every flaw -- is really hard to do. Especially when you are not actively doing development work in an area.
For example see
commit e33b93650fc5 ("blk-iocost: Pass gendisk to ioc_refresh_params") [Breno Leitao leitao@debian.org, Tue Feb 28 03:16:54 2023 -0800]
This fixes commit
ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful") [Christoph Hellwig hch@lst.de, Fri Feb 3 16:03:54 2023 +0100]
I had reviewed the latter (see https://marc.info/?i=Y8plg6OAa4lrnyZZ@suselix) and the entire patch series. I've compared the original code with the patch and walked through every single hunk of the diff and tried to think it through. The changes made sense to me. Then came the bug report(s) and I felt that I had failed tremendously. To err is human.
What this shows (and it is already known): with every patch new errors are potentially introduced in the kernel. Functional, and higher level testing can help to spot them before a kernel is deployed in the field.
At a higher level view this proves another thing. Linux kernel development is a transparent example of "peer-review-process".
In our scientific age it is often postulated that peer review is the way to go[1] and that it kind of guarantees that what's published, or rolled out, is reasonable and "works".
The above sample shows that this process will not catch all flaws and that proper, transparent and reliable tests are required before anything is deployed in the field.
This is true for every branch of science.
If it is purposely not done something is fishy.
[1] Also some state that it is "the only way to go" and every thing figured out without a peer-review-process is false and can't be trusted. Of course this is a false statement.
On Fri, Sep 01, 2023 at 09:52:54AM +0200, Andreas Herrmann wrote:
On Fri, Sep 01, 2023 at 08:50:31AM +0200, Andreas Herrmann wrote:
On Fri, Aug 04, 2023 at 06:24:18PM -0700, Ricardo Neri wrote:
Hi,
Hi Ricardo,
This is v3 of a patchset to set the number of cache leaves independently for each CPU. v1 and v2 can be found here [1] and here [2].
I am on CC of your patch set and glanced through it. Long ago I've touched related code but now I am not really up-to-date to do a qualified review in this area. First, I would have to look into documentation to refresh my memory etc. pp.
I've not seen (or it escaped me) information that this was tested on a variety of machines that might be affected by this change. And there are no Tested-by-tags.
Even if changes look simple and reasonable they can cause issues.
Thus from my POV it would be good to have some information what tests were done. I am not asking to test on all possible systems but just knowing which system(s) was (were) used for functional testing is of value.
Doing a good review -- trying to catch every flaw -- is really hard to do. Especially when you are not actively doing development work in an area.
For example see
commit e33b93650fc5 ("blk-iocost: Pass gendisk to ioc_refresh_params") [Breno Leitao leitao@debian.org, Tue Feb 28 03:16:54 2023 -0800]
This fixes commit
ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful") [Christoph Hellwig hch@lst.de, Fri Feb 3 16:03:54 2023 +0100]
I had reviewed the latter (see https://marc.info/?i=Y8plg6OAa4lrnyZZ@suselix) and the entire patch series. I've compared the original code with the patch and walked through every single hunk of the diff and tried to think it through. The changes made sense to me. Then came the bug report(s) and I felt that I had failed tremendously. To err is human.
What this shows (and it is already known): with every patch new errors are potentially introduced in the kernel. Functional, and higher level testing can help to spot them before a kernel is deployed in the field.
At a higher level view this proves another thing. Linux kernel development is a transparent example of "peer-review-process".
In our scientific age it is often postulated that peer review is the way to go[1] and that it kind of guarantees that what's published, or rolled out, is reasonable and "works".
The above sample shows that this process will not catch all flaws and that proper, transparent and reliable tests are required before anything is deployed in the field.
This is true for every branch of science.
If it is purposely not done something is fishy.
[1] Also some state that it is "the only way to go" and every thing figured out without a peer-review-process is false and can't be trusted. Of course this is a false statement.
Hi Andreas,
Agreed. Testing is important. For the specific case of these patches, I booted CONFIG_PREEMPT_RT and !CONFIG_PREEMPT_RT kernels. Then I a) Ensured that the splat reported in commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") was not observed.
b) Ensured that /sys/devices/system/cpu/cpuX/cache is present.
c) Ensured that the contents /sys/devices/system/cpu/cpuX/cache is the same before and after my patches.
I tested on the following systems: Intel Alder Lake, Intel Meteor Lake, 2-socket Intel Icelake server, 2-socket Intel Cascade Lake server, 2-socket Intel Skylake server, 4-socket Intel Broadwell server, 2-socket Intel Haswell server, 2-socket AMD Rome server, and 2-socket AMD Milan server.
Thanks and BR, Ricardo
On Mon, Sep 11, 2023 at 08:23:50PM -0700, Ricardo Neri wrote:
Hi Andreas,
Agreed. Testing is important. For the specific case of these patches, I booted CONFIG_PREEMPT_RT and !CONFIG_PREEMPT_RT kernels. Then I a) Ensured that the splat reported in commit 5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU") was not observed.
b) Ensured that /sys/devices/system/cpu/cpuX/cache is present.
c) Ensured that the contents /sys/devices/system/cpu/cpuX/cache is the same before and after my patches.
I tested on the following systems: Intel Alder Lake, Intel Meteor Lake, 2-socket Intel Icelake server, 2-socket Intel Cascade Lake server, 2-socket Intel Skylake server, 4-socket Intel Broadwell server, 2-socket Intel Haswell server, 2-socket AMD Rome server, and 2-socket AMD Milan server.
Thanks and BR, Ricardo
Thanks for all the tests and info.
linux-stable-mirror@lists.linaro.org