Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which can require large contiguous memory (up to order=9) depending on the implementation. This change prevents allocation failures by allowing the system to fall back to vmalloc when contiguous memory allocation fails.
Since this buffer is only used for debugging purposes, physical memory contiguity is not required, making vmalloc a suitable alternative.
Cc: stable@vger.kernel.org
Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
Suggested-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Andrea Righi <arighi@nvidia.com>
---
Changes in v2:
- Use kvfree() on the free path as well.
- Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
---
 kernel/sched/ext.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 66bcd40a28ca1..db9af6a3c04fd 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)

 static void free_exit_info(struct scx_exit_info *ei)
 {
-	kfree(ei->dump);
+	kvfree(ei->dump);
 	kfree(ei->msg);
 	kfree(ei->bt);
 	kfree(ei);
@@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)

 	ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
 	ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
-	ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
+	ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);

 	if (!ei->bt || !ei->msg || !ei->dump) {
 		free_exit_info(ei);
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250407-scx-11dbf94803c3
Best regards,
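For readers less familiar with the kv* helpers, the pattern this patch relies on can be modeled in userspace. This is only a rough sketch, not kernel code: every sketch_* name is hypothetical, calloc() stands in for both allocators, and the contig_limit parameter is an assumption used purely to simulate a contiguous allocation failing. The real kvfree() picks the right free routine by checking whether the address lies in the vmalloc range, rather than by carrying a flag as done here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical handle: records which path produced the buffer. */
struct sketch_buf {
	void *ptr;
	bool from_fallback;
};

/*
 * Stand-in for a physically contiguous allocator (kzalloc-like).
 * Requests larger than contig_limit fail, simulating fragmentation.
 */
static void *sketch_contig_zalloc(size_t len, size_t contig_limit)
{
	return len <= contig_limit ? calloc(1, len) : NULL;
}

/* Stand-in for a virtually contiguous allocator (vzalloc-like). */
static void *sketch_fallback_zalloc(size_t len)
{
	return calloc(1, len);
}

/* Try the contiguous path first, then fall back (kvzalloc-like). */
static struct sketch_buf sketch_kvzalloc(size_t len, size_t contig_limit)
{
	struct sketch_buf buf = {
		.ptr = sketch_contig_zalloc(len, contig_limit),
		.from_fallback = false,
	};

	if (!buf.ptr) {
		buf.ptr = sketch_fallback_zalloc(len);
		buf.from_fallback = buf.ptr != NULL;
	}
	return buf;
}

/* One free routine that is valid for either path (kvfree-like). */
static void sketch_kvfree(struct sketch_buf *buf)
{
	free(buf->ptr);
	buf->ptr = NULL;
	buf->from_fallback = false;
}
```

The point of the model is the pairing: once the allocation side may take the fallback path, the free side has to accept memory from either path, which is what the v2 change to kvfree() addresses.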
Hi Breno,
I already acked even the buggy version, so this one looks good. :)
On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which can require large contiguous memory (up to order=9) depending on the
BTW, from where this order=9 is coming from? exit_dump_len is 32K by default, but a BPF scheduler can arbitrarily set it to any value via ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
Thanks, -Andrea
> implementation. This change prevents allocation failures by allowing the system to fall back to vmalloc when contiguous memory allocation fails.
>
> Since this buffer is only used for debugging purposes, physical memory contiguity is not required, making vmalloc a suitable alternative.
>
> Cc: stable@vger.kernel.org
> Fixes: 07814a9439a3b0 ("sched_ext: Print debug dump after an error exit")
> Suggested-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> Acked-by: Andrea Righi <arighi@nvidia.com>
> ---
> Changes in v2:
> - Use kvfree() on the free path as well.
> - Link to v1: https://lore.kernel.org/r/20250407-scx-v1-1-774ba74a2c17@debian.org
> ---
>  kernel/sched/ext.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 66bcd40a28ca1..db9af6a3c04fd 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4623,7 +4623,7 @@ static void scx_ops_bypass(bool bypass)
>
>  static void free_exit_info(struct scx_exit_info *ei)
>  {
> -	kfree(ei->dump);
> +	kvfree(ei->dump);
>  	kfree(ei->msg);
>  	kfree(ei->bt);
>  	kfree(ei);
> @@ -4639,7 +4639,7 @@ static struct scx_exit_info *alloc_exit_info(size_t exit_dump_len)
>
>  	ei->bt = kcalloc(SCX_EXIT_BT_LEN, sizeof(ei->bt[0]), GFP_KERNEL);
>  	ei->msg = kzalloc(SCX_EXIT_MSG_LEN, GFP_KERNEL);
> -	ei->dump = kzalloc(exit_dump_len, GFP_KERNEL);
> +	ei->dump = kvzalloc(exit_dump_len, GFP_KERNEL);
>
>  	if (!ei->bt || !ei->msg || !ei->dump) {
>  		free_exit_info(ei);
> ---
> base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
> change-id: 20250407-scx-11dbf94803c3
>
> Best regards,
> Breno Leitao <leitao@debian.org>
Hello Andrea,
On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> Hi Breno,
>
> I already acked even the buggy version, so this one looks good. :)
>
> On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which can require large contiguous memory (up to order=9) depending on the
>
> BTW, from where this order=9 is coming from? exit_dump_len is 32K by default, but a BPF scheduler can arbitrarily set it to any value via ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
You are absolutely correct, this allocation could be of any size.
I've got this problem because I was monitoring the Meta fleet, and saw a bunch of allocation failures and decided to investigate. In this case specifically, the users were using order=9 (512 pages), but, again, this could be even bigger.
Thanks for the review, --breno
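To put the sizes mentioned above in perspective, here is a small worked sketch of the order arithmetic. It assumes 4 KiB pages, which is an assumption and not something stated in the thread, and the sketch_* helpers are hypothetical userspace stand-ins for the rounding the kernel's get_order() performs. Under that assumption, the 32K default is an order-3 request (8 pages), while order=9 means 512 contiguous pages, i.e. 2 MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration only: 4 KiB. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Smallest order such that (SKETCH_PAGE_SIZE << order) >= size.
 * The second loop condition stops the doubling before it could overflow.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t span = SKETCH_PAGE_SIZE;

	while (span < size && span <= SIZE_MAX / 2) {
		span <<= 1;
		order++;
	}
	return order;
}

/* Number of contiguous pages an allocation of the given order needs. */
static size_t sketch_order_to_pages(unsigned int order)
{
	return (size_t)1 << order;
}

/* Total bytes covered by an allocation of the given order. */
static size_t sketch_order_to_bytes(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}
```

Because a BPF scheduler can set ops->exit_dump_len to any value, a request even one byte above 2 MiB would round up to the next order under the same assumption, which is why the size is not really bounded by order 9.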
On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> Hello Andrea,
>
> On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > Hi Breno,
> >
> > I already acked even the buggy version, so this one looks good. :)
> >
> > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which can require large contiguous memory (up to order=9) depending on the
> >
> > BTW, from where this order=9 is coming from? exit_dump_len is 32K by default, but a BPF scheduler can arbitrarily set it to any value via ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
>
> You are absolutely correct, this allocation could be of any size.
>
> I've got this problem because I was monitoring the Meta fleet, and saw a bunch of allocation failures and decided to investigate. In this case specifically, the users were using order=9 (512 pages), but, again, this could be even bigger.
I see, makes sense. Maybe we can rephrase this part to not mention the order=9 allocation and avoid potential confusion.
Thanks, -Andrea
On Tue, Apr 08, 2025 at 03:12:43PM +0200, Andrea Righi wrote:
> On Tue, Apr 08, 2025 at 05:17:16AM -0700, Breno Leitao wrote:
> > Hello Andrea,
> >
> > On Tue, Apr 08, 2025 at 01:30:32PM +0200, Andrea Righi wrote:
> > > Hi Breno,
> > >
> > > I already acked even the buggy version, so this one looks good. :)
> > >
> > > On Tue, Apr 08, 2025 at 04:09:02AM -0700, Breno Leitao wrote:
> > > > Replace kzalloc with kvzalloc for the exit_dump buffer allocation, which can require large contiguous memory (up to order=9) depending on the
> > >
> > > BTW, from where this order=9 is coming from? exit_dump_len is 32K by default, but a BPF scheduler can arbitrarily set it to any value via ops->exit_dump_len, so it could be even bigger than an order 9 allocation.
> >
> > You are absolutely correct, this allocation could be of any size.
> >
> > I've got this problem because I was monitoring the Meta fleet, and saw a bunch of allocation failures and decided to investigate. In this case specifically, the users were using order=9 (512 pages), but, again, this could be even bigger.
>
> I see, makes sense. Maybe we can rephrase this part to not mention the order=9 allocation and avoid potential confusion.
Sure! I will send a v3 later today, then.
linux-stable-mirror@lists.linaro.org