From: Jan Kiszka <jan.kiszka@siemens.com>
By simply bailing out, the driver was violating its rule and internal assumptions that either both or no rproc should be initialized. E.g., this could cause the first core to be available but not the second one, leading to crashes on its shutdown later on while trying to dereference that second instance.
Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
---
 drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
index 39a47540c590..eb09d2e9b32a 100644
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
+++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
@@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 			dev_err(dev, "Timed out waiting for %s core to power up!\n", rproc->name);
-			return ret;
+			goto err_powerup;
 		}
 	}
@@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
 		}
 	}
+err_powerup:
 	rproc_del(rproc);
 err_add:
 	k3_r5_reserved_mem_exit(kproc);
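A minimal userspace sketch of the "both or none" rule that this patch restores. It is not the driver's code: `struct core`, `core_powerup()`, `cluster_init()` and `fail_powerup_on` are hypothetical names, and the real driver unwinds through rproc_del() and k3_r5_reserved_mem_exit() rather than clearing flags. The point it illustrates is that a failed power-up must jump into the unwind path instead of returning directly.

```c
#include <errno.h>
#include <stdbool.h>

#define NUM_CORES 2

/* Hypothetical stand-in for a per-core remoteproc instance. */
struct core {
	bool mem_reserved;
	bool registered;
};

/* Index of the core whose power-up should time out, or -1 for none. */
static int fail_powerup_on = -1;

static int core_powerup(int idx)
{
	return idx == fail_powerup_on ? -ETIMEDOUT : 0;
}

/* Bring up every core, or leave none of them initialized. */
static int cluster_init(struct core cores[NUM_CORES])
{
	int i, ret;

	for (i = 0; i < NUM_CORES; i++) {
		cores[i].mem_reserved = true;
		cores[i].registered = true;

		ret = core_powerup(i);
		if (ret)
			goto err_powerup;	/* a plain "return ret" would skip the unwind */
	}
	return 0;

err_powerup:
	/* Undo the failing core first, then every core set up before it. */
	for (; i >= 0; i--) {
		cores[i].registered = false;
		cores[i].mem_reserved = false;
	}
	return ret;
}
```

With `fail_powerup_on = 1`, `cluster_init()` returns `-ETIMEDOUT` and neither core remains registered. Replacing the `goto` with `return ret` would leave core 0 registered on its own, which is the half-initialized state described in the commit message.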
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH] remoteproc: k3-r5: Fix error handling when power-up failed
Link: https://lore.kernel.org/stable/9f481156-f220-4adf-b3d9-670871351e26%40siemen...
On 19-08-2024 20:54, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@siemens.com>
>
> By simply bailing out, the driver was violating its rule and internal assumptions that either both or no rproc should be initialized. E.g., this could cause the first core to be available but not the second one, leading to crashes on its shutdown later on while trying to dereference that second instance.

Using device lifecycle managed functions to register the rproc (devm_rproc_add()), bailing out with an error code will work.

> Fixes: 61f6f68447ab ("remoteproc: k3-r5: Wait for core0 power-up before powering up core1")
> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
> ---
>  drivers/remoteproc/ti_k3_r5_remoteproc.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/remoteproc/ti_k3_r5_remoteproc.c b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> index 39a47540c590..eb09d2e9b32a 100644
> --- a/drivers/remoteproc/ti_k3_r5_remoteproc.c
> +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c
> @@ -1332,7 +1332,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
>  			dev_err(dev, "Timed out waiting for %s core to power up!\n", rproc->name);
> -			return ret;
> +			goto err_powerup;
>  		}
>  	}
> @@ -1348,6 +1348,7 @@ static int k3_r5_cluster_rproc_init(struct platform_device *pdev)
>  		}
>  	}
> +err_powerup:
>  	rproc_del(rproc);

Please use devm_rproc_add() to avoid having to do rproc_del() manually here.

>  err_add:
>  	k3_r5_reserved_mem_exit(kproc);
On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
> On 19-08-2024 20:54, Jan Kiszka wrote:
>> +err_powerup:
>>  	rproc_del(rproc);
>
> Please use devm_rproc_add() to avoid having to do rproc_del() manually here.

This is just the tip of the iceberg. The whole code needs to be reworked accordingly so that we can drop these gotos, not just this one. Just look at k3_r5_reserved_mem_init. Your change in [1] was also too early in this regard, breaking current error handling additionally.

I'll stop my whac-a-mole. Someone needs to sit down and do that for the complete code consistently. And test the error cases.

Jan

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
On 21-08-2024 23:40, Jan Kiszka wrote:
> On 21.08.24 07:30, Beleswar Prasad Padhi wrote:
>> Please use devm_rproc_add() to avoid having to do rproc_del() manually here.
>
> This is just the tip of the iceberg. The whole code needs to be reworked accordingly so that we can drop these gotos, not just this one.

You are correct. Unfortunately, the organic growth of this driver has resulted in a need to refactor. I plan on doing this and posting the refactoring soon. This should be part of the overall refactoring as suggested by Mathieu[2]. But for the immediate problem, your fix does patch things up, hence:

Acked-by: Beleswar Padhi <b-padhi@ti.com>

[2]: https://lore.kernel.org/all/Zr4w8Vj0mVo5sBsJ@p14s/

> Just look at k3_r5_reserved_mem_init. Your change in [1] was also too early in this regard, breaking current error handling additionally.

Curious: could you point out how the change in [1] breaks current error handling?

> I'll stop my whac-a-mole. Someone needs to sit down and do that for the complete code consistently. And test the error cases.
>
> Jan
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?...
On 22.08.24 07:22, Beleswar Prasad Padhi wrote:
> On 21-08-2024 23:40, Jan Kiszka wrote:
>> Just look at k3_r5_reserved_mem_init. Your change in [1] was also too early in this regard, breaking current error handling additionally.
>
> Curious: could you point out how the change in [1] breaks current error handling?

Same story: You leave the inner loop of k3_r5_cluster_rproc_init() via return without that loop having been converted to support this.

Jan
On 22-08-2024 10:57, Jan Kiszka wrote:
> On 22.08.24 07:22, Beleswar Prasad Padhi wrote:
>> Curious: could you point out how the change in [1] breaks current error handling?
>
> Same story: You leave the inner loop of k3_r5_cluster_rproc_init() via return without that loop having been converted to support this.

The rproc has been allocated via devm_rproc_alloc[3] before the return[4] at k3_r5_cluster_rproc_init. Thus, it is capable of freeing the rproc just based on error codes. It was tested.

[3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/dri...
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/dri...

> Jan
On 22.08.24 07:42, Beleswar Prasad Padhi wrote:
> On 22-08-2024 10:57, Jan Kiszka wrote:
>> Same story: You leave the inner loop of k3_r5_cluster_rproc_init() via return without that loop having been converted to support this.
>
> The rproc has been allocated via devm_rproc_alloc[3] before the return[4] at k3_r5_cluster_rproc_init. Thus, it is capable of freeing the rproc just based on error codes. It was tested.
>
> [3]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/dri...
> [4]: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/dri...

This is insufficient. Study the code again to see what it currently does to roll back. I'm not saying that this is the only way to do it, but you need to change the code FIRST before introducing direct returns. And once you can do that, you should obviously replace the existing gotos as well.

Jan
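The disagreement in this exchange can be illustrated with a small userspace model of managed cleanup. This is a sketch under stated assumptions, not kernel code: `struct dev`, `dev_add_action()`, `dev_release_all()`, `clear_flag()` and `probe()` are hypothetical stand-ins, loosely modelled on how device-managed (devm) release actions run in reverse order once a probe fails. It shows that a direct return only cleans up what was registered for managed release, while anything that still depends on goto labels is left behind.

```c
#include <errno.h>

#define MAX_ACTIONS 8

/* Hypothetical miniature of a device with a managed-cleanup list. */
struct dev {
	void (*action[MAX_ACTIONS])(void *);
	void *data[MAX_ACTIONS];
	int count;
};

/* Register a cleanup callback; callbacks later run in reverse order. */
static int dev_add_action(struct dev *d, void (*fn)(void *), void *data)
{
	if (d->count == MAX_ACTIONS)
		return -ENOMEM;
	d->action[d->count] = fn;
	d->data[d->count] = data;
	d->count++;
	return 0;
}

/* What a driver core would do after a failed probe or on unbind. */
static void dev_release_all(struct dev *d)
{
	while (d->count > 0) {
		d->count--;
		d->action[d->count](d->data[d->count]);
	}
}

static void clear_flag(void *flag)
{
	*(int *)flag = 0;
}

/*
 * 'managed' is torn down automatically; 'unmanaged' models a resource
 * that still relies on a goto ladder, which the direct return skips.
 */
static int probe(struct dev *d, int *managed, int *unmanaged, int fail)
{
	int ret;

	*managed = 1;
	ret = dev_add_action(d, clear_flag, managed);
	if (ret) {
		*managed = 0;
		return ret;
	}

	*unmanaged = 1;		/* no cleanup action registered */

	if (fail)
		return -ETIMEDOUT;	/* safe for 'managed' only */
	return 0;
}
```

After a failing `probe()` followed by `dev_release_all()`, the `managed` flag is cleared but `unmanaged` is still set. In this model, every resource has to be converted to managed release before a direct return becomes safe.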
On Thu, Aug 22, 2024 at 10:52:40AM +0530, Beleswar Prasad Padhi wrote:
> You are correct. Unfortunately, the organic growth of this driver has resulted in a need to refactor. I plan on doing this and post the refactoring soon. This should be part of the overall refactoring as suggested by Mathieu[2]. But for the immediate problem, your fix does patch things up.. hence:
>
> Acked-by: Beleswar Padhi <b-padhi@ti.com>

I have applied this patch.

Thanks,
Mathieu
linux-stable-mirror@lists.linaro.org