This patch series aims to fix various issues throughout the QinHeng CH9200 driver. This driver fails to handle various failures, which in one case has lead to a uninit access bug found via syzbot. Upon reviewing the driver I fixed a few more issues which I have included in this patch series.
Parts of this series are the product of discussions and suggestions I had from others like Andrew Lunn, Simon Horman and Jakub Kicinski you can view those discussions below:
Link: https://lore.kernel.org/all/20250319112156.48312-1-qasdev00@gmail.com Link: https://lore.kernel.org/all/20250218002443.11731-1-qasdev00@gmail.com/ Link: https://lore.kernel.org/all/20250311161157.49065-1-qasdev00@gmail.com/
Qasim Ijaz (5): fix uninitialised access bug during mii_nway_restart remove extraneous return that prevents error propagation fail fast on control_read() failures during get_mac_address() add missing error handling in ch9200_bind() avoid triggering NWay restart on non-zero PHY ID
drivers/net/usb/ch9200.c | 61 ++++++++++++++++++++++++++-------------- 1 file changed, 40 insertions(+), 21 deletions(-)
In mii_nway_restart() during the line:
bmcr = mii->mdio_read(mii->dev, mii->phy_id, MII_BMCR);
The code attempts to call mii->mdio_read which is ch9200_mdio_read().
ch9200_mdio_read() utilises a local buffer, which is initialised with control_read():
unsigned char buff[2];
However buff is conditionally initialised inside control_read():
if (err == size) { memcpy(data, buf, size); }
If the condition of "err == size" is not met, then buff remains uninitialised. Once this happens the uninitialised buff is accessed and returned during ch9200_mdio_read():
return (buff[0] | buff[1] << 8);
The problem stems from the fact that ch9200_mdio_read() ignores the return value of control_read(), leading to uinit-access of buff.
To fix this we should check the return value of control_read() and return early on error. We return 0 on control_read() failure here because returning a negative may trigger the "bmcr & BMCR_ANENABLE" check within mii_nway_restart.
Reported-by: syzbot syzbot+3361c2d6f78a3e0892f9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3361c2d6f78a3e0892f9 Tested-by: syzbot syzbot+3361c2d6f78a3e0892f9@syzkaller.appspotmail.com Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz qasdev00@gmail.com --- drivers/net/usb/ch9200.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index f69d9b902da0..4f29ecf2240a 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -178,6 +178,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) { struct usbnet *dev = netdev_priv(netdev); unsigned char buff[2]; + int ret;
netdev_dbg(netdev, "%s phy_id:%02x loc:%02x\n", __func__, phy_id, loc); @@ -185,8 +186,10 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) if (phy_id != 0) return -ENODEV;
- control_read(dev, REQUEST_READ, 0, loc * 2, buff, 0x02, - CONTROL_TIMEOUT_MS); + ret = control_read(dev, REQUEST_READ, 0, loc * 2, buff, 0x02, + CONTROL_TIMEOUT_MS); + if (ret < 0) + return 0;
return (buff[0] | buff[1] << 8); }
On Sat, Apr 12, 2025 at 07:38:25PM +0100, Qasim Ijaz wrote:
In mii_nway_restart() during the line:
bmcr = mii->mdio_read(mii->dev, mii->phy_id, MII_BMCR);
The code attempts to call mii->mdio_read which is ch9200_mdio_read().
ch9200_mdio_read() utilises a local buffer, which is initialised with control_read():
unsigned char buff[2];
However buff is conditionally initialised inside control_read():
if (err == size) { memcpy(data, buf, size); }
If the condition of "err == size" is not met, then buff remains uninitialised. Once this happens the uninitialised buff is accessed and returned during ch9200_mdio_read():
return (buff[0] | buff[1] << 8);
The problem stems from the fact that ch9200_mdio_read() ignores the return value of control_read(), leading to uinit-access of buff.
To fix this we should check the return value of control_read() and return early on error. We return 0 on control_read() failure here because returning a negative may trigger the "bmcr & BMCR_ANENABLE" check within mii_nway_restart.
Reported-by: syzbot syzbot+3361c2d6f78a3e0892f9@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=3361c2d6f78a3e0892f9 Tested-by: syzbot syzbot+3361c2d6f78a3e0892f9@syzkaller.appspotmail.com Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Cc: stable@vger.kernel.org Signed-off-by: Qasim Ijaz qasdev00@gmail.com
Reviewed-by: Andrew Lunn andrew@lunn.ch
Andrew
The control_write() function sets err to -EINVAL however there is an incorrectly placed 'return 0' statement after it which stops the propogation of the error.
Fix this issue by removing the 'return 0'.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Qasim Ijaz qasdev00@gmail.com --- drivers/net/usb/ch9200.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index 4f29ecf2240a..61eb6c207eb1 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -168,8 +168,6 @@ static int control_write(struct usbnet *dev, unsigned char request, err = -EINVAL; kfree(buf);
- return 0; - err_out: return err; }
Hi,
Thanks for your patch.
FYI: kernel test robot notices the stable kernel rule is not satisfied.
The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opti...
Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree. Subject: [PATCH 2/5] net: ch9200: remove extraneous return that prevents error propagation Link: https://lore.kernel.org/stable/20250412183829.41342-3-qasdev00%40gmail.com
The get_mac_address() function has an issue where it does not directly check the return value of each control_read(), instead it sums up the return values and checks them all at the end which means if any call to control_read() fails the function just continues on.
Handle this by validating the return value of each call and fail fast and early instead of continuing.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Qasim Ijaz qasdev00@gmail.com --- drivers/net/usb/ch9200.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index 61eb6c207eb1..4f1d2e9045a9 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -304,24 +304,27 @@ static int ch9200_rx_fixup(struct usbnet *dev, struct sk_buff *skb)
static int get_mac_address(struct usbnet *dev, unsigned char *data) { - int err = 0; unsigned char mac_addr[0x06]; - int rd_mac_len = 0; + int rd_mac_len;
netdev_dbg(dev->net, "%s:\n\tusbnet VID:%0x PID:%0x\n", __func__, le16_to_cpu(dev->udev->descriptor.idVendor), le16_to_cpu(dev->udev->descriptor.idProduct));
- memset(mac_addr, 0, sizeof(mac_addr)); - rd_mac_len = control_read(dev, REQUEST_READ, 0, - MAC_REG_STATION_L, mac_addr, 0x02, - CONTROL_TIMEOUT_MS); - rd_mac_len += control_read(dev, REQUEST_READ, 0, MAC_REG_STATION_M, - mac_addr + 2, 0x02, CONTROL_TIMEOUT_MS); - rd_mac_len += control_read(dev, REQUEST_READ, 0, MAC_REG_STATION_H, - mac_addr + 4, 0x02, CONTROL_TIMEOUT_MS); - if (rd_mac_len != ETH_ALEN) - err = -EINVAL; + rd_mac_len = control_read(dev, REQUEST_READ, 0, MAC_REG_STATION_L, + mac_addr, 0x02, CONTROL_TIMEOUT_MS); + if (rd_mac_len < 0) + return rd_mac_len; + + rd_mac_len = control_read(dev, REQUEST_READ, 0, MAC_REG_STATION_M, + mac_addr + 2, 0x02, CONTROL_TIMEOUT_MS); + if (rd_mac_len < 0) + return rd_mac_len; + + rd_mac_len = control_read(dev, REQUEST_READ, 0, MAC_REG_STATION_H, + mac_addr + 4, 0x02, CONTROL_TIMEOUT_MS); + if (rd_mac_len < 0) + return rd_mac_len;
data[0] = mac_addr[5]; data[1] = mac_addr[4]; @@ -330,7 +333,7 @@ static int get_mac_address(struct usbnet *dev, unsigned char *data) data[4] = mac_addr[1]; data[5] = mac_addr[0];
- return err; + return 0; }
static int ch9200_bind(struct usbnet *dev, struct usb_interface *intf) @@ -386,6 +389,9 @@ static int ch9200_bind(struct usbnet *dev, struct usb_interface *intf) CONTROL_TIMEOUT_MS);
retval = get_mac_address(dev, addr); + if (retval < 0) + return retval; + eth_hw_addr_set(dev->net, addr);
return retval;
The ch9200_bind() function has no error handling for any control_write() calls.
Fix this by checking if any control_write() call fails and propagate the error to the caller.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Qasim Ijaz qasdev00@gmail.com --- drivers/net/usb/ch9200.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index 4f1d2e9045a9..187bbfc991f5 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -338,12 +338,12 @@ static int get_mac_address(struct usbnet *dev, unsigned char *data)
static int ch9200_bind(struct usbnet *dev, struct usb_interface *intf) { - int retval = 0; + int retval; unsigned char data[2]; u8 addr[ETH_ALEN];
retval = usbnet_get_endpoints(dev, intf); - if (retval) + if (retval < 0) return retval;
dev->mii.dev = dev->net; @@ -361,32 +361,44 @@ static int ch9200_bind(struct usbnet *dev, struct usb_interface *intf) data[1] = 0x0F; retval = control_write(dev, REQUEST_WRITE, 0, MAC_REG_THRESHOLD, data, 0x02, CONTROL_TIMEOUT_MS); + if (retval < 0) + return retval;
data[0] = 0xA0; data[1] = 0x90; retval = control_write(dev, REQUEST_WRITE, 0, MAC_REG_FIFO_DEPTH, data, 0x02, CONTROL_TIMEOUT_MS); + if (retval < 0) + return retval;
data[0] = 0x30; data[1] = 0x00; retval = control_write(dev, REQUEST_WRITE, 0, MAC_REG_PAUSE, data, 0x02, CONTROL_TIMEOUT_MS); + if (retval < 0) + return retval;
data[0] = 0x17; data[1] = 0xD8; retval = control_write(dev, REQUEST_WRITE, 0, MAC_REG_FLOW_CONTROL, data, 0x02, CONTROL_TIMEOUT_MS); + if (retval < 0) + return retval;
/* Undocumented register */ data[0] = 0x01; data[1] = 0x00; retval = control_write(dev, REQUEST_WRITE, 0, 254, data, 0x02, CONTROL_TIMEOUT_MS); + if (retval < 0) + return retval;
data[0] = 0x5F; data[1] = 0x0D; retval = control_write(dev, REQUEST_WRITE, 0, MAC_REG_CTRL, data, 0x02, CONTROL_TIMEOUT_MS); + if (retval < 0) + return retval;
retval = get_mac_address(dev, addr); if (retval < 0) @@ -394,7 +406,7 @@ static int ch9200_bind(struct usbnet *dev, struct usb_interface *intf)
eth_hw_addr_set(dev->net, addr);
- return retval; + return 0; }
static const struct driver_info ch9200_info = {
On Sat, 12 Apr 2025 19:38:28 +0100 Qasim Ijaz wrote:
retval = usbnet_get_endpoints(dev, intf);
- if (retval)
- if (retval < 0) return retval;
This change is unnecessary ? Commit message speaks of control_write(), this is usbnet_get_endpoints().
On Tue, Apr 15, 2025 at 08:47:08PM -0700, Jakub Kicinski wrote:
On Sat, 12 Apr 2025 19:38:28 +0100 Qasim Ijaz wrote:
retval = usbnet_get_endpoints(dev, intf);
- if (retval)
- if (retval < 0) return retval;
This change is unnecessary ? Commit message speaks of control_write(), this is usbnet_get_endpoints().
So this change was done mainly for consistency with the other error checks in the function. Essentially in my one of my previous patches (https://lore.kernel.org/all/20250317175117.GI688833@kernel.org/) I was using "if (retval)" for error handling, however after Simon's recommendation to use "if (retval < 0)" I changed this. In this particular function I took Simons advice but then noticed that the usbnet_get_endpoints() check was still using "if (retval)" so I decided to make it the same as the others.
The behaviour is still the same regardless of it we do "if (retval < 0)" or "if (retval)" for checking usbnet_get_endpoints() since it returns 0 on success or negative on failure.
So in ch9200_bind:
In the first case of "if (retval)", if the usbnet_get_endpoints() function fails and returns negative then we execute this branch and it returns negative, if it succeeds with 0 then the ch9200_bind function continues.
In the second case of "if (retval < 0)", if the usbnet_get_endpoints() function fails and returns negative then we execute this branch and it returns negative, if it succeeds with 0 then ch9200_bind function continues.
If you like I can include this in the patch description for clarity or remove it entirely.
During ch9200_mdio_read if the phy_id is not 0 -ENODEV is returned.
In certain cases such as in mii_nway_restart returning a negative such as -ENODEV triggers the "bmcr & BMCR_ANENABLE" check, we should avoid this on error and just end the function.
To address this just return 0.
Signed-off-by: Qasim Ijaz qasdev00@gmail.com --- drivers/net/usb/ch9200.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index 187bbfc991f5..281800bb2ff2 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc);
if (phy_id != 0) - return -ENODEV; + return 0;
ret = control_read(dev, REQUEST_READ, 0, loc * 2, buff, 0x02, CONTROL_TIMEOUT_MS);
On Sat, Apr 12, 2025 at 07:38:29PM +0100, Qasim Ijaz wrote:
During ch9200_mdio_read if the phy_id is not 0 -ENODEV is returned.
In certain cases such as in mii_nway_restart returning a negative such as -ENODEV triggers the "bmcr & BMCR_ANENABLE" check, we should avoid this on error and just end the function.
To address this just return 0.
Signed-off-by: Qasim Ijaz qasdev00@gmail.com
drivers/net/usb/ch9200.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index 187bbfc991f5..281800bb2ff2 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc); if (phy_id != 0)
return -ENODEV;
return 0;
An actually MDIO bus would return 0xffff is asked to read from a PHY which is not on the bus. But i've no idea how the ancient mii code handles this.
If this code every gets updated to using phylib, many of the changes you are making will need reverting because phylib actually wants to see the errors. So i'm somewhat reluctant to make changes like this.
Andrew
On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
@@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc); if (phy_id != 0)
return -ENODEV;
return 0;
An actually MDIO bus would return 0xffff is asked to read from a PHY which is not on the bus. But i've no idea how the ancient mii code handles this.
If this code every gets updated to using phylib, many of the changes you are making will need reverting because phylib actually wants to see the errors. So i'm somewhat reluctant to make changes like this.
Right.
I mean most of the patches seem to be adding error checking, unlike this one, but since Qasim doesn't have access to this HW they are more likely to break stuff than fix. I'm going to apply the first patch, Qasim if you'd like to clean up the rest I think it should be done separately without the Fixes tags, if at all.
On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
@@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc); if (phy_id != 0)
return -ENODEV;
return 0;
An actually MDIO bus would return 0xffff is asked to read from a PHY which is not on the bus. But i've no idea how the ancient mii code handles this.
If this code every gets updated to using phylib, many of the changes you are making will need reverting because phylib actually wants to see the errors. So i'm somewhat reluctant to make changes like this.
Right.
I mean most of the patches seem to be adding error checking, unlike this one, but since Qasim doesn't have access to this HW they are more likely to break stuff than fix. I'm going to apply the first patch, Qasim if you'd like to clean up the rest I think it should be done separately without the Fixes tags, if at all.
Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real error to silence the syzbot error and if someone with access to the HW comes along they can try to move this driver to more modern infra?
On Tue, Apr 15, 2025 at 08:56:48PM -0700, Jakub Kicinski wrote:
On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
@@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc); if (phy_id != 0)
return -ENODEV;
return 0;
An actually MDIO bus would return 0xffff is asked to read from a PHY which is not on the bus. But i've no idea how the ancient mii code handles this.
If this code every gets updated to using phylib, many of the changes you are making will need reverting because phylib actually wants to see the errors. So i'm somewhat reluctant to make changes like this.
Right.
I mean most of the patches seem to be adding error checking, unlike this one, but since Qasim doesn't have access to this HW they are more likely to break stuff than fix. I'm going to apply the first patch, Qasim if you'd like to clean up the rest I think it should be done separately without the Fixes tags, if at all.
Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real error to silence the syzbot error and if someone with access to the HW
Hi Andrew and Jakub
Since there is uncertainty on whether these patches would break things how about I refactor the patches to instead return what the function already returns, this way we include error handling but maintain consistency with what the function already returns and does so there is no chance of breaking stuff. I think including the error handling would be a good idea overall because we have already seen 1 bug where the root cause is insufficient error handling right? Furthermore this driver has not been updated in 4 years, so for the near‑term surely improving these aspects can only be a good thing.
So now going into the changes:
Patch 1: So patch 1 changes mdio_read, we can see that on failure mdio_read already returns a negative of -ENODEV, so for the patch 1 change we can simply just error check control_read by "if (ret < 0) return ret;" This matches the fact that mdio_read already returns a negative so no chance of breaking anything.
Patch 2: For patch 2 I will add Cc stable to this patch since kernel test robot flagged it, I assume backporting it would be the right thing to do since the return statement here stops error propagation. Would you like me to add it to the other patches?
Patch 3: For patch 3 the get_mac_address and ch9200_bind function already returns a negative on error so my changes don't change what is returned, so this should be fine i think.
Patch 4: For patch 4 it already returns a negative on error via usbnet_get_endpoints() so i hope it is fine as is? Jakub commented on the changed usbnet_get_endpoints() error check, if you want me to revert it back I can do that.
Patch 5: We can drop this.
Andrew and Jakub if you’re happy with this should I resend with these changes?
comes along they can try to move this driver to more modern infra?
On Thu, Apr 17, 2025 at 02:12:36PM +0100, Qasim Ijaz wrote:
On Tue, Apr 15, 2025 at 08:56:48PM -0700, Jakub Kicinski wrote:
On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
@@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc); if (phy_id != 0)
return -ENODEV;
return 0;
An actually MDIO bus would return 0xffff is asked to read from a PHY which is not on the bus. But i've no idea how the ancient mii code handles this.
If this code every gets updated to using phylib, many of the changes you are making will need reverting because phylib actually wants to see the errors. So i'm somewhat reluctant to make changes like this.
Right.
I mean most of the patches seem to be adding error checking, unlike this one, but since Qasim doesn't have access to this HW they are more likely to break stuff than fix. I'm going to apply the first patch, Qasim if you'd like to clean up the rest I think it should be done separately without the Fixes tags, if at all.
Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real error to silence the syzbot error and if someone with access to the HW
Hi Andrew and Jakub
Since there is uncertainty on whether these patches would break things how about I refactor the patches to instead return what the function already returns, this way we include error handling but maintain consistency with what the function already returns and does so there is no chance of breaking stuff. I think including the error handling would be a good idea overall because we have already seen 1 bug where the root cause is insufficient error handling right? Furthermore this driver has not been updated in 4 years, so for the near‑term surely improving these aspects can only be a good thing.
It is not a simple thing to decided if we should make changes or not, if we don't have the hardware. The test robot is saying things are potentially wrong, but we don't have any users complaining it is broken. If we make the test robot happy, without testing the changes, we can make users unhappy by breaking it. And that is the opposite of what we want.
We also need to think about "return on investment". Is anybody actually using this device still? Would it be better to spend our time on other devices we know are actually used?
If you can find a board which actually has this device, or can find somebody to run tests, then great, we are likely to accept them. But otherwise please focus on minimum low risk changes which are obviously correct, or just leave the test robot unhappy.
Andrew
On Thu, Apr 17, 2025 at 04:08:08PM +0200, Andrew Lunn wrote:
On Thu, Apr 17, 2025 at 02:12:36PM +0100, Qasim Ijaz wrote:
On Tue, Apr 15, 2025 at 08:56:48PM -0700, Jakub Kicinski wrote:
On Tue, 15 Apr 2025 20:52:30 -0700 Jakub Kicinski wrote:
On Tue, 15 Apr 2025 03:35:07 +0200 Andrew Lunn wrote:
@@ -182,7 +182,7 @@ static int ch9200_mdio_read(struct net_device *netdev, int phy_id, int loc) __func__, phy_id, loc); if (phy_id != 0)
return -ENODEV;
return 0;
An actually MDIO bus would return 0xffff is asked to read from a PHY which is not on the bus. But i've no idea how the ancient mii code handles this.
If this code every gets updated to using phylib, many of the changes you are making will need reverting because phylib actually wants to see the errors. So i'm somewhat reluctant to make changes like this.
Right.
I mean most of the patches seem to be adding error checking, unlike this one, but since Qasim doesn't have access to this HW they are more likely to break stuff than fix. I'm going to apply the first patch, Qasim if you'd like to clean up the rest I think it should be done separately without the Fixes tags, if at all.
Ah, no, patch 1 also does return 0. Hm. Maybe let's propagate the real error to silence the syzbot error and if someone with access to the HW
Hi Andrew and Jakub
Since there is uncertainty on whether these patches would break things how about I refactor the patches to instead return what the function already returns, this way we include error handling but maintain consistency with what the function already returns and does so there is no chance of breaking stuff. I think including the error handling would be a good idea overall because we have already seen 1 bug where the root cause is insufficient error handling right? Furthermore this driver has not been updated in 4 years, so for the near‑term surely improving these aspects can only be a good thing.
It is not a simple thing to decided if we should make changes or not, if we don't have the hardware. The test robot is saying things are potentially wrong, but we don't have any users complaining it is broken. If we make the test robot happy, without testing the changes, we can make users unhappy by breaking it. And that is the opposite of what we want.
For patch 2 the kernel test robot said it is missing a Cc stable tag in the sign-off area, it didnt highlight any build or functional errors so I don't understand what you mean there.
We also need to think about "return on investment". Is anybody actually using this device still? Would it be better to spend our time on other devices we know are actually used?
If you can find a board which actually has this device, or can find somebody to run tests, then great, we are likely to accept them. But otherwise please focus on minimum low risk changes which are obviously
So going forward what should we do? I gave my thoughts for each patch above and how I think we should change it to minimise breaking things while adding error handling, which ones do you agree/ don't agree with?
Thanks Qasim
correct, or just leave the test robot unhappy.
Andrew
Hi Andrew, Jakub
Just pinging on my last message. Any thoughts on how to proceed with this patch series, I left my thoughts in the previous message.
Thanks, Qasim
On Fri, Apr 25, 2025 at 11:13:12AM +0100, Qasim Ijaz wrote:
Hi Andrew, Jakub
Just pinging on my last message. Any thoughts on how to proceed with this patch series, I left my thoughts in the previous message.
I would suggest you do the minimum, low risk changes. Don't be driven to fix all the syzbot warnings just to make syzbot quiet. What really matters is you don't break the driver for users. syzbot is secondary.
Andrew
On Mon, Apr 28, 2025 at 04:22:59PM +0200, Andrew Lunn wrote:
On Fri, Apr 25, 2025 at 11:13:12AM +0100, Qasim Ijaz wrote:
Hi Andrew, Jakub
Just pinging on my last message. Any thoughts on how to proceed with this patch series, I left my thoughts in the previous message.
I would suggest you do the minimum, low risk changes. Don't be driven to fix all the syzbot warnings just to make syzbot quiet. What really matters is you don't break the driver for users. syzbot is secondary.
Right, got it so avoid breaking it at all costs, in that case should we move forward with the syzbot fix and the "remove extraneous return that prevents error propagation" patches only?
For the syzbot one we will return a negative on control read failure, as the function already does that when encountering an invalid phy_id.
As for the "remove extraneous return that prevents error propagation" change it seems like a simple low risk change from what I can tell (if not please let me know).
Would you guys be happy with this?
Thanks Qasim
Andrew
linux-stable-mirror@lists.linaro.org