During testing I noticed a crash if unloading/loading the gs_usb driver during high CAN bus load.
The current version of the candlelight firmware doesn't flush the queues of the received CAN frames during the reset command. This leads to a crash if hardware timestamps are enabled, it a URB from the device is received before the cycle counter/time counter infrastructure has been setup.
First clean up then error handling in gs_can_open().
Then, fix the problem by converting the cycle counter/time counter infrastructure from a per-channel to per-device and set it up before submitting RX-URBs to the USB stack.
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de --- Marc Kleine-Budde (2): can: gs_usb: gs_can_open(): improve error handling can: gs_usb: fix time stamp counter initialization
drivers/net/can/usb/gs_usb.c | 130 ++++++++++++++++++++++++------------------- 1 file changed, 74 insertions(+), 56 deletions(-) --- base-commit: 0dd1805fe498e0cf64f68e451a8baff7e64494ec change-id: 20230712-gs_usb-fix-time-stamp-counter-4bd302c808af
Best regards,
The gs_usb driver handles USB devices with more than 1 CAN channel. The RX path for all channels share the same bulk endpoint (the transmitted bulk data encodes the channel number). These per-device resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately in case of a failure or the URBs are anchored. All anchored URBs are finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a URB or a buffer fails. However, if usb_submit_urb() fails, the driver continues with the URBs submitted so far, even if no URBs were successfully submitted.
Treat every error as fatal and free all allocated resources immediately.
Switch to goto-style error handling, to prepare the driver for more per-device resource allocation.
Cc: stable@vger.kernel.org Cc: John Whittington git@jbrengineering.co.uk Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de --- drivers/net/can/usb/gs_usb.c | 31 ++++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c index d476c2884008..85b7b59c8426 100644 --- a/drivers/net/can/usb/gs_usb.c +++ b/drivers/net/can/usb/gs_usb.c @@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev) .mode = cpu_to_le32(GS_CAN_MODE_START), }; struct gs_host_frame *hf; + struct urb *urb = NULL; u32 ctrlmode; u32 flags = 0; int rc, i; @@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) { for (i = 0; i < GS_MAX_RX_URBS; i++) { - struct urb *urb; u8 *buf;
/* alloc rx urb */ urb = usb_alloc_urb(0, GFP_KERNEL); - if (!urb) - return -ENOMEM; + if (!urb) { + rc = -ENOMEM; + goto out_usb_kill_anchored_urbs; + }
/* alloc rx buffer */ buf = kmalloc(dev->parent->hf_size_rx, @@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev) if (!buf) { netdev_err(netdev, "No memory left for USB buffer\n"); - usb_free_urb(urb); - return -ENOMEM; + rc = -ENOMEM; + goto out_usb_free_urb; }
/* fill, anchor, and submit rx urb */ @@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev) netdev_err(netdev, "usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb); - usb_free_urb(urb); - break; + goto out_usb_unanchor_urb; }
/* Drop reference, @@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev) if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) gs_usb_timestamp_stop(dev); dev->can.state = CAN_STATE_STOPPED; - return rc; + + goto out_usb_kill_anchored_urbs; }
parent->active_channels++; @@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev) netif_start_queue(netdev);
return 0; + +out_usb_unanchor_urb: + usb_unanchor_urb(urb); +out_usb_free_urb: + usb_free_urb(urb); +out_usb_kill_anchored_urbs: + if (!parent->active_channels) + usb_kill_anchored_urbs(&dev->tx_submitted); + + close_candev(netdev); + + return rc; }
static int gs_usb_get_state(const struct net_device *netdev,
If the gs_usb device driver is unloaded (or unbound) before the interface is shut down, the USB stack first calls the struct usb_driver::disconnect and then the struct net_device_ops::ndo_stop callback.
In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more RX'ed CAN frames are send from the USB device to the host. Later in gs_can_close() a reset control message is send to each CAN channel to remove the controller from the CAN bus. In this race window the USB device can still receive CAN frames from the bus and internally queue them to be send to the host.
At least in the current version of the candlelight firmware, the queue of received CAN frames is not emptied during the reset command. After loading (or binding) the gs_usb driver, new URBs are submitted during the struct net_device_ops::ndo_open callback and the candlelight firmware starts sending its already queued CAN frames to the host.
However, this scenario was not considered when implementing the hardware timestamp function. The cycle counter/time counter infrastructure is set up (gs_usb_timestamp_init()) after the USBs are submitted, resulting in a NULL pointer dereference if timecounter_cyc2time() (via the call chain: gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() -> gs_usb_skb_set_timestamp()) is called too early.
Move the gs_usb_timestamp_init() function before the URBs are submitted to fix this problem.
For a comprehensive solution, we need to consider gs_usb devices with more than 1 channel. The cycle counter/time counter infrastructure is setup per channel, but the RX URBs are per device. Once gs_can_open() of _a_ channel has been called, and URBs have been submitted, the gs_usb_receive_bulk_callback() can be called for _all_ available channels, even for channels that are not running, yet. As cycle counter/time counter has not set up, this will again lead to a NULL pointer dereference.
Convert the cycle counter/time counter from a "per channel" to a "per device" functionality. Also set it up, before submitting any URBs to the device.
Further in gs_usb_receive_bulk_callback(), don't process any URBs for not started CAN channels, only resubmit the URB.
Fixes: 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support") Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-1623532... Cc: stable@vger.kernel.org Cc: John Whittington git@jbrengineering.co.uk Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de --- drivers/net/can/usb/gs_usb.c | 101 +++++++++++++++++++++++-------------------- 1 file changed, 53 insertions(+), 48 deletions(-)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c index 85b7b59c8426..f418066569fc 100644 --- a/drivers/net/can/usb/gs_usb.c +++ b/drivers/net/can/usb/gs_usb.c @@ -303,12 +303,6 @@ struct gs_can { struct can_bittiming_const bt_const, data_bt_const; unsigned int channel; /* channel number */
- /* time counter for hardware timestamps */ - struct cyclecounter cc; - struct timecounter tc; - spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */ - struct delayed_work timestamp; - u32 feature; unsigned int hf_size_tx;
@@ -325,6 +319,13 @@ struct gs_usb { struct gs_can *canch[GS_MAX_INTF]; struct usb_anchor rx_submitted; struct usb_device *udev; + + /* time counter for hardware timestamps */ + struct cyclecounter cc; + struct timecounter tc; + spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */ + struct delayed_work timestamp; + unsigned int hf_size_rx; u8 active_channels; }; @@ -388,15 +389,15 @@ static int gs_cmd_reset(struct gs_can *dev) GFP_KERNEL); }
-static inline int gs_usb_get_timestamp(const struct gs_can *dev, +static inline int gs_usb_get_timestamp(const struct gs_usb *parent, u32 *timestamp_p) { __le32 timestamp; int rc;
- rc = usb_control_msg_recv(dev->udev, 0, GS_USB_BREQ_TIMESTAMP, + rc = usb_control_msg_recv(parent->udev, 0, GS_USB_BREQ_TIMESTAMP, USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_INTERFACE, - dev->channel, 0, + 0, 0, ×tamp, sizeof(timestamp), USB_CTRL_GET_TIMEOUT, GFP_KERNEL); @@ -410,20 +411,20 @@ static inline int gs_usb_get_timestamp(const struct gs_can *dev,
static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev->tc_lock) { - struct gs_can *dev = container_of(cc, struct gs_can, cc); + struct gs_usb *parent = container_of(cc, struct gs_usb, cc); u32 timestamp = 0; int err;
- lockdep_assert_held(&dev->tc_lock); + lockdep_assert_held(&parent->tc_lock);
/* drop lock for synchronous USB transfer */ - spin_unlock_bh(&dev->tc_lock); - err = gs_usb_get_timestamp(dev, ×tamp); - spin_lock_bh(&dev->tc_lock); + spin_unlock_bh(&parent->tc_lock); + err = gs_usb_get_timestamp(parent, ×tamp); + spin_lock_bh(&parent->tc_lock); if (err) - netdev_err(dev->netdev, - "Error %d while reading timestamp. HW timestamps may be inaccurate.", - err); + dev_err(&parent->udev->dev, + "Error %d while reading timestamp. HW timestamps may be inaccurate.", + err);
return timestamp; } @@ -431,14 +432,14 @@ static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev static void gs_usb_timestamp_work(struct work_struct *work) { struct delayed_work *delayed_work = to_delayed_work(work); - struct gs_can *dev; + struct gs_usb *parent;
- dev = container_of(delayed_work, struct gs_can, timestamp); - spin_lock_bh(&dev->tc_lock); - timecounter_read(&dev->tc); - spin_unlock_bh(&dev->tc_lock); + parent = container_of(delayed_work, struct gs_usb, timestamp); + spin_lock_bh(&parent->tc_lock); + timecounter_read(&parent->tc); + spin_unlock_bh(&parent->tc_lock);
- schedule_delayed_work(&dev->timestamp, + schedule_delayed_work(&parent->timestamp, GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ); }
@@ -446,37 +447,38 @@ static void gs_usb_skb_set_timestamp(struct gs_can *dev, struct sk_buff *skb, u32 timestamp) { struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb); + struct gs_usb *parent = dev->parent; u64 ns;
- spin_lock_bh(&dev->tc_lock); - ns = timecounter_cyc2time(&dev->tc, timestamp); - spin_unlock_bh(&dev->tc_lock); + spin_lock_bh(&parent->tc_lock); + ns = timecounter_cyc2time(&parent->tc, timestamp); + spin_unlock_bh(&parent->tc_lock);
hwtstamps->hwtstamp = ns_to_ktime(ns); }
-static void gs_usb_timestamp_init(struct gs_can *dev) +static void gs_usb_timestamp_init(struct gs_usb *parent) { - struct cyclecounter *cc = &dev->cc; + struct cyclecounter *cc = &parent->cc;
cc->read = gs_usb_timestamp_read; cc->mask = CYCLECOUNTER_MASK(32); cc->shift = 32 - bits_per(NSEC_PER_SEC / GS_USB_TIMESTAMP_TIMER_HZ); cc->mult = clocksource_hz2mult(GS_USB_TIMESTAMP_TIMER_HZ, cc->shift);
- spin_lock_init(&dev->tc_lock); - spin_lock_bh(&dev->tc_lock); - timecounter_init(&dev->tc, &dev->cc, ktime_get_real_ns()); - spin_unlock_bh(&dev->tc_lock); + spin_lock_init(&parent->tc_lock); + spin_lock_bh(&parent->tc_lock); + timecounter_init(&parent->tc, &parent->cc, ktime_get_real_ns()); + spin_unlock_bh(&parent->tc_lock);
- INIT_DELAYED_WORK(&dev->timestamp, gs_usb_timestamp_work); - schedule_delayed_work(&dev->timestamp, + INIT_DELAYED_WORK(&parent->timestamp, gs_usb_timestamp_work); + schedule_delayed_work(&parent->timestamp, GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ); }
-static void gs_usb_timestamp_stop(struct gs_can *dev) +static void gs_usb_timestamp_stop(struct gs_usb *parent) { - cancel_delayed_work_sync(&dev->timestamp); + cancel_delayed_work_sync(&parent->timestamp); }
static void gs_update_state(struct gs_can *dev, struct can_frame *cf) @@ -560,6 +562,9 @@ static void gs_usb_receive_bulk_callback(struct urb *urb) if (!netif_device_present(netdev)) return;
+ if (!netif_running(netdev)) + goto resubmit_urb; + if (hf->echo_id == -1) { /* normal rx */ if (hf->flags & GS_CAN_FLAG_FD) { skb = alloc_canfd_skb(dev->netdev, &cfd); @@ -856,6 +861,9 @@ static int gs_can_open(struct net_device *netdev) }
if (!parent->active_channels) { + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) + gs_usb_timestamp_init(parent); + for (i = 0; i < GS_MAX_RX_URBS; i++) { u8 *buf;
@@ -926,13 +934,9 @@ static int gs_can_open(struct net_device *netdev) flags |= GS_CAN_MODE_FD;
/* if hardware supports timestamps, enable it */ - if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) { + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) flags |= GS_CAN_MODE_HW_TIMESTAMP;
- /* start polling timestamp */ - gs_usb_timestamp_init(dev); - } - /* finally start device */ dev->can.state = CAN_STATE_ERROR_ACTIVE; dm.flags = cpu_to_le32(flags); @@ -942,8 +946,6 @@ static int gs_can_open(struct net_device *netdev) GFP_KERNEL); if (rc) { netdev_err(netdev, "Couldn't start device (err=%d)\n", rc); - if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) - gs_usb_timestamp_stop(dev); dev->can.state = CAN_STATE_STOPPED;
goto out_usb_kill_anchored_urbs; @@ -960,9 +962,13 @@ static int gs_can_open(struct net_device *netdev) out_usb_free_urb: usb_free_urb(urb); out_usb_kill_anchored_urbs: - if (!parent->active_channels) + if (!parent->active_channels) { usb_kill_anchored_urbs(&dev->tx_submitted);
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) + gs_usb_timestamp_stop(parent); + } + close_candev(netdev);
return rc; @@ -1011,14 +1017,13 @@ static int gs_can_close(struct net_device *netdev)
netif_stop_queue(netdev);
- /* stop polling timestamp */ - if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) - gs_usb_timestamp_stop(dev); - /* Stop polling */ parent->active_channels--; if (!parent->active_channels) { usb_kill_anchored_urbs(&parent->rx_submitted); + + if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) + gs_usb_timestamp_stop(parent); }
/* Stop sending URBs */
Hi Marc,
Thanks for this. I've had a look and tested it (candleLight_fw v2.0, `candump can0 -H`, unload) and all seems good.
John.
------- Original Message ------- On Sunday, July 16th, 2023 at 21:33, Marc Kleine-Budde mkl@pengutronix.de wrote:
During testing I noticed a crash if unloading/loading the gs_usb driver during high CAN bus load.
The current version of the candlelight firmware doesn't flush the queues of the received CAN frames during the reset command. This leads to a crash if hardware timestamps are enabled, it a URB from the device is received before the cycle counter/time counter infrastructure has been setup.
First clean up then error handling in gs_can_open().
Then, fix the problem by converting the cycle counter/time counter infrastructure from a per-channel to per-device and set it up before submitting RX-URBs to the USB stack.
Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de
Marc Kleine-Budde (2): can: gs_usb: gs_can_open(): improve error handling can: gs_usb: fix time stamp counter initialization
drivers/net/can/usb/gs_usb.c | 130 ++++++++++++++++++++++++------------------- 1 file changed, 74 insertions(+), 56 deletions(-)
base-commit: 0dd1805fe498e0cf64f68e451a8baff7e64494ec change-id: 20230712-gs_usb-fix-time-stamp-counter-4bd302c808af
Best regards,
Marc Kleine-Budde mkl@pengutronix.de
linux-stable-mirror@lists.linaro.org