From: Marvin Liu <yong.liu@intel.com>
When VIRTIO_RING_F_EVENT_IDX is negotiated, virtio devices can use virtqueue_enable_cb_delayed_packed to reduce the number of device interrupts. At the moment, this is the case for virtio-net when the napi_tx module parameter is set to false.
In this case, the virtio driver selects an event offset and expects that the device will send a notification when rolling over the event offset in the ring. However, if this roll-over happens before the event suppression structure update, the notification won't be sent. To address this race condition, the driver needs to check whether the device rolled over the offset after updating the event suppression structure.
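To make the required ordering concrete, the sequence looks roughly like the sketch below (placeholder names, not the in-tree code: update_event_suppression() and device_rolled_over() stand in for the real stores and for the check discussed in the rest of this message; virtio_mb() and vq->weak_barriers are the actual kernel names):

	/* 1. publish the new event suppression settings (flags/off_wrap) */
	update_event_suppression(vq);			/* placeholder */

	/* 2. order that store against the subsequent read of the ring */
	virtio_mb(vq->weak_barriers);

	/*
	 * 3. re-check for a roll-over that may have happened before the
	 *    update: if it did, report pending work to the caller instead
	 *    of waiting for a notification that will never arrive.
	 */
	if (device_rolled_over(vq))			/* placeholder */
		return false;
	return true;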
With VIRTIO_F_RING_PACKED, the virtio driver did this by reading the flags field of the descriptor at the specified offset.
Unfortunately, checking at the event offset isn't reliable: if descriptors are chained (e.g. when INDIRECT is off) not all descriptors are overwritten by the device, so it's possible that the device skipped the specific descriptor the driver is checking when writing out used descriptors. If this happens, the driver won't detect the race condition and will incorrectly expect the device to send a notification.
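For reference, the per-descriptor check boils down to reading that slot's flags word and comparing the AVAIL and USED bits against the expected wrap counter. Roughly (a simplified rendering of is_used_desc_packed(), not a verbatim copy):

static inline bool is_used_desc_packed(const struct vring_virtqueue *vq,
				       u16 idx, bool used_wrap_counter)
{
	u16 flags = le16_to_cpu(vq->packed.vring.desc[idx].flags);
	bool avail = !!(flags & (1 << VRING_PACKED_DESC_F_AVAIL));
	bool used  = !!(flags & (1 << VRING_PACKED_DESC_F_USED));

	/*
	 * A slot the device wrote back has both bits equal to each other
	 * and to the wrap counter of the current pass over the ring.  A
	 * slot the device skipped (e.g. the tail of a chain) keeps its old
	 * flags, so this test stays false even after the roll-over.
	 */
	return avail == used && used == used_wrap_counter;
}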
For virtio-net, the result will be a TX queue stall, with the transmission getting blocked forever.
With the packed ring, it isn't easy to find a location which is guaranteed to change upon the roll-over, except the next device descriptor, as described in the spec:
Writes of device and driver descriptors can generally be reordered, but each side (driver and device) are only required to poll (or test) a single location in memory: the next device descriptor after the one they processed previously, in circular order.
While this might be sub-optimal, let's do exactly this for now.
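In driver terms, "the next device descriptor after the one processed previously" is the slot at vq->last_used_idx under the current vq->packed.used_wrap_counter, which the driver already maintains while consuming used buffers, along these lines (a sketch of the bookkeeping, not verbatim kernel code):

	vq->last_used_idx += ndescs;		/* descriptors just consumed */
	if (vq->last_used_idx >= vq->packed.vring.num) {
		vq->last_used_idx -= vq->packed.vring.num;
		vq->packed.used_wrap_counter ^= 1;	/* wrapped: flip the counter */
	}

Checking is_used_desc_packed(vq, vq->last_used_idx, vq->packed.used_wrap_counter) after the barrier therefore always inspects a slot the device is guaranteed to write on the next roll-over, which is what the patch below switches to.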
Cc: stable@vger.kernel.org
Fixes: f51f982682e2a ("virtio_ring: leverage event idx in packed ring")
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Marvin Liu <yong.liu@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
So this is what I have in my tree now - this is just Marvin's patch with a tweaked description.
 drivers/virtio/virtio_ring.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bdc08244a648..a8041e451e9e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1499,9 +1499,6 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 		 * counter first before updating event flags.
 		 */
 		virtio_wmb(vq->weak_barriers);
-	} else {
-		used_idx = vq->last_used_idx;
-		wrap_counter = vq->packed.used_wrap_counter;
 	}
 
 	if (vq->packed.event_flags_shadow == VRING_PACKED_EVENT_FLAG_DISABLE) {
@@ -1518,7 +1515,9 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 	 */
 	virtio_mb(vq->weak_barriers);
 
-	if (is_used_desc_packed(vq, used_idx, wrap_counter)) {
+	if (is_used_desc_packed(vq,
+				vq->last_used_idx,
+				vq->packed.used_wrap_counter)) {
 		END_USE(vq);
 		return false;
 	}