The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer.
This is a backport of the upstream commit 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic") with modifications, as the original patch has missing dependencies on kernel v6.12.x. The structure "struct attribute_group" does not have bin_size field in v6.12.x kernel so the logic of configuring size of sysfs node for ring buffer has been moved to vmbus_chan_bin_attr_is_visible().
Original change was not a fix, but it needs to be backported to fix size related discrepancy caused by the commit mentioned in Fixes tag.
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page") Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Naman Jain namjain@linux.microsoft.com ---
This change won't apply on older kernels currently due to missing dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has changed completely, while the functionality remains same. Requesting Michael, Dexuan, Wei to please review again.
--- drivers/hv/vmbus_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 1f519e925f06..616e63fb2f15 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = { .name = "ring", .mode = 0600, }, - .size = 2 * SZ_2M, .mmap = hv_mmap_ring_buffer_wrapper, }; static struct attribute *vmbus_chan_attrs[] = { @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj, /* Hide ring attribute if channel's ring_sysfs_visible is set to false */ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible) return 0; + attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
return attr->attr.mode; }
From: Naman Jain namjain@linux.microsoft.com Sent: Wednesday, July 23, 2025 12:02 AM
The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer.
This is a backport of the upstream commit 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic") with modifications, as the original patch has missing dependencies on kernel v6.12.x. The structure "struct attribute_group" does not have bin_size field in v6.12.x kernel so the logic of configuring size of sysfs node for ring buffer has been moved to vmbus_chan_bin_attr_is_visible().
Original change was not a fix, but it needs to be backported to fix size related discrepancy caused by the commit mentioned in Fixes tag.
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page") Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Naman Jain namjain@linux.microsoft.com
This change won't apply on older kernels currently due to missing dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has changed completely, while the functionality remains same. Requesting Michael, Dexuan, Wei to please review again.
drivers/hv/vmbus_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 1f519e925f06..616e63fb2f15 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = { .name = "ring", .mode = 0600, },
- .size = 2 * SZ_2M, .mmap = hv_mmap_ring_buffer_wrapper,
}; static struct attribute *vmbus_chan_attrs[] = { @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj, /* Hide ring attribute if channel's ring_sysfs_visible is set to false */ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible) return 0;
- attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
Suppose a VM has two devices using UIO, such as DPDK network device with a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices will be referencing the same static instance of chan_attr_ring_buffer, and the .size field it contains. The above statement will change that .size field between 2MiB and 16KiB as the /sys entries are initially populated, and as the visibility is changed if the devices are removed and re-instantiated (which is much more likely for fcopy than for netvsc). That changing of the .size value will probably work most of the time, but it's racy if two devices with different ring buffer sizes get instantiated or re-instantiated at the same time.
Unfortunately, I don't see a fix, short of backporting support for the .bin_size function, as this is exactly the problem that function solves.
Michael
return attr->attr.mode; } -- 2.34.1
On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
From: Naman Jain namjain@linux.microsoft.com Sent: Wednesday, July 23, 2025 12:02 AM
The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer.
This is a backport of the upstream commit 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic") with modifications, as the original patch has missing dependencies on kernel v6.12.x. The structure "struct attribute_group" does not have bin_size field in v6.12.x kernel so the logic of configuring size of sysfs node for ring buffer has been moved to vmbus_chan_bin_attr_is_visible().
Original change was not a fix, but it needs to be backported to fix size related discrepancy caused by the commit mentioned in Fixes tag.
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page") Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Naman Jain namjain@linux.microsoft.com
This change won't apply on older kernels currently due to missing dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has changed completely, while the functionality remains same. Requesting Michael, Dexuan, Wei to please review again.
drivers/hv/vmbus_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 1f519e925f06..616e63fb2f15 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = { .name = "ring", .mode = 0600, },
- .size = 2 * SZ_2M, .mmap = hv_mmap_ring_buffer_wrapper,
}; static struct attribute *vmbus_chan_attrs[] = { @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj, /* Hide ring attribute if channel's ring_sysfs_visible is set to false */ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible) return 0;
- attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
Suppose a VM has two devices using UIO, such as DPDK network device with a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices will be referencing the same static instance of chan_attr_ring_buffer, and the .size field it contains. The above statement will change that .size field between 2MiB and 16KiB as the /sys entries are initially populated, and as the visibility is changed if the devices are removed and re-instantiated (which is much more likely for fcopy than for netvsc). That changing of the .size value will probably work most of the time, but it's racy if two devices with different ring buffer sizes get instantiated or re-instantiated at the same time.
IIRC it works out in practice. While the global attribute instance is indeed modified back-and-forth the size from it will be *copied* into kernfs after each recalculation. So each attribute should get its own correct size.
Unfortunately, I don't see a fix, short of backporting support for the .bin_size function, as this is exactly the problem that function solves.
It should work out in practice. (I introduced the .bin_size function)
Thomas
From: Thomas Weißschuh linux@weissschuh.net Sent: Tuesday, July 29, 2025 11:47 AM
On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
From: Naman Jain namjain@linux.microsoft.com Sent: Wednesday, July 23, 2025 12:02 AM
The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer.
This is a backport of the upstream commit 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic") with modifications, as the original patch has missing dependencies on kernel v6.12.x. The structure "struct attribute_group" does not have bin_size field in v6.12.x kernel so the logic of configuring size of sysfs node for ring buffer has been moved to vmbus_chan_bin_attr_is_visible().
Original change was not a fix, but it needs to be backported to fix size related discrepancy caused by the commit mentioned in Fixes tag.
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page") Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Naman Jain namjain@linux.microsoft.com
This change won't apply on older kernels currently due to missing dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has changed completely, while the functionality remains same. Requesting Michael, Dexuan, Wei to please review again.
drivers/hv/vmbus_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 1f519e925f06..616e63fb2f15 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = { .name = "ring", .mode = 0600, },
- .size = 2 * SZ_2M, .mmap = hv_mmap_ring_buffer_wrapper,
}; static struct attribute *vmbus_chan_attrs[] = { @@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj, /* Hide ring attribute if channel's ring_sysfs_visible is set to false */ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible) return 0;
- attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
Suppose a VM has two devices using UIO, such as DPDK network device with a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices will be referencing the same static instance of chan_attr_ring_buffer, and the .size field it contains. The above statement will change that .size field between 2MiB and 16KiB as the /sys entries are initially populated, and as the visibility is changed if the devices are removed and re-instantiated (which is much more likely for fcopy than for netvsc). That changing of the .size value will probably work most of the time, but it's racy if two devices with different ring buffer sizes get instantiated or re-instantiated at the same time.
IIRC it works out in practice. While the global attribute instance is indeed modified back-and-forth the size from it will be *copied* into kernfs after each recalculation. So each attribute should get its own correct size.
The race I see is in fs/sysfs/group.c in the create_files() function. It calls the is_bin_visible() function, which this patch uses to set the .size field of the static attribute. Then creates_files() calls sysfs_add_bin_file_mode_ns(), which reads the .size field and uses it to create the sysfs entry. But if create_files() is called in parallel on two different kobjs of the same type, but with different values for the .size field, the second create_files() could overwrite the static .size field after the first create_files() has set it, but before it has used it. I don't see any global lock that would prevent such, though maybe I'm missing something.
Unfortunately, I don't see a fix, short of backporting support for the .bin_size function, as this is exactly the problem that function solves.
It should work out in practice. (I introduced the .bin_size function)
The race I describe is unlikely, particularly if attribute groups are created once and then not disturbed. But note that the Hyper-V fcopy group can get updated in a running VM via update_sysfs_group(), which also calls create_files(). Such an update might marginally increase the potential for the race and for getting the wrong size. Still, I agree it should work out in practice.
Michael
Thomas
On 7/30/2025 1:15 AM, Michael Kelley wrote:
From: Thomas Weißschuh linux@weissschuh.net Sent: Tuesday, July 29, 2025 11:47 AM
On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
From: Naman Jain namjain@linux.microsoft.com Sent: Wednesday, July 23, 2025 12:02 AM
The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer.
This is a backport of the upstream commit 65995e97a1ca ("Drivers: hv: Make the sysfs node size for the ring buffer dynamic") with modifications, as the original patch has missing dependencies on kernel v6.12.x. The structure "struct attribute_group" does not have bin_size field in v6.12.x kernel so the logic of configuring size of sysfs node for ring buffer has been moved to vmbus_chan_bin_attr_is_visible().
Original change was not a fix, but it needs to be backported to fix size related discrepancy caused by the commit mentioned in Fixes tag.
Fixes: bf1299797c3c ("uio_hv_generic: Align ring size to system page") Cc: stable@vger.kernel.org # 6.12.x Signed-off-by: Naman Jain namjain@linux.microsoft.com
This change won't apply on older kernels currently due to missing dependencies. I will take care of them after this goes in.
I did not retain any Reviewed-by or Tested-by tags, since the code has changed completely, while the functionality remains same. Requesting Michael, Dexuan, Wei to please review again.
drivers/hv/vmbus_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 1f519e925f06..616e63fb2f15 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1810,7 +1810,6 @@ static struct bin_attribute chan_attr_ring_buffer = { .name = "ring", .mode = 0600, },
- .size = 2 * SZ_2M, .mmap = hv_mmap_ring_buffer_wrapper, }; static struct attribute *vmbus_chan_attrs[] = {
@@ -1866,6 +1865,7 @@ static umode_t vmbus_chan_bin_attr_is_visible(struct kobject *kobj, /* Hide ring attribute if channel's ring_sysfs_visible is set to false */ if (attr == &chan_attr_ring_buffer && !channel->ring_sysfs_visible) return 0;
- attr->size = channel->ringbuffer_pagecount << PAGE_SHIFT;
Suppose a VM has two devices using UIO, such as DPDK network device with a 2MiB ring buffer, and an fcopy device with a 16KiB ring buffer. Both devices will be referencing the same static instance of chan_attr_ring_buffer, and the .size field it contains. The above statement will change that .size field between 2MiB and 16KiB as the /sys entries are initially populated, and as the visibility is changed if the devices are removed and re-instantiated (which is much more likely for fcopy than for netvsc). That changing of the .size value will probably work most of the time, but it's racy if two devices with different ring buffer sizes get instantiated or re-instantiated at the same time.
IIRC it works out in practice. While the global attribute instance is indeed modified back-and-forth the size from it will be *copied* into kernfs after each recalculation. So each attribute should get its own correct size.
The race I see is in fs/sysfs/group.c in the create_files() function. It calls the is_bin_visible() function, which this patch uses to set the .size field of the static attribute. Then creates_files() calls sysfs_add_bin_file_mode_ns(), which reads the .size field and uses it to create the sysfs entry. But if create_files() is called in parallel on two different kobjs of the same type, but with different values for the .size field, the second create_files() could overwrite the static .size field after the first create_files() has set it, but before it has used it. I don't see any global lock that would prevent such, though maybe I'm missing something.
Unfortunately, I don't see a fix, short of backporting support for the .bin_size function, as this is exactly the problem that function solves.
It should work out in practice. (I introduced the .bin_size function)
The race I describe is unlikely, particularly if attribute groups are created once and then not disturbed. But note that the Hyper-V fcopy group can get updated in a running VM via update_sysfs_group(), which also calls create_files(). Such an update might marginally increase the potential for the race and for getting the wrong size. Still, I agree it should work out in practice.
Michael
Thomas
hi Thomas, Would it be possible to port your changes on 6.12 kernel, to avoid such race conditions? Or if it has a lot of dependencies, or if you have a follow-up advice, please let us us know.
Thanks, Naman
Hi Naman,
On 2025-07-31 21:13:27+0530, Naman Jain wrote:
On 7/30/2025 1:15 AM, Michael Kelley wrote:
From: Thomas Weißschuh linux@weissschuh.net Sent: Tuesday, July 29, 2025 11:47 AM
On 2025-07-29 18:39:45+0000, Michael Kelley wrote:
From: Naman Jain namjain@linux.microsoft.com Sent: Wednesday, July 23, 2025 12:02 AM
<snip>
Unfortunately, I don't see a fix, short of backporting support for the .bin_size function, as this is exactly the problem that function solves.
It should work out in practice. (I introduced the .bin_size function)
The race I describe is unlikely, particularly if attribute groups are created once and then not disturbed. But note that the Hyper-V fcopy group can get updated in a running VM via update_sysfs_group(), which also calls create_files(). Such an update might marginally increase the potential for the race and for getting the wrong size. Still, I agree it should work out in practice.
Michael
Thomas
hi Thomas, Would it be possible to port your changes on 6.12 kernel, to avoid such race conditions?
Possible, surely. But Greg has to accept it. And you will need to do the backports.
Or if it has a lot of dependencies, or if you have a follow-up advice, please let us us know.
Not more than mentioned before.
Thomas
linux-stable-mirror@lists.linaro.org