sysfs_emit(3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output")) has a hidden constraint that the buf should be alignment with PAGE_SIZE. It's OK since 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") help us to solve scenes like CONFIG_SLUB_DEBUG or CONFIG_SLOB which will break this.
But since lots of stable branch(we reproduce it with 4.19 stable) merge 3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output") without 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), we will get the follow warning with command 'cat /sys/class/iscsi_transport/tcp/handle' once we enable CONFIG_SLUB_DEBUG and start kernel with slub_debug=UFPZ!
Obviously, we can backport 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") to fix it. But this will waste some memory to ensure natural alignment which seems unbearable for embedded device. So for stable branch like 4.19, can we just remove the warning in sysfs_emit since the only user for it is iscsi, and seq_read + sysfs_kf_seq_show can ensure that the buf in sysfs_emit must be aware of PAGE_SIZE. Or does there some other advise for this problem?
# without 59bb47985c1d + 1G ram [root@localhost ~]# free total used free shared buff/cache available Mem: 947336 169960 389732 896 387644 624216 Swap: 0 0 0
# merge with 59bb47985c1d + 1G ram [root@localhost ~]# free total used free shared buff/cache available Mem: 947340 175176 374396 896 397768 618964 Swap: 0 0 0 [root@localhost ~]#
[ 37.683332] ------------[ cut here ]------------ [ 37.692747] invalid sysfs_emit: buf:00000000f75441ab [ 37.693914] WARNING: CPU: 1 PID: 576 at fs/sysfs/file.c:577 sysfs_emit+0xb9/0xe0 [ 37.694861] Modules linked in: [ 37.695264] CPU: 1 PID: 576 Comm: cat Not tainted 4.19.183-00023-gdf225d326e8c #7 [ 37.696210] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 37.697866] RIP: 0010:sysfs_emit+0xb9/0xe0 [ 37.698387] Code: 47 c9 c3 48 83 05 76 33 b3 04 01 48 89 fe 48 c7 c7 64 08 bb 8a 48 83 05 7c 33 b3 04 01 e8 13 7f be 00 48 83 05 77 33 b3 04 01 <0f> 0b 48 83 05 75 33 b3 04 01 48 83 05 73 [ 37.700713] RSP: 0018:ffffc90000af7cf8 EFLAGS: 00010202 [ 37.701370] RAX: 0000000000000000 RBX: ffff88803e0e4c00 RCX: 0000000000000006 [ 37.702261] RDX: 0000000000000007 RSI: 0000000000000006 RDI: ffff888039455bf0 [ 37.703171] RBP: ffffc90000af7d48 R08: 00000000000002f8 R09: 0000000000000005 [ 37.704079] R10: 00000000000002f7 R11: ffffffff8bd9534d R12: ffff88801a013740 [ 37.705001] R13: ffff88803db37a08 R14: ffff88803db37a30 R15: ffff88803db37a48 [ 37.705918] FS: 00007fcb96411580(0000) GS:ffff888039440000(0000) knlGS:0000000000000000 [ 37.706956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.707692] CR2: 00007fcb88cf0000 CR3: 000000001a501000 CR4: 00000000000006e0 [ 37.708607] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 37.709520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 37.710427] Call Trace: [ 37.710784] show_transport_handle+0x3e/0x60 [ 37.711338] dev_attr_show+0x22/0x60 [ 37.711808] sysfs_kf_seq_show+0xc6/0x190 [ 37.712332] kernfs_seq_show+0x25/0x30 [ 37.712862] seq_read+0xe1/0x540 [ 37.713292] ? __handle_mm_fault+0xba3/0x1c70 [ 37.713866] kernfs_fop_read+0x36/0x230 [ 37.714371] __vfs_read+0x3c/0x230 [ 37.714819] ? handle_mm_fault+0x1d1/0x340 [ 37.715345] vfs_read+0xb5/0x1b0 [ 37.715774] ksys_read+0x67/0x130 [ 37.716218] __x64_sys_read+0x1e/0x30 [ 37.716701] do_syscall_64+0x95/0x3d0 [ 37.717175] ? do_async_page_fault+0x2e/0x190 [ 37.717747] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 37.718406] RIP: 0033:0x7fcb963363f2 [ 37.718881] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a 41 0a 00 e8 75 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 04 [ 37.721290] RSP: 002b:00007ffea78dff18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 37.722264] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fcb963363f2 [ 37.723169] RDX: 0000000000020000 RSI: 00007fcb88cf1000 RDI: 0000000000000003 [ 37.724100] RBP: 00007fcb88cf1000 R08: 00007fcb88cf0010 R09: 0000000000000000 [ 37.725039] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00 [ 37.725945] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [ 37.726857] ---[ end trace fbd5b85cd7d85530 ]---
Emm...
Actually, the problem exist for stable branch like 4.19 after fix for CVE-2021-27365 which include the follow two patch: 2efc459d06f1 ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output") ec98ea7070e9 ("scsi: iscsi: Ensure sysfs attributes are limited to PAGE_SIZE")
在 2021/4/2 15:16, yangerkun 写道:
sysfs_emit(3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output")) has a hidden constraint that the buf should be alignment with PAGE_SIZE. It's OK since 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") help us to solve scenes like CONFIG_SLUB_DEBUG or CONFIG_SLOB which will break this.
But since lots of stable branch(we reproduce it with 4.19 stable) merge 3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output") without 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), we will get the follow warning with command 'cat /sys/class/iscsi_transport/tcp/handle' once we enable CONFIG_SLUB_DEBUG and start kernel with slub_debug=UFPZ!
Obviously, we can backport 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") to fix it. But this will waste some memory to ensure natural alignment which seems unbearable for embedded device. So for stable branch like 4.19, can we just remove the warning in sysfs_emit since the only user for it is iscsi, and seq_read
- sysfs_kf_seq_show can ensure that the buf in sysfs_emit must be aware
of PAGE_SIZE. Or does there some other advise for this problem?
# without 59bb47985c1d + 1G ram [root@localhost ~]# free total used free shared buff/cache available Mem: 947336 169960 389732 896 387644 624216 Swap: 0 0 0
# merge with 59bb47985c1d + 1G ram [root@localhost ~]# free total used free shared buff/cache available Mem: 947340 175176 374396 896 397768 618964 Swap: 0 0 0 [root@localhost ~]#
[ 37.683332] ------------[ cut here ]------------ [ 37.692747] invalid sysfs_emit: buf:00000000f75441ab [ 37.693914] WARNING: CPU: 1 PID: 576 at fs/sysfs/file.c:577 sysfs_emit+0xb9/0xe0 [ 37.694861] Modules linked in: [ 37.695264] CPU: 1 PID: 576 Comm: cat Not tainted 4.19.183-00023-gdf225d326e8c #7 [ 37.696210] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 37.697866] RIP: 0010:sysfs_emit+0xb9/0xe0 [ 37.698387] Code: 47 c9 c3 48 83 05 76 33 b3 04 01 48 89 fe 48 c7 c7 64 08 bb 8a 48 83 05 7c 33 b3 04 01 e8 13 7f be 00 48 83 05 77 33 b3 04 01 <0f> 0b 48 83 05 75 33 b3 04 01 48 83 05 73 [ 37.700713] RSP: 0018:ffffc90000af7cf8 EFLAGS: 00010202 [ 37.701370] RAX: 0000000000000000 RBX: ffff88803e0e4c00 RCX: 0000000000000006 [ 37.702261] RDX: 0000000000000007 RSI: 0000000000000006 RDI: ffff888039455bf0 [ 37.703171] RBP: ffffc90000af7d48 R08: 00000000000002f8 R09: 0000000000000005 [ 37.704079] R10: 00000000000002f7 R11: ffffffff8bd9534d R12: ffff88801a013740 [ 37.705001] R13: ffff88803db37a08 R14: ffff88803db37a30 R15: ffff88803db37a48 [ 37.705918] FS: 00007fcb96411580(0000) GS:ffff888039440000(0000) knlGS:0000000000000000 [ 37.706956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 37.707692] CR2: 00007fcb88cf0000 CR3: 000000001a501000 CR4: 00000000000006e0 [ 37.708607] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 37.709520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 37.710427] Call Trace: [ 37.710784] show_transport_handle+0x3e/0x60 [ 37.711338] dev_attr_show+0x22/0x60 [ 37.711808] sysfs_kf_seq_show+0xc6/0x190 [ 37.712332] kernfs_seq_show+0x25/0x30 [ 37.712862] seq_read+0xe1/0x540 [ 37.713292] ? __handle_mm_fault+0xba3/0x1c70 [ 37.713866] kernfs_fop_read+0x36/0x230 [ 37.714371] __vfs_read+0x3c/0x230 [ 37.714819] ? handle_mm_fault+0x1d1/0x340 [ 37.715345] vfs_read+0xb5/0x1b0 [ 37.715774] ksys_read+0x67/0x130 [ 37.716218] __x64_sys_read+0x1e/0x30 [ 37.716701] do_syscall_64+0x95/0x3d0 [ 37.717175] ? do_async_page_fault+0x2e/0x190 [ 37.717747] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 37.718406] RIP: 0033:0x7fcb963363f2 [ 37.718881] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a 41 0a 00 e8 75 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 04 [ 37.721290] RSP: 002b:00007ffea78dff18 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 37.722264] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fcb963363f2 [ 37.723169] RDX: 0000000000020000 RSI: 00007fcb88cf1000 RDI: 0000000000000003 [ 37.724100] RBP: 00007fcb88cf1000 R08: 00007fcb88cf0010 R09: 0000000000000000 [ 37.725039] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00 [ 37.725945] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 [ 37.726857] ---[ end trace fbd5b85cd7d85530 ]---
.
On Fri, Apr 02, 2021 at 03:16:21PM +0800, yangerkun wrote:
sysfs_emit(3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output")) has a hidden constraint that the buf should be alignment with PAGE_SIZE. It's OK since 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") help us to solve scenes like CONFIG_SLUB_DEBUG or CONFIG_SLOB which will break this.
But since lots of stable branch(we reproduce it with 4.19 stable) merge 3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output") without 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)"), we will get the follow warning with command 'cat /sys/class/iscsi_transport/tcp/handle' once we enable CONFIG_SLUB_DEBUG and start kernel with slub_debug=UFPZ!
Obviously, we can backport 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") to fix it. But this will waste some memory to ensure natural alignment which seems unbearable for embedded device. So for stable branch like 4.19, can we just remove the warning in sysfs_emit since the only user for it is iscsi, and seq_read
- sysfs_kf_seq_show can ensure that the buf in sysfs_emit must be aware
of PAGE_SIZE. Or does there some other advise for this problem?
More users of this function will be backported over time as we all know, so just removing the functions is not good.
Why is the buffer alignment considered a "waste" here? If that change is in Linus's tree and newer kernels (it showed up in 5.4 which was released quite a while ago), where are the people complaining about it there?
I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") seems like the correct thing to do here to bring things into alignment (pun intended) with newer kernels.
thanks,
greg k-h
On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
Why is the buffer alignment considered a "waste" here? If that change is in Linus's tree and newer kernels (it showed up in 5.4 which was released quite a while ago), where are the people complaining about it there?
I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") seems like the correct thing to do here to bring things into alignment (pun intended) with newer kernels.
It's only a waste for slabs which need things like redzones (eg we could get 7 512-byte allocations out of a 4kB page with a 64 byte redzone and no alignment ; with alignment we can only get four). Since slub can enable/disable redzones on a per-slab basis, and redzones aren't terribly interesting now that we have kasan/kfence, nobody really cares.
在 2021/4/2 22:41, Matthew Wilcox 写道:
On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
Why is the buffer alignment considered a "waste" here? If that change is in Linus's tree and newer kernels (it showed up in 5.4 which was released quite a while ago), where are the people complaining about it there?
I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") seems like the correct thing to do here to bring things into alignment (pun intended) with newer kernels.
It's only a waste for slabs which need things like redzones (eg we could get 7 512-byte allocations out of a 4kB page with a 64 byte redzone and no alignment ; with alignment we can only get four). Since slub can enable/disable redzones on a per-slab basis, and redzones aren't terribly interesting now that we have kasan/kfence, nobody really cares.
.
Thanks for your explain! The imfluence seems minimal since the "waste" will happen only when we enable slub_debug.
One more question for Joe Perches. Patch v2[1] does not add the alignment check for buf and we add it in v3[2]. I don't see the necessity for this check... Can you help to explain that why we need this?
Thanks, Kun.
[1]. https://lore.kernel.org/lkml/a9054fb521e65f2809671fa9c18e2453061e9d91.159874... [2]. https://lore.kernel.org/lkml/743a648dc817cddd2e7046283c868f1c08742f29.camel@...
On Wed, 2021-04-07 at 20:14 +0800, yangerkun wrote:
在 2021/4/2 22:41, Matthew Wilcox 写道:
On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
Why is the buffer alignment considered a "waste" here? If that change is in Linus's tree and newer kernels (it showed up in 5.4 which was released quite a while ago), where are the people complaining about it there?
I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") seems like the correct thing to do here to bring things into alignment (pun intended) with newer kernels.
It's only a waste for slabs which need things like redzones (eg we could get 7 512-byte allocations out of a 4kB page with a 64 byte redzone and no alignment ; with alignment we can only get four). Since slub can enable/disable redzones on a per-slab basis, and redzones aren't terribly interesting now that we have kasan/kfence, nobody really cares.
.
Thanks for your explain! The imfluence seems minimal since the "waste" will happen only when we enable slub_debug.
One more question for Joe Perches. Patch v2[1] does not add the alignment check for buf and we add it in v3[2]. I don't see the necessity for this check... Can you help to explain that why we need this?
It's to make sure it's a PAGE_SIZE aligned buffer. It's just so it would not be misused/abused in non-sysfs derived cases.
在 2021/4/7 21:21, Joe Perches 写道:
On Wed, 2021-04-07 at 20:14 +0800, yangerkun wrote:
在 2021/4/2 22:41, Matthew Wilcox 写道:
On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
Why is the buffer alignment considered a "waste" here? If that change is in Linus's tree and newer kernels (it showed up in 5.4 which was released quite a while ago), where are the people complaining about it there?
I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural alignment for kmalloc(power-of-two)") seems like the correct thing to do here to bring things into alignment (pun intended) with newer kernels.
It's only a waste for slabs which need things like redzones (eg we could get 7 512-byte allocations out of a 4kB page with a 64 byte redzone and no alignment ; with alignment we can only get four). Since slub can enable/disable redzones on a per-slab basis, and redzones aren't terribly interesting now that we have kasan/kfence, nobody really cares.
.
Thanks for your explain! The imfluence seems minimal since the "waste" will happen only when we enable slub_debug.
One more question for Joe Perches. Patch v2[1] does not add the alignment check for buf and we add it in v3[2]. I don't see the necessity for this check... Can you help to explain that why we need this?
It's to make sure it's a PAGE_SIZE aligned buffer. It's just so it would not be misused/abused in non-sysfs derived cases.
.
Thanks! It help me to understand the problem better!
linux-stable-mirror@lists.linaro.org