So we've had this regression in 9p for almost a year, which is way too long, but there was no "easy" reproducer until yesterday (thank you again!!)
It turned out to be a bug with iov_iter on folios: iov_iter_get_pages_alloc2() would advance the iov_iter correctly up to the end edge of a folio, and the later copy_to_iter() then fails because of the iterate_folioq() bug.
Happy to consider alternative ways of fixing this; now that there's a reproducer it's all much clearer. For the bug to be visible we basically need to make an IO with non-contiguous folios in the iov_iter (which is not obvious to test with synthetic VMs), with a size that triggers a zero-copy read followed by a non-zero-copy read.
Signed-off-by: Dominique Martinet asmadeus@codewreck.org
---
Dominique Martinet (2):
      iov_iter: iterate_folioq: fix handling of offset >= folio size
      iov_iter: iov_folioq_get_pages: don't leave empty slot behind

 include/linux/iov_iter.h | 3 +++
 lib/iov_iter.c           | 6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585
change-id: 20250811-iot_iter_folio-1b7849f88fed
Best regards,
From: Dominique Martinet asmadeus@codewreck.org
It's apparently possible to get an iov forwarded all the way up to the end of the current page we're looking at, e.g.
(gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Where iov_offset is 4k with 4k-sized folios
This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio when skip covers its whole size. More importantly, part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE, not zero), so skip forward to the "advance to next folio" code.
Reported-by: Maximilian Bosch maximilian@mbosch.me
Reported-by: Ryan Lahfa ryan@lahfa.xyz
Reported-by: Christian Theune ct@flyingcircus.io
Reported-by: Arnout Engelen arnout@bzzt.net
Link: https://lkml.kernel.org/r/D4LHHUNLG79Y.12PI0X6BEHRHW@mbosch.me/
Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
Cc: stable@vger.kernel.org # v6.12+
Signed-off-by: Dominique Martinet asmadeus@codewreck.org
---
 include/linux/iov_iter.h | 3 +++
 1 file changed, 3 insertions(+)
diff --git a/include/linux/iov_iter.h b/include/linux/iov_iter.h
index c4aa58032faf874ee5b29bd37f9e23c479741bef..7988a0fc94ad0525b475196035dc5d754fd3d117 100644
--- a/include/linux/iov_iter.h
+++ b/include/linux/iov_iter.h
@@ -168,6 +168,8 @@ size_t iterate_folioq(struct iov_iter *iter, size_t len, void *priv, void *priv2
 			break;
 
 		fsize = folioq_folio_size(folioq, slot);
+		if (skip >= fsize)
+			goto next;
 		base = kmap_local_folio(folio, skip);
 		part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
 		remain = step(base, progress, part, priv, priv2);
@@ -177,6 +179,7 @@ size_t iterate_folioq(struct iov_iter *iter, size_t len, void *priv, void *priv2
 		progress += consumed;
 		skip += consumed;
 		if (skip >= fsize) {
+next:
 			skip = 0;
 			slot++;
 			if (slot == folioq_nr_slots(folioq) && folioq->next) {
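To make the arithmetic in the first hunk concrete, here is a minimal user-space sketch, assuming 4 KiB pages as in the gdb dump. `model_part()` and `model_folio_exhausted()` are hypothetical helpers for illustration, not kernel functions: the first reproduces the `part` expression, the second the `skip >= fsize` guard the patch adds.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, matching the 4k-sized folios in the gdb dump. */
#define MODEL_PAGE_SIZE ((size_t)4096)

/* Hypothetical stand-in for the kernel's umin() on two size_t values. */
static size_t model_umin(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Same expression as in iterate_folioq():
 *     part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
 * When skip is an exact multiple of the page size (e.g. 4096 at the end of
 * a 4096-byte folio), skip % PAGE_SIZE is 0, so part is limited only by len
 * and is NOT zero, even though no bytes are left in this folio.
 */
static size_t model_part(size_t len, size_t skip)
{
	return model_umin(len, MODEL_PAGE_SIZE - skip % MODEL_PAGE_SIZE);
}

/* The guard added by the patch: nothing to map when skip >= fsize. */
static int model_folio_exhausted(size_t skip, size_t fsize)
{
	return skip >= fsize;
}
```

With iov_offset = 4096 as in the dump, and a caller asking for all 655 remaining bytes, `model_part(655, 4096)` is 655: without the guard, step() would be handed 655 bytes starting at offset 4096 of a 4096-byte folio, i.e. past its end.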
Dominique Martinet via B4 Relay wrote:
It's apparently possible to get an iov forwarded all the way up to the
By "forwarded" I presume you mean "advanced"?
end of the current page we're looking at, e.g.
(gdb) p *iter $24 = {iter_type = 4 '\004', nofault = false, data_source = false, iov_offset = 4096, {__ubuf_iovec = { iov_base = 0xffff88800f5bc000, iov_len = 655}, {{__iov = 0xffff88800f5bc000, kvec = 0xffff88800f5bc000, bvec = 0xffff88800f5bc000, folioq = 0xffff88800f5bc000, xarray = 0xffff88800f5bc000, ubuf = 0xffff88800f5bc000}, count = 655}}, {nr_segs = 2, folioq_slot = 2 '\002', xarray_start = 2}}
Where iov_offset is 4k with 4k-sized folios
This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio when skip covers its whole size. More importantly, part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE, not zero), so skip forward to the "advance to next folio" code.
Note that things get complicated because folioqs form a segmented list that can be under construction as it advances. So if there's no next folioq segment at the time you advance to the end of the current one, it will end up parked at the end of the last folio or with slot==nr_slots because there's nowhere for it to advance to. However, the folioq chain can then get extended, so the advancer has to detect this and move on to the next segment.
Anyway:
Acked-by: David Howells dhowells@redhat.com
Note that extract_folioq_to_sg() already does this as does iov_iter_extract_folioq_pages().
David Howells wrote on Mon, Aug 11, 2025 at 03:37:43PM +0100:
Dominique Martinet via B4 Relay wrote:
It's apparently possible to get an iov forwarded all the way up to the
By "forwarded" I presume you mean "advanced"?
Thanks, swapped words in v2
This should have been fine because we're only in the 2nd slot and there's another one after this, but iterate_folioq should not try to map a folio when skip covers its whole size. More importantly, part here does not end up zero (because 'PAGE_SIZE - skip % PAGE_SIZE' ends up PAGE_SIZE, not zero), so skip forward to the "advance to next folio" code.
Note that things get complicated because folioqs form a segmented list that can be under construction as it advances. So if there's no next folioq segment at the time you advance to the end of the current one, it will end up parked at the end of the last folio or with slot==nr_slots because there's nowhere for it to advance to.
Hmm, I've already sent a v2 with other things fixed, but now that you made me look at the "we're at the end of the iov_iter" case, I think this won't work well either? folioq_folio() always returns something, and the advance code only advances if folioq->next is set, but doesn't bail out if it's unset.
There should be a `if (slot == folioq_nr_slots(folioq)) break` check somewhere as well? Or is the iov_iter guaranteed to always 1/ have some data and 2/ either be big enough or have remaining data in a step?
I can believe the former but wouldn't trust the latter...
Note that extract_folioq_to_sg() already does this as does iov_iter_extract_folioq_pages().
Yes, we're not quite consistent here: some functions, like plain iov_iter_advance(), leave you on an invalid slot so that folioq->next gets checked on the next invocation, while others point at the end of the last folio in the queue (like iov_iter_extract_folioq_pages(), and iov_folioq_get_pages() before patch 2). I think either pattern is valid. I changed iov_folioq_get_pages() because it was a bit weird to have an iov_iter with offset > count, and iov_iter_advance() wouldn't do this, but I agree either should work; we probably just want to be more consistent.
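The two parking conventions can be made concrete with a hypothetical helper (not an existing kernel function) that a consumer could use to treat both end states the same way. It models a single, fully populated segment; `model_q` and `model_exhausted()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical single-segment model (not the kernel's struct folio_queue).
 * Simplification: every slot is populated, so nr_slots is also the number
 * of folios.
 */
struct model_q {
	const size_t *sizes;    /* sizes[i] is the size of folio i */
	unsigned int nr_slots;
	int has_next;           /* nonzero once another segment is appended */
};

/*
 * Returns 1 if an iterator parked at (slot, offset) has nothing left to
 * read, accepting both parking conventions:
 *   (a) slot == nr_slots, an invalid slot (iov_iter_advance() style);
 *   (b) last slot with offset at the end of that folio
 *       (iov_iter_extract_folioq_pages() style).
 */
static int model_exhausted(const struct model_q *q, unsigned int slot,
			   size_t offset)
{
	if (q->has_next)
		return 0;       /* the chain was extended: more data may follow */
	if (slot >= q->nr_slots)
		return 1;       /* convention (a) */
	return slot == q->nr_slots - 1 && offset >= q->sizes[slot]; /* (b) */
}
```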
Thanks,
asmadeus@codewreck.org wrote:
There should be a `if (slot == folioq_nr_slots(folioq)) break` check somewhere as well? Or is the iov_iter guaranteed to always 1/ have some data and 2/ either be big enough or have remaining data in a step?
We should handle both cases. I think the other iteration functions will. iov_iter_extract_folioq_pages(), for example, wraps it in a conditional:
	if (offset < fsize) {
		part = umin(part, umin(maxsize - extracted, fsize - offset));
		i->count -= part;
		i->iov_offset += part;
		extracted += part;

		p[nr++] = folio_page(folio, offset / PAGE_SIZE);
	}
David
David Howells wrote on Tue, Aug 12, 2025 at 10:34:17AM +0100:
asmadeus@codewreck.org wrote:
There should be a `if (slot == folioq_nr_slots(folioq)) break` check somewhere as well? Or is the iov_iter guaranteed to always 1/ have some data and 2/ either be big enough or have remaining data in a step?
We should handle both cases. I think the other iteration functions will. iov_iter_extract_folioq_pages(), for example, wraps it in a conditional:
	if (offset < fsize) {
		part = umin(part, umin(maxsize - extracted, fsize - offset));
		i->count -= part;
		i->iov_offset += part;
		extracted += part;

		p[nr++] = folio_page(folio, offset / PAGE_SIZE);
	}
That's not what I pointed out just now; it doesn't check either whether there is any slot left. For example, an iov_iter with nr_slots = 4, slot = 4 and folioq->next = NULL will happily tread on folioq->vec.folios[4] (folioq_folio(folioq, slot)), which is invalid.
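A minimal user-space sketch of the bounds check being asked for, assuming a fixed slot count of 4 as in the example above. `model_folioq` and `model_folio_at()` are hypothetical stand-ins, not the kernel's folio_queue or folioq_folio(). The lookup returns NULL rather than reading past the slot array, and only hops to the next segment once one has been appended.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 slots per segment, as in the nr_slots = 4 example above. */
#define MODEL_NR_SLOTS 4

/* Hypothetical stand-in for one folio_queue segment (not the kernel struct). */
struct model_folioq {
	struct model_folioq *next;      /* NULL until another segment is appended */
	void *folios[MODEL_NR_SLOTS];
};

/*
 * Bounds-checked lookup of the folio at (*q, *slot).  If *slot is one past
 * the end, hop to slot 0 of the next segment when there is one; otherwise
 * return NULL (the caller should break) instead of reading
 * folios[MODEL_NR_SLOTS], which is out of bounds.
 */
static void *model_folio_at(const struct model_folioq **q, unsigned int *slot)
{
	if (*slot >= MODEL_NR_SLOTS) {
		if (!(*q)->next)
			return NULL;
		*q = (*q)->next;
		*slot = 0;
	}
	return (*q)->folios[*slot];
}
```

The position stays parked when there is no next segment, so a later call after the chain has been extended can still move on, which matches the "segmented list under construction" behaviour described earlier in the thread.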
Hi Dominique,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 8f5ae30d69d7543eee0d70083daf4de8fe15d585]
url:    https://github.com/intel-lab-lkp/linux/commits/Dominique-Martinet-via-B4-Rel...
base:   8f5ae30d69d7543eee0d70083daf4de8fe15d585
patch link:    https://lore.kernel.org/r/20250811-iot_iter_folio-v1-1-d9c223adf93c%40codewr...
patch subject: [PATCH 1/2] iov_iter: iterate_folioq: fix handling of offset >= folio size
config: i386-buildonly-randconfig-002-20250811 (https://download.01.org/0day-ci/archive/20250812/202508120250.Eooq2ydr-lkp@i...)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250812/202508120250.Eooq2ydr-lkp@i...)
If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags
| Reported-by: kernel test robot lkp@intel.com
| Closes: https://lore.kernel.org/oe-kbuild-all/202508120250.Eooq2ydr-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from lib/iov_iter.c:14:
include/linux/iov_iter.h:171:7: warning: variable 'remain' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
  171 |                 if (skip >= fsize)
      |                     ^~~~~~~~~~~~~
include/linux/iov_iter.h:190:7: note: uninitialized use occurs here
  190 |                 if (remain)
      |                     ^~~~~~
include/linux/iov_iter.h:171:3: note: remove the 'if' if its condition is always false
  171 |                 if (skip >= fsize)
      |                 ^~~~~~~~~~~~~~~~~~
  172 |                         goto next;
      |                         ~~~~~~~~~
include/linux/iov_iter.h:163:22: note: initialize the variable 'remain' to silence this warning
  163 |                 size_t part, remain, consumed;
      |                                    ^
      |                                     = 0
1 warning generated.
vim +171 include/linux/iov_iter.h

   143	
   144	/*
   145	 * Handle ITER_FOLIOQ.
   146	 */
   147	static __always_inline
   148	size_t iterate_folioq(struct iov_iter *iter, size_t len, void *priv, void *priv2,
   149			      iov_step_f step)
   150	{
   151		const struct folio_queue *folioq = iter->folioq;
   152		unsigned int slot = iter->folioq_slot;
   153		size_t progress = 0, skip = iter->iov_offset;
   154	
   155		if (slot == folioq_nr_slots(folioq)) {
   156			/* The iterator may have been extended. */
   157			folioq = folioq->next;
   158			slot = 0;
   159		}
   160	
   161		do {
   162			struct folio *folio = folioq_folio(folioq, slot);
   163			size_t part, remain, consumed;
   164			size_t fsize;
   165			void *base;
   166	
   167			if (!folio)
   168				break;
   169	
   170			fsize = folioq_folio_size(folioq, slot);
   171			if (skip >= fsize)
   172				goto next;
   173			base = kmap_local_folio(folio, skip);
   174			part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
   175			remain = step(base, progress, part, priv, priv2);
   176			kunmap_local(base);
   177			consumed = part - remain;
   178			len -= consumed;
   179			progress += consumed;
   180			skip += consumed;
   181			if (skip >= fsize) {
   182	next:
   183				skip = 0;
   184				slot++;
   185				if (slot == folioq_nr_slots(folioq) && folioq->next) {
   186					folioq = folioq->next;
   187					slot = 0;
   188				}
   189			}
   190			if (remain)
   191				break;
   192		} while (len);
   193	
   194		iter->folioq_slot = slot;
   195		iter->folioq = folioq;
   196		iter->iov_offset = skip;
   197		iter->count -= progress;
   198		return progress;
   199	}
   200	
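The warning is about the `goto next` path: when the folio is skipped, step() never runs, yet `if (remain)` is still evaluated. Below is a simplified user-space sketch of the zero-initialization fix. It covers one slot only; `model_one_slot()` and the step callbacks are hypothetical names, and the chunk is bounded by the folio end rather than the page end.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step callback: returns how many of 'part' bytes were NOT consumed. */
typedef size_t (*model_step_f)(size_t part);

/*
 * One iteration shaped like the patched iterate_folioq() body.
 * 'remain' is initialized to 0: on the goto path step() never runs, and the
 * final "remain != 0" check would otherwise read an indeterminate value.
 * Stores bytes consumed in *consumed_out; returns 1 if the loop should break.
 */
static int model_one_slot(size_t skip, size_t fsize, size_t len,
			  model_step_f step, size_t *consumed_out)
{
	size_t part, remain = 0, consumed = 0;

	if (skip >= fsize)
		goto next;
	part = len < fsize - skip ? len : fsize - skip;
	remain = step(part);
	consumed = part - remain;
next:
	*consumed_out = consumed;
	return remain != 0;
}

/* Example steps: one consumes everything, one only half of each chunk. */
static size_t model_step_all(size_t part)
{
	(void)part;
	return 0;
}

static size_t model_step_half(size_t part)
{
	return part / 2;
}
```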
On Tue, Aug 12, 2025 at 02:55:55AM +0800, kernel test robot wrote:
Hi Dominique,
kernel test robot noticed the following build warnings:
[auto build test WARNING on 8f5ae30d69d7543eee0d70083daf4de8fe15d585]
I see this in -next now, should remain be zero initialized or is there some other fix that is needed?
-- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki
Nathan Chancellor wrote on Tue, Aug 12, 2025 at 10:16:33PM -0700:
1 warning generated.
I see this in -next now, should remain be zero initialized or is there some other fix that is needed?
Zero-initializing it is fine; I sent a v2 with zero-initialization fixed yesterday: https://lkml.kernel.org/r/20250812-iot_iter_folio-v2-1-f99423309478@codewrec...
(and I'll send a v3 with the goto replaced with a bigger if later today as per David's request)
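For comparison, here is one possible goto-free shape, sketched in user space. This is an assumption about what "a bigger if" could look like, not the actual v3 patch; `model_queue` and `model_iterate()` are hypothetical. The map/step work moves under `if (skip < fsize)`, so the advance-to-next-folio block is shared by the skipped and the fully consumed cases without a label.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions: 4 KiB pages and a single segment of 4 slots. */
#define MODEL_PAGE_SIZE ((size_t)4096)
#define MODEL_NR_SLOTS 4

/*
 * Hypothetical queue model: sizes[i] == 0 marks an empty slot.  Folio sizes
 * are assumed to be multiples of MODEL_PAGE_SIZE, as real folios are.
 */
struct model_queue {
	size_t sizes[MODEL_NR_SLOTS];
};

/*
 * Goto-free loop: the map/step work sits inside "if (skip < fsize)", and the
 * advance-to-next-folio block runs afterwards for both the skipped and the
 * fully consumed case.  step() is assumed to always consume the whole chunk,
 * so there is no early break on a short copy here.
 */
static size_t model_iterate(const struct model_queue *q, unsigned int *slotp,
			    size_t *skipp, size_t len)
{
	unsigned int slot = *slotp;
	size_t skip = *skipp, progress = 0;

	while (len && slot < MODEL_NR_SLOTS && q->sizes[slot]) {
		size_t fsize = q->sizes[slot];

		if (skip < fsize) {
			size_t part = MODEL_PAGE_SIZE - skip % MODEL_PAGE_SIZE;

			if (part > len)
				part = len;
			/* a real step() would copy 'part' bytes here */
			len -= part;
			progress += part;
			skip += part;
		}
		if (skip >= fsize) {
			/* advance to the next folio */
			skip = 0;
			slot++;
		}
	}
	*slotp = slot;
	*skipp = skip;
	return progress;
}
```

In this model, starting parked at the end of a 4 KiB folio (slot 1, skip 4096) and asking for 655 bytes moves to slot 2 and consumes all 655 bytes there.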
I assume Andrew will pick it up eventually?
On Wed, Aug 13, 2025 at 02:34:25PM +0900, Dominique Martinet wrote:
Nathan Chancellor wrote on Tue, Aug 12, 2025 at 10:16:33PM -0700:
1 warning generated.
I see this in -next now, should remain be zero initialized or is there some other fix that is needed?
Zero-initializing it is fine; I sent a v2 with zero-initialization fixed yesterday: https://lkml.kernel.org/r/20250812-iot_iter_folio-v2-1-f99423309478@codewrec...
(and I'll send a v3 with the goto replaced with a bigger if later today as per David's request)
I assume Andrew will pick it up eventually?
I hope this happens sooner rather than later, as it broke my builds too (I now always build with `make W=1` and suggest all developers do the same).
Andy Shevchenko wrote on Wed, Aug 13, 2025 at 03:39:09PM +0200:
I assume Andrew will pick it up eventually?
I hope this happens sooner rather than later, as it broke my builds too (I now always build with `make W=1` and suggest all developers do the same).
I actually test with W=1 too, but somehow this warning doesn't show up in my build; I'm not quite sure why :/ (even if I try clang like the test robot... But there are plenty of other warnings all over the place, so I agree this is all way too manual)
Anyway, sorry about it...
On Wed, Aug 13, 2025 at 10:45:33PM +0900, Dominique Martinet wrote:
Andy Shevchenko wrote on Wed, Aug 13, 2025 at 03:39:09PM +0200:
I assume Andrew will pick it up eventually?
I hope this happens sooner rather than later, as it broke my builds too (I now always build with `make W=1` and suggest all developers do the same).
I actually test with W=1 too, but somehow this warning doesn't show up in my build; I'm not quite sure why :/ (even if I try clang like the test robot... But there are plenty of other warnings all over the place, so I agree this is all way too manual)
Depends on your config; for the last few releases I have been specifically targeting x86 defconfigs (32- and 64-bit) to build with `make W=1`. There are a couple of changes that are still pending, but otherwise it builds with GCC and clang.
On Wed, Aug 13, 2025 at 03:39:09PM +0200, Andy Shevchenko wrote:
On Wed, Aug 13, 2025 at 02:34:25PM +0900, Dominique Martinet wrote:
Nathan Chancellor wrote on Tue, Aug 12, 2025 at 10:16:33PM -0700:
1 warning generated.
I see this in -next now, should remain be zero initialized or is there some other fix that is needed?
Zero-initializing it is fine; I sent a v2 with zero-initialization fixed yesterday: https://lkml.kernel.org/r/20250812-iot_iter_folio-v2-1-f99423309478@codewrec...
(and I'll send a v3 with the goto replaced with a bigger if later today as per David's request)
I assume Andrew will pick it up eventually?
I hope this happens sooner rather than later, as it broke my builds too (I now always build with `make W=1` and suggest all developers do the same).
This build failure is showing up in my testing as well.
In the service of preventing bisection issues, would it be possible to fold the fix into the original patch?
Thanx, Paul
Andy Shevchenko wrote on Wed, Aug 13, 2025 at 04:52:39PM +0300:
I actually test with W=1 too, but somehow this warning doesn't show up in my build; I'm not quite sure why :/ (even if I try clang like the test robot... But there are plenty of other warnings all over the place, so I agree this is all way too manual)
Depends on your config; for the last few releases I have been specifically targeting x86 defconfigs (32- and 64-bit) to build with `make W=1`. There are a couple of changes that are still pending, but otherwise it builds with GCC and clang.
I meant it the other way around: the warning isn't showing up on master + these patches for my config.
But now that I double-checked: 'CC=clang make W=1' doesn't actually use clang; I should have tried 'make CC=clang W=1'... And yeah, the warning just doesn't show up with gcc, so now I know it's better to check with both compilers...
Paul E. McKenney wrote on Wed, Aug 13, 2025 at 04:04:19PM -0700:
I hope this happens sooner rather than later, as it broke my builds too (I now always build with `make W=1` and suggest all developers do the same).
This build failure is showing up in my testing as well.
In the service of preventing bisection issues, would it be possible to fold the fix into the original patch?
Andrew just picked v3 up, so there won't be any such problem, and -next will stop failing after today's update.
Dominique Martinet asmadeus@codewreck.org wrote:
I assume Andrew will pick it up eventually?
These might be more a Christian/VFS thing.
David