On 11/4/21 12:19 AM, Christoph Hellwig wrote:
On Thu, Nov 04, 2021 at 12:11:45AM +0800, Coly Li wrote:
fresh page for each vec, and bio_for_each_segment_all iterates page by page. IFF there is an offset there is proble in the surrounding code as bch_bio_alloc_pages assumes that it is called on a freshly allocate and initialized bio.
Yes, the offset is modified in bch_bio_alloc_pages().
Where? In my upstream copy of bch_bio_alloc_pages there is no bv_offset manipulation, and I could not see how such a manipulation would make sense.
Forgive my typo. It was bch_bio_map() before bch_bio_alloc_pages(), both in do_btree_node_write() and in util.c, bv->bv_offset is set by, bv->bv_offset = base ? offset_in_page(base) : 0;
Here base is bset *i which is initialized in do_btree_node_write() as, struct bset *i = btree_bset_last(b);
The unit size of bset is 1 bcache-defined-block size, minimized value is 512.
Normally the bcache defined block size is 4KB so the issue was not triggered frequently. I found it during testing my nvdimm enabling code for bcache, where I happen to make the bcache defined block size to non-4KB. The offset is from the previous written bkey set, which the minimized unit size is 1 bcache-defined-block-size.
So you have some out of tree changes here? Copying a PAGE_SIZE into a 'segment' bvec just does not make any sense if there is an offset, as segments are defined as bvecs that do not span page boundaries.
I suspect the best thing to do in do_btree_node_write would be something like the patch below instead of poking into the internals here, but I'd also really like to understand the root cause as it does point to a bug somewhere else.
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c index 93b67b8d31c3d..f69914848f32f 100644 --- a/drivers/md/bcache/btree.c +++ b/drivers/md/bcache/btree.c @@ -378,8 +378,8 @@ static void do_btree_node_write(struct btree *b) struct bvec_iter_all iter_all; bio_for_each_segment_all(bv, b->bio, iter_all) {
memcpy(bvec_virt(bv), addr, PAGE_SIZE);
addr += PAGE_SIZE;
memcpy_to_bvec(bvec_virt(bv), addr);
}addr += bv->bv_len;
bch_submit_bbio(b->bio, b->c, &k.key, 0);
The above change doesn't work, still observe panic[1].
Before calling bio_for_each_segment_all(), void *addr is calculated by, void *addr = (void *) ((unsigned long) i & ~(PAGE_SIZE - 1)); which is a page size aligned address. When writing down a dirty btree node, it requires whole page to be written. Your original patch works fine when there is not previously unwirtten keys in the page as previous bkey set (and corrupts memory when bv->bv_offset is non-zero). The above change seems missing the part in offset [0, bv->bv_offset] inside the dirty page, I am not sure how the bellowed panic happens by the above change, it seems like wild pointer from the missing part of btree node when iterating the btree.
If you don't want to directly access members inside struct bio_vec, is there something like page_base(vb) which returns bv->bv_page ?
Thanks.
Coly Li
[1] the panic message starts like,
[ 1926.350362] ------------[ cut here ]------------ [ 1926.405603] kernel BUG at ./include/linux/highmem.h:316! [ 1926.469172] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI [ 1926.540006] CPU: 10 PID: 477 Comm: kworker/10:2 Kdump: loaded Tainted: G E 5.15.0-59.27-default+ #57 [ 1926.350362 1926.540009] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE164L-2.80]- 10/23/2020 [ 1926.540010] Workqueue: bcache bch_data_insert_keys [bcache] [ 1926.806482] RIP: 0010:__bch_btree_node_write+0x662/0x690 [bcache] <snipped>