From: Filipe Manana fdmanana@suse.com
[ Upstream commit 20903032cd9f0260b99aeab92e6540f0350e4a23 ]
During the nocow writeback path, we currently iterate the rbtree of block groups twice: once for checking if the target block group is RO with the call to btrfs_extent_readonly()), and once again for getting a nocow reference on the block group with a call to btrfs_inc_nocow_writers().
Since btrfs_inc_nocow_writers() already returns false when the target block group is RO, remove the call to btrfs_extent_readonly(). Not only we avoid searching the blocks group rbtree twice, it also helps reduce contention on the lock that protects it (specially since it is a spin lock and not a read-write lock). That may make a noticeable difference on very large filesystems, with thousands of allocated block groups.
Reviewed-by: Anand Jain anand.jain@oracle.com Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/inode.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index ad34c5a09bef..02c4bfa515fb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1657,9 +1657,6 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, */ btrfs_release_path(path);
- /* If extent is RO, we must COW it */ - if (btrfs_extent_readonly(fs_info, disk_bytenr)) - goto out_check; ret = btrfs_cross_ref_exist(root, ino, found_key.offset - extent_offset, disk_bytenr, false); @@ -1706,6 +1703,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode, WARN_ON_ONCE(freespace_inode); goto out_check; } + /* If the extent's block group is RO, we must COW */ if (!btrfs_inc_nocow_writers(fs_info, disk_bytenr)) goto out_check; nocow = true;
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 9c7d83ae6ba67d6c6199cce24573983db3b56332 ]
syzbot is hitting WARN_ON(pstore_sb != sb) at pstore_kill_sb() [1], for the assumption that pstore_sb != NULL is wrong because pstore_fill_super() will not assign pstore_sb = sb when new_inode() for d_make_root() returned NULL (due to memory allocation fault injection).
Since mount_single() calls pstore_kill_sb() when pstore_fill_super() failed, pstore_kill_sb() needs to be aware of such failure path.
[1] https://syzkaller.appspot.com/bug?id=6abacb8da5137cb47a416f2bef95719ed60508a...
Reported-by: syzbot syzbot+d0cf0ad6513e9a1da5df@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20210214031307.57903-1-penguin-kernel@I-love.SAKUR... Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pstore/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/pstore/inode.c b/fs/pstore/inode.c index 93a217e4f563..14658b009f1b 100644 --- a/fs/pstore/inode.c +++ b/fs/pstore/inode.c @@ -467,7 +467,7 @@ static struct dentry *pstore_mount(struct file_system_type *fs_type, static void pstore_kill_sb(struct super_block *sb) { mutex_lock(&pstore_sb_lock); - WARN_ON(pstore_sb != sb); + WARN_ON(pstore_sb && pstore_sb != sb);
kill_litter_super(sb); pstore_sb = NULL;
From: Qu Wenruo wqu@suse.com
[ Upstream commit c28ea613fafad910d08f67efe76ae552b1434e44 ]
[BUG] When running fstresss, we can hit strange data csum mismatch where the on-disk data is in fact correct (passes scrub).
With some extra debug info added, we have the following traces:
0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192 0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096 0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192 0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864 0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096 0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864 0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192 0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384 1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096 1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864 1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!! 7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0
[CAUSE] Normally we expect all submitted bio reads to only touch the range we specified, and under subpage context, it means we should only touch the range specified in each bvec.
But in data read path, inside end_bio_extent_readpage(), we have page zeroing which only takes regular page size into consideration.
This means for subpage if we have an inode whose content looks like below:
0 16K 32K 48K 64K |///////| |///////| |
|//| = data needs to be read from disk | | = hole
And i_size is 64K initially.
Then the following race can happen:
T1 | T2 --------------------------------+-------------------------------- btrfs_do_readpage() | |- isize = 64K; | | At this time, the isize is | | 64K | | | |- submit_extent_page() | | submit previous assembled bio| | assemble bio for [0, 16K) | | | |- submit_extent_page() | submit read bio for [0, 16K) | assemble read bio for | [32K, 48K) | | | btrfs_setsize() | |- i_size_write(, 16K); | Now i_size is only 16K end_io() for [0K, 16K) | |- end_bio_extent_readpage() | |- btrfs_verify_data_csum() | | No csum error | |- i_size = 16K; | |- zero_user_segment(16K, | PAGE_SIZE); | !!! We zeroed range | !!! [32K, 48K) | | end_io for [32K, 48K) | |- end_bio_extent_readpage() | |- btrfs_verify_data_csum() | ! CSUM MISMATCH ! | ! As the range is zeroed now !
[FIX] To fix the problem, make end_bio_extent_readpage() to only zero the range of bvec.
The bug only affects subpage read-write support, as for full read-only mount we can't change i_size thus won't hit the race condition.
Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c9cee458e001..ff1a0f97ba84 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -2972,12 +2972,23 @@ static void end_bio_extent_readpage(struct bio *bio) if (likely(uptodate)) { loff_t i_size = i_size_read(inode); pgoff_t end_index = i_size >> PAGE_SHIFT; - unsigned off;
- /* Zero out the end if this page straddles i_size */ - off = offset_in_page(i_size); - if (page->index == end_index && off) - zero_user_segment(page, off, PAGE_SIZE); + /* + * Zero out the remaining part if this range straddles + * i_size. + * + * Here we should only zero the range inside the bvec, + * not touch anything else. + * + * NOTE: i_size is exclusive while end is inclusive. + */ + if (page->index == end_index && i_size <= end) { + u32 zero_start = max(offset_in_page(i_size), + offset_in_page(end)); + + zero_user_segment(page, zero_start, + offset_in_page(end) + 1); + } } ASSERT(bio_offset + len > bio_offset); bio_offset += len;
On Sun, Mar 07, 2021 at 08:57:37AM -0500, Sasha Levin wrote:
From: Qu Wenruo wqu@suse.com
[ Upstream commit c28ea613fafad910d08f67efe76ae552b1434e44 ]
[BUG] When running fstresss, we can hit strange data csum mismatch where the on-disk data is in fact correct (passes scrub).
With some extra debug info added, we have the following traces:
0482us: btrfs_do_readpage: root=5 ino=284 offset=393216, submit force=0 pgoff=0 iosize=8192 0494us: btrfs_do_readpage: root=5 ino=284 offset=401408, submit force=0 pgoff=8192 iosize=4096 0498us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=393216 len=8192 0591us: btrfs_do_readpage: root=5 ino=284 offset=405504, submit force=0 pgoff=12288 iosize=36864 0594us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=401408 len=4096 0863us: btrfs_submit_data_bio: root=5 ino=284 bio first bvec=405504 len=36864 0933us: btrfs_verify_data_csum: root=5 ino=284 offset=393216 len=8192 0967us: btrfs_do_readpage: root=5 ino=284 offset=442368, skip beyond isize pgoff=49152 iosize=16384 1047us: btrfs_verify_data_csum: root=5 ino=284 offset=401408 len=4096 1163us: btrfs_verify_data_csum: root=5 ino=284 offset=405504 len=36864 1290us: check_data_csum: !!! root=5 ino=284 offset=438272 pg_off=45056 !!! 7387us: end_bio_extent_readpage: root=5 ino=284 before pending_read_bios=0
[CAUSE] Normally we expect all submitted bio reads to only touch the range we specified, and under subpage context, it means we should only touch the range specified in each bvec.
But in data read path, inside end_bio_extent_readpage(), we have page zeroing which only takes regular page size into consideration.
This means for subpage if we have an inode whose content looks like below:
0 16K 32K 48K 64K |///////| |///////| |
|//| = data needs to be read from disk | | = hole
And i_size is 64K initially.
Then the following race can happen:
T1 | T2
--------------------------------+-------------------------------- btrfs_do_readpage() | |- isize = 64K; | | At this time, the isize is | | 64K | | | |- submit_extent_page() | | submit previous assembled bio| | assemble bio for [0, 16K) | | | |- submit_extent_page() | submit read bio for [0, 16K) | assemble read bio for | [32K, 48K) | | | btrfs_setsize() | |- i_size_write(, 16K); | Now i_size is only 16K end_io() for [0K, 16K) | |- end_bio_extent_readpage() | |- btrfs_verify_data_csum() | | No csum error | |- i_size = 16K; | |- zero_user_segment(16K, | PAGE_SIZE); | !!! We zeroed range | !!! [32K, 48K) | | end_io for [32K, 48K) | |- end_bio_extent_readpage() | |- btrfs_verify_data_csum() | ! CSUM MISMATCH ! | ! As the range is zeroed now !
[FIX] To fix the problem, make end_bio_extent_readpage() to only zero the range of bvec.
The bug only affects subpage read-write support, as for full read-only mount we can't change i_size thus won't hit the race condition.
Please drop this patch from autosel because of the above, this is in a feature that's in progress and does not affect regular filesystems.
From: Aaron Lewis aaronlewis@google.com
[ Upstream commit 6528fc0a11de3d16339cf17639e2f69a68fcaf4d ]
The vcpu mmap area may consist of more than just the kvm_run struct. Allocate enough space for the entire vcpu mmap area. Without this, on x86, the PIO page, for example, will be missing. This is problematic when dealing with an unhandled exception from the guest as the exception vector will be incorrectly reported as 0x0.
Message-Id: 20210210165035.3712489-1-aaronlewis@google.com Reviewed-by: Andrew Jones drjones@redhat.com Co-developed-by: Steve Rutherford srutherford@google.com Signed-off-by: Aaron Lewis aaronlewis@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index fa5a90e6c6f0..859a0b57c683 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -21,6 +21,8 @@ #define KVM_UTIL_PGS_PER_HUGEPG 512 #define KVM_UTIL_MIN_PFN 2
+static int vcpu_mmap_sz(void); + /* Aligns x up to the next multiple of size. Size must be a power of 2. */ static void *align(void *x, size_t size) { @@ -509,7 +511,7 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct vcpu *vcpu) vcpu->dirty_gfns = NULL; }
- ret = munmap(vcpu->state, sizeof(*vcpu->state)); + ret = munmap(vcpu->state, vcpu_mmap_sz()); TEST_ASSERT(ret == 0, "munmap of VCPU fd failed, rc: %i " "errno: %i", ret, errno); close(vcpu->fd); @@ -978,7 +980,7 @@ void vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpuid) TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->state), "vcpu mmap size " "smaller than expected, vcpu_mmap_sz: %i expected_min: %zi", vcpu_mmap_sz(), sizeof(*vcpu->state)); - vcpu->state = (struct kvm_run *) mmap(NULL, sizeof(*vcpu->state), + vcpu->state = (struct kvm_run *) mmap(NULL, vcpu_mmap_sz(), PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd, 0); TEST_ASSERT(vcpu->state != MAP_FAILED, "mmap vcpu_state failed, " "vcpu id: %u errno: %i", vcpuid, errno);
From: Dmitry Osipenko digetx@gmail.com
[ Upstream commit 7db688e99c0f770ae73e0f1f3fb67f9b64266445 ]
There is a quite huge "uncorrectable error in header" flood in KMSG on a clean system boot since there is no pstore buffer saved in RAM. Let's silence the redundant noisy messages by rate-limiting the printk message. Now there are maximum 10 messages printed repeatedly instead of 35+.
Signed-off-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20210302095850.30894-1-digetx@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pstore/ram_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index aa8e0b65ff1a..fff363bfd484 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -246,7 +246,7 @@ static int persistent_ram_init_ecc(struct persistent_ram_zone *prz, pr_info("error in header, %d\n", numerr); prz->corrected_bytes += numerr; } else if (numerr < 0) { - pr_info("uncorrectable error in header\n"); + pr_info_ratelimited("uncorrectable error in header\n"); prz->bad_blocks++; }
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 25951362db7b3791488ec45bf56c0043f107b94b ]
It works fine and was only disabled because primary GPUs don't enter runpm if there is a console bound to the fbdev due to the kmap. This will at least allow runpm on secondary cards.
Reviewed-by: Evan Quan evan.quan@amd.com Reviewed-by: Rajneesh Bhardwaj rajneesh.bhardwaj@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c index b16b32797624..ccfa2f9d5446 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c @@ -173,8 +173,6 @@ int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags) switch (adev->asic_type) { case CHIP_VEGA20: case CHIP_ARCTURUS: - case CHIP_SIENNA_CICHLID: - case CHIP_NAVY_FLOUNDER: /* enable runpm if runpm=1 */ if (amdgpu_runtime_pm > 0) adev->runpm = true;
From: "Steven Rostedt (VMware)" rostedt@goodmis.org
[ Upstream commit ee666a185558ac9a929e53b902a568442ed62416 ]
If tracing is disabled for some reason (traceoff_on_warning, command line, etc), the ftrace selftests are guaranteed to fail, as their results are defined by trace data in the ring buffers. If the ring buffers are turned off, the tests will fail, due to lack of data.
Because tracing being disabled is for a specific reason (warning, user decided to, etc), it does not make sense to enable tracing to run the self tests, as the test output may corrupt the reason for the tracing to be disabled.
Instead, simply skip the self tests and report that they are being skipped due to tracing being disabled.
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/trace.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index b5815a022ecc..4b6df07d6dc6 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1932,6 +1932,12 @@ static int run_tracer_selftest(struct tracer *type) if (!selftests_can_run) return save_selftest(type);
+ if (!tracing_is_on()) { + pr_warn("Selftest for tracer %s skipped due to tracing disabled\n", + type->name); + return 0; + } + /* * Run a selftest on this tracer. * Here we reset the trace buffer, and set the current
From: Julian Einwag jeinwag-nvme@marcapo.com
[ Upstream commit 5e112d3fb89703a4981ded60561b5647db3693bf ]
The kernel fails to fully detect these SSDs, only the character devices are present:
[ 10.785605] nvme nvme0: pci function 0000:04:00.0 [ 10.876787] nvme nvme1: pci function 0000:81:00.0 [ 13.198614] nvme nvme0: missing or invalid SUBNQN field. [ 13.198658] nvme nvme1: missing or invalid SUBNQN field. [ 13.206896] nvme nvme0: Shutdown timeout set to 20 seconds [ 13.215035] nvme nvme1: Shutdown timeout set to 20 seconds [ 13.225407] nvme nvme0: 16/0/0 default/read/poll queues [ 13.233602] nvme nvme1: 16/0/0 default/read/poll queues [ 13.239627] nvme nvme0: Identify Descriptors failed (8194) [ 13.246315] nvme nvme1: Identify Descriptors failed (8194)
Adding the NVME_QUIRK_NO_NS_DESC_LIST fixes this problem.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=205679 Signed-off-by: Julian Einwag jeinwag-nvme@marcapo.com Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 6bad4d4dcdf0..7a38d764b486 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3230,7 +3230,8 @@ static const struct pci_device_id nvme_id_table[] = { { PCI_DEVICE(0x126f, 0x2263), /* Silicon Motion unidentified */ .driver_data = NVME_QUIRK_NO_NS_DESC_LIST, }, { PCI_DEVICE(0x1bb1, 0x0100), /* Seagate Nytro Flash Storage */ - .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, }, + .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY | + NVME_QUIRK_NO_NS_DESC_LIST, }, { PCI_DEVICE(0x1c58, 0x0003), /* HGST adapter */ .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, }, { PCI_DEVICE(0x1c58, 0x0023), /* WDC SN200 adapter */
From: Zoltán Böszörményi zboszor@gmail.com
[ Upstream commit dc22c1c058b5c4fe967a20589e36f029ee42a706 ]
My 2TB SKC2000 showed the exact same symptoms that were provided in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed cold boot to get it back.
According to some sources, the A2000 is simply a rebadged SKC2000 with a slightly optimized firmware.
Adding the SKC2000 PCI ID to the quirk list with the same workaround as the A2000 made my laptop survive a 5 hours long Yocto bootstrap buildfest which reliably triggered the SSD lockup previously.
Signed-off-by: Zoltán Böszörményi zboszor@gmail.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 7a38d764b486..14c5b52400ef 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3262,6 +3262,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x2646, 0x2262), /* KINGSTON SKC2000 NVMe SSD */ + .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),
Hi,
2021. 03. 07. 14:57 keltezéssel, Sasha Levin írta:
From: Zoltán Böszörményi zboszor@gmail.com
[ Upstream commit dc22c1c058b5c4fe967a20589e36f029ee42a706 ]
My 2TB SKC2000 showed the exact same symptoms that were provided in 538e4a8c57 ("nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs"), i.e. a complete NVME lockup that needed cold boot to get it back.
According to some sources, the A2000 is simply a rebadged SKC2000 with a slightly optimized firmware.
Adding the SKC2000 PCI ID to the quirk list with the same workaround as the A2000 made my laptop survive a 5 hours long Yocto bootstrap buildfest which reliably triggered the SSD lockup previously.
Signed-off-by: Zoltán Böszörményi zboszor@gmail.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org
Thanks for picking it up for stable.
May I suggest to include it in 5.10.x as well? Originally this patch was tested on 5.10.17.
Best regards, Zoltán Böszörményi
drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 7a38d764b486..14c5b52400ef 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3262,6 +3262,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
- { PCI_DEVICE(0x2646, 0x2262), /* KINGSTON SKC2000 NVMe SSD */
{ PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2001),.driver_data = NVME_QUIRK_NO_DEEPEST_PS, },
From: Pascal Terjan pterjan@google.com
[ Upstream commit 6e6a6828c517fb6819479bf5187df5f39084eb9e ]
Add the NVME_QUIRK_NO_NS_DESC_LIST and NVME_QUIRK_IGNORE_DEV_SUBNQN quirks for this buggy device.
Reported and tested in https://bugs.mageia.org/show_bug.cgi?id=28417
Signed-off-by: Pascal Terjan pterjan@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 14c5b52400ef..806a5d071ef6 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3245,6 +3245,9 @@ static const struct pci_device_id nvme_id_table[] = { NVME_QUIRK_IGNORE_DEV_SUBNQN, }, { PCI_DEVICE(0x1987, 0x5016), /* Phison E16 */ .driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, }, + { PCI_DEVICE(0x1b4b, 0x1092), /* Lexar 256 GB SSD */ + .driver_data = NVME_QUIRK_NO_NS_DESC_LIST | + NVME_QUIRK_IGNORE_DEV_SUBNQN, }, { PCI_DEVICE(0x1d1d, 0x1f1f), /* LighNVM qemu device */ .driver_data = NVME_QUIRK_LIGHTNVM, }, { PCI_DEVICE(0x1d1d, 0x2807), /* CNEX WL */
From: Martin George marting@netapp.com
[ Upstream commit 32feb6de47242e54692eceab52cfae8616aa0518 ]
Currently kato is initialized to NVME_DEFAULT_KATO for both discovery & i/o controllers. This is a problem specifically for non-persistent discovery controllers since it always ends up with a non-zero kato value. Fix this by initializing kato to zero instead, and ensuring various controllers are assigned appropriate kato values as follows:
non-persistent controllers - kato set to zero persistent controllers - kato set to NVMF_DEV_DISC_TMO (or any positive int via nvme-cli) i/o controllers - kato set to NVME_DEFAULT_KATO (or any positive int via nvme-cli)
Signed-off-by: Martin George marting@netapp.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/fabrics.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/fabrics.c b/drivers/nvme/host/fabrics.c index 72ac00173500..684acd6813bc 100644 --- a/drivers/nvme/host/fabrics.c +++ b/drivers/nvme/host/fabrics.c @@ -634,7 +634,7 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts, opts->queue_size = NVMF_DEF_QUEUE_SIZE; opts->nr_io_queues = num_online_cpus(); opts->reconnect_delay = NVMF_DEF_RECONNECT_DELAY; - opts->kato = NVME_DEFAULT_KATO; + opts->kato = 0; opts->duplicate_connect = false; opts->fast_io_fail_tmo = NVMF_DEF_FAIL_FAST_TMO; opts->hdr_digest = false; @@ -897,6 +897,9 @@ static int nvmf_parse_options(struct nvmf_ctrl_options *opts, opts->nr_write_queues = 0; opts->nr_poll_queues = 0; opts->duplicate_connect = true; + } else { + if (!opts->kato) + opts->kato = NVME_DEFAULT_KATO; } if (ctrl_loss_tmo < 0) { opts->max_reconnects = -1;
From: Max Gurtovoy mgurtovoy@nvidia.com
[ Upstream commit d9f273b7585c380d7a10d4b3187ddc2d37f2740b ]
In case we have already established connection to nvmf target, it shouldn't be allowed to change the model_number. E.g. if someone will identify ctrl and get model_number of "my_model" later on will change the model_numbel via configfs to "my_new_model" this will break the NVMe specification for "Get Log Page – Persistent Event Log" that refers to Model Number as: "This field contains the same value as reported in the Model Number field of the Identify Controller data structure, bytes 63:24."
Although it doesn't mentioned explicitly that this field can't be changed, we can assume it.
So allow setting this field only once: using configfs or in the first identify ctrl operation.
Signed-off-by: Max Gurtovoy mgurtovoy@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/admin-cmd.c | 36 ++++++++++++++++-------- drivers/nvme/target/configfs.c | 50 +++++++++++++++------------------ drivers/nvme/target/core.c | 2 +- drivers/nvme/target/nvmet.h | 7 +---- 4 files changed, 50 insertions(+), 45 deletions(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 1827d8d8f3b0..44d6d9f419d0 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -319,27 +319,40 @@ static void nvmet_execute_get_log_page(struct nvmet_req *req) nvmet_req_complete(req, NVME_SC_INVALID_FIELD | NVME_SC_DNR); }
-static void nvmet_id_set_model_number(struct nvme_id_ctrl *id, - struct nvmet_subsys *subsys) +static u16 nvmet_set_model_number(struct nvmet_subsys *subsys) { - const char *model = NVMET_DEFAULT_CTRL_MODEL; - struct nvmet_subsys_model *subsys_model; + u16 status = 0; + + mutex_lock(&subsys->lock); + if (!subsys->model_number) { + subsys->model_number = + kstrdup(NVMET_DEFAULT_CTRL_MODEL, GFP_KERNEL); + if (!subsys->model_number) + status = NVME_SC_INTERNAL; + } + mutex_unlock(&subsys->lock);
- rcu_read_lock(); - subsys_model = rcu_dereference(subsys->model); - if (subsys_model) - model = subsys_model->number; - memcpy_and_pad(id->mn, sizeof(id->mn), model, strlen(model), ' '); - rcu_read_unlock(); + return status; }
static void nvmet_execute_identify_ctrl(struct nvmet_req *req) { struct nvmet_ctrl *ctrl = req->sq->ctrl; + struct nvmet_subsys *subsys = ctrl->subsys; struct nvme_id_ctrl *id; u32 cmd_capsule_size; u16 status = 0;
+ /* + * If there is no model number yet, set it now. It will then remain + * stable for the life time of the subsystem. + */ + if (!subsys->model_number) { + status = nvmet_set_model_number(subsys); + if (status) + goto out; + } + id = kzalloc(sizeof(*id), GFP_KERNEL); if (!id) { status = NVME_SC_INTERNAL; @@ -353,7 +366,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req) memset(id->sn, ' ', sizeof(id->sn)); bin2hex(id->sn, &ctrl->subsys->serial, min(sizeof(ctrl->subsys->serial), sizeof(id->sn) / 2)); - nvmet_id_set_model_number(id, ctrl->subsys); + memcpy_and_pad(id->mn, sizeof(id->mn), subsys->model_number, + strlen(subsys->model_number), ' '); memcpy_and_pad(id->fr, sizeof(id->fr), UTS_RELEASE, strlen(UTS_RELEASE), ' ');
diff --git a/drivers/nvme/target/configfs.c b/drivers/nvme/target/configfs.c index c61ffd767062..4b809fe499c2 100644 --- a/drivers/nvme/target/configfs.c +++ b/drivers/nvme/target/configfs.c @@ -1120,16 +1120,12 @@ static ssize_t nvmet_subsys_attr_model_show(struct config_item *item, char *page) { struct nvmet_subsys *subsys = to_subsys(item); - struct nvmet_subsys_model *subsys_model; - char *model = NVMET_DEFAULT_CTRL_MODEL; int ret;
- rcu_read_lock(); - subsys_model = rcu_dereference(subsys->model); - if (subsys_model) - model = subsys_model->number; - ret = snprintf(page, PAGE_SIZE, "%s\n", model); - rcu_read_unlock(); + mutex_lock(&subsys->lock); + ret = snprintf(page, PAGE_SIZE, "%s\n", subsys->model_number ? + subsys->model_number : NVMET_DEFAULT_CTRL_MODEL); + mutex_unlock(&subsys->lock);
return ret; } @@ -1140,14 +1136,17 @@ static bool nvmet_is_ascii(const char c) return c >= 0x20 && c <= 0x7e; }
-static ssize_t nvmet_subsys_attr_model_store(struct config_item *item, - const char *page, size_t count) +static ssize_t nvmet_subsys_attr_model_store_locked(struct nvmet_subsys *subsys, + const char *page, size_t count) { - struct nvmet_subsys *subsys = to_subsys(item); - struct nvmet_subsys_model *new_model; - char *new_model_number; int pos = 0, len;
+ if (subsys->model_number) { + pr_err("Can't set model number. %s is already assigned\n", + subsys->model_number); + return -EINVAL; + } + len = strcspn(page, "\n"); if (!len) return -EINVAL; @@ -1157,28 +1156,25 @@ static ssize_t nvmet_subsys_attr_model_store(struct config_item *item, return -EINVAL; }
- new_model_number = kmemdup_nul(page, len, GFP_KERNEL); - if (!new_model_number) + subsys->model_number = kmemdup_nul(page, len, GFP_KERNEL); + if (!subsys->model_number) return -ENOMEM; + return count; +}
- new_model = kzalloc(sizeof(*new_model) + len + 1, GFP_KERNEL); - if (!new_model) { - kfree(new_model_number); - return -ENOMEM; - } - memcpy(new_model->number, new_model_number, len); +static ssize_t nvmet_subsys_attr_model_store(struct config_item *item, + const char *page, size_t count) +{ + struct nvmet_subsys *subsys = to_subsys(item); + ssize_t ret;
down_write(&nvmet_config_sem); mutex_lock(&subsys->lock); - new_model = rcu_replace_pointer(subsys->model, new_model, - mutex_is_locked(&subsys->lock)); + ret = nvmet_subsys_attr_model_store_locked(subsys, page, count); mutex_unlock(&subsys->lock); up_write(&nvmet_config_sem);
- kfree_rcu(new_model, rcuhead); - kfree(new_model_number); - - return count; + return ret; } CONFIGFS_ATTR(nvmet_subsys_, attr_model);
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c index 8ce4d59cc9e7..c7af907912f2 100644 --- a/drivers/nvme/target/core.c +++ b/drivers/nvme/target/core.c @@ -1521,7 +1521,7 @@ static void nvmet_subsys_free(struct kref *ref) nvmet_passthru_subsys_free(subsys);
kfree(subsys->subsysnqn); - kfree_rcu(subsys->model, rcuhead); + kfree(subsys->model_number); kfree(subsys); }
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h index 592763732065..aac741bf378c 100644 --- a/drivers/nvme/target/nvmet.h +++ b/drivers/nvme/target/nvmet.h @@ -208,11 +208,6 @@ struct nvmet_ctrl { bool pi_support; };
-struct nvmet_subsys_model { - struct rcu_head rcuhead; - char number[]; -}; - struct nvmet_subsys { enum nvme_subsys_type type;
@@ -242,7 +237,7 @@ struct nvmet_subsys { struct config_group namespaces_group; struct config_group allowed_hosts_group;
- struct nvmet_subsys_model __rcu *model; + char *model_number;
#ifdef CONFIG_NVME_TARGET_PASSTHRU struct nvme_ctrl *passthru_ctrl;
linux-stable-mirror@lists.linaro.org