{pmd,pte}_mkdirty() currently set the _PAGE_DIRTY bit unconditionally, which
causes random segmentation faults after commit 0ccf7f168e17bb7e ("mm/thp:
carry over dirty bit when thp splits on pmd").
The reason is: on fork(), the parent process uses pmd_wrprotect() to clear
the huge page's _PAGE_WRITE and _PAGE_DIRTY bits (for COW); then, while
splitting dirty huge pages, pte_mkdirty() sets _PAGE_DIRTY as well as
_PAGE_MODIFIED; once _PAGE_DIRTY is set, no TLB modify exception is raised,
so the COW mechanism fails; and finally memory corruption occurs between
the parent and child processes.
So, {pmd,pte}_mkdirty() should set _PAGE_DIRTY only when _PAGE_WRITE is
set.
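The effect can be illustrated with a small user-space model. This is only a sketch: the bit positions are placeholders picked for the model and are not taken from this patch, and the model_* helpers are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, assumed for this model only. */
#define MODEL_PAGE_DIRTY    (UINT64_C(1) << 1)	/* once set, writes no longer trap */
#define MODEL_PAGE_WRITE    (UINT64_C(1) << 8)	/* entry is allowed to be written */
#define MODEL_PAGE_MODIFIED (UINT64_C(1) << 9)	/* page content was modified */

/* Write-protect for COW: clear both _PAGE_WRITE and _PAGE_DIRTY. */
static uint64_t model_wrprotect(uint64_t pte)
{
	return pte & ~(MODEL_PAGE_WRITE | MODEL_PAGE_DIRTY);
}

/* Old behaviour: _PAGE_DIRTY is set even on a write-protected entry. */
static uint64_t model_mkdirty_old(uint64_t pte)
{
	return pte | MODEL_PAGE_DIRTY | MODEL_PAGE_MODIFIED;
}

/* Fixed behaviour: always record the modification, but set
 * _PAGE_DIRTY only if the entry is writable. */
static uint64_t model_mkdirty_new(uint64_t pte)
{
	pte |= MODEL_PAGE_MODIFIED;
	if (pte & MODEL_PAGE_WRITE)
		pte |= MODEL_PAGE_DIRTY;
	return pte;
}
```

With the old helper, a write-protected entry ends up with _PAGE_DIRTY set again, so the first write after fork() no longer traps; with the fixed helper, only _PAGE_MODIFIED is recorded and the write still faults into the COW path.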
Cc: stable(a)vger.kernel.org
Cc: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
Note: CC the sparc mailing list because sparc has similar issues.
arch/loongarch/include/asm/pgtable.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/loongarch/include/asm/pgtable.h b/arch/loongarch/include/asm/pgtable.h
index 946704bee599..debbe116f105 100644
--- a/arch/loongarch/include/asm/pgtable.h
+++ b/arch/loongarch/include/asm/pgtable.h
@@ -349,7 +349,9 @@ static inline pte_t pte_mkclean(pte_t pte)
static inline pte_t pte_mkdirty(pte_t pte)
{
- pte_val(pte) |= (_PAGE_DIRTY | _PAGE_MODIFIED);
+ pte_val(pte) |= _PAGE_MODIFIED;
+ if (pte_val(pte) & _PAGE_WRITE)
+ pte_val(pte) |= _PAGE_DIRTY;
return pte;
}
@@ -478,7 +480,9 @@ static inline pmd_t pmd_mkclean(pmd_t pmd)
static inline pmd_t pmd_mkdirty(pmd_t pmd)
{
- pmd_val(pmd) |= (_PAGE_DIRTY | _PAGE_MODIFIED);
+ pmd_val(pmd) |= _PAGE_MODIFIED;
+ if (pmd_val(pmd) & _PAGE_WRITE)
+ pmd_val(pmd) |= _PAGE_DIRTY;
return pmd;
}
--
2.31.1
It is valid to receive an external interrupt while the IDT entry is broken,
which leads to a #GP with an exit_int_info that contains the index of
the IDT entry (e.g. any value).
Other exceptions can happen as well, such as #NP or #SS
(if the stack switch fails).
Thus this warning can be triggered by the user and has very little value.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/svm.c | 9 ---------
1 file changed, 9 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index e9cec1b692051c..36f651ce842174 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3428,15 +3428,6 @@ static int svm_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return 0;
}
- if (is_external_interrupt(svm->vmcb->control.exit_int_info) &&
- exit_code != SVM_EXIT_EXCP_BASE + PF_VECTOR &&
- exit_code != SVM_EXIT_NPF && exit_code != SVM_EXIT_TASK_SWITCH &&
- exit_code != SVM_EXIT_INTR && exit_code != SVM_EXIT_NMI)
- printk(KERN_ERR "%s: unexpected exit_int_info 0x%x "
- "exit_code 0x%x\n",
- __func__, svm->vmcb->control.exit_int_info,
- exit_code);
-
if (exit_fastpath != EXIT_FASTPATH_NONE)
return 1;
--
2.34.3
While not obvious, kvm_vcpu_reset() leaves nested mode by clearing
'vcpu->arch.hflags', but it does so without all the required housekeeping.
On SVM, it is possible to have a vCPU reset while in guest mode because,
unlike VMX, on SVM INITs are not latched in SVM non-root mode. In
addition, L1 doesn't have to intercept triple fault, which should
also trigger L1's reset if it happens in L2 while L1 didn't intercept it.
If one of the above conditions happens, KVM will continue to use vmcb02
while not being in guest mode.
Later, IA32_EFER will be cleared, which leads to freeing of the
nested guest state, which will (correctly) free vmcb02; but since
KVM still uses it (incorrectly), this leads to a use-after-free
and a kernel crash.
This issue is assigned CVE-2022-3344.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/x86.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 316ab1d5317f92..3fd900504e683b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11694,8 +11694,18 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
WARN_ON_ONCE(!init_event &&
(old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu)));
+ /*
+ * SVM doesn't unconditionally VM-Exit on INIT and SHUTDOWN, thus it's
+ * possible to INIT the vCPU while L2 is active. Force the vCPU back
+ * into L1 as EFER.SVME is cleared on INIT (along with all other EFER
+ * bits), i.e. virtualization is disabled.
+ */
+ if (is_guest_mode(vcpu))
+ kvm_leave_nested(vcpu);
+
kvm_lapic_reset(vcpu, init_event);
+ WARN_ON_ONCE(is_guest_mode(vcpu) || is_smm(vcpu));
vcpu->arch.hflags = 0;
vcpu->arch.smi_pending = 0;
--
2.34.3
Make sure that KVM uses vmcb01 before freeing nested state, and warn if
that is not the case.
This is a minimal fix for CVE-2022-3344 that makes the kernel print a
warning instead of panicking.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/nested.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index b258d6988f5dde..b74da40c1fc40c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1126,6 +1126,9 @@ void svm_free_nested(struct vcpu_svm *svm)
if (!svm->nested.initialized)
return;
+ if (WARN_ON_ONCE(svm->vmcb != svm->vmcb01.ptr))
+ svm_switch_vmcb(svm, &svm->vmcb01);
+
svm_vcpu_free_msrpm(svm->nested.msrpm);
svm->nested.msrpm = NULL;
--
2.34.3
If the VM was terminated while nested, we free the nested state
while the vCPU still is in nested mode.
Soon a warning will be added for this condition.
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/svm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d22a809d923339..e9cec1b692051c 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1440,6 +1440,7 @@ static void svm_vcpu_free(struct kvm_vcpu *vcpu)
*/
svm_clear_current_vmcb(svm->vmcb);
+ svm_leave_nested(vcpu);
svm_free_nested(svm);
sev_free_vcpu(vcpu);
--
2.34.3
From: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
Erase can be zeroed in spi_nor_parse_4bait() or
spi_nor_init_non_uniform_erase_map(). In practice it happened with
mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands,
but only 4K and 64K erase with 4b address commands.
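A simplified model of why a zeroed erase type has to be skipped is sketched below. It does not mirror the kernel's actual selection logic; the structure, the helper name and the opcodes used with it are hypothetical and only show that an entry with size 0 must never reach the alignment arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERASE_TYPES 4

/* Hypothetical stand-in for an erase type entry. */
struct model_erase_type {
	uint32_t size;		/* erase size in bytes; 0 means "zeroed out" */
	uint8_t opcode;		/* illustrative opcode value */
};

/*
 * Return the largest usable erase type for the range [addr, addr + len),
 * or NULL if none fits. Entries are assumed to be sorted by ascending size.
 */
static const struct model_erase_type *
model_find_best_erase_type(const struct model_erase_type *types,
			   uint32_t addr, uint32_t len)
{
	int i;

	for (i = MODEL_MAX_ERASE_TYPES - 1; i >= 0; i--) {
		const struct model_erase_type *erase = &types[i];

		/* A zeroed entry is unusable; the modulo below would
		 * otherwise divide by zero. */
		if (!erase->size)
			continue;
		if (erase->size > len)
			continue;
		if (addr % erase->size)
			continue;
		return erase;
	}
	return NULL;
}
```

In this model, a table holding only 4K and 64K entries (with the 32K slot zeroed, as in the mt25qu256a case described above) still resolves to a valid erase type instead of tripping over the empty slot.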
Fixes: dc92843159a7 ("mtd: spi-nor: fix erase_type array to indicate current map conf")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)nokia.com>
---
Changes in v2:
erase->opcode -> erase->size
drivers/mtd/spi-nor/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 88dd090..183ea9d 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -1400,6 +1400,8 @@ spi_nor_find_best_erase_type(const struct spi_nor_erase_map *map,
continue;
erase = &map->erase_type[i];
+ if (!erase->size)
+ continue;
/* Alignment is not mandatory for overlaid regions */
if (region->offset & SNOR_OVERLAID_REGION &&
--
2.10.2
commit 8cccf05fe857a18ee26e20d11a8455a73ffd4efd upstream.
If a nilfs2 filesystem is downgraded to read-only due to metadata
corruption on disk and is remounted read/write, or if emergency read-only
remount is performed, detaching a log writer and synchronizing the
filesystem can be done at the same time.
In these cases, use-after-free of the log writer (hereinafter
nilfs->ns_writer) can happen as shown in the scenario below:
Task1                               Task2
--------------------------------    ------------------------------
nilfs_construct_segment
  nilfs_segctor_sync
    init_wait
    init_waitqueue_entry
    add_wait_queue
    schedule
                                    nilfs_remount (R/W remount case)
                                      nilfs_attach_log_writer
                                        nilfs_detach_log_writer
                                          nilfs_segctor_destroy
                                            kfree
    finish_wait
      _raw_spin_lock_irqsave
        __raw_spin_lock_irqsave
          do_raw_spin_lock
            debug_spin_lock_before  <-- use-after-free
While Task1 is sleeping, nilfs->ns_writer is freed by Task2. After Task1
is woken up, it accesses nilfs->ns_writer, which has already been freed.
This scenario diagram is based on Shigeru Yoshida's post [1].
This patch fixes the issue by not detaching nilfs->ns_writer on remount so
that this UAF race doesn't happen. Along with this change, this patch
also inserts a few necessary read-only checks with superblock instance
where only the ns_writer pointer was used to check if the filesystem is
read-only.
Link: https://syzkaller.appspot.com/bug?id=79a4c002e960419ca173d55e863bd09e8112df…
Link: https://lkml.kernel.org/r/20221103141759.1836312-1-syoshida@redhat.com [1]
Link: https://lkml.kernel.org/r/20221104142959.28296-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+f816fa82f8783f7a02bb(a)syzkaller.appspotmail.com
Reported-by: Shigeru Yoshida <syoshida(a)redhat.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to stable-4.14 instead of the patch that could
not be applied to it last time.
A rejected hunk has been manually resolved without noticeable change,
and testing against that stable tree was fine.
fs/nilfs2/segment.c | 15 ++++++++-------
fs/nilfs2/super.c | 2 --
2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index e5a2b04c77ad..bff7fca4762d 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -331,7 +331,7 @@ void nilfs_relax_pressure_in_lock(struct super_block *sb)
struct the_nilfs *nilfs = sb->s_fs_info;
struct nilfs_sc_info *sci = nilfs->ns_writer;
- if (!sci || !sci->sc_flush_request)
+ if (sb_rdonly(sb) || unlikely(!sci) || !sci->sc_flush_request)
return;
set_bit(NILFS_SC_PRIOR_FLUSH, &sci->sc_flags);
@@ -2256,7 +2256,7 @@ int nilfs_construct_segment(struct super_block *sb)
struct nilfs_transaction_info *ti;
int err;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
/* A call inside transactions causes a deadlock. */
@@ -2295,7 +2295,7 @@ int nilfs_construct_dsync_segment(struct super_block *sb, struct inode *inode,
struct nilfs_transaction_info ti;
int err = 0;
- if (!sci)
+ if (sb_rdonly(sb) || unlikely(!sci))
return -EROFS;
nilfs_transaction_lock(sb, &ti, 0);
@@ -2792,11 +2792,12 @@ int nilfs_attach_log_writer(struct super_block *sb, struct nilfs_root *root)
if (nilfs->ns_writer) {
/*
- * This happens if the filesystem was remounted
- * read/write after nilfs_error degenerated it into a
- * read-only mount.
+ * This happens if the filesystem is made read-only by
+ * __nilfs_error or nilfs_remount and then remounted
+ * read/write. In these cases, reuse the existing
+ * writer.
*/
- nilfs_detach_log_writer(sb);
+ return 0;
}
nilfs->ns_writer = nilfs_segctor_new(sb, root);
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index af7d0d5cce50..36e60a45a1bf 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -1148,8 +1148,6 @@ static int nilfs_remount(struct super_block *sb, int *flags, char *data)
if ((bool)(*flags & MS_RDONLY) == sb_rdonly(sb))
goto out;
if (*flags & MS_RDONLY) {
- /* Shutting down log writer */
- nilfs_detach_log_writer(sb);
sb->s_flags |= MS_RDONLY;
/*
--
2.31.1
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8625147cafaa ("hugetlbfs: don't delete error page from pagecache")
7e1813d48dd3 ("hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8625147cafaa9ba74713d682f5185eb62cb2aedb Mon Sep 17 00:00:00 2001
From: James Houghton <jthoughton(a)google.com>
Date: Tue, 18 Oct 2022 20:01:25 +0000
Subject: [PATCH] hugetlbfs: don't delete error page from pagecache
This change is very similar to the change that was made for shmem [1], and
it solves the same problem but for HugeTLBFS instead.
Currently, when poison is found in a HugeTLB page, the page is removed
from the page cache. That means that attempting to map or read that
hugepage in the future will result in a new hugepage being allocated
instead of notifying the user that the page was poisoned. As [1] states,
this is effectively memory corruption.
The fix is to leave the page in the page cache. If the user attempts to
use a poisoned HugeTLB page with a syscall, the syscall will fail with
EIO, the same error code that shmem uses. For attempts to map the page,
the thread will get a BUS_MCEERR_AR SIGBUS.
[1]: commit a76054266661 ("mm: shmem: don't truncate page if memory failure happens")
Link: https://lkml.kernel.org/r/20221018200125.848471-1-jthoughton@google.com
Signed-off-by: James Houghton <jthoughton(a)google.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index dd54f67e47fd..df7772335dc0 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -328,6 +328,12 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
} else {
unlock_page(page);
+ if (PageHWPoison(page)) {
+ put_page(page);
+ retval = -EIO;
+ break;
+ }
+
/*
* We have the page, copy it to user space buffer.
*/
@@ -1111,13 +1117,6 @@ static int hugetlbfs_migrate_folio(struct address_space *mapping,
static int hugetlbfs_error_remove_page(struct address_space *mapping,
struct page *page)
{
- struct inode *inode = mapping->host;
- pgoff_t index = page->index;
-
- hugetlb_delete_from_page_cache(page);
- if (unlikely(hugetlb_unreserve_pages(inode, index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
-
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 546df97c31e4..e48f8ef45b17 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6111,6 +6111,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
ptl = huge_pte_lock(h, dst_mm, dst_pte);
+ ret = -EIO;
+ if (PageHWPoison(page))
+ goto out_release_unlock;
+
/*
* We allow to overwrite a pte marker: consider when both MISSING|WP
* registered, we firstly wr-protect a none pte which has no page cache
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 145bb561ddb3..bead6bccc7f2 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1080,6 +1080,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
int res;
struct page *hpage = compound_head(p);
struct address_space *mapping;
+ bool extra_pins = false;
if (!PageHuge(hpage))
return MF_DELAYED;
@@ -1087,6 +1088,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
mapping = page_mapping(hpage);
if (mapping) {
res = truncate_error_page(hpage, page_to_pfn(p), mapping);
+ /* The page is kept in page cache. */
+ extra_pins = true;
unlock_page(hpage);
} else {
unlock_page(hpage);
@@ -1104,7 +1107,7 @@ static int me_huge_page(struct page_state *ps, struct page *p)
}
}
- if (has_extra_refcount(ps, p, false))
+ if (has_extra_refcount(ps, p, extra_pins))
res = MF_FAILED;
return res;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
f0861f49bd94 ("x86/sgx: Add overflow check in sgx_validate_offset_length()")
dda03e2c331b ("x86/sgx: Create utility to validate user provided offset and length")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0861f49bd946ff94fce4f82509c45e167f63690 Mon Sep 17 00:00:00 2001
From: Borys Popławski <borysp(a)invisiblethingslab.com>
Date: Wed, 5 Oct 2022 00:59:03 +0200
Subject: [PATCH] x86/sgx: Add overflow check in sgx_validate_offset_length()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
sgx_validate_offset_length() function verifies "offset" and "length"
arguments provided by userspace, but was missing an overflow check on
their addition. Add it.
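The following user-space sketch shows why the extra comparison matters. The function name, the MODEL_PAGE_SIZE constant and the offset alignment check are assumptions made for this model; only the length, overflow and size checks correspond to the hunk in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE UINT64_C(4096)	/* assumed page size for the model */

/* Validate a user-supplied (offset, length) pair against an enclave size. */
static int model_validate_offset_length(uint64_t encl_size,
					uint64_t offset, uint64_t length)
{
	/* Assumed for the model: offset must be page aligned. */
	if (offset % MODEL_PAGE_SIZE)
		return -EINVAL;

	if (!length || length % MODEL_PAGE_SIZE)
		return -EINVAL;

	/* Unsigned addition wraps modulo 2^64, so a wrapped sum is
	 * smaller than either operand. */
	if (offset + length < offset)
		return -EINVAL;

	if (offset + length - MODEL_PAGE_SIZE >= encl_size)
		return -EINVAL;

	return 0;
}
```

Without the wrap check, an offset of 0xFFFFFFFFFFFFF000 combined with a length of 0x2000 wraps to 0x1000, and the final range comparison would accept it.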
Fixes: c6d26d370767 ("x86/sgx: Add SGX_IOC_ENCLAVE_ADD_PAGES")
Signed-off-by: Borys Popławski <borysp(a)invisiblethingslab.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: stable(a)vger.kernel.org # v5.11+
Link: https://lore.kernel.org/r/0d91ac79-6d84-abed-5821-4dbe59fa1a38@invisiblethi…
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index ebe79d60619f..da8b8ea6b063 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -356,6 +356,9 @@ static int sgx_validate_offset_length(struct sgx_encl *encl,
if (!length || !IS_ALIGNED(length, PAGE_SIZE))
return -EINVAL;
+ if (offset + length < offset)
+ return -EINVAL;
+
if (offset + length - PAGE_SIZE >= encl->size)
return -EINVAL;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
222cfa0118aa ("mmc: sdhci-pci: Fix possible memory leak caused by missing pci_dev_put()")
c31165d7400b ("mmc: sdhci-pci: Add support for HS200 tuning mode on AMD, eMMC-4.5.1")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 222cfa0118aa68687ace74aab8fdf77ce8fbd7e6 Mon Sep 17 00:00:00 2001
From: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Date: Mon, 14 Nov 2022 16:31:00 +0800
Subject: [PATCH] mmc: sdhci-pci: Fix possible memory leak caused by missing
pci_dev_put()
pci_get_device() will increase the reference count for the returned
pci_dev. We need to use pci_dev_put() to decrease the reference count
before amd_probe() returns. There is no problem for the 'smbus_dev ==
NULL' branch because pci_dev_put() can also handle the NULL input
parameter case.
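The get/put pairing can be sketched with a toy reference counter. The model_* names are hypothetical and only mimic the behaviour described above (a lookup that takes a reference and a put helper that tolerates NULL); this is not the PCI core implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy device with a reference count. */
struct model_dev {
	int refcount;
};

/* Lookup helper: returns the device with an extra reference, or NULL. */
static struct model_dev *model_get_device(struct model_dev *candidate)
{
	if (candidate)
		candidate->refcount++;
	return candidate;
}

/* Put helper: drops one reference and accepts NULL. */
static void model_dev_put(struct model_dev *dev)
{
	if (dev)
		dev->refcount--;
}

/* Probe: looks the device up, inspects it, and always drops the reference. */
static int model_probe(struct model_dev *candidate)
{
	struct model_dev *smbus_dev = model_get_device(candidate);
	int found = (smbus_dev != NULL);

	/* ... the device would be inspected here ... */

	model_dev_put(smbus_dev);	/* no NULL check needed */
	return found;
}
```

Because the put helper ignores NULL, a single unconditional call after the lookup covers both the found and the not-found branch.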
Fixes: 659c9bc114a8 ("mmc: sdhci-pci: Build o2micro support in the same module")
Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221114083100.149200-1-wangxiongfeng2@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/sdhci-pci-core.c b/drivers/mmc/host/sdhci-pci-core.c
index 34ea1acbb3cc..28dc65023fa9 100644
--- a/drivers/mmc/host/sdhci-pci-core.c
+++ b/drivers/mmc/host/sdhci-pci-core.c
@@ -1749,6 +1749,8 @@ static int amd_probe(struct sdhci_pci_chip *chip)
}
}
+ pci_dev_put(smbus_dev);
+
if (gen == AMD_CHIPSET_BEFORE_ML || gen == AMD_CHIPSET_CZ)
chip->quirks2 |= SDHCI_QUIRK2_CLEAR_TRANSFERMODE_REG_BEFORE_CMD;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
65946690ed8d ("firmware: coreboot: Register bus in module init")
cae0970ee9c4 ("firmware: google: Release devices before unregistering the bus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65946690ed8d972fdb91a74ee75ac0f0f0d68321 Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Wed, 19 Oct 2022 18:10:53 -0700
Subject: [PATCH] firmware: coreboot: Register bus in module init
The coreboot_table driver registers a coreboot bus while probing a
"coreboot_table" device representing the coreboot table memory region.
Probing this device (i.e., registering the bus) is a dependency for the
module_init() functions of any driver for this bus (e.g.,
memconsole-coreboot.c / memconsole_driver_init()).
With synchronous probe, this dependency works OK, as the link order in
the Makefile ensures coreboot_table_driver_init() (and thus,
coreboot_table_probe()) completes before a coreboot device driver tries
to add itself to the bus.
With asynchronous probe, however, coreboot_table_probe() may race with
memconsole_driver_init(), and so we're liable to hit one of these two:
1. coreboot_driver_register() eventually hits "[...] the bus was not
initialized.", and the memconsole driver fails to register; or
2. coreboot_driver_register() gets past #1, but still races with
bus_register() and hits some other undefined/crashing behavior (e.g.,
in driver_find() [1])
We can resolve this by registering the bus in our initcall, and only
deferring "device" work (scanning the coreboot memory region and
creating sub-devices) to probe().
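The ordering dependency can be modelled in a few lines of single-threaded C. This sketch does not reproduce the race itself; the model_* functions and the -ENODEV return value are assumptions chosen to show that a client driver can register once the initcall has run, regardless of when probe() happens.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool model_bus_ready;		/* set by the initcall */
static bool model_devices_populated;	/* set by probe() */

/* Initcall: registers the bus before any client driver can run. */
static int model_table_init(void)
{
	model_bus_ready = true;
	return 0;
}

/* Probe: may run later (asynchronously); only does "device" work. */
static int model_table_probe(void)
{
	if (!model_bus_ready)
		return -ENODEV;
	model_devices_populated = true;
	return 0;
}

/* Client driver registration: depends only on the bus, not on probe. */
static int model_client_driver_register(void)
{
	if (!model_bus_ready)
		return -ENODEV;
	return 0;
}
```

In this model the client registration succeeds as soon as the initcall has completed, even if the probe step has not yet populated any devices.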
[1] Example failure, using 'driver_async_probe=*' kernel command line:
[ 0.114217] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[ 0.114307] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1 #63
[ 0.114316] Hardware name: Google Scarlet (DT)
...
[ 0.114488] Call trace:
[ 0.114494] _raw_spin_lock+0x34/0x60
[ 0.114502] kset_find_obj+0x28/0x84
[ 0.114511] driver_find+0x30/0x50
[ 0.114520] driver_register+0x64/0x10c
[ 0.114528] coreboot_driver_register+0x30/0x3c
[ 0.114540] memconsole_driver_init+0x24/0x30
[ 0.114550] do_one_initcall+0x154/0x2e0
[ 0.114560] do_initcall_level+0x134/0x160
[ 0.114571] do_initcalls+0x60/0xa0
[ 0.114579] do_basic_setup+0x28/0x34
[ 0.114588] kernel_init_freeable+0xf8/0x150
[ 0.114596] kernel_init+0x2c/0x12c
[ 0.114607] ret_from_fork+0x10/0x20
[ 0.114624] Code: 5280002b 1100054a b900092a f9800011 (885ffc01)
[ 0.114631] ---[ end trace 0000000000000000 ]---
Fixes: b81e3140e412 ("firmware: coreboot: Make bus registration symmetric")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Reviewed-by: Stephen Boyd <swboyd(a)chromium.org>
Link: https://lore.kernel.org/r/20221019180934.1.If29e167d8a4771b0bf4a39c89c6946e…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/firmware/google/coreboot_table.c b/drivers/firmware/google/coreboot_table.c
index c52bcaa9def6..9ca21feb9d45 100644
--- a/drivers/firmware/google/coreboot_table.c
+++ b/drivers/firmware/google/coreboot_table.c
@@ -149,12 +149,8 @@ static int coreboot_table_probe(struct platform_device *pdev)
if (!ptr)
return -ENOMEM;
- ret = bus_register(&coreboot_bus_type);
- if (!ret) {
- ret = coreboot_table_populate(dev, ptr);
- if (ret)
- bus_unregister(&coreboot_bus_type);
- }
+ ret = coreboot_table_populate(dev, ptr);
+
memunmap(ptr);
return ret;
@@ -169,7 +165,6 @@ static int __cb_dev_unregister(struct device *dev, void *dummy)
static int coreboot_table_remove(struct platform_device *pdev)
{
bus_for_each_dev(&coreboot_bus_type, NULL, NULL, __cb_dev_unregister);
- bus_unregister(&coreboot_bus_type);
return 0;
}
@@ -199,6 +194,32 @@ static struct platform_driver coreboot_table_driver = {
.of_match_table = of_match_ptr(coreboot_of_match),
},
};
-module_platform_driver(coreboot_table_driver);
+
+static int __init coreboot_table_driver_init(void)
+{
+ int ret;
+
+ ret = bus_register(&coreboot_bus_type);
+ if (ret)
+ return ret;
+
+ ret = platform_driver_register(&coreboot_table_driver);
+ if (ret) {
+ bus_unregister(&coreboot_bus_type);
+ return ret;
+ }
+
+ return 0;
+}
+
+static void __exit coreboot_table_driver_exit(void)
+{
+ platform_driver_unregister(&coreboot_table_driver);
+ bus_unregister(&coreboot_bus_type);
+}
+
+module_init(coreboot_table_driver_init);
+module_exit(coreboot_table_driver_exit);
+
MODULE_AUTHOR("Google, Inc.");
MODULE_LICENSE("GPL");
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7fc961cf7ffc ("iommu/vt-d: Set SRE bit only when hardware has SRS cap")
54c80d907400 ("iommu/vt-d: Use user privilege for RID2PASID translation")
672cf6df9b8a ("iommu/vt-d: Move Intel IOMMU driver into subdirectory")
3b50142d8528 ("MAINTAINERS: sort field names for all entries")
4400b7d68f6e ("MAINTAINERS: sort entries by entry name")
b032227c6293 ("Merge tag 'nios2-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7fc961cf7ffcb130c4e93ee9a5628134f9de700a Mon Sep 17 00:00:00 2001
From: Tina Zhang <tina.zhang(a)intel.com>
Date: Wed, 16 Nov 2022 13:15:44 +0800
Subject: [PATCH] iommu/vt-d: Set SRE bit only when hardware has SRS cap
The SRS cap is the hardware capability that tells whether the hardware
IOMMU can support requests seeking supervisor privilege. The SRE bit in a
scalable-mode PASID table entry is treated as Reserved(0) by
implementations that do not support the SRS cap.
Checking SRS cap before setting SRE bit can avoid the non-recoverable
fault of "Non-zero reserved field set in PASID Table Entry" caused by
setting SRE bit while there is no SRS cap support. The fault messages
look like below:
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read NO_PASID] Request device [00:0d.0] fault addr 0x1154e1000
[fault reason 0x5a]
SM: Non-zero reserved field set in PASID Table Entry
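The capability gating can be sketched as follows. All bit positions here are placeholders assumed for the model, and the model_* names are hypothetical; the sketch only shows a reserved bit being left clear when the capability is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, assumed for this model only. */
#define MODEL_ECAP_SRS_SHIFT	31
#define MODEL_PASID_PRESENT	(UINT64_C(1) << 0)
#define MODEL_PASID_SRE		(UINT64_C(1) << 1)

/* Report whether the modelled hardware advertises the SRS capability. */
static bool model_ecap_srs(uint64_t ecap)
{
	return (ecap >> MODEL_ECAP_SRS_SHIFT) & 1;
}

/* Build a PASID entry, setting SRE only when the capability exists. */
static uint64_t model_setup_pass_through(uint64_t ecap)
{
	uint64_t pte = 0;

	if (model_ecap_srs(ecap))
		pte |= MODEL_PASID_SRE;
	pte |= MODEL_PASID_PRESENT;

	return pte;
}
```

On modelled hardware without the capability, the entry is still marked present but the SRE bit stays zero, which is what avoids the reserved-field fault shown above.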
Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tina Zhang <tina.zhang(a)intel.com>
Link: https://lore.kernel.org/r/20221115070346.1112273-1-tina.zhang@intel.com
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221116051544.26540-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c30ddac40ee5..e13d7e5273e1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -642,7 +642,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
* Since it is a second level only translation setup, we should
* set SRE bit as well (addresses are expected to be GPAs).
*/
- if (pasid != PASID_RID2PASID)
+ if (pasid != PASID_RID2PASID && ecap_srs(iommu->ecap))
pasid_set_sre(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);
@@ -685,7 +685,8 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
* We should set SRE bit as well since the addresses are expected
* to be GPAs.
*/
- pasid_set_sre(pte);
+ if (ecap_srs(iommu->ecap))
+ pasid_set_sre(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
539bcb57da2f ("io_uring: fix tw losing poll events")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
a9c210cebe13 ("io_uring: move epoll handler to its own file")
4cf90495281b ("io_uring: add a dummy -EOPNOTSUPP prep handler")
99f15d8d6136 ("io_uring: move uring_cmd handling to its own file")
cd40cae29ef8 ("io_uring: split out open/close operations")
453b329be5ea ("io_uring: separate out file table handling code")
f4c163dd7d4b ("io_uring: split out fadvise/madvise operations")
0d5847274037 ("io_uring: split out fs related sync/fallocate functions")
531113bbd5bf ("io_uring: split out splice related operations")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 539bcb57da2f58886d7d5c17134236b0ec9cd15d Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 17 Nov 2022 18:40:15 +0000
Subject: [PATCH] io_uring: fix tw losing poll events
We may never process a poll wakeup and its mask if multiple wakeups
raced to queue up a tw (task work). Force io_poll_check_events() to
refresh the mask via vfs_poll().
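The decision can be modelled as a pure function. The flag layout and the model_* names below are assumptions for illustration; the sketch only captures the rule that a stashed event mask is trusted when exactly one reference (one wakeup) is held, and is otherwise discarded so that the caller polls again.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for the model: top bit is a flag, the rest counts refs. */
#define MODEL_POLL_CANCEL_FLAG	(UINT32_C(1) << 31)
#define MODEL_POLL_REF_MASK	(MODEL_POLL_CANCEL_FLAG - 1)

/*
 * Return the event mask that may be used without re-polling.
 * A return value of 0 means "mask unknown, poll the file again".
 */
static uint32_t model_trusted_mask(uint32_t refs, uint32_t stashed_mask)
{
	/* More than one reference means additional wakeups arrived whose
	 * events are not reflected in the stashed mask. */
	if ((refs & MODEL_POLL_REF_MASK) != 1)
		return 0;
	return stashed_mask;
}
```

Discarding the mask costs an extra poll of the file, but it ensures that events delivered by the later wakeups are not lost.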
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/00344d60f8b18907171178d7cf598de71d127b0b.16687102…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 90920abf91ff..c34019b18211 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -228,6 +228,13 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
return IOU_POLL_DONE;
if (v & IO_POLL_CANCEL_FLAG)
return -ECANCELED;
+ /*
+ * cqe.res contains only events of the first wake up
+ * and all others are be lost. Redo vfs_poll() to get
+ * up to date state.
+ */
+ if ((v & IO_POLL_REF_MASK) != 1)
+ req->cqe.res = 0;
/* the mask was stashed in __io_poll_execute */
if (!req->cqe.res) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
3ce00bb7e91c ("binder: validate alloc->mm in ->mmap() handler")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ce00bb7e91cf57d723905371507af57182c37ef Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 4 Nov 2022 23:12:35 +0000
Subject: [PATCH] binder: validate alloc->mm in ->mmap() handler
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr
dereference") binder caches a pointer to the current->mm during open().
This fixes a null-ptr dereference reported by syzkaller. Unfortunately,
it also opens the door for a process to update its mm after the open(),
(e.g. via execve) making the cached alloc->mm pointer invalid.
Things get worse when the process continues to mmap() a vma. From this
point forward, binder will attempt to find this vma using an obsolete
alloc->mm reference, such as in binder_update_page_range(), where the
wrong vma is obtained via vma_lookup(), yet binder proceeds to happily
insert new pages into it.
To avoid this issue, fail the ->mmap() callback if we detect a mismatch
between the vma->vm_mm and the original alloc->mm pointer. This prevents
alloc->vm_addr from getting set, so that any subsequent vma_lookup()
calls fail as expected.
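A minimal model of the mismatch check is sketched below. The structures and the model_mmap_handler() name are hypothetical simplifications; they only show the cached pointer being compared before any state is recorded.

```c
#include <assert.h>
#include <errno.h>

/* Toy stand-ins for the address space, the allocator and the vma. */
struct model_mm {
	int id;
};

struct model_alloc {
	const struct model_mm *mm;	/* cached at open() time */
	unsigned long vm_addr;		/* stays 0 until a valid mmap() */
};

struct model_vma {
	const struct model_mm *vm_mm;	/* address space performing the mmap() */
	unsigned long vm_start;
};

/* Reject the mapping if the caller's mm is not the one cached at open(). */
static int model_mmap_handler(struct model_alloc *alloc,
			      const struct model_vma *vma)
{
	if (vma->vm_mm != alloc->mm)
		return -EINVAL;

	alloc->vm_addr = vma->vm_start;
	return 0;
}
```

Because the check runs before vm_addr is assigned, a rejected mapping leaves the allocator without a recorded address, so later lookups in this model have nothing stale to find.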
Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.15+
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 1c39cfce32fa..4ad42b0f75cd 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -739,6 +739,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
 	const char *failure_string;
 	struct binder_buffer *buffer;

+	if (unlikely(vma->vm_mm != alloc->mm)) {
+		ret = -EINVAL;
+		failure_string = "invalid vma->vm_mm";
+		goto err_invalid_mm;
+	}
+
 	mutex_lock(&binder_alloc_mmap_lock);
 	if (alloc->buffer_size) {
 		ret = -EBUSY;
@@ -785,6 +791,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
 	alloc->buffer_size = 0;
 err_already_mapped:
 	mutex_unlock(&binder_alloc_mmap_lock);
+err_invalid_mm:
 	binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
 			   "%s: %d %lx-%lx %s failed %d\n", __func__,
 			   alloc->pid, vma->vm_start, vma->vm_end,
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
3ce00bb7e91c ("binder: validate alloc->mm in ->mmap() handler")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3ce00bb7e91cf57d723905371507af57182c37ef Mon Sep 17 00:00:00 2001
From: Carlos Llamas <cmllamas(a)google.com>
Date: Fri, 4 Nov 2022 23:12:35 +0000
Subject: [PATCH] binder: validate alloc->mm in ->mmap() handler
Since commit 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr
dereference") binder caches a pointer to the current->mm during open().
This fixes a null-ptr dereference reported by syzkaller. Unfortunately,
it also opens the door for a process to update its mm after the open()
(e.g. via execve), making the cached alloc->mm pointer invalid.
Things get worse when the process continues to mmap() a vma. From this
point forward, binder will attempt to find this vma using an obsolete
alloc->mm reference. For example, in binder_update_page_range() the
wrong vma is obtained via vma_lookup(), yet binder proceeds to happily
insert new pages into it.
To avoid this issue, fail the ->mmap() callback if we detect a mismatch
between the vma->vm_mm and the original alloc->mm pointer. This prevents
alloc->vm_addr from getting set, so that any subsequent vma_lookup()
calls fail as expected.
Fixes: 1da52815d5f1 ("binder: fix alloc->vma_vm_mm null-ptr dereference")
Reported-by: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org> # 5.15+
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
Acked-by: Todd Kjos <tkjos(a)google.com>
Link: https://lore.kernel.org/r/20221104231235.348958-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 1c39cfce32fa..4ad42b0f75cd 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -739,6 +739,12 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
 	const char *failure_string;
 	struct binder_buffer *buffer;

+	if (unlikely(vma->vm_mm != alloc->mm)) {
+		ret = -EINVAL;
+		failure_string = "invalid vma->vm_mm";
+		goto err_invalid_mm;
+	}
+
 	mutex_lock(&binder_alloc_mmap_lock);
 	if (alloc->buffer_size) {
 		ret = -EBUSY;
@@ -785,6 +791,7 @@ int binder_alloc_mmap_handler(struct binder_alloc *alloc,
 	alloc->buffer_size = 0;
 err_already_mapped:
 	mutex_unlock(&binder_alloc_mmap_lock);
+err_invalid_mm:
 	binder_alloc_debug(BINDER_DEBUG_USER_ERROR,
 			   "%s: %d %lx-%lx %s failed %d\n", __func__,
 			   alloc->pid, vma->vm_start, vma->vm_end,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
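The difference between "absent" (NULL) and "failed" (error pointer) can be illustrated with a userspace model. This is a sketch under stated assumptions, not the gpiolib implementation: the toy_* helpers are hypothetical re-creations of the error-pointer idiom, and TOY_EPROBE_DEFER simply mirrors the kernel-internal value 517.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO		4095
#define TOY_EPROBE_DEFER	517	/* mirrors the kernel-internal errno */

/* Simplified userspace versions of the error-pointer helpers. */
static void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_gpio { int value; };
static struct toy_gpio toy_vcc;

enum toy_lookup { TOY_GPIO_PRESENT, TOY_GPIO_ABSENT, TOY_GPIO_DEFERRED };

/* Model of an "optional" getter: absent means NULL, failure means error pointer. */
static struct toy_gpio *toy_get_optional(enum toy_lookup state)
{
	switch (state) {
	case TOY_GPIO_PRESENT:
		return &toy_vcc;
	case TOY_GPIO_ABSENT:
		return NULL;
	default:
		return toy_err_ptr(-TOY_EPROBE_DEFER);
	}
}

/* Model of probe: real errors are propagated, a missing GPIO is fine. */
static int toy_probe(enum toy_lookup state, struct toy_gpio **out)
{
	struct toy_gpio *desc = toy_get_optional(state);

	if (toy_is_err(desc))
		return (int)toy_ptr_err(desc);
	*out = desc;
	return 0;
}

/* Model of the user site: only a NULL check is needed after probe. */
static void toy_power(struct toy_gpio *desc, int val)
{
	if (!desc)
		return;
	desc->value = val;
}
```

With this split, a deferred lookup makes the model probe fail with the original error code, while a missing GPIO lets probe succeed and turns the power helper into a no-op.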
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
 {
 	struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);

-	if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+	if (!usb_dev->gpio_desc)
 		return;

 	gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
 		return -ENOMEM;
 	usb_dev->core = core;

-	if (core->dev.of_node)
-		usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
-						    GPIOD_OUT_HIGH);
+	usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+						     GPIOD_OUT_HIGH);
+	if (IS_ERR(usb_dev->gpio_desc))
+		return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+				     "error obtaining VCC GPIO");

 	switch (core->id.id) {
 	case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
 {
 	struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);

-	if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+	if (!usb_dev->gpio_desc)
 		return;

 	gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
 		return -ENOMEM;
 	usb_dev->core = core;

-	if (core->dev.of_node)
-		usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
-						    GPIOD_OUT_HIGH);
+	usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+						     GPIOD_OUT_HIGH);
+	if (IS_ERR(usb_dev->gpio_desc))
+		return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+				     "error obtaining VCC GPIO");

 	switch (core->id.id) {
 	case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
 {
 	struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);

-	if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+	if (!usb_dev->gpio_desc)
 		return;

 	gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
 		return -ENOMEM;
 	usb_dev->core = core;

-	if (core->dev.of_node)
-		usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
-						    GPIOD_OUT_HIGH);
+	usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+						     GPIOD_OUT_HIGH);
+	if (IS_ERR(usb_dev->gpio_desc))
+		return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+				     "error obtaining VCC GPIO");

 	switch (core->id.id) {
 	case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cd136706b4f9 ("USB: bcma: Make GPIO explicitly optional")
d91adc5322ab ("Revert "USB: bcma: Add a check for devm_gpiod_get"")
f3de5d857bb2 ("USB: bcma: Add a check for devm_gpiod_get")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd136706b4f925aa5d316642543babac90d45910 Mon Sep 17 00:00:00 2001
From: Linus Walleij <linus.walleij(a)linaro.org>
Date: Mon, 7 Nov 2022 10:07:53 +0100
Subject: [PATCH] USB: bcma: Make GPIO explicitly optional
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
What the code does is to not check the return value from
devm_gpiod_get() and then avoid using an erroneous GPIO descriptor
with IS_ERR_OR_NULL().
This will miss real errors from the GPIO core that should not be
ignored, such as probe deferral.
Instead request the GPIO as explicitly optional, which means that
if it doesn't exist, the descriptor returned will be NULL.
Then we can add error handling and also avoid just doing this on
the device tree path, and simplify the site where the optional
GPIO descriptor is used.
There were some problems with cleaning up this GPIO descriptor
use in the past, but this is the proper way to deal with it.
Cc: Rafał Miłecki <rafal(a)milecki.pl>
Cc: Chuhong Yuan <hslester96(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20221107090753.1404679-1-linus.walleij@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/bcma-hcd.c b/drivers/usb/host/bcma-hcd.c
index 2df52f75f6b3..7558cc4d90cc 100644
--- a/drivers/usb/host/bcma-hcd.c
+++ b/drivers/usb/host/bcma-hcd.c
@@ -285,7 +285,7 @@ static void bcma_hci_platform_power_gpio(struct bcma_device *dev, bool val)
 {
 	struct bcma_hcd_device *usb_dev = bcma_get_drvdata(dev);

-	if (IS_ERR_OR_NULL(usb_dev->gpio_desc))
+	if (!usb_dev->gpio_desc)
 		return;

 	gpiod_set_value(usb_dev->gpio_desc, val);
@@ -406,9 +406,11 @@ static int bcma_hcd_probe(struct bcma_device *core)
 		return -ENOMEM;
 	usb_dev->core = core;

-	if (core->dev.of_node)
-		usb_dev->gpio_desc = devm_gpiod_get(&core->dev, "vcc",
-						    GPIOD_OUT_HIGH);
+	usb_dev->gpio_desc = devm_gpiod_get_optional(&core->dev, "vcc",
+						     GPIOD_OUT_HIGH);
+	if (IS_ERR(usb_dev->gpio_desc))
+		return dev_err_probe(&core->dev, PTR_ERR(usb_dev->gpio_desc),
+				     "error obtaining VCC GPIO");

 	switch (core->id.id) {
 	case BCMA_CORE_USB20_HOST:
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0954256e970e ("scsi: zfcp: Fix double free of FSF request when qdio send fails")
f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
dac37e15b7d5 ("scsi: zfcp: fix use-after-"free" in FC ingress path after TMF")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0954256e970ecf371b03a6c9af2cf91b9c4085ff Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Wed, 16 Nov 2022 11:50:37 +0100
Subject: [PATCH] scsi: zfcp: Fix double free of FSF request when qdio send
fails
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one, this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second, less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again, all the leading zeros will be flipped to ones to extend the
sign, and the value won't match the original ID anymore (this has been
observed in practice).
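Both effects can be reproduced in plain userspace C. The sketch below is illustrative only: the toy_* helpers are hypothetical, uint64_t stands in for the 64 bit 'unsigned long' of the s390x ABI, and it assumes the usual GCC/Clang behaviour of keeping the low 32 bits when an out-of-range value is converted to 'int'.

```c
#include <stdint.h>

/*
 * Model the old caching: store a 64-bit ID in an 'int', then widen it back.
 * Converting an out-of-range value to 'int' is implementation-defined in C;
 * GCC and Clang keep the low 32 bits, which is the behaviour assumed here.
 */
static uint64_t toy_roundtrip_int(uint64_t req_id)
{
	int cached = (int)req_id;		/* truncates to 32 bits */

	return (uint64_t)(int64_t)cached;	/* sign-extends to 64 bits */
}

/* Model the fixed caching: a 64-bit unsigned cache keeps every bit. */
static uint64_t toy_roundtrip_u64(uint64_t req_id)
{
	uint64_t cached = req_id;

	return cached;
}
```

Under that assumption, an ID of 0x80000000 comes back from the 'int' round trip as 0xffffffff80000000 and an ID of 0x100000001 comes back as 0x1, while IDs up to 0x7fffffff survive unchanged. The 64-bit round trip returns every ID as it went in.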
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shut down, we end up freeing the same memory twice.
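How the mismatched ID leaves a stale entry behind can be sketched with a toy request table. This is a userspace model under stated assumptions, not the zfcp request list: all toy_* names, the bucket count and the modulo hashing are hypothetical, and the 'int' conversion again assumes GCC/Clang truncation.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_BUCKETS 8

struct toy_req {
	uint64_t req_id;
	struct toy_req *next;
};

struct toy_reqlist {
	struct toy_req *bucket[TOY_BUCKETS];
};

static void toy_reqlist_add(struct toy_reqlist *rl, struct toy_req *req)
{
	size_t i = req->req_id % TOY_BUCKETS;

	req->next = rl->bucket[i];
	rl->bucket[i] = req;
}

/* Unlink and return the request with this ID, or NULL if none matches. */
static struct toy_req *toy_reqlist_find_rm(struct toy_reqlist *rl,
					   uint64_t req_id)
{
	struct toy_req **pp = &rl->bucket[req_id % TOY_BUCKETS];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->req_id == req_id) {
			struct toy_req *found = *pp;

			*pp = found->next;
			found->next = NULL;
			return found;
		}
	}
	return NULL;
}

/* Failure path with the ID cached as 'int': returns 1 if the entry was removed. */
static int toy_cleanup_int(struct toy_reqlist *rl, const struct toy_req *req)
{
	int cached = (int)req->req_id;

	return toy_reqlist_find_rm(rl, (uint64_t)(int64_t)cached) != NULL;
}

/* Failure path with a 64-bit unsigned cache: returns 1 if the entry was removed. */
static int toy_cleanup_u64(struct toy_reqlist *rl, const struct toy_req *req)
{
	uint64_t cached = req->req_id;

	return toy_reqlist_find_rm(rl, cached) != NULL;
}
```

For a request with ID 0x80000000, the 'int' variant fails to find the entry, so the table keeps pointing at a request the caller is about to free. The 64-bit variant unlinks it first.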
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension
and so can successfully remove the request from the hash table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.16685955…
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
 	const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
 	struct zfcp_adapter *adapter = req->adapter;
 	struct zfcp_qdio *qdio = adapter->qdio;
-	int req_id = req->req_id;
+	unsigned long req_id = req->req_id;

 	zfcp_reqlist_add(adapter->req_list, req);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0954256e970e ("scsi: zfcp: Fix double free of FSF request when qdio send fails")
f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0954256e970ecf371b03a6c9af2cf91b9c4085ff Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Wed, 16 Nov 2022 11:50:37 +0100
Subject: [PATCH] scsi: zfcp: Fix double free of FSF request when qdio send
fails
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one, this has the obvious problem that when the ID grows
past 32 bit (this can happen reasonably fast) it is truncated to 32 bit
when storing it in the cache variable and so doesn't match the original ID
anymore. The second, less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again, all the leading zeros will be flipped to ones to extend the
sign, and the value won't match the original ID anymore (this has been
observed in practice).
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shut down, we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension,
so the request can be removed from the hash table successfully.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.16685955…
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0954256e970e ("scsi: zfcp: Fix double free of FSF request when qdio send fails")
f9eca0227600 ("scsi: zfcp: drop duplicate fsf_command from zfcp_fsf_req which is also in QTCB header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0954256e970ecf371b03a6c9af2cf91b9c4085ff Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Wed, 16 Nov 2022 11:50:37 +0100
Subject: [PATCH] scsi: zfcp: Fix double free of FSF request when qdio send
fails
We used to use the wrong type of integer in 'zfcp_fsf_req_send()' to cache
the FSF request ID when sending a new FSF request. This is used in case the
sending fails and we need to remove the request from our internal hash
table again (so we don't keep an invalid reference and use it when we free
the request again).
In 'zfcp_fsf_req_send()' we used to cache the ID as 'int' (signed and 32
bit wide), but the rest of the zfcp code (and the firmware specification)
handles the ID as 'unsigned long'/'u64' (unsigned and 64 bit wide [s390x
ELF ABI]). For one, this has the obvious problem that when the ID grows
past 32 bits (this can happen reasonably fast) it is truncated to 32 bits
when it is stored in the cache variable and so no longer matches the
original ID. The second, less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to 'unsigned long'. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to load the 32
bit variable into a 64 bit register (e.g.: 'lgf %r11,188(%r15)'). So once
we pass the cached variable into 'zfcp_reqlist_find_rm()' to remove the
request again all the leading zeros will be flipped to ones to extend the
sign and won't match the original ID anymore (this has been observed in
practice).
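The two failure modes described above can be reproduced in user space. The
following is a minimal sketch, not zfcp code: the helper names are
hypothetical, 'uint64_t' stands in for the 64 bit 'unsigned long' of s390x,
and 'int32_t' stands in for the 32 bit 'int' that used to cache the ID.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustration only; these helpers are hypothetical and not part of zfcp.
 * uint64_t models the 64-bit 'unsigned long' request ID, int32_t models
 * the 32-bit signed 'int' that used to cache it.
 */

/* Old behaviour: cache the ID in a signed 32-bit variable, then widen it. */
static uint64_t roundtrip_via_int(uint64_t req_id)
{
	/* Keeps only the low 32 bits (modulo 2^32 on GCC and Clang). */
	int32_t cached = (int32_t)req_id;

	/* Widening a signed value sign-extends, like 'lgf' on s390x. */
	return (uint64_t)(int64_t)cached;
}

/* Fixed behaviour: the cache has the same width and signedness as the ID. */
static uint64_t roundtrip_via_ulong(uint64_t req_id)
{
	uint64_t cached = req_id;

	return cached;
}

/* Self-check covering both problems named in the text above. */
static void reqid_roundtrip_demo(void)
{
	/* Below 0x80000000 the old and new variants agree. */
	assert(roundtrip_via_int(0x7fffffffULL) == 0x7fffffffULL);
	/* 0x80000000: sign extension turns the upper 32 bits into ones. */
	assert(roundtrip_via_int(0x80000000ULL) == 0xffffffff80000000ULL);
	/* Past 32 bits: the upper half of the ID is lost. */
	assert(roundtrip_via_int(0x100000001ULL) == 0x1ULL);
	/* The 64-bit cache preserves the ID in both cases. */
	assert(roundtrip_via_ulong(0x80000000ULL) == 0x80000000ULL);
	assert(roundtrip_via_ulong(0x100000001ULL) == 0x100000001ULL);
}
```

Calling 'reqid_roundtrip_demo()' from a test program exercises all five
cases. In either failing case a lookup by the cached value would miss the
entry stored under the original ID.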
If we can't successfully remove the request from the hash table again after
'zfcp_qdio_send()' fails (this happens regularly when zfcp cannot notify
the adapter about new work because the adapter is already gone during
e.g. a ChpID toggle) we will end up with a double free. We unconditionally
free the request in the calling function when 'zfcp_fsf_req_send()' fails,
but because the request is still in the hash table we end up with a stale
memory reference, and once the zfcp adapter is either reset during recovery
or shutdown we end up freeing the same memory twice.
The resulting stack traces vary depending on the kernel and have no direct
correlation to the place where the bug occurs. Here are three examples that
have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to 'unsigned
long', like the rest of zfcp and also the argument for
'zfcp_reqlist_find_rm()'. This prevents truncation and wrong sign extension,
so the request can be removed from the hash table successfully.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Link: https://lore.kernel.org/r/979f6e6019d15f91ba56182f1aaf68d61bf37fc6.16685955…
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
2e586641c950 ("ceph: do not update snapshot context when there is no new snapshot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51884d153f7ec85e18d607b2467820a90e0f4359 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Wed, 9 Nov 2022 11:00:39 +0800
Subject: [PATCH] ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails, 'first_realm' and 'realm' may be left
pointing to the same snaprealm memory. The error path then puts it
twice, which can cause random use-after-free, BUG_ON, and similar
issues.
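The pattern can be modelled outside the kernel. The sketch below is a
simplified illustration under stated assumptions, not the ceph code:
'struct obj', 'obj_put()' and 'process()' are hypothetical names, and a
plain integer stands in for the real reference count. It shows why the
per-iteration pointer has to be reset at the top of the retry label.

```c
#include <stddef.h>

/* Hypothetical stand-in for a refcounted snaprealm. */
struct obj {
	int refs;
};

static void obj_put(struct obj *o)
{
	o->refs--;
}

/*
 * Walk n records. Record number fail_at fails to "decode" (-1: none).
 * reset_each_pass selects the fixed (1) or the old (0) behaviour.
 * Returns 0 on success, -1 on a decode failure.
 */
static int process(struct obj *objs, int n, int fail_at, int reset_each_pass)
{
	struct obj *cur = NULL;
	struct obj *first = NULL;
	int i = 0;

more:
	if (reset_each_pass)
		cur = NULL;	/* the fix: forget the previous iteration */
	if (i == fail_at)
		goto bad;	/* decode failed before 'cur' was reassigned */

	cur = &objs[i];
	cur->refs++;		/* the lookup takes a reference */
	if (!first)
		first = cur;	/* the reference now belongs to 'first' */
	else
		obj_put(cur);
	if (++i < n)
		goto more;

	if (first)
		obj_put(first);
	return 0;

bad:
	if (cur)
		obj_put(cur);	/* without the reset this may alias 'first' */
	if (first)
		obj_put(first);
	return -1;
}
```

With two records and a failure on the second one, the old behaviour
leaves the first object's count at -1 (put twice), while the variant with
the reset leaves it at 0.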
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 864cdaa0d2bd..e4151852184e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -763,7 +763,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
struct ceph_mds_snap_realm *ri; /* encoded */
__le64 *snaps; /* encoded */
__le64 *prior_parent_snaps; /* encoded */
- struct ceph_snap_realm *realm = NULL;
+ struct ceph_snap_realm *realm;
struct ceph_snap_realm *first_realm = NULL;
struct ceph_snap_realm *realm_to_rebuild = NULL;
int rebuild_snapcs;
@@ -774,6 +774,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("%s deletion=%d\n", __func__, deletion);
more:
+ realm = NULL;
rebuild_snapcs = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
2e586641c950 ("ceph: do not update snapshot context when there is no new snapshot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51884d153f7ec85e18d607b2467820a90e0f4359 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Wed, 9 Nov 2022 11:00:39 +0800
Subject: [PATCH] ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails, 'first_realm' and 'realm' may be left
pointing to the same snaprealm memory. The error path then puts it
twice, which can cause random use-after-free, BUG_ON, and similar
issues.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 864cdaa0d2bd..e4151852184e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -763,7 +763,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
struct ceph_mds_snap_realm *ri; /* encoded */
__le64 *snaps; /* encoded */
__le64 *prior_parent_snaps; /* encoded */
- struct ceph_snap_realm *realm = NULL;
+ struct ceph_snap_realm *realm;
struct ceph_snap_realm *first_realm = NULL;
struct ceph_snap_realm *realm_to_rebuild = NULL;
int rebuild_snapcs;
@@ -774,6 +774,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("%s deletion=%d\n", __func__, deletion);
more:
+ realm = NULL;
rebuild_snapcs = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
51884d153f7e ("ceph: avoid putting the realm twice when decoding snaps fails")
2e586641c950 ("ceph: do not update snapshot context when there is no new snapshot")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51884d153f7ec85e18d607b2467820a90e0f4359 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Wed, 9 Nov 2022 11:00:39 +0800
Subject: [PATCH] ceph: avoid putting the realm twice when decoding snaps fails
When decoding the snaps fails, 'first_realm' and 'realm' may be left
pointing to the same snaprealm memory. The error path then puts it
twice, which can cause random use-after-free, BUG_ON, and similar
issues.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/57686
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 864cdaa0d2bd..e4151852184e 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -763,7 +763,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
struct ceph_mds_snap_realm *ri; /* encoded */
__le64 *snaps; /* encoded */
__le64 *prior_parent_snaps; /* encoded */
- struct ceph_snap_realm *realm = NULL;
+ struct ceph_snap_realm *realm;
struct ceph_snap_realm *first_realm = NULL;
struct ceph_snap_realm *realm_to_rebuild = NULL;
int rebuild_snapcs;
@@ -774,6 +774,7 @@ int ceph_update_snap_trace(struct ceph_mds_client *mdsc,
dout("%s deletion=%d\n", __func__, deletion);
more:
+ realm = NULL;
rebuild_snapcs = 0;
ceph_decode_need(&p, e, sizeof(*ri), bad);
ri = p;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
7acae6183cf3 ("ceph: fix possible NULL pointer dereference for req->r_session")
89d43d0551a8 ("ceph: put the requests/sessions when it fails to alloc memory")
708c87168b61 ("ceph: fix off by one bugs in unsafe_request_wait()")
e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session may be changed when the request is forwarded
or resent. In both the forwarding and the resending case the request
is protected by the mdsc->mutex.
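As a rough user-space analogy, the idea is to read every session pointer
while holding the same lock that the writers hold. The sketch below uses
POSIX mutexes; all type and function names in it are hypothetical and only
loosely mirror the ceph structures, and the bounds check is a defensive
addition of the sketch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the ceph structures. */
struct session {
	unsigned int mds;		/* index of the MDS this session talks to */
};

struct request {
	struct session *session;	/* may be rewritten under client->mutex */
};

struct client {
	pthread_mutex_t mutex;		/* protects request->session and max_sessions */
	unsigned int max_sessions;
};

/*
 * Record each distinct session referenced by reqs[] in out[], indexed by
 * MDS number. out[] must hold c->max_sessions zeroed entries. Returns the
 * number of distinct sessions found.
 */
static int collect_sessions(struct client *c, struct request **reqs,
			    size_t nreqs, struct session **out)
{
	int found = 0;
	size_t i;

	pthread_mutex_lock(&c->mutex);
	for (i = 0; i < nreqs; i++) {
		/* Read the pointer once, while writers are locked out. */
		struct session *s = reqs[i]->session;

		if (!s)
			continue;
		if (s->mds >= c->max_sessions)
			continue;	/* defensive check, sketch only */
		if (!out[s->mds]) {
			out[s->mds] = s;
			found++;
		}
	}
	pthread_mutex_unlock(&c->mutex);

	return found;
}
```

Because the pointer is read only once and only under the lock, a
concurrent forward or resend cannot replace it between the NULL check and
the dereference.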
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
7acae6183cf3 ("ceph: fix possible NULL pointer dereference for req->r_session")
89d43d0551a8 ("ceph: put the requests/sessions when it fails to alloc memory")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session may be changed when the request is forwarded
or resent. In both the forwarding and the resending case the request
is protected by the mdsc->mutex.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
7acae6183cf3 ("ceph: fix possible NULL pointer dereference for req->r_session")
89d43d0551a8 ("ceph: put the requests/sessions when it fails to alloc memory")
708c87168b61 ("ceph: fix off by one bugs in unsafe_request_wait()")
e1a4541ec0b9 ("ceph: flush the mdlog before waiting on unsafe reqs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session may be changed when the request is forwarded
or resent. In both the forwarding and the resending case the request
is protected by the mdsc->mutex.
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5bd76b8de5b7 ("ceph: fix NULL pointer dereference for req->r_session")
aa1d627207ca ("ceph: Use kcalloc for allocating multiple elements")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5bd76b8de5b74fa941a6eafee87728a0fe072267 Mon Sep 17 00:00:00 2001
From: Xiubo Li <xiubli(a)redhat.com>
Date: Thu, 10 Nov 2022 21:01:59 +0800
Subject: [PATCH] ceph: fix NULL pointer dereference for req->r_session
The request's r_session may be changed when it is forwarded or
resent. In both the forwarding and the resending case the request is
protected by mdsc->mutex.
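The locking idea can be sketched in userspace as follows. This is only an
illustration under stated assumptions, not the ceph code: struct client,
struct request and collect_sessions() are hypothetical names, and a pthread
mutex stands in for mdsc->mutex. The point it shows is that once the bound
and the per-request session are both read under the same mutex, the index
can no longer exceed the array size, so no retry path is needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the ceph structures; not the kernel types. */
struct client {
	pthread_mutex_t mutex;		/* plays the role of mdsc->mutex */
	unsigned int max_sessions;	/* only changes with the mutex held */
};

struct request {
	int session_id;			/* stands in for r_session->s_mds, -1 = none */
};

/*
 * Mark which sessions the requests refer to. Returns a calloc'ed array
 * of flags (the caller frees it) or NULL if the allocation fails.
 * *out_max receives the bound that was used for the array.
 */
unsigned char *collect_sessions(struct client *c, const struct request *reqs,
				size_t nreqs, unsigned int *out_max)
{
	unsigned char *seen;
	unsigned int max;
	size_t i;

	pthread_mutex_lock(&c->mutex);
	max = c->max_sessions;		/* stable while the mutex is held */

	seen = calloc(max ? max : 1, sizeof(*seen));
	if (!seen) {
		pthread_mutex_unlock(&c->mutex);
		return NULL;
	}

	for (i = 0; i < nreqs; i++) {
		int id = reqs[i].session_id;

		if (id < 0)
			continue;	/* request has no session */
		/* Cannot trip: the session cannot change under the mutex. */
		assert((unsigned int)id < max);
		seen[id] = 1;
	}
	pthread_mutex_unlock(&c->mutex);

	*out_max = max;
	return seen;
}
```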
Cc: stable(a)vger.kernel.org
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2137955
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
Reviewed-by: Ilya Dryomov <idryomov(a)gmail.com>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index fb023f9fafcb..e54814d0c2f7 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2248,7 +2248,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_mds_request *req1 = NULL, *req2 = NULL;
- unsigned int max_sessions;
int ret, err = 0;
spin_lock(&ci->i_unsafe_lock);
@@ -2266,28 +2265,24 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
}
spin_unlock(&ci->i_unsafe_lock);
- /*
- * The mdsc->max_sessions is unlikely to be changed
- * mostly, here we will retry it by reallocating the
- * sessions array memory to get rid of the mdsc->mutex
- * lock.
- */
-retry:
- max_sessions = mdsc->max_sessions;
-
/*
* Trigger to flush the journal logs in all the relevant MDSes
* manually, or in the worst case we must wait at most 5 seconds
* to wait the journal logs to be flushed by the MDSes periodically.
*/
- if ((req1 || req2) && likely(max_sessions)) {
- struct ceph_mds_session **sessions = NULL;
- struct ceph_mds_session *s;
+ if (req1 || req2) {
struct ceph_mds_request *req;
+ struct ceph_mds_session **sessions;
+ struct ceph_mds_session *s;
+ unsigned int max_sessions;
int i;
+ mutex_lock(&mdsc->mutex);
+ max_sessions = mdsc->max_sessions;
+
sessions = kcalloc(max_sessions, sizeof(s), GFP_KERNEL);
if (!sessions) {
+ mutex_unlock(&mdsc->mutex);
err = -ENOMEM;
goto out;
}
@@ -2299,16 +2294,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2321,16 +2306,6 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
s = req->r_session;
if (!s)
continue;
- if (unlikely(s->s_mds >= max_sessions)) {
- spin_unlock(&ci->i_unsafe_lock);
- for (i = 0; i < max_sessions; i++) {
- s = sessions[i];
- if (s)
- ceph_put_mds_session(s);
- }
- kfree(sessions);
- goto retry;
- }
if (!sessions[s->s_mds]) {
s = ceph_get_mds_session(s);
sessions[s->s_mds] = s;
@@ -2342,11 +2317,12 @@ static int flush_mdlog_and_wait_inode_unsafe_requests(struct inode *inode)
/* the auth MDS */
spin_lock(&ci->i_ceph_lock);
if (ci->i_auth_cap) {
- s = ci->i_auth_cap->session;
- if (!sessions[s->s_mds])
- sessions[s->s_mds] = ceph_get_mds_session(s);
+ s = ci->i_auth_cap->session;
+ if (!sessions[s->s_mds])
+ sessions[s->s_mds] = ceph_get_mds_session(s);
}
spin_unlock(&ci->i_ceph_lock);
+ mutex_unlock(&mdsc->mutex);
/* send flush mdlog request to MDSes */
for (i = 0; i < max_sessions; i++) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7fdbc5f014c3 ("io_uring: disallow self-propelled ring polling")
2ba69707d915 ("io_uring: clean up io_poll_check_events return values")
d245bca6375b ("io_uring: don't expose io_fill_cqe_aux()")
f3b44f92e59a ("io_uring: move read/write related opcodes to its own file")
c98817e6cd44 ("io_uring: move remaining file table manipulation to filetable.c")
735729844819 ("io_uring: move rsrc related data, core, and commands")
3b77495a9723 ("io_uring: split provided buffers handling into its own file")
7aaff708a768 ("io_uring: move cancelation into its own file")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
92ac8beaea1f ("io_uring: include and forward-declaration sanitation")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7fdbc5f014c3f71bc44673a2d6c5bb2d12d45f25 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Fri, 18 Nov 2022 15:41:41 +0000
Subject: [PATCH] io_uring: disallow self-propelled ring polling
When we post a CQE we wake all ring pollers, as we normally should.
However, if a CQE was generated by a multishot poll request targeting
its own ring, it'll wake that request up, which will make it post
a new CQE, which will wake the request and so on until it exhausts all
CQ entries.
Don't allow multishot polling of io_uring files but downgrade them to
oneshots, which was always stated as a correct behaviour that the
userspace should check for.
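The feedback loop described above can be modelled with a toy counter. This
is a sketch only, assuming a completion queue of CQ_ENTRIES slots and a
single poll request that watches the ring it posts to; run_self_poll() and
CQ_ENTRIES are hypothetical names and nothing here is io_uring API.

```c
#include <assert.h>
#include <stdbool.h>

#define CQ_ENTRIES 8	/* assumed queue size for the model */

/*
 * Model a poll request that targets its own ring. Returns how many
 * CQEs the poll request itself ends up posting.
 */
unsigned int run_self_poll(bool downgrade_to_oneshot)
{
	unsigned int cq_used = 1;	/* an unrelated CQE wakes the poller */
	unsigned int poll_cqes = 0;
	bool woken = true;
	bool armed = true;

	while (armed && woken) {
		woken = false;
		if (cq_used == CQ_ENTRIES)
			break;		/* completion queue exhausted */
		cq_used++;		/* the poll request posts its own CQE */
		poll_cqes++;
		assert(cq_used <= CQ_ENTRIES);
		if (downgrade_to_oneshot)
			armed = false;	/* one event, then the request is done */
		else
			woken = true;	/* the new CQE wakes the same request */
	}
	return poll_cqes;
}
```

In this model the multishot case keeps posting until the queue is full,
while the downgraded case posts exactly one CQE and stops.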
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/3124038c0e7474d427538c2d915335ec28c92d21.16687857…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index c34019b18211..055632e9092a 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -246,6 +246,8 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
continue;
if (req->apoll_events & EPOLLONESHOT)
return IOU_POLL_DONE;
+ if (io_is_uring_fops(req->file))
+ return IOU_POLL_DONE;
/* multishot, just fill a CQE and proceed */
if (!(req->flags & REQ_F_APOLL_MULTISHOT)) {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
7090abd6ad06 ("serial: 8250_lpss: Use 16B DMA burst with Elkhart Lake")
2cb3315107b5 ("serial: 8250_lpss: Enable PSE UART Auto Flow Control")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7090abd6ad0610a144523ce4ffcb8560909bf2a8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:51 +0200
Subject: [PATCH] serial: 8250_lpss: Use 16B DMA burst with Elkhart Lake
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Configure DMA to use 16B burst size with Elkhart Lake. This makes the
bus use more efficient and works around an issue which occurs with the
previously used 1B.
The fix was initially developed by Srikanth Thokala and Aman Kumar.
This together with the previous config change is the cleaned up version
of the original fix.
Fixes: 0a9410b981e9 ("serial: 8250_lpss: Enable DMA on Intel Elkhart Lake")
Cc: <stable(a)vger.kernel.org> # serial: 8250_lpss: Configure DMA also w/o DMA filter
Reported-by: Wentong Wu <wentong.wu(a)intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-4-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_lpss.c b/drivers/tty/serial/8250/8250_lpss.c
index 7d9cddbfef40..0e43bdfb7459 100644
--- a/drivers/tty/serial/8250/8250_lpss.c
+++ b/drivers/tty/serial/8250/8250_lpss.c
@@ -174,6 +174,8 @@ static int ehl_serial_setup(struct lpss8250 *lpss, struct uart_port *port)
*/
up->dma = dma;
+ lpss->dma_maxburst = 16;
+
port->set_termios = dw8250_do_set_termios;
return 0;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters,
jumping them ahead of those characters that are still pending under DMA.
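The ordering hazard can be reproduced with a small model. This is a sketch
under assumptions, not driver code: struct port, deliver(), handle_rlsi()
and rx_order() are hypothetical, "AB" stands for characters already taken
by DMA but not yet delivered, and "C" is the character still in the FIFO
when the line-status interrupt fires.

```c
#include <assert.h>
#include <string.h>

struct port {
	char dma_pending[16];	/* received by DMA, not yet delivered */
	char fifo[16];		/* still sitting in the UART FIFO */
	char tty[64];		/* what the reader ends up seeing */
};

/* Append src to the delivered stream and empty src. */
static void deliver(struct port *p, char *src)
{
	assert(strlen(p->tty) + strlen(src) < sizeof(p->tty));
	strcat(p->tty, src);
	src[0] = '\0';
}

/* Model the line-status interrupt; flush_first is the fixed behaviour. */
void handle_rlsi(struct port *p, int flush_first)
{
	if (flush_first)
		deliver(p, p->dma_pending);	/* flush DMA Rx first */
	deliver(p, p->fifo);			/* non-DMA path drains the FIFO */
	deliver(p, p->dma_pending);		/* DMA completion arrives later */
}

/* Set up the scenario and return the order seen by the reader. */
const char *rx_order(struct port *p, int flush_first)
{
	memset(p, 0, sizeof(*p));
	strcpy(p->dma_pending, "AB");
	strcpy(p->fifo, "C");
	handle_rlsi(p, flush_first);
	return p->tty;
}
```

Without the flush the model delivers "CAB"; with the flush it delivers
"ABC", which is the order in which the characters arrived.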
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters,
jumping them ahead of those characters that are still pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters,
jumping them ahead of those characters that are still pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1980860e0c82 ("serial: 8250: Flush DMA Rx on RLSI")
a931237cbea2 ("serial: 8250: Fall back to non-DMA Rx if IIR_RDI occurs")
df561f6688fe ("treewide: Use fallthrough pseudo-keyword")
37711e5e2325 ("Merge tag 'nfs-for-5.9-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1980860e0c8299316cddaf0992dd9e1258ec9d88 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ilpo=20J=C3=A4rvinen?= <ilpo.jarvinen(a)linux.intel.com>
Date: Tue, 8 Nov 2022 14:19:52 +0200
Subject: [PATCH] serial: 8250: Flush DMA Rx on RLSI
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Returning true from handle_rx_dma() without flushing DMA first creates
a data ordering hazard. If DMA Rx has handled any character at the
point when RLSI occurs, the non-DMA path handles any pending characters,
jumping them ahead of those characters that are still pending under DMA.
Fixes: 75df022b5f89 ("serial: 8250_dma: Fix RX handling")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.c…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 92dd18716169..388172289627 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1901,10 +1901,9 @@ static bool handle_rx_dma(struct uart_8250_port *up, unsigned int iir)
if (!up->dma->rx_running)
break;
fallthrough;
+ case UART_IIR_RLSI:
case UART_IIR_RX_TIMEOUT:
serial8250_rx_dma_flush(up);
- fallthrough;
- case UART_IIR_RLSI:
return true;
}
return up->dma->rx_dma(up);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
57572cacd36e ("iio: accel: bma400: Ensure VDDIO is enable defore reading the chip ID.")
12c99f859fd3 ("iio: accel: bma400: conversion to device-managed function")
02e2af20f4f9 ("Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57572cacd36e6d4be7722d7770d23f4430219827 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Date: Sun, 2 Oct 2022 15:41:33 +0100
Subject: [PATCH] iio: accel: bma400: Ensure VDDIO is enable defore reading the
chip ID.
The regulator enables were after the check on the chip variant, which was
very unlikely to return a correct value when not powered.
Presumably all the devices anyone is testing on have a regulator that
is already powered up when this code runs for reasons beyond the scope
of this driver. Move the read call down a few lines.
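Why the order matters can be shown with a toy device. This is a sketch
under assumptions, not the bma400 driver: struct chip, chip_read_id() and
chip_init() are hypothetical, and an unpowered chip is assumed to read back
0 from its ID register. Only the expected ID value 0x90 is taken from the
comment in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define CHIP_ID_REG_VAL 0x90	/* expected ID, per the comment in the patch */

struct chip {
	int powered;		/* 1 once the supplies are enabled */
};

/* Read the ID register; an unpowered chip is assumed to return 0. */
static int chip_read_id(const struct chip *c, unsigned int *val)
{
	assert(val != NULL);
	*val = c->powered ? CHIP_ID_REG_VAL : 0x00;
	return 0;
}

/* power_first selects the fixed order: enable supplies, then read the ID. */
int chip_init(struct chip *c, int power_first)
{
	unsigned int val;
	int ret;

	if (power_first)
		c->powered = 1;

	ret = chip_read_id(c, &val);
	if (ret)
		return ret;
	if (val != CHIP_ID_REG_VAL)
		return -ENODEV;	/* chip ID mismatch */

	c->powered = 1;		/* old order: supplies enabled only now */
	return 0;
}
```

The old order only succeeds in this model when the chip happens to be
powered already, which matches the presumption in the commit message.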
Fixes: 3cf7ded15e40 ("iio: accel: bma400: basic regulator support")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Dan Robertson <dan(a)dlrobertson.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20221002144133.3771029-1-jic23@kernel.org
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index ad8fce3e08cd..490c342ef72a 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -869,18 +869,6 @@ static int bma400_init(struct bma400_data *data)
unsigned int val;
int ret;
- /* Try to read chip_id register. It must return 0x90. */
- ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
- if (ret) {
- dev_err(data->dev, "Failed to read chip id register\n");
- return ret;
- }
-
- if (val != BMA400_ID_REG_VAL) {
- dev_err(data->dev, "Chip ID mismatch\n");
- return -ENODEV;
- }
-
data->regulators[BMA400_VDD_REGULATOR].supply = "vdd";
data->regulators[BMA400_VDDIO_REGULATOR].supply = "vddio";
ret = devm_regulator_bulk_get(data->dev,
@@ -906,6 +894,18 @@ static int bma400_init(struct bma400_data *data)
if (ret)
return ret;
+ /* Try to read chip_id register. It must return 0x90. */
+ ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
+ if (ret) {
+ dev_err(data->dev, "Failed to read chip id register\n");
+ return ret;
+ }
+
+ if (val != BMA400_ID_REG_VAL) {
+ dev_err(data->dev, "Chip ID mismatch\n");
+ return -ENODEV;
+ }
+
ret = bma400_get_power_mode(data);
if (ret) {
dev_err(data->dev, "Failed to get the initial power-mode\n");
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
57572cacd36e ("iio: accel: bma400: Ensure VDDIO is enable defore reading the chip ID.")
12c99f859fd3 ("iio: accel: bma400: conversion to device-managed function")
02e2af20f4f9 ("Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 57572cacd36e6d4be7722d7770d23f4430219827 Mon Sep 17 00:00:00 2001
From: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Date: Sun, 2 Oct 2022 15:41:33 +0100
Subject: [PATCH] iio: accel: bma400: Ensure VDDIO is enable defore reading the
chip ID.
The regulator enables were after the check on the chip variant, which was
very unlikely to return a correct value when not powered.
Presumably all the devices anyone is testing on have a regulator that
is already powered up when this code runs for reasons beyond the scope
of this driver. Move the read call down a few lines.
Fixes: 3cf7ded15e40 ("iio: accel: bma400: basic regulator support")
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Reviewed-by: Dan Robertson <dan(a)dlrobertson.com>
Cc: <Stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20221002144133.3771029-1-jic23@kernel.org
diff --git a/drivers/iio/accel/bma400_core.c b/drivers/iio/accel/bma400_core.c
index ad8fce3e08cd..490c342ef72a 100644
--- a/drivers/iio/accel/bma400_core.c
+++ b/drivers/iio/accel/bma400_core.c
@@ -869,18 +869,6 @@ static int bma400_init(struct bma400_data *data)
unsigned int val;
int ret;
- /* Try to read chip_id register. It must return 0x90. */
- ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
- if (ret) {
- dev_err(data->dev, "Failed to read chip id register\n");
- return ret;
- }
-
- if (val != BMA400_ID_REG_VAL) {
- dev_err(data->dev, "Chip ID mismatch\n");
- return -ENODEV;
- }
-
data->regulators[BMA400_VDD_REGULATOR].supply = "vdd";
data->regulators[BMA400_VDDIO_REGULATOR].supply = "vddio";
ret = devm_regulator_bulk_get(data->dev,
@@ -906,6 +894,18 @@ static int bma400_init(struct bma400_data *data)
if (ret)
return ret;
+ /* Try to read chip_id register. It must return 0x90. */
+ ret = regmap_read(data->regmap, BMA400_CHIP_ID_REG, &val);
+ if (ret) {
+ dev_err(data->dev, "Failed to read chip id register\n");
+ return ret;
+ }
+
+ if (val != BMA400_ID_REG_VAL) {
+ dev_err(data->dev, "Chip ID mismatch\n");
+ return -ENODEV;
+ }
+
ret = bma400_get_power_mode(data);
if (ret) {
dev_err(data->dev, "Failed to get the initial power-mode\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
9d5333c93134 ("usb: cdns3: host: fix endless superspeed hub port reset")
88171f67a2c1 ("usb: cdns3: Removes xhci_cdns3_suspend_quirk from host-export.h")
3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
e93e58d27402 ("usb: cdnsp: Device side header file for CDNSP driver")
0b490046d8d7 ("usb: cdns3: Refactoring names in reusable code")
394c3a144de8 ("usb: cdns3: Moves reusable code to separate module")
f738957277ba ("usb: cdns3: Split core.c into cdns3-plat and core.c file")
db8892bb1bb6 ("usb: cdns3: Add support for DRD CDNSP")
d2a968dddf98 ("Merge tag 'usb-v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/peter.chen/usb into usb-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d5333c931347005352d5b8beaa43528c94cfc9c Mon Sep 17 00:00:00 2001
From: Li Jun <jun.li(a)nxp.com>
Date: Wed, 26 Oct 2022 15:07:49 -0400
Subject: [PATCH] usb: cdns3: host: fix endless superspeed hub port reset
When a USB 3.0 hub is connected with one USB 2.0 device and no USB 3.0
device, some USB hubs report endless port reset messages.
[ 190.324169] usb 2-1: new SuperSpeed USB device number 88 using xhci-hcd
[ 190.352834] hub 2-1:1.0: USB hub found
[ 190.356995] hub 2-1:1.0: 4 ports detected
[ 190.700056] usb 2-1: USB disconnect, device number 88
[ 192.472139] usb 2-1: new SuperSpeed USB device number 89 using xhci-hcd
[ 192.500820] hub 2-1:1.0: USB hub found
[ 192.504977] hub 2-1:1.0: 4 ports detected
[ 192.852066] usb 2-1: USB disconnect, device number 89
The reason is that the runtime PM state of the USB 2.0 port is active and
that of the USB 3.0 port is suspended, so the parent device stays active.
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/xhci-hcd.1.auto/usb2/power/runtime_status
suspended
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/xhci-hcd.1.auto/usb1/power/runtime_status
active
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/xhci-hcd.1.auto/power/runtime_status
active
cat /sys/bus/platform/devices/5b110000.usb/5b130000.usb/power/runtime_status
active
So xhci_cdns3_suspend_quirk() is not called and the U3 configuration is
not applied. Move the U3 configuration into host start. Reinitialize it in
the resume function in case the controller lost power during suspend.
Cc: stable(a)vger.kernel.org 5.10
Signed-off-by: Li Jun <jun.li(a)nxp.com>
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
Reviewed-by: Peter Chen <peter.chen(a)kernel.org>
Acked-by: Alexander Stein <alexander.stein(a)ew.tq-group.com>
Link: https://lore.kernel.org/r/20221026190749.2280367-1-Frank.Li@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/cdns3/host.c b/drivers/usb/cdns3/host.c
index 9643b905e2d8..6164fc4c96a4 100644
--- a/drivers/usb/cdns3/host.c
+++ b/drivers/usb/cdns3/host.c
@@ -24,11 +24,37 @@
#define CFG_RXDET_P3_EN BIT(15)
#define LPM_2_STB_SWITCH_EN BIT(25)
-static int xhci_cdns3_suspend_quirk(struct usb_hcd *hcd);
+static void xhci_cdns3_plat_start(struct usb_hcd *hcd)
+{
+ struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+ u32 value;
+
+ /* set usbcmd.EU3S */
+ value = readl(&xhci->op_regs->command);
+ value |= CMD_PM_INDEX;
+ writel(value, &xhci->op_regs->command);
+
+ if (hcd->regs) {
+ value = readl(hcd->regs + XECP_AUX_CTRL_REG1);
+ value |= CFG_RXDET_P3_EN;
+ writel(value, hcd->regs + XECP_AUX_CTRL_REG1);
+
+ value = readl(hcd->regs + XECP_PORT_CAP_REG);
+ value |= LPM_2_STB_SWITCH_EN;
+ writel(value, hcd->regs + XECP_PORT_CAP_REG);
+ }
+}
+
+static int xhci_cdns3_resume_quirk(struct usb_hcd *hcd)
+{
+ xhci_cdns3_plat_start(hcd);
+ return 0;
+}
static const struct xhci_plat_priv xhci_plat_cdns3_xhci = {
.quirks = XHCI_SKIP_PHY_INIT | XHCI_AVOID_BEI,
- .suspend_quirk = xhci_cdns3_suspend_quirk,
+ .plat_start = xhci_cdns3_plat_start,
+ .resume_quirk = xhci_cdns3_resume_quirk,
};
static int __cdns_host_init(struct cdns *cdns)
@@ -90,32 +116,6 @@ static int __cdns_host_init(struct cdns *cdns)
return ret;
}
-static int xhci_cdns3_suspend_quirk(struct usb_hcd *hcd)
-{
- struct xhci_hcd *xhci = hcd_to_xhci(hcd);
- u32 value;
-
- if (pm_runtime_status_suspended(hcd->self.controller))
- return 0;
-
- /* set usbcmd.EU3S */
- value = readl(&xhci->op_regs->command);
- value |= CMD_PM_INDEX;
- writel(value, &xhci->op_regs->command);
-
- if (hcd->regs) {
- value = readl(hcd->regs + XECP_AUX_CTRL_REG1);
- value |= CFG_RXDET_P3_EN;
- writel(value, hcd->regs + XECP_AUX_CTRL_REG1);
-
- value = readl(hcd->regs + XECP_PORT_CAP_REG);
- value |= LPM_2_STB_SWITCH_EN;
- writel(value, hcd->regs + XECP_PORT_CAP_REG);
- }
-
- return 0;
-}
-
static void cdns_host_exit(struct cdns *cdns)
{
kfree(cdns->xhci_plat_data);
On Mon, Nov 21, 2022 at 12:08 AM -05, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> to the 6.0-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> l2tp-serialize-access-to-sk_user_data-with-sk_callba.patch
> and it can be found in the queue-6.0 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please don't add the commit below to the stable tree yet.
We have a fix-up for it under review:
https://lore.kernel.org/netdev/20221121085426.21315-1-jakub@cloudflare.com/
> commit 87a828cfa5ce4b2075d26660756c07751648d13f
> Author: Jakub Sitnicki <jakub(a)cloudflare.com>
> Date: Mon Nov 14 20:16:19 2022 +0100
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> [ Upstream commit b68777d54fac21fc833ec26ea1a2a84f975ab035 ]
>
> sk->sk_user_data has multiple users, which are not compatible with each
> other. Writers must synchronize by grabbing the sk->sk_callback_lock.
>
> l2tp currently fails to grab the lock when modifying the underlying tunnel
> socket fields. Fix it by adding appropriate locking.
>
> We err on the side of safety and grab the sk_callback_lock also inside the
> sk_destruct callback overridden by l2tp, even though there should be no
> refs allowing access to the sock at the time when sk_destruct gets called.
>
> v4:
> - serialize write to sk_user_data in l2tp sk_destruct
>
> v3:
> - switch from sock lock to sk_callback_lock
> - document write-protection for sk_user_data
>
> v2:
> - update Fixes to point to origin of the bug
> - use real names in Reported/Tested-by tags
>
> Cc: Tom Parkin <tparkin(a)katalix.com>
> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
> Reported-by: Haowei Yan <g1042620637(a)gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index f6e6838c82df..03a4ebe3ccc8 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -323,7 +323,7 @@ struct sk_filter;
> * @sk_tskey: counter to disambiguate concurrent tstamp requests
> * @sk_zckey: counter to order MSG_ZEROCOPY notifications
> * @sk_socket: Identd and reporting IO signals
> - * @sk_user_data: RPC layer private data
> + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock.
> * @sk_frag: cached page frag
> * @sk_peek_off: current peek_offset value
> * @sk_send_head: front of stuff to transmit
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 7499c51b1850..754fdda8a5f5 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk)
> }
>
> /* Remove hooks into tunnel socket */
> + write_lock_bh(&sk->sk_callback_lock);
> sk->sk_destruct = tunnel->old_sk_destruct;
> sk->sk_user_data = NULL;
> + write_unlock_bh(&sk->sk_callback_lock);
>
> /* Call the original destructor */
> if (sk->sk_destruct)
> @@ -1469,16 +1471,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock = sockfd_lookup(tunnel->fd, &ret);
> if (!sock)
> goto err;
> -
> - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap);
> - if (ret < 0)
> - goto err_sock;
> }
>
> + sk = sock->sk;
> + write_lock(&sk->sk_callback_lock);
> +
> + ret = l2tp_validate_socket(sk, net, tunnel->encap);
> + if (ret < 0)
> + goto err_sock;
> +
> tunnel->l2tp_net = net;
> pn = l2tp_pernet(net);
>
> - sk = sock->sk;
> sock_hold(sk);
> tunnel->sock = sk;
>
> @@ -1504,7 +1508,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
>
> setup_udp_tunnel_sock(net, sock, &udp_cfg);
> } else {
> - sk->sk_user_data = tunnel;
> + rcu_assign_sk_user_data(sk, tunnel);
> }
>
> tunnel->old_sk_destruct = sk->sk_destruct;
> @@ -1518,6 +1522,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> if (tunnel->fd >= 0)
> sockfd_put(sock);
>
> + write_unlock(&sk->sk_callback_lock);
> return 0;
>
> err_sock:
> @@ -1525,6 +1530,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock_release(sock);
> else
> sockfd_put(sock);
> +
> + write_unlock(&sk->sk_callback_lock);
> err:
> return ret;
> }
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
b98186aee22f ("io_uring: update res mask in io_poll_check_events")
d245bca6375b ("io_uring: don't expose io_fill_cqe_aux()")
f3b44f92e59a ("io_uring: move read/write related opcodes to its own file")
c98817e6cd44 ("io_uring: move remaining file table manipulation to filetable.c")
735729844819 ("io_uring: move rsrc related data, core, and commands")
3b77495a9723 ("io_uring: split provided buffers handling into its own file")
7aaff708a768 ("io_uring: move cancelation into its own file")
329061d3e2f9 ("io_uring: move poll handling into its own file")
cfd22e6b3319 ("io_uring: add opcode name to io_op_defs")
92ac8beaea1f ("io_uring: include and forward-declaration sanitation")
c9f06aa7de15 ("io_uring: move io_uring_task (tctx) helpers into its own file")
a4ad4f748ea9 ("io_uring: move fdinfo helpers to its own file")
e5550a1447bf ("io_uring: use io_is_uring_fops() consistently")
17437f311490 ("io_uring: move SQPOLL related handling into its own file")
59915143e89f ("io_uring: move timeout opcodes and handling into its own file")
e418bbc97bff ("io_uring: move our reference counting into a header")
36404b09aa60 ("io_uring: move msg_ring into its own file")
f9ead18c1058 ("io_uring: split network related opcodes into its own file")
e0da14def1ee ("io_uring: move statx handling to its own file")
a9c210cebe13 ("io_uring: move epoll handler to its own file")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b98186aee22fa593bc8c6b2c5d839c2ee518bc8c Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 17 Nov 2022 18:40:14 +0000
Subject: [PATCH] io_uring: update res mask in io_poll_check_events
When io_poll_check_events() collides with someone attempting to queue a
task work, it'll spin one more time. However, it'll continue to use
the mask from the first iteration instead of updating it. For example,
if the first wake up was an EPOLLIN and the second an EPOLLOUT, the
userspace will not get EPOLLOUT in time.
Clear the mask for all subsequent iterations to force vfs_poll().
Cc: stable(a)vger.kernel.org
Fixes: aa43477b04025 ("io_uring: poll rework")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/2dac97e8f691231049cb259c4ae57e79e40b537c.16687102…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/poll.c b/io_uring/poll.c
index f500506984ec..90920abf91ff 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -258,6 +258,9 @@ static int io_poll_check_events(struct io_kiocb *req, bool *locked)
return ret;
}
+ /* force the next iteration to vfs_poll() */
+ req->cqe.res = 0;
+
/*
* Release all references, retry if someone tried to restart
* task_work while we were executing it.
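A note on the io_uring fix above: the stale-mask problem can be sketched
with a small user-space model. This is only an illustration, not the kernel
code. The names check_events(), EV_IN and EV_OUT are made up for the example,
and the ready[] array stands in for what a fresh vfs_poll() would report on
each pass through the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EV_IN  0x1u  /* hypothetical stand-in for EPOLLIN */
#define EV_OUT 0x4u  /* hypothetical stand-in for EPOLLOUT */

/*
 * Toy model of the retry loop. cached_mask plays the role of the result
 * mask carried by the request; ready[i] is what a fresh poll would return
 * on iteration i. Returns the union of all events reported to userspace.
 */
static uint32_t check_events(uint32_t cached_mask, const uint32_t *ready,
                             size_t iterations, bool clear_mask)
{
    uint32_t reported = 0;

    assert(ready != NULL);
    for (size_t i = 0; i < iterations; i++) {
        /* A non-zero cached mask is trusted; otherwise poll again. */
        uint32_t mask = cached_mask ? cached_mask : ready[i];

        reported |= mask;
        if (clear_mask)
            cached_mask = 0;  /* the fix: force a fresh poll next time */
    }
    return reported;
}
```

With ready = { EV_IN, EV_OUT } and an initial mask of EV_IN, two iterations
without clearing report only EV_IN, while clearing the mask after each pass
lets the second iteration pick up EV_OUT as well.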
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
4d2852412306 ("drm/amd/display: Fix calculation for cursor CAB allocation")
525a65c77db5 ("drm/amd/display: Update MALL SS NumWays calculation")
6eef37460584 ("drm/amd/display: Add debug option for allocating extra way for cursor")
5c1a431aaf52 ("drm/amd/display: Added debug option for forcing subvp num ways")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4d285241230676ba8b888701b89684b4e0360fcc Mon Sep 17 00:00:00 2001
From: George Shen <george.shen(a)amd.com>
Date: Tue, 1 Nov 2022 23:03:03 -0400
Subject: [PATCH] drm/amd/display: Fix calculation for cursor CAB allocation
[Why]
The cursor size (in memory) is currently calculated incorrectly,
resulting in not enough CAB being allocated for the static screen cursor
in MALL refresh. This results in cursor image corruption.
[How]
Use cursor pitch instead of cursor width when calculating cursor size.
Update the num cache lines calculation to use the result of the cursor
size calculation instead of recalculating it manually.
Reviewed-by: Alvin Lee <Alvin.Lee2(a)amd.com>
Acked-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: George Shen <george.shen(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.0.x
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index cf5bd9713f54..ac41a763bb1d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -283,8 +283,7 @@ static uint32_t dcn32_calculate_cab_allocation(struct dc *dc, struct dc_state *c
using the max for calculation */
if (hubp->curs_attr.width > 0) {
- // Round cursor width to next multiple of 64
- cursor_size = (((hubp->curs_attr.width + 63) / 64) * 64) * hubp->curs_attr.height;
+ cursor_size = hubp->curs_attr.pitch * hubp->curs_attr.height;
switch (pipe->stream->cursor_attributes.color_format) {
case CURSOR_MODE_MONO:
@@ -309,9 +308,9 @@ static uint32_t dcn32_calculate_cab_allocation(struct dc *dc, struct dc_state *c
cursor_size > 16384) {
/* cursor_num_mblk = CEILING(num_cursors*cursor_width*cursor_width*cursor_Bpe/mblk_bytes, 1)
*/
- cache_lines_used += (((hubp->curs_attr.width * hubp->curs_attr.height * cursor_bpp +
- DCN3_2_MALL_MBLK_SIZE_BYTES - 1) / DCN3_2_MALL_MBLK_SIZE_BYTES) *
- DCN3_2_MALL_MBLK_SIZE_BYTES) / dc->caps.cache_line_size + 2;
+ cache_lines_used += (((cursor_size + DCN3_2_MALL_MBLK_SIZE_BYTES - 1) /
+ DCN3_2_MALL_MBLK_SIZE_BYTES) * DCN3_2_MALL_MBLK_SIZE_BYTES) /
+ dc->caps.cache_line_size + 2;
}
break;
}
@@ -727,10 +726,7 @@ void dcn32_update_mall_sel(struct dc *dc, struct dc_state *context)
struct hubp *hubp = pipe->plane_res.hubp;
if (pipe->stream && pipe->plane_state && hubp && hubp->funcs->hubp_update_mall_sel) {
- //Round cursor width up to next multiple of 64
- int cursor_width = ((hubp->curs_attr.width + 63) / 64) * 64;
- int cursor_height = hubp->curs_attr.height;
- int cursor_size = cursor_width * cursor_height;
+ int cursor_size = hubp->curs_attr.pitch * hubp->curs_attr.height;
switch (hubp->curs_attr.color_format) {
case CURSOR_MODE_MONO:
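A note on the cursor CAB fix above: the difference between the two size
calculations can be shown with a few lines of standalone C. This is a
simplified sketch, not the driver code. MBLK_SIZE_BYTES is a placeholder
value chosen for the example, the helper names are made up, and the pitch
of 256 used in the usage note is an assumed hardware pitch, not a value
taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder block size for illustration only (assumption). */
#define MBLK_SIZE_BYTES 65536u

/* Old approach: round the cursor width up to the next multiple of 64. */
static uint32_t round_up_64(uint32_t width)
{
    return ((width + 63u) / 64u) * 64u;
}

/* Cursor footprint in pixels for a row stride (rounded width or pitch). */
static uint32_t cursor_pixels(uint32_t stride, uint32_t height)
{
    return stride * height;
}

/*
 * Cache lines used once the size in bytes has been rounded up to whole
 * blocks, following the shape of the expression in the patch.
 */
static uint32_t cache_lines_for(uint32_t size_bytes, uint32_t cache_line_size)
{
    uint32_t mblk_bytes;

    assert(cache_line_size != 0);
    mblk_bytes = ((size_bytes + MBLK_SIZE_BYTES - 1u) / MBLK_SIZE_BYTES) *
                 MBLK_SIZE_BYTES;
    return mblk_bytes / cache_line_size + 2u;
}
```

For a 130x130 cursor, rounding the width gives a stride of 192 and 24960
pixels, whereas an assumed pitch of 256 gives 33280 pixels, so the old
formula would under-count the memory the cursor actually occupies.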
From: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
In vc4_platform_drm_probe(), vc4_match_add_drivers() is called to find
component matches for the component drivers. If no such match is found,
the passed variable "match" is still NULL after the function returns.
Do not pass "match" to component_master_add_with_match() in this case, since
this results in a NULL pointer access as soon as match->num is used to
allocate a component_match array. Instead, return -ENODEV from the
driver's probe function.
Fixes: c8b75bca92cb ("drm/vc4: Add KMS support for Raspberry Pi.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
---
drivers/gpu/drm/vc4/vc4_drv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_drv.c b/drivers/gpu/drm/vc4/vc4_drv.c
index 2027063fdc30..2e53d7f8ad44 100644
--- a/drivers/gpu/drm/vc4/vc4_drv.c
+++ b/drivers/gpu/drm/vc4/vc4_drv.c
@@ -437,6 +437,9 @@ static int vc4_platform_drm_probe(struct platform_device *pdev)
vc4_match_add_drivers(dev, &match,
component_drivers, ARRAY_SIZE(component_drivers));
+ if (!match)
+ return -ENODEV;
+
return component_master_add_with_match(dev, &vc4_drm_ops, match);
}
base-commit: 30a0b95b1335e12efef89dd78518ed3e4a71a763
--
2.36.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
a47b53e95acc ("tracing: Rename tracing_reset() to tracing_reset_cpu()")
46cc0b44428d ("tracing/snapshot: Resize spare buffer if size changed")
d2d8b146043a ("Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splices do on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Primiano Tucci <primiano(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
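A note on the ring-buffer polling fix above: the watermark test that
full_hit() factors out is plain integer arithmetic and can be tried in
user space. The sketch below mirrors the comparison from the patch, but
takes the page counts as arguments instead of reading them from the
per-CPU buffer. The name watermark_hit() and the 0..100 range check on
the percentage are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true when the share of dirty pages exceeds the "full"
 * percentage. A zero page count or a zero percentage means there is
 * no watermark to wait for, so the check passes immediately.
 */
static bool watermark_hit(size_t nr_pages, size_t dirty, int full)
{
    assert(full >= 0 && full <= 100);  /* assumed valid percentage */

    if (!nr_pages || !full)
        return true;

    /* Integer form of: dirty / nr_pages > full / 100 */
    return (dirty * 100) > ((size_t)full * nr_pages);
}
```

With 10 pages and a 50 percent watermark, 5 dirty pages do not satisfy the
check (500 is not greater than 500) but 6 do. That strict comparison is why
a poller can keep blocking while the buffer is exactly at the watermark.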
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
a47b53e95acc ("tracing: Rename tracing_reset() to tracing_reset_cpu()")
46cc0b44428d ("tracing/snapshot: Resize spare buffer if size changed")
d2d8b146043a ("Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splices do on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Primiano Tucci <primiano(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
a47b53e95acc ("tracing: Rename tracing_reset() to tracing_reset_cpu()")
46cc0b44428d ("tracing/snapshot: Resize spare buffer if size changed")
d2d8b146043a ("Merge tag 'trace-v5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splices do on the ring buffer.
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Primiano Tucci <primiano(a)google.com>
Cc: stable(a)vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
This bug is marked as fixed by commit:
net: core: netlink: add helper refcount dec and lock function
net: sched: add helper function to take reference to Qdisc
net: sched: extend Qdisc with rcu
net: sched: rename qdisc_destroy() to qdisc_put()
net: sched: use Qdisc rcu API instead of relying on rtnl lock
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
When extending segments, nilfs_sufile_alloc() is called to get an
unassigned segment, which is then marked as dirty to avoid accidentally
allocating the same segment again in the future.
But in some special cases, such as a corrupted image, this can be
unreliable.
If such corruption of the dirty state of the segment occurs, nilfs2 may
reallocate a segment that is in use and pick the same segment for
writing twice at the same time.
This will cause the problem reported by syzkaller:
https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd…
This case started with segbuf1.segnum = 3, nextnum = 4 when constructed.
Segment 4 was supposed to have already been allocated and marked as dirty.
However, the dirty state was corrupted and the segment 4 usage was not dirty.
On the first call to nilfs_segctor_extend_segments(), segment 4 was
allocated again, which left segbuf2 and the next segbuf3 with the same
segment 4.
sb_getblk() will get the same bh for segbuf2 and segbuf3, and this bh is
added to the buffer lists of both segbufs. This corrupts the lists,
which causes a NULL pointer dereference.
Fix the problem by setting the usage as dirty every time in
nilfs_sufile_mark_dirty(), which is called while constructing the current
segment to be written out and before allocating the next segment.
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+77e4f005cb899d4268d1(a)syzkaller.appspotmail.com
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
---
v1 -> v2:
1) Add lock protection as Ryusuke suggested and slightly fix commit
message.
2) Fix and add tags.
v2 -> v3:
Fix commit message to make it clear.
---
fs/nilfs2/sufile.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 77ff8e95421f..dc359b56fdfa 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -495,14 +495,22 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum)
{
struct buffer_head *bh;
+ void *kaddr;
+ struct nilfs_segment_usage *su;
int ret;
+ down_write(&NILFS_MDT(sufile)->mi_sem);
ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
if (!ret) {
mark_buffer_dirty(bh);
nilfs_mdt_mark_dirty(sufile);
+ kaddr = kmap_atomic(bh->b_page);
+ su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
+ nilfs_segment_usage_set_dirty(su);
+ kunmap_atomic(kaddr);
brelse(bh);
}
+ up_write(&NILFS_MDT(sufile)->mi_sem);
return ret;
}
--
2.17.1
In nilfs_sufile_mark_dirty(), the buffer and inode are set dirty, but
the nilfs_segment_usage is not, so the segment can still be found by
nilfs_sufile_alloc(), which checks nilfs_segment_usage_clean(su).
This will cause the problem reported by syzkaller:
https://syzkaller.appspot.com/bug?id=c7c4748e11ffcc367cef04f76e02e931833cbd…
This happens because the case starts with segbuf1.segnum = 3, nextnum = 4, and
nilfs_sufile_alloc() was not called to allocate a new segment.
The first time, nilfs_segctor_extend_segments() allocated segment
segbuf2.segnum = segbuf1.nextnum = 4, then nilfs_sufile_alloc() found
segment 4 again as nextnextnum because its su was not set dirty.
So segbuf2.nextnum = 4, which causes the next segbuf3.segnum = 4.
sb_getblk() will get the same bh for segbuf2 and segbuf3, and this bh is
added to the buffer lists of both segbufs.
This links the list head of the second list into the first one. When
iterating the first list, the code accesses and dereferences the head of
the second, which causes a NULL pointer dereference.
Fix this by setting the usage as dirty in nilfs_sufile_mark_dirty(),
and add locking there to protect the usage modification.
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+77e4f005cb899d4268d1(a)syzkaller.appspotmail.com
Reported-by: Liu Shixin <liushixin2(a)huawei.com>
Signed-off-by: Chen Zhongjin <chenzhongjin(a)huawei.com>
Acked-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
---
v1 -> v2:
1) Add lock protection as Ryusuke suggested and slightly fix commit
message.
2) Fix and add tags.
---
fs/nilfs2/sufile.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/nilfs2/sufile.c b/fs/nilfs2/sufile.c
index 77ff8e95421f..dc359b56fdfa 100644
--- a/fs/nilfs2/sufile.c
+++ b/fs/nilfs2/sufile.c
@@ -495,14 +495,22 @@ void nilfs_sufile_do_free(struct inode *sufile, __u64 segnum,
int nilfs_sufile_mark_dirty(struct inode *sufile, __u64 segnum)
{
struct buffer_head *bh;
+ void *kaddr;
+ struct nilfs_segment_usage *su;
int ret;
+ down_write(&NILFS_MDT(sufile)->mi_sem);
ret = nilfs_sufile_get_segment_usage_block(sufile, segnum, 0, &bh);
if (!ret) {
mark_buffer_dirty(bh);
nilfs_mdt_mark_dirty(sufile);
+ kaddr = kmap_atomic(bh->b_page);
+ su = nilfs_sufile_block_get_segment_usage(sufile, segnum, bh, kaddr);
+ nilfs_segment_usage_set_dirty(su);
+ kunmap_atomic(kaddr);
brelse(bh);
}
+ up_write(&NILFS_MDT(sufile)->mi_sem);
return ret;
}
--
2.17.1
On Mon, Nov 21, 2022 at 12:12 AM -05, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> l2tp-serialize-access-to-sk_user_data-with-sk_callba.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please don't add the commit below to stable tree, yet.
We have a fix-up for it under review:
https://lore.kernel.org/netdev/20221121085426.21315-1-jakub@cloudflare.com/
> commit 92aa1132edc6e6e912efd056c698cd6e52b83c6f
> Author: Jakub Sitnicki <jakub(a)cloudflare.com>
> Date: Mon Nov 14 20:16:19 2022 +0100
>
> l2tp: Serialize access to sk_user_data with sk_callback_lock
>
> [ Upstream commit b68777d54fac21fc833ec26ea1a2a84f975ab035 ]
>
> sk->sk_user_data has multiple users, which are not compatible with each
> other. Writers must synchronize by grabbing the sk->sk_callback_lock.
>
> l2tp currently fails to grab the lock when modifying the underlying tunnel
> socket fields. Fix it by adding appropriate locking.
>
> We err on the side of safety and grab the sk_callback_lock also inside the
> sk_destruct callback overridden by l2tp, even though there should be no
> refs allowing access to the sock at the time when sk_destruct gets called.
>
> v4:
> - serialize write to sk_user_data in l2tp sk_destruct
>
> v3:
> - switch from sock lock to sk_callback_lock
> - document write-protection for sk_user_data
>
> v2:
> - update Fixes to point to origin of the bug
> - use real names in Reported/Tested-by tags
>
> Cc: Tom Parkin <tparkin(a)katalix.com>
> Fixes: 3557baabf280 ("[L2TP]: PPP over L2TP driver core")
> Reported-by: Haowei Yan <g1042620637(a)gmail.com>
> Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e1a303e4f0f7..3e9db5146765 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -323,7 +323,7 @@ struct bpf_local_storage;
> * @sk_tskey: counter to disambiguate concurrent tstamp requests
> * @sk_zckey: counter to order MSG_ZEROCOPY notifications
> * @sk_socket: Identd and reporting IO signals
> - * @sk_user_data: RPC layer private data
> + * @sk_user_data: RPC layer private data. Write-protected by @sk_callback_lock.
> * @sk_frag: cached page frag
> * @sk_peek_off: current peek_offset value
> * @sk_send_head: front of stuff to transmit
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 93271a2632b8..c77032638a06 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -1150,8 +1150,10 @@ static void l2tp_tunnel_destruct(struct sock *sk)
> }
>
> /* Remove hooks into tunnel socket */
> + write_lock_bh(&sk->sk_callback_lock);
> sk->sk_destruct = tunnel->old_sk_destruct;
> sk->sk_user_data = NULL;
> + write_unlock_bh(&sk->sk_callback_lock);
>
> /* Call the original destructor */
> if (sk->sk_destruct)
> @@ -1471,16 +1473,18 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock = sockfd_lookup(tunnel->fd, &ret);
> if (!sock)
> goto err;
> -
> - ret = l2tp_validate_socket(sock->sk, net, tunnel->encap);
> - if (ret < 0)
> - goto err_sock;
> }
>
> + sk = sock->sk;
> + write_lock(&sk->sk_callback_lock);
> +
> + ret = l2tp_validate_socket(sk, net, tunnel->encap);
> + if (ret < 0)
> + goto err_sock;
> +
> tunnel->l2tp_net = net;
> pn = l2tp_pernet(net);
>
> - sk = sock->sk;
> sock_hold(sk);
> tunnel->sock = sk;
>
> @@ -1506,7 +1510,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
>
> setup_udp_tunnel_sock(net, sock, &udp_cfg);
> } else {
> - sk->sk_user_data = tunnel;
> + rcu_assign_sk_user_data(sk, tunnel);
> }
>
> tunnel->old_sk_destruct = sk->sk_destruct;
> @@ -1520,6 +1524,7 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> if (tunnel->fd >= 0)
> sockfd_put(sock);
>
> + write_unlock(&sk->sk_callback_lock);
> return 0;
>
> err_sock:
> @@ -1527,6 +1532,8 @@ int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
> sock_release(sock);
> else
> sockfd_put(sock);
> +
> + write_unlock(&sk->sk_callback_lock);
> err:
> return ret;
> }
#regzbot introduced v5.19-rc6..1dd685c414a7b9fdb3d23aca3aedae84f0b998ae
Hi, I recently tried to upgrade to linux v6.0.x but when trying to
boot it fails with "error: out of memory" while or after loading the
initramfs (the kernel then panics because the VFS root is missing).
The latest releases I tested are v6.0.9 and v6.1-rc5 and it's broken there too.
I bisected the error to this patch:
1dd685c414a7b9fdb3d23aca3aedae84f0b998ae "XArray: Add calls to
might_alloc()" is the first bad commit.
I've confirmed this is not a side effect of a poor bisect because
1dd685c414a7b9fdb3d23aca3aedae84f0b998ae~1 (v5.19-rc6) works.
I've tried reverting the failing commit on top of v6.0.9 and it didn't fix it.
My system is:
CPU: Ryzen 3600
Motherboard: B550 AORUS ELITE V2
Ram: 48GB (16+32) of unmatched DDR4
GPU: AMD rx580
Various ssds, hdds and network cards plugged with various buses.
You can find a folder with my .config, bisect logs and screenshots of
the error messages here:
https://jorropo.net/ipfs/QmaWH84UPEen4E9n69KZiLjPDaTi2aJvizv3JYiL7Gfmnr/
https://ipfs.io/ipfs/QmaWH84UPEen4E9n69KZiLjPDaTi2aJvizv3JYiL7Gfmnr/
I'll be happy to assist you if you need help reproducing this issue
and/or testing fixes.
Thx, Jorropo
Since commit 0f0101719138 ("usb: dwc3: Don't switch OTG -> peripheral if extcon is present"),
Dual Role support on the Intel Merrifield platform has been broken due to
the rearranged call to dwc3_get_extcon().
It appears to be caused by ulpi_read_id() masking the timeout on the first
test write. In the past, dwc3 probe continued by calling dwc3_core_soft_reset()
followed by dwc3_get_extcon(), which happened to return -EPROBE_DEFER.
On deferred probe ulpi_read_id() finally succeeded. Due to the above-mentioned
rearranging, -EPROBE_DEFER is not returned and probe completes without a phy.
As ulpi_read_id() has now been changed to return -ETIMEDOUT in this case, we
need to handle the error by calling dwc3_core_soft_reset() and requesting
-EPROBE_DEFER. On deferred probe ulpi_read_id() is retried and succeeds.
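The error handling described above boils down to translating one specific error code into a retry request. A minimal userspace sketch, assuming hypothetical toy_* helpers in place of the real driver functions: EPROBE_DEFER is a kernel-internal code that userspace <errno.h> does not provide, so it is defined locally with the kernel's value.

```c
#include <errno.h>

/* Kernel-internal errno, not available in userspace <errno.h>. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

static int toy_soft_reset_calls;	/* counts calls to the hypothetical reset stub */

/* Stand-in for the controller soft reset; only records that it ran. */
static void toy_soft_reset(void)
{
	toy_soft_reset_calls++;
}

/*
 * Map the result of a (hypothetical) ULPI init step to the value the probe
 * path should return: a timeout triggers a reset and asks for a deferred
 * probe, every other value is passed through unchanged.
 */
static int toy_handle_ulpi_init(int ret)
{
	if (ret == -ETIMEDOUT) {
		toy_soft_reset();
		return -EPROBE_DEFER;
	}
	return ret;
}
```

Only the timeout case is rewritten; other errors still fail the probe as before, which keeps genuine failures visible.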
Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ferry Toth <ftoth(a)exalondelft.nl>
---
drivers/usb/dwc3/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 648f1c570021..2779f17bffaf 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1106,8 +1106,13 @@ static int dwc3_core_init(struct dwc3 *dwc)
if (!dwc->ulpi_ready) {
ret = dwc3_core_ulpi_init(dwc);
- if (ret)
+ if (ret) {
+ if (ret == -ETIMEDOUT) {
+ dwc3_core_soft_reset(dwc);
+ ret = -EPROBE_DEFER;
+ }
goto err0;
+ }
dwc->ulpi_ready = true;
}
--
2.37.2
Hi, this is your Linux kernel regression tracker speaking.
I noticed a regression report in bugzilla.kernel.org. As many (most?)
kernel developers don't keep an eye on it, I decided to forward it by
mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216675 :
> Casey Tucker 2022-11-09 20:09:22 UTC
>
> After updating kernel to latest, my Roland audio device no longer shows up in aplay -l or cat /proc/asound/cards.
>
> I'm running Arch Linux. The last known good kernel was 6.0.2-arch1, and I was able to determine by booting a couple of virtual machines in qemu that an upstream patch shipped in 6.0.3-arch3 refactors card registration, and effectively breaks initialization of this device.
>
> Patch: https://lore.kernel.org/all/20220904161247.16461-1-tiwai@suse.de/
@Casey, btw and just to be sure you are aware of it, there was a fix-up
patch ("ALSA: usb-audio: Fix last interface check for registration") for
that commit in 6.0.3, too:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
> I will be looking at this later this evening attempting to implement a fix that doesn't depend on reverting this patch, and may update this bug report with more details.
See the ticket for more details.
BTW, let me use this mail to also add the report to the list of tracked
regressions to ensure it doesn't fall through the cracks:
#regzbot introduced: 30d629795e2
https://bugzilla.kernel.org/show_bug.cgi?id=216675
#regzbot ignore-activity
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
The cpufreq_driver->get() callback is supposed to return the current
frequency of the CPU and not the one requested by the CPUFreq core.
Fix it by returning the frequency that gets supplied to the CPU after
the DCVS operation of EPSS/OSM.
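As a rough illustration of how the supplied frequency is derived from a register value, the sketch below masks out the LVAL field, multiplies it by the reference clock and converts the result to kHz. The toy_* names and the 19.2 MHz reference clock are assumptions made for this example only; the real driver reads the register over MMIO and takes xo_rate from the clock framework.

```c
#include <stdint.h>

#define TOY_HZ_PER_KHZ	1000ULL
#define TOY_XO_RATE_HZ	19200000ULL	/* assumed reference clock, illustration only */

/*
 * Convert a raw register value to a frequency in kHz. 'mask' selects the
 * LVAL field (0x3ff or 0xff in the patch, depending on the register used).
 */
static unsigned int toy_lval_to_khz(uint32_t reg, uint32_t mask)
{
	uint64_t hz = (uint64_t)(reg & mask) * TOY_XO_RATE_HZ;

	return (unsigned int)(hz / TOY_HZ_PER_KHZ);
}
```

The multiplication is done in 64 bits because a 10-bit LVAL times the reference clock does not fit in 32 bits.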
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 2849dd8bc72b ("cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver")
Reported-by: Sudeep Holla <sudeep.holla(a)arm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/cpufreq/qcom-cpufreq-hw.c | 42 +++++++++++++++++++++----------
1 file changed, 29 insertions(+), 13 deletions(-)
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 1faa325d3e52..62f36c76e26c 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -131,7 +131,35 @@ static int qcom_cpufreq_hw_target_index(struct cpufreq_policy *policy,
return 0;
}
+static unsigned long qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
+{
+ unsigned int lval;
+
+ if (qcom_cpufreq.soc_data->reg_current_vote)
+ lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_current_vote) & 0x3ff;
+ else
+ lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_domain_state) & 0xff;
+
+ return lval * xo_rate;
+}
+
+/* Get the current frequency of the CPU (after throttling) */
static unsigned int qcom_cpufreq_hw_get(unsigned int cpu)
+{
+ struct qcom_cpufreq_data *data;
+ struct cpufreq_policy *policy;
+
+ policy = cpufreq_cpu_get_raw(cpu);
+ if (!policy)
+ return 0;
+
+ data = policy->driver_data;
+
+ return qcom_lmh_get_throttle_freq(data) / HZ_PER_KHZ;
+}
+
+/* Get the frequency requested by the cpufreq core for the CPU */
+static unsigned int qcom_cpufreq_get_freq(unsigned int cpu)
{
struct qcom_cpufreq_data *data;
const struct qcom_cpufreq_soc_data *soc_data;
@@ -292,18 +320,6 @@ static void qcom_get_related_cpus(int index, struct cpumask *m)
}
}
-static unsigned long qcom_lmh_get_throttle_freq(struct qcom_cpufreq_data *data)
-{
- unsigned int lval;
-
- if (qcom_cpufreq.soc_data->reg_current_vote)
- lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_current_vote) & 0x3ff;
- else
- lval = readl_relaxed(data->base + qcom_cpufreq.soc_data->reg_domain_state) & 0xff;
-
- return lval * xo_rate;
-}
-
static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
{
struct cpufreq_policy *policy = data->policy;
@@ -347,7 +363,7 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
* If h/w throttled frequency is higher than what cpufreq has requested
* for, then stop polling and switch back to interrupt mechanism.
*/
- if (throttled_freq >= qcom_cpufreq_hw_get(cpu))
+ if (throttled_freq >= qcom_cpufreq_get_freq(cpu))
enable_irq(data->throttle_irq);
else
mod_delayed_work(system_highpri_wq, &data->throttle_work,
--
2.25.1
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
bigjoiner_pipes() doesn't consider that:
- RKL only has three pipes
- some pipes may be fused off
This means that intel_atomic_check_bigjoiner() won't reject
all configurations that would need a non-existent pipe.
Instead we just keep on rolling without actually having
reserved the slave pipe we need.
It's possible that we don't outright explode anywhere due to
this since eg. for_each_intel_crtc_in_pipe_mask() will only
walk the crtcs we've registered even though the passed in
pipe_mask asks for more of them. But clearly the thing won't
do what is expected of it when the required pipes are not
present.
Fix the problem by consulting the device info pipe_mask already
in bigjoiner_pipes().
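The idea can be sketched in plain C as follows. The toy_* names are hypothetical and 'present_mask' stands in for the device info pipe_mask; the point is only that the per-generation capability mask is intersected with the pipes that actually exist.

```c
#include <stdint.h>

enum toy_pipe { TOY_PIPE_A, TOY_PIPE_B, TOY_PIPE_C, TOY_PIPE_D };

#define TOY_BIT(n)	(1u << (n))

/*
 * Pipes usable for bigjoiner: the set allowed for the display generation,
 * limited to the pipes that are present (not missing or fused off).
 */
static uint8_t toy_bigjoiner_pipes(int display_ver, uint8_t present_mask)
{
	uint8_t pipes;

	if (display_ver >= 12)
		pipes = (uint8_t)(TOY_BIT(TOY_PIPE_A) | TOY_BIT(TOY_PIPE_B) |
				  TOY_BIT(TOY_PIPE_C) | TOY_BIT(TOY_PIPE_D));
	else if (display_ver >= 11)
		pipes = (uint8_t)(TOY_BIT(TOY_PIPE_B) | TOY_BIT(TOY_PIPE_C));
	else
		pipes = 0;

	return pipes & present_mask;
}
```

With a three-pipe device (present_mask 0x7) on display version 12, pipe D drops out of the result, so a configuration that needs it can be rejected up front.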
Cc: stable(a)vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b3e23708d194..6c2686ecb62a 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -3733,12 +3733,16 @@ static bool ilk_get_pipe_config(struct intel_crtc *crtc,
static u8 bigjoiner_pipes(struct drm_i915_private *i915)
{
+ u8 pipes;
+
if (DISPLAY_VER(i915) >= 12)
- return BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C) | BIT(PIPE_D);
+ pipes = BIT(PIPE_A) | BIT(PIPE_B) | BIT(PIPE_C) | BIT(PIPE_D);
else if (DISPLAY_VER(i915) >= 11)
- return BIT(PIPE_B) | BIT(PIPE_C);
+ pipes = BIT(PIPE_B) | BIT(PIPE_C);
else
- return 0;
+ pipes = 0;
+
+ return pipes & RUNTIME_INFO(i915)->pipe_mask;
}
static bool transcoder_ddi_func_is_enabled(struct drm_i915_private *dev_priv,
--
2.37.4
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The flag that tells the event to call its triggers after reading the event
is set for eprobes after the eprobe is enabled. This leads to a race where
the eprobe may be triggered at the beginning of the event where the record
information is NULL. The eprobe then dereferences the NULL record causing
a NULL kernel pointer bug.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelm…
Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.…
Cc: Linux Trace Kernel <linux-trace-kernel(a)vger.kernel.org>
Cc: Tzvetomir Stoyanov <tz.stoyanov(a)gmail.com>
Cc: Tom Zanussi <zanussi(a)kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Reported-by: Rafael Mendonca <rafaelmendsr(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_eprobe.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/trace/trace_eprobe.c b/kernel/trace/trace_eprobe.c
index 5dd0617e5df6..9cda9a38422c 100644
--- a/kernel/trace/trace_eprobe.c
+++ b/kernel/trace/trace_eprobe.c
@@ -563,6 +563,9 @@ static void eprobe_trigger_func(struct event_trigger_data *data,
{
struct eprobe_data *edata = data->private_data;
+ if (unlikely(!rec))
+ return;
+
__eprobe_trace_func(edata, rec);
}
--
2.35.1
From: Xiu Jianfeng <xiujianfeng(a)huawei.com>
The @ftrace_mod is allocated by kzalloc(), so both members {prev,next}
of @ftrace_mod->list are NULL, which is not a valid state in which to call
list_del().
If kstrdup() for @ftrace_mod->{func|module} fails, it goes to the @out_free
label and calls free_ftrace_mod() to destroy @ftrace_mod; list_del() then
writes to prev->next and next->prev, where the NULL pointer dereference
happens.
BUG: kernel NULL pointer dereference, address: 0000000000000008
Oops: 0002 [#1] PREEMPT SMP NOPTI
Call Trace:
<TASK>
ftrace_mod_callback+0x20d/0x220
? do_filp_open+0xd9/0x140
ftrace_process_regex.isra.51+0xbf/0x130
ftrace_regex_write.isra.52.part.53+0x6e/0x90
vfs_write+0xee/0x3a0
? __audit_filter_op+0xb1/0x100
? auditd_test_task+0x38/0x50
ksys_write+0xa5/0xe0
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Kernel panic - not syncing: Fatal exception
So call INIT_LIST_HEAD() to initialize the list member to fix this issue.
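To see why the initialization matters, here is a minimal userspace model of a doubly linked list node. It is a sketch only: the toy_* names are hypothetical, and the explicit safety check exists just so the example can report the problem instead of crashing as the kernel does.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_list_head {
	struct toy_list_head *next, *prev;
};

/* Make the node point to itself, like an empty, valid list head. */
static void toy_init_list_head(struct toy_list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Deleting writes prev->next and next->prev, so both must be non-NULL. */
static bool toy_list_del_is_safe(const struct toy_list_head *e)
{
	return e->next != NULL && e->prev != NULL;
}

static void toy_list_del(struct toy_list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

struct toy_mod {
	struct toy_list_head list;
};

/* Returns true if the node could be deleted without touching NULL. */
static bool toy_demo(bool init_first)
{
	struct toy_mod m = { { NULL, NULL } };	/* mimics a zeroed allocation */

	if (init_first)
		toy_init_list_head(&m.list);
	if (!toy_list_del_is_safe(&m.list))
		return false;
	toy_list_del(&m.list);
	return true;
}
```

Deleting a self-linked node is harmless because every write lands back in the node itself, which is why initializing the list member right after allocation makes the error path safe.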
Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com
Cc: stable(a)vger.kernel.org
Fixes: 673feb9d76ab ("ftrace: Add :mod: caching infrastructure to trace_array")
Signed-off-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 56a168121bfc..33236241f236 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1289,6 +1289,7 @@ static int ftrace_add_mod(struct trace_array *tr,
if (!ftrace_mod)
return -ENOMEM;
+ INIT_LIST_HEAD(&ftrace_mod->list);
ftrace_mod->func = kstrdup(func, GFP_KERNEL);
ftrace_mod->module = kstrdup(module, GFP_KERNEL);
ftrace_mod->enable = enable;
--
2.35.1
From: Wang Wensheng <wangwensheng4(a)huawei.com>
If we can't allocate this size, the code retries with something smaller,
namely half of the size. Halving the size means decreasing the order by
one, not dividing the order by two.
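A short arithmetic sketch makes the difference concrete. The toy_* helpers are hypothetical; they only rely on the fact that an allocation of a given order covers 2^order pages.

```c
/* Number of pages covered by an allocation of the given order. */
static unsigned long toy_pages(int order)
{
	return 1UL << order;
}

/* Old behaviour: halves the order, which shrinks the size far too quickly. */
static int toy_shrink_buggy(int order)
{
	return order >> 1;
}

/* Fixed behaviour: one order less is exactly half the number of pages. */
static int toy_shrink_fixed(int order)
{
	return order - 1;
}
```

Starting from order 4 (16 pages), the fixed variant retries with order 3 (8 pages), while the old one jumps straight to order 2 (4 pages).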
Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
Cc: <mhiramat(a)kernel.org>
Cc: <mark.rutland(a)arm.com>
Cc: stable(a)vger.kernel.org
Fixes: a79008755497d ("ftrace: Allocate the mcount record pages as groups")
Signed-off-by: Wang Wensheng <wangwensheng4(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8b13ce2eae70..56a168121bfc 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3190,7 +3190,7 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
/* if we can't allocate this size, try something smaller */
if (!order)
return -ENOMEM;
- order >>= 1;
+ order--;
goto again;
}
--
2.35.1
Pass in EPOLL_URING when signaling eventfd or doing poll related wakeups,
so that we can check for a circular event dependency between eventfd
and epoll. If this flag is set when our wakeup handlers are called, then
we know we have a dependency that needs to terminate multishot requests.
eventfd and epoll are the only such possible dependencies.
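The detection logic can be modelled in a few lines of userspace C. This is a sketch under assumptions: the TOY_* constants mirror the bit positions of EPOLLIN, EPOLL_URING and EPOLLONESHOT, and struct toy_poll is a hypothetical stand-in for the poll request.

```c
#include <stdint.h>

#define TOY_EPOLLIN		0x00000001u
#define TOY_EPOLL_URING		(1u << 27)	/* wakeup came from the ring itself */
#define TOY_EPOLLONESHOT	(1u << 30)

struct toy_poll {
	uint32_t events;
};

/*
 * Wake handler: if the wakeup key carries the "from io_uring" marker, turn
 * the request into a one-shot so it cannot keep re-triggering itself.
 */
static void toy_poll_wake(struct toy_poll *poll, uint32_t key)
{
	if (key & TOY_EPOLL_URING)
		poll->events |= TOY_EPOLLONESHOT;
}

/* A request stays multishot as long as the one-shot bit is not set. */
static int toy_is_multishot(const struct toy_poll *poll)
{
	return !(poll->events & TOY_EPOLLONESHOT);
}
```

An ordinary wakeup leaves the request multishot; a wakeup whose key includes the marker ends it after the current event.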
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
io_uring/io_uring.c | 4 ++--
io_uring/io_uring.h | 15 +++++++++++----
io_uring/poll.c | 8 ++++++++
3 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 8840cf3e20f2..53d0043b77a5 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -495,7 +495,7 @@ static void io_eventfd_ops(struct rcu_head *rcu)
int ops = atomic_xchg(&ev_fd->ops, 0);
if (ops & BIT(IO_EVENTFD_OP_SIGNAL_BIT))
- eventfd_signal(ev_fd->cq_ev_fd, 1);
+ eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING);
/* IO_EVENTFD_OP_FREE_BIT may not be set here depending on callback
* ordering in a race but if references are 0 we know we have to free
@@ -531,7 +531,7 @@ static void io_eventfd_signal(struct io_ring_ctx *ctx)
goto out;
if (likely(eventfd_signal_allowed())) {
- eventfd_signal(ev_fd->cq_ev_fd, 1);
+ eventfd_signal_mask(ev_fd->cq_ev_fd, 1, EPOLL_URING);
} else {
atomic_inc(&ev_fd->refs);
if (!atomic_fetch_or(BIT(IO_EVENTFD_OP_SIGNAL_BIT), &ev_fd->ops))
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
index cef5ff924e63..f6cf74cd692b 100644
--- a/io_uring/io_uring.h
+++ b/io_uring/io_uring.h
@@ -4,6 +4,7 @@
#include <linux/errno.h>
#include <linux/lockdep.h>
#include <linux/io_uring_types.h>
+#include <uapi/linux/eventpoll.h>
#include "io-wq.h"
#include "slist.h"
#include "filetable.h"
@@ -207,12 +208,18 @@ static inline void io_commit_cqring(struct io_ring_ctx *ctx)
static inline void __io_cqring_wake(struct io_ring_ctx *ctx)
{
/*
- * wake_up_all() may seem excessive, but io_wake_function() and
- * io_should_wake() handle the termination of the loop and only
- * wake as many waiters as we need to.
+ * Trigger waitqueue handler on all waiters on our waitqueue. This
+ * won't necessarily wake up all the tasks, io_should_wake() will make
+ * that decision.
+ *
+ * Pass in EPOLLIN|EPOLL_URING as the poll wakeup key. The latter set
+ * in the mask so that if we recurse back into our own poll waitqueue
+ * handlers, we know we have a dependency between eventfd or epoll and
+ * should terminate multishot poll at that point.
*/
if (waitqueue_active(&ctx->cq_wait))
- wake_up_all(&ctx->cq_wait);
+ __wake_up(&ctx->cq_wait, TASK_NORMAL, 0,
+ poll_to_key(EPOLL_URING | EPOLLIN));
}
static inline void io_cqring_wake(struct io_ring_ctx *ctx)
diff --git a/io_uring/poll.c b/io_uring/poll.c
index 055632e9092a..b5d9426c60f6 100644
--- a/io_uring/poll.c
+++ b/io_uring/poll.c
@@ -394,6 +394,14 @@ static int io_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
return 0;
if (io_poll_get_ownership(req)) {
+ /*
+ * If we trigger a multishot poll off our own wakeup path,
+ * disable multishot as there is a circular dependency between
+ * CQ posting and triggering the event.
+ */
+ if (mask & EPOLL_URING)
+ poll->events |= EPOLLONESHOT;
+
/* optional, saves extra locking for removal in tw handler */
if (mask && poll->events & EPOLLONESHOT) {
list_del_init(&poll->wait.entry);
--
2.35.1
This is identical to eventfd_signal(), but it allows the caller to pass
in a mask to be used for the poll wakeup key. The use case is avoiding
repeated multishot triggers if we have a dependency between eventfd and
io_uring.
If we set up an eventfd context and register that as the io_uring eventfd,
and at the same time queue a multishot poll request for the eventfd
context, then any CQE posted will repeatedly trigger the multishot request
until it terminates when the CQ ring overflows.
In preparation for io_uring detecting this circular dependency, add the
mentioned helper so that io_uring can pass in EPOLL_URING as part of the
poll wakeup key.
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/eventfd.c | 37 +++++++++++++++++++++----------------
include/linux/eventfd.h | 1 +
2 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/fs/eventfd.c b/fs/eventfd.c
index c0ffee99ad23..249ca6c0b784 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -43,21 +43,7 @@ struct eventfd_ctx {
int id;
};
-/**
- * eventfd_signal - Adds @n to the eventfd counter.
- * @ctx: [in] Pointer to the eventfd context.
- * @n: [in] Value of the counter to be added to the eventfd internal counter.
- * The value cannot be negative.
- *
- * This function is supposed to be called by the kernel in paths that do not
- * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
- * value, and we signal this as overflow condition by returning a EPOLLERR
- * to poll(2).
- *
- * Returns the amount by which the counter was incremented. This will be less
- * than @n if the counter has overflowed.
- */
-__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask)
{
unsigned long flags;
@@ -78,12 +64,31 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
n = ULLONG_MAX - ctx->count;
ctx->count += n;
if (waitqueue_active(&ctx->wqh))
- wake_up_locked_poll(&ctx->wqh, EPOLLIN);
+ wake_up_locked_poll(&ctx->wqh, EPOLLIN | mask);
current->in_eventfd = 0;
spin_unlock_irqrestore(&ctx->wqh.lock, flags);
return n;
}
+
+/**
+ * eventfd_signal - Adds @n to the eventfd counter.
+ * @ctx: [in] Pointer to the eventfd context.
+ * @n: [in] Value of the counter to be added to the eventfd internal counter.
+ * The value cannot be negative.
+ *
+ * This function is supposed to be called by the kernel in paths that do not
+ * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX
+ * value, and we signal this as overflow condition by returning a EPOLLERR
+ * to poll(2).
+ *
+ * Returns the amount by which the counter was incremented. This will be less
+ * than @n if the counter has overflowed.
+ */
+__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n)
+{
+ return eventfd_signal_mask(ctx, n, 0);
+}
EXPORT_SYMBOL_GPL(eventfd_signal);
static void eventfd_free_ctx(struct eventfd_ctx *ctx)
diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h
index 30eb30d6909b..e849329ce1a8 100644
--- a/include/linux/eventfd.h
+++ b/include/linux/eventfd.h
@@ -40,6 +40,7 @@ struct file *eventfd_fget(int fd);
struct eventfd_ctx *eventfd_ctx_fdget(int fd);
struct eventfd_ctx *eventfd_ctx_fileget(struct file *file);
__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n);
+__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask);
int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait,
__u64 *cnt);
void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);
--
2.35.1
We can have dependencies between epoll and io_uring. Consider an epoll
context, identified by the epfd file descriptor, and an io_uring file
descriptor identified by iofd. If we add iofd to the epfd context, and
arm a multishot poll request for epfd with iofd, then the multishot
poll request will repeatedly trigger and generate events until terminated
by CQ ring overflow. This isn't a desired behavior.
Add EPOLL_URING so that io_uring can pass it in as part of the poll wakeup
key, and io_uring can check for that to detect a potential recursive
invocation.
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
fs/eventpoll.c | 18 ++++++++++--------
include/uapi/linux/eventpoll.h | 6 ++++++
2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 52954d4637b5..e864256001e8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -491,7 +491,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi)
*/
#ifdef CONFIG_DEBUG_LOCK_ALLOC
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
+static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
+ unsigned pollflags)
{
struct eventpoll *ep_src;
unsigned long flags;
@@ -522,16 +523,17 @@ static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
}
spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests);
ep->nests = nests + 1;
- wake_up_locked_poll(&ep->poll_wait, EPOLLIN);
+ wake_up_locked_poll(&ep->poll_wait, EPOLLIN | pollflags);
ep->nests = 0;
spin_unlock_irqrestore(&ep->poll_wait.lock, flags);
}
#else
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi)
+static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi,
+ unsigned pollflags)
{
- wake_up_poll(&ep->poll_wait, EPOLLIN);
+ wake_up_poll(&ep->poll_wait, EPOLLIN | pollflags);
}
#endif
@@ -742,7 +744,7 @@ static void ep_free(struct eventpoll *ep)
/* We need to release all tasks waiting for these file */
if (waitqueue_active(&ep->poll_wait))
- ep_poll_safewake(ep, NULL);
+ ep_poll_safewake(ep, NULL, 0);
/*
* We need to lock this because we could be hit by
@@ -1208,7 +1210,7 @@ static int ep_poll_callback(wait_queue_entry_t *wait, unsigned mode, int sync, v
/* We have to call this outside the lock */
if (pwake)
- ep_poll_safewake(ep, epi);
+ ep_poll_safewake(ep, epi, pollflags & EPOLL_URING);
if (!(epi->event.events & EPOLLEXCLUSIVE))
ewake = 1;
@@ -1553,7 +1555,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
/* We have to call this outside the lock */
if (pwake)
- ep_poll_safewake(ep, NULL);
+ ep_poll_safewake(ep, NULL, 0);
return 0;
}
@@ -1629,7 +1631,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
/* We have to call this outside the lock */
if (pwake)
- ep_poll_safewake(ep, NULL);
+ ep_poll_safewake(ep, NULL, 0);
return 0;
}
diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h
index 8a3432d0f0dc..532cc7fa75c0 100644
--- a/include/uapi/linux/eventpoll.h
+++ b/include/uapi/linux/eventpoll.h
@@ -41,6 +41,12 @@
#define EPOLLMSG (__force __poll_t)0x00000400
#define EPOLLRDHUP (__force __poll_t)0x00002000
+/*
+ * Internal flag - wakeup generated by io_uring, used to detect recursion back
+ * into the io_uring poll handler.
+ */
+#define EPOLL_URING ((__force __poll_t)(1U << 27))
+
/* Set exclusive wakeup mode for the target file descriptor */
#define EPOLLEXCLUSIVE ((__force __poll_t)(1U << 28))
--
2.35.1
The member void *data in the structure devfreq can be overwritten
by governor_userspace. For example:
1. The device driver assigns the devfreq governor to simple_ondemand
by the function devfreq_add_device() and initializes the devfreq member
void *data to a pointer to a static structure devfreq_simple_ondemand_data
by the function devfreq_add_device().
2. The user changes the devfreq governor to userspace by the command
"echo userspace > /sys/class/devfreq/.../governor".
3. The userspace governor allocates dynamic memory for the struct
userspace_data and assigns the member void *data of devfreq to
this memory in the function userspace_init().
4. The user changes the devfreq governor back to simple_ondemand
by the command "echo simple_ondemand > /sys/class/devfreq/.../governor".
5. The userspace governor exits and sets the member void *data
in the structure devfreq to NULL in the function userspace_exit().
6. The simple_ondemand governor tries to fetch the static information of
devfreq_simple_ondemand_data in the function
devfreq_simple_ondemand_func(), but the member void *data of devfreq was
set to NULL by the function userspace_exit().
7. The upthreshold and downdifferential information is lost
and the simple_ondemand governor can't work correctly.
The member void *data in the structure devfreq is designed for
a static pointer used by a governor and initialized by the function
devfreq_add_device(). This patch adds an element named governor_data
to the devfreq structure, which can be used by a governor (e.g. userspace)
that wants to assign private data for its own use.
Fixes: ce26c5bb9569 ("PM / devfreq: Add basic governors")
Cc: stable(a)vger.kernel.org # 5.10+
Reviewed-by: Chanwoo Choi <cwchoi00(a)gmail.com>
Acked-by: MyungJoo Ham <myungjoo.ham(a)samsung.com>
Signed-off-by: Kant Fan <kant(a)allwinnertech.com>
---
drivers/devfreq/devfreq.c | 6 ++----
drivers/devfreq/governor_userspace.c | 12 ++++++------
include/linux/devfreq.h | 7 ++++---
3 files changed, 12 insertions(+), 13 deletions(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 63347a5ae599..8c5f6f7fca11 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -776,8 +776,7 @@ static void remove_sysfs_files(struct devfreq *devfreq,
* @dev: the device to add devfreq feature.
* @profile: device-specific profile to run devfreq.
* @governor_name: name of the policy to choose frequency.
- * @data: private data for the governor. The devfreq framework does not
- * touch this value.
+ * @data: devfreq driver pass to governors, governor should not change it.
*/
struct devfreq *devfreq_add_device(struct device *dev,
struct devfreq_dev_profile *profile,
@@ -1011,8 +1010,7 @@ static void devm_devfreq_dev_release(struct device *dev, void *res)
* @dev: the device to add devfreq feature.
* @profile: device-specific profile to run devfreq.
* @governor_name: name of the policy to choose frequency.
- * @data: private data for the governor. The devfreq framework does not
- * touch this value.
+ * @data: devfreq driver pass to governors, governor should not change it.
*
* This function manages automatically the memory of devfreq device using device
* resource management and simplify the free operation for memory of devfreq
diff --git a/drivers/devfreq/governor_userspace.c b/drivers/devfreq/governor_userspace.c
index ab9db7adb3ad..d69672ccacc4 100644
--- a/drivers/devfreq/governor_userspace.c
+++ b/drivers/devfreq/governor_userspace.c
@@ -21,7 +21,7 @@ struct userspace_data {
static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq)
{
- struct userspace_data *data = df->data;
+ struct userspace_data *data = df->governor_data;
if (data->valid)
*freq = data->user_frequency;
@@ -40,7 +40,7 @@ static ssize_t set_freq_store(struct device *dev, struct device_attribute *attr,
int err = 0;
mutex_lock(&devfreq->lock);
- data = devfreq->data;
+ data = devfreq->governor_data;
sscanf(buf, "%lu", &wanted);
data->user_frequency = wanted;
@@ -60,7 +60,7 @@ static ssize_t set_freq_show(struct device *dev,
int err = 0;
mutex_lock(&devfreq->lock);
- data = devfreq->data;
+ data = devfreq->governor_data;
if (data->valid)
err = sprintf(buf, "%lu\n", data->user_frequency);
@@ -91,7 +91,7 @@ static int userspace_init(struct devfreq *devfreq)
goto out;
}
data->valid = false;
- devfreq->data = data;
+ devfreq->governor_data = data;
err = sysfs_create_group(&devfreq->dev.kobj, &dev_attr_group);
out:
@@ -107,8 +107,8 @@ static void userspace_exit(struct devfreq *devfreq)
if (devfreq->dev.kobj.sd)
sysfs_remove_group(&devfreq->dev.kobj, &dev_attr_group);
- kfree(devfreq->data);
- devfreq->data = NULL;
+ kfree(devfreq->governor_data);
+ devfreq->governor_data = NULL;
}
static int devfreq_userspace_handler(struct devfreq *devfreq,
diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
index 34aab4dd336c..4dc7cda4fd46 100644
--- a/include/linux/devfreq.h
+++ b/include/linux/devfreq.h
@@ -152,8 +152,8 @@ struct devfreq_stats {
* @max_state: count of entry present in the frequency table.
* @previous_freq: previously configured frequency value.
* @last_status: devfreq user device info, performance statistics
- * @data: Private data of the governor. The devfreq framework does not
- * touch this.
+ * @data: devfreq driver pass to governors, governor should not change it.
+ * @governor_data: private data for governors, devfreq core doesn't touch it.
* @user_min_freq_req: PM QoS minimum frequency request from user (via sysfs)
* @user_max_freq_req: PM QoS maximum frequency request from user (via sysfs)
* @scaling_min_freq: Limit minimum frequency requested by OPP interface
@@ -193,7 +193,8 @@ struct devfreq {
unsigned long previous_freq;
struct devfreq_dev_status last_status;
- void *data; /* private data for governors */
+ void *data;
+ void *governor_data;
struct dev_pm_qos_request user_min_freq_req;
struct dev_pm_qos_request user_max_freq_req;
--
2.29.0
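The overwrite described in steps 1 to 7 of the devfreq patch above can be reproduced in a small userspace model. This is only a sketch under stated assumptions: `struct devfreq_model`, `struct ondemand_data_model`, `struct userspace_data_model`, `model_userspace_init()` and `model_userspace_exit()` are hypothetical stand-ins that keep just the two pointers discussed in the patch. They are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's static simple_ondemand tuning data. */
struct ondemand_data_model {
	unsigned int upthreshold;
	unsigned int downdifferential;
};

/* Hypothetical stand-in for the userspace governor's private state. */
struct userspace_data_model {
	unsigned long user_frequency;
	bool valid;
};

/* Only the two pointers relevant to the patch are modelled. */
struct devfreq_model {
	void *data;          /* set by the driver; governors must not change it */
	void *governor_data; /* owned by whichever governor is currently active */
};

/* Model of userspace_init() after the patch: only governor_data is written. */
int model_userspace_init(struct devfreq_model *df)
{
	struct userspace_data_model *priv = calloc(1, sizeof(*priv));

	if (!priv)
		return -1;
	priv->valid = false;
	df->governor_data = priv;
	return 0;
}

/* Model of userspace_exit() after the patch: df->data is left untouched. */
void model_userspace_exit(struct devfreq_model *df)
{
	free(df->governor_data);
	df->governor_data = NULL;
}
```

With the two pointers kept apart, switching to the userspace governor and back leaves `data` pointing at the driver's tuning structure, so the model's equivalent of upthreshold and downdifferential survives the round trip.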
The AMD Secure Processor (ASP) and an SNP guest use a series of
AES-GCM keys called VMPCKs to communicate securely with each other.
The IV to this scheme is a sequence number that both the ASP and the
guest track. Currently this sequence number in a guest request must
exactly match the sequence number tracked by the ASP. This means that
if the guest sees an error from the host during a request it can only
retry that exact request or disable the VMPCK to prevent an IV reuse.
AES-GCM cannot tolerate IV reuse; see "Authentication Failures in NIST
version of GCM" - Antoine Joux et al.
To address this, make handle_guest_request() delete the VMPCK
on any unsuccessful return. To allow userspace to query the cert_data
length, make handle_guest_request() save the number of pages required by
the host, retry the request without requesting the extended data, and
then return the number of pages required back to userspace.
Fixes: fce96cf044308 ("virt: Add SEV-SNP guest driver")
Signed-off-by: Peter Gonda <pgonda(a)google.com>
Reported-by: Peter Gonda <pgonda(a)google.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Tom Lendacky <thomas.lendacky(a)amd.com>
Cc: Michael Roth <michael.roth(a)amd.com>
Cc: Haowen Bai <baihaowen(a)meizu.com>
Cc: Yang Yingliang <yangyingliang(a)huawei.com>
Cc: Marc Orr <marcorr(a)google.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Dionna Glaze <dionnaglaze(a)google.com>
Cc: Ashish Kalra <Ashish.Kalra(a)amd.com>
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: kvm(a)vger.kernel.org
---
drivers/virt/coco/sev-guest/sev-guest.c | 83 ++++++++++++++++++++-----
1 file changed, 69 insertions(+), 14 deletions(-)
diff --git a/drivers/virt/coco/sev-guest/sev-guest.c b/drivers/virt/coco/sev-guest/sev-guest.c
index f422f9c58ba79..64b4234c14da8 100644
--- a/drivers/virt/coco/sev-guest/sev-guest.c
+++ b/drivers/virt/coco/sev-guest/sev-guest.c
@@ -67,8 +67,27 @@ static bool is_vmpck_empty(struct snp_guest_dev *snp_dev)
return true;
}
+/*
+ * If an error is received from the host or AMD Secure Processor (ASP) there
+ * are two options. Either retry the exact same encrypted request or discontinue
+ * using the VMPCK.
+ *
+ * This is because in the current encryption scheme GHCB v2 uses AES-GCM to
+ * encrypt the requests. The IV for this scheme is the sequence number. GCM
+ * cannot tolerate IV reuse.
+ *
+ * The ASP FW v1.51 only increments the sequence numbers on a successful
+ * guest<->ASP back and forth and only accepts messages at its exact sequence
+ * number.
+ *
+ * So if the sequence number were to be reused the encryption scheme is
+ * vulnerable. If the sequence number were incremented for a fresh IV the ASP
+ * will reject the request.
+ */
static void snp_disable_vmpck(struct snp_guest_dev *snp_dev)
{
+ dev_alert(snp_dev->dev, "Disabling vmpck_id: %d to prevent IV reuse.\n",
+ vmpck_id);
memzero_explicit(snp_dev->vmpck, VMPCK_KEY_LEN);
snp_dev->vmpck = NULL;
}
@@ -321,34 +340,70 @@ static int handle_guest_request(struct snp_guest_dev *snp_dev, u64 exit_code, in
if (rc)
return rc;
- /* Call firmware to process the request */
+ /*
+ * Call firmware to process the request. In this function the encrypted
+ * message enters shared memory with the host. So after this call the
+ * sequence number must be incremented or the VMPCK must be deleted to
+ * prevent reuse of the IV.
+ */
rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
+
+ /*
+ * If the extended guest request fails due to having too small of a
+ * certificate data buffer retry the same guest request without the
+ * extended data request in order to not have to reuse the IV.
+ */
+ if (exit_code == SVM_VMGEXIT_EXT_GUEST_REQUEST &&
+ err == SNP_GUEST_REQ_INVALID_LEN) {
+ const unsigned int certs_npages = snp_dev->input.data_npages;
+
+ exit_code = SVM_VMGEXIT_GUEST_REQUEST;
+
+ /*
+ * If this call to the firmware succeeds the sequence number can
+ * be incremented allowing for continued use of the VMPCK. If
+ * there is an error reflected in the return value, this value
+ * is checked further down and the result will be the deletion
+ * of the VMPCK and the error code being propagated back to the
+ * user as an IOCTL return code.
+ */
+ rc = snp_issue_guest_request(exit_code, &snp_dev->input, &err);
+
+ /*
+ * Override the error to inform callers the given extended
+ * request buffer size was too small and give the caller the
+ * required buffer size.
+ */
+ err = SNP_GUEST_REQ_INVALID_LEN;
+ snp_dev->input.data_npages = certs_npages;
+ }
+
if (fw_err)
*fw_err = err;
- if (rc)
- return rc;
+ if (rc) {
+ dev_alert(snp_dev->dev,
+ "Detected error from ASP request. rc: %d, fw_err: %llu\n",
+ rc, *fw_err);
+ goto disable_vmpck;
+ }
- /*
- * The verify_and_dec_payload() will fail only if the hypervisor is
- * actively modifying the message header or corrupting the encrypted payload.
- * This hints that hypervisor is acting in a bad faith. Disable the VMPCK so that
- * the key cannot be used for any communication. The key is disabled to ensure
- * that AES-GCM does not use the same IV while encrypting the request payload.
- */
rc = verify_and_dec_payload(snp_dev, resp_buf, resp_sz);
if (rc) {
dev_alert(snp_dev->dev,
- "Detected unexpected decode failure, disabling the vmpck_id %d\n",
- vmpck_id);
- snp_disable_vmpck(snp_dev);
- return rc;
+ "Detected unexpected decode failure from ASP. rc: %d\n",
+ rc);
+ goto disable_vmpck;
}
/* Increment to new message sequence after payload decryption was successful. */
snp_inc_msg_seqno(snp_dev);
return 0;
+
+disable_vmpck:
+ snp_disable_vmpck(snp_dev);
+ return rc;
}
static int get_report(struct snp_guest_dev *snp_dev, struct snp_guest_request_ioctl *arg)
--
2.38.1.493.g58b659f92b-goog
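The policy in the SEV-SNP patch above (advance the sequence number only after a completed exchange, otherwise drop the key) can be sketched as a small state machine. Everything here is a hypothetical model: `struct model_snp_dev`, `model_issue_fn`, the three `model_host_*` callbacks and `model_handle_guest_request()` are illustration names, and the single-step sequence increment is a simplification rather than the driver's actual arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_host_result { MODEL_OK, MODEL_INVALID_LEN, MODEL_ERROR };

struct model_snp_dev {
	uint64_t seqno;   /* doubles as the AES-GCM IV in this model */
	bool vmpck_valid; /* false once the key has been disabled */
};

/* Hypothetical callback standing in for the round trip to the host/ASP. */
typedef enum model_host_result (*model_issue_fn)(bool extended);

/* Example hosts used to exercise the model. */
enum model_host_result model_host_ok(bool extended)
{
	(void)extended;
	return MODEL_OK;
}

enum model_host_result model_host_small_buffer(bool extended)
{
	return extended ? MODEL_INVALID_LEN : MODEL_OK;
}

enum model_host_result model_host_error(bool extended)
{
	(void)extended;
	return MODEL_ERROR;
}

int model_handle_guest_request(struct model_snp_dev *dev, bool extended,
			       model_issue_fn issue, bool *buf_too_small)
{
	enum model_host_result res;

	*buf_too_small = false;
	if (!dev->vmpck_valid)
		return -1;

	/* From here on the encrypted message has been exposed to the host. */
	res = issue(extended);
	if (extended && res == MODEL_INVALID_LEN) {
		/*
		 * Retry the same message without the extended data so the
		 * IV is consumed by a completed exchange, and remember that
		 * the caller's certificate buffer was too small.
		 */
		*buf_too_small = true;
		res = issue(false);
	}

	if (res != MODEL_OK) {
		/* The IV can be neither reused nor skipped: drop the key. */
		dev->vmpck_valid = false;
		return -1;
	}

	dev->seqno++; /* simplified: a fresh IV for the next request */
	return 0;
}
```

In this model a too-small certificate buffer still ends in a completed exchange, so the key stays usable, while any other failure disables it permanently.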
The second (UID) strcmp in acpi_dev_hid_uid_match considers
"0" and "00" different, which can prevent device registration.
Have the AMD IOMMU driver's ivrs_acpihid parsing code remove
any leading zeroes to make the UID strcmp succeed. Now users
can safely specify "AMDxxxxx:00" or "AMDxxxxx:0" and expect
the same behaviour.
Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter")
Signed-off-by: Kim Phillips <kim.phillips(a)amd.com>
Cc: stable(a)vger.kernel.org
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit(a)amd.com>
Cc: Joerg Roedel <jroedel(a)suse.de>
---
v2: no changes
drivers/iommu/amd/init.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index fdc642362c14..ef0e1a4b5a11 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3471,6 +3471,13 @@ static int __init parse_ivrs_acpihid(char *str)
return 1;
}
+ /*
+ * Ignore leading zeroes after ':', so e.g., AMDI0095:00
+ * will match AMDI0095:0 in the second strcmp in acpi_dev_hid_uid_match
+ */
+ while (*uid == '0' && *(uid + 1))
+ uid++;
+
i = early_acpihid_map_size++;
memcpy(early_acpihid_map[i].hid, hid, strlen(hid));
memcpy(early_acpihid_map[i].uid, uid, strlen(uid));
--
2.34.1
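The loop added to parse_ivrs_acpihid() above can be tried in isolation. The sketch below wraps the same two-line loop in a helper; the name `skip_leading_zeroes()` is hypothetical and does not exist in the driver.

```c
/*
 * Skip leading '0' characters but always keep the last character,
 * so that "0" and "00" both reduce to "0" instead of an empty string.
 */
const char *skip_leading_zeroes(const char *uid)
{
	while (*uid == '0' && *(uid + 1))
		uid++;
	return uid;
}
```

The `*(uid + 1)` check is what keeps a UID made only of zeroes from collapsing to an empty string, which would not match any UID either.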
If O_EXCL is *not* specified, then linkat() can be
used to link the temporary file into the filesystem.
If O_EXCL is specified then linkat() should fail (-1).
After commit 863f144f12ad ("vfs: open inside ->tmpfile()")
the O_EXCL flag is no longer honored by the vfs layer for
tmpfile, which means the file can be linked even if O_EXCL
flag is specified, which is a change in behaviour for
userspace!
The open flags were previously passed as a parameter, so they
were unaffected by the changes to file->f_flags caused by
finish_open(). This patch fixes the issue by storing
file->f_flags in a local variable so the O_EXCL test
logic is restored.
This regression was detected by Android CTS Bionic fcntl()
tests running on android-mainline [1].
[1] https://android.googlesource.com/platform/bionic/+/refs/heads/master/tests/fcntl_test.cpp#352
Fixes: 863f144f12ad ("vfs: open inside ->tmpfile()")
To: lkml <linux-kernel(a)vger.kernel.org>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Miklos Szeredi <mszeredi(a)redhat.com>
Cc: stable(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Will McVicker <willmcvicker(a)google.com>
Cc: Peter Griffin <gpeter(a)google.com>
Signed-off-by: Peter Griffin <peter.griffin(a)linaro.org>
---
fs/namei.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/namei.c b/fs/namei.c
index 578c2110df02..9155ecb547ce 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -3591,6 +3591,7 @@ static int vfs_tmpfile(struct user_namespace *mnt_userns,
struct inode *dir = d_inode(parentpath->dentry);
struct inode *inode;
int error;
+ int open_flag = file->f_flags;
/* we want directory to be writable */
error = inode_permission(mnt_userns, dir, MAY_WRITE | MAY_EXEC);
@@ -3613,7 +3614,7 @@ static int vfs_tmpfile(struct user_namespace *mnt_userns,
if (error)
return error;
inode = file_inode(file);
- if (!(file->f_flags & O_EXCL)) {
+ if (!(open_flag & O_EXCL)) {
spin_lock(&inode->i_lock);
inode->i_state |= I_LINKABLE;
spin_unlock(&inode->i_lock);
--
2.34.1
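The ordering problem fixed by the vfs_tmpfile() patch above can be shown with a userspace model. It is a sketch only: `struct file_model`, `model_finish_open()` and the two `model_linkable_*()` helpers are hypothetical, and the assumption that opening the file clears O_EXCL (together with other creation-time flags) from f_flags is inferred from the commit message rather than copied from the kernel source.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct file; only f_flags is modelled. */
struct file_model {
	unsigned int f_flags;
};

/* Assumption: opening the file clears creation-time flags from f_flags. */
static void model_finish_open(struct file_model *file)
{
	file->f_flags &= ~(unsigned int)(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
}

/* Regressed behaviour: O_EXCL is tested after the open has cleared it. */
bool model_linkable_buggy(struct file_model *file)
{
	model_finish_open(file);
	return !(file->f_flags & O_EXCL);
}

/* Patched behaviour: the flags are saved before the open modifies them. */
bool model_linkable_fixed(struct file_model *file)
{
	unsigned int open_flag = file->f_flags;

	model_finish_open(file);
	return !(open_flag & O_EXCL);
}
```

In the model, a file opened with O_EXCL is reported as linkable by the regressed variant and as not linkable by the patched one, which mirrors the userspace-visible change the commit message describes.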
It is a bit unclear to us why that helps, but it does, and it unbreaks
suspend/resume on a lot of GPUs without any known drawbacks.
Cc: stable(a)vger.kernel.org # v5.15+
Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/156
Signed-off-by: Karol Herbst <kherbst(a)redhat.com>
---
drivers/gpu/drm/nouveau/nouveau_bo.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 35bb0bb3fe61..126b3c6e12f9 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -822,6 +822,15 @@ nouveau_bo_move_m2mf(struct ttm_buffer_object *bo, int evict,
if (ret == 0) {
ret = nouveau_fence_new(chan, false, &fence);
if (ret == 0) {
+ /* TODO: figure out a better solution here
+ *
+ * wait on the fence here explicitly as going through
+ * ttm_bo_move_accel_cleanup somehow doesn't seem to do it.
+ *
+ * Without this the operation can timeout and we'll fallback to a
+ * software copy, which might take several minutes to finish.
+ */
+ nouveau_fence_wait(fence, false, false);
ret = ttm_bo_move_accel_cleanup(bo,
&fence->base,
evict, false,
--
2.37.1
From: Jonas Jelonek <jelonek.jonas(a)gmail.com>
[ Upstream commit 69188df5f6e4cecc6b76b958979ba363cd5240e8 ]
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas(a)gmail.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 70251c703c9e..53e0fec274a4 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -671,6 +671,7 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
struct hwsim_vif_priv *vp = (void *)vif->drv_priv;
struct sk_buff *skb;
struct ieee80211_hdr *hdr;
+ struct ieee80211_tx_info *cb;
if (!vp->assoc)
return;
@@ -691,6 +692,10 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
memcpy(hdr->addr2, mac, ETH_ALEN);
memcpy(hdr->addr3, vp->bssid, ETH_ALEN);
+ cb = IEEE80211_SKB_CB(skb);
+ cb->control.rates[0].count = 1;
+ cb->control.rates[1].idx = -1;
+
rcu_read_lock();
mac80211_hwsim_tx_frame(data->hw, skb,
rcu_dereference(vif->chanctx_conf)->def.chan);
--
2.35.1
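The rate array states mentioned in the mac80211_hwsim commit message above can be reproduced with a toy model. This is a sketch under loud assumptions: `struct rate_model` and `model_fill_rates()` are hypothetical, and the fill rule (a slot with a negative idx or a zero count is rewritten to idx -1, count 0 and ends the scan when no station rate table is available) is reconstructed from the commit message, not taken from mac80211.

```c
#include <stdint.h>

#define MODEL_MAX_RATES 4

/* Hypothetical stand-in for one (idx, count) entry of the tx rate array. */
struct rate_model {
	int8_t idx;
	uint8_t count;
};

/*
 * Assumed fill rule for the case without a station rate table:
 * the first slot that is unusable (negative idx or zero count)
 * is marked as the terminator and the scan stops there.
 */
void model_fill_rates(struct rate_model *rates)
{
	for (int i = 0; i < MODEL_MAX_RATES; i++) {
		if (rates[i].idx < 0 || !rates[i].count) {
			rates[i].idx = -1;
			rates[i].count = 0;
			break;
		}
	}
}
```

Starting from an all-zero array, the model yields [-1,0; 0,0; 0,0; 0,0], the state the commit message blames for the warning. Setting the first count to 1 and the second idx to -1 beforehand, as the patch does, leaves the first entry intact.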
From: Jonas Jelonek <jelonek.jonas(a)gmail.com>
[ Upstream commit 69188df5f6e4cecc6b76b958979ba363cd5240e8 ]
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas(a)gmail.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index 55cca2ffa392..d3905e70b1e9 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -670,6 +670,7 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
struct hwsim_vif_priv *vp = (void *)vif->drv_priv;
struct sk_buff *skb;
struct ieee80211_hdr *hdr;
+ struct ieee80211_tx_info *cb;
if (!vp->assoc)
return;
@@ -690,6 +691,10 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
memcpy(hdr->addr2, mac, ETH_ALEN);
memcpy(hdr->addr3, vp->bssid, ETH_ALEN);
+ cb = IEEE80211_SKB_CB(skb);
+ cb->control.rates[0].count = 1;
+ cb->control.rates[1].idx = -1;
+
rcu_read_lock();
mac80211_hwsim_tx_frame(data->hw, skb,
rcu_dereference(vif->chanctx_conf)->def.chan);
--
2.35.1
From: Jonas Jelonek <jelonek.jonas(a)gmail.com>
[ Upstream commit 69188df5f6e4cecc6b76b958979ba363cd5240e8 ]
Fixes a warning that occurs when rc table support is enabled
(IEEE80211_HW_SUPPORTS_RC_TABLE) in mac80211_hwsim and the PS mode
is changed via the exported debugfs attribute.
When the PS mode is changed, a packet is broadcasted via
hwsim_send_nullfunc by creating and transmitting a plain skb with only
header initialized. The ieee80211 rate array in the control buffer is
zero-initialized. When ratetbl support is enabled, ieee80211_get_tx_rates
is called for the skb with sta parameter set to NULL and thus no
ratetbl can be used. The final rate array then looks like
[-1,0; 0,0; 0,0; 0,0] which causes the warning in ieee80211_get_tx_rate.
The issue is fixed by setting the count of the first rate with idx '0'
to 1 and hence ieee80211_get_tx_rates won't overwrite it with idx '-1'.
Signed-off-by: Jonas Jelonek <jelonek.jonas(a)gmail.com>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/mac80211_hwsim.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index c52802adb5b2..22738ba7d65b 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -686,6 +686,7 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
struct hwsim_vif_priv *vp = (void *)vif->drv_priv;
struct sk_buff *skb;
struct ieee80211_hdr *hdr;
+ struct ieee80211_tx_info *cb;
if (!vp->assoc)
return;
@@ -707,6 +708,10 @@ static void hwsim_send_nullfunc(struct mac80211_hwsim_data *data, u8 *mac,
memcpy(hdr->addr2, mac, ETH_ALEN);
memcpy(hdr->addr3, vp->bssid, ETH_ALEN);
+ cb = IEEE80211_SKB_CB(skb);
+ cb->control.rates[0].count = 1;
+ cb->control.rates[1].idx = -1;
+
rcu_read_lock();
mac80211_hwsim_tx_frame(data->hw, skb,
rcu_dereference(vif->chanctx_conf)->def.chan);
--
2.35.1
The patch titled
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1
Date: Fri, 18 Nov 2022 12:36:03 +0530
balance_dirty_pages doesn't do the required dirty throttling on cgroupv1.
See commit 9badce000e2c ("cgroup, writeback: don't enable cgroup writeback
on traditional hierarchies"). Instead, the kernel depends on writeback
throttling in shrink_folio_list to achieve the same goal. With large
memory systems, the flusher may not be able to writeback quickly enough
such that we will start finding pages in the shrink_folio_list already in
writeback. Hence for cgroupv1 let's do a reclaim throttle after waking up
the flusher.
The test below, which used to fail on a 256GB system, now runs until the
file system is full with this change.
root@lp2:/sys/fs/cgroup/memory# mkdir test
root@lp2:/sys/fs/cgroup/memory# cd test/
root@lp2:/sys/fs/cgroup/memory/test# echo 120M > memory.limit_in_bytes
root@lp2:/sys/fs/cgroup/memory/test# echo $$ > tasks
root@lp2:/sys/fs/cgroup/memory/test# dd if=/dev/zero of=/home/kvaneesh/test bs=1M
Killed
Link: https://lkml.kernel.org/r/20221118070603.84081-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Suggested-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: zefan li <lizefan.x(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1
+++ a/mm/vmscan.c
@@ -2514,8 +2514,20 @@ static unsigned long shrink_inactive_lis
* the flushers simply cannot keep up with the allocation
* rate. Nudge the flusher threads in case they are asleep.
*/
- if (stat.nr_unqueued_dirty == nr_taken)
+ if (stat.nr_unqueued_dirty == nr_taken) {
wakeup_flusher_threads(WB_REASON_VMSCAN);
+ /*
+ * For cgroupv1 dirty throttling is achieved by waking up
+ * the kernel flusher here and later waiting on folios
+ * which are in writeback to finish (see shrink_folio_list()).
+ *
+ * Flusher may not be able to issue writeback quickly
+ * enough for cgroupv1 writeback throttling to work
+ * on a large system.
+ */
+ if (!writeback_throttling_sane(sc))
+ reclaim_throttle(pgdat, VMSCAN_THROTTLE_WRITEBACK);
+ }
sc->nr.dirty += stat.nr_dirty;
sc->nr.congested += stat.nr_congested;
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-cgroup-reclaim-fix-dirty-pages-throttling-on-cgroup-v1.patch
The patch titled
Subject: mm: fix unexpected changes to {failslab|fail_page_alloc}.attr
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-unexpected-changes-to-failslabfail_page_allocattr.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Subject: mm: fix unexpected changes to {failslab|fail_page_alloc}.attr
Date: Fri, 18 Nov 2022 18:00:11 +0800
When we specify __GFP_NOWARN, we only expect that no warnings will be
issued for the current caller. But in __should_failslab() and
__should_fail_alloc_page(), the local GFP flags alter the global
{failslab|fail_page_alloc}.attr, which is persistent and shared by all
tasks. This is not what we expected, so let's fix it.
Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com
Fixes: 3f913fc5f974 ("mm: fix missing handler for __GFP_NOWARN")
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Reported-by: Dmitry Vyukov <dvyukov(a)google.com>
Reviewed-by: Akinobu Mita <akinobu.mita(a)gmail.com>
Reviewed-by: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Akinobu Mita <akinobu.mita(a)gmail.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/fault-inject.h | 7 +++++--
lib/fault-inject.c | 14 +++++++++-----
mm/failslab.c | 12 ++++++++++--
mm/page_alloc.c | 7 +++++--
4 files changed, 29 insertions(+), 11 deletions(-)
--- a/include/linux/fault-inject.h~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/include/linux/fault-inject.h
@@ -20,7 +20,6 @@ struct fault_attr {
atomic_t space;
unsigned long verbose;
bool task_filter;
- bool no_warn;
unsigned long stacktrace_depth;
unsigned long require_start;
unsigned long require_end;
@@ -32,6 +31,10 @@ struct fault_attr {
struct dentry *dname;
};
+enum fault_flags {
+ FAULT_NOWARN = 1 << 0,
+};
+
#define FAULT_ATTR_INITIALIZER { \
.interval = 1, \
.times = ATOMIC_INIT(1), \
@@ -40,11 +43,11 @@ struct fault_attr {
.ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \
.verbose = 2, \
.dname = NULL, \
- .no_warn = false, \
}
#define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER
int setup_fault_attr(struct fault_attr *attr, char *str);
+bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags);
bool should_fail(struct fault_attr *attr, ssize_t size);
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
--- a/lib/fault-inject.c~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/lib/fault-inject.c
@@ -41,9 +41,6 @@ EXPORT_SYMBOL_GPL(setup_fault_attr);
static void fail_dump(struct fault_attr *attr)
{
- if (attr->no_warn)
- return;
-
if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) {
printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n"
"name %pd, interval %lu, probability %lu, "
@@ -103,7 +100,7 @@ static inline bool fail_stacktrace(struc
* http://www.nongnu.org/failmalloc/
*/
-bool should_fail(struct fault_attr *attr, ssize_t size)
+bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags)
{
if (in_task()) {
unsigned int fail_nth = READ_ONCE(current->fail_nth);
@@ -146,13 +143,20 @@ bool should_fail(struct fault_attr *attr
return false;
fail:
- fail_dump(attr);
+ if (!(flags & FAULT_NOWARN))
+ fail_dump(attr);
if (atomic_read(&attr->times) != -1)
atomic_dec_not_zero(&attr->times);
return true;
}
+EXPORT_SYMBOL_GPL(should_fail_ex);
+
+bool should_fail(struct fault_attr *attr, ssize_t size)
+{
+ return should_fail_ex(attr, size, 0);
+}
EXPORT_SYMBOL_GPL(should_fail);
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
--- a/mm/failslab.c~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/mm/failslab.c
@@ -16,6 +16,8 @@ static struct {
bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags)
{
+ int flags = 0;
+
/* No fault-injection for bootstrap cache */
if (unlikely(s == kmem_cache))
return false;
@@ -30,10 +32,16 @@ bool __should_failslab(struct kmem_cache
if (failslab.cache_filter && !(s->flags & SLAB_FAILSLAB))
return false;
+ /*
+ * In some cases, it expects to specify __GFP_NOWARN
+ * to avoid printing any information(not just a warning),
+ * thus avoiding deadlocks. See commit 6b9dbedbe349 for
+ * details.
+ */
if (gfpflags & __GFP_NOWARN)
- failslab.attr.no_warn = true;
+ flags |= FAULT_NOWARN;
- return should_fail(&failslab.attr, s->object_size);
+ return should_fail_ex(&failslab.attr, s->object_size, flags);
}
static int __init setup_failslab(char *str)
--- a/mm/page_alloc.c~mm-fix-unexpected-changes-to-failslabfail_page_allocattr
+++ a/mm/page_alloc.c
@@ -3887,6 +3887,8 @@ __setup("fail_page_alloc=", setup_fail_p
static bool __should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
{
+ int flags = 0;
+
if (order < fail_page_alloc.min_order)
return false;
if (gfp_mask & __GFP_NOFAIL)
@@ -3897,10 +3899,11 @@ static bool __should_fail_alloc_page(gfp
(gfp_mask & __GFP_DIRECT_RECLAIM))
return false;
+ /* See comment in __should_failslab() */
if (gfp_mask & __GFP_NOWARN)
- fail_page_alloc.attr.no_warn = true;
+ flags |= FAULT_NOWARN;
- return should_fail(&fail_page_alloc.attr, 1 << order);
+ return should_fail_ex(&fail_page_alloc.attr, 1 << order, flags);
}
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS
_
Patches currently in -mm which might be from zhengqi.arch(a)bytedance.com are
mm-fix-unexpected-changes-to-failslabfail_page_allocattr.patch
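The difference between the sticky attribute and the per-call flag in the fault-injection patch above can be seen in a reduced model. The names `struct fault_attr_model`, `MODEL_FAULT_NOWARN`, `model_should_fail_old()` and `model_should_fail_ex()` are hypothetical, the model always fails, and a counter stands in for the fail_dump() report.

```c
#include <stdbool.h>

enum fault_flags_model {
	MODEL_FAULT_NOWARN = 1 << 0,
};

/* Hypothetical stand-in for the shared, persistent fault attribute. */
struct fault_attr_model {
	bool no_warn; /* old scheme: sticky and shared by every caller */
	int dumps;    /* counts how many failures were reported */
};

/* Old scheme: one caller's __GFP_NOWARN is written into shared state. */
bool model_should_fail_old(struct fault_attr_model *attr, bool caller_nowarn)
{
	if (caller_nowarn)
		attr->no_warn = true; /* leaks into all later calls */
	if (!attr->no_warn)
		attr->dumps++;
	return true;
}

/* New scheme: the request travels with the call and is never stored. */
bool model_should_fail_ex(struct fault_attr_model *attr, int flags)
{
	if (!(flags & MODEL_FAULT_NOWARN))
		attr->dumps++;
	return true;
}
```

In the old variant a single no-warn caller silences every later report. In the new variant only the call that passed the flag is silenced.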
There's been a very long running bug that seems to have been neglected for
a while, where amdgpu consistently triggers a KASAN error at start:
BUG: KASAN: global-out-of-bounds in read_indirect_azalia_reg+0x1d4/0x2a0 [amdgpu]
Read of size 4 at addr ffffffffc2274b28 by task modprobe/1889
After digging through amd's rather creative method for accessing registers,
I eventually discovered the problem likely has to do with the fact that on
my dce120 GPU there are supposedly 7 sets of audio registers. But we only
define a register mapping for 6 sets.
So, fix this and fix the KASAN warning finally.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: stable(a)vger.kernel.org
---
Sending this one separately from the rest of my fixes since:
* It's definitely completely unrelated to the Gitlab 2171 issue
* I'm not sure if this is the correct fix since it's in DC
drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
index 1b70b78e2fa15..af631085e88c5 100644
--- a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
@@ -359,7 +359,8 @@ static const struct dce_audio_registers audio_regs[] = {
audio_regs(2),
audio_regs(3),
audio_regs(4),
- audio_regs(5)
+ audio_regs(5),
+ audio_regs(6),
};
#define DCE120_AUD_COMMON_MASK_SH_LIST(mask_sh)\
--
2.37.3
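The KASAN report above comes down to indexing a six-entry table with instance number 6. A rough sketch of that mismatch, with hypothetical types and tables (audio_regs_sketch, regs_before, regs_after) rather than the real DC structures:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for one set of audio register offsets. */
struct audio_regs_sketch {
	unsigned int base;
};

/* Six entries: valid indices are 0..5. */
static const struct audio_regs_sketch regs_before[] = {
	{ 0 }, { 1 }, { 2 }, { 3 }, { 4 }, { 5 },
};

/* Seven entries: valid indices are 0..6. */
static const struct audio_regs_sketch regs_after[] = {
	{ 0 }, { 1 }, { 2 }, { 3 }, { 4 }, { 5 }, { 6 },
};

/* Returns nonzero when instance number inst is a valid table index. */
static int index_in_table(size_t inst, size_t table_len)
{
	return inst < table_len;
}
```

If the hardware reports seven instances, the seventh one (index 6) is only covered by the longer table.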
> On Fri, Oct 28, 2022 at 02:51:43PM +0000, Dominic Jones wrote:
> > Updating the machine's kernel from v5.19.x to v6.0.x causes the machine to not
> > successfully boot. The machine boots successfully (and exhibits stable operation)
> > with version v5.19.17 and multiple earlier releases in the 5.19 line. Multiple releases
> > from the 6.0 line (including 6.0.0, 6.0.3, and 6.0.5), with no other changes to the
> > software environment, do not boot. Instead, the machine hangs after loading services
> > but before presenting a display manager; the machine instead shows repetitive hard
> > drive activity at this point and then no apparent activity.
> >
> > ''uname'' output for the machine successfully running v5.19.17 is:
> >
> > Linux [MACHINE_NAME] 5.19.17 #1 SMP PREEMPT_DYNAMIC Mon Oct 24 13:32:29 2022 i686 Intel(R) Atom(TM) CPU N270 @ 1.60GHz GenuineIntel GNU/Linux
> >
> > The machine is an OCZ Neutrino netbook, running a custom OS build largely similar to
> > LFS development. The kernel update uses ''make olddefconfig''.
>
> Can you use 'git bisect' to find the offending change that causes this
> to happen?
Wanted to follow up here. I got up to speed on ''git bisect'' and am currently
running the process. It looks like I have about six iterations to go and I'm
typically building/testing 1-3 kernels per day. I can also verify that the issue
occurs under 6.1rc4 as a side effect of the process.
Dominic Jones
jonesd(a)xmission.com
Hello!
This is an experimental semi-automated report about issues detected by
Coverity from a scan of next-20221118 as part of the linux-next scan project:
https://scan.coverity.com/projects/linux-next-weekly-scan
You're getting this email because you were associated with the identified
lines of code (noted below) that were touched by commits:
Thu Nov 17 00:18:25 2022 -0500
7cce4cd628be ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
Coverity reported the following:
*** CID 1527373: Uninitialized variables (UNINIT)
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c:1227 in pre_compute_mst_dsc_configs_for_state()
1221 for (j = 0; j < dc_state->stream_count; j++) {
1222 if (dc_state->streams[j]->link == stream->link)
1223 computed_streams[j] = true;
1224 }
1225 }
1226
vvv CID 1527373: Uninitialized variables (UNINIT)
vvv Using uninitialized value "ret".
1227 return ret;
1228 }
1229
1230 static int find_crtc_index_in_state_by_stream(struct drm_atomic_state *state,
1231 struct dc_stream_state *stream)
1232 {
If this is a false positive, please let us know so we can mark it as
such, or teach the Coverity rules to be smarter. If not, please make
sure fixes get into linux-next. :) For patches fixing this, please
include these lines (but double-check the "Fixes" first):
Reported-by: coverity-bot <keescook+coverity-bot(a)chromium.org>
Addresses-Coverity-ID: 1527373 ("Uninitialized variables")
Fixes: 7cce4cd628be ("drm/amdgpu/mst: Stop ignoring error codes and deadlocking")
If dc_state->stream_count is 0, "ret" is undefined. Perhaps initialize
it as -EINVAL?
Thanks for your attention!
--
Coverity-bot
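The pattern Coverity flags can be reduced to a loop that is the only place "ret" is assigned. The sketch below is a hypothetical reduction (mark_streams is not the driver function), and it simply follows the report's suggestion of -EINVAL; whether that is the right default for the real code is for the maintainers to decide:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical reduction: ret is only assigned inside the loop, so it
 * needs an initial value for the stream_count == 0 case.
 */
static int mark_streams(bool *computed, int stream_count)
{
	int ret = -EINVAL;	/* defined even if the loop runs zero times */
	int j;

	for (j = 0; j < stream_count; j++) {
		computed[j] = true;
		ret = 0;
	}

	return ret;
}
```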
The SPI framework checks for each transfer (with the struct
spi_controller::can_dma callback) whether the driver wants to use DMA
for the transfer. If the driver returns true, the SPI framework will
map the transfer's data to the device, start the actual transfer and
map the data back.
In commit 07e759387788 ("spi: spi-imx: add PIO polling support") the
spi-imx driver's spi_imx_transfer_one() function was extended. If the
estimated duration of a transfer does not exceed a configurable
duration, a polling transfer function is used. This check happens
before checking if the driver decided earlier for a DMA transfer.
If spi_imx_can_dma() decided to use a DMA transfer, and the user
configured a big maximum polling duration, a polling transfer will be
used. The DMA unmap after the transfer destroys the transferred data.
To fix this problem, first check in spi_imx_transfer_one() whether the
driver decided on a DMA transfer, then check the limits for a polling
transfer.
Fixes: 07e759387788 ("spi: spi-imx: add PIO polling support")
Link: https://lore.kernel.org/all/20221111003032.82371-1-festevam@gmail.com
Reported-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Reported-by: Fabio Estevam <festevam(a)gmail.com>
Tested-by: Fabio Estevam <festevam(a)gmail.com>
Cc: David Jander <david(a)protonic.nl>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/spi/spi-imx.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 30d82cc7300b..3240c139f8f3 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1607,6 +1607,13 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
if (spi_imx->slave_mode)
return spi_imx_pio_transfer_slave(spi, transfer);
+ /*
+ * If we decided in spi_imx_can_dma() that we want to do a DMA
+ * transfer, the SPI transfer has already been mapped, so we
+ * have to do the DMA transfer here.
+ */
+ if (spi_imx->usedma)
+ return spi_imx_dma_transfer(spi_imx, transfer);
/*
* Calculate the estimated time in us the transfer runs. Find
* the number of Hz per byte per polling limit.
@@ -1618,9 +1625,6 @@ static int spi_imx_transfer_one(struct spi_controller *controller,
if (transfer->len < byte_limit)
return spi_imx_poll_transfer(spi, transfer);
- if (spi_imx->usedma)
- return spi_imx_dma_transfer(spi_imx, transfer);
-
return spi_imx_pio_transfer(spi, transfer);
}
--
2.35.1
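The reordering in the patch above can be modelled as a small decision function. This is only a sketch: the enum and both functions are hypothetical, and the slave-mode branch of the real driver is left out.

```c
#include <stdbool.h>
#include <stddef.h>

enum xfer_path { XFER_DMA, XFER_POLL, XFER_PIO };

/* Order before the fix: the polling shortcut wins over a DMA decision. */
static enum xfer_path choose_path_old(bool usedma, size_t len,
				      size_t byte_limit)
{
	if (len < byte_limit)
		return XFER_POLL;
	if (usedma)
		return XFER_DMA;
	return XFER_PIO;
}

/* Order after the fix: a buffer that was mapped for DMA must use DMA. */
static enum xfer_path choose_path(bool usedma, size_t len,
				  size_t byte_limit)
{
	if (usedma)
		return XFER_DMA;
	if (len < byte_limit)
		return XFER_POLL;
	return XFER_PIO;
}
```

A short transfer that was already mapped for DMA takes the polling path in the old order and the DMA path in the new one.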
Taking the minimum is wrong if the bootloader or EFI stub is actually
passing on a bunch of bytes that it expects the kernel to hash itself.
Ideally, a bootloader will hash it for us, but STUB won't do that, so we
should map all the bytes. Also, all those bytes must be zeroed out after
use to preserve forward secrecy.
Fixes: 161a438d730d ("efi: random: reduce seed size to 32 bytes")
Cc: stable(a)vger.kernel.org # v4.14+
Cc: Ard Biesheuvel <ardb(a)kernel.org>
Cc: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/firmware/efi/efi.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index f73709f7589a..819409b7b43b 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -630,7 +630,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
seed = early_memremap(efi_rng_seed, sizeof(*seed));
if (seed != NULL) {
- size = min(seed->size, EFI_RANDOM_SEED_SIZE);
+ size = seed->size;
early_memunmap(seed, sizeof(*seed));
} else {
pr_err("Could not map UEFI random seed!\n");
@@ -641,6 +641,7 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
if (seed != NULL) {
pr_notice("seeding entropy pool\n");
add_bootloader_randomness(seed->bits, size);
+ memzero_explicit(seed->bits, size);
early_memunmap(seed, sizeof(*seed) + size);
} else {
pr_err("Could not map UEFI random seed!\n");
--
2.38.1
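For readers outside the kernel, the idea behind wiping the seed after use can be sketched in plain C. The helper below is a hypothetical userspace stand-in, not the kernel's memzero_explicit(); it uses the common volatile-pointer technique so that the stores are less likely to be optimized away as dead writes:

```c
#include <stddef.h>

/*
 * Hypothetical userspace stand-in: clear a buffer through a volatile
 * pointer so the compiler is discouraged from eliding the writes.
 */
static void zero_explicit_sketch(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	while (len--)
		*p++ = 0;
}
```

The point is the same as in the patch: once the bytes have been handed to the entropy pool, no copy of them should remain readable.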
From: Xiubo Li <xiubli(a)redhat.com>
POSIX locks all use the same owner, which is the thread id, and
multiple POSIX locks can be merged into a single one, so checking
whether the 'file' has locks may fail.
A file where some openers use locking and others don't is a
really odd usage pattern, though. Locks are like stoplights -- they
only work if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on
the inode. If there are POSIX/OFD/FLOCK locks on the file at the
time, we should set CHECK_FILELOCK, regardless of what fd was used
to set the lock.
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks")
URL: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/caps.c | 2 +-
fs/ceph/locks.c | 4 ----
fs/ceph/super.h | 1 -
3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 065e9311b607..948136f81fc8 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2964,7 +2964,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
while (true) {
flags &= CEPH_FILE_MODE_MASK;
- if (atomic_read(&fi->num_locks))
+ if (vfs_inode_has_locks(inode))
flags |= CHECK_FILELOCK;
_got = 0;
ret = try_get_cap_refs(inode, need, want, endoff,
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index 3e2843e86e27..b191426bf880 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -32,18 +32,14 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
- struct ceph_file_info *fi = dst->fl_file->private_data;
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
- atomic_inc(&fi->num_locks);
}
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct ceph_file_info *fi = fl->fl_file->private_data;
struct inode *inode = file_inode(fl->fl_file);
struct ceph_inode_info *ci = ceph_inode(inode);
- atomic_dec(&fi->num_locks);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 7b75a84ba48d..87dc55c866e9 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -803,7 +803,6 @@ struct ceph_file_info {
struct list_head rw_contexts;
u32 filp_gen;
- atomic_t num_locks;
};
struct ceph_dir_file_info {
--
2.31.1
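The difference between a per-file counter and a per-inode check can be shown with a toy model. Everything here is hypothetical (inode_sketch, file_sketch and the helpers are not ceph or VFS structures); it only illustrates the "regardless of what fd was used to set the lock" point from the commit message:

```c
/* Hypothetical toy model: two open files can share one inode. */
struct inode_sketch {
	int lock_count;		/* locks held on the inode, via any file */
};

struct file_sketch {
	struct inode_sketch *inode;
	int num_locks;		/* locks taken through this open file only */
};

static void take_lock(struct file_sketch *f)
{
	f->num_locks++;
	f->inode->lock_count++;
}

/* Old-style check: only sees locks taken through this open file. */
static int file_has_locks(const struct file_sketch *f)
{
	return f->num_locks > 0;
}

/* New-style check: sees any lock on the underlying inode. */
static int inode_has_locks(const struct file_sketch *f)
{
	return f->inode->lock_count > 0;
}
```

If a lock is taken through one open file, the per-file check on a second open file of the same inode reports nothing, while the per-inode check does.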
Add missing <string.h> include for strcmp.
Clang 16 makes -Wimplicit-function-declaration an error by default. Unfortunately,
out of tree modules may use this in configure scripts, which means failure
might cause silent miscompilation or misconfiguration.
For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].
[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-…
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6…
[3] hosted at lists.linux.dev.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: trivial(a)kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Sam James <sam(a)gentoo.org>
---
include/linux/license.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/license.h b/include/linux/license.h
index 7cce390f120b..1c0f28904ed0 100644
--- a/include/linux/license.h
+++ b/include/linux/license.h
@@ -2,6 +2,8 @@
#ifndef __LICENSE_H
#define __LICENSE_H
+#include <string.h>
+
static inline int license_is_gpl_compatible(const char *license)
{
return (strcmp(license, "GPL") == 0
--
2.38.1
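As an illustration of the include fix above: a translation unit that calls strcmp() needs a declaration in scope, otherwise a compiler that treats -Wimplicit-function-declaration as an error rejects it. The function name below is hypothetical and checks only one license string, unlike the real helper:

```c
#include <string.h>	/* declares strcmp(); without it the call is implicit */

/* Hypothetical, simplified check: only the plain "GPL" string. */
static int is_gpl_sketch(const char *license)
{
	return strcmp(license, "GPL") == 0;
}
```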
We used to use the wrong type of integer in `zfcp_fsf_req_send()` to
cache the FSF request ID when sending a new FSF request. This is used in
case the sending fails and we need to remove the request from our
internal hash table again (so we don't keep an invalid reference and use
it when we free the request again).
In `zfcp_fsf_req_send()` we used to cache the ID as `int` (signed and 32
bit wide), but the rest of the zfcp code (and the firmware
specification) handles the ID as `unsigned long`/`u64` (unsigned and 64
bit wide [s390x ELF ABI]).
For one this has the obvious problem that when the ID grows past 32
bit (this can happen reasonably fast) it is truncated to 32 bit when
storing it in the cache variable and so doesn't match the original ID
anymore.
The second less obvious problem is that even when the original ID
has not yet grown past 32 bit, as soon as the 32nd bit is set in the
original ID (0x80000000 = 2'147'483'648) we will have a mismatch when we
cast it back to `unsigned long`. As the cached variable is of a signed
type, the compiler will choose a sign-extending instruction to
load the 32 bit variable into a 64 bit register (e.g.:
`lgf %r11,188(%r15)`). So once we pass the cached variable into
`zfcp_reqlist_find_rm()` to remove the request again all the leading
zeros will be flipped to ones to extend the sign and won't match the
original ID anymore (this has been observed in practice).
If we can't successfully remove the request from the hash table again
after `zfcp_qdio_send()` fails (this happens regularly when zfcp cannot
notify the adapter about new work because the adapter is already gone
during e.g. a ChpID toggle) we will end up with a double free.
We unconditionally free the request in the calling function when
`zfcp_fsf_req_send()` fails, but because the request is still in the
hash table we end up with a stale memory reference, and once the zfcp
adapter is either reset during recovery or shutdown we end up freeing
the same memory twice.
The resulting stack traces vary depending on the kernel and have no
direct correlation to the place where the bug occurs. Here are three
examples that have been seen in practice:
list_del corruption. next->prev should be 00000001b9d13800, but was 00000000dead4ead. (next=00000001bd131a00)
------------[ cut here ]------------
kernel BUG at lib/list_debug.c:62!
monitor event: 0040 ilc:2 [#1] PREEMPT SMP
Modules linked in: ...
CPU: 9 PID: 1617 Comm: zfcperp0.0.1740 Kdump: loaded
Hardware name: ...
Krnl PSW : 0704d00180000000 00000003cbeea1f8 (__list_del_entry_valid+0x98/0x140)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 00000000916d12f1 0000000080000000 000000000000006d 00000003cb665cd6
0000000000000001 0000000000000000 0000000000000000 00000000d28d21e8
00000000d3844000 00000380099efd28 00000001bd131a00 00000001b9d13800
00000000d3290100 0000000000000000 00000003cbeea1f4 00000380099efc70
Krnl Code: 00000003cbeea1e8: c020004f68a7 larl %r2,00000003cc8d7336
00000003cbeea1ee: c0e50027fd65 brasl %r14,00000003cc3e9cb8
#00000003cbeea1f4: af000000 mc 0,0
>00000003cbeea1f8: c02000920440 larl %r2,00000003cd12aa78
00000003cbeea1fe: c0e500289c25 brasl %r14,00000003cc3fda48
00000003cbeea204: b9040043 lgr %r4,%r3
00000003cbeea208: b9040051 lgr %r5,%r1
00000003cbeea20c: b9040032 lgr %r3,%r2
Call Trace:
[<00000003cbeea1f8>] __list_del_entry_valid+0x98/0x140
([<00000003cbeea1f4>] __list_del_entry_valid+0x94/0x140)
[<000003ff7ff502fe>] zfcp_fsf_req_dismiss_all+0xde/0x150 [zfcp]
[<000003ff7ff49cd0>] zfcp_erp_strategy_do_action+0x160/0x280 [zfcp]
[<000003ff7ff4a22e>] zfcp_erp_strategy+0x21e/0xca0 [zfcp]
[<000003ff7ff4ad34>] zfcp_erp_thread+0x84/0x1a0 [zfcp]
[<00000003cb5eece8>] kthread+0x138/0x150
[<00000003cb557f3c>] __ret_from_fork+0x3c/0x60
[<00000003cc4172ea>] ret_from_fork+0xa/0x40
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<00000003cc3e9d04>] _printk+0x4c/0x58
Kernel panic - not syncing: Fatal exception: panic_on_oops
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6803
Fault in home space mode while using kernel ASCE.
AS:0000000063b10007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 10 PID: 0 Comm: swapper/10 Kdump: loaded
Hardware name: ...
Krnl PSW : 0404d00180000000 000003ff7febaf8e (zfcp_fsf_reqid_check+0x86/0x158 [zfcp])
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: 5a6f1cfa89c49ac3 00000000aff2c4c8 6b6b6b6b6b6b6b6b 00000000000002a8
0000000000000000 0000000000000055 0000000000000000 00000000a8515800
0700000000000000 00000000a6e14500 00000000aff2c000 000000008003c44c
000000008093c700 0000000000000010 00000380009ebba8 00000380009ebb48
Krnl Code: 000003ff7febaf7e: a7f4003d brc 15,000003ff7febaff8
000003ff7febaf82: e32020000004 lg %r2,0(%r2)
#000003ff7febaf88: ec2100388064 cgrj %r2,%r1,8,000003ff7febaff8
>000003ff7febaf8e: e3b020100020 cg %r11,16(%r2)
000003ff7febaf94: a774fff7 brc 7,000003ff7febaf82
000003ff7febaf98: ec280030007c cgij %r2,0,8,000003ff7febaff8
000003ff7febaf9e: e31020080004 lg %r1,8(%r2)
000003ff7febafa4: e33020000004 lg %r3,0(%r2)
Call Trace:
[<000003ff7febaf8e>] zfcp_fsf_reqid_check+0x86/0x158 [zfcp]
[<000003ff7febbdbc>] zfcp_qdio_int_resp+0x6c/0x170 [zfcp]
[<000003ff7febbf90>] zfcp_qdio_irq_tasklet+0xd0/0x108 [zfcp]
[<0000000061d90a04>] tasklet_action_common.constprop.0+0xdc/0x128
[<000000006292f300>] __do_softirq+0x130/0x3c0
[<0000000061d906c6>] irq_exit_rcu+0xfe/0x118
[<000000006291e818>] do_io_irq+0xc8/0x168
[<000000006292d516>] io_int_handler+0xd6/0x110
[<000000006292d596>] psw_idle_exit+0x0/0xa
([<0000000061d3be50>] arch_cpu_idle+0x40/0xd0)
[<000000006292ceea>] default_idle_call+0x52/0xf8
[<0000000061de4fa4>] do_idle+0xd4/0x168
[<0000000061de51fe>] cpu_startup_entry+0x36/0x40
[<0000000061d4faac>] smp_start_secondary+0x12c/0x138
[<000000006292d88e>] restart_int_handler+0x6e/0x90
Last Breaking-Event-Address:
[<000003ff7febaf94>] zfcp_fsf_reqid_check+0x8c/0x158 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
or:
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 523b05d3ae76a000 TEID: 523b05d3ae76a803
Fault in home space mode while using kernel ASCE.
AS:0000000077c40007 R3:0000000000000024
Oops: 0038 ilc:3 [#1] SMP
Modules linked in: ...
CPU: 3 PID: 453 Comm: kworker/3:1H Kdump: loaded
Hardware name: ...
Workqueue: kblockd blk_mq_run_work_fn
Krnl PSW : 0404d00180000000 0000000076fc0312 (__kmalloc+0xd2/0x398)
R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
Krnl GPRS: ffffffffffffffff 523b05d3ae76abf6 0000000000000000 0000000000092a20
0000000000000002 00000007e49b5cc0 00000007eda8f000 0000000000092a20
00000007eda8f000 00000003b02856b9 00000000000000a8 523b05d3ae76abf6
00000007dd662000 00000007eda8f000 0000000076fc02b2 000003e0037637a0
Krnl Code: 0000000076fc0302: c004000000d4 brcl 0,76fc04aa
0000000076fc0308: b904001b lgr %r1,%r11
#0000000076fc030c: e3106020001a algf %r1,32(%r6)
>0000000076fc0312: e31010000082 xg %r1,0(%r1)
0000000076fc0318: b9040001 lgr %r0,%r1
0000000076fc031c: e30061700082 xg %r0,368(%r6)
0000000076fc0322: ec59000100d9 aghik %r5,%r9,1
0000000076fc0328: e34003b80004 lg %r4,952
Call Trace:
[<0000000076fc0312>] __kmalloc+0xd2/0x398
[<0000000076f318f2>] mempool_alloc+0x72/0x1f8
[<000003ff8027c5f8>] zfcp_fsf_req_create.isra.7+0x40/0x268 [zfcp]
[<000003ff8027f1bc>] zfcp_fsf_fcp_cmnd+0xac/0x3f0 [zfcp]
[<000003ff80280f1a>] zfcp_scsi_queuecommand+0x122/0x1d0 [zfcp]
[<000003ff800b4218>] scsi_queue_rq+0x778/0xa10 [scsi_mod]
[<00000000771782a0>] __blk_mq_try_issue_directly+0x130/0x208
[<000000007717a124>] blk_mq_request_issue_directly+0x4c/0xa8
[<000003ff801302e2>] dm_mq_queue_rq+0x2ea/0x468 [dm_mod]
[<0000000077178c12>] blk_mq_dispatch_rq_list+0x33a/0x818
[<000000007717f064>] __blk_mq_do_dispatch_sched+0x284/0x2f0
[<000000007717f44c>] __blk_mq_sched_dispatch_requests+0x1c4/0x218
[<000000007717fa7a>] blk_mq_sched_dispatch_requests+0x52/0x90
[<0000000077176d74>] __blk_mq_run_hw_queue+0x9c/0xc0
[<0000000076da6d74>] process_one_work+0x274/0x4d0
[<0000000076da7018>] worker_thread+0x48/0x560
[<0000000076daef18>] kthread+0x140/0x160
[<000000007751d144>] ret_from_fork+0x28/0x30
Last Breaking-Event-Address:
[<0000000076fc0474>] __kmalloc+0x234/0x398
Kernel panic - not syncing: Fatal exception: panic_on_oops
To fix this, simply change the type of the cache variable to `unsigned
long`, like the rest of zfcp and also the argument for
`zfcp_reqlist_find_rm()`. This prevents truncation and wrong sign
extension and so can successfully remove the request from the hash
table.
Fixes: e60a6d69f1f8 ("[SCSI] zfcp: Remove function zfcp_reqlist_find_safe")
Cc: <stable(a)vger.kernel.org> #v2.6.34+
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
---
drivers/s390/scsi/zfcp_fsf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 19223b075568..ab3ea529cca7 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -884,7 +884,7 @@ static int zfcp_fsf_req_send(struct zfcp_fsf_req *req)
const bool is_srb = zfcp_fsf_req_is_status_read_buffer(req);
struct zfcp_adapter *adapter = req->adapter;
struct zfcp_qdio *qdio = adapter->qdio;
- int req_id = req->req_id;
+ unsigned long req_id = req->req_id;
zfcp_reqlist_add(adapter->req_list, req);
base-commit: ecb8c2580d37dbb641451049376d80c8afaa387f
--
2.38.1
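The truncation and sign-extension effects described in the commit message can be reproduced in a few lines. This is a sketch with a hypothetical helper name; it assumes the usual two's-complement wrap on the narrowing conversion, which GCC and Clang provide:

```c
#include <stdint.h>

/*
 * Model of caching a 64-bit request ID in a signed 32-bit variable
 * and widening it again when it is passed on.
 */
static uint64_t roundtrip_through_int32(uint64_t id)
{
	/* Truncation to 32 bits (assumes two's-complement wrap). */
	int32_t cached = (int32_t)(uint32_t)id;

	/* Widening a signed value sign-extends it. */
	return (uint64_t)(int64_t)cached;
}
```

An ID of 0x7fffffff survives the round trip, 0x80000000 comes back as 0xffffffff80000000, and 0x100000001 comes back as 1, so in the last two cases the lookup key no longer matches the original ID.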
This bug is marked as fixed by commit:
ext4: block range must be validated before use in ext4_mb_clear_bb()
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
Patch 1 is an obvious cleanup found while fixing this problem.
Patch 2 fixes a bug with initiator registration for single-initiator
systems. More details on this in its commit message.
Rafael - I didn't retain your ack for patch 2 since it seemed like a
nontrivial change.
Cc: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: linux-acpi(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: <stable(a)vger.kernel.org>
Cc: Chris Piper <chris.d.piper(a)intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
Changes in v2:
- Collect Acks for patch 1.
- Separate out the bitmask generation from the comparison helper to make
it more explicit and easier to follow (Kirill)
- Link to v1: https://lore.kernel.org/r/20221116075736.1909690-1-vishal.l.verma@intel.com
---
Vishal Verma (2):
ACPI: HMAT: remove unnecessary variable initialization
ACPI: HMAT: Fix initiator registration for single-initiator systems
drivers/acpi/numa/hmat.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
---
base-commit: 9abf2313adc1ca1b6180c508c25f22f9395cc780
change-id: 20221116-acpi_hmat_fix-7acf4bca37c0
Best regards,
--
Vishal Verma <vishal.l.verma(a)intel.com>
From: Wyes Karny <wyes.karny(a)amd.com>
MSR_AMD_PERF_CTL is guaranteed to be 0 on a cold boot. However, on a
kexec boot, for instance, it may have a non-zero value (if the cpu was
in a non-P0 Pstate). In such cases, the cores with non-P0 Pstates at
boot will never be pushed to P0, let alone boost frequencies.
Kexec is a common workflow for reboot on Linux and this creates a
regression in performance. Fix it by explicitly setting the
MSR_AMD_PERF_CTL to 0 during amd_pstate driver init.
Cc: stable(a)vger.kernel.org
Acked-by: Huang Rui <ray.huang(a)amd.com>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy(a)amd.com>
Tested-by: Wyes Karny <wyes.karny(a)amd.com>
Signed-off-by: Wyes Karny <wyes.karny(a)amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan(a)amd.com>
---
drivers/cpufreq/amd-pstate.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ace7d50cf2ac..d844c6f97caf 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -424,12 +424,22 @@ static void amd_pstate_boost_init(struct amd_cpudata *cpudata)
amd_pstate_driver.boost_enabled = true;
}
+static void amd_perf_ctl_reset(unsigned int cpu)
+{
+ wrmsrl_on_cpu(cpu, MSR_AMD_PERF_CTL, 0);
+}
+
static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
{
int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret;
struct device *dev;
struct amd_cpudata *cpudata;
+ /*
+ * Resetting PERF_CTL_MSR will put the CPU in P0 frequency,
+ * which is ideal for initialization process.
+ */
+ amd_perf_ctl_reset(policy->cpu);
dev = get_cpu_device(policy->cpu);
if (!dev)
return -ENODEV;
--
2.25.1
The patch titled
Subject: kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kbuild-fix-wimplicit-function-declaration-in-license_is_gpl_compatible.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sam James <sam(a)gentoo.org>
Subject: kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible
Date: Wed, 16 Nov 2022 18:26:34 +0000
Add missing <string.h> include for strcmp.
Clang 16 makes -Wimplicit-function-declaration an error by default.
Unfortunately, out-of-tree modules may include this header in configure
scripts, so a failed compile test might cause silent miscompilation or
misconfiguration.
For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2],
or the (new) c-std-porting mailing list [3].
[0] https://lwn.net/Articles/913505/
[1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-…
[2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6…
[3] hosted at lists.linux.dev.
Link: https://lkml.kernel.org/r/20221116182634.2823136-1-sam@gentoo.org
Signed-off-by: Sam James <sam(a)gentoo.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/license.h | 2 ++
1 file changed, 2 insertions(+)
--- a/include/linux/license.h~kbuild-fix-wimplicit-function-declaration-in-license_is_gpl_compatible
+++ a/include/linux/license.h
@@ -2,6 +2,8 @@
#ifndef __LICENSE_H
#define __LICENSE_H
+#include <string.h>
+
static inline int license_is_gpl_compatible(const char *license)
{
return (strcmp(license, "GPL") == 0
_
Patches currently in -mm which might be from sam(a)gentoo.org are
kbuild-fix-wimplicit-function-declaration-in-license_is_gpl_compatible.patch
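For readers following the kbuild patch above, here is a minimal sketch of why the include matters. It is a reduced stand-in for the header, not a copy of it: only the "GPL" comparison appears in the quoted hunk, and the "GPL v2" entry is an assumption added for illustration. Without <string.h>, the strcmp() calls are implicit declarations, which a compiler that treats -Wimplicit-function-declaration as an error will reject.

```c
#include <string.h>	/* declares strcmp(); the include the patch adds */

/*
 * Reduced, illustrative version of the helper. Only "GPL" is taken from
 * the quoted hunk; "GPL v2" is an assumed extra entry for this sketch.
 */
static inline int license_is_gpl_compatible(const char *license)
{
	return (strcmp(license, "GPL") == 0
		|| strcmp(license, "GPL v2") == 0);
}
```

A configure-style compile test that includes the header and calls the helper would then build cleanly regardless of which other headers were pulled in first.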
In a system with a single initiator node, and one or more memory-only
'target' nodes, the memory-only node(s) would fail to register their
initiator node correctly. i.e. in sysfs:
# ls /sys/devices/system/node/node0/access0/targets/
node0
Whereas the correct behavior should be:
# ls /sys/devices/system/node/node0/access0/targets/
node0 node1
This happened because hmat_register_target_initiators() uses list_sort()
to sort the initiator list, but the sort comparison function
(initiator_cmp()) is overloaded to also set the node mask's bits.
In a system with a single initiator, the list is singular, and list_sort()
elides the comparison helper call. Thus the node mask never gets set,
and the subsequent search for the best initiator comes up empty.
Add a new helper to sort the initiator list, and handle the singular
list corner case by setting the node mask for that explicitly.
Reported-by: Chris Piper <chris.d.piper(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Cc: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: Liu Shixin <liushixin2(a)huawei.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma(a)intel.com>
---
drivers/acpi/numa/hmat.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
index 144a84f429ed..cd20b0e9cdfa 100644
--- a/drivers/acpi/numa/hmat.c
+++ b/drivers/acpi/numa/hmat.c
@@ -573,6 +573,30 @@ static int initiator_cmp(void *priv, const struct list_head *a,
return ia->processor_pxm - ib->processor_pxm;
}
+static int initiators_to_nodemask(unsigned long *p_nodes)
+{
+ /*
+ * list_sort doesn't call @cmp (initiator_cmp) for 0 or 1 sized lists.
+ * For a single-initiator system with other memory-only nodes, this
+ * means an empty p_nodes mask, since that is set by initiator_cmp().
+ * Special case the singular list, and make sure the node mask gets set
+ * appropriately.
+ */
+ if (list_empty(&initiators))
+ return -ENXIO;
+
+ if (list_is_singular(&initiators)) {
+ struct memory_initiator *initiator = list_first_entry(
+ &initiators, struct memory_initiator, node);
+
+ set_bit(initiator->processor_pxm, p_nodes);
+ return 0;
+ }
+
+ list_sort(p_nodes, &initiators, initiator_cmp);
+ return 0;
+}
+
static void hmat_register_target_initiators(struct memory_target *target)
{
static DECLARE_BITMAP(p_nodes, MAX_NUMNODES);
@@ -609,7 +633,9 @@ static void hmat_register_target_initiators(struct memory_target *target)
* initiators.
*/
bitmap_zero(p_nodes, MAX_NUMNODES);
- list_sort(p_nodes, &initiators, initiator_cmp);
+ if (initiators_to_nodemask(p_nodes) < 0)
+ return;
+
if (!access0done) {
for (i = WRITE_LATENCY; i <= READ_BANDWIDTH; i++) {
loc = localities_types[i];
@@ -643,7 +669,9 @@ static void hmat_register_target_initiators(struct memory_target *target)
/* Access 1 ignores Generic Initiators */
bitmap_zero(p_nodes, MAX_NUMNODES);
- list_sort(p_nodes, &initiators, initiator_cmp);
+ if (initiators_to_nodemask(p_nodes) < 0)
+ return;
+
for (i = WRITE_LATENCY; i <= READ_BANDWIDTH; i++) {
loc = localities_types[i];
if (!loc)
--
2.38.1
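The single-initiator corner case from the hmat patch above can be reproduced in a small userspace sketch. Everything here is hypothetical scaffolding: struct initiator, cmp_and_mark(), sort_initiators() and initiators_to_mask() stand in for the kernel's memory_initiator, initiator_cmp(), list_sort() and the new helper, an insertion sort replaces the kernel's merge sort, and proximity domains are assumed to be smaller than the bit width of unsigned long. The point it illustrates is only that a comparator with a side effect never runs for a one-element input.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct memory_initiator. */
struct initiator {
	unsigned int pxm;	/* assumed < bits in unsigned long */
};

/* Comparator with a side effect, playing the role of initiator_cmp(). */
static int cmp_and_mark(const struct initiator *a, const struct initiator *b,
			unsigned long *mask)
{
	*mask |= 1UL << a->pxm;
	*mask |= 1UL << b->pxm;
	return (int)a->pxm - (int)b->pxm;
}

/* Insertion sort; like list_sort(), it never calls the comparator for n < 2. */
static void sort_initiators(struct initiator *v, size_t n, unsigned long *mask)
{
	for (size_t i = 1; i < n; i++) {
		struct initiator key = v[i];
		size_t j = i;

		while (j > 0 && cmp_and_mark(&v[j - 1], &key, mask) > 0) {
			v[j] = v[j - 1];
			j--;
		}
		v[j] = key;
	}
}

/* Mirrors the shape of the fix: handle empty and singular input explicitly. */
static int initiators_to_mask(struct initiator *v, size_t n, unsigned long *mask)
{
	if (n == 0)
		return -1;

	if (n == 1) {
		*mask |= 1UL << v[0].pxm;
		return 0;
	}

	sort_initiators(v, n, mask);
	return 0;
}
```

Calling sort_initiators() directly on a one-element array leaves the mask at zero, which is the bug; going through initiators_to_mask() sets the single bit.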
Currently, ghes_edac_register() is called via ghes_init() from acpi_init()
at the subsys_initcall() level. However, edac_init() is also registered as
a subsys_initcall(), which leaves the ordering between the two ambiguous.
If ghes_edac_register() is called first, then 'mc0' ends up at:
/sys/devices/mc0/, instead of the expected:
/sys/devices/system/edac/mc/mc0.
Apart from the unexpected sysfs location everything seems to work, but
edac_init() should run before any driver starts registering. So call
edac_init() earlier, via arch_initcall().
However, this moves edac_pci_clear_parity_errors() up as well. That call
appears to belong after the PCI bus scan, so keep
edac_pci_clear_parity_errors() at the subsys_initcall() level. That said,
the PCI bus scan itself seems to happen at the subsys_initcall() level, so
the parity clearing should really be moved later still. That can be left
for a separate patch.
Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Signed-off-by: Jason Baron <jbaron(a)akamai.com>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Mauro Carvalho Chehab <mchehab(a)kernel.org>
Cc: Tony Luck <tony.luck(a)intel.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Robert Richter <rric(a)kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki(a)intel.com>
Cc: Shuai Xue <xueshuai(a)linux.alibaba.com>
Cc: stable(a)vger.kernel.org
---
drivers/edac/edac_module.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/drivers/edac/edac_module.c b/drivers/edac/edac_module.c
index 32a931d0cb71..407d4a5fce7a 100644
--- a/drivers/edac/edac_module.c
+++ b/drivers/edac/edac_module.c
@@ -109,15 +109,6 @@ static int __init edac_init(void)
if (err)
return err;
- /*
- * Harvest and clear any boot/initialization PCI parity errors
- *
- * FIXME: This only clears errors logged by devices present at time of
- * module initialization. We should also do an initial clear
- * of each newly hotplugged device.
- */
- edac_pci_clear_parity_errors();
-
err = edac_mc_sysfs_init();
if (err)
goto err_sysfs;
@@ -157,12 +148,34 @@ static void __exit edac_exit(void)
edac_subsys_exit();
}
+static int __init edac_init_clear_parity_errors(void)
+{
+ /*
+ * Harvest and clear any boot/initialization PCI parity errors
+ *
+ * FIXME: This only clears errors logged by devices present at time of
+ * module initialization. We should also do an initial clear
+ * of each newly hotplugged device.
+ */
+ edac_pci_clear_parity_errors();
+
+ return 0;
+}
+
/*
* Inform the kernel of our entry and exit points
+ *
+ * ghes_edac_register() is called via acpi_init() -> ghes_init()
+ * at the subsys_initcall level so edac_init() must come first
*/
-subsys_initcall(edac_init);
+arch_initcall(edac_init);
module_exit(edac_exit);
+/*
+ * Clear parity errors after PCI subsys is initialized
+ */
+subsys_initcall(edac_init_clear_parity_errors);
+
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Doug Thompson www.softwarebitmaker.com, et al");
MODULE_DESCRIPTION("Core library routines for EDAC reporting");
--
2.17.1
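The ordering problem described in the EDAC patch above can be modelled in userspace. This is only a sketch under stated assumptions: the level numbers follow the kernel's initcall levels (arch = 3, subsys = 4), but struct initcall, run_initcalls(), boot() and the fake_* functions are hypothetical names, and "registration order" stands in for link order. It shows that two calls at the same level run in whatever order they were registered, while moving one to an earlier level makes the order deterministic.

```c
#include <stddef.h>

enum { LEVEL_ARCH = 3, LEVEL_SUBSYS = 4, LEVEL_MAX = 7 };

/* Hypothetical model of one registered initcall. */
struct initcall {
	int level;
	int (*fn)(void);
};

static int edac_ready;		/* set once the fake edac_init() has run */
static int ghes_saw_edac_ready;	/* what the fake acpi_init() observed */

static int fake_edac_init(void)
{
	edac_ready = 1;
	return 0;
}

static int fake_acpi_init(void)
{
	/* Stands in for acpi_init() -> ghes_init() -> ghes_edac_register(). */
	ghes_saw_edac_ready = edac_ready;
	return 0;
}

/* Run level by level; within a level, registration order is preserved. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
	for (int level = 0; level <= LEVEL_MAX; level++)
		for (size_t i = 0; i < n; i++)
			if (calls[i].level == level)
				calls[i].fn();
}

/* Returns 1 if the EDAC core was ready when the GHES side registered. */
static int boot(int edac_level)
{
	/* Worst case for the old code: acpi_init() is registered first. */
	const struct initcall calls[] = {
		{ LEVEL_SUBSYS, fake_acpi_init },
		{ edac_level, fake_edac_init },
	};

	edac_ready = 0;
	ghes_saw_edac_ready = 0;
	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
	return ghes_saw_edac_ready;
}
```

With boot(LEVEL_SUBSYS) the GHES side registers before the EDAC core is set up; with boot(LEVEL_ARCH) the core is always ready first, independent of registration order.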
From: Frieder Schrempf <frieder.schrempf(a)kontron.de>
In case the requested bus clock is higher than the input clock, the correct
dividers (pre = 0, post = 0) are returned from mx51_ecspi_clkdiv(), but
*fres is left uninitialized and therefore contains an arbitrary value.
This causes trouble for the recently introduced PIO polling feature as the
value in spi_imx->spi_bus_clk is used there to calculate for which
transfers to enable PIO polling.
Fix this by setting *fres even if no clock dividers are in use.
This issue was observed on Kontron BL i.MX8MM with an SPI peripheral clock set
to 50 MHz by default and a requested SPI bus clock of 80 MHz for the SPI NOR
flash.
With the fix applied the debug message from mx51_ecspi_clkdiv() now prints the
following:
spi_imx 30820000.spi: mx51_ecspi_clkdiv: fin: 50000000, fspi: 50000000,
post: 0, pre: 0
Fixes: 6fd8b8503a0d ("spi: spi-imx: Fix out-of-order CS/SCLK operation at low speeds")
Fixes: 07e759387788 ("spi: spi-imx: add PIO polling support")
Cc: Marc Kleine-Budde <mkl(a)pengutronix.de>
Cc: David Jander <david(a)protonic.nl>
Cc: Fabio Estevam <festevam(a)gmail.com>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Tested-by: Fabio Estevam <festevam(a)gmail.com>
---
Changes for v3:
* Add back the Fixes tag for commit 6fd8b8503a0d
* Add Fabio's Tested-by (Thanks!)
Changes for v2:
* Remove the reference and the Fixes tag for commit 6fd8b8503a0d as it is
incorrect.
---
drivers/spi/spi-imx.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 30d82cc7300b..468ce0a2b282 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -444,8 +444,7 @@ static unsigned int mx51_ecspi_clkdiv(struct spi_imx_data *spi_imx,
unsigned int pre, post;
unsigned int fin = spi_imx->spi_clk;
- if (unlikely(fspi > fin))
- return 0;
+ fspi = min(fspi, fin);
post = fls(fin) - fls(fspi);
if (fin > fspi << post)
--
2.38.1
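To make the effect of the spi-imx change above concrete, here is a userspace sketch of the divider calculation. The clamp and the first post-divider step follow the quoted hunk; everything after that (the split into a 4-bit pre-divider and a 4-bit power-of-two post-divider, and the formula for the resulting frequency) is an assumption about the rest of the function, and struct ecspi_div, fls_u32() and ecspi_clkdiv() are hypothetical names used only here.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch. */
struct ecspi_div {
	unsigned int pre;	/* pre-divider field, divides by pre + 1 */
	unsigned int post;	/* post-divider field, divides by 2^post */
	unsigned int fres;	/* resulting SCLK frequency in Hz */
};

/* 1-based index of the most significant set bit; 0 for x == 0. */
static unsigned int fls_u32(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

static int ecspi_clkdiv(uint32_t fin, uint32_t fspi, struct ecspi_div *out)
{
	unsigned int post;
	uint64_t div;

	if (fin == 0 || fspi == 0)
		return -1;

	/* The fix: clamp the request instead of returning early. */
	if (fspi > fin)
		fspi = fin;

	post = fls_u32(fin) - fls_u32(fspi);
	if ((uint64_t)fin > ((uint64_t)fspi << post))
		post++;

	/* Assumed: the pre-divider absorbs the first factor of 16. */
	post = (post > 4 ? post : 4) - 4;
	if (post > 0xf)
		return -1;

	div = (uint64_t)fspi << post;
	out->pre = (unsigned int)((fin + div - 1) / div) - 1;
	out->post = post;
	out->fres = (fin / (out->pre + 1)) >> post;
	return 0;
}
```

For the case from the report (50 MHz input clock, 80 MHz requested) the sketch yields pre = 0, post = 0 and a resulting frequency of 50 MHz, matching the debug message quoted in the commit description.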
This patch fixes a VRAM BO eviction failure during resume, seen when
playing the Steam game Cuphead.
During PSP resume, the driver requests a 10240 KiB VRAM buffer for the
trusted memory region (TMR). As part of this allocation we try to evict
a few user buffers from the VRAM domain to the SYSTEM domain. The eviction
fails because the selected resource doesn't have contiguous blocks, so
the TMR request fails and the system gets stuck in the resume process.
With this change, a resource with non-contiguous blocks is skipped and the
next available resource is tried until one with contiguous blocks is
found. That resource is moved from VRAM to SYSTEM, the TMR allocation in
VRAM succeeds, and the system completes the resume process.
v2:
- Added issue link and fixes tag.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2213
Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam(a)amd.com>
Cc: stable(a)vger.kernel.org #6.0
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index aea8d26b1724..1964de6ac997 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1369,6 +1369,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
amdgpu_bo_encrypted(ttm_to_amdgpu_bo(bo)))
return false;
+ if (bo->resource->mem_type == TTM_PL_VRAM &&
+ !(bo->resource->placement & TTM_PL_FLAG_CONTIGUOUS))
+ return false;
+
return ttm_bo_eviction_valuable(bo, place);
}
--
2.25.1
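The selection rule added by the amdgpu patch above can be sketched outside the driver as follows. This is a simplified model, not the TTM API: struct resource, MEM_VRAM, FLAG_CONTIGUOUS, eviction_valuable() and pick_victim() are hypothetical names, and the LRU is reduced to a plain array walked front to back.

```c
#include <stdbool.h>
#include <stddef.h>

enum mem_type { MEM_SYSTEM, MEM_VRAM };

#define FLAG_CONTIGUOUS 0x1u	/* hypothetical placement flag */

/* Hypothetical stand-in for the resource backing a buffer object. */
struct resource {
	enum mem_type mem_type;
	unsigned int placement;
};

/* A VRAM resource without contiguous backing is not worth evicting. */
static bool eviction_valuable(const struct resource *res)
{
	if (res->mem_type == MEM_VRAM && !(res->placement & FLAG_CONTIGUOUS))
		return false;

	return true;
}

/* Walk the candidates in order and return the first acceptable victim. */
static const struct resource *pick_victim(const struct resource *lru, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (eviction_valuable(&lru[i]))
			return &lru[i];

	return NULL;
}
```

Given a list whose first VRAM entry is non-contiguous, pick_victim() moves past it and returns the next entry that passes the check, or NULL if none does.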