From: Yu Kuai <yukuai3(a)huawei.com>
commit e4c4871a73944353ea23e319de27ef73ce546623 upstream.
commit b1a811633f73 ("block: nbd: add sanity check for first_minor")
checks that 'first_minor' should not be greater than 0xff, which is
wrong. Whitout the commit, the details that when user pass 0x100000,
it ends up create sysfs dir "/sys/block/43:0" are as follows:
nbd_dev_add
disk->first_minor = index << part_shift
-> default part_shift is 5, first_minor is 0x2000000
device_add_disk
ddev->devt = MKDEV(disk->major, disk->first_minor)
-> (0x2b << 20) | (0x2000000) = 0x2b00000
device_add
device_create_sys_dev_entry
format_dev_t
sprintf(buffer, "%u:%u", MAJOR(dev), MINOR(dev));
-> got 43:0
sysfs_create_link -> /sys/block/43:0
By the way, with the wrong fix, when part_shift is the default value,
only 8 ndb devices can be created since 8 << 5 is greater than 0xff.
Since the max bits for 'first_minor' should be the same as what
MKDEV() does, which is 20. Change the upper bound of 'first_minor'
from 0xff to 0xfffff.
Fixes: b1a811633f73 ("block: nbd: add sanity check for first_minor")
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-2-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Wen Yang <wenyang.linux(a)foxmail.com>
---
drivers/block/nbd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index bf8148ebd858..bd05eaf04143 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1773,11 +1773,11 @@ static int nbd_dev_add(int index)
disk->major = NBD_MAJOR;
/* Too big first_minor can cause duplicate creation of
- * sysfs files/links, since first_minor will be truncated to
- * byte in __device_add_disk().
+ * sysfs files/links, since MKDEV() expect that the max bits of
+ * first_minor is 20.
*/
disk->first_minor = index << part_shift;
- if (disk->first_minor > 0xff) {
+ if (disk->first_minor > MINORMASK) {
err = -EINVAL;
goto out_free_idr;
}
--
2.37.2
[Public]
Hi,
We had noticed some WCN6855 hardware we have BT wasn't working in the LTS kernel. The IDs for the variants we have were introduced in 6.2.
Can you please add this to 6.1.y?
ca2a99447e17 ("Bluetooth: btusb: Add more device IDs for WCN6855")
Thanks,
Hi Greg and Sasha,
Please consider applying the following commits to 6.2 (they apply
cleanly with some fuzz for me):
db7adcfd1cec ("x86/alternatives: Introduce int3_emulate_jcc()")
ac0ee0a9560c ("x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions")
923510c88d2b ("x86/static_call: Add support for Jcc tail-calls")
I have attached backports to 6.1.
Without these changes, kernels built with any version of clang using -Os
or clang 17.0.0 (tip of tree) at any kernel-supported optimization level
do not boot. This has been tested by people affected by this problem,
see https://github.com/ClangBuiltLinux/linux/issues/1774 for more info.
If there are any problems, please let me know.
Cheers,
Nathan
From: Nathan Chancellor <nathan(a)kernel.org>
Now that CONFIG_PAHOLE_VERSION exists, use it in the definition of
CONFIG_PAHOLE_HAS_SPLIT_BTF and CONFIG_PAHOLE_HAS_BTF_TAG to reduce the
amount of duplication across the tree.
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-5-nathan@kernel.org
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
lib/Kconfig.debug | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f71db0cc3bf1..0743c9567d7e 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -328,7 +328,15 @@ config DEBUG_INFO_BTF
DWARF type info into equivalent deduplicated BTF type info.
config PAHOLE_HAS_SPLIT_BTF
- def_bool $(success, test `$(PAHOLE) --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'` -ge "119")
+ def_bool PAHOLE_VERSION >= 119
+
+config PAHOLE_HAS_BTF_TAG
+ def_bool PAHOLE_VERSION >= 123
+ depends on CC_IS_CLANG
+ help
+ Decide whether pahole emits btf_tag attributes (btf_type_tag and
+ btf_decl_tag) or not. Currently only clang compiler implements
+ these attributes, so make the config depend on CC_IS_CLANG.
config DEBUG_INFO_BTF_MODULES
def_bool y
--
2.39.2.637.g21b0678d19-goog
From: Jim Mattson <jmattson(a)google.com>
commit 2e7eab81425ad6c875f2ed47c0ce01e78afc38a5 upstream.
According to Intel's document on Indirect Branch Restricted
Speculation, "Enabling IBRS does not prevent software from controlling
the predicted targets of indirect branches of unrelated software
executed later at the same predictor mode (for example, between two
different user applications, or two different virtual machines). Such
isolation can be ensured through use of the Indirect Branch Predictor
Barrier (IBPB) command." This applies to both basic and enhanced IBRS.
Since L1 and L2 VMs share hardware predictor modes (guest-user and
guest-kernel), hardware IBRS is not sufficient to virtualize
IBRS. (The way that basic IBRS is implemented on pre-eIBRS parts,
hardware IBRS is actually sufficient in practice, even though it isn't
sufficient architecturally.)
For virtual CPUs that support IBRS, add an indirect branch prediction
barrier on emulated VM-exit, to ensure that the predicted targets of
indirect branches executed in L1 cannot be controlled by software that
was executed in L2.
Since we typically don't intercept guest writes to IA32_SPEC_CTRL,
perform the IBPB at emulated VM-exit regardless of the current
IA32_SPEC_CTRL.IBRS value, even though the IBPB could technically be
deferred until L1 sets IA32_SPEC_CTRL.IBRS, if IA32_SPEC_CTRL.IBRS is
clear at emulated VM-exit.
This is CVE-2022-2196.
Fixes: 5c911beff20a ("KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02")
Cc: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Link: https://lore.kernel.org/r/20221019213620.1953281-3-jmattson@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)eng.windriver.com>
---
arch/x86/kvm/vmx/nested.c | 11 +++++++++++
arch/x86/kvm/vmx/vmx.c | 6 ++++--
2 files changed, 15 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index cdebeceedbd0..f3c136548af6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4617,6 +4617,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_switch_vmcs(vcpu, &vmx->vmcs01);
+ /*
+ * If IBRS is advertised to the vCPU, KVM must flush the indirect
+ * branch predictors when transitioning from L2 to L1, as L1 expects
+ * hardware (KVM in this case) to provide separate predictor modes.
+ * Bare metal isolates VMX root (host) from VMX non-root (guest), but
+ * doesn't isolate different VMCSs, i.e. in this case, doesn't provide
+ * separate modes for L2 vs L1.
+ */
+ if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL))
+ indirect_branch_prediction_barrier();
+
/* Update any VMCS fields that might have changed while L2 ran */
vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr);
vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0718658268fe..c849173b60c2 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1332,8 +1332,10 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
/*
* No indirect branch prediction barrier needed when switching
- * the active VMCS within a guest, e.g. on nested VM-Enter.
- * The L1 VMM can protect itself with retpolines, IBPB or IBRS.
+ * the active VMCS within a vCPU, unless IBRS is advertised to
+ * the vCPU. To minimize the number of IBPBs executed, KVM
+ * performs IBPB on nested VM-Exit (a single nested transition
+ * may switch the active VMCS multiple times).
*/
if (!buddy || WARN_ON_ONCE(buddy->vmcs != prev))
indirect_branch_prediction_barrier();
--
2.39.1
From: Zheng Wang <zyytlz.wz(a)163.com>
commit 4a61648af68f5ba4884f0e3b494ee1cabc4b6620 upstream.
If intel_gvt_dma_map_guest_page failed, it will call
ppgtt_invalidate_spt, which will finally free the spt.
But the caller function ppgtt_populate_spt_by_guest_entry
does not notice that, it will free spt again in its error
path.
Fix this by canceling the mapping of DMA address and freeing sub_spt.
Besides, leave the handle of spt destroy to caller function instead
of callee function when error occurs.
Fixes: b901b252b6cf ("drm/i915/gvt: Add 2M huge gtt support")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Reviewed-by: Zhenyu Wang <zhenyuw(a)linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw(a)linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20221229165641.1192455-1-zyytl…
Signed-off-by: Ovidiu Panait <ovidiu.panait(a)eng.windriver.com>
---
Backport of CVE-2022-3707 fix.
drivers/gpu/drm/i915/gvt/gtt.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c
index e5c2fdfc20e3..7ea7abef6143 100644
--- a/drivers/gpu/drm/i915/gvt/gtt.c
+++ b/drivers/gpu/drm/i915/gvt/gtt.c
@@ -1195,10 +1195,8 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
for_each_shadow_entry(sub_spt, &sub_se, sub_index) {
ret = intel_gvt_hypervisor_dma_map_guest_page(vgpu,
start_gfn + sub_index, PAGE_SIZE, &dma_addr);
- if (ret) {
- ppgtt_invalidate_spt(spt);
- return ret;
- }
+ if (ret)
+ goto err;
sub_se.val64 = se->val64;
/* Copy the PAT field from PDE. */
@@ -1217,6 +1215,17 @@ static int split_2MB_gtt_entry(struct intel_vgpu *vgpu,
ops->set_pfn(se, sub_spt->shadow_page.mfn);
ppgtt_set_shadow_entry(spt, se, index);
return 0;
+err:
+ /* Cancel the existing addess mappings of DMA addr. */
+ for_each_present_shadow_entry(sub_spt, &sub_se, sub_index) {
+ gvt_vdbg_mm("invalidate 4K entry\n");
+ ppgtt_invalidate_pte(sub_spt, &sub_se);
+ }
+ /* Release the new allocated spt. */
+ trace_spt_change(sub_spt->vgpu->id, "release", sub_spt,
+ sub_spt->guest_page.gfn, sub_spt->shadow_page.type);
+ ppgtt_free_spt(sub_spt);
+ return ret;
}
static int split_64KB_gtt_entry(struct intel_vgpu *vgpu,
--
2.39.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d125d1349abe ("alarmtimer: Prevent starvation by small intervals and SIG_IGN")
41b3b8dffc1f ("alarmtimer: Rename gettime() callback to get_ktime()")
5f936e19cc0e ("alarmtimer: Prevent overflow for relative nanosleep")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d125d1349abeb46945dc5e98f7824bf688266f13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 9 Feb 2023 23:25:49 +0100
Subject: [PATCH] alarmtimer: Prevent starvation by small intervals and SIG_IGN
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b(a)syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828b9d7e..7e5dff602585 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -470,11 +470,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
}
EXPORT_SYMBOL_GPL(alarm_forward);
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
+ ktime_t now = base->get_ktime();
+
+ if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+ /*
+ * Same issue as with posix_timer_fn(). Timers which are
+ * periodic but the signal is ignored can starve the system
+ * with a very small interval. The real fix which was
+ * promised in the context of posix_timer_fn() never
+ * materialized, but someone should really work on it.
+ *
+ * To prevent DOS fake @now to be 1 jiffie out which keeps
+ * the overrun accounting correct but creates an
+ * inconsistency vs. timer_gettime(2).
+ */
+ ktime_t kj = NSEC_PER_SEC / HZ;
+
+ if (interval < kj)
+ now = ktime_add(now, kj);
+ }
+
+ return alarm_forward(alarm, now, interval);
+}
- return alarm_forward(alarm, base->get_ktime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+ return __alarm_forward_now(alarm, interval, false);
}
EXPORT_SYMBOL_GPL(alarm_forward_now);
@@ -551,9 +575,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
- * away once we handle ignored signals proper.
+ * away once we handle ignored signals proper. Ensure that
+ * small intervals cannot starve the system.
*/
- ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+ ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
Hi Andrew,
Thank you for revising this patch!
> From: "andrew.yang" <andrew.yang(a)mediatek.com>
>
> damon_get_folio() would always increase folio _refcount and
> folio_isolate_lru() would increase folio _refcount if the folio's lru
> flag is set.
>
> If an unevictable folio isolated successfully, there will be two more
> _refcount. The one from folio_isolate_lru() will be decreased in
> folio_puback_lru(), but the other one from damon_get_folio() will be
> left behind. This causes a pin page.
>
> Whatever the case, the _refcount from damon_get_folio() should be
> decreased.
>
> Signed-off-by: andrew.yang <andrew.yang(a)mediatek.com>
> Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme")
> Cc: <stable(a)vger.kernel.org> # 5.16.x
Reviewed-by: SeongJae Park <sj(a)kernel.org>
This may not cleanly applicable on 6.1.y and 6.2.y. I would post backports
once this is merged in the mainline, unless others do earlier than me.
Thanks,
SJ
> ---
> v3:
> add fixes tag and cc stable
> v2:
> according to David's suggestion
> 1. revise subject
> according to SeongJae's suggestions
> 1. rebase to mm-unstable tree
> 2. remove braces for th single statements
> ---
> mm/damon/paddr.c | 7 +++----
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 607bb69e526c..6c655d9b5639 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -250,12 +250,11 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> folio_put(folio);
> continue;
> }
> - if (folio_test_unevictable(folio)) {
> + if (folio_test_unevictable(folio))
> folio_putback_lru(folio);
> - } else {
> + else
> list_add(&folio->lru, &folio_list);
> - folio_put(folio);
> - }
> + folio_put(folio);
> }
> applied = reclaim_pages(&folio_list);
> cond_resched();
> --
> 2.18.0
>
>
During page migration, the copy_highpage function is used to copy the
page data to the target page. If the source page is a userspace page
with MTE tags, the KASAN tag of the target page must have the match-all
tag in order to avoid tag check faults during subsequent accesses to the
page by the kernel. However, the target page may have been allocated in
a number of ways, some of which will use the KASAN allocator and will
therefore end up setting the KASAN tag to a non-match-all tag. Therefore,
update the target page's KASAN tag to match the source page.
We ended up unintentionally fixing this issue as a result of a bad
merge conflict resolution between commit e059853d14ca ("arm64: mte:
Fix/clarify the PG_mte_tagged semantics") and commit 20794545c146 ("arm64:
kasan: Revert "arm64: mte: reset the page tag in page->flags""), which
preserved a tag reset for PG_mte_tagged pages which was considered to be
unnecessary at the time. Because SW tags KASAN uses separate tag storage,
update the code to only reset the tags when HW tags KASAN is enabled.
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Link: https://linux-review.googlesource.com/id/If303d8a709438d3ff5af5fd8570650583…
Reported-by: "Kuan-Ying Lee (李冠穎)" <Kuan-Ying.Lee(a)mediatek.com>
Cc: <stable(a)vger.kernel.org> # 6.1
Fixes: 20794545c146 ("arm64: kasan: Revert "arm64: mte: reset the page tag in page->flags"")
---
v2:
- added Fixes tag
For the stable branch, e059853d14ca needs to be cherry-picked and the following
merge conflict resolution is needed:
- page_kasan_tag_reset(to);
+ if (kasan_hw_tags_enabled())
+ page_kasan_tag_reset(to);
- /* It's a new page, shouldn't have been tagged yet */
- WARN_ON_ONCE(!try_page_mte_tagging(to));
arch/arm64/mm/copypage.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/mm/copypage.c b/arch/arm64/mm/copypage.c
index 8dd5a8fe64b4..4aadcfb01754 100644
--- a/arch/arm64/mm/copypage.c
+++ b/arch/arm64/mm/copypage.c
@@ -22,7 +22,8 @@ void copy_highpage(struct page *to, struct page *from)
copy_page(kto, kfrom);
if (system_supports_mte() && page_mte_tagged(from)) {
- page_kasan_tag_reset(to);
+ if (kasan_hw_tags_enabled())
+ page_kasan_tag_reset(to);
/* It's a new page, shouldn't have been tagged yet */
WARN_ON_ONCE(!try_page_mte_tagging(to));
mte_copy_page_tags(kto, kfrom);
--
2.39.1.581.gbfd45094c4-goog
The following crash was seen on a 1k-block ext4 filesystem when the user
passed a fmr_physical of value zero for the high key:
kernel BUG at fs/ext4/ext4.h:3354!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 422 Comm: syz-executor339 Not tainted 5.15.74-syzkaller-00001-g4ec71a9ec769 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:ext4_get_group_info fs/ext4/ext4.h:3354 [inline]
RIP: 0010:ext4_mb_load_buddy_gfp+0xe9d/0xeb0 fs/ext4/mballoc.c:1498
Code: 99 ed c1 ff e9 47 f4 ff ff e8 5f a0 7f ff 48 c7 c7 c0 c3 a9 86 48 89 de 4c 89 ea e8 ad 8e 92 00 e9 a1 f2 ff ff e8 43 a0 7f ff <0f> 0b e8 3c a0 7f ff 0f 0b e8 35 a0 7f ff 0f 0b 0f 1f 00 55 48 89
RSP: 0018:ffffc9000044f320 EFLAGS: 00010293
RAX: ffffffff81f1f14d RBX: 0000000000000001 RCX: ffff888106194f00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffffc9000044f3b0 R08: ffffffff81f1e3a4 R09: ffffc9000044f440
R10: fffff52000089e8f R11: 1ffff92000089e88 R12: ffff888100a553c8
R13: dffffc0000000000 R14: 0000000000000001 R15: ffff888105f9a000
FS: 000055555675b300(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c4f7cde3e8 CR3: 000000011c064000 CR4: 00000000003506b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ext4_mb_load_buddy fs/ext4/mballoc.c:1620 [inline]
ext4_mballoc_query_range+0xb8/0x7b0 fs/ext4/mballoc.c:6560
ext4_getfsmap_datadev+0x1c8a/0x28c0 fs/ext4/fsmap.c:537
ext4_getfsmap+0xcff/0x1040 fs/ext4/fsmap.c:708
ext4_ioc_getfsmap fs/ext4/ioctl.c:660 [inline]
__ext4_ioctl fs/ext4/ioctl.c:862 [inline]
ext4_ioctl+0x3020/0x50e0 fs/ext4/ioctl.c:1276
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl+0x115/0x190 fs/ioctl.c:860
__x64_sys_ioctl+0x7b/0x90 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f8d2372e3e9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcae7c2578 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f8d2372e3e9
RDX: 0000000020000200 RSI: 00000000c0c0583b RDI: 0000000000000004
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 00000000000003f1 R11: 0000000000000246 R12: 00007f8d236ed5c0
R13: 00007ffcae7c25a0 R14: 00007ffcae7c258c R15: 00007ffcae7c2590
The crash is due to insufficient key validation.
When high_key->fmr_physical is zero for a 1k-block ext4 filesystem we
encounter an underflow in the second call to
ext4_get_group_no_and_offset():
blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
blocknr comes with value zero (high_key->fmr_physical is zero) and
the first_data_block is one, thus blocknr becomes -1. Since blocknr is
of type ext4_fsblk_t (unsigned long long) the value is all ones. Due to
the high values for blocknr and offset we hit the BUG_ON in
ext4_get_group_info() as the groupnr exceeds EXT4_SB(sb)->s_groups_count).
Fix it by returning early when keys[1].fmr_physical is less than or equal
to the first data block, as there's no data to return. This replicates the
behavior of returning zero, like in keys[0].fmr_physical >= eofs. In both
cases -EINVAL could be returned but for some reason the author decided to
just ignore out of bounds requests.
One may notice that the check introduced addresses just the
ext4_getfsmap_datadev() handler. This is intentional, because for the
ext4_getfsmap_logdev() handler, the value of the high key passed by the
user is ignored and instead the code uses:
memset(&info->gfi_high, 0xFF, sizeof(info->gfi_high));
Cc: stable(a)vger.kernel.org
Fixes: 0c9ec4beecac ("ext4: support GETFSMAP ioctls")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c300…
Reported-by: syzbot+6be2b977c89f79b6b153(a)syzkaller.appspotmail.com
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
---
fs/ext4/fsmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
index 4493ef0c715e..b5289378a761 100644
--- a/fs/ext4/fsmap.c
+++ b/fs/ext4/fsmap.c
@@ -479,6 +479,8 @@ static int ext4_getfsmap_datadev(struct super_block *sb,
int error = 0;
bofs = le32_to_cpu(sbi->s_es->s_first_data_block);
+ if (keys[1].fmr_physical <= bofs)
+ return 0;
eofs = ext4_blocks_count(sbi->s_es);
if (keys[0].fmr_physical >= eofs)
return 0;
--
2.39.2.637.g21b0678d19-goog
From: Matthias Maennich <maennich(a)google.com>
Can we please pick this series up for 5.15? I am particularly interested
in the last patch to enable BTF + DWARF5, but the cleanup patches before
are a very reasonable choice for stable@ as well as they simplify the
pahole version calculation and allow future BTF/pahole related patches
to apply cleanly as well. I intentionally kept the config
PAHOLE_HAS_BTF_TAG and hence its patch complete, even though there is no
user for it.
Cheers,
Matthias
Cc: <stable(a)vger.kernel.org> # v5.15+
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Matthias Maennich <maennich(a)google.com>
Nathan Chancellor (5):
MAINTAINERS: Add scripts/pahole-flags.sh to BPF section
kbuild: Add CONFIG_PAHOLE_VERSION
scripts/pahole-flags.sh: Use pahole-version.sh
lib/Kconfig.debug: Use CONFIG_PAHOLE_VERSION
lib/Kconfig.debug: Allow BTF + DWARF5 with pahole 1.21+
MAINTAINERS | 2 ++
init/Kconfig | 4 ++++
lib/Kconfig.debug | 12 ++++++++++--
scripts/pahole-flags.sh | 2 +-
scripts/pahole-version.sh | 13 +++++++++++++
5 files changed, 30 insertions(+), 3 deletions(-)
create mode 100755 scripts/pahole-version.sh
--
2.39.2.637.g21b0678d19-goog
From: Nathan Chancellor <nathan(a)kernel.org>
Commit 98cd6f521f10 ("Kconfig: allow explicit opt in to DWARF v5")
prevented CONFIG_DEBUG_INFO_DWARF5 from being selected when
CONFIG_DEBUG_INFO_BTF is enabled because pahole had issues with clang's
DWARF5 info. This was resolved by [1], which is in pahole v1.21.
Allow DEBUG_INFO_DWARF5 to be selected with DEBUG_INFO_BTF when using
pahole v1.21 or newer.
[1]: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=7d8e829f6…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-6-nathan@kernel.org
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
lib/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0743c9567d7e..dd86de130cdf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -302,7 +302,7 @@ config DEBUG_INFO_DWARF4
config DEBUG_INFO_DWARF5
bool "Generate DWARF Version 5 debuginfo"
depends on !CC_IS_CLANG || AS_IS_LLVM || (AS_IS_GNU && AS_VERSION >= 23502 && AS_HAS_NON_CONST_LEB128)
- depends on !DEBUG_INFO_BTF
+ depends on !DEBUG_INFO_BTF || PAHOLE_VERSION >= 121
help
Generate DWARF v5 debug info. Requires binutils 2.35.2, gcc 5.0+ (gcc
5.0+ accepts the -gdwarf-5 flag but only had partial support for some
--
2.39.2.637.g21b0678d19-goog
From: Nathan Chancellor <nathan(a)kernel.org>
There are a few different places where pahole's version is turned into a
three digit form with the exact same command. Move this command into
scripts/pahole-version.sh to reduce the amount of duplication across the
tree.
Create CONFIG_PAHOLE_VERSION so the version code can be used in Kconfig
to enable and disable configuration options based on the pahole version,
which is already done in a couple of places.
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-3-nathan@kernel.org
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
MAINTAINERS | 1 +
init/Kconfig | 4 ++++
scripts/pahole-version.sh | 13 +++++++++++++
3 files changed, 18 insertions(+)
create mode 100755 scripts/pahole-version.sh
diff --git a/MAINTAINERS b/MAINTAINERS
index 176485e625a0..ee8a1b5c28a6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3408,6 +3408,7 @@ F: net/sched/cls_bpf.c
F: samples/bpf/
F: scripts/bpf_doc.py
F: scripts/pahole-flags.sh
+F: scripts/pahole-version.sh
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/
diff --git a/init/Kconfig b/init/Kconfig
index a4144393717b..dafc3ba6fa7a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -91,6 +91,10 @@ config CC_HAS_ASM_INLINE
config CC_HAS_NO_PROFILE_FN_ATTR
def_bool $(success,echo '__attribute__((no_profile_instrument_function)) int x();' | $(CC) -x c - -c -o /dev/null -Werror)
+config PAHOLE_VERSION
+ int
+ default $(shell,$(srctree)/scripts/pahole-version.sh $(PAHOLE))
+
config CONSTRUCTORS
bool
diff --git a/scripts/pahole-version.sh b/scripts/pahole-version.sh
new file mode 100755
index 000000000000..f8a32ab93ad1
--- /dev/null
+++ b/scripts/pahole-version.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Usage: $ ./pahole-version.sh pahole
+#
+# Prints pahole's version in a 3-digit form, such as 119 for v1.19.
+
+if [ ! -x "$(command -v "$@")" ]; then
+ echo 0
+ exit 1
+fi
+
+"$@" --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'
--
2.39.2.637.g21b0678d19-goog
Dzień dobry,
chciałbym poinformować Państwa o możliwości pozyskania nowych zleceń ze strony www.
Widzimy zainteresowanie potencjalnych Klientów Państwa firmą, dlatego chętnie pomożemy Państwu dotrzeć z ofertą do większego grona odbiorców poprzez efektywne metody pozycjonowania strony w Google.
Czy mógłbym liczyć na kontakt zwrotny?
Pozdrawiam serdecznie,
Wiktor Nurek
This is the start of the stable review cycle for the 5.10.169 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.169-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.169-rc1
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix return value
Dan Carpenter <error27(a)gmail.com>
net: sched: sch: Fix off by one in htb_activate_prios()
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
Thomas Gleixner <tglx(a)linutronix.de>
alarmtimer: Prevent starvation by small intervals and SIG_IGN
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
kvm: initialize all of the kvm_debugregs structure before sending it to userspace
Pedro Tammela <pctammela(a)mojatatu.com>
net/sched: tcindex: search key must be 16 bits
Natalia Petrova <n.petrova(a)fintech.ru>
i40e: Add checking for null for nlmsg_find_attr()
Pedro Tammela <pctammela(a)mojatatu.com>
net/sched: act_ctinfo: use percpu stats
Baowen Zheng <baowen.zheng(a)corigine.com>
flow_offload: fill flags to action structure
Matt Roper <matthew.d.roper(a)intel.com>
drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list
Raviteja Goud Talla <ravitejax.goud.talla(a)intel.com>
drm/i915/gen11: Moving WAs to icl_gt_workarounds_init()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix underflow in second superblock position calculations
Guillaume Nault <gnault(a)redhat.com>
ipv6: Fix tcp socket connection with DSCP.
Guillaume Nault <gnault(a)redhat.com>
ipv6: Fix datagram socket connection with DSCP.
Jason Xing <kernelxing(a)tencent.com>
ixgbe: add double of VLAN header when computing the max MTU
Jakub Kicinski <kuba(a)kernel.org>
net: mpls: fix stale pointer if allocation fails during device rename
Cristian Ciocaltea <cristian.ciocaltea(a)collabora.com>
net: stmmac: Restrict warning on disabling DMA store and fwd mode
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix mqprio and XDP ring checking logic
Johannes Zink <j.zink(a)pengutronix.de>
net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
Hangyu Hua <hbh25y(a)gmail.com>
net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()
Miko Larsson <mikoxyzzz(a)gmail.com>
net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
Kuniyuki Iwashima <kuniyu(a)amazon.com>
dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.
Pedro Tammela <pctammela(a)mojatatu.com>
net/sched: tcindex: update imperfect hash filters respecting rcu
Pietro Borrello <borrello(a)diag.uniroma1.it>
sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list
Rafał Miłecki <rafal(a)milecki.pl>
net: bgmac: fix BCM5358 support by setting correct flags
Jason Xing <kernelxing(a)tencent.com>
i40e: add double of VLAN header when computing the max MTU
Jason Xing <kernelxing(a)tencent.com>
ixgbe: allow to increase MTU to 3K with XDP enabled
Andrew Morton <akpm(a)linux-foundation.org>
revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
Felix Riemann <felix.riemann(a)sma.de>
net: Fix unwanted sign extension in netdev_stats_to_stats64()
Aaron Thompson <dev(a)aaront.org>
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
Mike Kravetz <mike.kravetz(a)oracle.com>
hugetlb: check for undefined shift on 32 bit architectures
Munehisa Kamata <kamatam(a)amazon.com>
sched/psi: Fix use-after-free in ep_remove_wait_queue()
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - fixed wrong gpio assigned
Bo Liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: add a new hda codec SN6180
Yang Yingliang <yangyingliang(a)huawei.com>
mmc: mmc_spi: fix error handling in mmc_spi_probe()
Yang Yingliang <yangyingliang(a)huawei.com>
mmc: sdio: fix possible resource leaks in some error paths
Paul Cercueil <paul(a)crapouillou.net>
mmc: jz4740: Work around bug on JZ4760(B)
Florian Westphal <fw(a)strlen.de>
netfilter: nft_tproxy: restrict to prerouting hook
Amir Goldstein <amir73il(a)gmail.com>
ovl: remove privs in ovl_fallocate()
Amir Goldstein <amir73il(a)gmail.com>
ovl: remove privs in ovl_copyfile()
Sumanth Korikkar <sumanthk(a)linux.ibm.com>
s390/signal: fix endless loop in do_signal
Seth Jenkins <sethjenkins(a)google.com>
aio: fix mremap after fork null-deref
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix registration vs use race
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix cleanup after dev_set_name()
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: remove nvmem_config wp_gpio
Gaosheng Cui <cuigaosheng1(a)huawei.com>
nvmem: core: add error handling for dev_set_name
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match
Amit Engel <Amit.Engel(a)dell.com>
nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
Vasily Gorbik <gor(a)linux.ibm.com>
s390/decompressor: specify __decompress() buf len to avoid overflow
Kees Cook <keescook(a)chromium.org>
net: sched: sch: Bounds check priority
Andrey Konovalov <andrey.konovalov(a)linaro.org>
net: stmmac: do not stop RX_CLK in Rx LPI state for qcs404 SoC
Hyunwoo Kim <v4bel(a)theori.io>
net/rose: Fix to not accept on connected socket
Shunsuke Mie <mie(a)igel.co.jp>
tools/virtio: fix the vringh test for virtio ring changes
Arnd Bergmann <arnd(a)arndb.de>
ASoC: cs42l56: fix DT probe
Cezary Rojewski <cezary.rojewski(a)intel.com>
ALSA: hda: Do not unset preset when cleaning up codec
Eduard Zingerman <eddyz87(a)gmail.com>
selftests/bpf: Verify copy_register_state() preserves parent/live fields
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: Intel: sof_rt5682: always set dpcm_capture for amplifiers
-------------
Diffstat:
Makefile | 4 +-
arch/s390/boot/compressed/decompressor.c | 2 +-
arch/s390/kernel/signal.c | 2 +-
arch/x86/kvm/x86.c | 3 +-
drivers/gpu/drm/i915/gt/intel_workarounds.c | 32 ++++++++--------
drivers/mmc/core/sdio_bus.c | 17 +++++++--
drivers/mmc/core/sdio_cis.c | 12 ------
drivers/mmc/host/jz4740_mmc.c | 10 +++++
drivers/mmc/host/mmc_spi.c | 8 ++--
drivers/net/ethernet/broadcom/bgmac-bcma.c | 6 +--
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 ++++++++------
.../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 2 +
drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 3 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +-
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +-
drivers/net/usb/kalmia.c | 8 ++--
drivers/nvme/target/fc.c | 4 +-
drivers/nvmem/core.c | 43 +++++++++++-----------
drivers/platform/x86/touchscreen_dmi.c | 9 +++++
fs/aio.c | 4 ++
fs/nilfs2/ioctl.c | 7 ++++
fs/nilfs2/super.c | 9 +++++
fs/nilfs2/the_nilfs.c | 8 +++-
fs/overlayfs/file.c | 28 ++++++++++++--
fs/squashfs/xattr_id.c | 2 +-
include/linux/hugetlb.h | 5 ++-
include/linux/nvmem-provider.h | 2 -
include/linux/stmmac.h | 1 +
include/net/sock.h | 13 +++++++
kernel/sched/psi.c | 7 ++--
kernel/time/alarmtimer.c | 33 +++++++++++++++--
mm/memblock.c | 8 +---
net/core/dev.c | 2 +-
net/dccp/ipv6.c | 7 +---
net/ipv6/datagram.c | 2 +-
net/ipv6/tcp_ipv6.c | 11 ++----
net/mpls/af_mpls.c | 4 ++
net/netfilter/nft_tproxy.c | 8 ++++
net/openvswitch/meter.c | 4 +-
net/rose/af_rose.c | 8 ++++
net/sched/act_bpf.c | 2 +-
net/sched/act_connmark.c | 2 +-
net/sched/act_ctinfo.c | 6 +--
net/sched/act_gate.c | 2 +-
net/sched/act_ife.c | 2 +-
net/sched/act_ipt.c | 2 +-
net/sched/act_mpls.c | 2 +-
net/sched/act_nat.c | 2 +-
net/sched/act_pedit.c | 2 +-
net/sched/act_police.c | 2 +-
net/sched/act_sample.c | 2 +-
net/sched/act_simple.c | 2 +-
net/sched/act_skbedit.c | 2 +-
net/sched/act_skbmod.c | 2 +-
net/sched/cls_tcindex.c | 34 +++++++++++++++--
net/sched/sch_htb.c | 5 ++-
net/sctp/diag.c | 4 +-
sound/pci/hda/hda_bind.c | 2 +
sound/pci/hda/hda_codec.c | 1 -
sound/pci/hda/patch_conexant.c | 1 +
sound/pci/hda/patch_realtek.c | 2 +-
sound/soc/codecs/cs42l56.c | 6 ---
sound/soc/intel/boards/sof_rt5682.c | 3 ++
sound/soc/sof/intel/hda-dai.c | 8 ++--
.../selftests/bpf/verifier/search_pruning.c | 36 ++++++++++++++++++
tools/virtio/linux/bug.h | 8 ++--
tools/virtio/linux/build_bug.h | 7 ++++
tools/virtio/linux/cpumask.h | 7 ++++
tools/virtio/linux/gfp.h | 7 ++++
tools/virtio/linux/kernel.h | 1 +
tools/virtio/linux/kmsan.h | 12 ++++++
tools/virtio/linux/scatterlist.h | 1 +
tools/virtio/linux/topology.h | 7 ++++
76 files changed, 404 insertions(+), 165 deletions(-)
Hi Andrew,
On Tue, 21 Feb 2023 17:03:13 +0800 Andrew Yang <andrew.yang(a)mediatek.com> wrote:
> From: "andrew.yang" <andrew.yang(a)mediatek.com>
>
> damon_get_page() would always increase page _refcount and
> isolate_lru_page() would increase page _refcount if the page's lru
> flag is set.
>
> If a unevictable page isolated successfully, there will be two more
> _refcount. The one from isolate_lru_page() will be decreased in
> putback_lru_page(), but the other one from damon_get_page() will be
> left behind. This causes a pin page.
>
> Whatever the case, the _refcount from damon_get_page() should be
> decreased.
Thank you for finding this issue! I think the David suggested subject[1] is
better, though.
I think we could add below Fixes: and Cc: tags?
Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme")
Cc: <stable(a)vger.kernel.org> # 5.16.x
>
> Signed-off-by: andrew.yang <andrew.yang(a)mediatek.com>
> ---
> mm/damon/paddr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index e1a4315c4be6..56d8abd08fb1 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -223,8 +223,8 @@ static unsigned long damon_pa_pageout(struct damon_region *r)
> putback_lru_page(page);
> } else {
> list_add(&page->lru, &page_list);
> - put_page(page);
> }
> + put_page(page);
Seems your patch is not based on mm-unstable tree[2]. Could you please rebase
on it?
Also, let's remove the braces for the single statements[3].
[1] https://lore.kernel.org/damon/1b3e8e88-ed5c-7302-553f-4ddb3400d466@redhat.c…
[2] https://docs.kernel.org/next/mm/damon/maintainer-profile.html#scm-trees
[3] https://docs.kernel.org/process/coding-style.html?highlight=coding+style#pl…
Thanks,
SJ
> }
> applied = reclaim_pages(&page_list);
> cond_resched();
> --
> 2.18.0
在 2023/2/22 4:26, Seth Jenkins 写道:
> > So you are letting an opaque US government agency, and random third
> > party companies, dictate your company's internal engineering policies
> > and resource allocations? That feels very very odd and ripe for abuse.
> I wholeheartedly agree with gregkh that the CVE system is flawed in
> numerous ways and generates many false positives, however in this case,
> this issue is a real issue with real security impact that received a
> real patch upstream. The idea of backporting this to earlier kernels
> probably warrants discussion if someone is willing to bite off that
> work. But this gets into the fuzzy "backporting exploit mitigations to
> stable" conversation.
>
> > And are you sure this can really happen? Have you proven this?
> Yes this can really, provably, happen:
> https://googleprojectzero.blogspot.com/2022/12/exploiting-CVE-2022-42703-br…
> <https://googleprojectzero.blogspot.com/2022/12/exploiting-CVE-2022-42703-br…>
>
> > And why is this really an issue, KAS[L]R is a known-weak-defense and
> almost
> > useless against local attacks.
> Granted, Peter Z's fix does help to mitigate the impact from remote
> attackers but yes this fix does not resolve the security impact from
> local exploits.
Thank you seth.
Yes, for this specific CVE issue, KASLR is indeed a mitigation measure.
>
>
> On Tue, Feb 21, 2023 at 7:30 AM Tong Tiangen <tongtiangen(a)huawei.com
> <mailto:tongtiangen@huawei.com>> wrote:
>
>
>
> 在 2023/2/21 18:40, Greg KH 写道:
> > On Tue, Feb 21, 2023 at 05:19:42PM +0800, Tong Tiangen wrote:
> >>
> >>
> >> 在 2023/2/21 16:40, Greg KH 写道:
> >>> On Tue, Feb 21, 2023 at 03:46:27PM +0800, Tong Tiangen wrote:
> >>>>
> >>>>
> >>>> 在 2023/2/21 15:30, Greg KH 写道:
> >>>>> On Tue, Feb 21, 2023 at 03:19:05PM +0800, Tong Tiangen wrote:
> >>>>>> Hi peter:
> >>>>>>
> >>>>>> Do you have any plans to backport this patch[1] to the
> stable branch of the
> >>>>>> lower version, such as 4.19.y ?
> >>>>>
> >>>>> Why? That is a new feature for 6.2 why would it be needed to fix
> >>>>> anything in really old kernels?
> >>>>
> >>>> Hi Greg:
> >>>>
> >>>> This patch fix CVE-2023-0597[1],
> >>>
> >>> The kernel developers do not care about CVEs as they are almost
> always
> >>> invalid and do not mean anything,
> >>
> >> Ok, thanks.
> >>
> >>
> >>> sorry. It is well known that, companies like Red Hat use them
> to make
> >>> up for broken internal engineering policies.
> >>
> >> Yeah, For company's internal engineering policies, the CVE with
> certain
> >> impact must be repaired.
> >
> > So you are letting an opaque US government agency, and random third
> > party companies, dictate your company's internal engineering policies
> > and resource allocations? That feels very very odd and ripe for
> abuse.
> >
> > Also note that MITRE refuses to allocate CVEs for many real kernel
> > issues for unknown reasons, (i.e. they reject all of my requests), so
> > you are getting only a small subset of real issues here.
> >
> > Also, how do you handle revocation of CVEs that are obviously invalid
> > and/or don't actually do anything (like this one?)
> >
> >>> Are you sure this really is a valid problem that must be fixed
> in older
> >>> kernels?
> >>>
> >>>> this CVE report a flaw possibility of memory leak. And this is
> >>>> important for some products using this stable version.
> >>>
> >>> What exact memory leak are you referring to?
> >>
> >> Sorry for Inaccurate description, the memory leak means: a potential
> >> security risk of kernel memory information disclosure caused by no
> >> randomization of the exception stacks.
> >
> > And are you sure this can really happen? Have you proven this?
> >
> > And why is this really an issue, KASR is a known-week-defense and
> almost
> > useless against local attacks.
> >
> > Anyway, please provide working patches if you think this really is an
> > issue.
> >
> > And please revisit your company's policies, they do not seem very
> sane :)
>
> Hi Greg:
>
> Thanks for these very useful suggestions and we will revisit our
> policies :)
>
> Thanks,
> Tong
> .
>
> >
> > thanks,
> >
> > greg k-h
> > .
>
Bom dia stable(a)vger.kernel.org,
Sou o advogado Mustafa Ayvaz, advogado pessoal do falecido Sr.
Robert, que perdeu a vida devido ao coronavírus, contraído
durante sua viagem de negócios à China. Entrei em contato com
você para trabalhar comigo na transferência de um fundo:
$4,420,000.00 (quatro milhões, quatrocentos e vinte mil dólares)
legado deixado por ele.
Procurei minuciosamente o parente mais próximo de meu cliente
falecido, mas falhei porque não tenho sua residência atual e
detalhes de contato. Enquanto pesquisava, encontrei seu perfil
com o mesmo sobrenome e na mesma localidade com o parente mais
próximo. Decidi entrar em contato com você e usá-lo como parente
genuíno.
Solicito seu consentimento para apresentá-lo como parente mais
próximo de meu falecido cliente, já que ambos têm o mesmo
sobrenome. Os fundos serão então transferidos para você como
beneficiário e compartilhados de acordo com um padrão/proporção
de compartilhamento proposto de 60:40, ou seja, 60% para mim e
40% para você. Por favor, entre em contato comigo imediatamente
para obter mais informações.
Cumprimentos
Mustafá Ayvaz
The riscv defconfig and tinyconfig builds failed with clang-nightly
due to below build warnings / errors on latest stable-rc 5.10.
Regression on riscv:
- build/clang-nightly-tinyconfig - FAILED
- build/clang-nightly-defconfig - FAILED
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
metadata:
-----
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git_sha: 7d11e4c4fc56eb25c5b41da93748dbcf21956316
git_short_log: 7d11e4c4fc56 ("Linux 5.10.169-rc1")
git_describe: v5.10.168-58-g7d11e4c4fc56
Build log:
----
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/build ARCH=riscv
CROSS_COMPILE=riscv64-linux-gnu- HOSTCC=clang CC=clang LLVM=1
LLVM_IAS=1 LD=riscv64-linux-gnu-ld
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/version.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/do_mounts.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/noinitramfs.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/calibrate.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
Build details,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10…
Build history,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10…https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10…
steps to reproduce:
-----------
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.
# Original tuxmake command with fragments listed below.
# tuxmake --runtime podman --target-arch riscv --toolchain
clang-nightly --kconfig tinyconfig LLVM=1 LLVM_IAS=1
LD=riscv64-linux-gnu-ld
--
Linaro LKFT
https://lkft.linaro.org
Beste gebruiker,
Hallo mijn gelukkige vriend, ik ben Charles W. Jackson Jr., de
megawinnaar van $ 344,6 miljoen Mega Millions Jackpot, ik doneer aan 5
willekeurige individuen, als je deze e-mail ontvangt, is je e-mail
geselecteerd na een draaibal. Ik heb het grootste deel van mijn vermogen
verdeeld over een aantal goede doelen en organisaties. Ik heb vrijwillig
besloten om de som van $ 3 miljoen USD aan jou of je organisatie te
doneren als een van de geselecteerde 5, je kunt mijn winst verifiëren
via de onderstaande YouTube-pagina.
BEKIJK MIJ HIER: https://www.youtube.com/watch?v=0MUR8QEIMQI
Deze donatie van $ 3 miljoen USD is gedaan om u in staat te stellen uw
persoonlijke problemen te versterken en genereus de hand te reiken aan
de minst bevoorrechte, verweesde en liefdadigheidsorganisaties in uw
gemeenschap.
Wat voor mij het belangrijkste is, is dat u de gedoneerde fondsen op de
beste manier toewijst die u, uw familie, vrienden ten goede komt en om
de behoeftigen in uw directe gemeenschap te helpen.
DIT IS JE DONATIECODE: DON207152
Stuur uw DONATIECODE zo snel mogelijk naar onderstaand e-mailadres,
zodat we de donatieprocedure snel kunnen afronden.
E-mail contact: charlesjacksonj1(a)hotmail.com
charlesjackson06(a)bahnhof.se
Ik hoop u en uw gezin dit jaar gelukkig te maken, ik wens u het beste en
gefeliciteerd.
Groeten,
Charles W.Jackson Jr
Summary:
--------
CXL RAM support allows for the dynamic provisioning of new CXL RAM
regions, and more routinely, assembling a region from an existing
configuration established by platform-firmware. The latter is motivated
by CXL memory RAS (Reliability, Availability and Serviceability)
support, that requires associating device events with System Physical
Address ranges and vice versa.
The 'Soft Reserved' policy rework arranges for performance
differentiated memory like CXL attached DRAM, or high-bandwidth memory,
to be designated for 'System RAM' by default, rather than the device-dax
dedicated access mode. That current device-dax default is confusing and
surprising for the Pareto of users that do not expect memory to be
quarantined for dedicated access by default. Most users expect all
'System RAM'-capable memory to show up in FREE(1).
Details:
--------
Recall that the Linux 'Soft Reserved' designation for memory is a
reaction to platform-firmware, like EFI EDK2, delineating memory with
the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
alternative way to think of that attribute is that it specifies the
*not* general-purpose memory pool. It is memory that may be too precious
for general usage or not performant enough for some hot data structures.
However, in the absence of explicit policy it should just be 'System
RAM' by default.
Rather than require every distribution to ship a udev policy to assign
dax devices to dax_kmem (the device-memory hotplug driver) just make
that the kernel default. This is similar to the rationale in:
commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")
With this change the relatively niche use case of accessing this memory
via mapping a device-dax instance can be achieved by building with
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
memhp_default_state=offline at boot, and then use:
daxctl reconfigure-device $device -m devdax --force
...to shift the corresponding address range to device-dax access.
The process of assembling a device-dax instance for a given CXL region
device configuration is similar to the process of assembling a
Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
probing by the PCI and driver core enumerates all CXL endpoints and
their decoders. Then, once enough decoders have arrived to a describe a
given region, that region is passed to the device-dax subsystem where it
is subject to the above 'dax_kmem' policy. This assignment and policy
choice is only possible if memory is set aside by the 'Soft Reserved'
designation. Otherwise, CXL that is mapped as 'System RAM' becomes
immutable by CXL driver mechanisms, but is still enumerated for RAS
purposes.
This series is also available via:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/…
...and has gone through some preview testing in various forms.
---
Dan Williams (18):
cxl/Documentation: Update references to attributes added in v6.0
cxl/region: Add a mode attribute for regions
cxl/region: Support empty uuids for non-pmem regions
cxl/region: Validate region mode vs decoder mode
cxl/region: Add volatile region creation support
cxl/region: Refactor attach_target() for autodiscovery
cxl/region: Move region-position validation to a helper
kernel/range: Uplevel the cxl subsystem's range_contains() helper
cxl/region: Enable CONFIG_CXL_REGION to be toggled
cxl/region: Fix passthrough-decoder detection
cxl/region: Add region autodiscovery
tools/testing/cxl: Define a fixed volatile configuration to parse
dax/hmem: Move HMAT and Soft reservation probe initcall level
dax/hmem: Drop unnecessary dax_hmem_remove()
dax/hmem: Convey the dax range via memregion_info()
dax/hmem: Move hmem device registration to dax_hmem.ko
dax: Assign RAM regions to memory-hotplug by default
cxl/dax: Create dax devices for CXL RAM regions
Documentation/ABI/testing/sysfs-bus-cxl | 64 +-
MAINTAINERS | 1
drivers/acpi/numa/hmat.c | 4
drivers/cxl/Kconfig | 12
drivers/cxl/acpi.c | 3
drivers/cxl/core/core.h | 7
drivers/cxl/core/hdm.c | 8
drivers/cxl/core/pci.c | 5
drivers/cxl/core/port.c | 34 +
drivers/cxl/core/region.c | 848 ++++++++++++++++++++++++++++---
drivers/cxl/cxl.h | 46 ++
drivers/cxl/cxlmem.h | 3
drivers/cxl/port.c | 26 +
drivers/dax/Kconfig | 17 +
drivers/dax/Makefile | 2
drivers/dax/bus.c | 53 +-
drivers/dax/bus.h | 12
drivers/dax/cxl.c | 53 ++
drivers/dax/device.c | 3
drivers/dax/hmem/Makefile | 3
drivers/dax/hmem/device.c | 102 ++--
drivers/dax/hmem/hmem.c | 148 +++++
drivers/dax/kmem.c | 1
include/linux/dax.h | 7
include/linux/memregion.h | 2
include/linux/range.h | 5
lib/stackinit_kunit.c | 6
tools/testing/cxl/test/cxl.c | 146 +++++
28 files changed, 1355 insertions(+), 266 deletions(-)
create mode 100644 drivers/dax/cxl.c
base-commit: 172738bbccdb4ef76bdd72fc72a315c741c39161
A recent commit moved enabling of runtime PM from adreno_gpu_init() to
adreno_load_gpu() (called on first open()), which means that unbind()
may now be called with runtime PM disabled in case the device was never
opened in between.
Make sure to only forcibly suspend and disable runtime PM at unbind() in
case runtime PM has been enabled to prevent a disable count imbalance.
This specifically avoids leaving runtime PM disabled when the device
is later opened after a successful bind:
msm_dpu ae01000.display-controller: [drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -13
Fixes: 4b18299b3365 ("drm/msm/adreno: Defer enabling runpm until hw_init()")
Reported-by: Bjorn Andersson <quic_bjorande(a)quicinc.com>
Link: https://lore.kernel.org/lkml/20230203181245.3523937-1-quic_bjorande@quicinc…
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/gpu/drm/msm/adreno/adreno_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 36f062c7582f..c5c4c93b3689 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -558,7 +558,8 @@ static void adreno_unbind(struct device *dev, struct device *master,
struct msm_drm_private *priv = dev_get_drvdata(master);
struct msm_gpu *gpu = dev_to_gpu(dev);
- WARN_ON_ONCE(adreno_system_suspend(dev));
+ if (pm_runtime_enabled(dev))
+ WARN_ON_ONCE(adreno_system_suspend(dev));
gpu->funcs->destroy(gpu);
priv->gpu_pdev = NULL;
--
2.39.2
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 0af2795f936f1ea1f9f1497447145dfcc7ed2823
Gitweb: https://git.kernel.org/tip/0af2795f936f1ea1f9f1497447145dfcc7ed2823
Author: Marc Zyngier <maz(a)kernel.org>
AuthorDate: Mon, 20 Feb 2023 19:01:01
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 20 Feb 2023 22:29:54 +01:00
genirq/msi: Take the per-device MSI lock before validating the control structure
Calling msi_ctrl_valid() ultimately results in calling
msi_get_device_domain(), which requires holding the device MSI lock.
However, in msi_domain_populate_irqs() the lock is taken right after having
called msi_ctrl_valid(), which is just a tad too late.
Take the lock before invoking msi_ctrl_valid().
Fixes: 40742716f294 ("genirq/msi: Make msi_add_simple_msi_descs() device domain aware")
Reported-by: "Russell King (Oracle)" <linux(a)armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/Y/Opu6ETe3ZzZ/8E@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20230220190101.314446-1-maz@kernel.org
---
kernel/irq/msi.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 783a3e6..13d9649 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1084,10 +1084,13 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
struct xarray *xa;
int ret, virq;
- if (!msi_ctrl_valid(dev, &ctrl))
- return -EINVAL;
-
msi_lock_descs(dev);
+
+ if (!msi_ctrl_valid(dev, &ctrl)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
ret = msi_domain_add_simple_msi_descs(dev, &ctrl);
if (ret)
goto unlock;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c06ba7b892a5 ("nvme-apple: reset controller during shutdown")
285b6e9b5717 ("nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl")
e6d275de2e4a ("nvme: use nvme_wait_ready in nvme_shutdown_ctrl")
c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable")
9f27bd701d18 ("nvme: rename the queue quiescing helpers")
91c11d5f3254 ("nvme-rdma: stop auth work after tearing down queues in error recovery")
1f1a4f89562d ("nvme-tcp: stop auth work after tearing down queues in error recovery")
eac3ef262941 ("nvme-pci: split the initial probe from the rest path")
a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable")
3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
2e87570be9d2 ("nvme-pci: factor out a nvme_pci_alloc_dev helper")
081a7d958ce4 ("nvme-pci: factor the iod mempool creation into a helper")
94cc781f69f4 ("nvme: move OPAL setup from PCIe to core")
cd50f9b24726 ("nvme: split nvme_kill_queues")
6bcd5089ee13 ("nvme: don't unquiesce the admin queue in nvme_kill_queues")
0ffc7e98bfaa ("nvme-pci: refactor the tagset handling in nvme_reset_work")
71b26083d59c ("block: set the disk capacity to 0 in blk_mark_disk_dead")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c06ba7b892a50b48522ad441a40053f483dfee9e Mon Sep 17 00:00:00 2001
From: Janne Grunau <j(a)jannau.net>
Date: Tue, 17 Jan 2023 19:25:00 +0100
Subject: [PATCH] nvme-apple: reset controller during shutdown
This is a functional revert of c76b8308e4c9 ("nvme-apple: fix controller
shutdown in apple_nvme_disable").
The commit broke suspend/resume since apple_nvme_reset_work() tries to
disable the controller on resume. This does not work for the apple NVMe
controller since register access only works while the co-processor
firmware is running.
Disabling the NVMe controller in the shutdown path is also required
for shutting the co-processor down. The original code was appropriate
for this hardware. Add a comment to prevent a similar breaking changes
in the future.
Fixes: c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable")
Reported-by: Janne Grunau <j(a)jannau.net>
Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/
Signed-off-by: Janne Grunau <j(a)jannau.net>
[hch: updated with a more descriptive comment from Hector Martin]
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index bf1c60edb7f9..146c9e63ce77 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -829,7 +829,23 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
apple_nvme_remove_cq(anv);
}
- nvme_disable_ctrl(&anv->ctrl, shutdown);
+ /*
+ * Always disable the NVMe controller after shutdown.
+ * We need to do this to bring it back up later anyway, and we
+ * can't do it while the firmware is not running (e.g. in the
+ * resume reset path before RTKit is initialized), so for Apple
+ * controllers it makes sense to unconditionally do it here.
+ * Additionally, this sequence of events is reliable, while
+ * others (like disabling after bringing back the firmware on
+ * resume) seem to run into trouble under some circumstances.
+ *
+ * Both U-Boot and m1n1 also use this convention (i.e. an ANS
+ * NVMe controller is handed off with firmware shut down, in an
+ * NVMe disabled state, after a clean shutdown).
+ */
+ if (shutdown)
+ nvme_disable_ctrl(&anv->ctrl, shutdown);
+ nvme_disable_ctrl(&anv->ctrl, false);
}
WRITE_ONCE(anv->ioq.enabled, false);
Current btrfs_log_dev_io_error() increases the read error count even if the
erroneous IO is a WRITE request. This is because it forget to use "else
if", and all the error WRITE requests counts as READ error as there is (of
course) no REQ_RAHEAD bit set.
Fixes: c3a62baf21ad ("btrfs: use chained bios when cloning")
CC: stable(a)vger.kernel.org # 6.1
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
---
fs/btrfs/bio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index d8b90f95b157..726592868e9c 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -287,7 +287,7 @@ static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
if (btrfs_op(bio) == BTRFS_MAP_WRITE)
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
- if (!(bio->bi_opf & REQ_RAHEAD))
+ else if (!(bio->bi_opf & REQ_RAHEAD))
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS);
if (bio->bi_opf & REQ_PREFLUSH)
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_FLUSH_ERRS);
--
2.39.1
Calling msi_ctrl_valid() ultimately results in calling
msi_get_device_domain(), which requires holding the device MSI lock.
However, we take that lock right after having called msi_ctrl_valid(),
which is just a tad too late. Taking the lock earlier solves the issue.
Fixes: 40742716f294 ("genirq/msi: Make msi_add_simple_msi_descs() device domain aware")
Reported-by: "Russell King (Oracle)" <linux(a)armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/Y/Opu6ETe3ZzZ/8E@shell.armlinux.org.uk
---
kernel/irq/msi.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 783a3e6a0b10..13d96495e6d0 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1084,10 +1084,13 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
struct xarray *xa;
int ret, virq;
- if (!msi_ctrl_valid(dev, &ctrl))
- return -EINVAL;
-
msi_lock_descs(dev);
+
+ if (!msi_ctrl_valid(dev, &ctrl)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
ret = msi_domain_add_simple_msi_descs(dev, &ctrl);
if (ret)
goto unlock;
--
2.34.1
From: Yu Kuai <yukuai3(a)huawei.com>
commit 940c264984fd1457918393c49674f6b39ee16506 upstream.
If 'part_shift' is not zero, then 'index << part_shift' might
overflow to a value that is not greater than '0xfffff', then sysfs
might complains about duplicate creation.
Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices")
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-3-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Wen Yang <wenyang.linux(a)foxmail.com>
---
drivers/block/nbd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index bd05eaf04143..1ddae8f768d5 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1773,11 +1773,11 @@ static int nbd_dev_add(int index)
disk->major = NBD_MAJOR;
/* Too big first_minor can cause duplicate creation of
- * sysfs files/links, since MKDEV() expect that the max bits of
- * first_minor is 20.
+ * sysfs files/links, since index << part_shift might overflow, or
+ * MKDEV() expect that the max bits of first_minor is 20.
*/
disk->first_minor = index << part_shift;
- if (disk->first_minor > MINORMASK) {
+ if (disk->first_minor < index || disk->first_minor > MINORMASK) {
err = -EINVAL;
goto out_free_idr;
}
--
2.37.2
From: Wen Yang <wenyang.linux(a)foxmail.com>
This reverts commit 0daa75bf750c400af0a0127fae37cd959d36dee7.
These problems such as:
https://lore.kernel.org/all/CACPK8XfUWoOHr-0RwRoYoskia4fbAbZ7DYf5wWBnv6qUnG…
It was introduced by introduced by commit b1a811633f73 ("block: nbd: add sanity check for first_minor")
and has been have been fixed by commit e4c4871a7394 ("nbd: fix max value for 'first_minor'").
Cc: Joel Stanley <joel(a)jms.id.au>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Pavel Skripkin <paskripkin(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Wen Yang <wenyang.linux(a)foxmail.com>
---
drivers/block/nbd.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b0d3dadeb964..bf8148ebd858 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1771,7 +1771,17 @@ static int nbd_dev_add(int index)
refcount_set(&nbd->refs, 1);
INIT_LIST_HEAD(&nbd->list);
disk->major = NBD_MAJOR;
+
+ /* Too big first_minor can cause duplicate creation of
+ * sysfs files/links, since first_minor will be truncated to
+ * byte in __device_add_disk().
+ */
disk->first_minor = index << part_shift;
+ if (disk->first_minor > 0xff) {
+ err = -EINVAL;
+ goto out_free_idr;
+ }
+
disk->fops = &nbd_fops;
disk->private_data = nbd;
sprintf(disk->disk_name, "nbd%d", index);
--
2.37.2
With the introduction of KERNEL_IBRS, STIBP is no longer needed
to prevent cross thread training in the kernel space. When KERNEL_IBRS
was added, it also disabled the user-mode protections for spectre_v2.
KERNEL_IBRS does not mitigate cross thread training in the userspace.
In order to demonstrate the issue, one needs to avoid syscalls in the
victim as syscalls can shorten the window size due to
a user -> kernel -> user transition which sets the
IBRS bit when entering kernel space and clearing any training the
attacker may have done.
Allow users to select a spectre_v2_user mitigation (STIBP always on,
opt-in via prctl) when KERNEL_IBRS is enabled.
Reported-by: José Oliveira <joseloliveira11(a)gmail.com>
Reported-by: Rodrigo Branco <rodrigo(a)kernelhacking.com>
Reviewed-by: Alexandra Sandulescu <aesa(a)google.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Fixes: 7c693f54c873 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Cc: stable(a)vger.kernel.org
Signed-off-by: KP Singh <kpsingh(a)kernel.org>
---
arch/x86/kernel/cpu/bugs.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bca0bd8f4846..b05ca1575d81 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1132,6 +1132,19 @@ static inline bool spectre_v2_in_ibrs_mode(enum spectre_v2_mitigation mode)
mode == SPECTRE_V2_EIBRS_LFENCE;
}
+static inline bool spectre_v2_user_no_stibp(enum spectre_v2_mitigation mode)
+{
+ /* When IBRS or enhanced IBRS is enabled, STIBP is not needed.
+ *
+ * However, With KERNEL_IBRS, the IBRS bit is cleared on return
+ * to user and the user-mode code needs to be able to enable protection
+ * from cross-thread training, either by always enabling STIBP or
+ * by enabling it via prctl.
+ */
+ return (spectre_v2_in_ibrs_mode(mode) &&
+ !cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS));
+}
+
static void __init
spectre_v2_user_select_mitigation(void)
{
@@ -1193,13 +1206,8 @@ spectre_v2_user_select_mitigation(void)
"always-on" : "conditional");
}
- /*
- * If no STIBP, IBRS or enhanced IBRS is enabled, or SMT impossible,
- * STIBP is not required.
- */
- if (!boot_cpu_has(X86_FEATURE_STIBP) ||
- !smt_possible ||
- spectre_v2_in_ibrs_mode(spectre_v2_enabled))
+ if (!boot_cpu_has(X86_FEATURE_STIBP) || !smt_possible ||
+ spectre_v2_user_no_stibp(spectre_v2_enabled))
return;
/*
@@ -1496,6 +1504,7 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_IBRS:
+ pr_err("enabling KERNEL_IBRS");
setup_force_cpu_cap(X86_FEATURE_KERNEL_IBRS);
if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED))
pr_warn(SPECTRE_V2_IBRS_PERF_MSG);
@@ -2327,7 +2336,7 @@ static ssize_t mmio_stale_data_show_state(char *buf)
static char *stibp_state(void)
{
- if (spectre_v2_in_ibrs_mode(spectre_v2_enabled))
+ if (spectre_v2_user_no_stibp(spectre_v2_enabled))
return "";
switch (spectre_v2_user_stibp) {
--
2.39.2.637.g21b0678d19-goog
This is a request for adding the following patches to stable 5.10.y.
Poisoned shmem and hugetlb pages are removed from the pagecache.
Subsequent access to the offset in the file results in a NEW zero
filled page. Application code does not get notified of the data
loss, and the only 'clue' is a message in the system log. Data
loss has been experienced by real users.
This was addressed upstream. Most commits were marked for backports,
but some were not. This was discussed here [1] and here [2].
Patches apply cleanly to v5.4.224 and pass tests checking for this
specific data loss issue. LTP mm tests show no regressions.
All patches except 4 "mm: hwpoison: handle non-anonymous THP correctly"
required a small bit of change to apply correctly: mostly for context.
linux-mm Cc'ed as it would be great to get at least an ACK from others
familiar with this issue.
[1] https://lore.kernel.org/linux-mm/Y2UTUNBHVY5U9si2@monkey/
[2] https://lore.kernel.org/stable/20221114131403.GA3807058@u2004/
James Houghton (1):
hugetlbfs: don't delete error page from pagecache
Yang Shi (5):
mm: hwpoison: remove the unnecessary THP check
mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
mm: hwpoison: refactor refcount check handling
mm: hwpoison: handle non-anonymous THP correctly
mm: shmem: don't truncate page if memory failure happens
fs/hugetlbfs/inode.c | 13 ++--
include/linux/page-flags.h | 23 ++++++
mm/huge_memory.c | 2 +
mm/hugetlb.c | 4 +
mm/memory-failure.c | 153 ++++++++++++++++++++++++-------------
mm/memory.c | 9 +++
mm/page_alloc.c | 4 +-
mm/shmem.c | 51 +++++++++++--
8 files changed, 191 insertions(+), 68 deletions(-)
--
2.38.1
[ Upstream commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d ]
There were just too many code changes since 5.4 stable to prevent clean
picking the fix from mainline.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Cc: stable(a)vger.kernel.org # 5.4.x
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index b3cdd10c83ae..c30e450fa970 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -211,6 +211,10 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
int stream_tag;
int ret;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
+ if (!link)
+ return -EINVAL;
+
/* get stored dma data if resuming from system suspend */
link_dev = snd_soc_dai_get_dma_data(dai, substream);
if (!link_dev) {
@@ -231,10 +235,6 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
if (ret < 0)
return ret;
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
- if (!link)
- return -EINVAL;
-
/* set the stream tag in the codec dai dma params */
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
snd_soc_dai_set_tdm_slot(codec_dai, stream_tag, 0, 0, 0);
--
2.39.2
[ Upstream commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d ]
There were just too many code changes since 5.10/5.15 stable to prevent
clean picking the fix from mainline.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Cc: stable(a)vger.kernel.org # 5.15.x 5.10.x
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index de80f1b3d7f2..a6275cc92a40 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -212,6 +212,10 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
int stream_tag;
int ret;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
+ if (!link)
+ return -EINVAL;
+
/* get stored dma data if resuming from system suspend */
link_dev = snd_soc_dai_get_dma_data(dai, substream);
if (!link_dev) {
@@ -232,10 +236,6 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
if (ret < 0)
return ret;
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
- if (!link)
- return -EINVAL;
-
/* set the hdac_stream in the codec dai */
snd_soc_dai_set_stream(codec_dai, hdac_stream(link_dev), substream->stream);
--
2.39.2
[ Upstream commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d ]
The reason for manual backport is that in 6.2 there were a naming cleanup
around hdac, link, hlink, for example snd_hdac_ext_bus_get_link is renamed
to snd_hdac_ext_bus_get_hlink_by_name in 6.2.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Cc: stable(a)vger.kernel.org # 6.1.x 6.0.x
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index 556e883a32ed..5f03ee390d54 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -216,6 +216,10 @@ static int hda_link_dma_hw_params(struct snd_pcm_substream *substream,
struct hdac_bus *bus = hstream->bus;
struct hdac_ext_link *link;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
+ if (!link)
+ return -EINVAL;
+
hext_stream = snd_soc_dai_get_dma_data(cpu_dai, substream);
if (!hext_stream) {
hext_stream = hda_link_stream_assign(bus, substream);
@@ -225,10 +229,6 @@ static int hda_link_dma_hw_params(struct snd_pcm_substream *substream,
snd_soc_dai_set_dma_data(cpu_dai, substream, (void *)hext_stream);
}
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
- if (!link)
- return -EINVAL;
-
/* set the hdac_stream in the codec dai */
snd_soc_dai_set_stream(codec_dai, hdac_stream(hext_stream), substream->stream);
--
2.39.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d125d1349abe ("alarmtimer: Prevent starvation by small intervals and SIG_IGN")
41b3b8dffc1f ("alarmtimer: Rename gettime() callback to get_ktime()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d125d1349abeb46945dc5e98f7824bf688266f13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 9 Feb 2023 23:25:49 +0100
Subject: [PATCH] alarmtimer: Prevent starvation by small intervals and SIG_IGN
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b(a)syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828b9d7e..7e5dff602585 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -470,11 +470,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
}
EXPORT_SYMBOL_GPL(alarm_forward);
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
+ ktime_t now = base->get_ktime();
+
+ if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+ /*
+ * Same issue as with posix_timer_fn(). Timers which are
+ * periodic but the signal is ignored can starve the system
+ * with a very small interval. The real fix which was
+ * promised in the context of posix_timer_fn() never
+ * materialized, but someone should really work on it.
+ *
+ * To prevent DOS fake @now to be 1 jiffie out which keeps
+ * the overrun accounting correct but creates an
+ * inconsistency vs. timer_gettime(2).
+ */
+ ktime_t kj = NSEC_PER_SEC / HZ;
+
+ if (interval < kj)
+ now = ktime_add(now, kj);
+ }
+
+ return alarm_forward(alarm, now, interval);
+}
- return alarm_forward(alarm, base->get_ktime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+ return __alarm_forward_now(alarm, interval, false);
}
EXPORT_SYMBOL_GPL(alarm_forward_now);
@@ -551,9 +575,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
- * away once we handle ignored signals proper.
+ * away once we handle ignored signals proper. Ensure that
+ * small intervals cannot starve the system.
*/
- ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+ ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d125d1349abe ("alarmtimer: Prevent starvation by small intervals and SIG_IGN")
41b3b8dffc1f ("alarmtimer: Rename gettime() callback to get_ktime()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d125d1349abeb46945dc5e98f7824bf688266f13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 9 Feb 2023 23:25:49 +0100
Subject: [PATCH] alarmtimer: Prevent starvation by small intervals and SIG_IGN
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b(a)syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828b9d7e..7e5dff602585 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -470,11 +470,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
}
EXPORT_SYMBOL_GPL(alarm_forward);
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
+ ktime_t now = base->get_ktime();
+
+ if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+ /*
+ * Same issue as with posix_timer_fn(). Timers which are
+ * periodic but the signal is ignored can starve the system
+ * with a very small interval. The real fix which was
+ * promised in the context of posix_timer_fn() never
+ * materialized, but someone should really work on it.
+ *
+ * To prevent DOS fake @now to be 1 jiffie out which keeps
+ * the overrun accounting correct but creates an
+ * inconsistency vs. timer_gettime(2).
+ */
+ ktime_t kj = NSEC_PER_SEC / HZ;
+
+ if (interval < kj)
+ now = ktime_add(now, kj);
+ }
+
+ return alarm_forward(alarm, now, interval);
+}
- return alarm_forward(alarm, base->get_ktime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+ return __alarm_forward_now(alarm, interval, false);
}
EXPORT_SYMBOL_GPL(alarm_forward_now);
@@ -551,9 +575,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
- * away once we handle ignored signals proper.
+ * away once we handle ignored signals proper. Ensure that
+ * small intervals cannot starve the system.
*/
- ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+ ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
While checking Pin Assignments of the port and partner during probe, we
don't take into account whether the peripheral is a plug or receptacle.
This manifests itself in a mode entry failure on certain docks and
dongles with captive cables. For instance, the Startech.com Type-C to DP
dongle (Model #CDP2DP) advertises its DP VDO as 0x405. This would fail
the Pin Assignment compatibility check, despite it supporting
Pin Assignment C as a UFP.
Update the check to use the correct DP Pin Assign macros that
take the peripheral's receptacle bit into account.
Fixes: c1e5c2f0cb8a ("usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles")
Cc: stable(a)vger.kernel.org
Reported-by: Diana Zigterman <dzigterman(a)chromium.org>
Signed-off-by: Prashant Malani <pmalani(a)chromium.org>
---
I realize this is a bit late in the release cycle, but figured since it
is a fix it might still be considered. Please let me know if it's too
late and I can re-send this after the 6.3-rc1 is released. Thanks!
drivers/usb/typec/altmodes/displayport.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/altmodes/displayport.c b/drivers/usb/typec/altmodes/displayport.c
index 20db51471c98..662cd043b50e 100644
--- a/drivers/usb/typec/altmodes/displayport.c
+++ b/drivers/usb/typec/altmodes/displayport.c
@@ -547,10 +547,10 @@ int dp_altmode_probe(struct typec_altmode *alt)
/* FIXME: Port can only be DFP_U. */
/* Make sure we have compatiple pin configurations */
- if (!(DP_CAP_DFP_D_PIN_ASSIGN(port->vdo) &
- DP_CAP_UFP_D_PIN_ASSIGN(alt->vdo)) &&
- !(DP_CAP_UFP_D_PIN_ASSIGN(port->vdo) &
- DP_CAP_DFP_D_PIN_ASSIGN(alt->vdo)))
+ if (!(DP_CAP_PIN_ASSIGN_DFP_D(port->vdo) &
+ DP_CAP_PIN_ASSIGN_UFP_D(alt->vdo)) &&
+ !(DP_CAP_PIN_ASSIGN_UFP_D(port->vdo) &
+ DP_CAP_PIN_ASSIGN_DFP_D(alt->vdo)))
return -ENODEV;
ret = sysfs_create_group(&alt->dev.kobj, &dp_altmode_group);
--
2.39.1.519.gcb327c4b5f-goog
On default driver load device gets configured with unexpected
higher interrupt coalescing values instead of default expected
values as memory allocated from krealloc() is not supposed to
be zeroed out and may contain garbage values.
Fix this by allocating the memory of required size first with
kcalloc() and then use krealloc() to resize and preserve the
contents across down/up of the interface.
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Fixes: b0ec5489c480 ("qede: preserve per queue stats across up/down of interface")
Cc: stable(a)vger.kernel.org
Cc: Bhaskar Upadhaya <bupadhaya(a)marvell.com>
Cc: David S. Miller <davem(a)davemloft.net>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2160054
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Signed-off-by: Ariel Elior <aelior(a)marvell.com>
---
drivers/net/ethernet/qlogic/qede/qede_main.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 953f304b8588..af39513db1ba 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -970,8 +970,15 @@ static int qede_alloc_fp_array(struct qede_dev *edev)
goto err;
}
- mem = krealloc(edev->coal_entry, QEDE_QUEUE_CNT(edev) *
- sizeof(*edev->coal_entry), GFP_KERNEL);
+ if (!edev->coal_entry) {
+ mem = kcalloc(QEDE_MAX_RSS_CNT(edev),
+ sizeof(*edev->coal_entry), GFP_KERNEL);
+ } else {
+ mem = krealloc(edev->coal_entry,
+ QEDE_QUEUE_CNT(edev) * sizeof(*edev->coal_entry),
+ GFP_KERNEL);
+ }
+
if (!mem) {
DP_ERR(edev, "coalesce entry allocation failed\n");
kfree(edev->coal_entry);
--
2.27.0
While experimenting with CXL region removal the following corruption of
/proc/iomem appeared.
Before:
f010000000-f04fffffff : CXL Window 0
f010000000-f02fffffff : region4
f010000000-f02fffffff : dax4.0
f010000000-f02fffffff : System RAM (kmem)
After (modprobe -r cxl_test):
f010000000-f02fffffff : **redacted binary garbage**
f010000000-f02fffffff : System RAM (kmem)
...and testing further the same is visible with persistent memory
assigned to kmem:
Before:
480000000-243fffffff : Persistent Memory
480000000-57e1fffff : namespace3.0
580000000-243fffffff : dax3.0
580000000-243fffffff : System RAM (kmem)
After (ndctl disable-region all):
480000000-243fffffff : Persistent Memory
580000000-243fffffff : ***redacted binary garbage***
580000000-243fffffff : System RAM (kmem)
The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0"
resources, and it also shows that the "System RAM (kmem)" resource is
not being removed. The bug does not appear after "modprobe -r kmem", it
requires the parent of "dax4.0" and "dax3.0" to be removed which
re-parents the leaked "System RAM (kmem)" instances. Those in turn
reference the freed resource as a parent.
First up for the fix is release_mem_region_adjustable() needs to
reliably delete the resource inserted by add_memory_driver_managed().
That is thwarted by a check for IORESOURCE_SYSRAM that predates the
dax/kmem driver, from commit:
65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable")
That appears to be working around the behavior of HMM's
"MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that
check removed the "System RAM (kmem)" resource gets removed, but
corruption still occurs occasionally because the "dax" resource is not
reliably removed.
The dax range information is freed before the device is unregistered, so
the driver can not reliably recall (another use after free) what it is
meant to release. Lastly if that use after free got lucky, the driver
was covering up the leak of "System RAM (kmem)" due to its use of
release_resource() which detaches, but does not free, child resources.
The switch to remove_resource() forces remove_memory() to be responsible
for the deletion of the resource added by add_memory_driver_managed().
Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances")
Cc: <stable(a)vger.kernel.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/dax/bus.c | 2 +-
drivers/dax/kmem.c | 4 ++--
kernel/resource.c | 14 --------------
3 files changed, 3 insertions(+), 17 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 012d576004e9..67a64f4c472d 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -441,8 +441,8 @@ static void unregister_dev_dax(void *dev)
dev_dbg(dev, "%s\n", __func__);
kill_dev_dax(dev_dax);
- free_dev_dax_ranges(dev_dax);
device_del(dev);
+ free_dev_dax_ranges(dev_dax);
put_device(dev);
}
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 918d01d3fbaa..7b36db6f1cbd 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
i, range.start, range.end);
- release_resource(res);
+ remove_resource(res);
kfree(res);
data->res[i] = NULL;
if (mapped)
@@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
rc = remove_memory(range.start, range_len(&range));
if (rc == 0) {
- release_resource(data->res[i]);
+ remove_resource(data->res[i]);
kfree(data->res[i]);
data->res[i] = NULL;
success++;
diff --git a/kernel/resource.c b/kernel/resource.c
index ddbbacb9fb50..b1763b2fd7ef 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1343,20 +1343,6 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
continue;
}
- /*
- * All memory regions added from memory-hotplug path have the
- * flag IORESOURCE_SYSTEM_RAM. If the resource does not have
- * this flag, we know that we are dealing with a resource coming
- * from HMM/devm. HMM/devm use another mechanism to add/release
- * a resource. This goes via devm_request_mem_region and
- * devm_release_mem_region.
- * HMM/devm take care to release their resources when they want,
- * so if we are dealing with them, let us just back off here.
- */
- if (!(res->flags & IORESOURCE_SYSRAM)) {
- break;
- }
-
if (!(res->flags & IORESOURCE_MEM))
break;
The implementation of syscall_get_nr on mips used to ignore the task
argument and return the syscall number of the calling thread instead of
the target thread.
The bug was exposed to user space by commit 201766a20e30f ("ptrace: add
PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite.
Link: https://github.com/strace/strace/issues/235
Fixes: c2d9f1775731 ("MIPS: Fix syscall_get_nr for the syscall exit tracing.")
Cc: <stable(a)vger.kernel.org> # v3.19+
Co-developed-by: Dmitry V. Levin <ldv(a)strace.io>
Signed-off-by: Dmitry V. Levin <ldv(a)strace.io>
Signed-off-by: Elvira Khabirova <lineprinter0(a)gmail.com>
---
arch/mips/include/asm/syscall.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
index 25fa651c937d..ebdf4d910af2 100644
--- a/arch/mips/include/asm/syscall.h
+++ b/arch/mips/include/asm/syscall.h
@@ -38,7 +38,7 @@ static inline bool mips_syscall_is_indirect(struct task_struct *task,
static inline long syscall_get_nr(struct task_struct *task,
struct pt_regs *regs)
{
- return current_thread_info()->syscall;
+ return task_thread_info(task)->syscall;
}
static inline void mips_syscall_update_nr(struct task_struct *task,
--
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed.
ext4_feat_ktype was setting the "release" handler to "kfree", which
doesn't have a matching function prototype. Add a simple wrapper
with the correct prototype.
This was found as a result of Clang's new -Wcast-function-type-strict
flag, which is more sensitive than the simpler -Wcast-function-type,
which only checks for type width mismatches.
Note that this code is only reached when ext4 is a loadable module and
it is being unloaded:
CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698)
...
RIP: 0010:kobject_put+0xbb/0x1b0
...
Call Trace:
<TASK>
ext4_exit_sysfs+0x14/0x60 [ext4]
cleanup_module+0x67/0xedb [ext4]
Fixes: b99fee58a20a ("ext4: create ext4_feat kobject dynamically")
Cc: Theodore Ts'o <tytso(a)mit.edu>
Cc: Eric Biggers <ebiggers(a)kernel.org>
Cc: stable(a)vger.kernel.org
Build-tested-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Link: https://lore.kernel.org/r/20230103234616.never.915-kees@kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
v2: rename callback, improve commit log (ebiggers)
v1: https://lore.kernel.org/lkml/20230103234616.never.915-kees@kernel.org
---
fs/ext4/sysfs.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index d233c24ea342..e2b8b3437c58 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -491,6 +491,11 @@ static void ext4_sb_release(struct kobject *kobj)
complete(&sbi->s_kobj_unregister);
}
+static void ext4_feat_release(struct kobject *kobj)
+{
+ kfree(kobj);
+}
+
static const struct sysfs_ops ext4_attr_ops = {
.show = ext4_attr_show,
.store = ext4_attr_store,
@@ -505,7 +510,7 @@ static struct kobj_type ext4_sb_ktype = {
static struct kobj_type ext4_feat_ktype = {
.default_groups = ext4_feat_groups,
.sysfs_ops = &ext4_attr_ops,
- .release = (void (*)(struct kobject *))kfree,
+ .release = ext4_feat_release,
};
void ext4_notify_error_sysfs(struct ext4_sb_info *sbi)
--
2.34.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5956592ce337 ("mm/filemap: fix page end in filemap_get_read_batch")
25d6a23e8d28 ("filemap: Convert filemap_get_read_batch() to use a folio_batch")
d996fc7f615f ("filemap: Convert filemap_read() to use a folio")
65bca53b5f63 ("filemap: Convert filemap_get_pages to use folios")
a5d4ad098528 ("filemap: Convert filemap_create_page to folio")
9d427b4eb456 ("filemap: Convert filemap_read_page to take a folio")
bdb729329769 ("filemap: Convert filemap_get_read_batch to use folios")
512b7931ad05 ("Merge branch 'akpm' (patches from Andrew)")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5956592ce337330cdff0399a6f8b6a5aea397a8e Mon Sep 17 00:00:00 2001
From: Qian Yingjin <qian(a)ddn.com>
Date: Wed, 8 Feb 2023 10:24:00 +0800
Subject: [PATCH] mm/filemap: fix page end in filemap_get_read_batch
I was running traces of the read code against an RAID storage system to
understand why read requests were being misaligned against the underlying
RAID strips. I found that the page end offset calculation in
filemap_get_read_batch() was off by one.
When a read is submitted with end offset 1048575, then it calculates the
end page for read of 256 when it should be 255. "last_index" is the index
of the page beyond the end of the read and it should be skipped when get a
batch of pages for read in @filemap_get_read_batch().
The below simple patch fixes the problem. This code was introduced in
kernel 5.12.
Link: https://lkml.kernel.org/r/20230208022400.28962-1-coolqyj@163.com
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Signed-off-by: Qian Yingjin <qian(a)ddn.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/filemap.c b/mm/filemap.c
index c4d4ace9cc70..0e20a8d6dd93 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2588,18 +2588,19 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
struct folio *folio;
int err = 0;
+ /* "last_index" is the index of the page beyond the end of the read */
last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
retry:
if (fatal_signal_pending(current))
return -EINTR;
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & IOCB_NOIO)
return -EAGAIN;
page_cache_sync_readahead(mapping, ra, filp, index,
last_index - index);
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
}
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
In some specific situation, the return value of __bch_btree_node_alloc may
be NULL. This may lead to poential NULL pointer dereference in caller
function like a calling chaion :
btree_split->bch_btree_node_alloc->__bch_btree_node_alloc.
Fix it by initialize return value in __bch_btree_node_alloc before return.
Fixes: cafe56359144 ("bcache: A block layer cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
---
v3:
- Add Cc: stable(a)vger.kernel.org suggested by Eric
v2:
- split patch v1 into two patches to make it clearer suggested by Coly Li
---
drivers/md/bcache/btree.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 147c493a989a..cae25e74b9e0 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1090,10 +1090,12 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op,
struct btree *parent)
{
BKEY_PADDED(key) k;
- struct btree *b = ERR_PTR(-EAGAIN);
+ struct btree *b;
mutex_lock(&c->bucket_lock);
retry:
+ /* return ERR_PTR(-EAGAIN) when it fails */
+ b = ERR_PTR(-EAGAIN);
if (__bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, wait))
goto err;
--
2.25.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0967bf837784 ("ixgbe: add double of VLAN header when computing the max MTU")
f9cd6a4418ba ("ixgbe: allow to increase MTU to 3K with XDP enabled")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0967bf837784a11c65d66060623a74e65211af0b Mon Sep 17 00:00:00 2001
From: Jason Xing <kernelxing(a)tencent.com>
Date: Thu, 9 Feb 2023 10:41:28 +0800
Subject: [PATCH] ixgbe: add double of VLAN header when computing the max MTU
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout(a)intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index bc68b8f2176d..8736ca4b2628 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -73,6 +73,8 @@
#define IXGBE_RXBUFFER_4K 4096
#define IXGBE_MAX_RXBUFFER 16384 /* largest size for a single descriptor */
+#define IXGBE_PKT_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
+
/* Attempt to maximize the headroom available for incoming frames. We
* use a 2K buffer for receives and need 1536/1534 to store the data for
* the frame. This leaves us with 512 bytes of room. From that we need
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 25ca329f7d3c..4507fba8747a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6801,8 +6801,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
struct ixgbe_adapter *adapter = netdev_priv(netdev);
if (ixgbe_enabled_xdp_adapter(adapter)) {
- int new_frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN +
- VLAN_HLEN;
+ int new_frame_size = new_mtu + IXGBE_PKT_HDR_PAD;
if (new_frame_size > ixgbe_max_xdp_frame_size(adapter)) {
e_warn(probe, "Requested MTU size is not supported with XDP\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
40bd094d65fc ("flow_offload: fill flags to action structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21c167aa0ba943a7cac2f6969814f83bb701666b Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Fri, 10 Feb 2023 17:08:25 -0300
Subject: [PATCH] net/sched: act_ctinfo: use percpu stats
The tc action act_ctinfo was using shared stats, fix it to use percpu stats
since bstats_update() must be called with locks or with a percpu pointer argument.
tdc results:
1..12
ok 1 c826 - Add ctinfo action with default setting
ok 2 0286 - Add ctinfo action with dscp
ok 3 4938 - Add ctinfo action with valid cpmark and zone
ok 4 7593 - Add ctinfo action with drop control
ok 5 2961 - Replace ctinfo action zone and action control
ok 6 e567 - Delete ctinfo action with valid index
ok 7 6a91 - Delete ctinfo action with invalid index
ok 8 5232 - List ctinfo actions
ok 9 7702 - Flush ctinfo actions
ok 10 3201 - Add ctinfo action with duplicate index
ok 11 8295 - Add ctinfo action with invalid index
ok 12 3964 - Replace ctinfo action with invalid goto_chain control
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Reviewed-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba(a)intel.com>
Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 4b1b59da5c0b..4d15b6a6169c 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -93,7 +93,7 @@ TC_INDIRECT_SCOPE int tcf_ctinfo_act(struct sk_buff *skb,
cp = rcu_dereference_bh(ca->params);
tcf_lastuse_update(&ca->tcf_tm);
- bstats_update(&ca->tcf_bstats, skb);
+ tcf_action_update_bstats(&ca->common, skb);
action = READ_ONCE(ca->tcf_action);
wlen = skb_network_offset(skb);
@@ -212,8 +212,8 @@ static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
index = actparm->index;
err = tcf_idr_check_alloc(tn, &index, a, bind);
if (!err) {
- ret = tcf_idr_create(tn, index, est, a,
- &act_ctinfo_ops, bind, false, flags);
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_ctinfo_ops, bind, flags);
if (ret) {
tcf_idr_cleanup(tn, index);
return ret;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
40bd094d65fc ("flow_offload: fill flags to action structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21c167aa0ba943a7cac2f6969814f83bb701666b Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Fri, 10 Feb 2023 17:08:25 -0300
Subject: [PATCH] net/sched: act_ctinfo: use percpu stats
The tc action act_ctinfo was using shared stats, fix it to use percpu stats
since bstats_update() must be called with locks or with a percpu pointer argument.
tdc results:
1..12
ok 1 c826 - Add ctinfo action with default setting
ok 2 0286 - Add ctinfo action with dscp
ok 3 4938 - Add ctinfo action with valid cpmark and zone
ok 4 7593 - Add ctinfo action with drop control
ok 5 2961 - Replace ctinfo action zone and action control
ok 6 e567 - Delete ctinfo action with valid index
ok 7 6a91 - Delete ctinfo action with invalid index
ok 8 5232 - List ctinfo actions
ok 9 7702 - Flush ctinfo actions
ok 10 3201 - Add ctinfo action with duplicate index
ok 11 8295 - Add ctinfo action with invalid index
ok 12 3964 - Replace ctinfo action with invalid goto_chain control
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Reviewed-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba(a)intel.com>
Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 4b1b59da5c0b..4d15b6a6169c 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -93,7 +93,7 @@ TC_INDIRECT_SCOPE int tcf_ctinfo_act(struct sk_buff *skb,
cp = rcu_dereference_bh(ca->params);
tcf_lastuse_update(&ca->tcf_tm);
- bstats_update(&ca->tcf_bstats, skb);
+ tcf_action_update_bstats(&ca->common, skb);
action = READ_ONCE(ca->tcf_action);
wlen = skb_network_offset(skb);
@@ -212,8 +212,8 @@ static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
index = actparm->index;
err = tcf_idr_check_alloc(tn, &index, a, bind);
if (!err) {
- ret = tcf_idr_create(tn, index, est, a,
- &act_ctinfo_ops, bind, false, flags);
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_ctinfo_ops, bind, flags);
if (ret) {
tcf_idr_cleanup(tn, index);
return ret;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a1221703a0f7 ("sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1221703a0f75a9d81748c516457e0fc76951496 Mon Sep 17 00:00:00 2001
From: Pietro Borrello <borrello(a)diag.uniroma1.it>
Date: Thu, 9 Feb 2023 12:13:05 +0000
Subject: [PATCH] sctp: sctp_sock_filter(): avoid list_entry() on possibly
empty list
Use list_is_first() to check whether tsp->asoc matches the first
element of ep->asocs, as the list is not guaranteed to have an entry.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Pietro Borrello <borrello(a)diag.uniroma1.it>
Acked-by: Xin Long <lucien.xin(a)gmail.com>
Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniro…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index a557009e9832..c3d6b92dd386 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -343,11 +343,9 @@ static int sctp_sock_filter(struct sctp_endpoint *ep, struct sctp_transport *tsp
struct sctp_comm_param *commp = p;
struct sock *sk = ep->base.sk;
const struct inet_diag_req_v2 *r = commp->r;
- struct sctp_association *assoc =
- list_entry(ep->asocs.next, struct sctp_association, asocs);
/* find the ep only once through the transports by this condition */
- if (tsp->asoc != assoc)
+ if (!list_is_first(&tsp->asoc->asocs, &ep->asocs))
return 0;
if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family)