When we access a no-current task's pid namespace we need check that the
task hasn't been reaped in the meantime and it's pid namespace isn't
accessible anymore.
The user namespace is fine because it is only released when the last
reference to struct task_struct is put and exit_creds() is called.
Fixes: 5b08bd408534 ("pidfs: allow retrieval of namespace file descriptors")
CC: stable(a)vger.kernel.org # v6.11
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
fs/pidfs.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/pidfs.c b/fs/pidfs.c
index 7ffdc88dfb52..80675b6bf884 100644
--- a/fs/pidfs.c
+++ b/fs/pidfs.c
@@ -120,6 +120,7 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
struct nsproxy *nsp __free(put_nsproxy) = NULL;
struct pid *pid = pidfd_pid(file);
struct ns_common *ns_common = NULL;
+ struct pid_namespace *pid_ns;
if (arg)
return -EINVAL;
@@ -202,7 +203,9 @@ static long pidfd_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
case PIDFD_GET_PID_NAMESPACE:
if (IS_ENABLED(CONFIG_PID_NS)) {
rcu_read_lock();
- ns_common = to_ns_common( get_pid_ns(task_active_pid_ns(task)));
+ pid_ns = task_active_pid_ns(task);
+ if (pid_ns)
+ ns_common = to_ns_common(get_pid_ns(pid_ns));
rcu_read_unlock();
}
break;
--
2.45.2
After commit 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a work
queue"), it can happen that a work item is sent to an uninitialized work
queue. This may has the effect that the item being queued is never
actually queued, and any further actions depending on it will not proceed.
The following warning is observed while the fnic driver is loaded:
kernel: WARNING: CPU: 11 PID: 0 at ../kernel/workqueue.c:1524 __queue_work+0x373/0x410
kernel: <IRQ>
kernel: queue_work_on+0x3a/0x50
kernel: fnic_wq_copy_cmpl_handler+0x54a/0x730 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24]
kernel: fnic_isr_msix_wq_copy+0x2d/0x60 [fnic 62fbff0c42e7fb825c60a55cde2fb91facb2ed24]
kernel: __handle_irq_event_percpu+0x36/0x1a0
kernel: handle_irq_event_percpu+0x30/0x70
kernel: handle_irq_event+0x34/0x60
kernel: handle_edge_irq+0x7e/0x1a0
kernel: __common_interrupt+0x3b/0xb0
kernel: common_interrupt+0x58/0xa0
kernel: </IRQ>
It has been observed that this may break the rediscovery of fibre channel
devices after a temporary fabric failure.
This patch fixes it by moving the work queue initialization out of
an if block in fnic_probe().
Signed-off-by: Martin Wilck <mwilck(a)suse.com>
Fixes: 379a58caa199 ("scsi: fnic: Move fnic_fnic_flush_tx() to a work queue")
Cc: stable(a)vger.kernel.org
---
drivers/scsi/fnic/fnic_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/fnic/fnic_main.c b/drivers/scsi/fnic/fnic_main.c
index 0044717d4486..adec0df24bc4 100644
--- a/drivers/scsi/fnic/fnic_main.c
+++ b/drivers/scsi/fnic/fnic_main.c
@@ -830,7 +830,6 @@ static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
spin_lock_init(&fnic->vlans_lock);
INIT_WORK(&fnic->fip_frame_work, fnic_handle_fip_frame);
INIT_WORK(&fnic->event_work, fnic_handle_event);
- INIT_WORK(&fnic->flush_work, fnic_flush_tx);
skb_queue_head_init(&fnic->fip_frame_queue);
INIT_LIST_HEAD(&fnic->evlist);
INIT_LIST_HEAD(&fnic->vlans);
@@ -948,6 +947,7 @@ static int fnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
INIT_WORK(&fnic->link_work, fnic_handle_link);
INIT_WORK(&fnic->frame_work, fnic_handle_frame);
+ INIT_WORK(&fnic->flush_work, fnic_flush_tx);
skb_queue_head_init(&fnic->frame_queue);
skb_queue_head_init(&fnic->tx_queue);
--
2.46.1
The patch titled
Subject: fs/proc/kcore.c: allow translation of physical memory addresses
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-proc-kcorec-allow-translation-of-physical-memory-addresses.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Alexander Gordeev <agordeev(a)linux.ibm.com>
Subject: fs/proc/kcore.c: allow translation of physical memory addresses
Date: Mon, 30 Sep 2024 14:21:19 +0200
When /proc/kcore is read an attempt to read the first two pages results in
HW-specific page swap on s390 and another (so called prefix) pages are
accessed instead. That leads to a wrong read.
Allow architecture-specific translation of memory addresses using
kc_xlate_dev_mem_ptr() and kc_unxlate_dev_mem_ptr() callbacks similarily
to /dev/mem xlate_dev_mem_ptr() and unxlate_dev_mem_ptr() callbacks. That
way an architecture can deal with specific physical memory ranges.
Re-use the existing /dev/mem callback implementation on s390, which
handles the described prefix pages swapping correctly.
For other architectures the default callback is basically NOP. It is
expected the condition (vaddr == __va(__pa(vaddr))) always holds true for
KCORE_RAM memory type.
Link: https://lkml.kernel.org/r/20240930122119.1651546-1-agordeev@linux.ibm.com
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Suggested-by: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/s390/include/asm/io.h | 2 +
fs/proc/kcore.c | 36 +++++++++++++++++++++++++++++++++--
2 files changed, 36 insertions(+), 2 deletions(-)
--- a/arch/s390/include/asm/io.h~fs-proc-kcorec-allow-translation-of-physical-memory-addresses
+++ a/arch/s390/include/asm/io.h
@@ -16,8 +16,10 @@
#include <asm/pci_io.h>
#define xlate_dev_mem_ptr xlate_dev_mem_ptr
+#define kc_xlate_dev_mem_ptr xlate_dev_mem_ptr
void *xlate_dev_mem_ptr(phys_addr_t phys);
#define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
+#define kc_unxlate_dev_mem_ptr unxlate_dev_mem_ptr
void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);
#define IO_SPACE_LIMIT 0
--- a/fs/proc/kcore.c~fs-proc-kcorec-allow-translation-of-physical-memory-addresses
+++ a/fs/proc/kcore.c
@@ -50,6 +50,20 @@ static struct proc_dir_entry *proc_root_
#define kc_offset_to_vaddr(o) ((o) + PAGE_OFFSET)
#endif
+#ifndef kc_xlate_dev_mem_ptr
+#define kc_xlate_dev_mem_ptr kc_xlate_dev_mem_ptr
+static inline void *kc_xlate_dev_mem_ptr(phys_addr_t phys)
+{
+ return __va(phys);
+}
+#endif
+#ifndef kc_unxlate_dev_mem_ptr
+#define kc_unxlate_dev_mem_ptr kc_unxlate_dev_mem_ptr
+static inline void kc_unxlate_dev_mem_ptr(phys_addr_t phys, void *virt)
+{
+}
+#endif
+
static LIST_HEAD(kclist_head);
static DECLARE_RWSEM(kclist_lock);
static int kcore_need_update = 1;
@@ -471,6 +485,8 @@ static ssize_t read_kcore_iter(struct ki
while (buflen) {
struct page *page;
unsigned long pfn;
+ phys_addr_t phys;
+ void *__start;
/*
* If this is the first iteration or the address is not within
@@ -537,7 +553,8 @@ static ssize_t read_kcore_iter(struct ki
}
break;
case KCORE_RAM:
- pfn = __pa(start) >> PAGE_SHIFT;
+ phys = __pa(start);
+ pfn = phys >> PAGE_SHIFT;
page = pfn_to_online_page(pfn);
/*
@@ -557,13 +574,28 @@ static ssize_t read_kcore_iter(struct ki
fallthrough;
case KCORE_VMEMMAP:
case KCORE_TEXT:
+ if (m->type == KCORE_RAM) {
+ __start = kc_xlate_dev_mem_ptr(phys);
+ if (!__start) {
+ ret = -ENOMEM;
+ if (iov_iter_zero(tsz, iter) != tsz)
+ ret = -EFAULT;
+ goto out;
+ }
+ } else {
+ __start = (void *)start;
+ }
+
/*
* Sadly we must use a bounce buffer here to be able to
* make use of copy_from_kernel_nofault(), as these
* memory regions might not always be mapped on all
* architectures.
*/
- if (copy_from_kernel_nofault(buf, (void *)start, tsz)) {
+ ret = copy_from_kernel_nofault(buf, __start, tsz);
+ if (m->type == KCORE_RAM)
+ kc_unxlate_dev_mem_ptr(phys, __start);
+ if (ret) {
if (iov_iter_zero(tsz, iter) != tsz) {
ret = -EFAULT;
goto out;
_
Patches currently in -mm which might be from agordeev(a)linux.ibm.com are
fs-proc-kcorec-allow-translation-of-physical-memory-addresses.patch
From: Kan Liang <kan.liang(a)linux.intel.com>
The EAX of the CPUID Leaf 023H enumerates the mask of valid sub-leaves.
To tell the availability of the sub-leaf 1 (enumerate the counter mask),
perf should check the bit 1 (0x2) of EAS, rather than bit 0 (0x1).
The error is not user-visible on bare metal. Because the sub-leaf 0 and
the sub-leaf 1 are always available. However, it may bring issues in a
virtualization environment when a VMM only enumerates the sub-leaf 0.
Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/core.c | 4 ++--
arch/x86/include/asm/perf_event.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 342f8b1a2f93..123ed1d60118 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4900,8 +4900,8 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
if (ebx & ARCH_PERFMON_EXT_EQ)
pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
- if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF_BIT) {
- cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
+ if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF) {
+ cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF_BIT,
&eax, &ebx, &ecx, &edx);
pmu->cntr_mask64 = eax;
pmu->fixed_cntr_mask64 = ebx;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index e3b5e8e96fb3..1d4ce655aece 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -191,7 +191,7 @@ union cpuid10_edx {
#define ARCH_PERFMON_EXT_UMASK2 0x1
#define ARCH_PERFMON_EXT_EQ 0x2
#define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT 0x1
-#define ARCH_PERFMON_NUM_COUNTER_LEAF 0x1
+#define ARCH_PERFMON_NUM_COUNTER_LEAF BIT(ARCH_PERFMON_NUM_COUNTER_LEAF_BIT)
/*
* Intel Architectural LBR CPUID detection/enumeration details:
--
2.38.1
Hey,
Would you be interested in acquiring an visitors list of Batimat Expo 2024?
We have 95,000 verified contact list.
List Contains: Contact Name, Title, Phone Number, Fax Number, Physical address, Company Name, Company URL, Employee Size, Revenue Size, Industry, and more…
Interested? I will share you more details and pricing for the same.
Kind Regards,
Clara Skylar
Senior Marketing Executive
If you do not wish to receive our emails, please reply with "Not Interested."
CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND is an int, defaulting to 250. When
the wakeref is non-zero, it's either -1 or a dynamically allocated
pointer, depending on CONFIG_DRM_I915_DEBUG_RUNTIME_PM. It's likely that
the code works by coincidence with the bitwise AND, but with
CONFIG_DRM_I915_DEBUG_RUNTIME_PM=y, there's the off chance that the
condition evaluates to false, and intel_wakeref_auto() doesn't get
called. Switch to the intended logical AND.
v2: Use != to avoid clang -Wconstant-logical-operand (Nathan)
Fixes: ad74457a6b5a ("drm/i915/dgfx: Release mmap on rpm suspend")
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Anshuman Gupta <anshuman.gupta(a)intel.com>
Cc: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v6.1+
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com> # v1
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com> # v1
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 5c72462d1f57..b22e2019768f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -1131,7 +1131,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf)
GEM_WARN_ON(!i915_ttm_cpu_maps_iomem(bo->resource));
}
- if (wakeref & CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)
+ if (wakeref && CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND != 0)
intel_wakeref_auto(&to_i915(obj->base.dev)->runtime_pm.userfault_wakeref,
msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND));
--
2.39.2
I'm announcing the release of the 6.11.1 kernel.
All users of the 6.11 kernel series must upgrade.
The updated 6.11.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.11.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/accel/drm_accel.c | 110 ++--------------------------------
drivers/bluetooth/btintel_pcie.c | 2
drivers/cpufreq/amd-pstate.c | 5 +
drivers/gpu/drm/drm_drv.c | 97 ++++++++++++++---------------
drivers/gpu/drm/drm_file.c | 2
drivers/gpu/drm/drm_internal.h | 4 -
drivers/nvme/host/nvme.h | 5 +
drivers/nvme/host/pci.c | 18 ++---
drivers/powercap/intel_rapl_common.c | 35 +++++++++-
drivers/usb/class/usbtmc.c | 2
drivers/usb/serial/pl2303.c | 1
drivers/usb/serial/pl2303.h | 4 +
include/drm/drm_accel.h | 18 -----
include/drm/drm_file.h | 5 +
net/netfilter/nft_socket.c | 4 -
sound/soc/amd/acp/acp-legacy-common.c | 5 +
sound/soc/amd/acp/amd.h | 2
18 files changed, 128 insertions(+), 193 deletions(-)
Dan Carpenter (2):
netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
powercap: intel_rapl: Change an error pointer to NULL
Dhananjay Ugwekar (3):
powercap/intel_rapl: Add support for AMD family 1Ah
powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
cpufreq/amd-pstate: Add the missing cpufreq_cpu_put()
Edward Adam Davis (1):
USB: usbtmc: prevent kernel-usb-infoleak
Greg Kroah-Hartman (1):
Linux 6.11.1
Junhao Xie (1):
USB: serial: pl2303: add device id for Macrosilicon MS3020
Keith Busch (1):
nvme-pci: qdepth 1 quirk
Kiran K (1):
Bluetooth: btintel_pcie: Allocate memory for driver private data
Michał Winiarski (3):
drm: Use XArray instead of IDR for minors
accel: Use XArray instead of IDR for minors
drm: Expand max DRM device number to full MINORBITS
Vijendar Mukunda (1):
ASoC: amd: acp: add ZSC control register programming sequence
Reset LPCR_MER bit before running a vCPU to ensure that it is not set if
there are no pending interrupts. Running a vCPU with LPCR_MER bit set
and no pending interrupts results in L2 vCPU getting an infinite flood
of spurious interrupts. The 'if check' in kvmhv_run_single_vcpu() sets
the LPCR_MER bit if there are pending interrupts.
The spurious flood problem can be observed in 2 cases:
1. Crashing the guest while interrupt heavy workload is running
a. Start a L2 guest and run an interrupt heavy workload (eg: ipistorm)
b. While the workload is running, crash the guest (make sure kdump
is configured)
c. Any one of the vCPUs of the guest will start getting an infinite
flood of spurious interrupts.
2. Running LTP stress tests in multiple guests at the same time
a. Start 4 L2 guests.
b. Start running LTP stress tests on all 4 guests at same time.
c. In some time, any one/more of the vCPUs of any of the guests will
start getting an infinite flood of spurious interrupts.
The root cause of both the above issues is the same:
1. A NMI is sent to a running vCPU that had LPCR_MER bit set.
2. In the NMI path, all registers are refreshed, i.e, H_GUEST_GET_STATE
is called for all the registers.
3. When H_GUEST_GET_STATE is called for lpcr, the vcpu->arch.vcore->lpcr
of that vCPU at L1 level gets updated with LPCR_MER set to 1, and this
new value is always used whenever that vCPU runs, regardless of whether
there was a pending interrupt.
4. Since LPCR_MER is set, the vCPU in L2 always jumps to the external
interrupt handler, and this cycle never ends.
Fix the spurious flood by making sure LPCR_MER is always reset before
running a vCPU.
Fixes: ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")
Cc: stable(a)vger.kernel.org # v6.8+
Signed-off-by: Gautam Menghani <gautam(a)linux.ibm.com>
---
arch/powerpc/kvm/book3s_hv.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8f7d7e37bc8c..3cc2f1691001 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -98,6 +98,13 @@
/* Used to indicate that a guest passthrough interrupt needs to be handled */
#define RESUME_PASSTHROUGH (RESUME_GUEST | RESUME_FLAG_ARCH2)
+/* Clear LPCR_MER bit - If we run a L2 vCPU with LPCR_MER bit set but no pending external
+ * interrupts, we end up getting a flood of spurious interrupts in L2 KVM guests. To avoid
+ * that, reset LPCR_MER and let the 'if check' for pending interrupts in kvmhv_run_single_vcpu()
+ * set LPCR_MER if there are pending interrupts.
+ */
+#define kvmppc_reset_lpcr_mer(vcpu) (vcpu->arch.vcore->lpcr &= ~LPCR_MER)
+
/* Used as a "null" value for timebase values */
#define TB_NIL (~(u64)0)
@@ -5091,7 +5098,7 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
accumulate_time(vcpu, &vcpu->arch.guest_entry);
if (cpu_has_feature(CPU_FTR_ARCH_300))
r = kvmhv_run_single_vcpu(vcpu, ~(u64)0,
- vcpu->arch.vcore->lpcr);
+ kvmppc_reset_lpcr_mer(vcpu));
else
r = kvmppc_run_vcpu(vcpu);
--
2.46.0
Atomicity violation occurs during consecutive reads of the members of
gb_tty. Consider a scenario where, because the consecutive reads of gb_tty
members are not protected by a lock, the value of gb_tty may still be
changing during the read process.
gb_tty->port.close_delay and gb_tty->port.closing_wait are updated
together, such as in the set_serial_info() function. If during the
read process, gb_tty->port.close_delay and gb_tty->port.closing_wait
are still being updated, it is possible that gb_tty->port.close_delay
is updated while gb_tty->port.closing_wait is not. In this case,
the code first reads gb_tty->port.close_delay and then
gb_tty->port.closing_wait. A new gb_tty->port.close_delay and an
old gb_tty->port.closing_wait could be read. Such values, whether
before or after the update, should not coexist as they represent an
intermediate state.
This could result in a mismatch of the values read for gb_tty->minor,
gb_tty->port.close_delay, and gb_tty->port.closing_wait, which in turn
could cause ss->close_delay and ss->closing_wait to be mismatched.
To address this issue, we have enclosed all sequential read operations of
the gb_tty variable within a lock. This ensures that the value of gb_tty
remains unchanged throughout the process, guaranteeing its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: b71e571adaa5 ("staging: greybus: uart: fix TIOCSSERIAL jiffies conversions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/staging/greybus/uart.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/greybus/uart.c b/drivers/staging/greybus/uart.c
index cdf4ebb93b10..8cc18590d97b 100644
--- a/drivers/staging/greybus/uart.c
+++ b/drivers/staging/greybus/uart.c
@@ -595,12 +595,14 @@ static int get_serial_info(struct tty_struct *tty,
{
struct gb_tty *gb_tty = tty->driver_data;
+ mutex_lock(&gb_tty->port.mutex);
ss->line = gb_tty->minor;
ss->close_delay = jiffies_to_msecs(gb_tty->port.close_delay) / 10;
ss->closing_wait =
gb_tty->port.closing_wait == ASYNC_CLOSING_WAIT_NONE ?
ASYNC_CLOSING_WAIT_NONE :
jiffies_to_msecs(gb_tty->port.closing_wait) / 10;
+ mutex_unlock(&gb_tty->port.mutex);
return 0;
}
--
2.34.1
From: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
[ Upstream commit 2a3cfb9a24a28da9cc13d2c525a76548865e182c ]
Since 'adev->dm.dc' in amdgpu_dm_fini() might turn out to be NULL
before the call to dc_enable_dmub_notifications(), check
beforehand to ensure there will not be a possible NULL-ptr-deref
there.
Also, since commit 1e88eb1b2c25 ("drm/amd/display: Drop
CONFIG_DRM_AMD_DC_HDCP") there are two separate checks for NULL in
'adev->dm.dc' before dc_deinit_callbacks() and dc_dmub_srv_destroy().
Clean up by combining them all under one 'if'.
Found by Linux Verification Center (linuxtesting.org) with static
analysis tool SVACE.
Fixes: 81927e2808be ("drm/amd/display: Support for DMUB AUX")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 393e32259a77..4850aed54604 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1876,14 +1876,14 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
dc_deinit_callbacks(adev->dm.dc);
#endif
- if (adev->dm.dc)
+ if (adev->dm.dc) {
dc_dmub_srv_destroy(&adev->dm.dc->ctx->dmub_srv);
-
- if (dc_enable_dmub_notifications(adev->dm.dc)) {
- kfree(adev->dm.dmub_notify);
- adev->dm.dmub_notify = NULL;
- destroy_workqueue(adev->dm.delayed_hpd_wq);
- adev->dm.delayed_hpd_wq = NULL;
+ if (dc_enable_dmub_notifications(adev->dm.dc)) {
+ kfree(adev->dm.dmub_notify);
+ adev->dm.dmub_notify = NULL;
+ destroy_workqueue(adev->dm.delayed_hpd_wq);
+ adev->dm.delayed_hpd_wq = NULL;
+ }
}
if (adev->dm.dmub_bo)
--
2.25.1
This series fixes an error path where the reference of a child node is
not decremented upon early return. When at it, a trivial comma/semicolon
substitution I found by chance has been added to improve code clarity.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (2):
pinctrl: intel: platform: fix error path in device_for_each_child_node()
pinctrl: intel: platform: use semicolon instead of comma in ncommunities assignment
drivers/pinctrl/intel/pinctrl-intel-platform.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
---
base-commit: 92fc9636d1471b7f68bfee70c776f7f77e747b97
change-id: 20240926-intel-pinctrl-platform-scoped-68d000ba3c1d
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
The atomicity violation occurs when the variables cur_delay and new_delay
are defined. Imagine a scenario where, while defining cur_delay and
new_delay, the values stored in devfreq->profile->polling_ms and the delay
variable change. After acquiring the mutex_lock and entering the critical
section, due to possible concurrent modifications, cur_delay and new_delay
may no longer represent the correct values. Subsequent usage, such as if
(cur_delay > new_delay), could cause the program to run incorrectly,
resulting in inconsistencies.
To address this issue, it is recommended to acquire a lock in advance,
ensuring that devfreq->profile->polling_ms and delay are protected by the
lock when being read. This will help ensure the consistency of the program.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 7e6fdd4bad03 ("PM / devfreq: Core updates to support devices which can idle")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/devfreq/devfreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
index 98657d3b9435..9634739fc9cb 100644
--- a/drivers/devfreq/devfreq.c
+++ b/drivers/devfreq/devfreq.c
@@ -616,10 +616,10 @@ EXPORT_SYMBOL(devfreq_monitor_resume);
*/
void devfreq_update_interval(struct devfreq *devfreq, unsigned int *delay)
{
+ mutex_lock(&devfreq->lock);
unsigned int cur_delay = devfreq->profile->polling_ms;
unsigned int new_delay = *delay;
- mutex_lock(&devfreq->lock);
devfreq->profile->polling_ms = new_delay;
if (IS_SUPPORTED_FLAG(devfreq->governor->flags, IRQ_DRIVEN))
--
2.34.1
If the adp5588_read function returns 0, then there will be an
overflow of the kpad->keycode buffer.
If the adp5588_read function returns a negative value, then the
logic is broken - the wrong value is used as an index of
the kpad->keycode array.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
drivers/input/keyboard/adp5588-keys.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/drivers/input/keyboard/adp5588-keys.c b/drivers/input/keyboard/adp5588-keys.c
index 1b0279393df4..d05387f9c11f 100644
--- a/drivers/input/keyboard/adp5588-keys.c
+++ b/drivers/input/keyboard/adp5588-keys.c
@@ -526,14 +526,17 @@ static void adp5588_report_events(struct adp5588_kpad *kpad, int ev_cnt)
int i;
for (i = 0; i < ev_cnt; i++) {
- int key = adp5588_read(kpad->client, KEY_EVENTA + i);
- int key_val = key & KEY_EV_MASK;
- int key_press = key & KEY_EV_PRESSED;
+ int key, key_val, key_press;
+ key = adp5588_read(kpad->client, KEY_EVENTA + i);
+ if (key < 0)
+ continue;
+ key_val = key & KEY_EV_MASK;
+ key_press = key & KEY_EV_PRESSED;
if (key_val >= GPI_PIN_BASE && key_val <= GPI_PIN_END) {
/* gpio line used as IRQ source */
adp5588_gpio_irq_handle(kpad, key_val, key_press);
- } else {
+ } else if (key_val > 0) {
int row = (key_val - 1) / ADP5588_COLS_MAX;
int col = (key_val - 1) % ADP5588_COLS_MAX;
int code = MATRIX_SCAN_CODE(row, col, kpad->row_shift);
--
2.25.1
This change is specific to Hyper-V based VMs.
If the Virtual Machine Connection window is focused,
a Hyper-V VM user can unintentionally touch the keyboard/mouse
when the VM is hibernating or resuming, and consequently the
hibernation or resume operation can be aborted unexpectedly.
Fix the issue by no longer registering the keyboard/mouse as
wakeup devices (see the other two patches for the
changes to drivers/input/serio/hyperv-keyboard.c and
drivers/hid/hid-hyperv.c).
The keyboard/mouse were registered as wakeup devices because the
VM needs to be woken up from the Suspend-to-Idle state after
a user runs "echo freeze > /sys/power/state". It seems like
the Suspend-to-Idle feature has no real users in practice, so
let's no longer support that by returning -EOPNOTSUPP if a
user tries to use that.
$echo freeze > /sys/power/state
> bash: echo: write error: Operation not supported
Fixes: 1a06d017fb3f ("Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis(a)linux.microsoft.com>
---
Changes in v2:
* Add "#define vmbus_freeze NULL" when CONFIG_PM_SLEEP is not
enabled.
* Change commit message to clarify that this change is specifc to
Hyper-V based VMs.
Changes in v3:
* Add 'Cc: stable(a)vger.kernel.org' in sign-off area.
---
drivers/hv/vmbus_drv.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 965d2a4efb7e..8f445c849512 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -900,6 +900,19 @@ static void vmbus_shutdown(struct device *child_device)
}
#ifdef CONFIG_PM_SLEEP
+/*
+ * vmbus_freeze - Suspend-to-Idle
+ */
+static int vmbus_freeze(struct device *child_device)
+{
+/*
+ * Do not support Suspend-to-Idle ("echo freeze > /sys/power/state") as
+ * that would require registering the Hyper-V synthetic mouse/keyboard
+ * devices as wakeup devices, which can abort hibernation/resume unexpectedly.
+ */
+ return -EOPNOTSUPP;
+}
+
/*
* vmbus_suspend - Suspend a vmbus device
*/
@@ -938,6 +951,7 @@ static int vmbus_resume(struct device *child_device)
return drv->resume(dev);
}
#else
+#define vmbus_freeze NULL
#define vmbus_suspend NULL
#define vmbus_resume NULL
#endif /* CONFIG_PM_SLEEP */
@@ -969,7 +983,7 @@ static void vmbus_device_release(struct device *device)
*/
static const struct dev_pm_ops vmbus_pm = {
- .suspend_noirq = NULL,
+ .suspend_noirq = vmbus_freeze,
.resume_noirq = NULL,
.freeze_noirq = vmbus_suspend,
.thaw_noirq = vmbus_resume,
--
2.34.1
This change is specific to Hyper-V based VMs.
If the Virtual Machine Connection window is focused,
a Hyper-V VM user can unintentionally touch the keyboard/mouse
when the VM is hibernating or resuming, and consequently the
hibernation or resume operation can be aborted unexpectedly.
Fix the issue by no longer registering the keyboard/mouse as
wakeup devices (see the other two patches for the
changes to drivers/input/serio/hyperv-keyboard.c and
drivers/hid/hid-hyperv.c).
The keyboard/mouse were registered as wakeup devices because the
VM needs to be woken up from the Suspend-to-Idle state after
a user runs "echo freeze > /sys/power/state". It seems like
the Suspend-to-Idle feature has no real users in practice, so
let's no longer support that by returning -EOPNOTSUPP if a
user tries to use that.
$echo freeze > /sys/power/state
> bash: echo: write error: Operation not supported
Fixes: 1a06d017fb3f ("Drivers: hv: vmbus: Fix Suspend-to-Idle for Generation-2 VM")
Signed-off-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Signed-off-by: Erni Sri Satya Vennela <ernis(a)linux.microsoft.com>
---
Changes in v2:
* Add "#define vmbus_freeze NULL" when CONFIG_PM_SLEEP is not
enabled.
* Change commit message to clarify that this change is specifc to
Hyper-V based VMs.
---
drivers/hv/vmbus_drv.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 965d2a4efb7e..8f445c849512 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -900,6 +900,19 @@ static void vmbus_shutdown(struct device *child_device)
}
#ifdef CONFIG_PM_SLEEP
+/*
+ * vmbus_freeze - Suspend-to-Idle
+ */
+static int vmbus_freeze(struct device *child_device)
+{
+/*
+ * Do not support Suspend-to-Idle ("echo freeze > /sys/power/state") as
+ * that would require registering the Hyper-V synthetic mouse/keyboard
+ * devices as wakeup devices, which can abort hibernation/resume unexpectedly.
+ */
+ return -EOPNOTSUPP;
+}
+
/*
* vmbus_suspend - Suspend a vmbus device
*/
@@ -938,6 +951,7 @@ static int vmbus_resume(struct device *child_device)
return drv->resume(dev);
}
#else
+#define vmbus_freeze NULL
#define vmbus_suspend NULL
#define vmbus_resume NULL
#endif /* CONFIG_PM_SLEEP */
@@ -969,7 +983,7 @@ static void vmbus_device_release(struct device *device)
*/
static const struct dev_pm_ops vmbus_pm = {
- .suspend_noirq = NULL,
+ .suspend_noirq = vmbus_freeze,
.resume_noirq = NULL,
.freeze_noirq = vmbus_suspend,
.thaw_noirq = vmbus_resume,
--
2.34.1
This reverts commit 69197dfc64007b5292cc960581548f41ccd44828.
commit 937d46ecc5f9 ("net: wangxun: add ethtool_ops for
channel number") changed NIC misc irq from most significant
bit to least significant bit, the former condition is not
required to apply this patch, because we only need to set
irq affinity for NIC queue irq vectors.
this patch is required after commit 937d46ecc5f9 ("net: wangxun:
add ethtool_ops for channel number") was applied, so this is only
relevant to 6.6.y branch.
Signed-off-by: Duanqiang Wen <duanqiangwen(a)net-swift.com>
---
drivers/net/ethernet/wangxun/libwx/wx_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/wangxun/libwx/wx_lib.c b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
index 2b3d6586f44a..fb1caa40da6b 100644
--- a/drivers/net/ethernet/wangxun/libwx/wx_lib.c
+++ b/drivers/net/ethernet/wangxun/libwx/wx_lib.c
@@ -1620,7 +1620,7 @@ static void wx_set_num_queues(struct wx *wx)
*/
static int wx_acquire_msix_vectors(struct wx *wx)
{
- struct irq_affinity affd = { .pre_vectors = 1 };
+ struct irq_affinity affd = {0, };
int nvecs, i;
/* We start by asking for one vector per queue pair */
--
2.27.0
This is the start of the stable review cycle for the 6.11.1 release.
There are 12 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 29 Sep 2024 12:17:00 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.11.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.11.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.11.1-rc1
Edward Adam Davis <eadavis(a)qq.com>
USB: usbtmc: prevent kernel-usb-infoleak
Junhao Xie <bigfoot(a)classfun.cn>
USB: serial: pl2303: add device id for Macrosilicon MS3020
Keith Busch <kbusch(a)kernel.org>
nvme-pci: qdepth 1 quirk
Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
ASoC: amd: acp: add ZSC control register programming sequence
Kiran K <kiran.k(a)intel.com>
Bluetooth: btintel_pcie: Allocate memory for driver private data
Dan Carpenter <dan.carpenter(a)linaro.org>
netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
cpufreq/amd-pstate: Add the missing cpufreq_cpu_put()
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
powercap/intel_rapl: Fix the energy-pkg event for AMD CPUs
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
powercap/intel_rapl: Add support for AMD family 1Ah
Michał Winiarski <michal.winiarski(a)intel.com>
drm: Expand max DRM device number to full MINORBITS
Michał Winiarski <michal.winiarski(a)intel.com>
accel: Use XArray instead of IDR for minors
Michał Winiarski <michal.winiarski(a)intel.com>
drm: Use XArray instead of IDR for minors
-------------
Diffstat:
Makefile | 4 +-
drivers/accel/drm_accel.c | 110 +++-------------------------------
drivers/bluetooth/btintel_pcie.c | 2 +-
drivers/cpufreq/amd-pstate.c | 5 +-
drivers/gpu/drm/drm_drv.c | 97 +++++++++++++++---------------
drivers/gpu/drm/drm_file.c | 2 +-
drivers/gpu/drm/drm_internal.h | 4 --
drivers/nvme/host/nvme.h | 5 ++
drivers/nvme/host/pci.c | 18 +++---
drivers/powercap/intel_rapl_common.c | 35 +++++++++--
drivers/usb/class/usbtmc.c | 2 +-
drivers/usb/serial/pl2303.c | 1 +
drivers/usb/serial/pl2303.h | 4 ++
include/drm/drm_accel.h | 18 +-----
include/drm/drm_file.h | 5 ++
net/netfilter/nft_socket.c | 4 +-
sound/soc/amd/acp/acp-legacy-common.c | 5 ++
sound/soc/amd/acp/amd.h | 2 +
18 files changed, 129 insertions(+), 194 deletions(-)
According to Intel SDM, for the local APIC timer,
1. "The initial-count register is a read-write register. A write of 0 to
the initial-count register effectively stops the local APIC timer, in
both one-shot and periodic mode."
2. "In TSC deadline mode, writes to the initial-count register are
ignored; and current-count register always reads 0. Instead, timer
behavior is controlled using the IA32_TSC_DEADLINE MSR."
"In TSC-deadline mode, writing 0 to the IA32_TSC_DEADLINE MSR disarms
the local-APIC timer."
Current code in lapic_timer_shutdown() writes 0 to the initial-count
register. This stops the local APIC timer for one-shot and periodic mode
only. In TSC deadline mode, the timer is not properly stopped.
Some CPUs are affected by this and they are woke up by the armed timer
in s2idle in TSC deadline mode.
Stop the TSC deadline timer in lapic_timer_shutdown() by writing 0 to
MSR_IA32_TSC_DEADLINE.
Cc: stable(a)vger.kernel.org
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
---
arch/x86/kernel/apic/apic.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 6513c53c9459..d1006531729a 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -441,6 +441,10 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
apic_write(APIC_TMICT, 0);
+
+ if (boot_cpu_has(X86_FEATURE_TSC_DEADLINE_TIMER))
+ wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+
return 0;
}
--
2.34.1
We found some bugs when testing the XDP function of enetc driver,
and these bugs are easy to reproduce. This is not only causes XDP
to not work, but also the network cannot be restored after exiting
the XDP program. So the patch set is mainly to fix these bugs. For
details, please see the commit message of each patch.
Wei Fang (3):
net: enetc: remove xdp_drops statistic from enetc_xdp_drop()
net: enetc: fix the issues of XDP_REDIRECT feature
net: enetc: reset xdp_tx_in_flight when updating bpf program
drivers/net/ethernet/freescale/enetc/enetc.c | 46 +++++++++++++++-----
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
2 files changed, 37 insertions(+), 10 deletions(-)
--
2.34.1
I was benchmarking some compressors, piping to and from a network share on a NAS, and some consistently wrote corrupted data.
First, apologies in advance:
* if I'm not in the right place. I tried to follow the directions from the Regressions guide - https://www.kernel.org/doc/html/latest/admin-guide/reporting-regressions.ht…
* I know there's a ton of context I don't know
* I’m trying a different mail app, because the first one looked concussed with plain text. This might be worse.
The detailed description:
I was benchmarking some compressors on Debian on a Raspberry Pi, piping to and from a network share on a NAS, and found that some consistently had issues writing to my NAS. Specifically:
* lzop
* pigz - parallel gzip
* pbzip2 - parallel bzip2
This is dependent on kernel version. I've done a survey, below.
While I tripped over the issue on a Debian port (Debian 12, bookworm, kernel v6.6), I compiled my own vanilla / mainline kernels for testing and reporting this.
Even more details:
The Pi and the Synology NAS are directly connected by Gigabit Ethernet. Both sides are using self-assigned IP addresses. I'll note that at boot, getting the Pi to see the NAS requires some nudging of avahi-autoipd; while I think it's stable before testing, I'm not positive, and reconnection issues might be in play.
The files in question are tars of sparse file systems, about 270 gig, compressing down to 10-30 gig.
Compression seems to work, without complaint; decompression crashes the process, usually within the first gig of the compressed file. The output of the stream doesn't match what ends up written to disk.
Trying decompression during compression gets further along than it does after compression finishes; this might point toward something with writes and caches.
A previous attempt involved rpi-update, which:
* good: let me install kernels without building myself
* bad: updated the bootloader and firmware, to bleeding edge, with possible regressions; it definitely muddied the results of my tests
I started over with a fresh install, and no results involving rpi-update are included in this email.
A survey of major branches:
* 5.15.167, LTS - good
* 6.1.109, LTS - good
* 6.2.16 - good
* 6.3.13 - bad
* 6.4.16 - bad
* 6.5.13 - bad
* 6.6.50, LTS - bad
* 6.7.12 - bad
* 6.8.12 - bad
* 6.9.12 - bad
* 6.10.9 - good
* 6.11.0 - good
I tried, but couldn't fully build 4.19.322 or 6.0.19, due to issues with modules.
Important commits:
It looked like both the breakage and the fix came in during rc1 releases.
Breakage, v6.3-rc1:
I manually bisected commits in fs/smb* and fs/cifs.
3d78fe73fa12 cifs: Build the RDMA SGE list directly from an iterator
> lzop and pigz worked. last working. test in progress: pbzip2
607aea3cc2a8 cifs: Remove unused code
> lzop didn't work. first broken
Fix, v6.10-rc1:
I manually bisected commits in fs/smb.
69c3c023af25 cifs: Implement netfslib hooks
> lzop didn't work. last broken one
3ee1a1fc3981 cifs: Cut over to using netfslib
> lzop, pigz, pbzip2, all worked. first fixed one
To test / reproduce:
It looks like this, on a mounted network share, with extra pv for progress meters:
cat 1tb-rust-ext4.img.tar.gz | \
gzip -d | \
lzop -1 > \
1tb-rust-ext4.img.tar.lzop
# wait 40 minutes
cat 1tb-rust-ext4.img.tar.lzop | \
lzop -d | \
sha1sum
# either it works, and shows the right checksum
# or it crashes early, due to a corrupt file, and shows an incorrect checksum
As I re-read this, I realize it might look like the compressor behaves differently. I added a "tee $output | sha1sum; sha1sum $output" and ran it on a broken version. The checksums from the pipe and for the file on disk are different.
Assorted info:
This is a Raspberry Pi 4, with 4 GiB RAM, running Debian 12, bookworm, or a port.
mount.cifs version: 7.0
# cat /proc/sys/kernel/tainted
1024
# cat /proc/version
Linux version 6.2.0-3d78fe73f-v8-pronoiac+ (pronoiac@bisect) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #21 SMP PREEMPT Thu Sep 19 16:51:22 PDT 2024
DebugData:
/proc/fs/cifs/DebugData
Display Internal CIFS Data Structures for Debugging
---------------------------------------------------
CIFS Version 2.41
Features: DFS,FSCACHE,STATS2,DEBUG,ALLOW_INSECURE_LEGACY,CIFS_POSIX,UPCALL(SPNEGO),XATTR,ACL
CIFSMaxBufSize: 16384
Active VFS Requests: 1
Servers:
1) ConnectionId: 0x1 Hostname: drums.local
Number of credits: 8062 Dialect 0x300
TCP status: 1 Instance: 1
Local Users To Server: 1 SecMode: 0x1 Req On Wire: 2
In Send: 1 In MaxReq Wait: 0
Sessions:
1) Address: 169.254.132.219 Uses: 1 Capability: 0x300047 Session Status: 1
Security type: RawNTLMSSP SessionId: 0x4969841e
User: 1000 Cred User: 0
Shares:
0) IPC: \\drums.local\IPC$ Mounts: 1 DevInfo: 0x0 Attributes: 0x0
PathComponentMax: 0 Status: 1 type: 0 Serial Number: 0x0
Share Capabilities: None Share Flags: 0x0
tid: 0xeb093f0b Maximal Access: 0x1f00a9
1) \\drums.local\billions Mounts: 1 DevInfo: 0x20 Attributes: 0x5007f
PathComponentMax: 255 Status: 1 type: DISK Serial Number: 0x735a9af5
Share Capabilities: None Aligned, Partition Aligned, Share Flags: 0x0
tid: 0x5e6832e6 Optimal sector size: 0x200 Maximal Access: 0x1f01ff
MIDs:
State: 2 com: 9 pid: 3117 cbdata: 00000000e003293e mid 962892
State: 2 com: 9 pid: 3117 cbdata: 000000002610602a mid 962956
--
Let me know how I can help.
The process of iterating can take hours, and it's not automated, so my resources are limited.
#regzbot introduced: 607aea3cc2a8
#regzbot fix: 3ee1a1fc3981
-James
This series updates the driver for the veml6030 ALS and adds support for
the veml6035, which shares most of its functionality with the former.
The most relevant updates for the veml6030 are the resolution correction
to meet the datasheet update that took place with Rev 1.7, 28-Nov-2023,
a fix to avoid a segmentation fault when reading the
in_illuminance_period_available attribute, and the removal of the
processed value for the WHITE channel, as it uses the ALS resolution.
Vishay does not host the Product Information Notification where the
resolution correction was introduced, but it can still be found
online[1], and the corrected value is the one listed on the latest
version of the datasheet[2] (Rev. 1.7, 28-Nov-2023) and application
note[3] (Rev. 17-Jan-2024).
Link: https://www.farnell.com/datasheets/4379688.pdf [1]
Link: https://www.vishay.com/docs/84366/veml6030.pdf [2]
Link: https://www.vishay.com/docs/84367/designingveml6030.pdf [3]
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Changes in v2:
- Rebase to iio/testing, dropping applied patches [1/7], [4/7].
- Drop [3/7] (applied to iio/fixes-togreg).
- Add patch to use dev_err_probe() in probe error paths.
- Add patch to use read_avail() for available attributes.
- Add patches to use to support a regulator.
- Add patch to ensure that the device is powered off in error paths
after powering it on.
- Add patch to drop processed values from the WHITE channel.
- Use fsleep() instead of usleep_range() in veml6030_als_pwr_on()
- Link to v1: https://lore.kernel.org/r/20240913-veml6035-v1-0-0b09c0c90418@gmail.com
---
Javier Carrasco (10):
iio: light: veml6030: fix ALS sensor resolution
iio: light: veml6030: add set up delay after any power on sequence
iio: light: veml6030: use dev_err_probe()
dt-bindings: iio: light: veml6030: add vdd-supply property
iio: light: veml6030: add support for a regulator
iio: light: veml6030: use read_avail() for available attributes
iio: light: veml6030: drop processed info for white channel
iio: light: veml6030: power off device in probe error paths
dt-bindings: iio: light: veml6030: add veml6035
iio: light: veml6030: add support for veml6035
.../bindings/iio/light/vishay,veml6030.yaml | 43 +-
drivers/iio/light/Kconfig | 4 +-
drivers/iio/light/veml6030.c | 453 ++++++++++++++++-----
3 files changed, 391 insertions(+), 109 deletions(-)
---
base-commit: 8bea3878a1511bceadc2fbf284b00bcc5a2ef28d
change-id: 20240903-veml6035-7a91bc088c6f
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Hi,
Upstream commit 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
X86_VENDOR_INTEL") introduced a flags member to struct x86_cpu_id. Bit 0
of x86_cpu.id.flags must be set to 1 for x86_match_cpu() to work correctly.
This upstream commit has been backported to 5.15.y.
Callers that use the X86_MATCH_*() family of macros to compose the argument
of x86_match_cpu() function correctly. Callers that use their own custom
mechanisms may not work if they fail to set x86_cpu_id.flags correctly.
There are three remaining callers in 5.15.y that use their own mechanisms:
a) setup_pcid(), b) rapl_msr_probe(), and c) goodix_add_acpi_gpio_
mappings(). a) caused a regression that Thomas Lindroth reported in [1]. b)
works by luck but it still populates its x86_cpu_id[] array incorrectly. I
am not aware of any reports on c), but inspecting the code reveals that it
will fail to identify INTEL_FAM6_ATOM_SILVERMONT for the reason described
above.
I backported three patches that fix these bugs in mainline. Hopefully the
authors and/or maintainers can ack the backports?
I tested patches 2/3 and 3/3 on Alder Lake, and Meteor Lake systems as the
two involved callers behave differently on these two systems. I wrote the
testing details in each patch separately. I could not test patch 1/3
because I do not have access to Bay Trail hardware.
Thanks and BR,
Ricardo
[1]. https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Hans de Goede (1):
Input: goodix - use the new soc_intel_is_byt() helper
Sumeet Pawnikar (1):
powercap: RAPL: fix invalid initialization for pl4_supported field
Tony Luck (1):
x86/mm: Switch to new Intel CPU model defines
arch/x86/mm/init.c | 16 ++++++----------
drivers/input/touchscreen/goodix.c | 18 ++----------------
drivers/powercap/intel_rapl_msr.c | 6 +++---
3 files changed, 11 insertions(+), 29 deletions(-)
--
2.34.1
From: Zack Rusin <zack.rusin(a)broadcom.com>
commit aba07b9a0587f50e5d3346eaa19019cf3f86c0ea upstream.
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.r…
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v6.1.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 12 +++++++++++-
drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 +++
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index c46f380d9149..733b0013eda1 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -348,6 +348,8 @@ void *vmw_bo_map_and_cache(struct vmw_buffer_object *vbo)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -370,10 +372,17 @@ void *vmw_bo_map_and_cache(struct vmw_buffer_object *vbo)
*/
void vmw_bo_unmap(struct vmw_buffer_object *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -510,6 +519,7 @@ int vmw_bo_init(struct vmw_private *dev_priv,
BUILD_BUG_ON(TTM_MAX_BO_PRIORITY <= 3);
vmw_bo->base.priority = 3;
vmw_bo->res_tree = RB_ROOT;
+ atomic_set(&vmw_bo->map_count, 0);
size = ALIGN(size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->base.base, size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
index b0c23559511a..bca10214e0bf 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h
@@ -116,6 +116,8 @@ struct vmwgfx_hash_item {
* @base: The TTM buffer object
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @base_mapped_count: ttm BO mapping count; used by KMS atomic helpers.
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -129,6 +131,7 @@ struct vmw_buffer_object {
/* For KMS atomic helpers: ttm bo mapping count */
atomic_t base_mapped_count;
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.39.4
From: Zack Rusin <zack.rusin(a)broadcom.com>
commit aba07b9a0587f50e5d3346eaa19019cf3f86c0ea upstream.
The kms paths keep a persistent map active to read and compare the cursor
buffer. These maps can race with each other in simple scenario where:
a) buffer "a" mapped for update
b) buffer "a" mapped for compare
c) do the compare
d) unmap "a" for compare
e) update the cursor
f) unmap "a" for update
At step "e" the buffer has been unmapped and the read contents is bogus.
Prevent unmapping of active read buffers by simply keeping a count of
how many paths have currently active maps and unmap only when the count
reaches 0.
Fixes: 485d98d472d5 ("drm/vmwgfx: Add support for CursorMob and CursorBypass 4")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v5.19+
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240816183332.31961-2-zack.r…
Reviewed-by: Martin Krastev <martin.krastev(a)broadcom.com>
Reviewed-by: Maaz Mombasawala <maaz.mombasawala(a)broadcom.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v6.6.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 13 +++++++++++--
drivers/gpu/drm/vmwgfx/vmwgfx_bo.h | 3 +++
2 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index ae796e0c64aa..fdc34283eeb9 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -331,6 +331,8 @@ void *vmw_bo_map_and_cache(struct vmw_bo *vbo)
void *virtual;
int ret;
+ atomic_inc(&vbo->map_count);
+
virtual = ttm_kmap_obj_virtual(&vbo->map, ¬_used);
if (virtual)
return virtual;
@@ -353,11 +355,17 @@ void *vmw_bo_map_and_cache(struct vmw_bo *vbo)
*/
void vmw_bo_unmap(struct vmw_bo *vbo)
{
+ int map_count;
+
if (vbo->map.bo == NULL)
return;
- ttm_bo_kunmap(&vbo->map);
- vbo->map.bo = NULL;
+ map_count = atomic_dec_return(&vbo->map_count);
+
+ if (!map_count) {
+ ttm_bo_kunmap(&vbo->map);
+ vbo->map.bo = NULL;
+ }
}
@@ -390,6 +398,7 @@ static int vmw_bo_init(struct vmw_private *dev_priv,
BUILD_BUG_ON(TTM_MAX_BO_PRIORITY <= 3);
vmw_bo->tbo.priority = 3;
vmw_bo->res_tree = RB_ROOT;
+ atomic_set(&vmw_bo->map_count, 0);
params->size = ALIGN(params->size, PAGE_SIZE);
drm_gem_private_object_init(vdev, &vmw_bo->tbo.base, params->size);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index f349642e6190..156ea612fc2a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -68,6 +68,8 @@ struct vmw_bo_params {
* @map: Kmap object for semi-persistent mappings
* @res_tree: RB tree of resources using this buffer object as a backing MOB
* @res_prios: Eviction priority counts for attached resources
+ * @map_count: The number of currently active maps. Will differ from the
+ * cpu_writers because it includes kernel maps.
* @cpu_writers: Number of synccpu write grabs. Protected by reservation when
* increased. May be decreased without reservation.
* @dx_query_ctx: DX context if this buffer object is used as a DX query MOB
@@ -86,6 +88,7 @@ struct vmw_bo {
struct rb_root res_tree;
u32 res_prios[TTM_MAX_BO_PRIORITY];
+ atomic_t map_count;
atomic_t cpu_writers;
/* Not ref-counted. Protected by binding_mutex */
struct vmw_resource *dx_query_ctx;
--
2.39.4
From: Scott Mayhew <smayhew(a)redhat.com>
commit 76a0e79bc84f466999fa501fce5bf7a07641b8a7 upstream.
Marek Gresko reports that the root user on an NFS client is able to
change the security labels on files on an NFS filesystem that is
exported with root squashing enabled.
The end of the kerneldoc comment for __vfs_setxattr_noperm() states:
* This function requires the caller to lock the inode's i_mutex before it
* is executed. It also assumes that the caller will make the appropriate
* permission checks.
nfsd_setattr() does do permissions checking via fh_verify() and
nfsd_permission(), but those don't do all the same permissions checks
that are done by security_inode_setxattr() and its related LSM hooks do.
Since nfsd_setattr() is the only consumer of security_inode_setsecctx(),
simplest solution appears to be to replace the call to
__vfs_setxattr_noperm() with a call to __vfs_setxattr_locked(). This
fixes the above issue and has the added benefit of causing nfsd to
recall conflicting delegations on a file when a client tries to change
its security label.
Cc: stable(a)kernel.org
Reported-by: Marek Gresko <marek.gresko(a)protonmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809
Signed-off-by: Scott Mayhew <smayhew(a)redhat.com>
Tested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Chuck Lever <chuck.lever(a)oracle.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Acked-by: Casey Schaufler <casey(a)schaufler-ca.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v5.15.y-v6.1.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
security/selinux/hooks.c | 4 ++--
security/smack/smack_lsm.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 78f3da39b031..626d397c2f86 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6631,8 +6631,8 @@ static int selinux_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen
*/
static int selinux_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_SELINUX,
- ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(&init_user_ns, dentry, XATTR_NAME_SELINUX,
+ ctx, ctxlen, 0, NULL);
}
static int selinux_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index c18366dbbfed..25995df15e82 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4714,8 +4714,8 @@ static int smack_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
static int smack_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(&init_user_ns, dentry, XATTR_NAME_SMACK,
- ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(&init_user_ns, dentry, XATTR_NAME_SMACK,
+ ctx, ctxlen, 0, NULL);
}
static int smack_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
--
2.39.4
From: Scott Mayhew <smayhew(a)redhat.com>
commit 76a0e79bc84f466999fa501fce5bf7a07641b8a7 upstream.
Marek Gresko reports that the root user on an NFS client is able to
change the security labels on files on an NFS filesystem that is
exported with root squashing enabled.
The end of the kerneldoc comment for __vfs_setxattr_noperm() states:
* This function requires the caller to lock the inode's i_mutex before it
* is executed. It also assumes that the caller will make the appropriate
* permission checks.
nfsd_setattr() does do permissions checking via fh_verify() and
nfsd_permission(), but those don't do all the same permissions checks
that are done by security_inode_setxattr() and its related LSM hooks do.
Since nfsd_setattr() is the only consumer of security_inode_setsecctx(),
simplest solution appears to be to replace the call to
__vfs_setxattr_noperm() with a call to __vfs_setxattr_locked(). This
fixes the above issue and has the added benefit of causing nfsd to
recall conflicting delegations on a file when a client tries to change
its security label.
Cc: stable(a)kernel.org
Reported-by: Marek Gresko <marek.gresko(a)protonmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218809
Signed-off-by: Scott Mayhew <smayhew(a)redhat.com>
Tested-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Stephen Smalley <stephen.smalley.work(a)gmail.com>
Reviewed-by: Chuck Lever <chuck.lever(a)oracle.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Acked-by: Casey Schaufler <casey(a)schaufler-ca.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[Shivani: Modified to apply on v5.10.y]
Signed-off-by: Shivani Agarwal <shivani.agarwal(a)broadcom.com>
---
security/selinux/hooks.c | 3 ++-
security/smack/smack_lsm.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 46c00a68bb4b..90935ed3d8d8 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6570,7 +6570,8 @@ static int selinux_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen
*/
static int selinux_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(dentry, XATTR_NAME_SELINUX, ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(dentry, XATTR_NAME_SELINUX, ctx, ctxlen, 0,
+ NULL);
}
static int selinux_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c
index 92bc6c9d793d..cb4801fcf9a8 100644
--- a/security/smack/smack_lsm.c
+++ b/security/smack/smack_lsm.c
@@ -4651,7 +4651,8 @@ static int smack_inode_notifysecctx(struct inode *inode, void *ctx, u32 ctxlen)
static int smack_inode_setsecctx(struct dentry *dentry, void *ctx, u32 ctxlen)
{
- return __vfs_setxattr_noperm(dentry, XATTR_NAME_SMACK, ctx, ctxlen, 0);
+ return __vfs_setxattr_locked(dentry, XATTR_NAME_SMACK, ctx, ctxlen, 0,
+ NULL);
}
static int smack_inode_getsecctx(struct inode *inode, void **ctx, u32 *ctxlen)
--
2.39.4
When CONFIG_ZRAM_MULTI_COMP isn't set ZRAM_SECONDARY_COMP can hold
default_compressor, because it's the same offset as ZRAM_PRIMARY_COMP,
so we need to make sure that we don't attempt to kfree() the
statically defined compressor name.
This is detected by KASAN.
==================================================================
Call trace:
kfree+0x60/0x3a0
zram_destroy_comps+0x98/0x198 [zram]
zram_reset_device+0x22c/0x4a8 [zram]
reset_store+0x1bc/0x2d8 [zram]
dev_attr_store+0x44/0x80
sysfs_kf_write+0xfc/0x188
kernfs_fop_write_iter+0x28c/0x428
vfs_write+0x4dc/0x9b8
ksys_write+0x100/0x1f8
__arm64_sys_write+0x74/0xb8
invoke_syscall+0xd8/0x260
el0_svc_common.constprop.0+0xb4/0x240
do_el0_svc+0x48/0x68
el0_svc+0x40/0xc8
el0t_64_sync_handler+0x120/0x130
el0t_64_sync+0x190/0x198
==================================================================
Signed-off-by: Andrey Skvortsov <andrej.skvortzov(a)gmail.com>
Fixes: 684826f8271a ("zram: free secondary algorithms names")
Cc: <stable(a)vger.kernel.org>
---
Changes in v2:
- removed comment from source code about freeing statically defined compression
- removed part of KASAN report from commit description
- added information about CONFIG_ZRAM_MULTI_COMP into commit description
Changes in v3:
- modified commit description based on Sergey's comment
- changed start for-loop to ZRAM_PRIMARY_COMP
drivers/block/zram/zram_drv.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index c3d245617083d..ad9c9bc3ccfc5 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2115,8 +2115,10 @@ static void zram_destroy_comps(struct zram *zram)
zram->num_active_comps--;
}
- for (prio = ZRAM_SECONDARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
- kfree(zram->comp_algs[prio]);
+ for (prio = ZRAM_PRIMARY_COMP; prio < ZRAM_MAX_COMPS; prio++) {
+ /* Do not free statically defined compression algorithms */
+ if (zram->comp_algs[prio] != default_compressor)
+ kfree(zram->comp_algs[prio]);
zram->comp_algs[prio] = NULL;
}
--
2.45.2
Hi, Thorsten here, the Linux kernel's regression tracker.
I noticed a report about a linux-6.6.y regression in bugzilla.kernel.org
that appears to be caused by this commit from Dan applied by Greg:
15fffc6a5624b1 ("driver core: Fix uevent_show() vs driver detach race")
[v6.11-rc3, v6.10.5, v6.6.46, v6.1.105, v5.15.165, v5.10.224, v5.4.282,
v4.19.320]
The reporter did not check yet if mainline is affected; decided to
forward the report by mail nevertheless, as the maintainer for the
subsystem is also the maintainer for the stable tree. ;-)
To quote from https://bugzilla.kernel.org/show_bug.cgi?id=219244 :
> The symptoms of this bug are as follows:
>
> - After booting (to the graphical login screen) the mouse pointer
> would frozen and only after unplugging and plugging-in again the usb
> plug of the mouse would the mouse be working as expected.
> - If one would log in without fixing the mouse issue, the mouse
> pointer would still be frozen after login.
> - The usb keyboard usually is not affected even though plugged into
> the same usb-hub - thus logging in is possible.
> - The mouse pointer is also frozen if the usb connector is plugged
> into a different usb-port (different from the usb-hub)
> - The pointer is moveable via the inbuilt synaptics trackpad
>
>
> The kernel log shows almost the same messages (not sure if the
> differences mean anything in regards to this bug) for the initial
> recognizing the mouse (frozen mouse pointer) and the re-plugged-in mouse
> (and subsequently moveable mouse pointer):
>
> [kernel] [ 8.763158] usb 1-2.2.1.2: new low-speed USB device number 10 using xhci_hcd
> [kernel] [ 8.956028] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
> [kernel] [ 8.956036] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [kernel] [ 8.956039] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0
> [kernel] [ 8.956041] usb 1-2.2.1.2: Manufacturer: Microsoft
> [kernel] [ 8.963554] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0002/input/input18
> [kernel] [ 8.964417] hid-generic 0003:045E:00CB.0002: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
>
> [kernel] [ 31.258381] usb 1-2.2.1.2: USB disconnect, device number 10
> [kernel] [ 31.595051] usb 1-2.2.1.2: new low-speed USB device number 16 using xhci_hcd
> [kernel] [ 31.804002] usb 1-2.2.1.2: New USB device found, idVendor=045e, idProduct=00cb, bcdDevice= 1.04
> [kernel] [ 31.804010] usb 1-2.2.1.2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
> [kernel] [ 31.804013] usb 1-2.2.1.2: Product: Microsoft Basic Optical Mouse v2.0
> [kernel] [ 31.804016] usb 1-2.2.1.2: Manufacturer: Microsoft
> [kernel] [ 31.812933] input: Microsoft Microsoft Basic Optical Mouse v2.0 as /devices/pci0000:00/0000:00:14.0/usb1/1-2/1-2.2/1-2.2.1/1-2.2.1.2/1-2.2.1.2:1.0/0003:045E:00CB.0004/input/input20
> [kernel] [ 31.814028] hid-generic 0003:045E:00CB.0004: input,hidraw1: USB HID v1.11 Mouse [Microsoft Microsoft Basic Optical Mouse v2.0 ] on usb-0000:00:14.0-2.2.1.2/input0
>
> Differences:
>
> ../0003:045E:00CB.0002/input/input18 vs ../0003:045E:00CB.0004/input/input20
>
> and
>
> hid-generic 0003:045E:00CB.0002 vs hid-generic 0003:045E:00CB.0004
>
>
> The connector / usb-port was not changed in this case!
>
>
> The symptoms of this bug have been present at one point in the
> recent
> past, but with kernel v6.6.45 (or maybe even some version before that)
> it was fine. But with 6.6.45 it seems to be definitely fine.
>
> But with v6.6.46 the symptoms returned. That's the reason I
> suspected
> the kernel to be the cause of this issue. So I did some bisecting -
> which wasn't easy because that bug would often times not appear if the
> system was simply rebooted into the test kernel.
> As the bug would definitely appear on the affected kernels (v6.6.46
> ff) after shutting down the system for the night and booting the next
> day, I resorted to simulating the over-night powering-off by shutting
> the system down, unplugging the power and pressing the power button to
> get rid of residual voltage. But even then a few times the bug would
> only appear if I repeated this procedure before booting the system again
> with the respective kernel.
>
> This is on a Thinkpad with Kaby Lake and integrated Intel graphics.
> Even though it is a laptop, it is used as a desktop device, and the
> internal battery is disconnected, the removable battery is removed as
> the system is plugged-in via the power cord at all times (when in use)!
> Also, the system has no power (except for the bios battery, of
> course)
> over night as the power outlet is switched off if the device is not in use.
>
> Not sure if this affects the issue - or how it does. But for
> successful bisecting I had to resort to the above procedure.
>
> Bisecting the issue (between the release commits of v6.6.45 and
> v6.6.46) resulted in this commit [1] being the probable culprit.
>
> I then tested kernel v6.6.49. It still produced the bug for me. So I
> reverted the changes of the assumed "bad commit" and re-compiled kernel
> v6.6.49. With this modified kernel the bug seems to be gone.
>
> Now, I assume the commit has a reason for being introduced, but
> maybe
> needs some adjusting in order to avoid this bug I'm experiencing on my
> system.
> Also, I can't say why the issue appeared in the past even without
> this
> commit being present, as I haven't bisected any kernel version before
> v6.6.45.
>
>
> [1]:
>
> 4d035c743c3e391728a6f81cbf0f7f9ca700cf62 is the first bad commit
> commit 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
> Author: Dan Williams <dan.j.williams(a)intel.com>
> Date: Fri Jul 12 12:42:09 2024 -0700
>
> driver core: Fix uevent_show() vs driver detach race
>
> commit 15fffc6a5624b13b428bb1c6e9088e32a55eb82c upstream.
>
> uevent_show() wants to de-reference dev->driver->name. There is no clean
See the ticket for more details. Note, you have to use bugzilla to reach
the reporter, as I sadly[1] can not CCed them in mails like this.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
[1] because bugzilla.kernel.org tells users upon registration their
"email address will never be displayed to logged out users"
P.S.: let me use this mail to also add the report to the list of tracked
regressions to ensure it's doesn't fall through the cracks:
#regzbot introduced: 4d035c743c3e391728a6f81cbf0f7f9ca700cf62
#regzbot from: brmails+k
#regzbot duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=219244
#regzbot title: driver core: frozen usb mouse pointer at boot
#regzbot ignore-activity
The patch titled
Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Barry Song <v-songbaohua(a)oppo.com>
Subject: mm: avoid unconditional one-tick sleep when swapcache_prepare fails
Date: Fri, 27 Sep 2024 09:19:36 +1200
Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
introduced an unconditional one-tick sleep when `swapcache_prepare()`
fails, which has led to reports of UI stuttering on latency-sensitive
Android devices. To address this, we can use a waitqueue to wake up tasks
that fail `swapcache_prepare()` sooner, instead of always sleeping for a
full tick. While tasks may occasionally be woken by an unrelated
`do_swap_page()`, this method is preferable to two scenarios: rapid
re-entry into page faults, which can cause livelocks, and multiple
millisecond sleeps, which visibly degrade user experience.
Oven's testing shows that a single waitqueue resolves the UI stuttering
issue. If a 'thundering herd' problem becomes apparent later, a waitqueue
hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit
locks can be introduced.
Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com
Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: Oven Liyang <liyangouwen1(a)oppo.com>
Tested-by: Oven Liyang <liyangouwen1(a)oppo.com>
Cc: Kairui Song <kasong(a)tencent.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Chris Li <chrisl(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
--- a/mm/memory.c~mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails
+++ a/mm/memory.c
@@ -4192,6 +4192,8 @@ static struct folio *alloc_swap_folio(st
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq);
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -4204,6 +4206,7 @@ vm_fault_t do_swap_page(struct vm_fault
{
struct vm_area_struct *vma = vmf->vma;
struct folio *swapcache, *folio = NULL;
+ DECLARE_WAITQUEUE(wait, current);
struct page *page;
struct swap_info_struct *si = NULL;
rmap_t rmap_flags = RMAP_NONE;
@@ -4302,7 +4305,9 @@ vm_fault_t do_swap_page(struct vm_fault
* Relax a bit to prevent rapid
* repeated page faults.
*/
+ add_wait_queue(&swapcache_wq, &wait);
schedule_timeout_uninterruptible(1);
+ remove_wait_queue(&swapcache_wq, &wait);
goto out_page;
}
need_clear_cache = true;
@@ -4609,8 +4614,10 @@ unlock:
pte_unmap_unlock(vmf->pte, vmf->ptl);
out:
/* Clear the swap cache pin for direct swapin after PTL unlock */
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
@@ -4625,8 +4632,10 @@ out_release:
folio_unlock(swapcache);
folio_put(swapcache);
}
- if (need_clear_cache)
+ if (need_clear_cache) {
swapcache_clear(si, entry, nr_pages);
+ wake_up(&swapcache_wq);
+ }
if (si)
put_swap_device(si);
return ret;
_
Patches currently in -mm which might be from v-songbaohua(a)oppo.com are
mm-avoid-unconditional-one-tick-sleep-when-swapcache_prepare-fails.patch
The patch titled
Subject: selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Donet Tom <donettom(a)linux.ibm.com>
Subject: selftests/mm: fix incorrect buffer->mirror size in hmm2 double_map test
Date: Fri, 27 Sep 2024 00:07:52 -0500
The hmm2 double_map test was failing due to an incorrect buffer->mirror
size. The buffer->mirror size was 6, while buffer->ptr size was 6 *
PAGE_SIZE. The test failed because the kernel's copy_to_user function was
attempting to copy a 6 * PAGE_SIZE buffer to buffer->mirror. Since the
size of buffer->mirror was incorrect, copy_to_user failed.
This patch corrects the buffer->mirror size to 6 * PAGE_SIZE.
Test Result without this patch
==============================
# RUN hmm2.hmm2_device_private.double_map ...
# hmm-tests.c:1680:double_map:Expected ret (-14) == 0 (0)
# double_map: Test terminated by assertion
# FAIL hmm2.hmm2_device_private.double_map
not ok 53 hmm2.hmm2_device_private.double_map
Test Result with this patch
===========================
# RUN hmm2.hmm2_device_private.double_map ...
# OK hmm2.hmm2_device_private.double_map
ok 53 hmm2.hmm2_device_private.double_map
Link: https://lkml.kernel.org/r/20240927050752.51066-1-donettom@linux.ibm.com
Fixes: fee9f6d1b8df ("mm/hmm/test: add selftests for HMM")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: J��r��me Glisse <jglisse(a)redhat.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hmm-tests.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/hmm-tests.c~selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test
+++ a/tools/testing/selftests/mm/hmm-tests.c
@@ -1657,7 +1657,7 @@ TEST_F(hmm2, double_map)
buffer->fd = -1;
buffer->size = size;
- buffer->mirror = malloc(npages);
+ buffer->mirror = malloc(size);
ASSERT_NE(buffer->mirror, NULL);
/* Reserve a range of addresses. */
_
Patches currently in -mm which might be from donettom(a)linux.ibm.com are
selftests-mm-fixed-incorrect-buffer-mirror-size-in-hmm2-double_map-test.patch
The patch titled
Subject: device-dax: correct pgoff align in dax_set_mapping()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
device-dax-correct-pgoff-align-in-dax_set_mapping.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Kun(llfl)" <llfl(a)linux.alibaba.com>
Subject: device-dax: correct pgoff align in dax_set_mapping()
Date: Fri, 27 Sep 2024 15:45:09 +0800
pgoff should be aligned using ALIGN_DOWN() instead of ALIGN(). Otherwise,
vmf->address not aligned to fault_size will be aligned to the next
alignment, that can result in memory failure getting the wrong address.
Link: https://lkml.kernel.org/r/23c02a03e8d666fef11bbe13e85c69c8b4ca0624.17274216…
Fixes: b9b5777f09be ("device-dax: use ALIGN() for determining pgoff")
Signed-off-by: Kun(llfl) <llfl(a)linux.alibaba.com>
Tested-by: JianXiong Zhao <zhaojianxiong.zjx(a)alibaba-inc.com>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/dax/device.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dax/device.c~device-dax-correct-pgoff-align-in-dax_set_mapping
+++ a/drivers/dax/device.c
@@ -86,7 +86,7 @@ static void dax_set_mapping(struct vm_fa
nr_pages = 1;
pgoff = linear_page_index(vmf->vma,
- ALIGN(vmf->address, fault_size));
+ ALIGN_DOWN(vmf->address, fault_size));
for (i = 0; i < nr_pages; i++) {
struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
_
Patches currently in -mm which might be from llfl(a)linux.alibaba.com are
device-dax-correct-pgoff-align-in-dax_set_mapping.patch
arch-s390.h uses types from std.h, but does not include it.
Depending on the inclusion order the compilation can fail.
Include std.h explicitly to avoid these errors.
Fixes: 404fa87c0eaf ("tools/nolibc: s390: provide custom implementation for sys_fork")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
tools/include/nolibc/arch-s390.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/include/nolibc/arch-s390.h b/tools/include/nolibc/arch-s390.h
index 2ec13d8b9a2db80efa8d6cbbbd01bfa3d0059de2..f9ab83a219b8a2d5e53b0b303d8bf0bf78280d5f 100644
--- a/tools/include/nolibc/arch-s390.h
+++ b/tools/include/nolibc/arch-s390.h
@@ -10,6 +10,7 @@
#include "compiler.h"
#include "crt.h"
+#include "std.h"
/* Syscalls for s390:
* - registers are 64-bit
---
base-commit: e477dba5442c0af7acb9e8bbbbde1108a37ed39c
change-id: 20240927-nolibc-s390-std-h-cbb13f70fa73
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Hello,
I am unsure if this is the 'correct' behavior for ptrace.
If you run ptrace_traceme followed by ptrace_attach, then the process
attaches its own parent to itself and cannot be attached by another
thing. The attach call errors out, but GDB does report something
attached to it. I am unsure if Bash does this itself perhaps.
It's a bit hard for me to reason about because my debugging skills are
bad and trying 'strace' with bash -c ./thing, or just on the thing
itself gives -1 on both ptrace calls as strace attaches to it.
similarly with GDB. Unsure how to debug this.
https://gist.github.com/x64-elf-sh42/83393e319ad8280b8704fbe3f499e381
to compile simply:
gcc test.c -o thingy
This code works on my machine which is:
Linux 6.1.0-25-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.106-3
(2024-08-26) x86_64 GNU/Linux
GDB -p on the pid reports that another pid is attached and the
operation is illegal. That other pid is the bash shell that i spawned
this binary from (code in gist).
it's useful for anti-debugging, but it seems odd it will attach it's
parent to the process since that's not actually doing the attach call.
If anything i'd expect the pid attached to itself, rather than the
parent getting attached.
The first call to ptrace (traceme) gets return value 0. The second
call (attach) gets return value -1. That does seem correct, but yet
there is something 'attached' when i try to use GDB.
If I only do the traceme call, it does not get attached by Bash, so it
looks totally like the 'attach' call has a side effect of attaching
the parent, rather than just only failing.
Kind regards,
~sh42
Hello everyone,
I wondered a few times in the last months how we could properly track
regressions for the stable series, as I'm always a bit unsure how proper
use of regzbot would look for these cases.
For issues that affect mainline usually just the commit itself is
specified with the relevant "#regzbot introduced: ..." invocation, but
how would one do this if the stable trees are also affected? Do we need
to specify "#regzbot introduced" for each backported version of the
upstream commit?!
IMO this is especially interesting because sometimes getting a patchset
to fix the older trees can take quite some time since possibly backport
dependencies have to be found or even custom patches need to be written,
as it i.e. was the case [here][0]. This led to fixes for the linux-6.1.y
branch landing a bit later which would be good to track with regzbot so
it does not fall through the cracks.
Maybe there already is a regzbot facility to do this that I have just
missed, but I thought if there is people on the lists will know :)
Cheers,
Chris
P.S.: I hope everybody had a great time at the conferences! I hope to
also make it next year ..
[0]: https://lore.kernel.org/all/66bcb6f68172f_adbf529471@willemb.c.googlers.com…
The atomicity violation issue is due to the invalidation of the function
port_has_data()'s check caused by concurrency. Imagine a scenario where a
port that contains data passes the validity check but is simultaneously
assigned a value with no data. This could result in an empty port passing
the validity check, potentially leading to a null pointer dereference
error later in the program, which is inconsistent.
To address this issue, we believe there is a problem with the original
logic. Since the validity check and the read operation were separated, it
could lead to inconsistencies between the data that passes the check and
the data being read. Therefore, we moved the main logic of the
port_has_data function into this function and placed the read operation
within a lock, ensuring that the validity check and read operation are
not separated, thus resolving the problem.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: 203baab8ba31 ("virtio: console: Introduce function to hand off data from host to readers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/char/virtio_console.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/char/virtio_console.c b/drivers/char/virtio_console.c
index de7d720d99fa..5aaf07f71a4c 100644
--- a/drivers/char/virtio_console.c
+++ b/drivers/char/virtio_console.c
@@ -656,10 +656,14 @@ static ssize_t fill_readbuf(struct port *port, u8 __user *out_buf,
struct port_buffer *buf;
unsigned long flags;
- if (!out_count || !port_has_data(port))
+ if (!out_count)
return 0;
- buf = port->inbuf;
+ spin_lock_irqsave(&port->inbuf_lock, flags);
+ buf = port->inbuf = get_inbuf(port);
+ spin_unlock_irqrestore(&port->inbuf_lock, flags);
+ if (!buf)
+ return 0;
out_count = min(out_count, buf->len - buf->offset);
if (to_user) {
--
2.34.1
Atomicity violation occurs when the fmc_send_cmd() function is executed
simultaneously with the modification of the fmdev->resp_skb value.
Consider a scenario where, after passing the validity check within the
function, a non-null fmdev->resp_skb variable is assigned a null value.
This results in an invalid fmdev->resp_skb variable passing the validity
check. As seen in the later part of the function, skb = fmdev->resp_skb;
when the invalid fmdev->resp_skb passes the check, a null pointer
dereference error may occur at line 478, evt_hdr = (void *)skb->data;
To address this issue, it is recommended to include the validity check of
fmdev->resp_skb within the locked section of the function. This
modification ensures that the value of fmdev->resp_skb does not change
during the validation process, thereby maintaining its validity.
This possible bug is found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs
to extract function pairs that can be concurrently executed, and then
analyzes the instructions in the paired functions to identify possible
concurrency bugs including data races and atomicity violations.
Fixes: e8454ff7b9a4 ("[media] drivers:media:radio: wl128x: FM Driver Common sources")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/media/radio/wl128x/fmdrv_common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/radio/wl128x/fmdrv_common.c b/drivers/media/radio/wl128x/fmdrv_common.c
index 3d36f323a8f8..4d032436691c 100644
--- a/drivers/media/radio/wl128x/fmdrv_common.c
+++ b/drivers/media/radio/wl128x/fmdrv_common.c
@@ -466,11 +466,12 @@ int fmc_send_cmd(struct fmdev *fmdev, u8 fm_op, u16 type, void *payload,
jiffies_to_msecs(FM_DRV_TX_TIMEOUT) / 1000);
return -ETIMEDOUT;
}
+ spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
if (!fmdev->resp_skb) {
+ spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
fmerr("Response SKB is missing\n");
return -EFAULT;
}
- spin_lock_irqsave(&fmdev->resp_skb_lock, flags);
skb = fmdev->resp_skb;
fmdev->resp_skb = NULL;
spin_unlock_irqrestore(&fmdev->resp_skb_lock, flags);
--
2.34.1
From: Filipe Manana <fdmanana(a)suse.com>
commit f8f210dc84709804c9f952297f2bfafa6ea6b4bd upstream.
When updating the global block reserve, we account for the 6 items needed
by an unlink operation and the 6 delayed references for each one of those
items. However the calculation for the delayed references is not correct
in case we have the free space tree enabled, as in that case we need to
touch the free space tree as well and therefore need twice the number of
bytes. So use the btrfs_calc_delayed_ref_bytes() helper to calculate the
number of bytes need for the delayed references at
btrfs_update_global_block_rsv().
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Diogo: this patch has been cherry-picked from the original commit;
conflicts included lack of a define (picked from commit 5630e2bcfe223)
and lack of btrfs_calc_delayed_ref_bytes (picked from commit 0e55a54502b97)
- changed const struct -> struct for compatibility.]
Signed-off-by: Diogo Jahchan Koike <djahchankoike(a)gmail.com>
---
Fixes WARNING in btrfs_chunk_alloc (2) [0]
[0]: https://syzkaller.appspot.com/bug?extid=57de2b05959bc1e659af
---
fs/btrfs/block-rsv.c | 16 +++++++++-------
fs/btrfs/block-rsv.h | 12 ++++++++++++
fs/btrfs/delayed-ref.h | 21 +++++++++++++++++++++
3 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index ec96285357e0..f7e3d6c2e302 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -378,17 +378,19 @@ void btrfs_update_global_block_rsv(struct btrfs_fs_info *fs_info)
/*
* But we also want to reserve enough space so we can do the fallback
- * global reserve for an unlink, which is an additional 5 items (see the
- * comment in __unlink_start_trans for what we're modifying.)
+ * global reserve for an unlink, which is an additional
+ * BTRFS_UNLINK_METADATA_UNITS items.
*
* But we also need space for the delayed ref updates from the unlink,
- * so its 10, 5 for the actual operation, and 5 for the delayed ref
- * updates.
+ * so add BTRFS_UNLINK_METADATA_UNITS units for delayed refs, one for
+ * each unlink metadata item.
*/
- min_items += 10;
-
+ min_items += BTRFS_UNLINK_METADATA_UNITS;
+
num_bytes = max_t(u64, num_bytes,
- btrfs_calc_insert_metadata_size(fs_info, min_items));
+ btrfs_calc_insert_metadata_size(fs_info, min_items) +
+ btrfs_calc_delayed_ref_bytes(fs_info,
+ BTRFS_UNLINK_METADATA_UNITS));
spin_lock(&sinfo->lock);
spin_lock(&block_rsv->lock);
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index 578c3497a455..662c52b4bd44 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -50,6 +50,18 @@ struct btrfs_block_rsv {
u64 qgroup_rsv_reserved;
};
+/*
+ * Number of metadata items necessary for an unlink operation:
+ *
+ * 1 for the possible orphan item
+ * 1 for the dir item
+ * 1 for the dir index
+ * 1 for the inode ref
+ * 1 for the inode
+ * 1 for the parent inode
+ */
+#define BTRFS_UNLINK_METADATA_UNITS 6
+
void btrfs_init_block_rsv(struct btrfs_block_rsv *rsv, enum btrfs_rsv_type type);
void btrfs_init_root_block_rsv(struct btrfs_root *root);
struct btrfs_block_rsv *btrfs_alloc_block_rsv(struct btrfs_fs_info *fs_info,
diff --git a/fs/btrfs/delayed-ref.h b/fs/btrfs/delayed-ref.h
index d6304b690ec4..5c6bb23d45a5 100644
--- a/fs/btrfs/delayed-ref.h
+++ b/fs/btrfs/delayed-ref.h
@@ -253,6 +253,27 @@ extern struct kmem_cache *btrfs_delayed_extent_op_cachep;
int __init btrfs_delayed_ref_init(void);
void __cold btrfs_delayed_ref_exit(void);
+static inline u64 btrfs_calc_delayed_ref_bytes(struct btrfs_fs_info *fs_info,
+ int num_delayed_refs)
+{
+ u64 num_bytes;
+
+ num_bytes = btrfs_calc_insert_metadata_size(fs_info, num_delayed_refs);
+
+ /*
+ * We have to check the mount option here because we could be enabling
+ * the free space tree for the first time and don't have the compat_ro
+ * option set yet.
+ *
+ * We need extra reservations if we have the free space tree because
+ * we'll have to modify that tree as well.
+ */
+ if (btrfs_test_opt(fs_info, FREE_SPACE_TREE))
+ num_bytes *= 2;
+
+ return num_bytes;
+}
+
static inline void btrfs_init_generic_ref(struct btrfs_ref *generic_ref,
int action, u64 bytenr, u64 len, u64 parent)
{
--
2.43.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b4cd80b0338945a94972ac3ed54f8338d2da2076
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024091330-reps-craftsman-ab67@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
b4cd80b03389 ("mptcp: pm: Fix uaf in __timer_delete_sync")
d58300c3185b ("mptcp: validate 'id' when stopping the ADD_ADDR retransmit timer")
f7dafee18538 ("mptcp: use mptcp_addr_info in mptcp_options_received")
dc87efdb1a5c ("mptcp: add mptcp reset option support")
557963c383e8 ("mptcp: move to next addr when subflow creation fail")
d88c476f4a7d ("mptcp: export lookup_anno_list_by_saddr")
efd13b71a3fa ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b4cd80b0338945a94972ac3ed54f8338d2da2076 Mon Sep 17 00:00:00 2001
From: Edward Adam Davis <eadavis(a)qq.com>
Date: Tue, 10 Sep 2024 17:58:56 +0800
Subject: [PATCH] mptcp: pm: Fix uaf in __timer_delete_sync
There are two paths to access mptcp_pm_del_add_timer, result in a race
condition:
CPU1 CPU2
==== ====
net_rx_action
napi_poll netlink_sendmsg
__napi_poll netlink_unicast
process_backlog netlink_unicast_kernel
__netif_receive_skb genl_rcv
__netif_receive_skb_one_core netlink_rcv_skb
NF_HOOK genl_rcv_msg
ip_local_deliver_finish genl_family_rcv_msg
ip_protocol_deliver_rcu genl_family_rcv_msg_doit
tcp_v4_rcv mptcp_pm_nl_flush_addrs_doit
tcp_v4_do_rcv mptcp_nl_remove_addrs_list
tcp_rcv_established mptcp_pm_remove_addrs_and_subflows
tcp_data_queue remove_anno_list_by_saddr
mptcp_incoming_options mptcp_pm_del_add_timer
mptcp_pm_del_add_timer kfree(entry)
In remove_anno_list_by_saddr(running on CPU2), after leaving the critical
zone protected by "pm.lock", the entry will be released, which leads to the
occurrence of uaf in the mptcp_pm_del_add_timer(running on CPU1).
Keeping a reference to add_timer inside the lock, and calling
sk_stop_timer_sync() with this reference, instead of "entry->add_timer".
Move list_del(&entry->list) to mptcp_pm_del_add_timer and inside the pm lock,
do not directly access any members of the entry outside the pm lock, which
can avoid similar "entry->x" uaf.
Fixes: 00cfd77b9063 ("mptcp: retransmit ADD_ADDR when timeout")
Cc: stable(a)vger.kernel.org
Reported-and-tested-by: syzbot+f3a31fb909db9b2a5c4d(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=f3a31fb909db9b2a5c4d
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Edward Adam Davis <eadavis(a)qq.com>
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Link: https://patch.msgid.link/tencent_7142963A37944B4A74EF76CD66EA3C253609@qq.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f891bc714668..ad935d34c973 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -334,15 +334,21 @@ mptcp_pm_del_add_timer(struct mptcp_sock *msk,
{
struct mptcp_pm_add_entry *entry;
struct sock *sk = (struct sock *)msk;
+ struct timer_list *add_timer = NULL;
spin_lock_bh(&msk->pm.lock);
entry = mptcp_lookup_anno_list_by_saddr(msk, addr);
- if (entry && (!check_id || entry->addr.id == addr->id))
+ if (entry && (!check_id || entry->addr.id == addr->id)) {
entry->retrans_times = ADD_ADDR_RETRANS_MAX;
+ add_timer = &entry->add_timer;
+ }
+ if (!check_id && entry)
+ list_del(&entry->list);
spin_unlock_bh(&msk->pm.lock);
- if (entry && (!check_id || entry->addr.id == addr->id))
- sk_stop_timer_sync(sk, &entry->add_timer);
+ /* no lock, because sk_stop_timer_sync() is calling del_timer_sync() */
+ if (add_timer)
+ sk_stop_timer_sync(sk, add_timer);
return entry;
}
@@ -1462,7 +1468,6 @@ static bool remove_anno_list_by_saddr(struct mptcp_sock *msk,
entry = mptcp_pm_del_add_timer(msk, addr, false);
if (entry) {
- list_del(&entry->list);
kfree(entry);
return true;
}
Hello stable-team,
Francesco Dolcini reported a regression on v6.6.52 [1], it turns out
since v6.6.51~34 a.k.a. 5ea24ddc26a7 ("can: mcp251xfd: rx: add
workaround for erratum DS80000789E 6 of mcp2518fd") the mcp25xfd driver
is broken.
Cherry picking the following commits fixes the problem:
51b2a7216122 ("can: mcp251xfd: properly indent labels")
a7801540f325 ("can: mcp251xfd: move mcp251xfd_timestamp_start()/stop() into mcp251xfd_chip_start/stop()")
Please pick these patches for
- 6.10.x
- 6.6.x
- 6.1.x
[1] https://lore.kernel.org/all/20240923115310.GA138774@francesco-nb
regards,
Marc
--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung Nürnberg | Phone: +49-5121-206917-129 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-9 |
Hi Greg, Sasha,
This batch contains a backport for fixes for 5.10-stable:
The following list shows the backported patches, I am using original commit
IDs for reference:
1) 29b359cf6d95 ("netfilter: nft_set_pipapo: walk over current view on netlink dump")
2) efefd4f00c96 ("netfilter: nf_tables: missing iterator type in lookup walk")
Please, apply,
Thanks
Pablo Neira Ayuso (2):
netfilter: nft_set_pipapo: walk over current view on netlink dump
netfilter: nf_tables: missing iterator type in lookup walk
include/net/netfilter/nf_tables.h | 13 +++++++++++++
net/netfilter/nf_tables_api.c | 5 +++++
net/netfilter/nft_lookup.c | 1 +
net/netfilter/nft_set_pipapo.c | 6 ++++--
4 files changed, 23 insertions(+), 2 deletions(-)
--
2.30.2
Hi,
Can you please bring this commit into linux-6.11.y?
commit c35fad6f7e0d ("ASoC: amd: acp: add ZSC control register
programming sequence")
This helps with power consumption on the affected platforms.
Thanks,
Commit 0ef625bba6fb ("vfs: support statx(..., NULL, AT_EMPTY_PATH,
...)") added support for passing in NULL when AT_EMPTY_PATH is given,
improving performance when statx is used for fetching stat informantion
from a given fd, which is especially important for 32-bit platforms.
This commit also improved the performance when an empty string is given
by short-circuiting the handling of such paths.
This series is based on the commits in the Linus’ tree. Sligth
modifications are applied to the context of the patches for cleanly
applying.
Tested-by: Xi Ruoyao <xry111(a)xry111.site>
Signed-off-by: Miao Wang <shankerwangmiao(a)gmail.com>
---
Christian Brauner (2):
fs: new helper vfs_empty_path()
stat: use vfs_empty_path() helper
Mateusz Guzik (1):
vfs: support statx(..., NULL, AT_EMPTY_PATH, ...)
fs/internal.h | 14 ++++++
fs/namespace.c | 13 ------
fs/stat.c | 123 ++++++++++++++++++++++++++++++++++++-----------------
include/linux/fs.h | 17 ++++++++
4 files changed, 116 insertions(+), 51 deletions(-)
---
base-commit: 049be94099ea5ba8338526c5a4f4f404b9dcaf54
change-id: 20240918-statx-stable-linux-6-10-y-17c3ec7497c8
Best regards,
--
Miao Wang <shankerwangmiao(a)gmail.com>
Hello,
please consider picking up:
7f3287db6543 ("netfilter: nft_socket: make cgroupsv2 matching work with namespaces")
and its followup fix,
7052622fccb1 ("netfilter: nft_socket: Fix a NULL vs IS_ERR() bug in nft_socket_cgroup_subtree_level()")
It should cherry-pick fine for 6.1 and later.
I'm not sure a 5.15 backport is worth it, as its not a crash fix and
noone has reported this problem so far with a 5.15 kernel.
Hello again,
Here is the next set of XFS backports, this set is for 6.1.y and I will
be following up with a set for 5.15.y later. There were some good
suggestions made at LSF to survey test coverage to cut back on
testing but I've been a bit swamped and a backport set was overdue.
So for this set, I have run the auto group 3 x 8 configs with no
regressions seen. Let me know if you spot any issues.
This set has already been ack'd on the XFS list.
Thanks,
Leah
https://lkml.iu.edu/hypermail/linux/kernel/2212.0/04860.html
52f31ed22821
[1/1] xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
https://lore.kernel.org/linux-xfs/ef8a958d-741f-5bfd-7b2f-db65bf6dc3ac@huaw…
4da112513c01
[1/1] xfs: Fix deadlock on xfs_inodegc_worker
https://www.spinics.net/lists/linux-xfs/msg68547.html
601a27ea09a3
[1/1] xfs: fix extent busy updating
https://www.spinics.net/lists/linux-xfs/msg67254.html
c85007e2e394
[1/1] xfs: don't use BMBT btree split workers for IO completion
fixes from start of the series
https://lore.kernel.org/linux-xfs/20230209221825.3722244-1-david@fromorbit.…
[00/42] xfs: per-ag centric allocation alogrithms
1dd0510f6d4b
[01/42] xfs: fix low space alloc deadlock
f08f984c63e9
[02/42] xfs: prefer free inodes at ENOSPC over chunk allocation
d5753847b216
[03/42] xfs: block reservation too large for minleft allocation
https://lore.kernel.org/linux-xfs/Y+Z7TZ9o+KgXLcV8@magnolia/
60b730a40c43
[1/1] xfs: fix uninitialized variable access
https://lore.kernel.org/linux-xfs/20230228051250.1238353-1-david@fromorbit.…
0c7273e494dd
[1/1] xfs: quotacheck failure can race with background inode inactivation
https://lore.kernel.org/linux-xfs/20230412024907.GP360889@frogsfrogsfrogs/
8ee81ed581ff
[1/1] xfs: fix BUG_ON in xfs_getbmap()
https://www.spinics.net/lists/linux-xfs/msg71062.html
[0/4] xfs: bug fixes for 6.4-rc1
[1/4] xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof
skip, fix for a commit from 6.3
8e698ee72c4e
[2/4] xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
[3/4] xfs: flush dirty data and drain directios before scrubbing cow fork
skip, scrub
[4/4] xfs: don't allocate into the data fork for an unshare request
skip, more of an optimization than a fix
1bba82fe1afa
[5/4] xfs: fix negative array access in xfs_getbmap
(fix of 8ee81ed)
https://lore.kernel.org/linux-xfs/20230517000449.3997582-1-david@fromorbit.…
[0/4] xfs: bug fixes for 6.4-rcX
89a4bf0dc385
[1/4] xfs: buffer pins need to hold a buffer reference
[2/4] xfs: restore allocation trylock iteration
skip, for issue introduced in 6.3
cb042117488d
[3/4] xfs: defered work could create precommits
(dependency for patch 4)
82842fee6e59
[4/4] xfs: fix AGF vs inode cluster buffer deadlock
https://lore.kernel.org/linux-xfs/20240612225148.3989713-1-david@fromorbit.…
348a1983cf4c
(fix for 82842fee6e5)
[1/1] xfs: fix unlink vs cluster buffer instantiation race
https://lore.kernel.org/linux-xfs/20230530001928.2967218-1-david@fromorbit.…
d4d12c02b
[1/1] xfs: collect errors from inodegc for unlinked inode recovery
https://lore.kernel.org/linux-xfs/20230524121041.GA4128075@ceph-admin/
c3b880acadc9
[1/1] xfs: fix ag count overflow during growfs
4b827b3f305d
[1/1] xfs: remove WARN when dquot cache insertion fails
requested on list to reduce bot noise
https://www.spinics.net/lists/linux-xfs/msg73214.html
5cf32f63b0f4
[1/2] xfs: fix the calculation for "end" and "length"
[2/2] introduces new feature, skipping
https://lore.kernel.org/linux-xfs/20230913102942.601271-1-ruansy.fnst@fujit…
3c90c01e4934
[1/1] xfs: correct calculation for agend and blockcount
(fixes 5cf32)
https://lore.kernel.org/all/20230901160020.GT28186@frogsfrogsfrogs/
68b957f64fca
[1/1] xfs: load uncached unlinked inodes into memory on demand
https://www.spinics.net/lists/linux-xfs/msg74960.html
[0/3] xfs: reload entire iunlink lists
f12b96683d69
[1/3] xfs: use i_prev_unlinked to distinguish inodes that are not on the unlinked list
83771c50e42b
[2/3] xfs: reload entire unlinked bucket lists
(dependency of 49813a21ed)
49813a21ed57
[3/3] xfs: make inode unlinked bucket recovery work with quotacheck
(dependency for 537c013)
https://lore.kernel.org/all/169565629026.1982077.12646061547002741492.stgit…
537c013b140d
[1/1] xfs: fix reloading entire unlinked bucket lists
(fix of 68b957f64fca)
Darrick J. Wong (8):
xfs: fix uninitialized variable access
xfs: load uncached unlinked inodes into memory on demand
xfs: fix negative array access in xfs_getbmap
xfs: use i_prev_unlinked to distinguish inodes that are not on the
unlinked list
xfs: reload entire unlinked bucket lists
xfs: make inode unlinked bucket recovery work with quotacheck
xfs: fix reloading entire unlinked bucket lists
xfs: set bnobt/cntbt numrecs correctly when formatting new AGs
Dave Chinner (12):
xfs: dquot shrinker doesn't check for XFS_DQFLAG_FREEING
xfs: don't use BMBT btree split workers for IO completion
xfs: fix low space alloc deadlock
xfs: prefer free inodes at ENOSPC over chunk allocation
xfs: block reservation too large for minleft allocation
xfs: quotacheck failure can race with background inode inactivation
xfs: buffer pins need to hold a buffer reference
xfs: defered work could create precommits
xfs: fix AGF vs inode cluster buffer deadlock
xfs: collect errors from inodegc for unlinked inode recovery
xfs: remove WARN when dquot cache insertion fails
xfs: fix unlink vs cluster buffer instantiation race
Long Li (1):
xfs: fix ag count overflow during growfs
Shiyang Ruan (2):
xfs: fix the calculation for "end" and "length"
xfs: correct calculation for agend and blockcount
Wengang Wang (1):
xfs: fix extent busy updating
Wu Guanghao (1):
xfs: Fix deadlock on xfs_inodegc_worker
Ye Bin (1):
xfs: fix BUG_ON in xfs_getbmap()
fs/xfs/libxfs/xfs_ag.c | 19 ++-
fs/xfs/libxfs/xfs_alloc.c | 69 +++++++--
fs/xfs/libxfs/xfs_bmap.c | 16 +-
fs/xfs/libxfs/xfs_bmap.h | 2 +
fs/xfs/libxfs/xfs_bmap_btree.c | 19 ++-
fs/xfs/libxfs/xfs_btree.c | 18 ++-
fs/xfs/libxfs/xfs_fs.h | 2 +
fs/xfs/libxfs/xfs_ialloc.c | 17 +++
fs/xfs/libxfs/xfs_log_format.h | 9 +-
fs/xfs/libxfs/xfs_trans_inode.c | 113 +-------------
fs/xfs/xfs_attr_inactive.c | 1 -
fs/xfs/xfs_bmap_util.c | 18 +--
fs/xfs/xfs_buf_item.c | 88 ++++++++---
fs/xfs/xfs_dquot.c | 1 -
fs/xfs/xfs_export.c | 14 ++
fs/xfs/xfs_extent_busy.c | 1 +
fs/xfs/xfs_fsmap.c | 1 +
fs/xfs/xfs_fsops.c | 13 +-
fs/xfs/xfs_icache.c | 58 +++++--
fs/xfs/xfs_icache.h | 4 +-
fs/xfs/xfs_inode.c | 260 ++++++++++++++++++++++++++++----
fs/xfs/xfs_inode.h | 36 ++++-
fs/xfs/xfs_inode_item.c | 149 ++++++++++++++++++
fs/xfs/xfs_inode_item.h | 1 +
fs/xfs/xfs_itable.c | 11 ++
fs/xfs/xfs_log_recover.c | 19 ++-
fs/xfs/xfs_mount.h | 11 +-
fs/xfs/xfs_notify_failure.c | 15 +-
fs/xfs/xfs_qm.c | 72 ++++++---
fs/xfs/xfs_super.c | 1 +
fs/xfs/xfs_trace.h | 46 ++++++
fs/xfs/xfs_trans.c | 9 +-
32 files changed, 841 insertions(+), 272 deletions(-)
--
2.46.0.792.g87dc391469-goog
From: Tony Luck <tony.luck(a)intel.com>
[ Upstream commit 2eda374e883ad297bd9fe575a16c1dc850346075 ]
New CPU #defines encode vendor and family as well as model.
[ dhansen: vertically align 0's in invlpg_miss_ids[] ]
Signed-off-by: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
[ Ricardo: I used the old match macro X86_MATCH_INTEL_FAM6_MODEL()
instead of X86_MATCH_VFM() as in the upstream commit. ]
Signed-off-by: Ricardo Neri <ricardo.neri-calderon(a)linux.intel.com>
---
I tested this backport on an Alder Lake system. Now pr_info("Incomplete
global flushes, disabling PCID") is back in dmesg. I also tested on a
Meteor Lake system, which is not affected by the INVLPG issue. The message
in question is not there before and after the backport, as expected.
This backport fixes the last remaining caller of x86_match_cpu()
that does not use the family of X86_MATCH_*() macros.
Thomas Lindroth intially reported the regression in
https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Maybe Tony and/or the x86 maintainers can ack this backport?
---
arch/x86/mm/init.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..6215dfa23578 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
}
}
-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
- .family = 6, \
- .model = _model, \
- }
/*
* INVLPG may not properly flush Global entries
* on these CPUs when PCIDs are enabled.
*/
static const struct x86_cpu_id invlpg_miss_ids[] = {
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE ),
- INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
- INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE ),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
- INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GRACEMONT, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_P, 0),
+ X86_MATCH_INTEL_FAM6_MODEL(RAPTORLAKE_S, 0),
{}
};
--
2.34.1
Hi,
Upstream commit 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
X86_VENDOR_INTEL") introduced a flags member to struct x86_cpu_id. Bit 0
of x86_cpu.id.flags must be set to 1 for x86_match_cpu() to work correctly.
This upstream commit has been backported to 6.1.y.
Callers that use the X86_MATCH_*() family of macros to compose the argument
of x86_match_cpu() function correctly. Callers that use their own custom
mechanisms may not work if they fail to set x86_cpu_id.flags correctly.
There are two remaining callers in 6.1.y that use their own mechanisms:
setup_pcid() and rapl_msr_probe(). The former caused a regression that
Thomas Lindroth reported in [1]. The latter works by luck but it still
populates its x86_cpu_id[] array incorrectly.
I backported two patches that fix these bugs in mainline. Hopefully the
authors and/or maintainers can ack the backports?
I tested these patches in Alder Lake and Meteor Lake systems as the two
involved callers behave differently on these two systems. I wrote the
testing details in each patch separately.
Thanks and BR,
Ricardo
[1]. https://lore.kernel.org/all/eb709d67-2a8d-412f-905d-f3777d897bfa@gmail.com/
Sumeet Pawnikar (1):
powercap: RAPL: fix invalid initialization for pl4_supported field
Tony Luck (1):
x86/mm: Switch to new Intel CPU model defines
arch/x86/mm/init.c | 16 ++++++----------
drivers/powercap/intel_rapl_msr.c | 12 ++++++------
2 files changed, 12 insertions(+), 16 deletions(-)
--
2.34.1
The `LockedBy::access` method only requires a shared reference to the
owner, so if we have shared access to the `LockedBy` from several
threads at once, then two threads could call `access` in parallel and
both obtain a shared reference to the inner value. Thus, require that
`T: Sync` when calling the `access` method.
An alternative is to require `T: Sync` in the `impl Sync for LockedBy`.
This patch does not choose that approach as it gives up the ability to
use `LockedBy` with `!Sync` types, which is okay as long as you only use
`access_mut`.
Cc: stable(a)vger.kernel.org
Fixes: 7b1f55e3a984 ("rust: sync: introduce `LockedBy`")
Signed-off-by: Alice Ryhl <aliceryhl(a)google.com>
---
Changes in v2:
- Use a `where T: Sync` on `access` instead of changing `impl Sync for
LockedBy`.
- Link to v1: https://lore.kernel.org/r/20240912-locked-by-sync-fix-v1-1-26433cbccbd2@goo…
---
rust/kernel/sync/locked_by.rs | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/rust/kernel/sync/locked_by.rs b/rust/kernel/sync/locked_by.rs
index babc731bd5f6..ce2ee8d87865 100644
--- a/rust/kernel/sync/locked_by.rs
+++ b/rust/kernel/sync/locked_by.rs
@@ -83,8 +83,12 @@ pub struct LockedBy<T: ?Sized, U: ?Sized> {
// SAFETY: `LockedBy` can be transferred across thread boundaries iff the data it protects can.
unsafe impl<T: ?Sized + Send, U: ?Sized> Send for LockedBy<T, U> {}
-// SAFETY: `LockedBy` serialises the interior mutability it provides, so it is `Sync` as long as the
-// data it protects is `Send`.
+// SAFETY: If `T` is not `Sync`, then parallel shared access to this `LockedBy` allows you to use
+// `access_mut` to hand out `&mut T` on one thread at the time. The requirement that `T: Send` is
+// sufficient to allow that.
+//
+// If `T` is `Sync`, then the `access` method also becomes available, which allows you to obtain
+// several `&T` from several threads at once. However, this is okay as `T` is `Sync`.
unsafe impl<T: ?Sized + Send, U: ?Sized> Sync for LockedBy<T, U> {}
impl<T, U> LockedBy<T, U> {
@@ -118,7 +122,10 @@ impl<T: ?Sized, U> LockedBy<T, U> {
///
/// Panics if `owner` is different from the data protected by the lock used in
/// [`new`](LockedBy::new).
- pub fn access<'a>(&'a self, owner: &'a U) -> &'a T {
+ pub fn access<'a>(&'a self, owner: &'a U) -> &'a T
+ where
+ T: Sync,
+ {
build_assert!(
size_of::<U>() > 0,
"`U` cannot be a ZST because `owner` wouldn't be unique"
@@ -127,7 +134,10 @@ pub fn access<'a>(&'a self, owner: &'a U) -> &'a T {
panic!("mismatched owners");
}
- // SAFETY: `owner` is evidence that the owner is locked.
+ // SAFETY: `owner` is evidence that there are only shared references to the owner for the
+ // duration of 'a, so it's not possible to use `Self::access_mut` to obtain a mutable
+ // reference to the inner value that aliases with this shared reference. The type is `Sync`
+ // so there are no other requirements.
unsafe { &*self.data.get() }
}
---
base-commit: 93dc3be19450447a3a7090bd1dfb9f3daac3e8d2
change-id: 20240912-locked-by-sync-fix-07193df52f98
Best regards,
--
Alice Ryhl <aliceryhl(a)google.com>
The quilt patch titled
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
has been removed from the -mm tree. Its filename was
ocfs2-fix-uninit-value-in-ocfs2_get_block.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix uninit-value in ocfs2_get_block()
Date: Wed, 25 Sep 2024 17:06:00 +0800
syzbot reported an uninit-value BUG:
BUG: KMSAN: uninit-value in ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
ocfs2_get_block+0xed2/0x2710 fs/ocfs2/aops.c:159
do_mpage_readpage+0xc45/0x2780 fs/mpage.c:225
mpage_readahead+0x43f/0x840 fs/mpage.c:374
ocfs2_readahead+0x269/0x320 fs/ocfs2/aops.c:381
read_pages+0x193/0x1110 mm/readahead.c:160
page_cache_ra_unbounded+0x901/0x9f0 mm/readahead.c:273
do_page_cache_ra mm/readahead.c:303 [inline]
force_page_cache_ra+0x3b1/0x4b0 mm/readahead.c:332
force_page_cache_readahead mm/internal.h:347 [inline]
generic_fadvise+0x6b0/0xa90 mm/fadvise.c:106
vfs_fadvise mm/fadvise.c:185 [inline]
ksys_fadvise64_64 mm/fadvise.c:199 [inline]
__do_sys_fadvise64 mm/fadvise.c:214 [inline]
__se_sys_fadvise64 mm/fadvise.c:212 [inline]
__x64_sys_fadvise64+0x1fb/0x3a0 mm/fadvise.c:212
x64_sys_call+0xe11/0x3ba0
arch/x86/include/generated/asm/syscalls_64.h:222
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This is because when ocfs2_extent_map_get_blocks() fails, p_blkno is
uninitialized. So the error log will trigger the above uninit-value
access.
The error log is out-of-date since get_blocks() was removed long time ago.
And the error code will be logged in ocfs2_extent_map_get_blocks() once
ocfs2_get_cluster() fails, so fix this by only logging inode and block.
Link: https://syzkaller.appspot.com/bug?extid=9709e73bae885b05314b
Link: https://lkml.kernel.org/r/20240925090600.3643376-1-joseph.qi@linux.alibaba.…
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Tested-by: syzbot+9709e73bae885b05314b(a)syzkaller.appspotmail.com
Cc: Heming Zhao <heming.zhao(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/aops.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/fs/ocfs2/aops.c~ocfs2-fix-uninit-value-in-ocfs2_get_block
+++ a/fs/ocfs2/aops.c
@@ -156,9 +156,8 @@ int ocfs2_get_block(struct inode *inode,
err = ocfs2_extent_map_get_blocks(inode, iblock, &p_blkno, &count,
&ext_flags);
if (err) {
- mlog(ML_ERROR, "Error %d from get_blocks(0x%p, %llu, 1, "
- "%llu, NULL)\n", err, inode, (unsigned long long)iblock,
- (unsigned long long)p_blkno);
+ mlog(ML_ERROR, "get_blocks() failed, inode: 0x%p, "
+ "block: %llu\n", inode, (unsigned long long)iblock);
goto bail;
}
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
The quilt patch titled
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
has been removed from the -mm tree. Its filename was
kselftests-mm-fix-wrong-__nr_userfaultfd-value.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Subject: kselftests: mm: fix wrong __NR_userfaultfd value
Date: Mon, 23 Sep 2024 10:38:36 +0500
grep -rnIF "#define __NR_userfaultfd"
tools/include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
arch/x86/include/generated/uapi/asm/unistd_32.h:374:#define
__NR_userfaultfd 374
arch/x86/include/generated/uapi/asm/unistd_64.h:327:#define
__NR_userfaultfd 323
arch/x86/include/generated/uapi/asm/unistd_x32.h:282:#define
__NR_userfaultfd (__X32_SYSCALL_BIT + 323)
arch/arm/include/generated/uapi/asm/unistd-eabi.h:347:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
arch/arm/include/generated/uapi/asm/unistd-oabi.h:359:#define
__NR_userfaultfd (__NR_SYSCALL_BASE + 388)
include/uapi/asm-generic/unistd.h:681:#define __NR_userfaultfd 282
The number is dependent on the architecture. The above data shows that:
x86 374
x86_64 323
The value of __NR_userfaultfd was changed to 282 when asm-generic/unistd.h
was included. It makes the test to fail every time as the correct number
of this syscall on x86_64 is 323. Fix the header to asm/unistd.h.
Link: https://lkml.kernel.org/r/20240923053836.3270393-1-usama.anjum@collabora.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pagemap_ioctl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/pagemap_ioctl.c~kselftests-mm-fix-wrong-__nr_userfaultfd-value
+++ a/tools/testing/selftests/mm/pagemap_ioctl.c
@@ -15,7 +15,7 @@
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <math.h>
-#include <asm-generic/unistd.h>
+#include <asm/unistd.h>
#include <pthread.h>
#include <sys/resource.h>
#include <assert.h>
_
Patches currently in -mm which might be from usama.anjum(a)collabora.com are
The quilt patch titled
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
has been removed from the -mm tree. Its filename was
compilerh-specify-correct-attribute-for-rodatac_jump_table.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Subject: compiler.h: specify correct attribute for .rodata..c_jump_table
Date: Tue, 24 Sep 2024 14:27:10 +0800
Currently, there is an assembler message when generating kernel/bpf/core.o
under CONFIG_OBJTOOL with LoongArch compiler toolchain:
Warning: setting incorrect section attributes for .rodata..c_jump_table
This is because the section ".rodata..c_jump_table" should be readonly,
but there is a "W" (writable) part of the flags:
$ readelf -S kernel/bpf/core.o | grep -A 1 "rodata..c"
[34] .rodata..c_j[...] PROGBITS 0000000000000000 0000d2e0
0000000000000800 0000000000000000 WA 0 0 8
There is no above issue on x86 due to the generated section flag is only
"A" (allocatable). In order to silence the warning on LoongArch, specify
the attribute like ".rodata..c_jump_table,\"a\",@progbits #" explicitly,
then the section attribute of ".rodata..c_jump_table" must be readonly
in the kernel/bpf/core.o file.
Before:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, DATA
After:
$ objdump -h kernel/bpf/core.o | grep -A 1 "rodata..c"
21 .rodata..c_jump_table 00000800 0000000000000000 0000000000000000 0000d2e0 2**3
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
By the way, AFAICT, maybe the root cause is related with the different
compiler behavior of various archs, so to some extent this change is a
workaround for LoongArch, and also there is no effect for x86 which is the
only port supported by objtool before LoongArch with this patch.
Link: https://lkml.kernel.org/r/20240924062710.1243-1-yangtiezhu@loongson.cn
Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
Cc: Josh Poimboeuf <jpoimboe(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [6.9+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/compiler.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/compiler.h~compilerh-specify-correct-attribute-for-rodatac_jump_table
+++ a/include/linux/compiler.h
@@ -133,7 +133,7 @@ void ftrace_likely_update(struct ftrace_
#define annotate_unreachable() __annotate_unreachable(__COUNTER__)
/* Annotate a C jump table to allow objtool to follow the code flow */
-#define __annotate_jump_table __section(".rodata..c_jump_table")
+#define __annotate_jump_table __section(".rodata..c_jump_table,\"a\",@progbits #")
#else /* !CONFIG_OBJTOOL */
#define annotate_reachable()
_
Patches currently in -mm which might be from yangtiezhu(a)loongson.cn are
The quilt patch titled
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
has been removed from the -mm tree. Its filename was
ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Subject: ocfs2: fix deadlock in ocfs2_get_system_file_inode
Date: Tue, 24 Sep 2024 09:32:57 +0000
syzbot has found a possible deadlock in ocfs2_get_system_file_inode [1].
The scenario is depicted here,
CPU0 CPU1
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
lock(&ocfs2_file_ip_alloc_sem_key);
lock(&osb->system_file_mutex);
The function calls which could lead to this are:
CPU0
ocfs2_mknod - lock(&ocfs2_file_ip_alloc_sem_key);
.
.
.
ocfs2_get_system_file_inode - lock(&osb->system_file_mutex);
CPU1 -
ocfs2_fill_super - lock(&osb->system_file_mutex);
.
.
.
ocfs2_read_virt_blocks - lock(&ocfs2_file_ip_alloc_sem_key);
This issue can be resolved by making the down_read -> down_read_try
in the ocfs2_read_virt_blocks.
[1] https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Link: https://lkml.kernel.org/r/20240924093257.7181-1-pvmohammedanees2003@gmail.c…
Signed-off-by: Mohammed Anees <pvmohammedanees2003(a)gmail.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reported-by: <syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=e0055ea09f1f5e6fabdd
Tested-by: syzbot+e0055ea09f1f5e6fabdd(a)syzkaller.appspotmail.com
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/extent_map.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/ocfs2/extent_map.c~ocfs2-fix-deadlock-in-ocfs2_get_system_file_inode
+++ a/fs/ocfs2/extent_map.c
@@ -973,7 +973,13 @@ int ocfs2_read_virt_blocks(struct inode
}
while (done < nr) {
- down_read(&OCFS2_I(inode)->ip_alloc_sem);
+ if (!down_read_trylock(&OCFS2_I(inode)->ip_alloc_sem)) {
+ rc = -EAGAIN;
+ mlog(ML_ERROR,
+ "Inode #%llu ip_alloc_sem is temporarily unavailable\n",
+ (unsigned long long)OCFS2_I(inode)->ip_blkno);
+ break;
+ }
rc = ocfs2_extent_map_get_blocks(inode, v_block + done,
&p_block, &p_count, NULL);
up_read(&OCFS2_I(inode)->ip_alloc_sem);
_
Patches currently in -mm which might be from pvmohammedanees2003(a)gmail.com are
ocfs2-fix-typo-in-comment.patch
The quilt patch titled
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
has been removed from the -mm tree. Its filename was
ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Subject: ocfs2: reserve space for inline xattr before attaching reflink tree
Date: Wed, 18 Sep 2024 06:38:44 +0000
One of our customers reported a crash and a corrupted ocfs2 filesystem.
The crash was due to the detection of corruption. Upon troubleshooting,
the fsck -fn output showed the below corruption
[EXTENT_LIST_FREE] Extent list in owner 33080590 claims 230 as the next free chain record,
but fsck believes the largest valid value is 227. Clamp the next record value? n
The stat output from the debugfs.ocfs2 showed the following corruption
where the "Next Free Rec:" had overshot the "Count:" in the root metadata
block.
Inode: 33080590 Mode: 0640 Generation: 2619713622 (0x9c25a856)
FS Generation: 904309833 (0x35e6ac49)
CRC32: 00000000 ECC: 0000
Type: Regular Attr: 0x0 Flags: Valid
Dynamic Features: (0x16) HasXattr InlineXattr Refcounted
Extended Attributes Block: 0 Extended Attributes Inline Size: 256
User: 0 (root) Group: 0 (root) Size: 281320357888
Links: 1 Clusters: 141738
ctime: 0x66911b56 0x316edcb8 -- Fri Jul 12 06:02:30.829349048 2024
atime: 0x66911d6b 0x7f7a28d -- Fri Jul 12 06:11:23.133669517 2024
mtime: 0x66911b56 0x12ed75d7 -- Fri Jul 12 06:02:30.317552087 2024
dtime: 0x0 -- Wed Dec 31 17:00:00 1969
Refcount Block: 2777346
Last Extblk: 2886943 Orphan Slot: 0
Sub Alloc Slot: 0 Sub Alloc Bit: 14
Tree Depth: 1 Count: 227 Next Free Rec: 230
## Offset Clusters Block#
0 0 2310 2776351
1 2310 2139 2777375
2 4449 1221 2778399
3 5670 731 2779423
4 6401 566 2780447
....... .... .......
....... .... .......
The issue was in the reflink workfow while reserving space for inline
xattr. The problematic function is ocfs2_reflink_xattr_inline(). By the
time this function is called the reflink tree is already recreated at the
destination inode from the source inode. At this point, this function
reserves space for inline xattrs at the destination inode without even
checking if there is space at the root metadata block. It simply reduces
the l_count from 243 to 227 thereby making space of 256 bytes for inline
xattr whereas the inode already has extents beyond this index (in this
case up to 230), thereby causing corruption.
The fix for this is to reserve space for inline metadata at the destination
inode before the reflink tree gets recreated. The customer has verified the
fix.
Link: https://lkml.kernel.org/r/20240918063844.1830332-1-gautham.ananthakrishna@o…
Fixes: ef962df057aa ("ocfs2: xattr: fix inlined xattr reflink")
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/refcounttree.c | 26 ++++++++++++++++++++++++--
fs/ocfs2/xattr.c | 11 +----------
2 files changed, 25 insertions(+), 12 deletions(-)
--- a/fs/ocfs2/refcounttree.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/refcounttree.c
@@ -25,6 +25,7 @@
#include "namei.h"
#include "ocfs2_trace.h"
#include "file.h"
+#include "symlink.h"
#include <linux/bio.h>
#include <linux/blkdev.h>
@@ -4148,8 +4149,9 @@ static int __ocfs2_reflink(struct dentry
int ret;
struct inode *inode = d_inode(old_dentry);
struct buffer_head *new_bh = NULL;
+ struct ocfs2_inode_info *oi = OCFS2_I(inode);
- if (OCFS2_I(inode)->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
+ if (oi->ip_flags & OCFS2_INODE_SYSTEM_FILE) {
ret = -EINVAL;
mlog_errno(ret);
goto out;
@@ -4175,6 +4177,26 @@ static int __ocfs2_reflink(struct dentry
goto out_unlock;
}
+ if ((oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) &&
+ (oi->ip_dyn_features & OCFS2_INLINE_XATTR_FL)) {
+ /*
+ * Adjust extent record count to reserve space for extended attribute.
+ * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
+ */
+ struct ocfs2_inode_info *new_oi = OCFS2_I(new_inode);
+
+ if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
+ !(ocfs2_inode_is_fast_symlink(new_inode))) {
+ struct ocfs2_dinode *new_di = (struct ocfs2_dinode *)new_bh->b_data;
+ struct ocfs2_dinode *old_di = (struct ocfs2_dinode *)old_bh->b_data;
+ struct ocfs2_extent_list *el = &new_di->id2.i_list;
+ int inline_size = le16_to_cpu(old_di->i_xattr_inline_size);
+
+ le16_add_cpu(&el->l_count, -(inline_size /
+ sizeof(struct ocfs2_extent_rec)));
+ }
+ }
+
ret = ocfs2_create_reflink_node(inode, old_bh,
new_inode, new_bh, preserve);
if (ret) {
@@ -4182,7 +4204,7 @@ static int __ocfs2_reflink(struct dentry
goto inode_unlock;
}
- if (OCFS2_I(inode)->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
+ if (oi->ip_dyn_features & OCFS2_HAS_XATTR_FL) {
ret = ocfs2_reflink_xattrs(inode, old_bh,
new_inode, new_bh,
preserve);
--- a/fs/ocfs2/xattr.c~ocfs2-reserve-space-for-inline-xattr-before-attaching-reflink-tree
+++ a/fs/ocfs2/xattr.c
@@ -6511,16 +6511,7 @@ static int ocfs2_reflink_xattr_inline(st
}
new_oi = OCFS2_I(args->new_inode);
- /*
- * Adjust extent record count to reserve space for extended attribute.
- * Inline data count had been adjusted in ocfs2_duplicate_inline_data().
- */
- if (!(new_oi->ip_dyn_features & OCFS2_INLINE_DATA_FL) &&
- !(ocfs2_inode_is_fast_symlink(args->new_inode))) {
- struct ocfs2_extent_list *el = &new_di->id2.i_list;
- le16_add_cpu(&el->l_count, -(inline_size /
- sizeof(struct ocfs2_extent_rec)));
- }
+
spin_lock(&new_oi->ip_lock);
new_oi->ip_dyn_features |= OCFS2_HAS_XATTR_FL | OCFS2_INLINE_XATTR_FL;
new_di->i_dyn_features = cpu_to_le16(new_oi->ip_dyn_features);
_
Patches currently in -mm which might be from gautham.ananthakrishna(a)oracle.com are
The quilt patch titled
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
has been removed from the -mm tree. Its filename was
mm-migrate-annotate-data-race-in-migrate_folio_unmap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jeongjun Park <aha310510(a)gmail.com>
Subject: mm: migrate: annotate data-race in migrate_folio_unmap()
Date: Tue, 24 Sep 2024 22:00:53 +0900
I found a report from syzbot [1]
This report shows that the value can be changed, but in reality, the
value of __folio_set_movable() cannot be changed because it holds the
folio refcount.
Therefore, it is appropriate to add an annotate to make KCSAN
ignore that data-race.
[1]
==================================================================
BUG: KCSAN: data-race in __filemap_remove_folio / migrate_pages_batch
write to 0xffffea0004b81dd8 of 8 bytes by task 6348 on cpu 0:
page_cache_delete mm/filemap.c:153 [inline]
__filemap_remove_folio+0x1ac/0x2c0 mm/filemap.c:233
filemap_remove_folio+0x6b/0x1f0 mm/filemap.c:265
truncate_inode_folio+0x42/0x50 mm/truncate.c:178
shmem_undo_range+0x25b/0xa70 mm/shmem.c:1028
shmem_truncate_range mm/shmem.c:1144 [inline]
shmem_evict_inode+0x14d/0x530 mm/shmem.c:1272
evict+0x2f0/0x580 fs/inode.c:731
iput_final fs/inode.c:1883 [inline]
iput+0x42a/0x5b0 fs/inode.c:1909
dentry_unlink_inode+0x24f/0x260 fs/dcache.c:412
__dentry_kill+0x18b/0x4c0 fs/dcache.c:615
dput+0x5c/0xd0 fs/dcache.c:857
__fput+0x3fb/0x6d0 fs/file_table.c:439
____fput+0x1c/0x30 fs/file_table.c:459
task_work_run+0x13a/0x1a0 kernel/task_work.c:228
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0xbe/0x130 kernel/entry/common.c:218
do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
entry_SYSCALL_64_after_hwframe+0x77/0x7f
read to 0xffffea0004b81dd8 of 8 bytes by task 6342 on cpu 1:
__folio_test_movable include/linux/page-flags.h:699 [inline]
migrate_folio_unmap mm/migrate.c:1199 [inline]
migrate_pages_batch+0x24c/0x1940 mm/migrate.c:1797
migrate_pages_sync mm/migrate.c:1963 [inline]
migrate_pages+0xff1/0x1820 mm/migrate.c:2072
do_mbind mm/mempolicy.c:1390 [inline]
kernel_mbind mm/mempolicy.c:1533 [inline]
__do_sys_mbind mm/mempolicy.c:1607 [inline]
__se_sys_mbind+0xf76/0x1160 mm/mempolicy.c:1603
__x64_sys_mbind+0x78/0x90 mm/mempolicy.c:1603
x64_sys_call+0x2b4d/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:238
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
value changed: 0xffff888127601078 -> 0x0000000000000000
Link: https://lkml.kernel.org/r/20240924130053.107490-1-aha310510@gmail.com
Fixes: 7e2a5e5ab217 ("mm: migrate: use __folio_test_movable()")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot <syzkaller(a)googlegroups.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/migrate.c~mm-migrate-annotate-data-race-in-migrate_folio_unmap
+++ a/mm/migrate.c
@@ -1196,7 +1196,7 @@ static int migrate_folio_unmap(new_folio
int rc = -EAGAIN;
int old_page_state = 0;
struct anon_vma *anon_vma = NULL;
- bool is_lru = !__folio_test_movable(src);
+ bool is_lru = data_race(!__folio_test_movable(src));
bool locked = false;
bool dst_locked = false;
_
Patches currently in -mm which might be from aha310510(a)gmail.com are
mm-percpu-fix-typo-to-pcpu_alloc_noprof-description.patch
mm-shmem-fix-data-race-in-shmem_getattr.patch
The quilt patch titled
Subject: mm/hugetlb: simplify refs in memfd_alloc_folio
has been removed from the -mm tree. Its filename was
mm-hugetlb-simplify-refs-in-memfd_alloc_folio.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/hugetlb: simplify refs in memfd_alloc_folio
Date: Wed, 4 Sep 2024 12:41:08 -0700
The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
delete the matching folio_put in memfd_pin_folios. This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails. That error path is also broken in a second way -- when its
folio_put causes the ref to become 0, it will implicitly call
free_huge_folio, but then the path *explicitly* calls free_huge_folio.
Delete the latter.
This is a continuation of the fix
"mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
[steven.sistare(a)oracle.com: remove explicit call to free_huge_folio(), per Matthew]
Link: https://lkml.kernel.org/r/Zti-7nPVMcGgpcbi@casper.infradead.org
Link: https://lkml.kernel.org/r/1725481920-82506-1-git-send-email-steven.sistare@…
Link: https://lkml.kernel.org/r/1725478868-61732-1-git-send-email-steven.sistare@…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Suggested-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 +---
mm/memfd.c | 3 +--
2 files changed, 2 insertions(+), 5 deletions(-)
--- a/mm/gup.c~mm-hugetlb-simplify-refs-in-memfd_alloc_folio
+++ a/mm/gup.c
@@ -3615,7 +3615,7 @@ long memfd_pin_folios(struct file *memfd
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h = NULL;
+ struct hstate *h;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3659,8 +3659,6 @@ long memfd_pin_folios(struct file *memfd
&fbatch);
if (folio) {
folio_put(folio);
- if (h)
- folio_put(folio);
folio = NULL;
}
--- a/mm/memfd.c~mm-hugetlb-simplify-refs-in-memfd_alloc_folio
+++ a/mm/memfd.c
@@ -89,13 +89,12 @@ struct folio *memfd_alloc_folio(struct f
numa_node_id(),
NULL,
gfp_mask);
- if (folio && folio_try_get(folio)) {
+ if (folio) {
err = hugetlb_add_to_page_cache(folio,
memfd->f_mapping,
idx);
if (err) {
folio_put(folio);
- free_huge_folio(folio);
return ERR_PTR(err);
}
folio_unlock(folio);
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/gup: fix memfd_pin_folios alloc race panic
has been removed from the -mm tree. Its filename was
mm-gup-fix-memfd_pin_folios-alloc-race-panic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/gup: fix memfd_pin_folios alloc race panic
Date: Tue, 3 Sep 2024 07:25:21 -0700
If memfd_pin_folios tries to create a hugetlb page, but someone else
already did, then folio gets the value -EEXIST here:
folio = memfd_alloc_folio(memfd, start_idx);
if (IS_ERR(folio)) {
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
then on the next trip through the "while start_idx" loop we panic here:
if (folio) {
folio_put(folio);
To fix, set the folio to NULL on error.
Link: https://lkml.kernel.org/r/1725373521-451395-6-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/gup.c~mm-gup-fix-memfd_pin_folios-alloc-race-panic
+++ a/mm/gup.c
@@ -3702,6 +3702,7 @@ long memfd_pin_folios(struct file *memfd
ret = PTR_ERR(folio);
if (ret != -EEXIST)
goto err;
+ folio = NULL;
}
}
}
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/gup: fix memfd_pin_folios hugetlb page allocation
has been removed from the -mm tree. Its filename was
mm-gup-fix-memfd_pin_folios-hugetlb-page-allocation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/gup: fix memfd_pin_folios hugetlb page allocation
Date: Tue, 3 Sep 2024 07:25:20 -0700
When memfd_pin_folios -> memfd_alloc_folio creates a hugetlb page, the
index is wrong. The subsequent call to filemap_get_folios_contig thus
cannot find it, and fails, and memfd_pin_folios loops forever. To fix,
adjust the index for the huge_page_order.
memfd_alloc_folio also forgets to unlock the folio, so the next touch of
the page calls hugetlb_fault which blocks forever trying to take the lock.
Unlock it.
Link: https://lkml.kernel.org/r/1725373521-451395-5-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/memfd.c~mm-gup-fix-memfd_pin_folios-hugetlb-page-allocation
+++ a/mm/memfd.c
@@ -79,10 +79,13 @@ struct folio *memfd_alloc_folio(struct f
* alloc from. Also, the folio will be pinned for an indefinite
* amount of time, so it is not expected to be migrated away.
*/
- gfp_mask = htlb_alloc_mask(hstate_file(memfd));
+ struct hstate *h = hstate_file(memfd);
+
+ gfp_mask = htlb_alloc_mask(h);
gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
+ idx >>= huge_page_order(h);
- folio = alloc_hugetlb_folio_reserve(hstate_file(memfd),
+ folio = alloc_hugetlb_folio_reserve(h,
numa_node_id(),
NULL,
gfp_mask);
@@ -95,6 +98,7 @@ struct folio *memfd_alloc_folio(struct f
free_huge_folio(folio);
return ERR_PTR(err);
}
+ folio_unlock(folio);
return folio;
}
return ERR_PTR(-ENOMEM);
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-memfd_pin_folios-free_huge_pages-leak.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/hugetlb: fix memfd_pin_folios free_huge_pages leak
Date: Tue, 3 Sep 2024 07:25:18 -0700
memfd_pin_folios followed by unpin_folios fails to restore free_huge_pages
if the pages were not already faulted in, because the folio refcount for
pages created by memfd_alloc_folio never goes to 0. memfd_pin_folios
needs another folio_put to undo the folio_try_get below:
memfd_alloc_folio()
alloc_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_node_exact()
folio_ref_unfreeze(folio, 1); ; adds 1 refcount
folio_try_get() ; adds 1 refcount
hugetlb_add_to_page_cache() ; adds 512 refcount (on x86)
With the fix, after memfd_pin_folios + unpin_folios, the refcount for the
(unfaulted) page is 512, which is correct, as the refcount for a faulted
unpinned page is 513.
Link: https://lkml.kernel.org/r/1725373521-451395-3-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
--- a/mm/gup.c~mm-hugetlb-fix-memfd_pin_folios-free_huge_pages-leak
+++ a/mm/gup.c
@@ -3615,7 +3615,7 @@ long memfd_pin_folios(struct file *memfd
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h;
+ struct hstate *h = NULL;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3659,6 +3659,8 @@ long memfd_pin_folios(struct file *memfd
&fbatch);
if (folio) {
folio_put(folio);
+ if (h)
+ folio_put(folio);
folio = NULL;
}
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
The quilt patch titled
Subject: mm/filemap: fix filemap_get_folios_contig THP panic
has been removed from the -mm tree. Its filename was
mm-filemap-fix-filemap_get_folios_contig-thp-panic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Steve Sistare <steven.sistare(a)oracle.com>
Subject: mm/filemap: fix filemap_get_folios_contig THP panic
Date: Tue, 3 Sep 2024 07:25:17 -0700
Patch series "memfd-pin huge page fixes".
Fix multiple bugs that occur when using memfd_pin_folios with hugetlb
pages and THP. The hugetlb bugs only bite when the page is not yet
faulted in when memfd_pin_folios is called. The THP bug bites when the
starting offset passed to memfd_pin_folios is not huge page aligned. See
the commit messages for details.
This patch (of 5):
memfd_pin_folios on memory backed by THP panics if the requested start
offset is not huge page aligned:
BUG: kernel NULL pointer dereference, address: 0000000000000036
RIP: 0010:filemap_get_folios_contig+0xdf/0x290
RSP: 0018:ffffc9002092fbe8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000002 RCX: 0000000000000002
The fault occurs here, because xas_load returns a folio with value 2:
filemap_get_folios_contig()
for (folio = xas_load(&xas); folio && xas.xa_index <= end;
folio = xas_next(&xas)) {
...
if (!folio_try_get(folio)) <-- BOOM
"2" is an xarray sibling entry. We get it because memfd_pin_folios does
not round the indices passed to filemap_get_folios_contig to huge page
boundaries for THP, so we load from the middle of a huge page range see a
sibling. (It does round for hugetlbfs, at the is_file_hugepages test).
To fix, if the folio is a sibling, then return the next index as the
starting point for the next call to filemap_get_folios_contig.
Link: https://lkml.kernel.org/r/1725373521-451395-1-git-send-email-steven.sistare…
Link: https://lkml.kernel.org/r/1725373521-451395-2-git-send-email-steven.sistare…
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Signed-off-by: Steve Sistare <steven.sistare(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/filemap.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/filemap.c~mm-filemap-fix-filemap_get_folios_contig-thp-panic
+++ a/mm/filemap.c
@@ -2196,6 +2196,10 @@ unsigned filemap_get_folios_contig(struc
if (xa_is_value(folio))
goto update_start;
+ /* If we landed in the middle of a THP, continue at its end. */
+ if (xa_is_sibling(folio))
+ goto update_start;
+
if (!folio_try_get(folio))
goto retry;
_
Patches currently in -mm which might be from steven.sistare(a)oracle.com are
nfs41_init_clientid does not signal a failure condition from
nfs4_proc_exchange_id and nfs4_proc_create_session to a client which may
lead to mount syscall indefinitely blocked in the following stack trace:
nfs_wait_client_init_complete
nfs41_discover_server_trunking
nfs4_discover_server_trunking
nfs4_init_client
nfs4_set_client
nfs4_create_server
nfs4_try_get_tree
vfs_get_tree
do_new_mount
__se_sys_mount
and the client stuck in uninitialized state.
In addition to this all subsequent mount calls would also get blocked in
nfs_match_client waiting for the uninitialized client to finish
initialization:
nfs_wait_client_init_complete
nfs_match_client
nfs_get_client
nfs4_set_client
nfs4_create_server
nfs4_try_get_tree
vfs_get_tree
do_new_mount
__se_sys_mount
To avoid this situation propagate error condition to the mount thread
and let mount syscall fail properly.
Signed-off-by: Oleksandr Tymoshenko <ovt(a)google.com>
---
fs/nfs/nfs4state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 877f682b45f2..54ad3440ad2b 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -335,8 +335,8 @@ int nfs41_init_clientid(struct nfs_client *clp, const struct cred *cred)
if (!(clp->cl_exchange_flags & EXCHGID4_FLAG_CONFIRMED_R))
nfs4_state_start_reclaim_reboot(clp);
nfs41_finish_session_reset(clp);
- nfs_mark_client_ready(clp, NFS_CS_READY);
out:
+ nfs_mark_client_ready(clp, status == 0 ? NFS_CS_READY : status);
return status;
}
---
base-commit: ad618736883b8970f66af799e34007475fe33a68
change-id: 20240906-nfs-mount-deadlock-fix-55c14b38e088
Best regards,
--
Oleksandr Tymoshenko <ovt(a)google.com>
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: d4fc4d01471528da8a9797a065982e05090e1d81
Gitweb: https://git.kernel.org/tip/d4fc4d01471528da8a9797a065982e05090e1d81
Author: Alexey Gladkov (Intel) <legion(a)kernel.org>
AuthorDate: Fri, 13 Sep 2024 19:05:56 +02:00
Committer: Dave Hansen <dave.hansen(a)linux.intel.com>
CommitterDate: Thu, 26 Sep 2024 09:45:04 -07:00
x86/tdx: Fix "in-kernel MMIO" check
TDX only supports kernel-initiated MMIO operations. The handle_mmio()
function checks if the #VE exception occurred in the kernel and rejects
the operation if it did not.
However, userspace can deceive the kernel into performing MMIO on its
behalf. For example, if userspace can point a syscall to an MMIO address,
syscall does get_user() or put_user() on it, triggering MMIO #VE. The
kernel will treat the #VE as in-kernel MMIO.
Ensure that the target MMIO address is within the kernel before decoding
instruction.
Fixes: 31d58c4e557d ("x86/tdx: Handle in-kernel MMIO")
Signed-off-by: Alexey Gladkov (Intel) <legion(a)kernel.org>
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Acked-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/565a804b80387970460a4ebc67c88d1380f61ad1.172623…
---
arch/x86/coco/tdx/tdx.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index da8b66d..327c45c 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -16,6 +16,7 @@
#include <asm/insn-eval.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
+#include <asm/traps.h>
/* MMIO direction */
#define EPT_READ 0
@@ -433,6 +434,11 @@ static int handle_mmio(struct pt_regs *regs, struct ve_info *ve)
return -EINVAL;
}
+ if (!fault_in_kernel_space(ve->gla)) {
+ WARN_ONCE(1, "Access to userspace address is not supported");
+ return -EINVAL;
+ }
+
/*
* Reject EPT violation #VEs that split pages.
*
This series brings fixes and updates to ov08x40 which allows for use of
this sensor on the Qualcomm x1e80100 CRD but also on any other dts based
system.
Firstly there's a fix for the pseudo burst mode code that was added in
8f667d202384 ("media: ov08x40: Reduce start streaming time"). Not every I2C
controller can handle an arbitrary sized write, this is the case on
Qualcomm CAMSS/CCI I2C sensor interfaces which limit the transaction size
and communicate this limit via I2C quirks. A simple fix to optionally break
up the large submitted burst into chunks not exceeding adapter->quirk size
fixes.
Secondly then is addition of a yaml description for the ov08x40 and
extension of the driver to support OF probe and powering on of the power
rails from the driver instead of from ACPI.
Once done the sensor works without further modification on the Qualcomm
x1e80100 CRD.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (4):
media: ov08x40: Fix burst write sequence
media: dt-bindings: Add OmniVision OV08X40
media: ov08x40: Rename ext_clk to xvclk
media: ov08x40: Add OF probe support
.../bindings/media/i2c/ovti,ov08x40.yaml | 130 ++++++++++++++++
drivers/media/i2c/ov08x40.c | 172 ++++++++++++++++++---
2 files changed, 283 insertions(+), 19 deletions(-)
---
base-commit: 2b7275670032a98cba266bd1b8905f755b3e650f
change-id: 20240926-b4-master-24-11-25-ov08x40-c6f477aaa6a4
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COW'ed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.
This works as long as we're only ever modifying blocks, not changing the
structure of the tree.
However relocation does in fact change the structure of the tree. For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent. We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.
But, a key feature of do_relocation is the pending list. This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.
The purpose of this is to make sure we don't break sharing unless we
absolutely have to. Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path. If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally. To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.
Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.
The problem with this is that we will never reflect this in the backref
cache. If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree. This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.
In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.
Worst case you will double free some random block and re-use it when
there's still references to the block.
This is extremely subtle, and the consequences are quite bad. There
isn't a way to make sure our backref cache is consistent between
transid's.
In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's. This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.
This has existed since relocation was added, and is a pretty critical
bug. There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.
Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.
We have a reproducer that reproduced the corruption within a few minutes
of running. With this patch it survives several iterations/hours of
running the reproducer.
Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
Cc: stable(a)vger.kernel.org
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/backref.c | 12 ++++---
fs/btrfs/relocation.c | 75 ++-----------------------------------------
2 files changed, 11 insertions(+), 76 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2f478ecd7fd..f8e1d5b2c512 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3179,10 +3179,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
btrfs_backref_cleanup_node(cache, node);
}
- cache->last_trans = 0;
-
- for (i = 0; i < BTRFS_MAX_LEVEL; i++)
- ASSERT(list_empty(&cache->pending[i]));
+ for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+ while (!list_empty(&cache->pending[i])) {
+ node = list_first_entry(&cache->pending[i],
+ struct btrfs_backref_node,
+ list);
+ btrfs_backref_cleanup_node(cache, node);
+ }
+ }
ASSERT(list_empty(&cache->pending_edge));
ASSERT(list_empty(&cache->useless_node));
ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ea4ed85919ec..cf1dfeaaf2d8 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -232,70 +232,6 @@ static struct btrfs_backref_node *walk_down_backref(
return NULL;
}
-static void update_backref_node(struct btrfs_backref_cache *cache,
- struct btrfs_backref_node *node, u64 bytenr)
-{
- struct rb_node *rb_node;
- rb_erase(&node->rb_node, &cache->rb_root);
- node->bytenr = bytenr;
- rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
- if (rb_node)
- btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
- struct btrfs_backref_cache *cache)
-{
- struct btrfs_backref_node *node;
- int level = 0;
-
- if (cache->last_trans == 0) {
- cache->last_trans = trans->transid;
- return 0;
- }
-
- if (cache->last_trans == trans->transid)
- return 0;
-
- /*
- * detached nodes are used to avoid unnecessary backref
- * lookup. transaction commit changes the extent tree.
- * so the detached nodes are no longer useful.
- */
- while (!list_empty(&cache->detached)) {
- node = list_entry(cache->detached.next,
- struct btrfs_backref_node, list);
- btrfs_backref_cleanup_node(cache, node);
- }
-
- while (!list_empty(&cache->changed)) {
- node = list_entry(cache->changed.next,
- struct btrfs_backref_node, list);
- list_del_init(&node->list);
- BUG_ON(node->pending);
- update_backref_node(cache, node, node->new_bytenr);
- }
-
- /*
- * some nodes can be left in the pending list if there were
- * errors during processing the pending nodes.
- */
- for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
- list_for_each_entry(node, &cache->pending[level], list) {
- BUG_ON(!node->pending);
- if (node->bytenr == node->new_bytenr)
- continue;
- update_backref_node(cache, node, node->new_bytenr);
- }
- }
-
- cache->last_trans = 0;
- return 1;
-}
-
static bool reloc_root_is_dead(const struct btrfs_root *root)
{
/*
@@ -551,9 +487,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
struct btrfs_backref_edge *new_edge;
struct rb_node *rb_node;
- if (cache->last_trans > 0)
- update_backref_cache(trans, cache);
-
rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3698,11 +3631,9 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
break;
}
restart:
- if (update_backref_cache(trans, &rc->backref_cache)) {
- btrfs_end_transaction(trans);
- trans = NULL;
- continue;
- }
+ if (rc->backref_cache.last_trans != trans->transid)
+ btrfs_backref_release_cache(&rc->backref_cache);
+ rc->backref_cache.last_trans = trans->transid;
ret = find_next_extent(rc, path, &key);
if (ret < 0)
--
2.43.0
Since the inception of relocation we have maintained the backref cache
across transaction commits, updating the backref cache with the new
bytenr whenever we COW'ed blocks that were in the cache, and then
updating their bytenr once we detected a transaction id change.
This works as long as we're only ever modifying blocks, not changing the
structure of the tree.
However relocation does in fact change the structure of the tree. For
example, if we are relocating a data extent, we will look up all the
leaves that point to this data extent. We will then call
do_relocation() on each of these leaves, which will COW down to the leaf
and then update the file extent location.
But, a key feature of do_relocation is the pending list. This is all
the pending nodes that we modified when we updated the file extent item.
We will then process all of these blocks via finish_pending_nodes, which
calls do_relocation() on all of the nodes that led up to that leaf.
The purpose of this is to make sure we don't break sharing unless we
absolutely have to. Consider the case that we have 3 snapshots that all
point to this leaf through the same nodes, the initial COW would have
created a whole new path. If we did this for all 3 snapshots we would
end up with 3x the number of nodes we had originally. To avoid this we
will cycle through each of the snapshots that point to each of these
nodes and update their pointers to point at the new nodes.
Once we update the pointer to the new node we will drop the node we
removed the link for and all of its children via btrfs_drop_subtree().
This is essentially just btrfs_drop_snapshot(), but for an arbitrary
point in the snapshot.
The problem with this is that we will never reflect this in the backref
cache. If we do this btrfs_drop_snapshot() for a node that is in the
backref tree, we will leave the node in the backref tree. This becomes
a problem when we change the transid, as now the backref cache has
entire subtrees that no longer exist, but exist as if they still are
pointed to by the same roots.
In the best case scenario you end up with "adding refs to an existing
tree ref" errors from insert_inline_extent_backref(), where we attempt
to link in nodes on roots that are no longer valid.
Worst case you will double free some random block and re-use it when
there's still references to the block.
This is extremely subtle, and the consequences are quite bad. There
isn't a way to make sure our backref cache is consistent between
transid's.
In order to fix this we need to simply evict the entire backref cache
anytime we cross transid's. This reduces performance in that we have to
rebuild this backref cache every time we change transid's, but fixes the
bug.
This has existed since relocation was added, and is a pretty critical
bug. There's a lot more cleanup that can be done now that this
functionality is going away, but this patch is as small as possible in
order to fix the problem and make it easy for us to backport it to all
the kernels it needs to be backported to.
Followup series will dismantle more of this code and simplify relocation
drastically to remove this functionality.
We have a reproducer that reproduced the corruption within a few minutes
of running. With this patch it survives several iterations/hours of
running the reproducer.
Fixes: 3fd0a5585eb9 ("Btrfs: Metadata ENOSPC handling for balance")
Cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/backref.c | 12 ++++---
fs/btrfs/relocation.c | 76 +++----------------------------------------
2 files changed, 13 insertions(+), 75 deletions(-)
diff --git a/fs/btrfs/backref.c b/fs/btrfs/backref.c
index e2f478ecd7fd..f8e1d5b2c512 100644
--- a/fs/btrfs/backref.c
+++ b/fs/btrfs/backref.c
@@ -3179,10 +3179,14 @@ void btrfs_backref_release_cache(struct btrfs_backref_cache *cache)
btrfs_backref_cleanup_node(cache, node);
}
- cache->last_trans = 0;
-
- for (i = 0; i < BTRFS_MAX_LEVEL; i++)
- ASSERT(list_empty(&cache->pending[i]));
+ for (i = 0; i < BTRFS_MAX_LEVEL; i++) {
+ while (!list_empty(&cache->pending[i])) {
+ node = list_first_entry(&cache->pending[i],
+ struct btrfs_backref_node,
+ list);
+ btrfs_backref_cleanup_node(cache, node);
+ }
+ }
ASSERT(list_empty(&cache->pending_edge));
ASSERT(list_empty(&cache->useless_node));
ASSERT(list_empty(&cache->changed));
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index ea4ed85919ec..aaa9cac213f1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -232,70 +232,6 @@ static struct btrfs_backref_node *walk_down_backref(
return NULL;
}
-static void update_backref_node(struct btrfs_backref_cache *cache,
- struct btrfs_backref_node *node, u64 bytenr)
-{
- struct rb_node *rb_node;
- rb_erase(&node->rb_node, &cache->rb_root);
- node->bytenr = bytenr;
- rb_node = rb_simple_insert(&cache->rb_root, node->bytenr, &node->rb_node);
- if (rb_node)
- btrfs_backref_panic(cache->fs_info, bytenr, -EEXIST);
-}
-
-/*
- * update backref cache after a transaction commit
- */
-static int update_backref_cache(struct btrfs_trans_handle *trans,
- struct btrfs_backref_cache *cache)
-{
- struct btrfs_backref_node *node;
- int level = 0;
-
- if (cache->last_trans == 0) {
- cache->last_trans = trans->transid;
- return 0;
- }
-
- if (cache->last_trans == trans->transid)
- return 0;
-
- /*
- * detached nodes are used to avoid unnecessary backref
- * lookup. transaction commit changes the extent tree.
- * so the detached nodes are no longer useful.
- */
- while (!list_empty(&cache->detached)) {
- node = list_entry(cache->detached.next,
- struct btrfs_backref_node, list);
- btrfs_backref_cleanup_node(cache, node);
- }
-
- while (!list_empty(&cache->changed)) {
- node = list_entry(cache->changed.next,
- struct btrfs_backref_node, list);
- list_del_init(&node->list);
- BUG_ON(node->pending);
- update_backref_node(cache, node, node->new_bytenr);
- }
-
- /*
- * some nodes can be left in the pending list if there were
- * errors during processing the pending nodes.
- */
- for (level = 0; level < BTRFS_MAX_LEVEL; level++) {
- list_for_each_entry(node, &cache->pending[level], list) {
- BUG_ON(!node->pending);
- if (node->bytenr == node->new_bytenr)
- continue;
- update_backref_node(cache, node, node->new_bytenr);
- }
- }
-
- cache->last_trans = 0;
- return 1;
-}
-
static bool reloc_root_is_dead(const struct btrfs_root *root)
{
/*
@@ -551,9 +487,6 @@ static int clone_backref_node(struct btrfs_trans_handle *trans,
struct btrfs_backref_edge *new_edge;
struct rb_node *rb_node;
- if (cache->last_trans > 0)
- update_backref_cache(trans, cache);
-
rb_node = rb_simple_search(&cache->rb_root, src->commit_root->start);
if (rb_node) {
node = rb_entry(rb_node, struct btrfs_backref_node, rb_node);
@@ -3698,10 +3631,11 @@ static noinline_for_stack int relocate_block_group(struct reloc_control *rc)
break;
}
restart:
- if (update_backref_cache(trans, &rc->backref_cache)) {
- btrfs_end_transaction(trans);
- trans = NULL;
- continue;
+ if (rc->backref_cache.last_trans == 0) {
+ rc->backref_cache.last_trans = trans->transid;
+ } else if (rc->backref_cache.last_trans != trans->transid) {
+ btrfs_backref_release_cache(&rc->backref_cache);
+ rc->backref_cache.last_trans = trans->transid;
}
ret = find_next_extent(rc, path, &key);
--
2.43.0
These are all fixes for the frozen notification patch [1], which as of
today hasn't landed in mainline yet. As such, this patchset is rebased
on top of the char-misc-next branch.
[1] https://lore.kernel.org/all/20240709070047.4055369-2-yutingtseng@google.com/
Cc: stable(a)vger.kernel.org
Cc: Yu-Ting Tseng <yutingtseng(a)google.com>
Cc: Alice Ryhl <aliceryhl(a)google.com>
Cc: Todd Kjos <tkjos(a)google.com>
Cc: Martijn Coenen <maco(a)google.com>
Cc: Arve Hjønnevåg <arve(a)android.com>
Cc: Viktor Martensson <vmartensson(a)google.com>
Carlos Llamas (4):
binder: fix node UAF in binder_add_freeze_work()
binder: fix OOB in binder_add_freeze_work()
binder: fix freeze UAF in binder_release_work()
binder: fix BINDER_WORK_FROZEN_BINDER debug logs
drivers/android/binder.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
--
2.46.0.792.g87dc391469-goog
Please consider adding the following commit to v6.6:
83bdfcbdbe5d ("nvme-pci: qdepth 1 quirk")
This resolves filesystem corruption and random drive drop-outs on machines with O2micro NVME drives, including a number of HP Thinclients.
Alex