There are reports that since version 6.7 update-grub fails to find the
device of the root on systems without initrd and on a single device.
This looks like the device name changed in the output of
/proc/self/mountinfo:
6.5-rc5 working
18 1 0:16 / / rw,noatime - btrfs /dev/sda8 ...
6.7 not working:
17 1 0:15 / / rw,noatime - btrfs /dev/root ...
and "update-grub" shows this error:
/usr/sbin/grub-probe: error: cannot find a device for / (is /dev mounted?)
This looks like it's related to the device name, but grub-probe
recognizes the "/dev/root" path and tries to find the underlying device.
However there's a special case for some filesystems, for btrfs in
particular.
The generic root device detection heuristic is not done and it all
relies on reading the device infos by a btrfs specific ioctl. This ioctl
returns the device name as it was saved at the time of device scan (in
this case it's /dev/root).
The change in 6.7 for temp_fsid to allow several single device
filesystem to exist with the same fsid (and transparently generate a new
UUID at mount time) was to skip caching/registering such devices.
This also skipped mounted device. One step of scanning is to check if
the device name hasn't changed, and if yes then update the cached value.
This broke the grub-probe as it always read the device /dev/root and
couldn't find it in the system. A temporary workaround is to create a
symlink but this does not survive reboot.
The right fix is to allow updating the device path of a mounted
filesystem even if this is a single device one. This does not affect the
temp_fsid feature, the UUID of the mounted filesystem remains the same
and the matching is based on device major:minor which is unique per
mounted filesystem.
As the main part of device scanning and list update is done in
device_list_add() that handles all corner cases and locking, it is
extended to take a parameter that tells it to do everything as before,
except adding a new device entry.
This covers the path when the device (that exists for all mounted
devices) name changes, updating /dev/root to /dev/sdx. Any other single
device with filesystem is skipped.
Note that if a system is booted and initial mount is done on the
/dev/root device, this will be the cached name of the device. Only after
the command "btrfs device rescan" it will change as it triggers the
rename.
The fix was verified by users whose systems were affected.
CC: stable(a)vger.kernel.org # 6.7+
Fixes: bc27d6f0aa0e ("btrfs: scan but don't register device on single device filesystem")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=218353
Link: https://lore.kernel.org/lkml/CAKLYgeJ1tUuqLcsquwuFqjDXPSJpEiokrWK2gisPKDZLs…
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/volumes.c | 30 ++++++++++++++----------------
1 file changed, 14 insertions(+), 16 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 474ab7ed65ea..f2c2f7ca5c3d 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -738,6 +738,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
bool same_fsid_diff_dev = false;
bool has_metadata_uuid = (btrfs_super_incompat_flags(disk_super) &
BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
+ bool can_create_new = *new_device_added;
if (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_CHANGING_FSID_V2) {
btrfs_err(NULL,
@@ -753,6 +754,7 @@ static noinline struct btrfs_device *device_list_add(const char *path,
return ERR_PTR(error);
}
+ *new_device_added = false;
fs_devices = find_fsid_by_device(disk_super, path_devt, &same_fsid_diff_dev);
if (!fs_devices) {
@@ -804,6 +806,15 @@ static noinline struct btrfs_device *device_list_add(const char *path,
return ERR_PTR(-EBUSY);
}
+ if (!can_create_new) {
+ pr_info(
+ "BTRFS: device fsid %pU devid %llu transid %llu %s skip registration scanned by %s (%d)\n",
+ disk_super->fsid, devid, found_transid, path,
+ current->comm, task_pid_nr(current));
+ mutex_unlock(&fs_devices->device_list_mutex);
+ return NULL;
+ }
+
nofs_flag = memalloc_nofs_save();
device = btrfs_alloc_device(NULL, &devid,
disk_super->dev_item.uuid, path);
@@ -1355,27 +1366,14 @@ struct btrfs_device *btrfs_scan_one_device(const char *path, blk_mode_t flags,
goto error_bdev_put;
}
- if (!mount_arg_dev && btrfs_super_num_devices(disk_super) == 1 &&
- !(btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING)) {
- dev_t devt;
-
- ret = lookup_bdev(path, &devt);
- if (ret)
- btrfs_warn(NULL, "lookup bdev failed for path %s: %d",
- path, ret);
- else
- btrfs_free_stale_devices(devt, NULL);
-
- pr_debug("BTRFS: skip registering single non-seed device %s\n", path);
- device = NULL;
- goto free_disk_super;
- }
+ if (mount_arg_dev || btrfs_super_num_devices(disk_super) != 1 ||
+ (btrfs_super_flags(disk_super) & BTRFS_SUPER_FLAG_SEEDING))
+ new_device_added = true;
device = device_list_add(path, disk_super, &new_device_added);
if (!IS_ERR(device) && new_device_added)
btrfs_free_stale_devices(device->devt, device);
-free_disk_super:
btrfs_release_disk_super(disk_super);
error_bdev_put:
--
2.42.1
From: Chengming Zhou <zhouchengming(a)bytedance.com>
We may encounter duplicate entry in the zswap_store():
1. swap slot that freed to per-cpu swap cache, doesn't invalidate
the zswap entry, then got reused. This has been fixed.
2. !exclusive load mode, swapin folio will leave its zswap entry
on the tree, then swapout again. This has been removed.
3. one folio can be dirtied again after zswap_store(), so need to
zswap_store() again. This should be handled correctly.
So we must invalidate the old duplicate entry before insert the
new one, which actually doesn't have to be done at the beginning
of zswap_store(). And this is a normal situation, we shouldn't
WARN_ON(1) in this case, so delete it. (The WARN_ON(1) seems want
to detect swap entry UAF problem? But not very necessary here.)
The good point is that we don't need to lock tree twice in the
store success path.
Note we still need to invalidate the old duplicate entry in the
store failure path, otherwise the new data in swapfile could be
overwrite by the old data in zswap pool when lru writeback.
We have to do this even when !zswap_enabled since zswap can be
disabled anytime. If the folio store success before, then got
dirtied again but zswap disabled, we won't invalidate the old
duplicate entry in the zswap_store(). So later lru writeback
may overwrite the new data in swapfile.
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Cc: <stable(a)vger.kernel.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
---
v2:
- Change the duplicate entry invalidation loop to if, since we hold
the lock, we won't find it once we invalidate it, per Yosry.
- Add Fixes tag.
---
mm/zswap.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
diff --git a/mm/zswap.c b/mm/zswap.c
index cd67f7f6b302..6c1466633274 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1518,18 +1518,8 @@ bool zswap_store(struct folio *folio)
return false;
if (!zswap_enabled)
- return false;
+ goto check_old;
- /*
- * If this is a duplicate, it must be removed before attempting to store
- * it, otherwise, if the store fails the old page won't be removed from
- * the tree, and it might be written back overriding the new data.
- */
- spin_lock(&tree->lock);
- entry = zswap_rb_search(&tree->rbroot, offset);
- if (entry)
- zswap_invalidate_entry(tree, entry);
- spin_unlock(&tree->lock);
objcg = get_obj_cgroup_from_folio(folio);
if (objcg && !obj_cgroup_may_zswap(objcg)) {
memcg = get_mem_cgroup_from_objcg(objcg);
@@ -1608,14 +1598,12 @@ bool zswap_store(struct folio *folio)
/* map */
spin_lock(&tree->lock);
/*
- * A duplicate entry should have been removed at the beginning of this
- * function. Since the swap entry should be pinned, if a duplicate is
- * found again here it means that something went wrong in the swap
- * cache.
+ * The folio could be dirtied again, invalidate the possible old entry
+ * before insert this new entry.
*/
- while (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
- WARN_ON(1);
+ if (zswap_rb_insert(&tree->rbroot, entry, &dupentry) == -EEXIST) {
zswap_invalidate_entry(tree, dupentry);
+ VM_WARN_ON(zswap_rb_insert(&tree->rbroot, entry, &dupentry));
}
if (entry->length) {
INIT_LIST_HEAD(&entry->lru);
@@ -1638,6 +1626,17 @@ bool zswap_store(struct folio *folio)
reject:
if (objcg)
obj_cgroup_put(objcg);
+check_old:
+ /*
+ * If zswap store fail or zswap disabled, we must invalidate possible
+ * old entry which previously stored by this folio. Otherwise, later
+ * writeback could overwrite the new data in swapfile.
+ */
+ spin_lock(&tree->lock);
+ entry = zswap_rb_search(&tree->rbroot, offset);
+ if (entry)
+ zswap_invalidate_entry(tree, entry);
+ spin_unlock(&tree->lock);
return false;
shrink:
--
2.40.1
Dear stable kernel maintainers,
I am writing to request that 3 related patches be merged to various LTS kernels. I'm not sure if it would have
been preferable for me to send 3 separate emails, so please forgive me if I chose wrongly. (This is my first foray
into interacting with the kernel community) :)
The patches are as follows:
1. 0cd3be51733f (HID: apple: Add support for the 2021 Magic Keyboard, 2021-10-08)
2. 346338ef00d3 (HID: apple: Swap the Fn and Left Control keys on Apple keyboards, 2020-05-15)
3. 531cb56972f2 (HID: apple: Add 2021 magic keyboard FN key mapping, 2021-11-08)
These patches have all been merged to mainline, but I believe when they were submitted, backporting may not have been considered. The Apple Magic Keyboard 2021 (Model # A2450) seems to be a popular keyboard, and without these
patches, for users on certain LTS kernels that use this keyboard, the function keys do not behave as expected. e.g. Pressing the brightness down or brightness up key didn't work, and bizarrely pressing the globe/Fn key alone caused the brightness to decrease. None of the top row keys worked as expected.
I checked to see where the patches were missing and figured that it would be good to have those patches in those
kernels.
I would ask that patches 1 & 3 be merged to v4.19, v5.4, v5.10, and v5.15.
I would ask that patch 2 be merged to: v5.4 and v4.19.
For patch 3 to apply cleanly, it needed patch 2 to be present in the tree.
Thanks,
--
Aseda Aboagye
On x86 each cpu_hw_events maintains a table for counter assignment but
it missed to update one for the deleted event in x86_pmu_del(). This
can make perf_clear_dirty_counters() reset used counter if it's called
before event scheduling or enabling. Then it would return out of range
data which doesn't make sense.
The following code can reproduce the problem.
$ cat repro.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.disabled = 1,
};
void *worker(void *arg)
{
int cpu = (long)arg;
int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
void *p;
do {
ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
munmap(p, 4096);
ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
} while (1);
return NULL;
}
int main(void)
{
int i;
int n = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t *th = calloc(n, sizeof(*th));
for (i = 0; i < n; i++)
pthread_create(&th[i], NULL, worker, (void *)(long)i);
for (i = 0; i < n; i++)
pthread_join(th[i], NULL);
free(th);
return 0;
}
And you can see the out of range data using perf stat like this.
Probably it'd be easier to see on a large machine.
$ gcc -o repro repro.c -pthread
$ ./repro &
$ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%)
1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%)
1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%)
2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%)
2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%)
2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%)
2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%)
3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%)
3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%)
3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%)
3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle
It should move the contents in the cpuc->assign as well.
Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
Cc: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
arch/x86/events/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 09050641ce5d..5b0dd07b1ef1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1644,6 +1644,7 @@ static void x86_pmu_del(struct perf_event *event, int flags)
while (++i < cpuc->n_events) {
cpuc->event_list[i-1] = cpuc->event_list[i];
cpuc->event_constraint[i-1] = cpuc->event_constraint[i];
+ cpuc->assign[i-1] = cpuc->assign[i];
}
cpuc->event_constraint[i-1] = NULL;
--cpuc->n_events;
--
2.43.0.472.g3155946c3a-goog
From: Finn Thain <fthain(a)linux-m68k.org>
commit b845b574f86dcb6a70dfa698aa87a237b0878d2a upstream.
On 68030/020, an instruction such as, moveml %a2-%a3/%a5,%sp@- may cause
a stack page fault during instruction execution (i.e. not at an
instruction boundary) and produce a format 0xB exception frame.
In this situation, the value of USP will be unreliable. If a signal is
to be delivered following the exception, this USP value is used to
calculate the location for a signal frame. This can result in a
corrupted user stack.
The corruption was detected in dash (actually in glibc) where it showed
up as an intermittent "stack smashing detected" message and crash
following signal delivery for SIGCHLD.
It was hard to reproduce that failure because delivery of the signal
raced with the page fault and because the kernel places an unpredictable
gap of up to 7 bytes between the USP and the signal frame.
A format 0xB exception frame can be produced by a bus error or an
address error. The 68030 Users Manual says that address errors occur
immediately upon detection during instruction prefetch. The instruction
pipeline allows prefetch to overlap with other instructions, which means
an address error can arise during the execution of a different
instruction. So it seems likely that this patch may help in the address
error case also.
Reported-and-tested-by: Stan Johnson <userm57(a)yahoo.com>
Link: https://lore.kernel.org/all/CAMuHMdW3yD22_ApemzW_6me3adq6A458u1_F0v-1EYwK_6…
Cc: Michael Schmitz <schmitzmic(a)gmail.com>
Cc: Andreas Schwab <schwab(a)linux-m68k.org>
Cc: stable(a)vger.kernel.org
Co-developed-by: Michael Schmitz <schmitzmic(a)gmail.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Signed-off-by: Finn Thain <fthain(a)linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Link: https://lore.kernel.org/r/9e66262a754fcba50208aa424188896cc52a1dd1.16833658…
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
arch/m68k/kernel/signal.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/arch/m68k/kernel/signal.c b/arch/m68k/kernel/signal.c
index 8fb8ee804b3a..de7c1bde62bc 100644
--- a/arch/m68k/kernel/signal.c
+++ b/arch/m68k/kernel/signal.c
@@ -808,11 +808,17 @@ static inline int rt_setup_ucontext(struct ucontext __user *uc, struct pt_regs *
}
static inline void __user *
-get_sigframe(struct ksignal *ksig, size_t frame_size)
+get_sigframe(struct ksignal *ksig, struct pt_regs *tregs, size_t frame_size)
{
unsigned long usp = sigsp(rdusp(), ksig);
+ unsigned long gap = 0;
- return (void __user *)((usp - frame_size) & -8UL);
+ if (CPU_IS_020_OR_030 && tregs->format == 0xb) {
+ /* USP is unreliable so use worst-case value */
+ gap = 256;
+ }
+
+ return (void __user *)((usp - gap - frame_size) & -8UL);
}
static int setup_frame(struct ksignal *ksig, sigset_t *set,
@@ -830,7 +836,7 @@ static int setup_frame(struct ksignal *ksig, sigset_t *set,
return -EFAULT;
}
- frame = get_sigframe(ksig, sizeof(*frame) + fsize);
+ frame = get_sigframe(ksig, tregs, sizeof(*frame) + fsize);
if (fsize)
err |= copy_to_user (frame + 1, regs + 1, fsize);
@@ -903,7 +909,7 @@ static int setup_rt_frame(struct ksignal *ksig, sigset_t *set,
return -EFAULT;
}
- frame = get_sigframe(ksig, sizeof(*frame));
+ frame = get_sigframe(ksig, tregs, sizeof(*frame));
if (fsize)
err |= copy_to_user (&frame->uc.uc_extra, regs + 1, fsize);
--
2.17.1
commit 3f90f9ef2dda316d64e420d5d51ba369587ccc55 upstream.
If 020/030 support is enabled, get_io_area() leaves an IO_SIZE gap
between mappings which is added to the vm_struct representing the
mapping. __ioremap() uses the actual requested size (after alignment),
while __iounmap() is passed the size from the vm_struct.
On 020/030, early termination descriptors are used to set up mappings of
extent 'size', which are validated on unmapping. The unmapped gap of
size IO_SIZE defeats the sanity check of the pmd tables, causing
__iounmap() to loop forever on 030.
On 040/060, unmapping of page table entries does not check for a valid
mapping, so the umapping loop always completes there.
Adjust size to be unmapped by the gap that had been added in the
vm_struct prior.
This fixes the hang in atari_platform_init() reported a long time ago,
and a similar one reported by Finn recently (addressed by removing
ioremap() use from the SWIM driver.
Tested on my Falcon in 030 mode - untested but should work the same on
040/060 (the extra page tables cleared there would never have been set
up anyway).
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
[geert: Minor commit description improvements]
[geert: This was fixed in 2.4.23, but not in 2.5.x]
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: stable(a)vger.kernel.org
Cc: <cip-dev(a)lists.cip-project.org> # 4.4
---
arch/m68k/mm/kmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/m68k/mm/kmap.c b/arch/m68k/mm/kmap.c
index 6e4955bc542b..fcd52cefee29 100644
--- a/arch/m68k/mm/kmap.c
+++ b/arch/m68k/mm/kmap.c
@@ -88,7 +88,8 @@ static inline void free_io_area(void *addr)
for (p = &iolist ; (tmp = *p) ; p = &tmp->next) {
if (tmp->addr == addr) {
*p = tmp->next;
- __iounmap(tmp->addr, tmp->size);
+ /* remove gap added in get_io_area() */
+ __iounmap(tmp->addr, tmp->size - IO_SIZE);
kfree(tmp);
return;
}
--
2.17.1
Syzkaller reports NULL pointer dereference issue at skb_segment()
in 5.10/5.15/6.1 stable releases. The problem has been fixed by
the following patch which can be cleanly applied to 5.10/5.15/6.1 branches.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.