Hello everyone,
#regzbot introduced v6.7.5..v6.7.6
I'm experiencing an issue where kexec does a full firmware reboot
instead of kexec reboot.
Issue first submitted at OpenSuse bugzilla [0].
OS details as follows:
Distributor ID: openSUSE
Description: openSUSE Tumbleweed-Slowroll
Release: 20240213
Issue has been reproduced by building kernel from source.
kexec works as expected in kernel v6.7.5.
kexec does full firmware reboot in kernel v6.7.6.
I followed the docs here [1] to perform git bisect and find the culprit,
hope it's alright as I'm quite out of my depth here.
Git bisect logs:
git bisect start
# status: waiting for both good and bad commits
# bad: [b631f5b445dc3379f67ff63a2e4c58f22d4975dc] Linux 6.7.6
git bisect bad b631f5b445dc3379f67ff63a2e4c58f22d4975dc
# status: waiting for good commit(s), bad commit known
# good: [004dcea13dc10acaf1486d9939be4c793834c13c] Linux 6.7.5
git bisect good 004dcea13dc10acaf1486d9939be4c793834c13c
Let me know if there's anything else I can do to help troubleshoot the
issue.
[0]: https://bugzilla.suse.com/show_bug.cgi?id=1220541
[1]: https://docs.kernel.org/admin-guide/bug-bisect.html
Kind regards,
Pavin Joseph.
I'm announcing the release of the 4.19.309 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/firmware/efi/capsule-loader.c | 2 +-
drivers/gpio/gpio-74x164.c | 4 ++--
drivers/mmc/core/mmc.c | 2 ++
drivers/net/gtp.c | 12 ++++++------
drivers/net/tun.c | 1 +
drivers/net/usb/dm9601.c | 2 +-
drivers/net/usb/lan78xx.c | 3 ++-
drivers/power/supply/bq27xxx_battery_i2c.c | 4 +++-
fs/btrfs/dev-replace.c | 24 ++++++++++++++++++++----
fs/cachefiles/bind.c | 3 +++
net/bluetooth/hci_core.c | 7 ++++---
net/bluetooth/hci_event.c | 9 ++++++++-
net/bluetooth/l2cap_core.c | 8 +++++++-
net/netlink/af_netlink.c | 2 +-
net/wireless/nl80211.c | 2 ++
sound/core/Makefile | 1 -
17 files changed, 64 insertions(+), 24 deletions(-)
Alexander Ofitserov (1):
gtp: fix use-after-free and null-ptr-deref in gtp_newlink()
Arnd Bergmann (1):
efi/capsule-loader: fix incorrect allocation size
Arturas Moskvinas (1):
gpio: 74x164: Enable output pins after registers are reset
Baokun Li (1):
cachefiles: fix memory leak in cachefiles_add_cache()
David Sterba (1):
btrfs: dev-replace: properly validate device names
Greg Kroah-Hartman (1):
Linux 4.19.309
Hans de Goede (1):
power: supply: bq27xxx-i2c: Do not free non existing IRQ
Ivan Semenov (1):
mmc: core: Fix eMMC initialization with 1-bit bus connection
Javier Carrasco (1):
net: usb: dm9601: fix wrong return value in dm9601_mdio_read
Johannes Berg (1):
wifi: nl80211: reject iftype change with mesh ID change
Kai-Heng Feng (1):
Bluetooth: Enforce validation on max value of connection interval
Luiz Augusto von Dentz (1):
Bluetooth: hci_event: Fix handling of HCI_EV_IO_CAPA_REQUEST
Oleksij Rempel (1):
lan78xx: enable auto speed configuration for LAN7850 if no EEPROM is detected
Ryosuke Yasuoka (1):
netlink: Fix kernel-infoleak-after-free in __skb_datagram_iter
Takashi Iwai (1):
ALSA: Drop leftover snd-rtctimer stuff from Makefile
Ying Hsu (1):
Bluetooth: Avoid potential use-after-free in hci_error_reset
Yunjian Wang (1):
tun: Fix xdp_rxq_info's queue_index when detaching
There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from
a test case. But there has been agreement based on code review that this
is possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff. So I've added an equivalent check directly in
free_swap_and_cache().
Details of how to provoke one possible issue (thanks to David
Hildenbrand for deriving this):
--8<-----
__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".
swapoff->try_to_unuse() will stop as soon as soon as si->inuse_pages==0.
So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().
Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still references by swap entries.
Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.
Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]
Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
What stops swapoff to succeed after process 2 reclaimed the swap cache
but before process1 finished its call to swap_page_trans_huge_swapped()?
--8<-----
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redha…
Cc: stable(a)vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
Hi Andrew,
Please replace v1 of this patch in mm-unstable with this version.
Changes since v1:
- Added comments for get_swap_device() as suggested by David
- Moved check that swap entry is not free from get_swap_device() to
free_swap_and_cache() since there are some paths that legitimately call with
a free offset.
I haven't addressed the recommendation by Huang Ying [1] to also revert commit
23b230ba8ac3 ("mm/swap: print bad swap offset entry in get_swap_device"). It
should be done separately to this, and and we need to conclude discussion
first.
[1] https://lore.kernel.org/all/875xy0842q.fsf@yhuang6-desk2.ccr.corp.intel.com/
Thanks,
Ryan
mm/swapfile.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..1155a6304119 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1232,6 +1232,11 @@ static unsigned char __swap_entry_free_locked(struct swap_info_struct *p,
* with get_swap_device() and put_swap_device(), unless the swap
* functions call get/put_swap_device() by themselves.
*
+ * Note that when only holding the PTL, swapoff might succeed immediately
+ * after freeing a swap entry. Therefore, immediately after
+ * __swap_entry_free(), the swap info might become stale and should not
+ * be touched without a prior get_swap_device().
+ *
* Check whether swap entry is valid in the swap device. If so,
* return pointer to swap_info_struct, and keep the swap entry valid
* via preventing the swap device from being swapoff, until
@@ -1609,13 +1614,19 @@ int free_swap_and_cache(swp_entry_t entry)
if (non_swap_entry(entry))
return 1;
- p = _swap_info_get(entry);
+ p = get_swap_device(entry);
if (p) {
+ if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+ put_swap_device(p);
+ return 0;
+ }
+
count = __swap_entry_free(p, entry);
if (count == SWAP_HAS_CACHE &&
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
TTRS_UNMAPPED | TTRS_FULL);
+ put_swap_device(p);
}
return p != NULL;
}
--
2.25.1