In some cases bootloaders will leave boot_params->cc_blob_address
uninitialized rather than zeroing it out. This field is only meant to
be set by the boot/compressed kernel to pass information to the
uncompressed kernel when SEV-SNP support is enabled, so there are no
cases where the bootloader-provided values should be treated as
anything other than garbage. If such a value is left in place, the
uncompressed kernel may attempt to access this bogus address, leading
to a crash during early boot.
Normally sanitize_boot_params() would be used to clear out such fields,
but that happens too late: sev_enable() may have already initialized it
to a valid value that should not be zeroed out. Instead, have
sev_enable() zero it out unconditionally beforehand.
Ensure this also happens for !CONFIG_AMD_MEM_ENCRYPT by including the
same handling in the sev_enable() stub function.
Fixes: b190a043c49a ("x86/sev: Add SEV-SNP feature detection/setup")
Cc: stable(a)vger.kernel.org
Reported-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Reported-by: watnuss(a)gmx.de
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387
Signed-off-by: Michael Roth <michael.roth(a)amd.com>
---
arch/x86/boot/compressed/misc.h | 11 ++++++++++-
arch/x86/boot/compressed/sev.c | 8 ++++++++
2 files changed, 18 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 4910bf230d7b..aa7889751abc 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -132,7 +132,16 @@ void snp_set_page_private(unsigned long paddr);
void snp_set_page_shared(unsigned long paddr);
void sev_prep_identity_maps(unsigned long top_level_pgt);
#else
-static inline void sev_enable(struct boot_params *bp) { }
+static inline void sev_enable(struct boot_params *bp)
+{
+ /*
+ * bp->cc_blob_address should only be set by boot/compressed kernel.
+ * Initialize it to 0 to ensure that uninitialized values from
+ * buggy bootloaders aren't propagated.
+ */
+ if (bp)
+ bp->cc_blob_address = 0;
+}
static inline void sev_es_shutdown_ghcb(void) { }
static inline bool sev_es_check_ghcb_fault(unsigned long address)
{
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index 52f989f6acc2..c93930d5ccbd 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -276,6 +276,14 @@ void sev_enable(struct boot_params *bp)
struct msr m;
bool snp;
+ /*
+ * bp->cc_blob_address should only be set by boot/compressed kernel.
+ * Initialize it to 0 to ensure that uninitialized values from
+ * buggy bootloaders aren't propagated.
+ */
+ if (bp)
+ bp->cc_blob_address = 0;
+
/*
* Setup/preliminary detection of SNP. This will be sanity-checked
* against CPUID/MSR values later.
--
2.25.1
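The ordering argument in the sev_enable() patch above can be modelled in
user space. This is only a minimal sketch, not kernel code: struct
boot_params_model, sev_enable_model() and the snp_blob parameter are
hypothetical stand-ins introduced for illustration. It shows why the field
is cleared first and only then, possibly, set to a valid value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the single boot_params field the patch touches. */
struct boot_params_model {
	uint32_t cc_blob_address;
};

/*
 * Models the patched flow: discard whatever the bootloader left in the
 * field, then set it only if a blob was actually found.
 * snp_blob == 0 models "nothing to pass on" (assumption of this sketch).
 */
static void sev_enable_model(struct boot_params_model *bp, uint32_t snp_blob)
{
	/* Unconditional clear, guarded against a NULL pointer as in the patch. */
	if (bp)
		bp->cc_blob_address = 0;

	/* Only a value produced after the clear may survive. */
	if (bp && snp_blob)
		bp->cc_blob_address = snp_blob;
}
```

Clearing the field later, after a valid value may have been stored, would
wipe that value; this is the reason the commit message gives for not
relying on sanitize_boot_params().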
This is the start of the stable review cycle for the 4.9.326 release.
There are 101 patches in this series; all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 25 Aug 2022 08:00:15 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.326-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.326-rc1
Nathan Chancellor <nathan(a)kernel.org>
MIPS: tlbex: Explicitly compare _PAGE_NO_EXEC against 0
Zheyu Ma <zheyuma97(a)gmail.com>
video: fbdev: i740fb: Check the argument of i740_calc_vclk()
Zhouyi Zhou <zhouzhouyi(a)gmail.com>
powerpc/64: Init jump labels before parse_early_param()
Takashi Iwai <tiwai(a)suse.de>
ALSA: timer: Use deferred fasync helper
Takashi Iwai <tiwai(a)suse.de>
ALSA: core: Add async signal helpers
Liang He <windhl(a)126.com>
mips: cavium-octeon: Fix missing of_node_put() in octeon2_usb_clocks_start
Schspa Shi <schspa(a)gmail.com>
vfio: Clear the caps->buf to NULL after free
Liang He <windhl(a)126.com>
tty: serial: Fix refcount leak bug in ucc_uart.c
Kiselev, Oleg <okiselev(a)amazon.com>
ext4: avoid resizing to a partial cluster size
Ye Bin <yebin10(a)huawei.com>
ext4: avoid remove directory when directory is corrupted
Wentao_Liang <Wentao_Liang_g(a)163.com>
drivers:md:fix a potential use-after-free bug
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
cxl: Fix a memory leak in an error handling path
Jozef Martiniak <jomajm(a)gmail.com>
gadgetfs: ep_io - wait until IRQ finishes
Liang He <windhl(a)126.com>
usb: host: ohci-ppc-of: Fix refcount leak bug
Sai Prakash Ranjan <quic_saipraka(a)quicinc.com>
irqchip/tegra: Fix overflow implicit truncation warnings
Masahiro Yamada <yamada.masahiro(a)socionext.com>
kbuild: clear LDFLAGS in the top Makefile
Csókás Bence <csokas.bence(a)prolan.hu>
fec: Fix timer capture timing in `fec_ptp_enable_pps()`
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: really skip inactive sets when allocating name
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: add force_successful_syscall_return()
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: restarts apply only to the first sigframe we build...
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: fix syscall restart checks
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: traced syscall does need to check the syscall number
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: don't leave NULLs in sys_call_table[]
Al Viro <viro(a)zeniv.linux.org.uk>
nios2: page fault et.al. are *not* restartable syscalls...
Duoming Zhou <duoming(a)zju.edu.cn>
atm: idt77252: fix use-after-free bugs caused by tst_timer
Dan Carpenter <dan.carpenter(a)oracle.com>
xen/xenbus: fix return type in xenbus_file_read()
Peilin Ye <peilin.ye(a)bytedance.com>
vsock: Fix memory leak in vsock_connect()
Nikita Travkin <nikita(a)trvn.ru>
pinctrl: qcom: msm8916: Allow CAMSS GP clocks to be muxed
Miaoqian Lin <linmq006(a)gmail.com>
pinctrl: nomadik: Fix refcount leak in nmk_pinctrl_dt_subnode_to_map
Trond Myklebust <trond.myklebust(a)hammerspace.com>
SUNRPC: Reinitialise the backchannel request buffers before reuse
Zhang Xianwei <zhang.xianwei8(a)zte.com.cn>
NFSv4.1: RECLAIM_COMPLETE must handle EACCES
Marc Kleine-Budde <mkl(a)pengutronix.de>
can: ems_usb: fix clang's -Wunaligned-access warning
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix lost error handling when looking up extended ref on log replay
Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
ata: libata-eh: Add missing command name
Mikulas Patocka <mpatocka(a)redhat.com>
rds: add missing barrier to release_refill
Amadeusz Sławiński <amadeuszx.slawinski(a)linux.intel.com>
ALSA: info: Fix llseek return value when using callback
Jamal Hadi Salim <jhs(a)mojatatu.com>
net_sched: cls_route: disallow handle of 0
Tyler Hicks <tyhicks(a)linux.microsoft.com>
net/9p: Initialize the iounit field during fid creation
Guenter Roeck <linux(a)roeck-us.net>
nios2: time: Read timer in get_cycles only if initialized
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression
Jose Alonso <joalonsof(a)gmail.com>
Revert "net: usb: ax88179_178a needs FLAG_SEND_ZLP"
Tony Battersby <tonyb(a)cybernetics.com>
scsi: sg: Allow waiting for commands to complete on removed device
Eric Dumazet <edumazet(a)google.com>
tcp: fix over estimation in sk_forced_mem_schedule()
Qu Wenruo <wqu(a)suse.com>
btrfs: reject log replay if there is unsupported RO compat flag
Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
net_sched: cls_route: remove from list when handle is 0
Mikulas Patocka <mpatocka(a)redhat.com>
dm raid: fix address sanitizer warning in raid_status
Baokun Li <libaokun1(a)huawei.com>
ext4: correct max_inline_xattr_value_size computing
Eric Whitney <enwlinux(a)gmail.com>
ext4: fix extent status tree race in writeback error recovery path
Theodore Ts'o <tytso(a)mit.edu>
ext4: update s_overhead_clusters in the superblock during an on-line resize
Baokun Li <libaokun1(a)huawei.com>
ext4: fix use-after-free in ext4_xattr_set_entry
Lukas Czerner <lczerner(a)redhat.com>
ext4: make sure ext4_append() always allocates new block
Baokun Li <libaokun1(a)huawei.com>
ext4: add EXT4_INODE_HAS_XATTR_SPACE macro in xattr.h
David Collins <quic_collinsd(a)quicinc.com>
spmi: trace: fix stack-out-of-bound access in SPMI tracing functions
Alexander Lobakin <alexandr.lobakin(a)intel.com>
x86/olpc: fix 'logical not is only applied to the left hand side'
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: Fix missing auto port scan and thus missing target ports
Florian Westphal <fw(a)strlen.de>
netfilter: nf_tables: fix null deref due to zeroed list head
Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
USB: HCD: Fix URB giveback issue in tasklet function
Huacai Chen <chenhuacai(a)loongson.cn>
MIPS: cpuinfo: Fix a warning for CONFIG_CPUMASK_OFFSTACK
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/powernv: Avoid crashing if rng is NULL
Pali Rohár <pali(a)kernel.org>
powerpc/fsl-pci: Fix Class Code of PCIe Root Port
Pali Rohár <pali(a)kernel.org>
PCI: Add defines for normal and subtractive PCI bridges
Alexander Lobakin <alexandr.lobakin(a)intel.com>
ia64, processor: fix -Wincompatible-pointer-types in ia64_get_irr()
Mikulas Patocka <mpatocka(a)redhat.com>
md-raid10: fix KASAN warning
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: limit nsec
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: fix overflow in prog accounting
Timur Tabi <ttabi(a)nvidia.com>
drm/nouveau: fix another off-by-one in nvbios_addr
Helge Deller <deller(a)gmx.de>
parisc: Fix device names in /proc/iomem
Lukas Wunner <lukas(a)wunner.de>
usbnet: Fix linkwatch use-after-free on disconnect
David Howells <dhowells(a)redhat.com>
vfs: Check the truncate maximum size in inode_newsize_ok()
Allen Ballway <ballway(a)chromium.org>
ALSA: hda/cirrus - support for iMac 12,1 model
Meng Tang <tangmeng(a)uniontech.com>
ALSA: hda/conexant: Add quirk for LENOVO 20149 Notebook model
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checks
Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
KVM: SVM: Don't BUG if userspace injects an interrupt with GIF=0
Mikulas Patocka <mpatocka(a)redhat.com>
add barriers to buffer_uptodate and set_buffer_uptodate
Zheyu Ma <zheyuma97(a)gmail.com>
ALSA: bcd2000: Fix a UAF bug on the error path of probing
Nick Desaulniers <ndesaulniers(a)google.com>
x86: link vdso and boot with -z noexecstack --no-warn-rwx-segments
Nick Desaulniers <ndesaulniers(a)google.com>
Makefile: link with -z noexecstack --no-warn-rwx-segments
Ning Qiang <sohu0106(a)126.com>
macintosh/adb: fix oob read in do_adb_query() function
Hans-Christian Noren Egtvedt <hegtvedt(a)cisco.com>
random: only call boot_init_stack_canary() once
Werner Sembach <wse(a)tuxedocomputers.com>
ACPI: video: Shortening quirk list by identifying Clevo by board_name only
Werner Sembach <wse(a)tuxedocomputers.com>
ACPI: video: Force backlight native for some TongFang devices
Daniel Micay <danielmicay(a)gmail.com>
init/main.c: extract early boot entropy from the passed cmdline
Laura Abbott <lauraa(a)codeaurora.org>
init: move stack canary initialization after setup_arch
Viresh Kumar <viresh.kumar(a)linaro.org>
init/main: properly align the multi-line comment
Viresh Kumar <viresh.kumar(a)linaro.org>
init/main: Fix double "the" in comment
Christian Borntraeger <borntraeger(a)de.ibm.com>
include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap
Paul Moore <paul(a)paul-moore.com>
selinux: fix inode_doinit_with_dentry() LABEL_INVALID error handling
Tianyue Ren <rentianyue(a)kylinos.cn>
selinux: fix error initialization in inode_doinit_with_dentry()
Andreas Gruenbacher <agruenba(a)redhat.com>
selinux: Convert isec->lock into a spinlock
Andreas Gruenbacher <agruenba(a)redhat.com>
selinux: Clean up initialization of isec->sclass
Andreas Gruenbacher <agruenba(a)redhat.com>
proc: Pass file mode to proc_pid_make_inode
Andreas Gruenbacher <agruenba(a)redhat.com>
selinux: Minor cleanups
Nathan Chancellor <nathan(a)kernel.org>
ion: Make user_ion_handle_put_nolock() a void function
Wei Mingzhi <whistler(a)member.fsf.org>
mt7601u: add USB device ID for some versions of XiaoDu WiFi Dongle.
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
ARM: crypto: comment out gcc warning that breaks clang builds
Florian Westphal <fw(a)strlen.de>
netfilter: nf_queue: do not allow packet truncation below transport header offset
Liang He <windhl(a)126.com>
net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()
Kuniyuki Iwashima <kuniyu(a)amazon.com>
net: ping6: Fix memleak in ipv6_renew_options().
Liang He <windhl(a)126.com>
scsi: ufs: host: Hold reference returned by of_parse_phandle()
ChenXiaoSong <chenxiaosong2(a)huawei.com>
ntfs: fix use-after-free in ntfs_ucsncmp()
Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
-------------
Diffstat:
Makefile | 8 +-
arch/arm/lib/xor-neon.c | 3 +-
arch/ia64/include/asm/processor.h | 2 +-
arch/mips/cavium-octeon/octeon-platform.c | 3 +-
arch/mips/kernel/proc.c | 2 +-
arch/mips/mm/tlbex.c | 4 +-
arch/nios2/include/asm/entry.h | 3 +-
arch/nios2/include/asm/ptrace.h | 2 +
arch/nios2/kernel/entry.S | 22 +++--
arch/nios2/kernel/signal.c | 3 +-
arch/nios2/kernel/syscall_table.c | 1 +
arch/nios2/kernel/time.c | 5 +-
arch/parisc/kernel/drivers.c | 9 +-
arch/powerpc/kernel/prom.c | 7 ++
arch/powerpc/platforms/powernv/rng.c | 2 +
arch/powerpc/sysdev/fsl_pci.c | 8 ++
arch/powerpc/sysdev/fsl_pci.h | 1 +
arch/x86/boot/Makefile | 2 +-
arch/x86/boot/compressed/Makefile | 4 +
arch/x86/entry/vdso/Makefile | 2 +-
arch/x86/kvm/emulate.c | 19 ++--
arch/x86/kvm/svm.c | 2 -
arch/x86/platform/olpc/olpc-xo1-sci.c | 2 +-
drivers/acpi/video_detect.c | 55 +++++++----
drivers/ata/libata-eh.c | 1 +
drivers/atm/idt77252.c | 1 +
drivers/gpu/drm/nouveau/nvkm/subdev/bios/base.c | 2 +-
drivers/irqchip/irq-tegra.c | 10 +-
drivers/macintosh/adb.c | 2 +-
drivers/md/dm-raid.c | 2 +-
drivers/md/raid10.c | 5 +-
drivers/md/raid5.c | 2 +-
drivers/misc/cxl/irq.c | 1 +
drivers/net/can/usb/ems_usb.c | 2 +-
drivers/net/ethernet/freescale/fec_ptp.c | 6 +-
drivers/net/sungem_phy.c | 1 +
drivers/net/usb/ax88179_178a.c | 14 +--
drivers/net/usb/usbnet.c | 8 +-
drivers/net/wireless/mediatek/mt7601u/usb.c | 1 +
drivers/pinctrl/nomadik/pinctrl-nomadik.c | 4 +-
drivers/pinctrl/qcom/pinctrl-msm8916.c | 4 +-
drivers/s390/scsi/zfcp_fc.c | 29 ++++--
drivers/s390/scsi/zfcp_fc.h | 6 +-
drivers/s390/scsi/zfcp_fsf.c | 4 +-
drivers/scsi/sg.c | 57 ++++++-----
drivers/scsi/ufs/ufshcd-pltfrm.c | 15 ++-
drivers/staging/android/ion/ion-ioctl.c | 8 +-
drivers/tty/serial/ucc_uart.c | 2 +
drivers/usb/core/hcd.c | 26 ++---
drivers/usb/gadget/legacy/inode.c | 1 +
drivers/usb/host/ohci-ppc-of.c | 1 +
drivers/vfio/vfio.c | 1 +
drivers/video/fbdev/i740fb.c | 9 +-
drivers/xen/xenbus/xenbus_dev_frontend.c | 4 +-
fs/attr.c | 2 +
fs/btrfs/disk-io.c | 14 +++
fs/btrfs/tree-log.c | 4 +-
fs/ext4/inline.c | 3 +
fs/ext4/inode.c | 7 ++
fs/ext4/namei.c | 23 ++++-
fs/ext4/resize.c | 11 +++
fs/ext4/xattr.c | 6 +-
fs/ext4/xattr.h | 13 +++
fs/fuse/inode.c | 6 ++
fs/nfs/nfs4proc.c | 3 +
fs/ntfs/attrib.c | 8 +-
fs/proc/base.c | 23 ++---
fs/proc/fd.c | 6 +-
fs/proc/internal.h | 2 +-
fs/proc/namespaces.c | 3 +-
include/linux/bpf.h | 11 +++
include/linux/buffer_head.h | 25 ++++-
include/linux/pci_ids.h | 2 +
include/linux/usb/hcd.h | 1 +
include/net/bluetooth/l2cap.h | 1 +
include/sound/core.h | 8 ++
include/trace/events/spmi.h | 12 +--
include/uapi/linux/swab.h | 4 +-
init/main.c | 14 +--
kernel/bpf/core.c | 16 ++-
kernel/bpf/syscall.c | 36 +++++--
net/9p/client.c | 4 +-
net/bluetooth/l2cap_core.c | 68 +++++++++----
net/ipv4/tcp_output.c | 7 +-
net/ipv6/ping.c | 6 ++
net/netfilter/nf_tables_api.c | 3 +-
net/netfilter/nfnetlink_queue.c | 7 +-
net/rds/ib_recv.c | 1 +
net/sched/cls_route.c | 8 +-
net/sunrpc/backchannel_rqst.c | 14 +++
net/vmw_vsock/af_vsock.c | 9 +-
security/selinux/hooks.c | 123 +++++++++++++++---------
security/selinux/include/objsec.h | 5 +-
security/selinux/selinuxfs.c | 4 +-
sound/core/info.c | 6 +-
sound/core/misc.c | 94 ++++++++++++++++++
sound/core/timer.c | 11 ++-
sound/pci/hda/patch_cirrus.c | 1 +
sound/pci/hda/patch_conexant.c | 11 ++-
sound/usb/bcd2000/bcd2000.c | 3 +-
100 files changed, 753 insertions(+), 296 deletions(-)
The patch titled
Subject: mm/mprotect: Only reference swap pfn page if type match
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mprotect-only-reference-swap-pfn-page-if-type-match.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/mprotect: Only reference swap pfn page if type match
Date: Tue, 23 Aug 2022 18:11:38 -0400
Yu Zhao reported a bug after the commit "mm/swap: Add swp_offset_pfn() to
fetch PFN from swap entry" added a check in swp_offset_pfn() for swap type [1]:
kernel BUG at include/linux/swapops.h:117!
CPU: 46 PID: 5245 Comm: EventManager_De Tainted: G S O L 6.0.0-dbg-DEV #2
RIP: 0010:pfn_swap_entry_to_page+0x72/0xf0
Code: c6 48 8b 36 48 83 fe ff 74 53 48 01 d1 48 83 c1 08 48 8b 09 f6
c1 01 75 7b 66 90 48 89 c1 48 8b 09 f6 c1 01 74 74 5d c3 eb 9e <0f> 0b
48 ba ff ff ff ff 03 00 00 00 eb ae a9 ff 0f 00 00 75 13 48
RSP: 0018:ffffa59e73fabb80 EFLAGS: 00010282
RAX: 00000000ffffffe8 RBX: 0c00000000000000 RCX: ffffcd5440000000
RDX: 1ffffffffff7a80a RSI: 0000000000000000 RDI: 0c0000000000042b
RBP: ffffa59e73fabb80 R08: ffff9965ca6e8bb8 R09: 0000000000000000
R10: ffffffffa5a2f62d R11: 0000030b372e9fff R12: ffff997b79db5738
R13: 000000000000042b R14: 0c0000000000042b R15: 1ffffffffff7a80a
FS: 00007f549d1bb700(0000) GS:ffff99d3cf680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000440d035b3180 CR3: 0000002243176004 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
change_pte_range+0x36e/0x880
change_p4d_range+0x2e8/0x670
change_protection_range+0x14e/0x2c0
mprotect_fixup+0x1ee/0x330
do_mprotect_pkey+0x34c/0x440
__x64_sys_mprotect+0x1d/0x30
It triggers because pfn_swap_entry_to_page() could be called on an entry
that does not encode a PFN, e.g. a genuine swap entry.
Fix it by calling it only for a writable migration entry, which is the
only case where the struct page pointer is used.
[1] https://lore.kernel.org/lkml/CAOUHufaVC2Za-p8m0aiHw6YkheDcrO-C3wRGixwDS32VT…
Link: https://lkml.kernel.org/r/20220823221138.45602-1-peterx@redhat.com
Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Yu Zhao <yuzhao(a)google.com>
Tested-by: Yu Zhao <yuzhao(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mprotect.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/mprotect.c~mm-mprotect-only-reference-swap-pfn-page-if-type-match
+++ a/mm/mprotect.c
@@ -196,10 +196,11 @@ static unsigned long change_pte_range(st
pages++;
} else if (is_swap_pte(oldpte)) {
swp_entry_t entry = pte_to_swp_entry(oldpte);
- struct page *page = pfn_swap_entry_to_page(entry);
pte_t newpte;
if (is_writable_migration_entry(entry)) {
+ struct page *page = pfn_swap_entry_to_page(entry);
+
/*
* A protection check is difficult so
* just be safe and disable write
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-mprotect-only-reference-swap-pfn-page-if-type-match.patch
mm-x86-use-swp_type_bits-in-3-level-swap-macros.patch
mm-swap-comment-all-the-ifdef-in-swapopsh.patch
mm-swap-add-swp_offset_pfn-to-fetch-pfn-from-swap-entry.patch
mm-thp-carry-over-dirty-bit-when-thp-splits-on-pmd.patch
mm-remember-young-dirty-bit-for-page-migrations.patch
mm-swap-cache-maximum-swapfile-size-when-init-swap.patch
mm-swap-cache-swap-migration-a-d-bits-support.patch
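The mm/mprotect fix above amounts to "check the entry type before
converting it". The sketch below models that in user space under stated
assumptions: enum entry_type, struct entry_model, the lookup counter and
both functions are hypothetical names, and the assert() merely stands in
for the kernel's BUG check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical entry kinds; the kernel distinguishes more of them. */
enum entry_type { ENTRY_SWAP, ENTRY_WRITE_MIGRATION };

struct entry_model {
	enum entry_type type;
	unsigned long offset;
};

/* Instrumentation for this sketch only. */
static int pfn_lookups;

/* Stand-in for the conversion helper: only valid for PFN-carrying entries. */
static unsigned long entry_to_pfn_model(struct entry_model e)
{
	assert(e.type == ENTRY_WRITE_MIGRATION);
	pfn_lookups++;
	return e.offset;
}

/* Patched shape: the conversion happens inside the type check. */
static bool handle_entry_model(struct entry_model e)
{
	if (e.type == ENTRY_WRITE_MIGRATION) {
		unsigned long pfn = entry_to_pfn_model(e);

		(void)pfn; /* the real code goes on to use the page here */
		return true;
	}
	return false; /* genuine swap entry: never converted */
}
```

Moving the conversion into the branch means a genuine swap entry never
reaches the helper, so its check cannot fire.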
In report_idle_softirq(), ratelimit is a static variable that is checked
against 10: while it is below 10 the function returns false. Since the
variable is never incremented before that check, the condition is
always true, so the function always returns false at that point and the
warning is never printed.
report_idle_softirq() was introduced by
commit 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle").
It checks for pending softirqs when the tick is about to be stopped and
returns true if the tick can't be stopped.
The purpose of ratelimit is to limit the printing of warning messages,
so apply the check to the warning only. Also don't return false in that
case, as the tick can't be stopped while a softirq is pending.
Fixes: 0345691b24c0 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Cc: stable(a)vger.kernel.org # 5.18+
---
One more change from prior kernel versions:
In prior versions the !local_bh_blocked() check was only used to decide
whether to print the warning message. Its result was not used to decide
whether to allow tick-stop. With the change introduced in 5.18, the
result also decides whether to stop the tick, even if
local_softirq_pending() returns true.
Not sure whether this change is intentional.
kernel/time/tick-sched.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index b0e3c9205946..4db634525bf4 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1025,16 +1025,15 @@ static bool report_idle_softirq(void)
return false;
}
- if (ratelimit < 10)
- return false;
-
/* On RT, softirqs handling may be waiting on some lock */
if (!local_bh_blocked())
return false;
- pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n",
- pending);
- ratelimit++;
+ if (ratelimit < 10) {
+ pr_warn("NOHZ tick-stop error: local softirq work is pending, handler #%02x!!!\n",
+ pending);
+ ratelimit++;
+ }
return true;
}
--
2.35.3
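The ratelimit problem fixed by the tick-sched patch above can be
reproduced in a small user-space model. This is a sketch under stated
assumptions, not the kernel function: report_old_model(),
report_new_model() and the two counters are hypothetical names, and the
pending/local_bh_blocked() checks are left out so that only the ratelimit
logic remains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Instrumentation for this sketch only; the kernel has no such counters. */
static int old_warnings, new_warnings;

/* Pre-patch shape: ratelimit is still 0 at the check, so we always bail out. */
static bool report_old_model(unsigned int pending)
{
	static int ratelimit;

	if (ratelimit < 10)
		return false;

	fprintf(stderr, "softirq work is pending, handler #%02x\n", pending);
	old_warnings++;
	ratelimit++;
	return true;
}

/* Post-patch shape: only the warning is rate limited, the result is not. */
static bool report_new_model(unsigned int pending)
{
	static int ratelimit;

	if (ratelimit < 10) {
		fprintf(stderr, "softirq work is pending, handler #%02x\n", pending);
		new_warnings++;
		ratelimit++;
	}
	return true;
}
```

Calling both models repeatedly shows the difference: the old shape never
warns and never reports a pending softirq, while the new shape always
reports it and stops warning after ten messages.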
One of the side-effects of mb_optimize_scan was that the optimized
functions to select the next group to try were called even before we
tried the goal group. As a result we no longer allocate files close to
their corresponding inodes, and we don't try to expand the currently
allocated extent in the same group. This results in a reaim regression
with the workfile.disk workload of up to 8% with many clients on my test
machine:
                         baseline         mb_optimize_scan
Hmean     disk-1      2114.16 (  0.00%)     2099.37 ( -0.70%)
Hmean     disk-41    87794.43 (  0.00%)    83787.47 * -4.56%*
Hmean     disk-81   148170.73 (  0.00%)   135527.05 * -8.53%*
Hmean     disk-121  177506.11 (  0.00%)   166284.93 * -6.32%*
Hmean     disk-161  220951.51 (  0.00%)   207563.39 * -6.06%*
Hmean     disk-201  208722.74 (  0.00%)   203235.59 ( -2.63%)
Hmean     disk-241  222051.60 (  0.00%)   217705.51 ( -1.96%)
Hmean     disk-281  252244.17 (  0.00%)   241132.72 * -4.41%*
Hmean     disk-321  255844.84 (  0.00%)   245412.84 * -4.08%*
This also causes a huge regression (time increased by a factor of 5 or
so) when untarring an archive with lots of small files on some eMMC
storage cards.
Fix the problem by making sure we try the goal group first.
Fixes: 196e402adf2e ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable(a)vger.kernel.org
Reported-by: Stefan Wahren <stefan.wahren(a)i2se.com>
Link: https://lore.kernel.org/all/20220727105123.ckwrhbilzrxqpt24@quack3/
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/mballoc.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index bd8f8b5c3d30..41e1cfecac3b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1049,8 +1049,10 @@ static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
{
*new_cr = ac->ac_criteria;
- if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining)
+ if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining) {
+ *group = next_linear_group(ac, *group, ngroups);
return;
+ }
if (*new_cr == 0) {
ext4_mb_choose_next_group_cr0(ac, new_cr, group, ngroups);
@@ -2636,7 +2638,7 @@ static noinline_for_stack int
ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
{
ext4_group_t prefetch_grp = 0, ngroups, group, i;
- int cr = -1;
+ int cr = -1, new_cr;
int err = 0, first_err = 0;
unsigned int nr = 0, prefetch_ios = 0;
struct ext4_sb_info *sbi;
@@ -2711,13 +2713,11 @@ ext4_mb_regular_allocator(struct ext4_allocation_context *ac)
ac->ac_groups_linear_remaining = sbi->s_mb_max_linear_groups;
prefetch_grp = group;
- for (i = 0; i < ngroups; group = next_linear_group(ac, group, ngroups),
- i++) {
- int ret = 0, new_cr;
+ for (i = 0, new_cr = cr; i < ngroups; i++,
+ ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups)) {
+ int ret = 0;
cond_resched();
-
- ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups);
if (new_cr != cr) {
cr = new_cr;
goto repeat;
--
2.35.3
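The effect of moving the group selection from the loop body into the
for-loop increment, as the ext4 patch above does, can be seen in a small
model. It is a sketch only: choose_next_model() is a hypothetical stand-in
for the optimized selection (here it simply steps to the next group), and
scan_old_model()/scan_new_model() just record the order in which groups
would be examined.

```c
#include <assert.h>

/* Hypothetical selection policy: any function that moves off the current group. */
static void choose_next_model(unsigned int *group, unsigned int ngroups)
{
	*group = (*group + 1) % ngroups;
}

/* Pre-patch shape: selection runs before the first group is examined. */
static void scan_old_model(unsigned int goal, unsigned int ngroups,
			   unsigned int *order)
{
	unsigned int group = goal, i;

	for (i = 0; i < ngroups; i++) {
		choose_next_model(&group, ngroups);
		order[i] = group; /* goal group is skipped on the first pass */
	}
}

/* Post-patch shape: selection runs only after a group has been examined. */
static void scan_new_model(unsigned int goal, unsigned int ngroups,
			   unsigned int *order)
{
	unsigned int group = goal, i;

	for (i = 0; i < ngroups; i++, choose_next_model(&group, ngroups))
		order[i] = group; /* first examined group is the goal group */
}
```

With eight groups and goal group 5, the old shape examines group 6 first
and reaches the goal group last, while the new shape starts at group 5.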