--Andy
> On Apr 18, 2020, at 12:42 PM, Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
>>> On Fri, Apr 17, 2020 at 5:12 PM Dan Williams <dan.j.williams(a)intel.com> wrote:
>>>
>>> @@ -106,12 +108,10 @@ static __always_inline __must_check unsigned long
>>> memcpy_mcsafe(void *dst, const void *src, size_t cnt)
>>> {
>>> #ifdef CONFIG_X86_MCE
>>> - if (static_branch_unlikely(&mcsafe_key))
>>> - return __memcpy_mcsafe(dst, src, cnt);
>>> - else
>>> + if (static_branch_unlikely(&mcsafe_slow_key))
>>> + return memcpy_mcsafe_slow(dst, src, cnt);
>>> #endif
>>> - memcpy(dst, src, cnt);
>>> - return 0;
>>> + return memcpy_mcsafe_fast(dst, src, cnt);
>>> }
>
> It strikes me that I see no advantages to making this an inline function at all.
>
> Even for the good case - where it turns into just a memcpy because MCE
> is entirely disabled - it doesn't seem to matter.
>
> The only case that really helps is when the memcpy can be turned into
> a single access. Which - and I checked - does exist, with people doing
>
> r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t));
>
> to read a single 64-bit field which looks aligned to me.
>
> But that code is incredible garbage anyway, since even on a broken
> machine, there's no actual reason to use the slow variant for that
> whole access that I can tell. The mc-safe copy routines do not do
> anything worthwhile for a single access.
Maybe I’m missing something obvious, but what’s the alternative? The _mcsafe variants don’t just avoid the REP mess — they also tell the kernel that this particular access is recoverable via extable. With a regular memory access, the CPU may not explode, but do_machine_check() will, at very best, OOPS, and even that requires a certain degree of optimism. A panic is more likely.
I recently tracked down a problem I observed when booting a v5.4 kernel
on a sparsemem UMA arm platform which includes a no-map reserved-memory
region in the middle of its HighMem zone.
When memmap_init_zone() is invoked, the pfns that correspond to the
no-map region fail the early_pfn_valid() check and the struct page
structures are not initialized, creating a "hole" in the memmap. Later in
my boot sequence the sock_init() initcall leads to a bpf_prog_alloc()
which ends up stealing a page from the block containing the no-map
region which then leads to a call of move_freepages_block() to
reclassify the migratetype of the entire block.
The function move_freepages() includes a check of pfn_valid_within for
each page in the range, but since the arm architecture doesn't include
HOLES_IN_ZONE this check is optimized out and the uninitialized struct
page is accessed. Specifically, PageLRU() calls compound_head() on the
page and if the page->compound_head value is odd the value is used as a
pointer to the head struct page. For uninitialized memory there is a
high chance that a random value of compound head will be odd and contain
an invalid pointer value that causes the kernel to abort and panic.
As you might imagine, specifying HOLES_IN_ZONE for the arm build allows
pfn_valid_within to protect against accessing the uninitialized struct
page. However, the performance penalty this incurs seems unnecessary.
Commit 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations") as
part of the "sparse_init rewrite" series introduced in v4.19 changed the
way sparsemem memmaps are initialized. Prior to this patch the sparsemem
memmaps were initialized to all 0's. I observed that on older kernels the
"uninitialized" struct page access also occurs, but the 0
page->compound_head indicates no compound head and the page pointer is
therefore not corrupted. The other logic ends up causing the page to be
skipped and everything "happens to work".
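To make the difference between a zeroed and a garbage compound_head
concrete, here is a minimal user-space sketch. The struct page_model type
and both helpers are hypothetical stand-ins, not kernel code; they only
mirror the low-bit test described above and assume nothing else about
struct page.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page: only the field discussed above. */
struct page_model {
	unsigned long compound_head;	/* bit 0 set => treated as a tail page */
};

/* An odd value means "the rest of this word is a pointer to the head". */
static inline int is_tail(const struct page_model *p)
{
	return (p->compound_head & 1UL) != 0;
}

/* Address that would be used as the head page for this entry. */
static inline uintptr_t head_address(const struct page_model *p)
{
	if (is_tail(p))
		return (uintptr_t)(p->compound_head - 1);	/* strip the tag bit */
	return (uintptr_t)p;				/* page is its own head */
}
```

With compound_head == 0 the helper returns the page's own address, which
matches the "happens to work" behaviour on older kernels. With an odd
garbage value such as 0xdeadbeef it returns 0xdeadbeee, an address the
kernel would then dereference.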
While considering solutions to this issue I observed that the problem
does not occur in the current upstream as a result of a combination of
other commits. The following commits provided functionality to
initialize struct page structures for pages that are unavailable like
the no-map region in my system:
commit a4a3ede2132a ("mm: zero reserved and unavailable struct pages")
commit 907ec5fca3dc ("mm: zero remaining unavailable struct pages")
commit ec393a0f014e ("mm: return zero_resv_unavail optimization")
commit e822969cab48 ("mm/page_alloc.c: fix uninitialized memmaps on a
partially populated last section")
commit 4b094b7851bf ("mm/page_alloc.c: initialize memmap of unavailable
memory directly")
However, those commits added the functionality to the free_area_init()
and free_area_init_nodes() functions and the non-NUMA arm architecture
did not begin calling free_area_init() until the following commit in v5.8:
commit a32c1c61212d ("arm: simplify detection of memory zone boundaries")
Prior to that commit the non-NUMA arm architecture called
free_area_init_node() directly at the end of zone_sizes_init().
So while the problem appears to be fixed upstream by commit a32c1c61212d
("arm: simplify detection of memory zone boundaries") it is still
present in stable branches between v4.19.y and v5.7.y inclusive and
probably for architectures other than arm as well that didn't call
free_area_init(). This upstream commit is not easily/safely backportable
to stable branches, but if we focus on the sliver of functionality that
adds the initialization code from free_area_init() to the
zone_sizes_init() function used by non-NUMA arm kernels I believe a
simple patch could be developed for each relevant stable branch to
resolve the issue I am observing. Similar patches could also be applied
for other architectures that now call free_area_init() upstream but not
in one of these stable branches, but I am not in a position to test
those architectures.
For the linux-5.4.y branch such a patch might look like this:
From 671c341b5cdb8360349c33ade43115e28ca56a8a Mon Sep 17 00:00:00 2001
From: Doug Berger <opendmb(a)gmail.com>
Date: Tue, 25 Aug 2020 14:39:43 -0700
Subject: [PATCH] ARM: mm: sync zone_sizes_init with free_area_init
The arm architecture does not invoke the common function
free_area_init(). Instead for non-NUMA builds it invokes
free_area_init_node() directly from zone_sizes_init().
As a result recent changes in free_area_init() are not
picked up by arm architecture builds.
This commit adds the updates to the zone_sizes_init()
function to achieve parity with the free_area_init()
functionality.
Fixes: 35fd1eb1e821 ("mm/sparse: abstract sparse buffer allocations")
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
arch/arm/mm/init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 6f19ba53fd1f..4f171d834c60 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -169,6 +169,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max_low,
arm_dma_zone_size >> PAGE_SHIFT);
#endif
+ zero_resv_unavail();
free_area_init_node(0, zone_size, min, zhole_size);
}
--
2.7.4
I am unclear on the mechanics for submitting such a stable patch when it
represents a perhaps less than obvious sliver of the upstream commit
that fixes the issue, so I am soliciting guidance with this email.
Thank you for taking the time to read this far, and please let me know
how I can improve the situation,
Doug
This is the start of the stable review cycle for the 4.19.141 release.
There are 92 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 22 Aug 2020 09:15:09 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.141-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.141-rc1
Sandeep Raghuraman <sandy.8925(a)gmail.com>
drm/amdgpu: Fix bug where DPM is not enabled after hibernate and resume
Marius Iacob <themariusus(a)gmail.com>
drm: Added orientation quirk for ASUS tablet model T103HAF
Denis Efremov <efremov(a)linux.com>
drm/radeon: fix fb_div check in ni_init_smc_spll_table()
Tomasz Maciej Nowak <tmn505(a)gmail.com>
arm64: dts: marvell: espressobin: add ethernet alias
Hugh Dickins <hughd(a)google.com>
khugepaged: retract_page_tables() remember to test exit
Geert Uytterhoeven <geert+renesas(a)glider.be>
sh: landisk: Add missing initialization of sh_io_port_base
Daniel Díaz <daniel.diaz(a)linaro.org>
tools build feature: Quote CC and CXX for their arguments
Vincent Whitchurch <vincent.whitchurch(a)axis.com>
perf bench mem: Always memset source before memcpy
Dinghao Liu <dinghao.liu(a)zju.edu.cn>
ALSA: echoaudio: Fix potential Oops in snd_echo_resume()
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
mfd: dln2: Run event handler loop under spinlock
Tiezhu Yang <yangtiezhu(a)loongson.cn>
test_kmod: avoid potential double free in trigger_config_run_type()
Colin Ian King <colin.king(a)canonical.com>
fs/ufs: avoid potential u32 multiplication overflow
Eric Biggers <ebiggers(a)google.com>
fs/minix: remove expected error message in block_to_path()
Eric Biggers <ebiggers(a)google.com>
fs/minix: fix block limit check for V1 filesystems
Eric Biggers <ebiggers(a)google.com>
fs/minix: set s_maxbytes correctly
Jeffrey Mitchell <jeffrey.mitchell(a)starlab.io>
nfs: Fix getxattr kernel panic and memory overflow
Wang Hai <wanghai38(a)huawei.com>
net: qcom/emac: add missed clk_disable_unprepare in error path of emac_clks_phase1_init
Dan Carpenter <dan.carpenter(a)oracle.com>
drm/vmwgfx: Fix two list_for_each loop exit tests
Dan Carpenter <dan.carpenter(a)oracle.com>
drm/vmwgfx: Use correct vmw_legacy_display_unit pointer
Colin Ian King <colin.king(a)canonical.com>
Input: sentelic - fix error return when fsp_reg_write fails
Krzysztof Sobota <krzysztof.sobota(a)nokia.com>
watchdog: initialize device before misc_register
Ewan D. Milne <emilne(a)redhat.com>
scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport
Stafford Horne <shorne(a)gmail.com>
openrisc: Fix oops caused when dumping stack
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: avoid race when unregistering slave
Thomas Hebb <tommyhebb(a)gmail.com>
tools build feature: Use CC and CXX from parent
Rayagonda Kokatanur <rayagonda.kokatanur(a)broadcom.com>
pwm: bcm-iproc: handle clk_get_rate() return
Xu Wang <vulab(a)iscas.ac.cn>
clk: clk-atlas6: fix return value check in atlas6_clk_init()
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: slave: only send STOP event when we have been addressed
Liu Yi L <yi.l.liu(a)intel.com>
iommu/vt-d: Enforce PASID devTLB field mask
Colin Ian King <colin.king(a)canonical.com>
iommu/omap: Check for failure of a call to omap_iommu_dump_ctx
Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
selftests/powerpc: ptrace-pkey: Don't update expected UAMOR value
Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
selftests/powerpc: ptrace-pkey: Update the test to mark an invalid pkey correctly
Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
selftests/powerpc: ptrace-pkey: Rename variables to make it easier to follow code
Ming Lei <ming.lei(a)redhat.com>
dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
Steve Longerbeam <slongerbeam(a)gmail.com>
gpu: ipu-v3: image-convert: Combine rotate/no-rotate irq handlers
Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
mmc: renesas_sdhi_internal_dmac: clean up the code for dma complete
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: fix break and sysrq handling
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: clean up receive processing
Johan Hovold <johan(a)kernel.org>
USB: serial: ftdi_sio: make process-packet buffer unsigned
Paul Kocialkowski <paul.kocialkowski(a)bootlin.com>
media: rockchip: rga: Only set output CSC mode for RGB input
Paul Kocialkowski <paul.kocialkowski(a)bootlin.com>
media: rockchip: rga: Introduce color fmt macros and refactor CSC mode logic
Jason Gunthorpe <jgg(a)nvidia.com>
RDMA/ipoib: Fix ABBA deadlock with ipoib_reap_ah()
Kamal Heib <kamalheib1(a)gmail.com>
RDMA/ipoib: Return void from ipoib_ib_dev_stop()
Charles Keepax <ckeepax(a)opensource.cirrus.com>
mfd: arizona: Ensure 32k clock is put on driver unbind and error
Liu Ying <victor.liu(a)nxp.com>
drm/imx: imx-ldb: Disable both channels for split mode in enc->disable()
Sibi Sankar <sibis(a)codeaurora.org>
remoteproc: qcom: q6v5: Update running state before requesting stop
Adrian Hunter <adrian.hunter(a)intel.com>
perf intel-pt: Fix FUP packet state
Kees Cook <keescook(a)chromium.org>
module: Correctly truncate sysfs sections output
Anton Blanchard <anton(a)ozlabs.org>
pseries: Fix 64 bit logical memory block panic
Ahmad Fatoum <a.fatoum(a)pengutronix.de>
watchdog: f71808e_wdt: clear watchdog timeout occurred flag
Ahmad Fatoum <a.fatoum(a)pengutronix.de>
watchdog: f71808e_wdt: remove use of wrong watchdog_info option
Ahmad Fatoum <a.fatoum(a)pengutronix.de>
watchdog: f71808e_wdt: indicate WDIOF_CARDRESET support in watchdog_info.options
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Use trace_sched_process_free() instead of exit() for pid tracing
Kevin Hao <haokexin(a)gmail.com>
tracing/hwlat: Honor the tracing_cpumask
Muchun Song <songmuchun(a)bytedance.com>
kprobes: Fix NULL pointer dereference at kprobe_ftrace_handler
Chengming Zhou <zhouchengming(a)bytedance.com>
ftrace: Setup correct FTRACE_FL_REGS flags for module
Michal Koutný <mkoutny(a)suse.com>
mm/page_counter.c: fix protection usage propagation
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: change slot number type s16 to u16
Mikulas Patocka <mpatocka(a)redhat.com>
ext2: fix missing percpu_counter_inc
Huacai Chen <chenhc(a)lemote.com>
MIPS: CPU#0 is not hotpluggable
Lukas Wunner <lukas(a)wunner.de>
driver core: Avoid binding drivers to dead devices
Johannes Berg <johannes.berg(a)intel.com>
mac80211: fix misplaced while instead of if
Coly Li <colyli(a)suse.de>
bcache: fix overflow in offset_to_stripe()
Coly Li <colyli(a)suse.de>
bcache: allocate meta data pages as compound pages
ChangSyun Peng <allenpeng(a)synology.com>
md/raid5: Fix Force reconstruct-write io stuck in degraded raid5
Kees Cook <keescook(a)chromium.org>
net/compat: Add missing sock updates for SCM_RIGHTS
Jonathan McDowell <noodles(a)earth.li>
net: stmmac: dwmac1000: provide multicast filter fallback
Jonathan McDowell <noodles(a)earth.li>
net: ethernet: stmmac: Disable hardware multicast filter
Eugeniu Rosca <erosca(a)de.adit-jv.com>
media: vsp1: dl: Fix NULL pointer dereference on unbind
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc: Fix circular dependency between percpu.h and mmu.h
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc: Allow 4224 bytes of stack expansion for the signal frame
Paul Aurich <paul(a)darkrain42.org>
cifs: Fix leak when handling lease break for cached root fid
Max Filippov <jcmvbkbc(a)gmail.com>
xtensa: fix xtensa_pmu_setup prototype
Alexandru Ardelean <alexandru.ardelean(a)analog.com>
iio: dac: ad5592r: fix unbalanced mutex unlocks in ad5592r_read_raw()
Christian Eggers <ceggers(a)arri.de>
dt-bindings: iio: io-channel-mux: Fix compatible string in example code
Pavel Machek <pavel(a)denx.de>
btrfs: fix return value mixup in btrfs_get_extent
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix memory leaks after failure to lookup checksums during inode logging
Josef Bacik <josef(a)toxicpanda.com>
btrfs: only search for left_info if there is no right_info in try_merge_free_space
David Sterba <dsterba(a)suse.com>
btrfs: fix messages after changing compression level by remount
Josef Bacik <josef(a)toxicpanda.com>
btrfs: open device without device_list_mutex
Anand Jain <anand.jain(a)oracle.com>
btrfs: don't traverse into the seed devices in show_devname
Tom Rix <trix(a)redhat.com>
btrfs: ref-verify: fix memory leak in add_block_entry
Qu Wenruo <wqu(a)suse.com>
btrfs: don't allocate anonymous block device for user invisible roots
Qu Wenruo <wqu(a)suse.com>
btrfs: free anon block device right after subvolume deletion
Bjorn Helgaas <bhelgaas(a)google.com>
PCI: Probe bridge window attributes once at enumeration-time
Ansuel Smith <ansuelsmth(a)gmail.com>
PCI: qcom: Add support for tx term offset for rev 2.1.0
Ansuel Smith <ansuelsmth(a)gmail.com>
PCI: qcom: Define some PARF params needed for ipq8064 SoC
Rajat Jain <rajatja(a)google.com>
PCI: Add device even if driver attach failed
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI: hotplug: ACPI: Fix context refcounting in acpiphp_grab_context()
Thomas Gleixner <tglx(a)linutronix.de>
genirq/affinity: Make affinity setting if activated opt-in
Steve French <stfrench(a)microsoft.com>
smb3: warn on confusing error scenario with sec=krb5
-------------
Diffstat:
.../bindings/iio/multiplexer/io-channel-mux.txt | 2 +-
Makefile | 4 +-
.../boot/dts/marvell/armada-3720-espressobin.dts | 6 ++
arch/mips/kernel/topology.c | 2 +-
arch/openrisc/kernel/stacktrace.c | 18 +++++-
arch/powerpc/include/asm/percpu.h | 4 +-
arch/powerpc/mm/fault.c | 7 ++-
arch/powerpc/platforms/pseries/hotplug-memory.c | 2 +-
arch/sh/boards/mach-landisk/setup.c | 3 +
arch/x86/kernel/apic/vector.c | 4 ++
arch/xtensa/kernel/perf_event.c | 2 +-
drivers/base/dd.c | 4 +-
drivers/clk/sirf/clk-atlas6.c | 2 +-
drivers/gpu/drm/amd/powerplay/smumgr/ci_smumgr.c | 5 +-
drivers/gpu/drm/drm_panel_orientation_quirks.c | 6 ++
drivers/gpu/drm/imx/imx-ldb.c | 7 ++-
drivers/gpu/drm/radeon/ni_dpm.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 8 +--
drivers/gpu/drm/vmwgfx/vmwgfx_ldu.c | 5 +-
drivers/gpu/ipu-v3/ipu-image-convert.c | 58 ++++++-----------
drivers/i2c/busses/i2c-rcar.c | 15 +++--
drivers/iio/dac/ad5592r-base.c | 4 +-
drivers/infiniband/ulp/ipoib/ipoib.h | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_ib.c | 67 +++++++++-----------
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +
drivers/input/mouse/sentelic.c | 2 +-
drivers/iommu/omap-iommu-debug.c | 3 +
drivers/irqchip/irq-gic-v3-its.c | 5 +-
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/bset.c | 2 +-
drivers/md/bcache/btree.c | 2 +-
drivers/md/bcache/journal.c | 4 +-
drivers/md/bcache/super.c | 2 +-
drivers/md/bcache/writeback.c | 14 +++--
drivers/md/bcache/writeback.h | 19 +++++-
drivers/md/dm-rq.c | 3 -
drivers/md/raid5.c | 3 +-
drivers/media/platform/rockchip/rga/rga-hw.c | 29 +++++----
drivers/media/platform/rockchip/rga/rga-hw.h | 5 ++
drivers/media/platform/vsp1/vsp1_dl.c | 2 +
drivers/mfd/arizona-core.c | 18 ++++++
drivers/mfd/dln2.c | 4 ++
drivers/mmc/host/renesas_sdhi_internal_dmac.c | 18 ++++--
drivers/net/ethernet/qualcomm/emac/emac.c | 17 ++++-
.../net/ethernet/stmicro/stmmac/dwmac-ipq806x.c | 1 +
.../net/ethernet/stmicro/stmmac/dwmac1000_core.c | 3 +
drivers/pci/bus.c | 6 +-
drivers/pci/controller/dwc/pcie-qcom.c | 41 +++++++++++-
drivers/pci/hotplug/acpiphp_glue.c | 14 ++++-
drivers/pci/probe.c | 52 +++++++++++++++
drivers/pci/quirks.c | 5 +-
drivers/pci/setup-bus.c | 45 ++-----------
drivers/pwm/pwm-bcm-iproc.c | 9 ++-
drivers/remoteproc/qcom_q6v5.c | 2 +
drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
drivers/usb/serial/ftdi_sio.c | 57 ++++++++++-------
drivers/watchdog/f71808e_wdt.c | 13 +++-
drivers/watchdog/watchdog_dev.c | 18 +++---
fs/btrfs/disk-io.c | 13 +++-
fs/btrfs/free-space-cache.c | 4 +-
fs/btrfs/inode.c | 4 +-
fs/btrfs/ref-verify.c | 2 +
fs/btrfs/super.c | 35 +++++------
fs/btrfs/tree-log.c | 8 +--
fs/btrfs/volumes.c | 21 ++++++-
fs/cifs/smb2misc.c | 73 +++++++++++++++-------
fs/cifs/smb2pdu.c | 2 +
fs/ext2/ialloc.c | 3 +-
fs/minix/inode.c | 12 ++--
fs/minix/itree_v1.c | 12 ++--
fs/minix/itree_v2.c | 13 ++--
fs/minix/minix.h | 1 -
fs/nfs/nfs4proc.c | 2 -
fs/nfs/nfs4xdr.c | 6 +-
fs/ocfs2/ocfs2.h | 4 +-
fs/ocfs2/suballoc.c | 4 +-
fs/ocfs2/super.c | 4 +-
fs/ufs/super.c | 2 +-
include/linux/intel-iommu.h | 4 +-
include/linux/irq.h | 13 ++++
include/linux/pci.h | 3 +
include/net/sock.h | 4 ++
kernel/irq/manage.c | 6 +-
kernel/kprobes.c | 7 +++
kernel/module.c | 22 ++++++-
kernel/trace/ftrace.c | 15 +++--
kernel/trace/trace_events.c | 4 +-
kernel/trace/trace_hwlat.c | 5 +-
lib/test_kmod.c | 2 +-
mm/khugepaged.c | 22 ++++---
mm/page_counter.c | 6 +-
net/compat.c | 1 +
net/core/sock.c | 21 +++++++
net/mac80211/sta_info.c | 2 +-
sound/pci/echoaudio/echoaudio.c | 2 -
tools/build/Makefile.feature | 2 +-
tools/build/feature/Makefile | 2 -
tools/perf/bench/mem-functions.c | 21 ++++---
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 21 +++----
.../testing/selftests/powerpc/ptrace/ptrace-pkey.c | 55 ++++++++--------
100 files changed, 715 insertions(+), 413 deletions(-)
musb_queue_resume_work() would call the provided callback if the runtime
PM status was 'active'. Otherwise, it would enqueue the request if the
hardware was still suspended (musb->is_runtime_suspended is true).
This causes a race with the runtime PM handlers, as it is possible to be
in the case where the runtime PM status is not yet 'active', but the
hardware has been awakened (PM resume function has been called).
When hitting the race, the resume work was not enqueued, which probably
triggered other bugs further down the stack. For instance, a telnet
connection on Ingenic SoCs would result in a 50/50 chance of a
segmentation fault somewhere in the musb code.
Rework the code so that either we call the callback directly if
(musb->is_runtime_suspended == 0), or enqueue the query otherwise.
Fixes: ea2f35c01d5e ("usb: musb: Fix sleeping function called from invalid context for hdrc glue")
Cc: stable(a)vger.kernel.org # v4.9
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
drivers/usb/musb/musb_core.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index 384a8039a7fd..462c10d7455a 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -2241,32 +2241,35 @@ int musb_queue_resume_work(struct musb *musb,
{
struct musb_pending_work *w;
unsigned long flags;
+ bool is_suspended;
int error;
if (WARN_ON(!callback))
return -EINVAL;
- if (pm_runtime_active(musb->controller))
- return callback(musb, data);
+ spin_lock_irqsave(&musb->list_lock, flags);
+ is_suspended = musb->is_runtime_suspended;
+
+ if (is_suspended) {
+ w = devm_kzalloc(musb->controller, sizeof(*w), GFP_ATOMIC);
+ if (!w) {
+ error = -ENOMEM;
+ goto out_unlock;
+ }
- w = devm_kzalloc(musb->controller, sizeof(*w), GFP_ATOMIC);
- if (!w)
- return -ENOMEM;
+ w->callback = callback;
+ w->data = data;
- w->callback = callback;
- w->data = data;
- spin_lock_irqsave(&musb->list_lock, flags);
- if (musb->is_runtime_suspended) {
list_add_tail(&w->node, &musb->pending_list);
error = 0;
- } else {
- dev_err(musb->controller, "could not add resume work %p\n",
- callback);
- devm_kfree(musb->controller, w);
- error = -EINPROGRESS;
}
+
+out_unlock:
spin_unlock_irqrestore(&musb->list_lock, flags);
+ if (!is_suspended)
+ error = callback(musb, data);
+
return error;
}
EXPORT_SYMBOL_GPL(musb_queue_resume_work);
--
2.28.0
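The race fixed by the musb_queue_resume_work() rework above can be
sketched as a small user-space model. Everything here is hypothetical
(struct musb_model, queue_old(), queue_new(), the counters); locking and
allocation are left out, and only the decision logic from the commit
message is kept.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's struct musb. */
struct musb_model {
	bool pm_active;			/* stands in for pm_runtime_active() */
	bool is_runtime_suspended;	/* stands in for musb->is_runtime_suspended */
	int pending;			/* work items queued for resume */
	int called;			/* callbacks invoked directly */
};

static inline int run_callback(struct musb_model *m)
{
	m->called++;
	return 0;
}

/* Before the patch: two independent state checks leave a gap. */
static inline int queue_old(struct musb_model *m)
{
	if (m->pm_active)
		return run_callback(m);
	if (m->is_runtime_suspended) {
		m->pending++;
		return 0;
	}
	return -EINPROGRESS;	/* hardware resumed, PM status not yet 'active' */
}

/* After the patch: a single flag decides between queueing and calling. */
static inline int queue_new(struct musb_model *m)
{
	if (m->is_runtime_suspended) {
		m->pending++;
		return 0;
	}
	return run_callback(m);
}
```

In the window where pm_active is false and is_runtime_suspended is also
false, queue_old() neither calls nor queues the work, while queue_new()
calls the callback directly.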
The basic permission bits (protection bits in AmigaOS) have been broken
in Linux' affs - it would only set bits, but never delete them.
Also, contrary to the documentation, the Archived bit was not handled.
Let's fix this for good, and set the bits such that Linux and classic
AmigaOS can coexist in the most peaceful manner.
Also, update the documentation to represent the current state of things.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Max Staudt <max(a)enpas.org>
---
Documentation/filesystems/affs.rst | 16 ++++++++++------
fs/affs/amigaffs.c | 27 +++++++++++++++++++++++++++
fs/affs/file.c | 27 ++++++++++++++++++++++++++-
3 files changed, 63 insertions(+), 7 deletions(-)
diff --git a/Documentation/filesystems/affs.rst b/Documentation/filesystems/affs.rst
index 7f1a40dce6d3..5776cbd5fa53 100644
--- a/Documentation/filesystems/affs.rst
+++ b/Documentation/filesystems/affs.rst
@@ -110,13 +110,15 @@ The Amiga protection flags RWEDRWEDHSPARWED are handled as follows:
- R maps to r for user, group and others. On directories, R implies x.
- - If both W and D are allowed, w will be set.
+ - W maps to w.
- E maps to x.
- - H and P are always retained and ignored under Linux.
+ - D is ignored.
- - A is always reset when a file is written to.
+ - H, S and P are always retained and ignored under Linux.
+
+ - A is cleared when a file is written to.
User id and group id will be used unless set[gu]id are given as mount
options. Since most of the Amiga file systems are single user systems
@@ -128,11 +130,13 @@ Linux -> Amiga:
The Linux rwxrwxrwx file mode is handled as follows:
- - r permission will set R for user, group and others.
+ - r permission will allow R for user, group and others.
+
+ - w permission will allow W for user, group and others.
- - w permission will set W and D for user, group and others.
+ - x permission of the user will allow E for plain files.
- - x permission of the user will set E for plain files.
+ - D will be allowed for user, group and others.
- All other flags (suid, sgid, ...) are ignored and will
not be retained.
diff --git a/fs/affs/amigaffs.c b/fs/affs/amigaffs.c
index f708c45d5f66..7952f885e6c6 100644
--- a/fs/affs/amigaffs.c
+++ b/fs/affs/amigaffs.c
@@ -420,24 +420,51 @@ affs_mode_to_prot(struct inode *inode)
u32 prot = AFFS_I(inode)->i_protect;
umode_t mode = inode->i_mode;
+ /*
+ * First, clear all RWED bits for owner, group, other.
+ * Then, recalculate them afresh.
+ *
+ * We'll always clear the delete-inhibit bit for the owner,
+ * as that is the classic single-user mode AmigaOS protection
+ * bit and we need to stay compatible with all scenarios.
+ *
+ * Since multi-user AmigaOS is an extension, we'll only set
+ * the delete-allow bit if any of the other bits in the same
+ * user class (group/other) are used.
+ */
+ prot &= ~(FIBF_NOEXECUTE | FIBF_NOREAD
+ | FIBF_NOWRITE | FIBF_NODELETE
+ | FIBF_GRP_EXECUTE | FIBF_GRP_READ
+ | FIBF_GRP_WRITE | FIBF_GRP_DELETE
+ | FIBF_OTR_EXECUTE | FIBF_OTR_READ
+ | FIBF_OTR_WRITE | FIBF_OTR_DELETE);
+
+ /* Classic single-user AmigaOS flags. These are inverted. */
if (!(mode & 0100))
prot |= FIBF_NOEXECUTE;
if (!(mode & 0400))
prot |= FIBF_NOREAD;
if (!(mode & 0200))
prot |= FIBF_NOWRITE;
+
+ /* Multi-user extended flags. Not inverted. */
if (mode & 0010)
prot |= FIBF_GRP_EXECUTE;
if (mode & 0040)
prot |= FIBF_GRP_READ;
if (mode & 0020)
prot |= FIBF_GRP_WRITE;
+ if (mode & 0070)
+ prot |= FIBF_GRP_DELETE;
+
if (mode & 0001)
prot |= FIBF_OTR_EXECUTE;
if (mode & 0004)
prot |= FIBF_OTR_READ;
if (mode & 0002)
prot |= FIBF_OTR_WRITE;
+ if (mode & 0007)
+ prot |= FIBF_OTR_DELETE;
AFFS_I(inode)->i_protect = prot;
}
diff --git a/fs/affs/file.c b/fs/affs/file.c
index a26a0f96c119..9a137e2f1782 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -429,6 +429,25 @@ static int affs_write_begin(struct file *file, struct address_space *mapping,
return ret;
}
+static int affs_write_end(struct file *file, struct address_space *mapping,
+ loff_t pos, unsigned int len, unsigned int copied,
+ struct page *page, void *fsdata)
+{
+ struct inode *inode = mapping->host;
+ int ret;
+
+ ret = generic_write_end(file, mapping, pos, len, copied,
+ page, fsdata);
+
+ /* Clear Archived bit on file writes, as AmigaOS would do */
+ if (AFFS_I(inode)->i_protect & FIBF_ARCHIVED) {
+ AFFS_I(inode)->i_protect &= ~FIBF_ARCHIVED;
+ mark_inode_dirty(inode);
+ }
+
+ return ret;
+}
+
static sector_t _affs_bmap(struct address_space *mapping, sector_t block)
{
return generic_block_bmap(mapping,block,affs_get_block);
@@ -438,7 +457,7 @@ const struct address_space_operations affs_aops = {
.readpage = affs_readpage,
.writepage = affs_writepage,
.write_begin = affs_write_begin,
- .write_end = generic_write_end,
+ .write_end = affs_write_end,
.direct_IO = affs_direct_IO,
.bmap = _affs_bmap
};
@@ -795,6 +814,12 @@ static int affs_write_end_ofs(struct file *file, struct address_space *mapping,
if (tmp > inode->i_size)
inode->i_size = AFFS_I(inode)->mmu_private = tmp;
+ /* Clear Archived bit on file writes, as AmigaOS would do */
+ if (AFFS_I(inode)->i_protect & FIBF_ARCHIVED) {
+ AFFS_I(inode)->i_protect &= ~FIBF_ARCHIVED;
+ mark_inode_dirty(inode);
+ }
+
err_first_bh:
unlock_page(page);
put_page(page);
--
2.20.1
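As an illustration of the "only set bits, never delete them" problem in
the affs patch above, here is a reduced user-space sketch covering just
the owner bits. The numeric bit values are assumed for illustration and
should be checked against fs/affs/amigaffs.h; owner_prot_old() and
owner_prot_new() are hypothetical names, not functions in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values assumed for illustration only. */
#define FIBF_NODELETE	0x0001u
#define FIBF_NOEXECUTE	0x0002u
#define FIBF_NOWRITE	0x0004u
#define FIBF_NOREAD	0x0008u
#define FIBF_ARCHIVED	0x0010u

#define OWNER_MASK \
	(FIBF_NOEXECUTE | FIBF_NOREAD | FIBF_NOWRITE | FIBF_NODELETE)

/* Pre-patch behaviour: inverted owner bits are only ever set. */
static inline uint32_t owner_prot_old(uint32_t prot, unsigned int mode)
{
	if (!(mode & 0100))
		prot |= FIBF_NOEXECUTE;
	if (!(mode & 0400))
		prot |= FIBF_NOREAD;
	if (!(mode & 0200))
		prot |= FIBF_NOWRITE;
	return prot;
}

/* Post-patch behaviour: clear the owner bits first, then recalculate. */
static inline uint32_t owner_prot_new(uint32_t prot, unsigned int mode)
{
	prot &= ~OWNER_MASK;
	return owner_prot_old(prot, mode);
}
```

Starting from a file that has FIBF_NOWRITE set, a chmod to 0644 leaves
FIBF_NOWRITE stuck with the old logic, whereas the new logic drops it and
keeps unrelated bits such as FIBF_ARCHIVED untouched.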
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a224ffb3f52b0027f6b7279854c71a31c48fc97 Mon Sep 17 00:00:00 2001
From: Chengming Zhou <zhouchengming(a)bytedance.com>
Date: Wed, 29 Jul 2020 02:05:53 +0800
Subject: [PATCH] ftrace: Setup correct FTRACE_FL_REGS flags for module
When a module is loaded and enabled, __ftrace_replace_code is used
for the module if any ftrace_ops is found that references it. But
ftrace_get_addr_new returns the wrong ftrace_addr for the module rec,
because rec->flags has not been set up correctly. This can cause the
callback of a ftrace_ops that has FTRACE_OPS_FL_SAVE_REGS set to be
called with pt_regs set to NULL.
So set up the correct FTRACE_FL_REGS flags for rec when
referenced_filters is called to find the ftrace_ops that reference it.
Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com
Cc: stable(a)vger.kernel.org
Fixes: 8c4f3c3fa9681 ("ftrace: Check module functions being traced on reload")
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c141d347f71a..d052f856f1cf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6198,8 +6198,11 @@ static int referenced_filters(struct dyn_ftrace *rec)
int cnt = 0;
for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
- if (ops_references_rec(ops, rec))
- cnt++;
+ if (ops_references_rec(ops, rec)) {
+ cnt++;
+ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
+ rec->flags |= FTRACE_FL_REGS;
+ }
}
return cnt;
@@ -6378,8 +6381,8 @@ void ftrace_module_enable(struct module *mod)
if (ftrace_start_up)
cnt += referenced_filters(rec);
- /* This clears FTRACE_FL_DISABLED */
- rec->flags = cnt;
+ rec->flags &= ~FTRACE_FL_DISABLED;
+ rec->flags += cnt;
if (ftrace_start_up && cnt) {
int failed = __ftrace_replace_code(rec, 1);
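The second hunk of the ftrace patch above matters because a plain
assignment discards any flag set moments earlier by referenced_filters().
A minimal user-space sketch, assuming a flags word that holds a reference
count in the low bits and flag bits above it; the bit positions and the
enable_old()/enable_new() helpers are hypothetical, not the kernel's
definitions.

```c
#include <assert.h>

/* Bit positions assumed for illustration only. */
#define FL_DISABLED	(1UL << 25)
#define FL_REGS		(1UL << 29)

/* Pre-patch: "rec->flags = cnt" clears DISABLED, but also everything else. */
static inline unsigned long enable_old(unsigned long flags, unsigned long cnt)
{
	(void)flags;
	return cnt;
}

/* Post-patch: clear only DISABLED, then add the reference count. */
static inline unsigned long enable_new(unsigned long flags, unsigned long cnt)
{
	flags &= ~FL_DISABLED;
	flags += cnt;
	return flags;
}
```

Given a record with FL_DISABLED and FL_REGS set and a count of 2, the old
form yields just 2, so the REGS request is lost; the new form yields
FL_REGS plus the count.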
From: Raymond Tan <raymond.tan(a)intel.com>
Similar to some other IA platforms, Elkhart Lake too depends on the
PMU register write to request transition of Dx power state.
Thus, we add the PCI_DEVICE_ID_INTEL_EHLLP to the list of devices that
shall execute the ACPI _DSM method during D0/D3 sequence.
[heikki.krogerus(a)linux.intel.com: included Fixes tag]
Fixes: dbb0569de852 ("usb: dwc3: pci: Add Support for Intel Elkhart Lake Devices")
Cc: stable(a)vger.kernel.org
Signed-off-by: Raymond Tan <raymond.tan(a)intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
---
drivers/usb/dwc3/dwc3-pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/dwc3-pci.c b/drivers/usb/dwc3/dwc3-pci.c
index f5a61f57c74f0..242b6210380a4 100644
--- a/drivers/usb/dwc3/dwc3-pci.c
+++ b/drivers/usb/dwc3/dwc3-pci.c
@@ -147,7 +147,8 @@ static int dwc3_pci_quirks(struct dwc3_pci *dwc)
if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
if (pdev->device == PCI_DEVICE_ID_INTEL_BXT ||
- pdev->device == PCI_DEVICE_ID_INTEL_BXT_M) {
+ pdev->device == PCI_DEVICE_ID_INTEL_BXT_M ||
+ pdev->device == PCI_DEVICE_ID_INTEL_EHLLP) {
guid_parse(PCI_INTEL_BXT_DSM_GUID, &dwc->guid);
dwc->has_dsm_for_pm = true;
}
--
2.28.0