I tracked some rather nasty crashes in f2fs (and other filesystem
modules) on my kernel down to a combination of out-of-bounds memory
accesses, one of which was coming from brcmfmac during module load:
[ 30.891382] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4356-sdio for chip BCM4356/2
[ 30.894437] ==================================================================
[ 30.901581] BUG: KASAN: global-out-of-bounds in brcmf_fw_alloc_request+0x42c/0x480 [brcmfmac]
[ 30.909935] Read of size 1 at addr ffff2000024865df by task kworker/6:2/387
[ 30.916805]
[ 30.918261] CPU: 6 PID: 387 Comm: kworker/6:2 Tainted: G O 4.20.0-rc3Lyude-Test+ #19
[ 30.927251] Hardware name: amlogic khadas-vim2/khadas-vim2, BIOS 2018.07-rc2-armbian 09/11/2018
[ 30.935964] Workqueue: events brcmf_driver_register [brcmfmac]
[ 30.941641] Call trace:
[ 30.944058] dump_backtrace+0x0/0x3e8
[ 30.947676] show_stack+0x14/0x20
[ 30.950968] dump_stack+0x130/0x1c4
[ 30.954406] print_address_description+0x60/0x25c
[ 30.959066] kasan_report+0x1b4/0x368
[ 30.962683] __asan_report_load1_noabort+0x18/0x20
[ 30.967547] brcmf_fw_alloc_request+0x42c/0x480 [brcmfmac]
[ 30.967639] brcmf_sdio_probe+0x163c/0x2050 [brcmfmac]
[ 30.978035] brcmf_ops_sdio_probe+0x598/0xa08 [brcmfmac]
[ 30.983254] sdio_bus_probe+0x190/0x398
[ 30.983270] really_probe+0x2a0/0xa70
[ 30.983296] driver_probe_device+0x1b4/0x2d8
[ 30.994901] __driver_attach+0x200/0x280
[ 30.994914] bus_for_each_dev+0x10c/0x1a8
[ 30.994925] driver_attach+0x38/0x50
[ 30.994935] bus_add_driver+0x330/0x608
[ 30.994953] driver_register+0x140/0x388
[ 31.013965] sdio_register_driver+0x74/0xa0
[ 31.014076] brcmf_sdio_register+0x14/0x60 [brcmfmac]
[ 31.023177] brcmf_driver_register+0xc/0x18 [brcmfmac]
[ 31.023209] process_one_work+0x654/0x1080
[ 31.032266] worker_thread+0x4f0/0x1308
[ 31.032286] kthread+0x2a8/0x320
[ 31.039254] ret_from_fork+0x10/0x1c
[ 31.039269]
[ 31.044226] The buggy address belongs to the variable:
[ 31.044351] brcmf_firmware_path+0x11f/0xfffffffffffd3b40 [brcmfmac]
[ 31.055601]
[ 31.057031] Memory state around the buggy address:
[ 31.061800] ffff200002486480: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
[ 31.068983] ffff200002486500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.068993] >ffff200002486580: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
[ 31.068999] ^
[ 31.069017] ffff200002486600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.096521] ffff200002486680: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
[ 31.096528] ==================================================================
[ 31.096533] Disabling lock debugging due to kernel taint
It appears that when determining the length of the string in the
alternate firmware path, we fail to handle the case where the firmware
path is empty. Since strlen(mp_path) can return 0, we end up reading
mp_path[-1] when firmware_path isn't provided through the module
arguments.
So, fix this by just setting the end char to '\0' by default, and only
changing it if we have a non-zero length. Additionally, use strnlen()
with BRCMF_FW_ALTPATH_LEN instead of strlen() just to be extra safe.
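For illustration, the guarded lookup described above can be sketched as a
standalone userspace function. This is only a sketch, not the driver
code: last_char_or_nul() and ALTPATH_LEN are hypothetical names, and 256
is merely a placeholder for whatever BRCMF_FW_ALTPATH_LEN is in the tree
being patched.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BRCMF_FW_ALTPATH_LEN (placeholder value). */
#define ALTPATH_LEN 256

/*
 * Return the last character of a bounded string, or '\0' when the
 * string is empty. Defaulting to '\0' avoids ever indexing path[-1].
 */
static char last_char_or_nul(const char *path)
{
	size_t len;
	char end = '\0';

	assert(path != NULL);

	/* strnlen() never scans past ALTPATH_LEN bytes. */
	len = strnlen(path, ALTPATH_LEN);
	if (len)
		end = path[len - 1];

	return end;
}
```

With an empty string this returns '\0' without touching any byte before
the buffer, which is the access KASAN flagged in the report above.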
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Fixes: 2baa3aaee27f ("brcmfmac: introduce brcmf_fw_alloc_request() function")
Cc: Hante Meuleman <hante.meuleman(a)broadcom.com>
Cc: Pieter-Paul Giesberts <pieter-paul.giesberts(a)broadcom.com>
Cc: Franky Lin <franky.lin(a)broadcom.com>
Cc: Arend van Spriel <arend.vanspriel(a)broadcom.com>
Cc: Kalle Valo <kvalo(a)codeaurora.org>
Cc: Himanshu Jha <himanshujha199640(a)gmail.com>
Cc: Dan Haab <dhaab(a)luxul.com>
Cc: Jia-Shyr Chuang <saint.chuang(a)cypress.com>
Cc: Ian Molton <ian(a)mnementh.co.uk>
Cc: <stable(a)vger.kernel.org> # v4.17+
---
.../net/wireless/broadcom/brcm80211/brcmfmac/firmware.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
index 9095b830ae4d..9927079a9ace 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/firmware.c
@@ -641,8 +641,9 @@ brcmf_fw_alloc_request(u32 chip, u32 chiprev,
struct brcmf_fw_request *fwreq;
char chipname[12];
const char *mp_path;
+ size_t mp_path_len;
u32 i, j;
- char end;
+ char end = '\0';
size_t reqsz;
for (i = 0; i < table_size; i++) {
@@ -667,7 +668,10 @@ brcmf_fw_alloc_request(u32 chip, u32 chiprev,
mapping_table[i].fw_base, chipname);
mp_path = brcmf_mp_global.firmware_path;
- end = mp_path[strlen(mp_path) - 1];
+ mp_path_len = strnlen(mp_path, BRCMF_FW_ALTPATH_LEN);
+ if (mp_path_len)
+ end = mp_path[mp_path_len - 1];
+
fwreq->n_items = n_fwnames;
for (j = 0; j < n_fwnames; j++) {
--
2.19.1
This is the start of the stable review cycle for the 4.14.71 release.
There are 126 patches in this series; all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Sep 19 21:16:12 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.71-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.71-rc1
Linus Torvalds <torvalds(a)linux-foundation.org>
mm: get rid of vmacache_flush_all() entirely
Ian Kent <raven(a)themaw.net>
autofs: fix autofs_sbi() does not check super block type
Jason Wang <jasowang(a)redhat.com>
tuntap: fix use after free during release
Jason Wang <jasowang(a)redhat.com>
tun: fix use after free for ptr_ring
Wei Yongjun <weiyongjun1(a)huawei.com>
mtd: ubi: wl: Fix error return code in ubi_wl_init()
Taehee Yoo <ap420073(a)gmail.com>
ip: frags: fix crash in ip_do_fragment()
Peter Oskolkov <posk(a)google.com>
ip: process in-order fragments efficiently
Peter Oskolkov <posk(a)google.com>
ip: add helpers to process in-order fragments faster.
Dan Carpenter <dan.carpenter(a)oracle.com>
ipv4: frags: precedence bug in ip_expire()
Eric Dumazet <edumazet(a)google.com>
net: sk_buff rbnode reorg
Eric Dumazet <edumazet(a)google.com>
net: add rb_to_skb() and other rb tree helpers
Eric Dumazet <edumazet(a)google.com>
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
Florian Westphal <fw(a)strlen.de>
ipv6: defrag: drop non-last frags smaller than min mtu
Peter Oskolkov <posk(a)google.com>
net: modify skb_rbtree_purge to return the truesize of all purged skbs.
Eric Dumazet <edumazet(a)google.com>
net: speed up skb_rbtree_purge()
Peter Oskolkov <posk(a)google.com>
ip: discard IPv4 datagrams with overlapping segments.
Eric Dumazet <edumazet(a)google.com>
inet: frags: fix ip6frag_low_thresh boundary
Eric Dumazet <edumazet(a)google.com>
inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
Eric Dumazet <edumazet(a)google.com>
inet: frags: reorganize struct netns_frags
Eric Dumazet <edumazet(a)google.com>
rhashtable: reorganize struct rhashtable layout
Eric Dumazet <edumazet(a)google.com>
ipv6: frags: rewrite ip6_expire_frag_queue()
Eric Dumazet <edumazet(a)google.com>
inet: frags: do not clone skb in ip_expire()
Eric Dumazet <edumazet(a)google.com>
inet: frags: break the 2GB limit for frags storage
Eric Dumazet <edumazet(a)google.com>
inet: frags: remove inet_frag_maybe_warn_overflow()
Eric Dumazet <edumazet(a)google.com>
inet: frags: get rif of inet_frag_evicting()
Eric Dumazet <edumazet(a)google.com>
inet: frags: remove some helpers
Eric Dumazet <edumazet(a)google.com>
inet: frags: use rhashtables for reassembly units
Eric Dumazet <edumazet(a)google.com>
rhashtable: add schedule points
Eric Dumazet <edumazet(a)google.com>
ipv6: export ip6 fragments sysctl to unprivileged users
Eric Dumazet <edumazet(a)google.com>
inet: frags: refactor lowpan_net_frag_init()
Eric Dumazet <edumazet(a)google.com>
inet: frags: refactor ipv6_frag_init()
Kees Cook <keescook(a)chromium.org>
inet: frags: Convert timers to use timer_setup()
Eric Dumazet <edumazet(a)google.com>
inet: frags: refactor ipfrag_init()
Eric Dumazet <edumazet(a)google.com>
inet: frags: add a pointer to struct netns_frags
Eric Dumazet <edumazet(a)google.com>
inet: frags: change inet_frags_init_net() return value
Jani Nikula <jani.nikula(a)intel.com>
drm/i915: set DP Main Stream Attribute for color range on DDI platforms
Parav Pandit <parav(a)mellanox.com>
RDMA/cma: Do not ignore net namespace for unbound cm_id
Paul Burton <paul.burton(a)mips.com>
MIPS: WARN_ON invalid DMA cache maintenance, not BUG_ON
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFSv4.1: Fix a potential layoutget/layoutrecall deadlock
Chao Yu <yuchao0(a)huawei.com>
f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize
Zumeng Chen <zumeng.chen(a)gmail.com>
mfd: ti_am335x_tscadc: Fix struct clk memory leak
Geert Uytterhoeven <geert+renesas(a)glider.be>
iommu/ipmmu-vmsa: Fix allocation in atomic context
Dan Carpenter <dan.carpenter(a)oracle.com>
f2fs: Fix uninitialized return in f2fs_ioc_shutdown()
Chao Yu <yuchao0(a)huawei.com>
f2fs: fix to wait on page writeback before updating page
Katsuhiro Suzuki <suzuki.katsuhiro(a)socionext.com>
media: helene: fix xtal frequency setting at power on
Mauricio Faria de Oliveira <mfo(a)canonical.com>
partitions/aix: fix usage of uninitialized lv_info and lvname structures
Mauricio Faria de Oliveira <mfo(a)canonical.com>
partitions/aix: append null character to print data from disk
Sylwester Nawrocki <s.nawrocki(a)samsung.com>
media: s5p-mfc: Fix buffer look up in s5p_mfc_handle_frame_{new, copy_time} functions
Nick Dyer <nick.dyer(a)itdev.co.uk>
Input: atmel_mxt_ts - only use first T9 instance
John Pittman <jpittman(a)redhat.com>
dm cache: only allow a single io_mode cache feature to be requested
Petr Machata <petrm(a)mellanox.com>
net: dcb: For wild-card lookups, use priority -1, not 0
Nicholas Mc Guire <hofrat(a)osadl.org>
MIPS: generic: fix missing of_node_put()
Nicholas Mc Guire <hofrat(a)osadl.org>
MIPS: Octeon: add missing of_node_put()
Chao Yu <yuchao0(a)huawei.com>
f2fs: fix to do sanity check with reserved blkaddr of inline inode
Peter Rosin <peda(a)axentia.se>
tpm/tpm_i2c_infineon: switch to i2c_lock_bus(..., I2C_LOCK_SEGMENT)
Linus Walleij <linus.walleij(a)linaro.org>
tpm_tis_spi: Pass the SPI IRQ down to the driver
Chao Yu <yuchao0(a)huawei.com>
f2fs: fix to skip GC if type in SSA and SIT is inconsistent
Jinbum Park <jinb.park7(a)gmail.com>
pktcdvd: Fix possible Spectre-v1 for pkt_devs
Chao Yu <yuchao0(a)huawei.com>
f2fs: try grabbing node page lock aggressively in sync scenario
Yelena Krivosheev <yelena(a)marvell.com>
net: mvneta: fix mtu change on port without link
Daniel Kurtz <djkurtz(a)chromium.org>
pinctrl/amd: only handle irq if it is pending and unmasked
Anton Vasilyev <vasilyev(a)ispras.ru>
gpio: ml-ioh: Fix buffer underwrite on probe error path
Dan Carpenter <dan.carpenter(a)oracle.com>
pinctrl: imx: off by one in imx_pinconf_group_dbg_show()
Joerg Roedel <jroedel(a)suse.de>
x86/mm: Remove in_nmi() warning from vmalloc_fault()
Marcel Holtmann <marcel(a)holtmann.org>
Bluetooth: hidp: Fix handling of strncpy for hid->name information
Surabhi Vishnoi <svishnoi(a)codeaurora.org>
ath10k: disable bundle mgmt tx completion event support
Huaisheng Ye <yehs1(a)lenovo.com>
tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access()
Anton Vasilyev <vasilyev(a)ispras.ru>
scsi: 3ware: fix return 0 on the error path of probe
Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
ata: libahci: Correct setting of DEVSLP register
Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
ata: libahci: Allow reconfigure of DEVSLP register
Paul Burton <paul.burton(a)mips.com>
MIPS: Fix ISA virt/bus conversion for non-zero PHYS_OFFSET
Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
rpmsg: core: add support to power domains for devices
Loic Poulain <loic.poulain(a)linaro.org>
wlcore: Set rx_status boottime_ns field on rx
Sven Eckelmann <sven.eckelmann(a)openmesh.com>
ath10k: prevent active scans on potential unusable channels
Felix Fietkau <nbd(a)nbd.name>
ath9k_hw: fix channel maximum power level test
Felix Fietkau <nbd(a)nbd.name>
ath9k: report tx status on EOSP
Finn Thain <fthain(a)telegraphics.com.au>
macintosh/via-pmu: Add missing mmio accessors
Kan Liang <kan.liang(a)linux.intel.com>
perf evlist: Fix error out while applying initial delay and LBR
Jiri Olsa <jolsa(a)kernel.org>
perf c2c report: Fix crash for empty browser
Olga Kornievskaia <kolga(a)netapp.com>
NFSv4.0 fix client reference leak in callback
Christophe Leroy <christophe.leroy(a)c-s.fr>
perf tools: Allow overriding MAX_NR_CPUS at compile time
Randy Dunlap <rdunlap(a)infradead.org>
f2fs: fix defined but not used build warnings
Yunlong Song <yunlong.song(a)huawei.com>
f2fs: do not set free of current section
Chao Yu <yuchao0(a)huawei.com>
f2fs: fix to active page in lru list for read path
Anton Vasilyev <vasilyev(a)ispras.ru>
tty: rocket: Fix possible buffer overwrite on register_PCI
Michael Kelley <mikelley(a)microsoft.com>
Drivers: hv: vmbus: Cleanup synic memory free path
Anton Vasilyev <vasilyev(a)ispras.ru>
firmware: vpd: Fix section enabled flag on vpd_section_destroy
Dan Carpenter <dan.carpenter(a)oracle.com>
uio: potential double frees if __uio_register_device() fails
Anton Vasilyev <vasilyev(a)ispras.ru>
misc: ti-st: Fix memory leak in the error path of probe()
Philipp Zabel <p.zabel(a)pengutronix.de>
gpu: ipu-v3: default to id 0 on missing OF alias
Todor Tomov <todor.tomov(a)linaro.org>
media: camss: csid: Configure data type and decode format properly
Gaurav Kohli <gkohli(a)codeaurora.org>
timers: Clear timer_base::must_forward_clk with timer_base::lock held
BingJing Chang <bingjingc(a)synology.com>
md/raid5: fix data corruption of replacements after originals dropped
Mike Christie <mchristi(a)redhat.com>
scsi: target: fix __transport_register_session locking
Ming Lei <ming.lei(a)redhat.com>
blk-mq: fix updating tags depth
Arun Parameswaran <arun.parameswaran(a)broadcom.com>
net: phy: Fix the register offsets in Broadcom iProc mdio mux driver
Anton Vasilyev <vasilyev(a)ispras.ru>
media: dw2102: Fix memleak on sequence of probes
Anton Vasilyev <vasilyev(a)ispras.ru>
media: davinci: vpif_display: Mix memory leak on probe error path
Roman Gushchin <guro(a)fb.com>
selftests/bpf: fix a typo in map in map test
Reza Arbab <arbab(a)linux.ibm.com>
powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage
Dmitry Osipenko <digetx(a)gmail.com>
gpio: tegra: Move driver registration to subsys_init level
Johan Hedberg <johan.hedberg(a)intel.com>
Bluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV
Jae Hyun Yoo <jae.hyun.yoo(a)linux.intel.com>
i2c: aspeed: Add an explicit type casting for *get_clk_reg_val
Florian Fainelli <f.fainelli(a)gmail.com>
ethtool: Remove trailing semicolon for static inline
Dan Carpenter <dan.carpenter(a)oracle.com>
misc: mic: SCIF Fix scif_get_new_port() error handling
Alexey Brodkin <abrodkin(a)synopsys.com>
ARC: [plat-axs*]: Enable SWAP
Tomas Winkler <tomas.winkler(a)intel.com>
tpm: separate cmd_ready/go_idle from runtime_pm
Arnd Bergmann <arnd(a)arndb.de>
crypto: aes-generic - fix aes-generic regression on powerpc
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
switchtec: Fix Spectre v1 vulnerability
Filippo Sironi <sironi(a)amazon.de>
x86/microcode: Update the new microcode revision unconditionally
Prarit Bhargava <prarit(a)redhat.com>
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
Thomas Gleixner <tglx(a)linutronix.de>
cpu/hotplug: Prevent state corruption on error rollback
Neeraj Upadhyay <neeraju(a)codeaurora.org>
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
Takashi Iwai <tiwai(a)suse.de>
ALSA: hda - Fix cancel_work_sync() stall from jackpoll work
Sean Christopherson <sean.j.christopherson(a)intel.com>
KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr
Pierre Morel <pmorel(a)linux.ibm.com>
KVM: s390: vsie: copy wrapping keys to right place
Filipe Manana <fdmanana(a)suse.com>
Btrfs: fix data corruption when deduplicating between different files
Steve French <stfrench(a)microsoft.com>
smb3: check for and properly advertise directory lease support
Steve French <stfrench(a)microsoft.com>
SMB3: Backup intent flag missing for directory opens with backupuid mounts
Paul Burton <paul.burton(a)mips.com>
MIPS: VDSO: Match data page cache colouring when D$ aliases
Minchan Kim <minchan(a)kernel.org>
android: binder: fix the race mmap and alloc_new_buf_locked
Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
block: bfq: swap puts in bfqg_and_blkg_put
Jens Axboe <axboe(a)kernel.dk>
nbd: don't allow invalid blocksize settings
James Smart <jsmart2021(a)gmail.com>
scsi: lpfc: Correct MDS diag and nvmet configuration
Felipe Balbi <felipe.balbi(a)linux.intel.com>
i2c: i801: fix DNV's SMBCTRL register offset
Shubhrajyoti Datta <shubhrajyoti.datta(a)xilinx.com>
i2c: xiic: Make the start and the byte count write atomic
-------------
Diffstat:
Documentation/networking/ip-sysctl.txt | 13 +-
Makefile | 4 +-
arch/arc/configs/axs101_defconfig | 1 -
arch/arc/configs/axs103_defconfig | 1 -
arch/arc/configs/axs103_smp_defconfig | 1 -
arch/mips/cavium-octeon/octeon-platform.c | 2 +
arch/mips/generic/init.c | 1 +
arch/mips/include/asm/io.h | 8 +-
arch/mips/kernel/vdso.c | 20 +
arch/mips/mm/c-r4k.c | 6 +-
arch/powerpc/platforms/powernv/npu-dma.c | 5 +-
arch/s390/kvm/vsie.c | 3 +-
arch/x86/kernel/cpu/microcode/amd.c | 24 +-
arch/x86/kernel/cpu/microcode/intel.c | 17 +-
arch/x86/kvm/vmx.c | 4 +-
arch/x86/mm/fault.c | 2 -
block/bfq-cgroup.c | 4 +-
block/blk-mq-tag.c | 8 +-
block/partitions/aix.c | 13 +-
crypto/Makefile | 2 +-
drivers/android/binder_alloc.c | 42 +-
drivers/ata/libahci.c | 20 +-
drivers/block/nbd.c | 3 +
drivers/block/pktcdvd.c | 4 +-
drivers/bluetooth/Kconfig | 1 +
drivers/char/tpm/tpm-interface.c | 50 +-
drivers/char/tpm/tpm.h | 12 +-
drivers/char/tpm/tpm2-space.c | 16 +-
drivers/char/tpm/tpm_crb.c | 101 +---
drivers/char/tpm/tpm_i2c_infineon.c | 8 +-
drivers/char/tpm/tpm_tis_spi.c | 9 +-
drivers/firmware/google/vpd.c | 5 +-
drivers/gpio/gpio-ml-ioh.c | 3 +-
drivers/gpio/gpio-tegra.c | 2 +-
drivers/gpu/drm/i915/i915_reg.h | 1 +
drivers/gpu/drm/i915/intel_ddi.c | 4 +
drivers/gpu/ipu-v3/ipu-common.c | 2 +
drivers/hv/hv.c | 14 +-
drivers/i2c/busses/i2c-aspeed.c | 2 +-
drivers/i2c/busses/i2c-i801.c | 7 +-
drivers/i2c/busses/i2c-xiic.c | 4 +
drivers/infiniband/core/cma.c | 13 +-
drivers/input/touchscreen/atmel_mxt_ts.c | 7 +-
drivers/iommu/ipmmu-vmsa.c | 9 +-
drivers/macintosh/via-pmu.c | 9 +-
drivers/md/dm-cache-target.c | 19 +-
drivers/md/raid5.c | 6 +
drivers/media/dvb-frontends/helene.c | 5 +-
drivers/media/platform/davinci/vpif_display.c | 24 +-
.../media/platform/qcom/camss-8x16/camss-csid.c | 16 +-
drivers/media/platform/s5p-mfc/s5p_mfc.c | 23 +-
drivers/media/usb/dvb-usb/dw2102.c | 19 +-
drivers/mfd/ti_am335x_tscadc.c | 3 +-
drivers/misc/mic/scif/scif_api.c | 20 +-
drivers/misc/ti-st/st_kim.c | 4 +-
drivers/mtd/ubi/wl.c | 8 +-
drivers/net/ethernet/marvell/mvneta.c | 1 -
drivers/net/phy/mdio-mux-bcm-iproc.c | 20 +-
drivers/net/tun.c | 21 +-
drivers/net/wireless/ath/ath10k/mac.c | 7 +
drivers/net/wireless/ath/ath10k/wmi-tlv.c | 5 +
drivers/net/wireless/ath/ath10k/wmi-tlv.h | 5 +
drivers/net/wireless/ath/ath9k/hw.c | 7 +-
drivers/net/wireless/ath/ath9k/xmit.c | 3 +-
drivers/net/wireless/ti/wlcore/rx.c | 8 +-
drivers/pci/switch/switchtec.c | 4 +
drivers/pinctrl/freescale/pinctrl-imx.c | 2 +-
drivers/pinctrl/pinctrl-amd.c | 3 +-
drivers/rpmsg/rpmsg_core.c | 7 +
drivers/scsi/3w-9xxx.c | 6 +-
drivers/scsi/3w-sas.c | 3 +
drivers/scsi/3w-xxxx.c | 2 +
drivers/scsi/lpfc/lpfc.h | 2 +-
drivers/target/target_core_transport.c | 5 +-
drivers/tty/rocket.c | 2 +-
drivers/uio/uio.c | 3 +-
fs/autofs4/autofs_i.h | 4 +-
fs/autofs4/inode.c | 1 -
fs/btrfs/ioctl.c | 19 +
fs/cifs/inode.c | 2 +
fs/cifs/smb2ops.c | 35 +-
fs/cifs/smb2pdu.c | 3 +
fs/f2fs/f2fs.h | 7 +-
fs/f2fs/file.c | 2 +-
fs/f2fs/gc.c | 8 +-
fs/f2fs/inline.c | 22 +
fs/f2fs/node.c | 4 +-
fs/f2fs/segment.h | 3 +
fs/f2fs/super.c | 21 +-
fs/f2fs/sysfs.c | 10 +-
fs/nfs/callback_proc.c | 4 +-
fs/nfs/callback_xdr.c | 11 +-
include/linux/mm_types.h | 2 +-
include/linux/mm_types_task.h | 2 +-
include/linux/rhashtable.h | 8 +-
include/linux/skbuff.h | 50 +-
include/linux/tpm.h | 2 +
include/linux/vm_event_item.h | 1 -
include/linux/vmacache.h | 5 -
include/net/inet_frag.h | 135 +++--
include/net/ip.h | 1 -
include/net/ipv6.h | 26 +-
include/uapi/linux/ethtool.h | 4 +-
include/uapi/linux/snmp.h | 1 +
kernel/cpu.c | 11 +-
kernel/time/timer.c | 29 +-
lib/rhashtable.c | 2 +
mm/debug.c | 4 +-
mm/vmacache.c | 38 --
net/bluetooth/hidp/core.c | 2 +-
net/core/skbuff.c | 31 +-
net/dcb/dcbnl.c | 11 +-
net/ieee802154/6lowpan/6lowpan_i.h | 26 +-
net/ieee802154/6lowpan/reassembly.c | 153 +++---
net/ipv4/inet_fragment.c | 378 +++-----------
net/ipv4/ip_fragment.c | 578 ++++++++++++---------
net/ipv4/proc.c | 7 +-
net/ipv4/tcp_fastopen.c | 8 +-
net/ipv4/tcp_input.c | 33 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 105 ++--
net/ipv6/proc.c | 5 +-
net/ipv6/reassembly.c | 217 ++++----
net/sched/sch_netem.c | 14 +-
sound/pci/hda/hda_codec.c | 3 +-
tools/perf/builtin-c2c.c | 3 +
tools/perf/perf.h | 2 +
tools/perf/util/evsel.c | 14 +
tools/testing/nvdimm/pmem-dax.c | 12 +-
tools/testing/selftests/bpf/test_verifier.c | 6 +-
129 files changed, 1473 insertions(+), 1362 deletions(-)
From: Miklos Szeredi <mszeredi(a)redhat.com>
When mounting, overlayfs needs a clean "work" directory under the
supplied workdir.
Previously the mount code removed this directory if it already existed and
created a new one. If the removal failed (e.g. directory was not empty)
then it fell back to a read-only mount not using the workdir.
While this has never been reported, it is possible to get a non-empty
"work" dir from a previous mount of overlayfs in case of a crash in the
middle of an operation using the work directory.
In this case the leftover state should be discarded; the overlay
filesystem remains consistent, as guaranteed by the atomicity of
operations moving to/from the workdir to the upper layer.
This patch implements cleaning out any files left in workdir. It is
implemented using real recursion for simplicity, but the depth is limited
to 2, because the worst case is that of a directory containing whiteouts
under "work".
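The depth-limited recursion can be sketched in userspace on a toy
in-memory tree. This is a rough model only: struct node, remove_plain()
and cleanup() are hypothetical names that loosely mirror the shape of
ovl_workdir_cleanup() in the diff below, with no locking and no real
filesystem calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical model of a directory entry; not a kernel structure. */
struct node {
	bool is_dir;
	bool present;
	size_t n_children;
	struct node *children[MAX_CHILDREN];
};

/* True when no child of the node is still present. */
static bool node_is_empty(const struct node *n)
{
	for (size_t i = 0; i < n->n_children; i++)
		if (n->children[i]->present)
			return false;
	return true;
}

/* Plain removal: succeeds for files and for empty directories only. */
static bool remove_plain(struct node *n)
{
	if (n->is_dir && !node_is_empty(n))
		return false;
	n->present = false;
	return true;
}

static void cleanup(struct node *n, int level)
{
	assert(n != NULL);

	/* Files, and anything past the depth limit, get one plain attempt. */
	if (!n->is_dir || level > 1) {
		remove_plain(n);
		return;
	}

	/* Try the cheap path first; descend only if the directory is not empty. */
	if (!remove_plain(n)) {
		for (size_t i = 0; i < n->n_children; i++)
			if (n->children[i]->present)
				cleanup(n->children[i], level + 1);
		remove_plain(n);
	}
}
```

Calling cleanup(&work, 0) on a "work" directory that holds one
subdirectory of whiteouts removes everything, while a non-empty
directory one level deeper is left in place because the recursion stops
descending there.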
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
---
fs/overlayfs/copy_up.c | 26 +-
fs/overlayfs/dir.c | 67 +--
fs/overlayfs/overlayfs.h | 3 +
fs/overlayfs/readdir.c | 77 ++-
fs/overlayfs/super.c | 20 +-
include/linux/scif.h | 1339 ----------------------------------------------
6 files changed, 92 insertions(+), 1440 deletions(-)
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 63a0d0ba36de..64c5386d0c1b 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -317,7 +317,6 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
struct dentry *upperdir;
struct dentry *upperdentry;
const struct cred *old_cred;
- struct cred *override_cred;
char *link = NULL;
if (WARN_ON(!workdir))
@@ -336,28 +335,7 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
return PTR_ERR(link);
}
- err = -ENOMEM;
- override_cred = prepare_creds();
- if (!override_cred)
- goto out_free_link;
-
- override_cred->fsuid = stat->uid;
- override_cred->fsgid = stat->gid;
- /*
- * CAP_SYS_ADMIN for copying up extended attributes
- * CAP_DAC_OVERRIDE for create
- * CAP_FOWNER for chmod, timestamp update
- * CAP_FSETID for chmod
- * CAP_CHOWN for chown
- * CAP_MKNOD for mknod
- */
- cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
- cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
- cap_raise(override_cred->cap_effective, CAP_FOWNER);
- cap_raise(override_cred->cap_effective, CAP_FSETID);
- cap_raise(override_cred->cap_effective, CAP_CHOWN);
- cap_raise(override_cred->cap_effective, CAP_MKNOD);
- old_cred = override_creds(override_cred);
+ old_cred = ovl_override_creds(dentry->d_sb);
err = -EIO;
if (lock_rename(workdir, upperdir) != NULL) {
@@ -380,9 +358,7 @@ int ovl_copy_up_one(struct dentry *parent, struct dentry *dentry,
out_unlock:
unlock_rename(workdir, upperdir);
revert_creds(old_cred);
- put_cred(override_cred);
-out_free_link:
if (link)
free_page((unsigned long) link);
diff --git a/fs/overlayfs/dir.c b/fs/overlayfs/dir.c
index 327177df03a5..f8aa54272121 100644
--- a/fs/overlayfs/dir.c
+++ b/fs/overlayfs/dir.c
@@ -408,28 +408,13 @@ static int ovl_create_or_link(struct dentry *dentry, int mode, dev_t rdev,
err = ovl_create_upper(dentry, inode, &stat, link, hardlink);
} else {
const struct cred *old_cred;
- struct cred *override_cred;
- err = -ENOMEM;
- override_cred = prepare_creds();
- if (!override_cred)
- goto out_iput;
-
- /*
- * CAP_SYS_ADMIN for setting opaque xattr
- * CAP_DAC_OVERRIDE for create in workdir, rename
- * CAP_FOWNER for removing whiteout from sticky dir
- */
- cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
- cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
- cap_raise(override_cred->cap_effective, CAP_FOWNER);
- old_cred = override_creds(override_cred);
+ old_cred = ovl_override_creds(dentry->d_sb);
err = ovl_create_over_whiteout(dentry, inode, &stat, link,
hardlink);
revert_creds(old_cred);
- put_cred(override_cred);
}
if (!err)
@@ -659,32 +644,11 @@ static int ovl_do_remove(struct dentry *dentry, bool is_dir)
if (OVL_TYPE_PURE_UPPER(type)) {
err = ovl_remove_upper(dentry, is_dir);
} else {
- const struct cred *old_cred;
- struct cred *override_cred;
-
- err = -ENOMEM;
- override_cred = prepare_creds();
- if (!override_cred)
- goto out_drop_write;
-
- /*
- * CAP_SYS_ADMIN for setting xattr on whiteout, opaque dir
- * CAP_DAC_OVERRIDE for create in workdir, rename
- * CAP_FOWNER for removing whiteout from sticky dir
- * CAP_FSETID for chmod of opaque dir
- * CAP_CHOWN for chown of opaque dir
- */
- cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
- cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
- cap_raise(override_cred->cap_effective, CAP_FOWNER);
- cap_raise(override_cred->cap_effective, CAP_FSETID);
- cap_raise(override_cred->cap_effective, CAP_CHOWN);
- old_cred = override_creds(override_cred);
+ const struct cred *old_cred = ovl_override_creds(dentry->d_sb);
err = ovl_remove_and_whiteout(dentry, is_dir);
revert_creds(old_cred);
- put_cred(override_cred);
}
out_drop_write:
ovl_drop_write(dentry);
@@ -723,7 +687,6 @@ static int ovl_rename2(struct inode *olddir, struct dentry *old,
bool new_is_dir = false;
struct dentry *opaquedir = NULL;
const struct cred *old_cred = NULL;
- struct cred *override_cred = NULL;
err = -EINVAL;
if (flags & ~(RENAME_EXCHANGE | RENAME_NOREPLACE))
@@ -792,26 +755,8 @@ static int ovl_rename2(struct inode *olddir, struct dentry *old,
old_opaque = !OVL_TYPE_PURE_UPPER(old_type);
new_opaque = !OVL_TYPE_PURE_UPPER(new_type);
- if (old_opaque || new_opaque) {
- err = -ENOMEM;
- override_cred = prepare_creds();
- if (!override_cred)
- goto out_drop_write;
-
- /*
- * CAP_SYS_ADMIN for setting xattr on whiteout, opaque dir
- * CAP_DAC_OVERRIDE for create in workdir
- * CAP_FOWNER for removing whiteout from sticky dir
- * CAP_FSETID for chmod of opaque dir
- * CAP_CHOWN for chown of opaque dir
- */
- cap_raise(override_cred->cap_effective, CAP_SYS_ADMIN);
- cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
- cap_raise(override_cred->cap_effective, CAP_FOWNER);
- cap_raise(override_cred->cap_effective, CAP_FSETID);
- cap_raise(override_cred->cap_effective, CAP_CHOWN);
- old_cred = override_creds(override_cred);
- }
+ if (old_opaque || new_opaque)
+ old_cred = ovl_override_creds(old->d_sb);
if (overwrite && OVL_TYPE_MERGE_OR_LOWER(new_type) && new_is_dir) {
opaquedir = ovl_check_empty_and_clear(new);
@@ -942,10 +887,8 @@ out_dput_old:
out_unlock:
unlock_rename(new_upperdir, old_upperdir);
out_revert_creds:
- if (old_opaque || new_opaque) {
+ if (old_opaque || new_opaque)
revert_creds(old_cred);
- put_cred(override_cred);
- }
out_drop_write:
ovl_drop_write(old);
out:
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index c319d5eaabcf..c77d64d2e8f1 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -150,6 +150,7 @@ void ovl_drop_write(struct dentry *dentry);
bool ovl_dentry_is_opaque(struct dentry *dentry);
void ovl_dentry_set_opaque(struct dentry *dentry, bool opaque);
bool ovl_is_whiteout(struct dentry *dentry);
+const struct cred *ovl_override_creds(struct super_block *sb);
void ovl_dentry_update(struct dentry *dentry, struct dentry *upperdentry);
struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
unsigned int flags);
@@ -163,6 +164,8 @@ extern const struct file_operations ovl_dir_operations;
int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list);
void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list);
void ovl_cache_free(struct list_head *list);
+void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
+ struct dentry *dentry, int level);
/* inode.c */
int ovl_setattr(struct dentry *dentry, struct iattr *attr);
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index adcb1398c481..49b5c7b17a05 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -36,6 +36,7 @@ struct ovl_dir_cache {
struct ovl_readdir_data {
struct dir_context ctx;
+ struct dentry *dentry;
bool is_merge;
struct rb_root root;
struct list_head *list;
@@ -205,17 +206,8 @@ static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
struct ovl_cache_entry *p;
struct dentry *dentry;
const struct cred *old_cred;
- struct cred *override_cred;
-
- override_cred = prepare_creds();
- if (!override_cred)
- return -ENOMEM;
- /*
- * CAP_DAC_OVERRIDE for lookup
- */
- cap_raise(override_cred->cap_effective, CAP_DAC_OVERRIDE);
- old_cred = override_creds(override_cred);
+ old_cred = ovl_override_creds(rdd->dentry->d_sb);
err = mutex_lock_killable(&dir->d_inode->i_mutex);
if (!err) {
@@ -231,7 +223,6 @@ static int ovl_check_whiteouts(struct dentry *dir, struct ovl_readdir_data *rdd)
mutex_unlock(&dir->d_inode->i_mutex);
}
revert_creds(old_cred);
- put_cred(override_cred);
return err;
}
@@ -256,7 +247,7 @@ static inline int ovl_dir_read(struct path *realpath,
err = rdd->err;
} while (!err && rdd->count);
- if (!err && rdd->first_maybe_whiteout)
+ if (!err && rdd->first_maybe_whiteout && rdd->dentry)
err = ovl_check_whiteouts(realpath->dentry, rdd);
fput(realfile);
@@ -287,6 +278,7 @@ static int ovl_dir_read_merged(struct dentry *dentry, struct list_head *list)
struct path realpath;
struct ovl_readdir_data rdd = {
.ctx.actor = ovl_fill_merge,
+ .dentry = dentry,
.list = list,
.root = RB_ROOT,
.is_merge = false,
@@ -577,3 +569,64 @@ void ovl_cleanup_whiteouts(struct dentry *upper, struct list_head *list)
}
mutex_unlock(&upper->d_inode->i_mutex);
}
+
+static void ovl_workdir_cleanup_recurse(struct path *path, int level)
+{
+ int err;
+ struct inode *dir = path->dentry->d_inode;
+ LIST_HEAD(list);
+ struct ovl_cache_entry *p;
+ struct ovl_readdir_data rdd = {
+ .ctx.actor = ovl_fill_merge,
+ .dentry = NULL,
+ .list = &list,
+ .root = RB_ROOT,
+ .is_merge = false,
+ };
+
+ err = ovl_dir_read(path, &rdd);
+ if (err)
+ goto out;
+
+ inode_lock_nested(dir, I_MUTEX_PARENT);
+ list_for_each_entry(p, &list, l_node) {
+ struct dentry *dentry;
+
+ if (p->name[0] == '.') {
+ if (p->len == 1)
+ continue;
+ if (p->len == 2 && p->name[1] == '.')
+ continue;
+ }
+ dentry = lookup_one_len(p->name, path->dentry, p->len);
+ if (IS_ERR(dentry))
+ continue;
+ if (dentry->d_inode)
+ ovl_workdir_cleanup(dir, path->mnt, dentry, level);
+ dput(dentry);
+ }
+ inode_unlock(dir);
+out:
+ ovl_cache_free(&list);
+}
+
+void ovl_workdir_cleanup(struct inode *dir, struct vfsmount *mnt,
+ struct dentry *dentry, int level)
+{
+ int err;
+
+ if (!d_is_dir(dentry) || level > 1) {
+ ovl_cleanup(dir, dentry);
+ return;
+ }
+
+ err = ovl_do_rmdir(dir, dentry);
+ if (err) {
+ struct path path = { .mnt = mnt, .dentry = dentry };
+
+ inode_unlock(dir);
+ ovl_workdir_cleanup_recurse(&path, level + 1);
+ inode_lock_nested(dir, I_MUTEX_PARENT);
+ ovl_cleanup(dir, dentry);
+ }
+}
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index d70208c0de84..0d008af70873 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -42,6 +42,8 @@ struct ovl_fs {
long lower_namelen;
/* pathnames of lower and upper dirs, for show_options */
struct ovl_config config;
+ /* creds of process who forced instantiation of super block */
+ const struct cred *creator_cred;
};
struct ovl_dir_cache;
@@ -246,6 +248,13 @@ bool ovl_is_whiteout(struct dentry *dentry)
return inode && IS_WHITEOUT(inode);
}
+const struct cred *ovl_override_creds(struct super_block *sb)
+{
+ struct ovl_fs *ofs = sb->s_fs_info;
+
+ return override_creds(ofs->creator_cred);
+}
+
static bool ovl_is_opaquedir(struct dentry *dentry)
{
int res;
@@ -587,6 +596,7 @@ static void ovl_put_super(struct super_block *sb)
kfree(ufs->config.lowerdir);
kfree(ufs->config.upperdir);
kfree(ufs->config.workdir);
+ put_cred(ufs->creator_cred);
kfree(ufs);
}
@@ -774,7 +784,7 @@ retry:
goto out_dput;
retried = true;
- ovl_cleanup(dir, work);
+ ovl_workdir_cleanup(dir, mnt, work, 0);
dput(work);
goto retry;
}
@@ -1087,10 +1097,14 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
else
sb->s_d_op = &ovl_dentry_operations;
+ err = -ENOMEM;
+ ufs->creator_cred = prepare_creds();
+ if (!ufs->creator_cred)
+ goto out_put_lower_mnt;
+
- err = -ENOMEM;
oe = ovl_alloc_entry(numlower);
if (!oe)
- goto out_put_lower_mnt;
+ goto out_put_cred;
root_dentry = d_make_root(ovl_new_inode(sb, S_IFDIR, oe));
if (!root_dentry)
@@ -1123,6 +1137,8 @@ static int ovl_fill_super(struct super_block *sb, void *data, int silent)
out_free_oe:
kfree(oe);
+out_put_cred:
+ put_cred(ufs->creator_cred);
out_put_lower_mnt:
for (i = 0; i < ufs->numlower; i++)
mntput(ufs->lower_mnt[i]);
diff --git a/include/linux/scif.h b/include/linux/scif.h
index 49a35d6edc94..e69de29bb2d1 100644
--- a/include/linux/scif.h
+++ b/include/linux/scif.h
@@ -1,1339 +0,0 @@
-/*
- * Intel MIC Platform Software Stack (MPSS)
- *
- * This file is provided under a dual BSD/GPLv2 license. When using or
- * redistributing this file, you may do so under either license.
- *
- * GPL LICENSE SUMMARY
- *
- * Copyright(c) 2014 Intel Corporation.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of version 2 of the GNU General Public License as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License for more details.
- *
- * BSD LICENSE
- *
- * Copyright(c) 2014 Intel Corporation.
- *
- * Redistribution and use in source and binary forms, with or without
- * modification, are permitted provided that the following conditions
- * are met:
- *
- * * Redistributions of source code must retain the above copyright
- * notice, this list of conditions and the following disclaimer.
- * * Redistributions in binary form must reproduce the above copyright
- * notice, this list of conditions and the following disclaimer in
- * the documentation and/or other materials provided with the
- * distribution.
- * * Neither the name of Intel Corporation nor the names of its
- * contributors may be used to endorse or promote products derived
- * from this software without specific prior written permission.
- *
- * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
- * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
- * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
- * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
- * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
- * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
- * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
- * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
- * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
- * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
- * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- *
- * Intel SCIF driver.
- *
- */
-#ifndef __SCIF_H__
-#define __SCIF_H__
-
-#include <linux/types.h>
-#include <linux/poll.h>
-#include <linux/device.h>
-#include <linux/scif_ioctl.h>
-
-#define SCIF_ACCEPT_SYNC 1
-#define SCIF_SEND_BLOCK 1
-#define SCIF_RECV_BLOCK 1
-
-enum {
- SCIF_PROT_READ = (1 << 0),
- SCIF_PROT_WRITE = (1 << 1)
-};
-
-enum {
- SCIF_MAP_FIXED = 0x10,
- SCIF_MAP_KERNEL = 0x20,
-};
-
-enum {
- SCIF_FENCE_INIT_SELF = (1 << 0),
- SCIF_FENCE_INIT_PEER = (1 << 1),
- SCIF_SIGNAL_LOCAL = (1 << 4),
- SCIF_SIGNAL_REMOTE = (1 << 5)
-};
-
-enum {
- SCIF_RMA_USECPU = (1 << 0),
- SCIF_RMA_USECACHE = (1 << 1),
- SCIF_RMA_SYNC = (1 << 2),
- SCIF_RMA_ORDERED = (1 << 3)
-};
-
-/* End of SCIF Admin Reserved Ports */
-#define SCIF_ADMIN_PORT_END 1024
-
-/* End of SCIF Reserved Ports */
-#define SCIF_PORT_RSVD 1088
-
-typedef struct scif_endpt *scif_epd_t;
-typedef struct scif_pinned_pages *scif_pinned_pages_t;
-
-/**
- * struct scif_range - SCIF registered range used in kernel mode
- * @cookie: cookie used internally by SCIF
- * @nr_pages: number of pages of PAGE_SIZE
- * @prot_flags: R/W protection
- * @phys_addr: Array of bus addresses
- * @va: Array of kernel virtual addresses backed by the pages in the phys_addr
- * array. The va is populated only when called on the host for a remote
- * SCIF connection on MIC. This is required to support the use case of DMA
- * between MIC and another device which is not a SCIF node e.g., an IB or
- * ethernet NIC.
- */
-struct scif_range {
- void *cookie;
- int nr_pages;
- int prot_flags;
- dma_addr_t *phys_addr;
- void __iomem **va;
-};
-
-/**
- * struct scif_pollepd - SCIF endpoint to be monitored via scif_poll
- * @epd: SCIF endpoint
- * @events: requested events
- * @revents: returned events
- */
-struct scif_pollepd {
- scif_epd_t epd;
- short events;
- short revents;
-};
-
-/**
- * scif_peer_dev - representation of a peer SCIF device
- *
- * Peer devices show up as PCIe devices for the mgmt node but not the cards.
- * The mgmt node discovers all the cards on the PCIe bus and informs the other
- * cards about their peers. Upon notification of a peer a node adds a peer
- * device to the peer bus to maintain symmetry in the way devices are
- * discovered across all nodes in the SCIF network.
- *
- * @dev: underlying device
- * @dnode - The destination node which this device will communicate with.
- */
-struct scif_peer_dev {
- struct device dev;
- u8 dnode;
-};
-
-/**
- * scif_client - representation of a SCIF client
- * @name: client name
- * @probe - client method called when a peer device is registered
- * @remove - client method called when a peer device is unregistered
- * @si - subsys_interface used internally for implementing SCIF clients
- */
-struct scif_client {
- const char *name;
- void (*probe)(struct scif_peer_dev *spdev);
- void (*remove)(struct scif_peer_dev *spdev);
- struct subsys_interface si;
-};
-
-#define SCIF_OPEN_FAILED ((scif_epd_t)-1)
-#define SCIF_REGISTER_FAILED ((off_t)-1)
-#define SCIF_MMAP_FAILED ((void *)-1)
-
-/**
- * scif_open() - Create an endpoint
- *
- * Return:
- * Upon successful completion, scif_open() returns an endpoint descriptor to
- * be used in subsequent SCIF functions calls to refer to that endpoint;
- * otherwise in user mode SCIF_OPEN_FAILED (that is ((scif_epd_t)-1)) is
- * returned and errno is set to indicate the error; in kernel mode a NULL
- * scif_epd_t is returned.
- *
- * Errors:
- * ENOMEM - Insufficient kernel memory was available
- */
-scif_epd_t scif_open(void);
-
-/**
- * scif_bind() - Bind an endpoint to a port
- * @epd: endpoint descriptor
- * @pn: port number
- *
- * scif_bind() binds endpoint epd to port pn, where pn is a port number on the
- * local node. If pn is zero, a port number greater than or equal to
- * SCIF_PORT_RSVD is assigned and returned. Each endpoint may be bound to
- * exactly one local port. Ports less than 1024 when requested can only be bound
- * by system (or root) processes or by processes executed by privileged users.
- *
- * Return:
- * Upon successful completion, scif_bind() returns the port number to which epd
- * is bound; otherwise in user mode -1 is returned and errno is set to
- * indicate the error; in kernel mode the negative of one of the following
- * errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * EINVAL - the endpoint or the port is already bound
- * EISCONN - The endpoint is already connected
- * ENOSPC - No port number available for assignment
- * EACCES - The port requested is protected and the user is not the superuser
- */
-int scif_bind(scif_epd_t epd, u16 pn);
-
-/**
- * scif_listen() - Listen for connections on an endpoint
- * @epd: endpoint descriptor
- * @backlog: maximum pending connection requests
- *
- * scif_listen() marks the endpoint epd as a listening endpoint - that is, as
- * an endpoint that will be used to accept incoming connection requests. Once
- * so marked, the endpoint is said to be in the listening state and may not be
- * used as the endpoint of a connection.
- *
- * The endpoint, epd, must have been bound to a port.
- *
- * The backlog argument defines the maximum length to which the queue of
- * pending connections for epd may grow. If a connection request arrives when
- * the queue is full, the client may receive an error with an indication that
- * the connection was refused.
- *
- * Return:
- * Upon successful completion, scif_listen() returns 0; otherwise in user mode
- * -1 is returned and errno is set to indicate the error; in kernel mode the
- * negative of one of the following errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * EINVAL - the endpoint is not bound to a port
- * EISCONN - The endpoint is already connected or listening
- */
-int scif_listen(scif_epd_t epd, int backlog);
-
-/**
- * scif_connect() - Initiate a connection on a port
- * @epd: endpoint descriptor
- * @dst: global id of port to which to connect
- *
- * The scif_connect() function requests the connection of endpoint epd to remote
- * port dst. If the connection is successful, a peer endpoint, bound to dst, is
- * created on node dst.node. On successful return, the connection is complete.
- *
- * If the endpoint epd has not already been bound to a port, scif_connect()
- * will bind it to an unused local port.
- *
- * A connection is terminated when an endpoint of the connection is closed,
- * either explicitly by scif_close(), or when a process that owns one of the
- * endpoints of the connection is terminated.
- *
- * In user space, scif_connect() supports an asynchronous connection mode
- * if the application has set the O_NONBLOCK flag on the endpoint via the
- * fcntl() system call. Setting this flag will result in the calling process
- * not to wait during scif_connect().
- *
- * Return:
- * Upon successful completion, scif_connect() returns the port ID to which the
- * endpoint, epd, is bound; otherwise in user mode -1 is returned and errno is
- * set to indicate the error; in kernel mode the negative of one of the
- * following errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNREFUSED - The destination was not listening for connections or refused
- * the connection request
- * EINVAL - dst.port is not a valid port ID
- * EISCONN - The endpoint is already connected
- * ENOMEM - No buffer space is available
- * ENODEV - The destination node does not exist, or the node is lost or existed,
- * but is not currently in the network since it may have crashed
- * ENOSPC - No port number available for assignment
- * EOPNOTSUPP - The endpoint is listening and cannot be connected
- */
-int scif_connect(scif_epd_t epd, struct scif_port_id *dst);
-
-/**
- * scif_accept() - Accept a connection on an endpoint
- * @epd: endpoint descriptor
- * @peer: global id of port to which connected
- * @newepd: new connected endpoint descriptor
- * @flags: flags
- *
- * The scif_accept() call extracts the first connection request from the queue
- * of pending connections for the port on which epd is listening. scif_accept()
- * creates a new endpoint, bound to the same port as epd, and allocates a new
- * SCIF endpoint descriptor, returned in newepd, for the endpoint. The new
- * endpoint is connected to the endpoint through which the connection was
- * requested. epd is unaffected by this call, and remains in the listening
- * state.
- *
- * On successful return, peer holds the global port identifier (node id and
- * local port number) of the port which requested the connection.
- *
- * A connection is terminated when an endpoint of the connection is closed,
- * either explicitly by scif_close(), or when a process that owns one of the
- * endpoints of the connection is terminated.
- *
- * The number of connections that can (subsequently) be accepted on epd is only
- * limited by system resources (memory).
- *
- * The flags argument is formed by OR'ing together zero or more of the
- * following values.
- * SCIF_ACCEPT_SYNC - block until a connection request is presented. If
- * SCIF_ACCEPT_SYNC is not in flags, and no pending
- * connections are present on the queue, scif_accept()
- * fails with an EAGAIN error
- *
- * In user mode, the select() and poll() functions can be used to determine
- * when there is a connection request. In kernel mode, the scif_poll()
- * function may be used for this purpose. A readable event will be delivered
- * when a connection is requested.
- *
- * Return:
- * Upon successful completion, scif_accept() returns 0; otherwise in user mode
- * -1 is returned and errno is set to indicate the error; in kernel mode the
- * negative of one of the following errors is returned.
- *
- * Errors:
- * EAGAIN - SCIF_ACCEPT_SYNC is not set and no connections are present to be
- * accepted or SCIF_ACCEPT_SYNC is not set and remote node failed to complete
- * its connection request
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * EINTR - Interrupted function
- * EINVAL - epd is not a listening endpoint, or flags is invalid, or peer is
- * NULL, or newepd is NULL
- * ENODEV - The requesting node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOMEM - Not enough space
- * ENOENT - Secondary part of epd registration failed
- */
-int scif_accept(scif_epd_t epd, struct scif_port_id *peer, scif_epd_t
- *newepd, int flags);
-
-/**
- * scif_close() - Close an endpoint
- * @epd: endpoint descriptor
- *
- * scif_close() closes an endpoint and performs necessary teardown of
- * facilities associated with that endpoint.
- *
- * If epd is a listening endpoint then it will no longer accept connection
- * requests on the port to which it is bound. Any pending connection requests
- * are rejected.
- *
- * If epd is a connected endpoint, then its peer endpoint is also closed. RMAs
- * which are in-process through epd or its peer endpoint will complete before
- * scif_close() returns. Registered windows of the local and peer endpoints are
- * released as if scif_unregister() was called against each window.
- *
- * Closing a SCIF endpoint does not affect local registered memory mapped by
- * a SCIF endpoint on a remote node. The local memory remains mapped by the peer
- * SCIF endpoint explicitly removed by calling munmap(..) by the peer.
- *
- * If the peer endpoint's receive queue is not empty at the time that epd is
- * closed, then the peer endpoint can be passed as the endpoint parameter to
- * scif_recv() until the receive queue is empty.
- *
- * epd is freed and may no longer be accessed.
- *
- * Return:
- * Upon successful completion, scif_close() returns 0; otherwise in user mode
- * -1 is returned and errno is set to indicate the error; in kernel mode the
- * negative of one of the following errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- */
-int scif_close(scif_epd_t epd);
-
-/**
- * scif_send() - Send a message
- * @epd: endpoint descriptor
- * @msg: message buffer address
- * @len: message length
- * @flags: blocking mode flags
- *
- * scif_send() sends data to the peer of endpoint epd. Up to len bytes of data
- * are copied from memory starting at address msg. On successful execution the
- * return value of scif_send() is the number of bytes that were sent, and is
- * zero if no bytes were sent because len was zero. scif_send() may be called
- * only when the endpoint is in a connected state.
- *
- * If a scif_send() call is non-blocking, then it sends only those bytes which
- * can be sent without waiting, up to a maximum of len bytes.
- *
- * If a scif_send() call is blocking, then it normally returns after sending
- * all len bytes. If a blocking call is interrupted or the connection is
- * reset, the call is considered successful if some bytes were sent or len is
- * zero, otherwise the call is considered unsuccessful.
- *
- * In user mode, the select() and poll() functions can be used to determine
- * when the send queue is not full. In kernel mode, the scif_poll() function
- * may be used for this purpose.
- *
- * It is recommended that scif_send()/scif_recv() only be used for short
- * control-type message communication between SCIF endpoints. The SCIF RMA
- * APIs are expected to provide better performance for transfer sizes of
- * 1024 bytes or longer for the current MIC hardware and software
- * implementation.
- *
- * scif_send() will block until the entire message is sent if SCIF_SEND_BLOCK
- * is passed as the flags argument.
- *
- * Return:
- * Upon successful completion, scif_send() returns the number of bytes sent;
- * otherwise in user mode -1 is returned and errno is set to indicate the
- * error; in kernel mode the negative of one of the following errors is
- * returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - flags is invalid, or len is negative
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOMEM - Not enough space
- * ENOTCONN - The endpoint is not connected
- */
-int scif_send(scif_epd_t epd, void *msg, int len, int flags);
-
-/**
- * scif_recv() - Receive a message
- * @epd: endpoint descriptor
- * @msg: message buffer address
- * @len: message buffer length
- * @flags: blocking mode flags
- *
- * scif_recv() receives data from the peer of endpoint epd. Up to len bytes of
- * data are copied to memory starting at address msg. On successful execution
- * the return value of scif_recv() is the number of bytes that were received,
- * and is zero if no bytes were received because len was zero. scif_recv() may
- * be called only when the endpoint is in a connected state.
- *
- * If a scif_recv() call is non-blocking, then it receives only those bytes
- * which can be received without waiting, up to a maximum of len bytes.
- *
- * If a scif_recv() call is blocking, then it normally returns after receiving
- * all len bytes. If the blocking call was interrupted due to a disconnection,
- * subsequent calls to scif_recv() will copy all bytes received upto the point
- * of disconnection.
- *
- * In user mode, the select() and poll() functions can be used to determine
- * when data is available to be received. In kernel mode, the scif_poll()
- * function may be used for this purpose.
- *
- * It is recommended that scif_send()/scif_recv() only be used for short
- * control-type message communication between SCIF endpoints. The SCIF RMA
- * APIs are expected to provide better performance for transfer sizes of
- * 1024 bytes or longer for the current MIC hardware and software
- * implementation.
- *
- * scif_recv() will block until the entire message is received if
- * SCIF_RECV_BLOCK is passed as the flags argument.
- *
- * Return:
- * Upon successful completion, scif_recv() returns the number of bytes
- * received; otherwise in user mode -1 is returned and errno is set to
- * indicate the error; in kernel mode the negative of one of the following
- * errors is returned.
- *
- * Errors:
- * EAGAIN - The destination node is returning from a low power state
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - flags is invalid, or len is negative
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOMEM - Not enough space
- * ENOTCONN - The endpoint is not connected
- */
-int scif_recv(scif_epd_t epd, void *msg, int len, int flags);
-
-/**
- * scif_register() - Mark a memory region for remote access.
- * @epd: endpoint descriptor
- * @addr: starting virtual address
- * @len: length of range
- * @offset: offset of window
- * @prot_flags: read/write protection flags
- * @map_flags: mapping flags
- *
- * The scif_register() function opens a window, a range of whole pages of the
- * registered address space of the endpoint epd, starting at offset po and
- * continuing for len bytes. The value of po, further described below, is a
- * function of the parameters offset and len, and the value of map_flags. Each
- * page of the window represents the physical memory page which backs the
- * corresponding page of the range of virtual address pages starting at addr
- * and continuing for len bytes. addr and len are constrained to be multiples
- * of the page size. A successful scif_register() call returns po.
- *
- * When SCIF_MAP_FIXED is set in the map_flags argument, po will be offset
- * exactly, and offset is constrained to be a multiple of the page size. The
- * mapping established by scif_register() will not replace any existing
- * registration; an error is returned if any page within the range [offset,
- * offset + len - 1] intersects an existing window.
- *
- * When SCIF_MAP_FIXED is not set, the implementation uses offset in an
- * implementation-defined manner to arrive at po. The po value so chosen will
- * be an area of the registered address space that the implementation deems
- * suitable for a mapping of len bytes. An offset value of 0 is interpreted as
- * granting the implementation complete freedom in selecting po, subject to
- * constraints described below. A non-zero value of offset is taken to be a
- * suggestion of an offset near which the mapping should be placed. When the
- * implementation selects a value for po, it does not replace any extant
- * window. In all cases, po will be a multiple of the page size.
- *
- * The physical pages which are so represented by a window are available for
- * access in calls to mmap(), scif_readfrom(), scif_writeto(),
- * scif_vreadfrom(), and scif_vwriteto(). While a window is registered, the
- * physical pages represented by the window will not be reused by the memory
- * subsystem for any other purpose. Note that the same physical page may be
- * represented by multiple windows.
- *
- * Subsequent operations which change the memory pages to which virtual
- * addresses are mapped (such as mmap(), munmap()) have no effect on
- * existing window.
- *
- * If the process will fork(), it is recommended that the registered
- * virtual address range be marked with MADV_DONTFORK. Doing so will prevent
- * problems due to copy-on-write semantics.
- *
- * The prot_flags argument is formed by OR'ing together one or more of the
- * following values.
- * SCIF_PROT_READ - allow read operations from the window
- * SCIF_PROT_WRITE - allow write operations to the window
- *
- * Return:
- * Upon successful completion, scif_register() returns the offset at which the
- * mapping was placed (po); otherwise in user mode SCIF_REGISTER_FAILED (that
- * is (off_t *)-1) is returned and errno is set to indicate the error; in
- * kernel mode the negative of one of the following errors is returned.
- *
- * Errors:
- * EADDRINUSE - SCIF_MAP_FIXED is set in map_flags, and pages in the range
- * [offset, offset + len -1] are already registered
- * EAGAIN - The mapping could not be performed due to lack of resources
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - map_flags is invalid, or prot_flags is invalid, or SCIF_MAP_FIXED is
- * set in flags, and offset is not a multiple of the page size, or addr is not a
- * multiple of the page size, or len is not a multiple of the page size, or is
- * 0, or offset is negative
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOMEM - Not enough space
- * ENOTCONN -The endpoint is not connected
- */
-off_t scif_register(scif_epd_t epd, void *addr, size_t len, off_t offset,
- int prot_flags, int map_flags);
-
-/**
- * scif_unregister() - Mark a memory region for remote access.
- * @epd: endpoint descriptor
- * @offset: start of range to unregister
- * @len: length of range to unregister
- *
- * The scif_unregister() function closes those previously registered windows
- * which are entirely within the range [offset, offset + len - 1]. It is an
- * error to specify a range which intersects only a subrange of a window.
- *
- * On a successful return, pages within the window may no longer be specified
- * in calls to mmap(), scif_readfrom(), scif_writeto(), scif_vreadfrom(),
- * scif_vwriteto(), scif_get_pages, and scif_fence_signal(). The window,
- * however, continues to exist until all previous references against it are
- * removed. A window is referenced if there is a mapping to it created by
- * mmap(), or if scif_get_pages() was called against the window
- * (and the pages have not been returned via scif_put_pages()). A window is
- * also referenced while an RMA, in which some range of the window is a source
- * or destination, is in progress. Finally a window is referenced while some
- * offset in that window was specified to scif_fence_signal(), and the RMAs
- * marked by that call to scif_fence_signal() have not completed. While a
- * window is in this state, its registered address space pages are not
- * available for use in a new registered window.
- *
- * When all such references to the window have been removed, its references to
- * all the physical pages which it represents are removed. Similarly, the
- * registered address space pages of the window become available for
- * registration in a new window.
- *
- * Return:
- * Upon successful completion, scif_unregister() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned. In the event of an
- * error, no windows are unregistered.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - the range [offset, offset + len - 1] intersects a subrange of a
- * window, or offset is negative
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - Offsets in the range [offset, offset + len - 1] are invalid for the
- * registered address space of epd
- */
-int scif_unregister(scif_epd_t epd, off_t offset, size_t len);
-
-/**
- * scif_readfrom() - Copy from a remote address space
- * @epd: endpoint descriptor
- * @loffset: offset in local registered address space to
- * which to copy
- * @len: length of range to copy
- * @roffset: offset in remote registered address space
- * from which to copy
- * @rma_flags: transfer mode flags
- *
- * scif_readfrom() copies len bytes from the remote registered address space of
- * the peer of endpoint epd, starting at the offset roffset to the local
- * registered address space of epd, starting at the offset loffset.
- *
- * Each of the specified ranges [loffset, loffset + len - 1] and [roffset,
- * roffset + len - 1] must be within some registered window or windows of the
- * local and remote nodes. A range may intersect multiple registered windows,
- * but only if those windows are contiguous in the registered address space.
- *
- * If rma_flags includes SCIF_RMA_USECPU, then the data is copied using
- * programmed read/writes. Otherwise the data is copied using DMA. If rma_-
- * flags includes SCIF_RMA_SYNC, then scif_readfrom() will return after the
- * transfer is complete. Otherwise, the transfer may be performed asynchron-
- * ously. The order in which any two asynchronous RMA operations complete
- * is non-deterministic. The synchronization functions, scif_fence_mark()/
- * scif_fence_wait() and scif_fence_signal(), can be used to synchronize to
- * the completion of asynchronous RMA operations on the same endpoint.
- *
- * The DMA transfer of individual bytes is not guaranteed to complete in
- * address order. If rma_flags includes SCIF_RMA_ORDERED, then the last
- * cacheline or partial cacheline of the source range will become visible on
- * the destination node after all other transferred data in the source
- * range has become visible on the destination node.
- *
- * The optimal DMA performance will likely be realized if both
- * loffset and roffset are cacheline aligned (are a multiple of 64). Lower
- * performance will likely be realized if loffset and roffset are not
- * cacheline aligned but are separated by some multiple of 64. The lowest level
- * of performance is likely if loffset and roffset are not separated by a
- * multiple of 64.
- *
- * The rma_flags argument is formed by ORing together zero or more of the
- * following values.
- * SCIF_RMA_USECPU - perform the transfer using the CPU, otherwise use the DMA
- * engine.
- * SCIF_RMA_SYNC - perform the transfer synchronously, returning after the
- * transfer has completed. Passing this flag results in the
- * current implementation busy waiting and consuming CPU cycles
- * while the DMA transfer is in progress for best performance by
- * avoiding the interrupt latency.
- * SCIF_RMA_ORDERED - ensure that the last cacheline or partial cacheline of
- * the source range becomes visible on the destination node
- * after all other transferred data in the source range has
- * become visible on the destination
- *
- * Return:
- * Upon successful completion, scif_readfrom() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned.
- *
- * Errors:
- * EACCESS - Attempt to write to a read-only range
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - rma_flags is invalid
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - The range [loffset, loffset + len - 1] is invalid for the registered
- * address space of epd, or, The range [roffset, roffset + len - 1] is invalid
- * for the registered address space of the peer of epd, or loffset or roffset
- * is negative
- */
-int scif_readfrom(scif_epd_t epd, off_t loffset, size_t len, off_t
- roffset, int rma_flags);
-
-/**
- * scif_writeto() - Copy to a remote address space
- * @epd: endpoint descriptor
- * @loffset: offset in local registered address space
- * from which to copy
- * @len: length of range to copy
- * @roffset: offset in remote registered address space to
- * which to copy
- * @rma_flags: transfer mode flags
- *
- * scif_writeto() copies len bytes from the local registered address space of
- * epd, starting at the offset loffset to the remote registered address space
- * of the peer of endpoint epd, starting at the offset roffset.
- *
- * Each of the specified ranges [loffset, loffset + len - 1] and [roffset,
- * roffset + len - 1] must be within some registered window or windows of the
- * local and remote nodes. A range may intersect multiple registered windows,
- * but only if those windows are contiguous in the registered address space.
- *
- * If rma_flags includes SCIF_RMA_USECPU, then the data is copied using
- * programmed read/writes. Otherwise the data is copied using DMA. If rma_-
- * flags includes SCIF_RMA_SYNC, then scif_writeto() will return after the
- * transfer is complete. Otherwise, the transfer may be performed asynchron-
- * ously. The order in which any two asynchronous RMA operations complete
- * is non-deterministic. The synchronization functions, scif_fence_mark()/
- * scif_fence_wait() and scif_fence_signal(), can be used to synchronize to
- * the completion of asynchronous RMA operations on the same endpoint.
- *
- * The DMA transfer of individual bytes is not guaranteed to complete in
- * address order. If rma_flags includes SCIF_RMA_ORDERED, then the last
- * cacheline or partial cacheline of the source range will become visible on
- * the destination node after all other transferred data in the source
- * range has become visible on the destination node.
- *
- * The optimal DMA performance will likely be realized if both
- * loffset and roffset are cacheline aligned (are a multiple of 64). Lower
- * performance will likely be realized if loffset and roffset are not cacheline
- * aligned but are separated by some multiple of 64. The lowest level of
- * performance is likely if loffset and roffset are not separated by a multiple
- * of 64.
- *
- * The rma_flags argument is formed by ORing together zero or more of the
- * following values.
- * SCIF_RMA_USECPU - perform the transfer using the CPU, otherwise use the DMA
- * engine.
- * SCIF_RMA_SYNC - perform the transfer synchronously, returning after the
- * transfer has completed. Passing this flag results in the
- * current implementation busy waiting and consuming CPU cycles
- * while the DMA transfer is in progress for best performance by
- * avoiding the interrupt latency.
- * SCIF_RMA_ORDERED - ensure that the last cacheline or partial cacheline of
- * the source range becomes visible on the destination node
- * after all other transferred data in the source range has
- * become visible on the destination
- *
- * Return:
- * Upon successful completion, scif_readfrom() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned.
- *
- * Errors:
- * EACCESS - Attempt to write to a read-only range
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - rma_flags is invalid
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - The range [loffset, loffset + len - 1] is invalid for the registered
- * address space of epd, or, The range [roffset , roffset + len -1] is invalid
- * for the registered address space of the peer of epd, or loffset or roffset
- * is negative
- */
-int scif_writeto(scif_epd_t epd, off_t loffset, size_t len, off_t
- roffset, int rma_flags);
-
-/**
- * scif_vreadfrom() - Copy from a remote address space
- * @epd: endpoint descriptor
- * @addr: address to which to copy
- * @len: length of range to copy
- * @roffset: offset in remote registered address space
- * from which to copy
- * @rma_flags: transfer mode flags
- *
- * scif_vreadfrom() copies len bytes from the remote registered address
- * space of the peer of endpoint epd, starting at the offset roffset, to local
- * memory, starting at addr.
- *
- * The specified range [roffset, roffset + len - 1] must be within some
- * registered window or windows of the remote nodes. The range may
- * intersect multiple registered windows, but only if those windows are
- * contiguous in the registered address space.
- *
- * If rma_flags includes SCIF_RMA_USECPU, then the data is copied using
- * programmed read/writes. Otherwise the data is copied using DMA. If rma_-
- * flags includes SCIF_RMA_SYNC, then scif_vreadfrom() will return after the
- * transfer is complete. Otherwise, the transfer may be performed asynchron-
- * ously. The order in which any two asynchronous RMA operations complete
- * is non-deterministic. The synchronization functions, scif_fence_mark()/
- * scif_fence_wait() and scif_fence_signal(), can be used to synchronize to
- * the completion of asynchronous RMA operations on the same endpoint.
- *
- * The DMA transfer of individual bytes is not guaranteed to complete in
- * address order. If rma_flags includes SCIF_RMA_ORDERED, then the last
- * cacheline or partial cacheline of the source range will become visible on
- * the destination node after all other transferred data in the source
- * range has become visible on the destination node.
- *
- * If rma_flags includes SCIF_RMA_USECACHE, then the physical pages which back
- * the specified local memory range may be remain in a pinned state even after
- * the specified transfer completes. This may reduce overhead if some or all of
- * the same virtual address range is referenced in a subsequent call of
- * scif_vreadfrom() or scif_vwriteto().
- *
- * The optimal DMA performance will likely be realized if both
- * addr and roffset are cacheline aligned (are a multiple of 64). Lower
- * performance will likely be realized if addr and roffset are not
- * cacheline aligned but are separated by some multiple of 64. The lowest level
- * of performance is likely if addr and roffset are not separated by a
- * multiple of 64.
- *
- * The rma_flags argument is formed by ORing together zero or more of the
- * following values.
- * SCIF_RMA_USECPU - perform the transfer using the CPU, otherwise use the DMA
- * engine.
- * SCIF_RMA_USECACHE - enable registration caching
- * SCIF_RMA_SYNC - perform the transfer synchronously, returning after the
- * transfer has completed. Passing this flag results in the
- * current implementation busy waiting and consuming CPU cycles
- * while the DMA transfer is in progress for best performance by
- * avoiding the interrupt latency.
- * SCIF_RMA_ORDERED - ensure that the last cacheline or partial cacheline of
- * the source range becomes visible on the destination node
- * after all other transferred data in the source range has
- * become visible on the destination
- *
- * Return:
- * Upon successful completion, scif_vreadfrom() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned.
- *
- * Errors:
- * EACCESS - Attempt to write to a read-only range
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - rma_flags is invalid
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - Offsets in the range [roffset, roffset + len - 1] are invalid for the
- * registered address space of epd
- */
-int scif_vreadfrom(scif_epd_t epd, void *addr, size_t len, off_t roffset,
- int rma_flags);
-
-/**
- * scif_vwriteto() - Copy to a remote address space
- * @epd: endpoint descriptor
- * @addr: address from which to copy
- * @len: length of range to copy
- * @roffset: offset in remote registered address space to
- * which to copy
- * @rma_flags: transfer mode flags
- *
- * scif_vwriteto() copies len bytes from the local memory, starting at addr, to
- * the remote registered address space of the peer of endpoint epd, starting at
- * the offset roffset.
- *
- * The specified range [roffset, roffset + len - 1] must be within some
- * registered window or windows of the remote nodes. The range may intersect
- * multiple registered windows, but only if those windows are contiguous in the
- * registered address space.
- *
- * If rma_flags includes SCIF_RMA_USECPU, then the data is copied using
- * programmed read/writes. Otherwise the data is copied using DMA. If rma_-
- * flags includes SCIF_RMA_SYNC, then scif_vwriteto() will return after the
- * transfer is complete. Otherwise, the transfer may be performed asynchron-
- * ously. The order in which any two asynchronous RMA operations complete
- * is non-deterministic. The synchronization functions, scif_fence_mark()/
- * scif_fence_wait() and scif_fence_signal(), can be used to synchronize to
- * the completion of asynchronous RMA operations on the same endpoint.
- *
- * The DMA transfer of individual bytes is not guaranteed to complete in
- * address order. If rma_flags includes SCIF_RMA_ORDERED, then the last
- * cacheline or partial cacheline of the source range will become visible on
- * the destination node after all other transferred data in the source
- * range has become visible on the destination node.
- *
- * If rma_flags includes SCIF_RMA_USECACHE, then the physical pages which back
- * the specified local memory range may be remain in a pinned state even after
- * the specified transfer completes. This may reduce overhead if some or all of
- * the same virtual address range is referenced in a subsequent call of
- * scif_vreadfrom() or scif_vwriteto().
- *
- * The optimal DMA performance will likely be realized if both
- * addr and offset are cacheline aligned (are a multiple of 64). Lower
- * performance will likely be realized if addr and offset are not cacheline
- * aligned but are separated by some multiple of 64. The lowest level of
- * performance is likely if addr and offset are not separated by a multiple of
- * 64.
- *
- * The rma_flags argument is formed by ORing together zero or more of the
- * following values.
- * SCIF_RMA_USECPU - perform the transfer using the CPU, otherwise use the DMA
- * engine.
- * SCIF_RMA_USECACHE - allow registration caching
- * SCIF_RMA_SYNC - perform the transfer synchronously, returning after the
- * transfer has completed. Passing this flag results in the
- * current implementation busy waiting and consuming CPU cycles
- * while the DMA transfer is in progress for best performance by
- * avoiding the interrupt latency.
- * SCIF_RMA_ORDERED - ensure that the last cacheline or partial cacheline of
- * the source range becomes visible on the destination node
- * after all other transferred data in the source range has
- * become visible on the destination
- *
- * Return:
- * Upon successful completion, scif_vwriteto() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned.
- *
- * Errors:
- * EACCESS - Attempt to write to a read-only range
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - rma_flags is invalid
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - Offsets in the range [roffset, roffset + len - 1] are invalid for the
- * registered address space of epd
- */
-int scif_vwriteto(scif_epd_t epd, void *addr, size_t len, off_t roffset,
- int rma_flags);
-
-/**
- * scif_fence_mark() - Mark previously issued RMAs
- * @epd: endpoint descriptor
- * @flags: control flags
- * @mark: marked value returned as output.
- *
- * scif_fence_mark() returns after marking the current set of all uncompleted
- * RMAs initiated through the endpoint epd or the current set of all
- * uncompleted RMAs initiated through the peer of endpoint epd. The RMAs are
- * marked with a value returned at mark. The application may subsequently call
- * scif_fence_wait(), passing the value returned at mark, to await completion
- * of all RMAs so marked.
- *
- * The flags argument has exactly one of the following values.
- * SCIF_FENCE_INIT_SELF - RMA operations initiated through endpoint
- * epd are marked
- * SCIF_FENCE_INIT_PEER - RMA operations initiated through the peer
- * of endpoint epd are marked
- *
- * Return:
- * Upon successful completion, scif_fence_mark() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - flags is invalid
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENOMEM - Insufficient kernel memory was available
- */
-int scif_fence_mark(scif_epd_t epd, int flags, int *mark);
-
-/**
- * scif_fence_wait() - Wait for completion of marked RMAs
- * @epd: endpoint descriptor
- * @mark: mark request
- *
- * scif_fence_wait() returns after all RMAs marked with mark have completed.
- * The value passed in mark must have been obtained in a previous call to
- * scif_fence_mark().
- *
- * Return:
- * Upon successful completion, scif_fence_wait() returns 0; otherwise in user
- * mode -1 is returned and errno is set to indicate the error; in kernel mode
- * the negative of one of the following errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENOMEM - Insufficient kernel memory was available
- */
-int scif_fence_wait(scif_epd_t epd, int mark);
-
-/**
- * scif_fence_signal() - Request a memory update on completion of RMAs
- * @epd: endpoint descriptor
- * @loff: local offset
- * @lval: local value to write to loffset
- * @roff: remote offset
- * @rval: remote value to write to roffset
- * @flags: flags
- *
- * scif_fence_signal() returns after marking the current set of all uncompleted
- * RMAs initiated through the endpoint epd or marking the current set of all
- * uncompleted RMAs initiated through the peer of endpoint epd.
- *
- * If flags includes SCIF_SIGNAL_LOCAL, then on completion of the RMAs in the
- * marked set, lval is written to memory at the address corresponding to offset
- * loff in the local registered address space of epd. loff must be within a
- * registered window. If flags includes SCIF_SIGNAL_REMOTE, then on completion
- * of the RMAs in the marked set, rval is written to memory at the address
- * corresponding to offset roff in the remote registered address space of epd.
- * roff must be within a remote registered window of the peer of epd. Note
- * that any specified offset must be DWORD (4 byte / 32 bit) aligned.
- *
- * The flags argument is formed by OR'ing together the following.
- * Exactly one of the following values.
- * SCIF_FENCE_INIT_SELF - RMA operations initiated through endpoint
- * epd are marked
- * SCIF_FENCE_INIT_PEER - RMA operations initiated through the peer
- * of endpoint epd are marked
- * One or more of the following values.
- * SCIF_SIGNAL_LOCAL - On completion of the marked set of RMAs, write lval to
- * memory at the address corresponding to offset loff in the local
- * registered address space of epd.
- * SCIF_SIGNAL_REMOTE - On completion of the marked set of RMAs, write rval to
- * memory at the address corresponding to offset roff in the remote
- * registered address space of epd.
- *
- * Return:
- * Upon successful completion, scif_fence_signal() returns 0; otherwise in
- * user mode -1 is returned and errno is set to indicate the error; in kernel
- * mode the negative of one of the following errors is returned.
- *
- * Errors:
- * EBADF, ENOTTY - epd is not a valid endpoint descriptor
- * ECONNRESET - Connection reset by peer
- * EINVAL - flags is invalid, or loff or roff are not DWORD aligned
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - loff is invalid for the registered address of epd, or roff is invalid
- * for the registered address space, of the peer of epd
- */
-int scif_fence_signal(scif_epd_t epd, off_t loff, u64 lval, off_t roff,
- u64 rval, int flags);
-
-/**
- * scif_get_node_ids() - Return information about online nodes
- * @nodes: array in which to return online node IDs
- * @len: number of entries in the nodes array
- * @self: address to place the node ID of the local node
- *
- * scif_get_node_ids() fills in the nodes array with up to len node IDs of the
- * nodes in the SCIF network. If there is not enough space in nodes, as
- * indicated by the len parameter, only len node IDs are returned in nodes. The
- * return value of scif_get_node_ids() is the total number of nodes currently in
- * the SCIF network. By checking the return value against the len parameter,
- * the user may determine if enough space for nodes was allocated.
- *
- * The node ID of the local node is returned at self.
- *
- * Return:
- * Upon successful completion, scif_get_node_ids() returns the actual number of
- * online nodes in the SCIF network including 'self'; otherwise in user mode
- * -1 is returned and errno is set to indicate the error; in kernel mode no
- * errors are returned.
- */
-int scif_get_node_ids(u16 *nodes, int len, u16 *self);
-
-/**
- * scif_pin_pages() - Pin a set of pages
- * @addr: Virtual address of range to pin
- * @len: Length of range to pin
- * @prot_flags: Page protection flags
- * @map_flags: Page classification flags
- * @pinned_pages: Handle to pinned pages
- *
- * scif_pin_pages() pins (locks in physical memory) the physical pages which
- * back the range of virtual address pages starting at addr and continuing for
- * len bytes. addr and len are constrained to be multiples of the page size. A
- * successful scif_pin_pages() call returns a handle to pinned_pages which may
- * be used in subsequent calls to scif_register_pinned_pages().
- *
- * The pages will remain pinned as long as there is a reference against the
- * scif_pinned_pages_t value returned by scif_pin_pages() and until
- * scif_unpin_pages() is called, passing the scif_pinned_pages_t value. A
- * reference is added to a scif_pinned_pages_t value each time a window is
- * created by calling scif_register_pinned_pages() and passing the
- * scif_pinned_pages_t value. A reference is removed from a
- * scif_pinned_pages_t value each time such a window is deleted.
- *
- * Subsequent operations which change the memory pages to which virtual
- * addresses are mapped (such as mmap(), munmap()) have no effect on the
- * scif_pinned_pages_t value or windows created against it.
- *
- * If the process will fork(), it is recommended that the registered
- * virtual address range be marked with MADV_DONTFORK. Doing so will prevent
- * problems due to copy-on-write semantics.
- *
- * The prot_flags argument is formed by OR'ing together one or more of the
- * following values.
- * SCIF_PROT_READ - allow read operations against the pages
- * SCIF_PROT_WRITE - allow write operations against the pages
- * The map_flags argument can be set as SCIF_MAP_KERNEL to interpret addr as a
- * kernel space address. By default, addr is interpreted as a user space
- * address.
- *
- * Return:
- * Upon successful completion, scif_pin_pages() returns 0; otherwise the
- * negative of one of the following errors is returned.
- *
- * Errors:
- * EINVAL - prot_flags is invalid, map_flags is invalid, or offset is negative
- * ENOMEM - Not enough space
- */
-int scif_pin_pages(void *addr, size_t len, int prot_flags, int map_flags,
- scif_pinned_pages_t *pinned_pages);
-
-/**
- * scif_unpin_pages() - Unpin a set of pages
- * @pinned_pages: Handle to pinned pages to be unpinned
- *
- * scif_unpin_pages() prevents scif_register_pinned_pages() from registering new
- * windows against pinned_pages. The physical pages represented by pinned_pages
- * will remain pinned until all windows previously registered against
- * pinned_pages are deleted (the window is scif_unregister()'d and all
- * references to the window are removed (see scif_unregister()).
- *
- * pinned_pages must have been obtain from a previous call to scif_pin_pages().
- * After calling scif_unpin_pages(), it is an error to pass pinned_pages to
- * scif_register_pinned_pages().
- *
- * Return:
- * Upon successful completion, scif_unpin_pages() returns 0; otherwise the
- * negative of one of the following errors is returned.
- *
- * Errors:
- * EINVAL - pinned_pages is not valid
- */
-int scif_unpin_pages(scif_pinned_pages_t pinned_pages);
-
-/**
- * scif_register_pinned_pages() - Mark a memory region for remote access.
- * @epd: endpoint descriptor
- * @pinned_pages: Handle to pinned pages
- * @offset: Registered address space offset
- * @map_flags: Flags which control where pages are mapped
- *
- * The scif_register_pinned_pages() function opens a window, a range of whole
- * pages of the registered address space of the endpoint epd, starting at
- * offset po. The value of po, further described below, is a function of the
- * parameters offset and pinned_pages, and the value of map_flags. Each page of
- * the window represents a corresponding physical memory page of the range
- * represented by pinned_pages; the length of the window is the same as the
- * length of range represented by pinned_pages. A successful
- * scif_register_pinned_pages() call returns po as the return value.
- *
- * When SCIF_MAP_FIXED is set in the map_flags argument, po will be offset
- * exactly, and offset is constrained to be a multiple of the page size. The
- * mapping established by scif_register_pinned_pages() will not replace any
- * existing registration; an error is returned if any page of the new window
- * would intersect an existing window.
- *
- * When SCIF_MAP_FIXED is not set, the implementation uses offset in an
- * implementation-defined manner to arrive at po. The po so chosen will be an
- * area of the registered address space that the implementation deems suitable
- * for a mapping of the required size. An offset value of 0 is interpreted as
- * granting the implementation complete freedom in selecting po, subject to
- * constraints described below. A non-zero value of offset is taken to be a
- * suggestion of an offset near which the mapping should be placed. When the
- * implementation selects a value for po, it does not replace any extant
- * window. In all cases, po will be a multiple of the page size.
- *
- * The physical pages which are so represented by a window are available for
- * access in calls to scif_get_pages(), scif_readfrom(), scif_writeto(),
- * scif_vreadfrom(), and scif_vwriteto(). While a window is registered, the
- * physical pages represented by the window will not be reused by the memory
- * subsystem for any other purpose. Note that the same physical page may be
- * represented by multiple windows.
- *
- * Windows created by scif_register_pinned_pages() are unregistered by
- * scif_unregister().
- *
- * The map_flags argument can be set to SCIF_MAP_FIXED which interprets a
- * fixed offset.
- *
- * Return:
- * Upon successful completion, scif_register_pinned_pages() returns the offset
- * at which the mapping was placed (po); otherwise the negative of one of the
- * following errors is returned.
- *
- * Errors:
- * EADDRINUSE - SCIF_MAP_FIXED is set in map_flags and pages in the new window
- * would intersect an existing window
- * EAGAIN - The mapping could not be performed due to lack of resources
- * ECONNRESET - Connection reset by peer
- * EINVAL - map_flags is invalid, or SCIF_MAP_FIXED is set in map_flags, and
- * offset is not a multiple of the page size, or offset is negative
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOMEM - Not enough space
- * ENOTCONN - The endpoint is not connected
- */
-off_t scif_register_pinned_pages(scif_epd_t epd,
- scif_pinned_pages_t pinned_pages,
- off_t offset, int map_flags);
-
-/**
- * scif_get_pages() - Add references to remote registered pages
- * @epd: endpoint descriptor
- * @offset: remote registered offset
- * @len: length of range of pages
- * @pages: returned scif_range structure
- *
- * scif_get_pages() returns the addresses of the physical pages represented by
- * those pages of the registered address space of the peer of epd, starting at
- * offset and continuing for len bytes. offset and len are constrained to be
- * multiples of the page size.
- *
- * All of the pages in the specified range [offset, offset + len - 1] must be
- * within a single window of the registered address space of the peer of epd.
- *
- * The addresses are returned as a virtually contiguous array pointed to by the
- * phys_addr component of the scif_range structure whose address is returned in
- * pages. The nr_pages component of scif_range is the length of the array. The
- * prot_flags component of scif_range holds the protection flag value passed
- * when the pages were registered.
- *
- * Each physical page whose address is returned by scif_get_pages() remains
- * available and will not be released for reuse until the scif_range structure
- * is returned in a call to scif_put_pages(). The scif_range structure returned
- * by scif_get_pages() must be unmodified.
- *
- * It is an error to call scif_close() on an endpoint on which a scif_range
- * structure of that endpoint has not been returned to scif_put_pages().
- *
- * Return:
- * Upon successful completion, scif_get_pages() returns 0; otherwise the
- * negative of one of the following errors is returned.
- * Errors:
- * ECONNRESET - Connection reset by peer.
- * EINVAL - offset is not a multiple of the page size, or offset is negative, or
- * len is not a multiple of the page size
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- * ENXIO - Offsets in the range [offset, offset + len - 1] are invalid
- * for the registered address space of the peer epd
- */
-int scif_get_pages(scif_epd_t epd, off_t offset, size_t len,
- struct scif_range **pages);
-
-/**
- * scif_put_pages() - Remove references from remote registered pages
- * @pages: pages to be returned
- *
- * scif_put_pages() releases a scif_range structure previously obtained by
- * calling scif_get_pages(). The physical pages represented by pages may
- * be reused when the window which represented those pages is unregistered.
- * Therefore, those pages must not be accessed after calling scif_put_pages().
- *
- * Return:
- * Upon successful completion, scif_put_pages() returns 0; otherwise the
- * negative of one of the following errors is returned.
- * Errors:
- * EINVAL - pages does not point to a valid scif_range structure, or
- * the scif_range structure pointed to by pages was already returned
- * ENODEV - The remote node is lost or existed, but is not currently in the
- * network since it may have crashed
- * ENOTCONN - The endpoint is not connected
- */
-int scif_put_pages(struct scif_range *pages);
-
-/**
- * scif_poll() - Wait for some event on an endpoint
- * @epds: Array of endpoint descriptors
- * @nepds: Length of epds
- * @timeout: Upper limit on time for which scif_poll() will block
- *
- * scif_poll() waits for one of a set of endpoints to become ready to perform
- * an I/O operation.
- *
- * The epds argument specifies the endpoint descriptors to be examined and the
- * events of interest for each endpoint descriptor. epds is a pointer to an
- * array with one member for each open endpoint descriptor of interest.
- *
- * The number of items in the epds array is specified in nepds. The epd field
- * of scif_pollepd is an endpoint descriptor of an open endpoint. The field
- * events is a bitmask specifying the events which the application is
- * interested in. The field revents is an output parameter, filled by the
- * kernel with the events that actually occurred. The bits returned in revents
- * can include any of those specified in events, or one of the values POLLERR,
- * POLLHUP, or POLLNVAL. (These three bits are meaningless in the events
- * field, and will be set in the revents field whenever the corresponding
- * condition is true.)
- *
- * If none of the events requested (and no error) has occurred for any of the
- * endpoint descriptors, then scif_poll() blocks until one of the events occurs.
- *
- * The timeout argument specifies an upper limit on the time for which
- * scif_poll() will block, in milliseconds. Specifying a negative value in
- * timeout means an infinite timeout.
- *
- * The following bits may be set in events and returned in revents.
- * POLLIN - Data may be received without blocking. For a connected
- * endpoint, this means that scif_recv() may be called without blocking. For a
- * listening endpoint, this means that scif_accept() may be called without
- * blocking.
- * POLLOUT - Data may be sent without blocking. For a connected endpoint, this
- * means that scif_send() may be called without blocking. POLLOUT may also be
- * used to block waiting for a non-blocking connect to complete. This bit value
- * has no meaning for a listening endpoint and is ignored if specified.
- *
- * The following bits are only returned in revents, and are ignored if set in
- * events.
- * POLLERR - An error occurred on the endpoint
- * POLLHUP - The connection to the peer endpoint was disconnected
- * POLLNVAL - The specified endpoint descriptor is invalid.
- *
- * Return:
- * Upon successful completion, scif_poll() returns a non-negative value. A
- * positive value indicates the total number of endpoint descriptors that have
- * been selected (that is, endpoint descriptors for which the revents member is
- * non-zero). A value of 0 indicates that the call timed out and no endpoint
- * descriptors have been selected. Otherwise in user mode -1 is returned and
- * errno is set to indicate the error; in kernel mode the negative of one of
- * the following errors is returned.
- *
- * Errors:
- * EINTR - A signal occurred before any requested event
- * EINVAL - The nepds argument is greater than {OPEN_MAX}
- * ENOMEM - There was no space to allocate file descriptor tables
- */
-int scif_poll(struct scif_pollepd *epds, unsigned int nepds, long timeout);
-
-/**
- * scif_client_register() - Register a SCIF client
- * @client: client to be registered
- *
- * scif_client_register() registers a SCIF client. The probe() method
- * of the client is called when SCIF peer devices come online and the
- * remove() method is called when the peer devices disappear.
- *
- * Return:
- * Upon successful completion, scif_client_register() returns a non-negative
- * value. Otherwise the return value is the same as subsys_interface_register()
- * in the kernel.
- */
-int scif_client_register(struct scif_client *client);
-
-/**
- * scif_client_unregister() - Unregister a SCIF client
- * @client: client to be unregistered
- *
- * scif_client_unregister() unregisters a SCIF client.
- *
- * Return:
- * None
- */
-void scif_client_unregister(struct scif_client *client);
-
-#endif /* __SCIF_H__ */
--
2.11.0
From: Janosch Frank <frankja(a)de.ibm.com>
Userspace could have munmapped the area before doing unmapping from the
gmap. This would leave us with a valid vmaddr, but an invalid vma from
which we would try to zap memory.
Let's check before using the vma.
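As a rough userspace illustration of why the check matters (a sketch only, not kernel code): the model below assumes the find_vma() contract of returning the first mapping whose end lies above the address, or NULL when there is none. The names model_find_vma(), model_discard() and struct range are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma: the half-open range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Model of the find_vma() contract: return the first range whose end is
 * above addr, or NULL if no such range exists. The ranges are assumed to
 * be sorted by address.
 */
static const struct range *model_find_vma(const struct range *r, size_t n,
					  unsigned long addr)
{
	assert(r != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (addr < r[i].end)
			return &r[i];
	return NULL;
}

/*
 * Model of the patched loop: addresses with no range are skipped, the
 * way the added "continue" skips them. Returns how many addresses would
 * have been passed on to the zap step.
 */
static size_t model_discard(const struct range *r, size_t n,
			    const unsigned long *addrs, size_t naddrs)
{
	size_t zapped = 0;

	for (size_t i = 0; i < naddrs; i++) {
		const struct range *vma = model_find_vma(r, n, addrs[i]);

		if (!vma)
			continue;	/* mapping is gone, nothing to zap */
		zapped++;
	}
	return zapped;
}
```

Note that under this contract a non-NULL result may still start above the address; the sketch only covers the NULL case that the patch addresses.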
Fixes: 1e133ab296f3 ("s390/mm: split arch/s390/mm/pgtable.c")
Signed-off-by: Janosch Frank <frankja(a)linux.ibm.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
CC: stable(a)vger.kernel.org # 4.6+
---
arch/s390/mm/gmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index cb2cd04..b6c85b7 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -686,6 +686,8 @@ void gmap_discard(struct gmap *gmap, unsigned long from, unsigned long to)
vmaddr |= gaddr & ~PMD_MASK;
/* Find vma in the parent mm */
vma = find_vma(gmap->mm, vmaddr);
+ if (!vma)
+ continue;
size = min(to - gaddr, PMD_SIZE - (gaddr & ~PMD_MASK));
zap_page_range(vma, vmaddr, size, NULL);
}
--
2.7.4
[partial backport upstream 760db29bdc97b73ff60b091315ad787b1deb5cf5]
Upon invocation, lan78xx_init_mac_address() checks that the MAC address present
in the RX_ADDRL & RX_ADDRH registers is valid. If it is not, it first tries
to read a new address from an external EEPROM or the OTP area, and if both
reads fail (or the address read back is invalid), it randomly generates a new
one.
Unfortunately, due to the way the above logic is laid out,
if both read_eeprom() and read_otp() fail, a new MAC address is correctly
generated but is never written back to RX_ADDRL & RX_ADDRH, leaving the chip in an
inconsistent state with an invalid MAC address (e.g. the NIC appears to be
completely dead and doesn't receive any packets):
lan78xx_init_mac_address()
    ...
    if (lan78xx_read_eeprom(addr ...) || lan78xx_read_otp(addr ...)) {
        if (is_valid_ether_addr(addr)) {
            // nop...
        } else {
            random_ether_addr(addr);
        }
        // correctly writes back the new address
        lan78xx_write_reg(RX_ADDRL, addr ...);
        lan78xx_write_reg(RX_ADDRH, addr ...);
    } else {
        // XXX if both eeprom and otp read fail, we land here and skip
        // XXX the RX_ADDRL & RX_ADDRH update completely
        random_ether_addr(addr);
    }
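The intended flow, with the register write-back moved after both branches, can be sketched as a small userspace model. This is an illustration under assumptions, not the driver code: struct fake_regs, mac_is_valid(), mac_to_regs() and init_mac() are hypothetical names, the EEPROM/OTP result is reduced to a single flag, and the "random" address is passed in so the behaviour stays deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the two chip registers; not the driver's API. */
struct fake_regs {
	uint32_t rx_addrl;
	uint32_t rx_addrh;
};

/* Same rule as is_valid_ether_addr(): not multicast and not all zeroes. */
static bool mac_is_valid(const uint8_t addr[6])
{
	static const uint8_t zero[6];

	return !(addr[0] & 0x01) && memcmp(addr, zero, 6) != 0;
}

/* Pack the six address bytes into the low/high layout used by the patch. */
static void mac_to_regs(const uint8_t addr[6], struct fake_regs *regs)
{
	assert(addr != NULL && regs != NULL);
	regs->rx_addrl = (uint32_t)addr[0] | ((uint32_t)addr[1] << 8) |
			 ((uint32_t)addr[2] << 16) | ((uint32_t)addr[3] << 24);
	regs->rx_addrh = (uint32_t)addr[4] | ((uint32_t)addr[5] << 8);
}

/*
 * Model of the fixed flow: whichever branch chooses the address, the
 * write-back happens exactly once, after the branches.
 * read_ok models "EEPROM or OTP read succeeded"; fallback models the
 * randomly generated address.
 */
static void init_mac(bool read_ok, const uint8_t stored[6],
		     const uint8_t fallback[6], struct fake_regs *regs)
{
	uint8_t addr[6];

	if (read_ok && mac_is_valid(stored))
		memcpy(addr, stored, 6);
	else
		memcpy(addr, fallback, 6);

	mac_to_regs(addr, regs);	/* reached on every path */
}
```

Calling init_mac() with read_ok set to false still leaves both fake registers populated from the fallback address, which is the case the unpatched code missed.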
This bug went unnoticed because lan78xx_read_otp() was buggy itself and would
never fail, up until 4bfc338 "lan78xx: Correctly indicate invalid OTP"
fixed it and as a side effect uncovered this bug.
4.18+ is fine, since the bug was implicitly fixed in 760db29 "lan78xx: Read MAC
address from DT if present" when the address change logic was reorganized, but
it's still present in all stable trees below that: linux-4.4.y, linux-4.9.y,
linux-4.14.y, etc up to linux-4.18.y (not included).
Signed-off-by: Paolo Pisati <p.pisati(a)gmail.com>
---
drivers/net/usb/lan78xx.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 50e2e10a..114dc55 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1660,13 +1660,6 @@ static void lan78xx_init_mac_address(struct lan78xx_net *dev)
netif_dbg(dev, ifup, dev->net,
"MAC address set to random addr");
}
-
- addr_lo = addr[0] | (addr[1] << 8) |
- (addr[2] << 16) | (addr[3] << 24);
- addr_hi = addr[4] | (addr[5] << 8);
-
- ret = lan78xx_write_reg(dev, RX_ADDRL, addr_lo);
- ret = lan78xx_write_reg(dev, RX_ADDRH, addr_hi);
} else {
/* generate random MAC */
random_ether_addr(addr);
@@ -1674,6 +1667,11 @@ static void lan78xx_init_mac_address(struct lan78xx_net *dev)
"MAC address set to random addr");
}
}
+ addr_lo = addr[0] | (addr[1] << 8) | (addr[2] << 16) | (addr[3] << 24);
+ addr_hi = addr[4] | (addr[5] << 8);
+
+ ret = lan78xx_write_reg(dev, RX_ADDRL, addr_lo);
+ ret = lan78xx_write_reg(dev, RX_ADDRH, addr_hi);
ret = lan78xx_write_reg(dev, MAF_LO(0), addr_lo);
ret = lan78xx_write_reg(dev, MAF_HI(0), addr_hi | MAF_HI_VALID_);
--
2.7.4
This is the start of the stable review cycle for the 4.9.134 release.
There are 71 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu Oct 18 17:05:18 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.134-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.134-rc1
Dan Carpenter <dan.carpenter(a)oracle.com>
ipv4: frags: precedence bug in ip_expire()
Taehee Yoo <ap420073(a)gmail.com>
ip: frags: fix crash in ip_do_fragment()
Peter Oskolkov <posk(a)google.com>
ip: process in-order fragments efficiently
Peter Oskolkov <posk(a)google.com>
ip: add helpers to process in-order fragments faster.
Peter Oskolkov <posk(a)google.com>
ip: use rb trees for IP frag queue.
Eric Dumazet <edumazet(a)google.com>
net: add rb_to_skb() and other rb tree helpers
Eric Dumazet <edumazet(a)google.com>
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
Florian Westphal <fw(a)strlen.de>
ipv6: defrag: drop non-last frags smaller than min mtu
Peter Oskolkov <posk(a)google.com>
net: modify skb_rbtree_purge to return the truesize of all purged skbs.
Eric Dumazet <edumazet(a)google.com>
net: speed up skb_rbtree_purge()
Peter Oskolkov <posk(a)google.com>
ip: discard IPv4 datagrams with overlapping segments.
Eric Dumazet <edumazet(a)google.com>
inet: frags: fix ip6frag_low_thresh boundary
Eric Dumazet <edumazet(a)google.com>
inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
Eric Dumazet <edumazet(a)google.com>
inet: frags: reorganize struct netns_frags
Eric Dumazet <edumazet(a)google.com>
rhashtable: reorganize struct rhashtable layout
Eric Dumazet <edumazet(a)google.com>
ipv6: frags: rewrite ip6_expire_frag_queue()
Eric Dumazet <edumazet(a)google.com>
inet: frags: do not clone skb in ip_expire()
Eric Dumazet <edumazet(a)google.com>
inet: frags: break the 2GB limit for frags storage
Eric Dumazet <edumazet(a)google.com>
inet: frags: remove inet_frag_maybe_warn_overflow()
Eric Dumazet <edumazet(a)google.com>
inet: frags: get rif of inet_frag_evicting()
Eric Dumazet <edumazet(a)google.com>
inet: frags: remove some helpers
Eric Dumazet <edumazet(a)google.com>
inet: frags: use rhashtables for reassembly units
Eric Dumazet <edumazet(a)google.com>
rhashtable: add schedule points
Eric Dumazet <edumazet(a)google.com>
ipv6: export ip6 fragments sysctl to unprivileged users
Eric Dumazet <edumazet(a)google.com>
inet: frags: refactor lowpan_net_frag_init()
Eric Dumazet <edumazet(a)google.com>
inet: frags: refactor ipv6_frag_init()
Eric Dumazet <edumazet(a)google.com>
inet: frags: refactor ipfrag_init()
Eric Dumazet <edumazet(a)google.com>
inet: frags: add a pointer to struct netns_frags
Eric Dumazet <edumazet(a)google.com>
inet: frags: change inet_frags_init_net() return value
Eric Dumazet <edumazet(a)google.com>
inet: make sure to grab rcu_read_lock before using ireq->ireq_opt
Eric Dumazet <edumazet(a)google.com>
tcp/dccp: fix lockdep issue when SYN is backlogged
Eric Dumazet <edumazet(a)google.com>
rtnl: limit IFLA_NUM_TX_QUEUES and IFLA_NUM_RX_QUEUES to 4096
Florian Fainelli <f.fainelli(a)gmail.com>
net: systemport: Fix wake-up interrupt race during resume
Maxime Chevallier <maxime.chevallier(a)bootlin.com>
net: mvpp2: Extract the correct ethtype from the skb for tx csum offload
Florian Fainelli <f.fainelli(a)gmail.com>
net: dsa: bcm_sf2: Fix unbind ordering
Ido Schimmel <idosch(a)mellanox.com>
team: Forbid enslaving team device to itself
Giacinto Cifelli <gciofono(a)gmail.com>
qmi_wwan: Added support for Gemalto's Cinterion ALASxx WWAN interface
Shahed Shaikh <shahed.shaikh(a)cavium.com>
qlcnic: fix Tx descriptor corruption on 82xx devices
Yu Zhao <yuzhao(a)google.com>
net/usb: cancel pending work when unbinding smsc75xx
Sean Tranchetti <stranche(a)codeaurora.org>
netlabel: check for IPV4MASK in addrinfo_get
Jeff Barnhill <0xeffeff(a)gmail.com>
net/ipv6: Display all addresses in output of /proc/net/if_inet6
Sabrina Dubroca <sd(a)queasysnail.net>
net: ipv4: update fnhe_pmtu when first hop's MTU changes
Yunsheng Lin <linyunsheng(a)huawei.com>
net: hns: fix for unmapping problem when SMMU is on
Florian Fainelli <f.fainelli(a)gmail.com>
net: dsa: bcm_sf2: Call setup during switch resume
Wei Wang <weiwan(a)google.com>
ipv6: take rcu lock in rawv6_send_hdrinc()
Eric Dumazet <edumazet(a)google.com>
ipv4: fix use-after-free in ip_cmsg_recv_dstaddr()
Paolo Abeni <pabeni(a)redhat.com>
ip_tunnel: be careful when accessing the inner header
Paolo Abeni <pabeni(a)redhat.com>
ip6_tunnel: be careful when accessing the inner header
Mahesh Bandewar <maheshb(a)google.com>
bonding: avoid possible dead-lock
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix TX timeout during netpoll.
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: Don't print a warning when setting link state for disabled ports
Edgar Cherkasov <echerkasov(a)dev.rtsoft.ru>
i2c: i2c-scmi: fix for i2c_smbus_write_block_data
Jan Kara <jack(a)suse.cz>
mm: Preserve _PAGE_DEVMAP across mprotect() calls
Adrian Hunter <adrian.hunter(a)intel.com>
perf script python: Fix export-to-postgresql.py occasional failure
Mikulas Patocka <mpatocka(a)redhat.com>
mach64: detect the dot clock divider correctly on sparc
Paul Burton <paul.burton(a)mips.com>
MIPS: VDSO: Always map near top of user memory
Jann Horn <jannh(a)google.com>
mm/vmstat.c: fix outdated vmstat_text
Daniel Rosenberg <drosen(a)google.com>
ext4: Fix error code in ext4_xattr_set_entry()
Amber Lin <Amber.Lin(a)amd.com>
drm/amdgpu: Fix SDMA HQD destroy error on gfx_v7
Vitaly Kuznetsov <vkuznets(a)redhat.com>
x86/kvm/lapic: always disable MMIO interface in x2APIC mode
Nicolas Ferre <nicolas.ferre(a)microchip.com>
ARM: dts: at91: add new compatibility string for macb on sama5d3
Nicolas Ferre <nicolas.ferre(a)microchip.com>
net: macb: disable scatter-gather for macb on sama5d3
Jongsung Kim <neidhard.kim(a)lge.com>
stmmac: fix valid numbers of unicast filter entries
Yu Zhao <yuzhao(a)google.com>
sound: enable interrupt after dma buffer initialization
Dan Carpenter <dan.carpenter(a)oracle.com>
scsi: qla2xxx: Fix an endian bug in fcpcmd_is_corrupted()
Laura Abbott <labbott(a)redhat.com>
scsi: iscsi: target: Don't use stack buffer for scatterlist
Tony Lindgren <tony(a)atomide.com>
mfd: omap-usb-host: Fix dts probe of children
Lei Yang <Lei.Yang(a)windriver.com>
selftests: memory-hotplug: add required configs
Lei Yang <Lei.Yang(a)windriver.com>
selftests/efivarfs: add required kernel configs
Danny Smith <danny.smith(a)axis.com>
ASoC: sigmadsp: safeload should not have lower byte limit
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: wm8804: Add ACPI support
-------------
Diffstat:
Documentation/devicetree/bindings/net/macb.txt | 1 +
Documentation/networking/ip-sysctl.txt | 13 +-
Makefile | 4 +-
arch/arm/boot/dts/sama5d3_emac.dtsi | 2 +-
arch/mips/include/asm/processor.h | 10 +-
arch/mips/kernel/process.c | 25 +
arch/mips/kernel/vdso.c | 18 +-
arch/powerpc/include/asm/book3s/64/pgtable.h | 4 +-
arch/x86/include/asm/pgtable_types.h | 2 +-
arch/x86/include/uapi/asm/kvm.h | 1 +
arch/x86/kvm/lapic.c | 22 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 +-
drivers/i2c/busses/i2c-scmi.c | 1 +
drivers/mfd/omap-usb-host.c | 11 +-
drivers/net/bonding/bond_main.c | 43 +-
drivers/net/dsa/bcm_sf2.c | 12 +-
drivers/net/ethernet/broadcom/bcmsysport.c | 22 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 13 +-
drivers/net/ethernet/cadence/macb.c | 8 +
drivers/net/ethernet/hisilicon/hns/hnae.c | 2 +-
drivers/net/ethernet/hisilicon/hns/hns_enet.c | 30 +-
drivers/net/ethernet/marvell/mvpp2.c | 10 +-
drivers/net/ethernet/qlogic/qlcnic/qlcnic.h | 8 +-
.../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c | 3 +-
.../net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.h | 3 +-
drivers/net/ethernet/qlogic/qlcnic/qlcnic_hw.h | 3 +-
drivers/net/ethernet/qlogic/qlcnic/qlcnic_io.c | 12 +-
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 5 +-
drivers/net/team/team.c | 5 +
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/usb/smsc75xx.c | 1 +
drivers/scsi/qla2xxx/qla_target.h | 4 +-
drivers/target/iscsi/iscsi_target.c | 22 +-
drivers/usb/host/xhci-hub.c | 18 +-
drivers/video/fbdev/aty/atyfb.h | 3 +-
drivers/video/fbdev/aty/atyfb_base.c | 7 +-
drivers/video/fbdev/aty/mach64_ct.c | 10 +-
fs/ext4/xattr.c | 2 +-
include/linux/netdevice.h | 7 +
include/linux/rhashtable.h | 4 +-
include/linux/skbuff.h | 34 +-
include/net/bonding.h | 7 +-
include/net/inet_frag.h | 133 +++--
include/net/inet_sock.h | 6 -
include/net/ip.h | 1 -
include/net/ip_fib.h | 1 +
include/net/ipv6.h | 26 +-
include/uapi/linux/snmp.h | 1 +
lib/rhashtable.c | 5 +-
mm/vmstat.c | 1 -
net/core/dev.c | 28 +-
net/core/rtnetlink.c | 6 +
net/core/skbuff.c | 31 +-
net/dccp/input.c | 4 +-
net/dccp/ipv4.c | 4 +-
net/ieee802154/6lowpan/6lowpan_i.h | 26 +-
net/ieee802154/6lowpan/reassembly.c | 148 +++---
net/ipv4/fib_frontend.c | 12 +-
net/ipv4/fib_semantics.c | 50 ++
net/ipv4/inet_connection_sock.c | 5 +-
net/ipv4/inet_fragment.c | 379 +++-----------
net/ipv4/ip_fragment.c | 573 ++++++++++++---------
net/ipv4/ip_sockglue.c | 3 +-
net/ipv4/ip_tunnel.c | 9 +
net/ipv4/proc.c | 7 +-
net/ipv4/tcp_input.c | 37 +-
net/ipv4/tcp_ipv4.c | 4 +-
net/ipv6/addrconf.c | 4 +-
net/ipv6/ip6_tunnel.c | 13 +-
net/ipv6/netfilter/nf_conntrack_reasm.c | 100 ++--
net/ipv6/proc.c | 5 +-
net/ipv6/raw.c | 29 +-
net/ipv6/reassembly.c | 212 ++++----
net/netlabel/netlabel_unlabeled.c | 3 +-
sound/hda/hdac_controller.c | 8 +-
sound/soc/codecs/sigmadsp.c | 3 +-
sound/soc/codecs/wm8804-i2c.c | 15 +-
tools/perf/scripts/python/export-to-postgresql.py | 9 +
tools/testing/selftests/efivarfs/config | 1 +
tools/testing/selftests/memory-hotplug/config | 1 +
80 files changed, 1185 insertions(+), 1133 deletions(-)
Hello,
Please pick up this patch for Linux 4.4 (backported version).
This code will benefit GNU/Linux distributions that use a longterm kernel.
Compiled and tested without problems.
Thanks.
[ Upstream commit 30aba6656f61ed44cba445a3c0d38b296fa9e8f5 ]
From: Salvatore Mesoraca <s.mesoraca16(a)gmail.com>
Date: Thu, 23 Aug 2018 17:00:35 -0700
Subject: namei: allow restricted O_CREAT of FIFOs and regular files
Disallows open of FIFOs or regular files not owned by the user in world
writable sticky directories, unless the owner is the same as that of the
directory or the file is opened without the O_CREAT flag. The purpose
is to make data spoofing attacks harder. This protection can be turned
on and off separately for FIFOs and regular files via sysctl, just like
the symlinks/hardlinks protection. This patch is based on Openwall's
"HARDEN_FIFO" feature by Solar Designer.
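The decision described above can be modelled in userspace roughly as follows. This is a sketch that mirrors the conditions of may_create_in_sticky() from the patch, under assumptions: struct sticky_check, enum file_kind, the MODE_* constants and model_may_create_in_sticky() are hypothetical names, plain integers stand in for kernel uids, and only FIFOs and regular files are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical file types; only the two covered by the sysctls. */
enum file_kind { KIND_FIFO, KIND_REGULAR };

/* Permission bits of the parent directory, written out as octal values. */
#define MODE_STICKY		01000u
#define MODE_GROUP_WRITE	00020u
#define MODE_WORLD_WRITE	00002u

/* All inputs of the check, gathered in one hypothetical struct. */
struct sticky_check {
	int protected_fifos;	/* sysctl value: 0, 1 or 2 */
	int protected_regular;	/* sysctl value: 0, 1 or 2 */
	enum file_kind kind;	/* type of the existing file */
	unsigned int dir_mode;	/* mode bits of the parent directory */
	unsigned int dir_uid;	/* owner of the parent directory */
	unsigned int file_uid;	/* owner of the existing file */
	unsigned int opener_uid;	/* uid doing the O_CREAT open */
};

/* Returns 0 if the O_CREAT open would be allowed, -EACCES otherwise. */
static int model_may_create_in_sticky(const struct sticky_check *c)
{
	bool is_fifo, is_reg;

	assert(c != NULL);
	is_fifo = c->kind == KIND_FIFO;
	is_reg = c->kind == KIND_REGULAR;

	/* Protection off, non-sticky directory, or a trusted owner. */
	if ((!c->protected_fifos && is_fifo) ||
	    (!c->protected_regular && is_reg) ||
	    !(c->dir_mode & MODE_STICKY) ||
	    c->file_uid == c->dir_uid ||
	    c->opener_uid == c->file_uid)
		return 0;

	/* World writable always counts; group writable only at level 2. */
	if ((c->dir_mode & MODE_WORLD_WRITE) ||
	    ((c->dir_mode & MODE_GROUP_WRITE) &&
	     ((c->protected_fifos >= 2 && is_fifo) ||
	      (c->protected_regular >= 2 && is_reg))))
		return -EACCES;

	return 0;
}
```

For example, with protected_fifos set to 1, a FIFO owned by uid 1001 in a root-owned directory with mode 01777 is refused for an opener with uid 1000, while the same open succeeds once the sysctl is 0.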
This is a brief list of old vulnerabilities that could have been prevented
by this feature, some of them even allow for privilege escalation:
CVE-2000-1134
CVE-2007-3852
CVE-2008-0525
CVE-2009-0416
CVE-2011-4834
CVE-2015-1838
CVE-2015-7442
CVE-2016-7489
This list is not meant to be complete. It's difficult to track down all
vulnerabilities of this kind because they were often reported without any
mention of this particular attack vector. In fact, before
hardlinks/symlinks restrictions, fifos/regular files weren't the favorite
vehicle to exploit them.
[s.mesoraca16(a)gmail.com: fix bug reported by Dan Carpenter]
Link: https://lkml.kernel.org/r/20180426081456.GA7060@mwanda
Link: http://lkml.kernel.org/r/1524829819-11275-1-git-send-email-s.mesoraca16@gma…
[keescook(a)chromium.org: drop pr_warn_ratelimited() in favor of audit changes in the future]
[keescook(a)chromium.org: adjust commit subject]
Link: http://lkml.kernel.org/r/20180416175918.GA13494@beast
Signed-off-by: Salvatore Mesoraca <s.mesoraca16(a)gmail.com>
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Suggested-by: Solar Designer <solar(a)openwall.com>
Suggested-by: Kees Cook <keescook(a)chromium.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
[backported to 4.4 by Loic]
Cc: Loic <hackurx(a)opensec.fr>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
---
Documentation/sysctl/fs.txt | 36 ++++++++++++++++++++++++++++++
fs/namei.c | 53 ++++++++++++++++++++++++++++++++++++++++++---
include/linux/fs.h | 2 ++
kernel/sysctl.c | 18 +++++++++++++++
4 files changed, 106 insertions(+), 3 deletions(-)
diff -Nurp a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.txt
--- a/Documentation/sysctl/fs.txt 2018-10-20 09:52:38.000000000 +0200
+++ b/Documentation/sysctl/fs.txt 2018-10-23 18:08:20.398649373 +0200
@@ -34,7 +34,9 @@ Currently, these files are in /proc/sys/
- overflowgid
- pipe-user-pages-hard
- pipe-user-pages-soft
+- protected_fifos
- protected_hardlinks
+- protected_regular
- protected_symlinks
- suid_dumpable
- super-max
@@ -182,6 +184,24 @@ applied.
==============================================================
+protected_fifos:
+
+The intent of this protection is to avoid unintentional writes to
+an attacker-controlled FIFO, where a program expected to create a regular
+file.
+
+When set to "0", writing to FIFOs is unrestricted.
+
+When set to "1" don't allow O_CREAT open on FIFOs that we don't own
+in world writable sticky directories, unless they are owned by the
+owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+This protection is based on the restrictions in Openwall.
+
+==============================================================
+
protected_hardlinks:
A long-standing class of security issues is the hardlink-based
@@ -202,6 +222,22 @@ This protection is based on the restrict
==============================================================
+protected_regular:
+
+This protection is similar to protected_fifos, but it
+avoids writes to an attacker-controlled regular file, where a program
+expected to create one.
+
+When set to "0", writing to regular files is unrestricted.
+
+When set to "1" don't allow O_CREAT open on regular files that we
+don't own in world writable sticky directories, unless they are
+owned by the owner of the directory.
+
+When set to "2" it also applies to group writable sticky directories.
+
+==============================================================
+
protected_symlinks:
A long-standing class of security issues is the symlink-based
diff -Nurp a/fs/namei.c b/fs/namei.c
--- a/fs/namei.c 2018-10-20 09:52:38.000000000 +0200
+++ b/fs/namei.c 2018-10-23 18:09:35.450879869 +0200
@@ -869,6 +869,8 @@ static inline void put_link(struct namei
int sysctl_protected_symlinks __read_mostly = 0;
int sysctl_protected_hardlinks __read_mostly = 0;
+int sysctl_protected_fifos __read_mostly;
+int sysctl_protected_regular __read_mostly;
/**
* may_follow_link - Check symlink following for unsafe situations
@@ -982,6 +984,45 @@ static int may_linkat(struct path *link)
return -EPERM;
}
+/**
+ * may_create_in_sticky - Check whether an O_CREAT open in a sticky directory
+ * should be allowed, or not, on files that already
+ * exist.
+ * @dir: the sticky parent directory
+ * @inode: the inode of the file to open
+ *
+ * Block an O_CREAT open of a FIFO (or a regular file) when:
+ * - sysctl_protected_fifos (or sysctl_protected_regular) is enabled
+ * - the file already exists
+ * - we are in a sticky directory
+ * - we don't own the file
+ * - the owner of the directory doesn't own the file
+ * - the directory is world writable
+ * If the sysctl_protected_fifos (or sysctl_protected_regular) is set to 2
+ * the directory doesn't have to be world writable: being group writable will
+ * be enough.
+ *
+ * Returns 0 if the open is allowed, -ve on error.
+ */
+static int may_create_in_sticky(struct dentry * const dir,
+ struct inode * const inode)
+{
+ if ((!sysctl_protected_fifos && S_ISFIFO(inode->i_mode)) ||
+ (!sysctl_protected_regular && S_ISREG(inode->i_mode)) ||
+ likely(!(dir->d_inode->i_mode & S_ISVTX)) ||
+ uid_eq(inode->i_uid, dir->d_inode->i_uid) ||
+ uid_eq(current_fsuid(), inode->i_uid))
+ return 0;
+
+ if (likely(dir->d_inode->i_mode & 0002) ||
+ (dir->d_inode->i_mode & 0020 &&
+ ((sysctl_protected_fifos >= 2 && S_ISFIFO(inode->i_mode)) ||
+ (sysctl_protected_regular >= 2 && S_ISREG(inode->i_mode))))) {
+ return -EACCES;
+ }
+ return 0;
+}
+
static __always_inline
const char *get_link(struct nameidata *nd)
{
@@ -3166,9 +3207,15 @@ finish_open:
error = -ELOOP;
goto out;
}
- error = -EISDIR;
- if ((open_flag & O_CREAT) && d_is_dir(nd->path.dentry))
- goto out;
+ if (open_flag & O_CREAT) {
+ error = -EISDIR;
+ if (d_is_dir(nd->path.dentry))
+ goto out;
+ error = may_create_in_sticky(dir,
+ d_backing_inode(nd->path.dentry));
+ if (unlikely(error))
+ goto out;
+ }
error = -ENOTDIR;
if ((nd->flags & LOOKUP_DIRECTORY) && !d_can_lookup(nd->path.dentry))
goto out;
diff -Nurp a/include/linux/fs.h b/include/linux/fs.h
--- a/include/linux/fs.h 2018-10-20 09:52:38.000000000 +0200
+++ b/include/linux/fs.h 2018-10-23 18:08:20.402649386 +0200
@@ -65,6 +65,8 @@ extern struct inodes_stat_t inodes_stat;
extern int leases_enable, lease_break_time;
extern int sysctl_protected_symlinks;
extern int sysctl_protected_hardlinks;
+extern int sysctl_protected_fifos;
+extern int sysctl_protected_regular;
struct buffer_head;
typedef int (get_block_t)(struct inode *inode, sector_t iblock,
diff -Nurp a/kernel/sysctl.c b/kernel/sysctl.c
--- a/kernel/sysctl.c 2018-10-20 09:52:38.000000000 +0200
+++ b/kernel/sysctl.c 2018-10-23 18:08:20.402649386 +0200
@@ -1716,6 +1716,24 @@ static struct ctl_table fs_table[] = {
.extra2 = &one,
},
{
+ .procname = "protected_fifos",
+ .data = &sysctl_protected_fifos,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &two,
+ },
+ {
+ .procname = "protected_regular",
+ .data = &sysctl_protected_regular,
+ .maxlen = sizeof(int),
+ .mode = 0600,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &two,
+ },
+ {
.procname = "suid_dumpable",
.data = &suid_dumpable,
.maxlen = sizeof(int),
Please apply 0929983e49c8 ("media: ov5640: fix framerate update") to
Linux 4.19.y stable, along with the fixes listed below, in
order from top to bottom. This fixes multiple issues in the 4.19
kernel and allows my imx6q to sample the ov5640 module and stream to
the LCD.
Thank you,
Cc: <stable(a)vger.kernel.org> # 4.19.x: fb98e29ff1ea ("media: ov5640:
fix mode change regression")
Cc: <stable(a)vger.kernel.org> # 4.19.x: aa4bb8b8838 ("media: ov5640:
Re-work MIPI startup sequence")
Cc: <stable(a)vger.kernel.org> # 4.19.x: bad1774ed41 ("media: ov5640:
Fix timings setup code")
Cc: <stable(a)vger.kernel.org> # 4.19.x: dc29a1c187e ("media: ov5640:
fix exposure regression")
Cc: <stable(a)vger.kernel.org> # 4.19.x: 3cca8ef5f774 ("media: ov5640:
fix auto gain & exposure when changing mode")
Cc: <stable(a)vger.kernel.org> # 4.19.x: c2c3f42df4dd ("media: ov5640:
fix wrong binning value in exposure")
Cc: <stable(a)vger.kernel.org> # 4.19.x: a8f438c684ea ("media: ov5640:
fix auto controls values when switching to")
Cc: <stable(a)vger.kernel.org> # 4.19.x: 985cdcb08a04 ("media: ov5640:
fix restore of last mode set")
Signed-off-by: Adam Ford <aford173(a)gmail.com>