From: Daniel Borkmann <daniel(a)iogearbox.net>
commit f7bd9e36ee4a4ce38e1cddd7effe6c0d9943285b upstream
Add a bpf_check_basics_ok() and reject filters that are of invalid
size much earlier, so we don't do any useless work such as invoking
bpf_prog_alloc(). Currently, rejection happens in bpf_check_classic()
only, but it's really unnecessarily late and they should be rejected
at the earliest point. While at it, also clean up one bpf_prog_size() to
make it consistent with the remaining invocations.
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Zubin Mithra <zsm(a)chromium.org>
---
Notes:
* Syzkaller reported a kernel BUG related to a kernel paging request in
bpf_prog_create with the following stacktrace when fuzzing a 4.4 kernel.
Call Trace:
[<ffffffff822ac1c8>] bpf_prog_create+0xc8/0x210 net/core/filter.c:1067
[<ffffffff82454699>] bpf_mt_check+0xb9/0x120 net/netfilter/xt_bpf.c:31
[<ffffffff82437db8>] xt_check_match+0x238/0x730 net/netfilter/x_tables.c:409
[<ffffffff82940254>] ebt_check_match net/bridge/netfilter/ebtables.c:380 [inline]
[<ffffffff82940254>] ebt_check_entry+0x844/0x1740 net/bridge/netfilter/ebtables.c:709
[<ffffffff82946842>] translate_table+0xcb2/0x1e80 net/bridge/netfilter/ebtables.c:946
[<ffffffff8294a918>] do_replace_finish+0x6e8/0x1fd0 net/bridge/netfilter/ebtables.c:1002
[<ffffffff8294c419>] do_replace+0x219/0x370 net/bridge/netfilter/ebtables.c:1145
[<ffffffff8294c649>] do_ebt_set_ctl+0xd9/0x110 net/bridge/netfilter/ebtables.c:1492
[<ffffffff8239a87c>] nf_sockopt net/netfilter/nf_sockopt.c:105 [inline]
[<ffffffff8239a87c>] nf_setsockopt+0x6c/0xc0 net/netfilter/nf_sockopt.c:114
[<ffffffff825ddeb6>] ip_setsockopt+0xa6/0xc0 net/ipv4/ip_sockglue.c:1226
[<ffffffff825fd3c7>] tcp_setsockopt+0x87/0xd0 net/ipv4/tcp.c:2701
[<ffffffff8220343a>] sock_common_setsockopt+0x9a/0xe0 net/core/sock.c:2690
[<ffffffff822006ed>] SYSC_setsockopt net/socket.c:1767 [inline]
[<ffffffff822006ed>] SyS_setsockopt+0x15d/0x240 net/socket.c:1746
[<ffffffff82a16f9b>] entry_SYSCALL_64_fastpath+0x18/0x94
* This patch resolves the following conflicts when applying to v4.4.y:
- __get_filter does not exist in v4.4. Instead the checks are moved into
__sk_attach_filter.
* This patch is present in v4.9.y.
* Tests run: Chrome OS tryjobs, Syzkaller reproducer
net/core/filter.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c
index 1a9ded6af138..3c5f51198c41 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -742,6 +742,17 @@ static bool chk_code_allowed(u16 code_to_probe)
return codes[code_to_probe];
}
+static bool bpf_check_basics_ok(const struct sock_filter *filter,
+ unsigned int flen)
+{
+ if (filter == NULL)
+ return false;
+ if (flen == 0 || flen > BPF_MAXINSNS)
+ return false;
+
+ return true;
+}
+
/**
* bpf_check_classic - verify socket filter code
* @filter: filter to verify
@@ -762,9 +773,6 @@ static int bpf_check_classic(const struct sock_filter *filter,
bool anc_found;
int pc;
- if (flen == 0 || flen > BPF_MAXINSNS)
- return -EINVAL;
-
/* Check the filter code now */
for (pc = 0; pc < flen; pc++) {
const struct sock_filter *ftest = &filter[pc];
@@ -1057,7 +1065,7 @@ int bpf_prog_create(struct bpf_prog **pfp, struct sock_fprog_kern *fprog)
struct bpf_prog *fp;
/* Make sure new filter is there and in the right amounts. */
- if (fprog->filter == NULL)
+ if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
@@ -1104,7 +1112,7 @@ int bpf_prog_create_from_user(struct bpf_prog **pfp, struct sock_fprog *fprog,
int err;
/* Make sure new filter is there and in the right amounts. */
- if (fprog->filter == NULL)
+ if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
fp = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
@@ -1184,7 +1192,6 @@ int __sk_attach_filter(struct sock_fprog *fprog, struct sock *sk,
bool locked)
{
unsigned int fsize = bpf_classic_proglen(fprog);
- unsigned int bpf_fsize = bpf_prog_size(fprog->len);
struct bpf_prog *prog;
int err;
@@ -1192,10 +1199,10 @@ int __sk_attach_filter(struct sock_fprog *fprog, struct sock *sk,
return -EPERM;
/* Make sure new filter is there and in the right amounts. */
- if (fprog->filter == NULL)
+ if (!bpf_check_basics_ok(fprog->filter, fprog->len))
return -EINVAL;
- prog = bpf_prog_alloc(bpf_fsize, 0);
+ prog = bpf_prog_alloc(bpf_prog_size(fprog->len), 0);
if (!prog)
return -ENOMEM;
--
2.21.0.593.g511ec345e18-goog
Hello,
We ran automated tests on a patchset that was proposed for merging into this
kernel tree. The patches were applied to:
Kernel repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: d3da1f09fff2 - Linux 5.0.10
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: FAILED
We attempted to compile the kernel for multiple architectures, but the compile
failed on one or more architectures:
s390x: FAILED (see build-s390x.log.xz attachment)
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
______________________________________________________________________________
Merge testing
-------------
We cloned this repository and checked out the following commit:
Repo: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
Commit: d3da1f09fff2 - Linux 5.0.10
We then merged the patchset with `git am`:
netfilter-nf_tables-bogus-ebusy-when-deleting-set-af.patch
netfilter-nf_tables-bogus-ebusy-in-helper-removal-fr.patch
intel_th-gth-fix-an-off-by-one-in-output-unassigning.patch
powerpc-vdso32-fix-clock_monotonic-on-ppc64.patch
alsa-hda-realtek-move-to-act_init-state.patch
fs-proc-proc_sysctl.c-fix-a-null-pointer-dereference.patch
block-bfq-fix-use-after-free-in-bfq_bfqq_expire.patch
cifs-fix-memory-leak-in-smb2_read.patch
cifs-fix-page-reference-leak-with-readv-writev.patch
cifs-do-not-attempt-cifs-operation-on-smb2-rename-error.patch
tracing-fix-a-memory-leak-by-early-error-exit-in-trace_pid_write.patch
tracing-fix-buffer_ref-pipe-ops.patch
crypto-xts-fix-atomic-sleep-when-walking-skcipher.patch
crypto-lrw-fix-atomic-sleep-when-walking-skcipher.patch
gpio-eic-sprd-fix-incorrect-irq-type-setting-for-the-sync-eic.patch
zram-pass-down-the-bvec-we-need-to-read-into-in-the-work-struct.patch
lib-kconfig.debug-fix-build-error-without-config_block.patch
mips-scall64-o32-fix-indirect-syscall-number-load.patch
trace-fix-preempt_enable_no_resched-abuse.patch
mm-do-not-boost-watermarks-to-avoid-fragmentation-for-the-discontig-memory-model.patch
arm64-mm-ensure-tail-of-unaligned-initrd-is-reserved.patch
ib-rdmavt-fix-frwr-memory-registration.patch
rdma-mlx5-do-not-allow-the-user-to-write-to-the-clock-page.patch
rdma-mlx5-use-rdma_user_map_io-for-mapping-bar-pages.patch
rdma-ucontext-fix-regression-with-disassociate.patch
sched-numa-fix-a-possible-divide-by-zero.patch
ceph-only-use-d_name-directly-when-parent-is-locked.patch
ceph-ensure-d_name-stability-in-ceph_dentry_hash.patch
ceph-fix-ci-i_head_snapc-leak.patch
nfsd-don-t-release-the-callback-slot-unless-it-was-actually-held.patch
nfsd-wake-waiters-blocked-on-file_lock-before-deleting-it.patch
nfsd-wake-blocked-file-lock-waiters-before-sending-callback.patch
sunrpc-don-t-mark-uninitialised-items-as-valid.patch
perf-x86-intel-update-kbl-package-c-state-events-to-also-include-pc8-pc9-pc10-counters.patch
input-synaptics-rmi4-write-config-register-values-to-the-right-offset.patch
vfio-type1-limit-dma-mappings-per-container.patch
dmaengine-sh-rcar-dmac-with-cyclic-dma-residue-0-is-valid.patch
dmaengine-sh-rcar-dmac-fix-glitch-in-dmaengine_tx_status.patch
dmaengine-mediatek-cqdma-fix-wrong-register-usage-in-mtk_cqdma_start.patch
arm-8857-1-efi-enable-cp15-dmb-instructions-before-cleaning-the-cache.patch
powerpc-mm-radix-make-radix-require-hugetlb_page.patch
drm-vc4-fix-memory-leak-during-gpu-reset.patch
drm-ttm-fix-re-init-of-global-structures.patch
revert-drm-i915-fbdev-actually-configure-untiled-displays.patch
drm-vc4-fix-compilation-error-reported-by-kbuild-test-bot.patch
usb-add-new-usb-lpm-helpers.patch
usb-consolidate-lpm-checks-to-avoid-enabling-lpm-twice.patch
ext4-fix-some-error-pointer-dereferences.patch
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
kernel build: https://artifacts.cki-project.org/builds/aarch64/kernel-stable_queue-aarch6…
ppc64le:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
kernel build: https://artifacts.cki-project.org/builds/ppc64le/kernel-stable_queue-ppc64l…
s390x:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
build options: -j20 INSTALL_MOD_STRIP=1 targz-pkg
configuration: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
kernel build: https://artifacts.cki-project.org/builds/x86_64/kernel-stable_queue-x86_64-…
From: Todd Kjos <tkjos(a)android.com>
commit 5cec2d2e5839f9c0fec319c523a911e0a7fd299f upstream.
An munmap() on a binder device causes binder_vma_close() to be called
which clears the alloc->vma pointer.
If direct reclaim causes binder_alloc_free_page() to be called, there
is a race where alloc->vma is read into a local vma pointer and then
used later after the mm->mmap_sem is acquired. This can result in
calling zap_page_range() with an invalid vma which manifests as a
use-after-free in zap_page_range().
The fix is to check alloc->vma after acquiring the mmap_sem (which we
were acquiring anyway) and skip zap_page_range() if it has changed
to NULL.
Signed-off-by: Todd Kjos <tkjos(a)google.com>
Reviewed-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: stable <stable(a)vger.kernel.org> # 5.0, 4.19, 4.14
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Greg: This applies to 5.0, 4.19, 4.14. Not needed before 4.12.
drivers/android/binder_alloc.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 022cd80e80cc..a6e556bf62df 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -959,14 +959,13 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
index = page - alloc->pages;
page_addr = (uintptr_t)alloc->buffer + index * PAGE_SIZE;
+
+ mm = alloc->vma_vm_mm;
+ if (!mmget_not_zero(mm))
+ goto err_mmget;
+ if (!down_write_trylock(&mm->mmap_sem))
+ goto err_down_write_mmap_sem_failed;
vma = binder_alloc_get_vma(alloc);
- if (vma) {
- if (!mmget_not_zero(alloc->vma_vm_mm))
- goto err_mmget;
- mm = alloc->vma_vm_mm;
- if (!down_write_trylock(&mm->mmap_sem))
- goto err_down_write_mmap_sem_failed;
- }
list_lru_isolate(lru, item);
spin_unlock(lock);
@@ -979,10 +978,9 @@ enum lru_status binder_alloc_free_page(struct list_head *item,
PAGE_SIZE);
trace_binder_unmap_user_end(alloc, index);
-
- up_write(&mm->mmap_sem);
- mmput(mm);
}
+ up_write(&mm->mmap_sem);
+ mmput(mm);
trace_binder_unmap_kernel_start(alloc, index);
--
2.21.0.392.gf8f6787159e-goog
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8adddf349fda0d3de2f6bb41ddf838cbf36a8ad2 Mon Sep 17 00:00:00 2001
From: Michael Ellerman <mpe(a)ellerman.id.au>
Date: Tue, 16 Apr 2019 23:59:02 +1000
Subject: [PATCH] powerpc/mm/radix: Make Radix require HUGETLB_PAGE
Joel reported weird crashes using skiroot_defconfig, in his case we
jumped into an NX page:
kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002bff4f0
Looking at the disassembly, we had simply branched to that address:
c000000000c001bc 49fff335 bl c000000002bff4f0
But that didn't match the original kernel image:
c000000000c001bc 4bfff335 bl c000000000bff4f0 <kobject_get+0x8>
When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we
call radix__change_memory_range() late in boot to change page
protections. We do that both to mark rodata read only and also to mark
init text no-execute. That involves walking the kernel page tables,
and clearing _PAGE_WRITE or _PAGE_EXEC respectively.
With radix we may use hugepages for the linear mapping, so the code in
radix__change_memory_range() uses e.g. pmd_huge() to test if it has
found a huge mapping, and if so it stops the page table walk and
changes the PMD permissions.
However if the kernel is built without HUGETLBFS support, pmd_huge()
is just a #define that always returns 0. That causes the code in
radix__change_memory_range() to incorrectly interpret the PMD value as
a pointer to a PTE page rather than as a PTE at the PMD level.
We can see this using `dv` in xmon which also uses pmd_huge():
0:mon> dv c000000000000000
pgd @ 0xc000000001740000
pgdp @ 0xc000000001740000 = 0x80000000ffffb009
pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009
pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f <- this is a PTE
ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d <- kernel text
The end result is we treat the value at 0xc000000000000100 as a PTE
and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code
at that address.
In Joel's specific case we cleared the sign bit in the offset of the
branch, causing a backward branch to turn into a forward branch which
caused us to branch into a non-executable page. However the exact
nature of the crash depends on kernel version, compiler version, and
other factors.
We need to fix radix__change_memory_range() to not use accessors that
depend on HUGETLBFS, but we also have radix memory hotplug code that
uses pmd_huge() etc that will also need fixing. So for now just
disallow the broken combination of Radix with HUGETLBFS disabled.
The only defconfig we have that is affected is skiroot_defconfig, so
turn on HUGETLBFS there so that it still gets Radix.
Fixes: 566ca99af026 ("powerpc/mm/radix: Add dummy radix_enabled()")
Cc: stable(a)vger.kernel.org # v4.7+
Reported-by: Joel Stanley <joel(a)jms.id.au>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index 5ba131c30f6b..1bcd468ab422 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -266,6 +266,7 @@ CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_PROC_KCORE=y
+CONFIG_HUGETLBFS=y
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS=y
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 842b2c7e156a..50cd09b4e05d 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -324,7 +324,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
config PPC_RADIX_MMU
bool "Radix MMU Support"
- depends on PPC_BOOK3S_64
+ depends on PPC_BOOK3S_64 && HUGETLB_PAGE
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
default y
help
The frequency calculation was based on the current (max) frequency of the
CPU. However, for low frequency, the value used was already the parent
frequency divided by a factor of 2.
Instead of using this frequency, this fix gets the frequency directly from
the parent clock.
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Cc: <stable(a)vger.kernel.org>
Reported-by: Christian Neubert <christian.neubert.86(a)gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
---
drivers/cpufreq/armada-37xx-cpufreq.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
index ad4463e4266e..a0962463805e 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -373,11 +373,11 @@ static int __init armada37xx_cpufreq_driver_init(void)
struct armada_37xx_dvfs *dvfs;
struct platform_device *pdev;
unsigned long freq;
- unsigned int cur_frequency;
+ unsigned int cur_frequency, base_frequency;
struct regmap *nb_pm_base, *avs_base;
struct device *cpu_dev;
int load_lvl, ret;
- struct clk *clk;
+ struct clk *clk, *parent;
nb_pm_base =
syscon_regmap_lookup_by_compatible("marvell,armada-3700-nb-pm");
@@ -413,6 +413,22 @@ static int __init armada37xx_cpufreq_driver_init(void)
return PTR_ERR(clk);
}
+ parent = clk_get_parent(clk);
+ if (IS_ERR(parent)) {
+ dev_err(cpu_dev, "Cannot get parent clock for CPU0\n");
+ clk_put(clk);
+ return PTR_ERR(parent);
+ }
+
+ /* Get parent CPU frequency */
+ base_frequency = clk_get_rate(parent);
+
+ if (!base_frequency) {
+ dev_err(cpu_dev, "Failed to get parent clock rate for CPU\n");
+ clk_put(clk);
+ return -EINVAL;
+ }
+
/* Get nominal (current) CPU frequency */
cur_frequency = clk_get_rate(clk);
if (!cur_frequency) {
@@ -445,7 +461,7 @@ static int __init armada37xx_cpufreq_driver_init(void)
for (load_lvl = ARMADA_37XX_DVFS_LOAD_0; load_lvl < LOAD_LEVEL_NR;
load_lvl++) {
unsigned long u_volt = avs_map[dvfs->avs[load_lvl]] * 1000;
- freq = cur_frequency / dvfs->divider[load_lvl];
+ freq = base_frequency / dvfs->divider[load_lvl];
ret = dev_pm_opp_add(cpu_dev, freq, u_volt);
if (ret)
goto remove_opp;
--
2.20.1
On Mon, Apr 29, 2019 at 11:11:49AM +0200, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> RDMA/ucontext: Fix regression with disassociate
>
> to the 5.0-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> rdma-ucontext-fix-regression-with-disassociate.patch
> and it can be found in the queue-5.0 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Greg,
Please be aware that this patch has compilation issues on the s390 platform.
https://patchwork.kernel.org/patch/10920895/#22610993
Thanks
>
>
> From 67f269b37f9b4d52c5e7f97acea26c0852e9b8a1 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgg(a)mellanox.com>
> Date: Tue, 16 Apr 2019 14:07:28 +0300
> Subject: RDMA/ucontext: Fix regression with disassociate
>
> From: Jason Gunthorpe <jgg(a)mellanox.com>
>
> commit 67f269b37f9b4d52c5e7f97acea26c0852e9b8a1 upstream.
>
> When this code was consolidated the intention was that the VMA would
> become backed by anonymous zero pages after the zap_vma_pte - however this
> very subtly relied on setting the vm_ops = NULL and clearing the VM_SHARED
> bits to transform the VMA into an anonymous VMA. Since the vm_ops was
> removed this broke.
>
> Now userspace gets a SIGBUS if it touches the vma after disassociation.
>
> Instead of converting the VMA to anonymous provide a fault handler that
> puts a zero'd page into the VMA when user-space touches it after
> disassociation.
>
> Cc: stable(a)vger.kernel.org
> Suggested-by: Andrea Arcangeli <aarcange(a)redhat.com>
> Fixes: 5f9794dc94f5 ("RDMA/ucontext: Add a core API for mmaping driver IO memory")
> Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
> Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
> Signed-off-by: Jason Gunthorpe <jgg(a)mellanox.com>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>
> ---
> drivers/infiniband/core/uverbs.h | 1
> drivers/infiniband/core/uverbs_main.c | 52 ++++++++++++++++++++++++++++++++--
> 2 files changed, 50 insertions(+), 3 deletions(-)
>
> --- a/drivers/infiniband/core/uverbs.h
> +++ b/drivers/infiniband/core/uverbs.h
> @@ -160,6 +160,7 @@ struct ib_uverbs_file {
>
> struct mutex umap_lock;
> struct list_head umaps;
> + struct page *disassociate_page;
>
> struct idr idr;
> /* spinlock protects write access to idr */
> --- a/drivers/infiniband/core/uverbs_main.c
> +++ b/drivers/infiniband/core/uverbs_main.c
> @@ -208,6 +208,9 @@ void ib_uverbs_release_file(struct kref
> kref_put(&file->async_file->ref,
> ib_uverbs_release_async_event_file);
> put_device(&file->device->dev);
> +
> + if (file->disassociate_page)
> + __free_pages(file->disassociate_page, 0);
> kfree(file);
> }
>
> @@ -876,9 +879,50 @@ static void rdma_umap_close(struct vm_ar
> kfree(priv);
> }
>
> +/*
> + * Once the zap_vma_ptes has been called touches to the VMA will come here and
> + * we return a dummy writable zero page for all the pfns.
> + */
> +static vm_fault_t rdma_umap_fault(struct vm_fault *vmf)
> +{
> + struct ib_uverbs_file *ufile = vmf->vma->vm_file->private_data;
> + struct rdma_umap_priv *priv = vmf->vma->vm_private_data;
> + vm_fault_t ret = 0;
> +
> + if (!priv)
> + return VM_FAULT_SIGBUS;
> +
> + /* Read only pages can just use the system zero page. */
> + if (!(vmf->vma->vm_flags & (VM_WRITE | VM_MAYWRITE))) {
> + vmf->page = ZERO_PAGE(vmf->vm_start);
> + get_page(vmf->page);
> + return 0;
> + }
> +
> + mutex_lock(&ufile->umap_lock);
> + if (!ufile->disassociate_page)
> + ufile->disassociate_page =
> + alloc_pages(vmf->gfp_mask | __GFP_ZERO, 0);
> +
> + if (ufile->disassociate_page) {
> + /*
> + * This VMA is forced to always be shared so this doesn't have
> + * to worry about COW.
> + */
> + vmf->page = ufile->disassociate_page;
> + get_page(vmf->page);
> + } else {
> + ret = VM_FAULT_SIGBUS;
> + }
> + mutex_unlock(&ufile->umap_lock);
> +
> + return ret;
> +}
> +
> static const struct vm_operations_struct rdma_umap_ops = {
> .open = rdma_umap_open,
> .close = rdma_umap_close,
> + .fault = rdma_umap_fault,
> };
>
> static struct rdma_umap_priv *rdma_user_mmap_pre(struct ib_ucontext *ucontext,
> @@ -888,6 +932,9 @@ static struct rdma_umap_priv *rdma_user_
> struct ib_uverbs_file *ufile = ucontext->ufile;
> struct rdma_umap_priv *priv;
>
> + if (!(vma->vm_flags & VM_SHARED))
> + return ERR_PTR(-EINVAL);
> +
> if (vma->vm_end - vma->vm_start != size)
> return ERR_PTR(-EINVAL);
>
> @@ -991,7 +1038,7 @@ void uverbs_user_mmap_disassociate(struc
> * at a time to get the lock ordering right. Typically there
> * will only be one mm, so no big deal.
> */
> - down_write(&mm->mmap_sem);
> + down_read(&mm->mmap_sem);
> if (!mmget_still_valid(mm))
> goto skip_mm;
> mutex_lock(&ufile->umap_lock);
> @@ -1005,11 +1052,10 @@ void uverbs_user_mmap_disassociate(struc
>
> zap_vma_ptes(vma, vma->vm_start,
> vma->vm_end - vma->vm_start);
> - vma->vm_flags &= ~(VM_SHARED | VM_MAYSHARE);
> }
> mutex_unlock(&ufile->umap_lock);
> skip_mm:
> - up_write(&mm->mmap_sem);
> + up_read(&mm->mmap_sem);
> mmput(mm);
> }
> }
>
>
> Patches currently in stable-queue which might be from jgg(a)mellanox.com are
>
> queue-5.0/rdma-ucontext-fix-regression-with-disassociate.patch
> queue-5.0/ib-rdmavt-fix-frwr-memory-registration.patch
> queue-5.0/rdma-mlx5-use-rdma_user_map_io-for-mapping-bar-pages.patch
> queue-5.0/rdma-mlx5-do-not-allow-the-user-to-write-to-the-clock-page.patch
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9fa246256e09dc30820524401cdbeeaadee94025 Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied(a)redhat.com>
Date: Wed, 24 Apr 2019 10:47:56 +1000
Subject: [PATCH] Revert "drm/i915/fbdev: Actually configure untiled displays"
This reverts commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5.
This commit is documented to break userspace X.org modesetting driver in certain configurations.
The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.
This has been reported a few times; saying it's a userspace problem is clearly against the regression rules.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806
Signed-off-by: Dave Airlie <airlied(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # v3.19+
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index e8f694b57b8a..376ffe842e26 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -338,8 +338,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
bool *enabled, int width, int height)
{
struct drm_i915_private *dev_priv = to_i915(fb_helper->dev);
+ unsigned long conn_configured, conn_seq, mask;
unsigned int count = min(fb_helper->connector_count, BITS_PER_LONG);
- unsigned long conn_configured, conn_seq;
int i, j;
bool *save_enabled;
bool fallback = true, ret = true;
@@ -357,9 +357,10 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
drm_modeset_backoff(&ctx);
memcpy(save_enabled, enabled, count);
- conn_seq = GENMASK(count - 1, 0);
+ mask = GENMASK(count - 1, 0);
conn_configured = 0;
retry:
+ conn_seq = conn_configured;
for (i = 0; i < count; i++) {
struct drm_fb_helper_connector *fb_conn;
struct drm_connector *connector;
@@ -372,8 +373,7 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
if (conn_configured & BIT(i))
continue;
- /* First pass, only consider tiled connectors */
- if (conn_seq == GENMASK(count - 1, 0) && !connector->has_tile)
+ if (conn_seq == 0 && !connector->has_tile)
continue;
if (connector->status == connector_status_connected)
@@ -477,10 +477,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
conn_configured |= BIT(i);
}
- if (conn_configured != conn_seq) { /* repeat until no more are found */
- conn_seq = conn_configured;
+ if ((conn_configured & mask) != mask && conn_configured != conn_seq)
goto retry;
- }
/*
* If the BIOS didn't enable everything it could, fall back to have the
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9fa246256e09dc30820524401cdbeeaadee94025 Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied(a)redhat.com>
Date: Wed, 24 Apr 2019 10:47:56 +1000
Subject: [PATCH] Revert "drm/i915/fbdev: Actually configure untiled displays"
This reverts commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5.
This commit is documented to break userspace X.org modesetting driver in certain configurations.
The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.
This has been reported a few times; saying it's a userspace problem is clearly against the regression rules.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806
Signed-off-by: Dave Airlie <airlied(a)redhat.com>
Cc: <stable(a)vger.kernel.org> # v3.19+
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index e8f694b57b8a..376ffe842e26 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -338,8 +338,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
bool *enabled, int width, int height)
{
struct drm_i915_private *dev_priv = to_i915(fb_helper->dev);
+ unsigned long conn_configured, conn_seq, mask;
unsigned int count = min(fb_helper->connector_count, BITS_PER_LONG);
- unsigned long conn_configured, conn_seq;
int i, j;
bool *save_enabled;
bool fallback = true, ret = true;
@@ -357,9 +357,10 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
drm_modeset_backoff(&ctx);
memcpy(save_enabled, enabled, count);
- conn_seq = GENMASK(count - 1, 0);
+ mask = GENMASK(count - 1, 0);
conn_configured = 0;
retry:
+ conn_seq = conn_configured;
for (i = 0; i < count; i++) {
struct drm_fb_helper_connector *fb_conn;
struct drm_connector *connector;
@@ -372,8 +373,7 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
if (conn_configured & BIT(i))
continue;
- /* First pass, only consider tiled connectors */
- if (conn_seq == GENMASK(count - 1, 0) && !connector->has_tile)
+ if (conn_seq == 0 && !connector->has_tile)
continue;
if (connector->status == connector_status_connected)
@@ -477,10 +477,8 @@ static bool intel_fb_initial_config(struct drm_fb_helper *fb_helper,
conn_configured |= BIT(i);
}
- if (conn_configured != conn_seq) { /* repeat until no more are found */
- conn_seq = conn_configured;
+ if ((conn_configured & mask) != mask && conn_configured != conn_seq)
goto retry;
- }
/*
* If the BIOS didn't enable everything it could, fall back to have the
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8adddf349fda0d3de2f6bb41ddf838cbf36a8ad2 Mon Sep 17 00:00:00 2001
From: Michael Ellerman <mpe(a)ellerman.id.au>
Date: Tue, 16 Apr 2019 23:59:02 +1000
Subject: [PATCH] powerpc/mm/radix: Make Radix require HUGETLB_PAGE
Joel reported weird crashes using skiroot_defconfig, in his case we
jumped into an NX page:
kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0xc000000002bff4f0
Looking at the disassembly, we had simply branched to that address:
c000000000c001bc 49fff335 bl c000000002bff4f0
But that didn't match the original kernel image:
c000000000c001bc 4bfff335 bl c000000000bff4f0 <kobject_get+0x8>
When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we
call radix__change_memory_range() late in boot to change page
protections. We do that both to mark rodata read only and also to mark
init text no-execute. That involves walking the kernel page tables,
and clearing _PAGE_WRITE or _PAGE_EXEC respectively.
With radix we may use hugepages for the linear mapping, so the code in
radix__change_memory_range() uses e.g. pmd_huge() to test if it has
found a huge mapping, and if so it stops the page table walk and
changes the PMD permissions.
However if the kernel is built without HUGETLBFS support, pmd_huge()
is just a #define that always returns 0. That causes the code in
radix__change_memory_range() to incorrectly interpret the PMD value as
a pointer to a PTE page rather than as a PTE at the PMD level.
We can see this using `dv` in xmon which also uses pmd_huge():
0:mon> dv c000000000000000
pgd @ 0xc000000001740000
pgdp @ 0xc000000001740000 = 0x80000000ffffb009
pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009
pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f <- this is a PTE
ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d <- kernel text
The end result is we treat the value at 0xc000000000000100 as a PTE
and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code
at that address.
In Joel's specific case we cleared the sign bit in the offset of the
branch, causing a backward branch to turn into a forward branch which
caused us to branch into a non-executable page. However the exact
nature of the crash depends on kernel version, compiler version, and
other factors.
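The two disassembly lines above differ in a single bit, which can be
checked by decoding the branch by hand. The helper below is a sketch
(branch_target() is a hypothetical name, not a kernel function) that
assumes the standard I-form encoding of a relative "bl": a 24-bit LI
field in instruction bits 0x03fffffc, whose top bit 0x02000000 is the
sign of the byte offset.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compute the target of a PowerPC I-form relative branch (b/bl with AA=0)
 * located at address pc.
 */
static uint64_t branch_target(uint64_t pc, uint32_t insn)
{
	assert((insn >> 26) == 18);	/* primary opcode 18: b/bl/ba/bla */
	assert((insn & 0x2) == 0);	/* AA=0: target is relative to pc */

	int64_t offset = insn & 0x03fffffc;	/* LI field shifted left by 2 */
	if (offset & 0x02000000)
		offset -= 0x04000000;		/* sign-extend from 26 bits */

	return pc + (uint64_t)offset;
}
```

For pc = 0xc000000000c001bc, the original instruction 0x4bfff335 decodes
to 0xc000000000bff4f0 and the corrupted 0x49fff335 decodes to
0xc000000002bff4f0, matching the disassembly. The two instructions XOR
to 0x02000000, i.e. exactly the sign bit of the offset.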
We need to fix radix__change_memory_range() to not use accessors that
depend on HUGETLBFS, but we also have radix memory hotplug code that
uses pmd_huge() etc that will also need fixing. So for now just
disallow the broken combination of Radix with HUGETLBFS disabled.
The only defconfig we have that is affected is skiroot_defconfig, so
turn on HUGETLBFS there so that it still gets Radix.
Fixes: 566ca99af026 ("powerpc/mm/radix: Add dummy radix_enabled()")
Cc: stable(a)vger.kernel.org # v4.7+
Reported-by: Joel Stanley <joel(a)jms.id.au>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index 5ba131c30f6b..1bcd468ab422 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -266,6 +266,7 @@ CONFIG_UDF_FS=m
CONFIG_MSDOS_FS=m
CONFIG_VFAT_FS=m
CONFIG_PROC_KCORE=y
+CONFIG_HUGETLBFS=y
# CONFIG_MISC_FILESYSTEMS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set
CONFIG_NLS=y
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 842b2c7e156a..50cd09b4e05d 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -324,7 +324,7 @@ config ARCH_ENABLE_SPLIT_PMD_PTLOCK
config PPC_RADIX_MMU
bool "Radix MMU Support"
- depends on PPC_BOOK3S_64
+ depends on PPC_BOOK3S_64 && HUGETLB_PAGE
select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
default y
help
On 4/28/19 4:39 PM, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 70800c3c0cc5 locking/rwsem: Scan the wait_list for readers only once.
>
> The bot has tested the following trees: v5.0.10, v4.19.37, v4.14.114, v4.9.171.
>
> v5.0.10: Failed to apply! Possible dependencies:
> 412f34a82ccf ("locking/qspinlock_stat: Track the no MCS node available case")
> 46ad0840b158 ("locking/rwsem: Remove arch specific rwsem files")
> 579afe866f52 ("xtensa: use generic spinlock/rwlock implementation")
> a8654596f037 ("locking/rwsem: Enable lock event counting")
> ad53fa10fa9e ("locking/qspinlock_stat: Introduce generic lockevent_*() counting APIs")
> c7580c1e8443 ("locking/rwsem: Move owner setting code from rwsem.c to rwsem.h")
> ce9084ba0d1d ("x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol")
> d682b596d993 ("locking/qspinlock: Handle > 4 slowpath nesting levels")
> eecec78f7777 ("locking/rwsem: Relocate rwsem_down_read_failed()")
> fb346fd9fc08 ("locking/lock_events: Make lock_events available for all archs & other locks")
>
> v4.19.37: Failed to apply! Possible dependencies:
> 0fa809ca7f81 ("locking/pvqspinlock: Extend node size when pvqspinlock is configured")
> 1222109a5363 ("locking/qspinlock_stat: Count instances of nested lock slowpaths")
> 412f34a82ccf ("locking/qspinlock_stat: Track the no MCS node available case")
> 46ad0840b158 ("locking/rwsem: Remove arch specific rwsem files")
> 4b486b535c33 ("locking/rwsem: Exit read lock slowpath if queue empty & no writer")
> 925b9cd1b89a ("locking/rwsem: Make owner store task pointer of last owning reader")
> a8654596f037 ("locking/rwsem: Enable lock event counting")
> ad53fa10fa9e ("locking/qspinlock_stat: Introduce generic lockevent_*() counting APIs")
> c7580c1e8443 ("locking/rwsem: Move owner setting code from rwsem.c to rwsem.h")
> ce9084ba0d1d ("x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol")
> d682b596d993 ("locking/qspinlock: Handle > 4 slowpath nesting levels")
> eecec78f7777 ("locking/rwsem: Relocate rwsem_down_read_failed()")
> fb346fd9fc08 ("locking/lock_events: Make lock_events available for all archs & other locks")
>
> v4.14.114: Failed to apply! Possible dependencies:
> 0fa809ca7f81 ("locking/pvqspinlock: Extend node size when pvqspinlock is configured")
> 11752adb68a3 ("locking/pvqspinlock: Implement hybrid PV queued/unfair locks")
> 1222109a5363 ("locking/qspinlock_stat: Count instances of nested lock slowpaths")
> 1958b5fc4010 ("x86/boot: Add early boot support when running with SEV active")
> 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> 271ca788774a ("arch: enable relative relocations for arm64, power and x86")
> 412f34a82ccf ("locking/qspinlock_stat: Track the no MCS node available case")
> 81d3dc9a349b ("locking/qspinlock: Add stat tracking for pending vs. slowpath")
> 94d49eb30e85 ("x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME")
> a8654596f037 ("locking/rwsem: Enable lock event counting")
> ad53fa10fa9e ("locking/qspinlock_stat: Introduce generic lockevent_*() counting APIs")
> ce9084ba0d1d ("x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol")
> d682b596d993 ("locking/qspinlock: Handle > 4 slowpath nesting levels")
> d7b417fa08d1 ("x86/mm: Add DMA support for SEV memory encryption")
> d8aa7eea78a1 ("x86/mm: Add Secure Encrypted Virtualization (SEV) support")
> dfaaec9033b8 ("x86: Add support for changing memory encryption attribute in early boot")
> fb346fd9fc08 ("locking/lock_events: Make lock_events available for all archs & other locks")
>
> v4.9.171: Failed to apply! Possible dependencies:
> 1a8b6d76dc5b ("net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering")
> 271ca788774a ("arch: enable relative relocations for arm64, power and x86")
> 29dee3c03abc ("locking/refcounts: Out-of-line everything")
> 40565b5aedd6 ("sched/cputime, powerpc, s390: Make scaled cputime arch specific")
> 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint")
> 51c9c0843993 ("powerpc/kprobes: Implement Optprobes")
> 5b9ff0278598 ("powerpc: Build-time sort the exception table")
> 65c059bcaa73 ("powerpc: Enable support for GCC plugins")
> 9fea59bd7ca5 ("powerpc/mm: Add support for runtime configuration of ASLR limits")
> a7d2475af7ae ("powerpc: Sort the selects under CONFIG_PPC")
> a8654596f037 ("locking/rwsem: Enable lock event counting")
> bd174169c7a1 ("locking/refcount: Add refcount_t API kernel-doc comments")
> ce9084ba0d1d ("x86: Make ARCH_USE_MEMREMAP_PROT a generic Kconfig symbol")
> d557d1b58b35 ("refcount: change EXPORT_SYMBOL markings")
> ebfa50df435e ("powerpc: Add helper to check if offset is within relative branch range")
> f405df5de317 ("refcount_t: Introduce a special purpose refcount type")
> fa769d3f58e6 ("powerpc/32: Enable HW_BREAKPOINT on BOOK3S")
> fb346fd9fc08 ("locking/lock_events: Make lock_events available for all archs & other locks")
> fd25d19f6b8d ("locking/refcount: Create unchecked atomic_t implementation")
>
>
> How should we proceed with this patch?
I will send out a version that will be easier to backport once this
patch lands in the mainline.
Cheers,
Longman
During my rwsem testing, it was found that after a down_read(), the
reader count may occasionally become 0 or even negative. Consequently,
a writer may steal the lock at that time and execute with the reader
in parallel thus breaking the mutual exclusion guarantee of the write
lock. In other words, both readers and writer can become rwsem owners
simultaneously.
The current reader wakeup code works in a single pass: it clears
waiter->task and puts each reader into wake_q before fully incrementing
the reader count. Once waiter->task is cleared, the corresponding reader
may see it, finish the critical section and do unlock to decrement the
count before the count is incremented. This is not a problem if there is
only one reader to wake up as the count has been pre-incremented by 1.
It is a problem if there is more than one reader to be woken up, because
a writer can then steal the lock.
The wakeup was actually done in two passes before the v4.9 commit
70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only
once"). To fix this problem, the wakeup is now done in two passes
again. In the first pass, we collect the readers and count them. The
reader count is then fully incremented. In the second pass,
waiter->task is cleared and the readers are put into wake_q to be
woken up later.
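The ordering can be illustrated with a simplified userspace model. This
is a sketch, not the kernel code: struct waiter, wake_readers_two_pass()
and the plain array used as a wait queue are hypothetical, the count is
a bare number of active readers, and the pre-increment and waiting-bias
handling of the real function are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum waiter_type { WAITING_FOR_READ, WAITING_FOR_WRITE };

struct waiter {
	enum waiter_type type;
	void *task;	/* non-NULL while the waiter is still queued */
};

/*
 * Wake the leading run of readers in q[0..n). Returns the number woken.
 */
static long wake_readers_two_pass(atomic_long *count, struct waiter *q, size_t n)
{
	long woken = 0;

	assert(count != NULL && (q != NULL || n == 0));

	/* Pass 1: only count; no waiter can observe a wakeup yet. */
	for (size_t i = 0; i < n && q[i].type != WAITING_FOR_WRITE; i++)
		woken++;

	/* Publish the full reader count before releasing anyone. */
	atomic_fetch_add(count, woken);

	/*
	 * Pass 2: release the readers. A reader that sees task == NULL and
	 * unlocks immediately now decrements a count that already includes it.
	 */
	for (long i = 0; i < woken; i++)
		q[i].task = NULL;

	return woken;
}
```

The point of the split is that the atomic add sits between the two
loops; a single loop that clears task as it goes would release the first
reader while the count still excludes the later ones.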
Fixes: 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/locking/rwsem-xadd.c | 45 +++++++++++++++++++++++++------------
1 file changed, 31 insertions(+), 14 deletions(-)
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 6b3ee9948bf1..cab5b1f6b2de 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -130,6 +130,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
{
struct rwsem_waiter *waiter, *tmp;
long oldcount, woken = 0, adjustment = 0;
+ struct list_head wlist;
/*
* Take a peek at the queue head waiter such that we can determine
@@ -188,18 +189,44 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
* of the queue. We know that woken will be at least 1 as we accounted
* for above. Note we increment the 'active part' of the count by the
* number of readers before waking any processes up.
+ *
+ * We have to do wakeup in 2 passes to prevent the possibility that
+ * the reader count may be decremented before it is incremented. It
+ * is because the to-be-woken waiter may not have slept yet. So it
+ * may see waiter->task got cleared, finish its critical section and
+ * do an unlock before the reader count increment.
+ *
+ * 1) Collect the read-waiters in a separate list, count them and
+ * fully increment the reader count in rwsem.
+ * 2) For each waiters in the new list, clear waiter->task and
+ * put them into wake_q to be woken up later.
*/
+ INIT_LIST_HEAD(&wlist);
list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
- struct task_struct *tsk;
-
if (waiter->type == RWSEM_WAITING_FOR_WRITE)
break;
woken++;
- tsk = waiter->task;
+ list_move_tail(&waiter->list, &wlist);
+ }
+
+ adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+ lockevent_cond_inc(rwsem_wake_reader, woken);
+ if (list_empty(&sem->wait_list)) {
+ /* hit end of list above */
+ adjustment -= RWSEM_WAITING_BIAS;
+ }
+
+ if (adjustment)
+ atomic_long_add(adjustment, &sem->count);
+
+ /* 2nd pass */
+ list_for_each_entry(waiter, &wlist, list) {
+ struct task_struct *tsk;
+ tsk = waiter->task;
get_task_struct(tsk);
- list_del(&waiter->list);
+
/*
* Ensure calling get_task_struct() before setting the reader
* waiter to nil such that rwsem_down_read_failed() cannot
@@ -213,16 +240,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
*/
wake_q_add_safe(wake_q, tsk);
}
-
- adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
- lockevent_cond_inc(rwsem_wake_reader, woken);
- if (list_empty(&sem->wait_list)) {
- /* hit end of list above */
- adjustment -= RWSEM_WAITING_BIAS;
- }
-
- if (adjustment)
- atomic_long_add(adjustment, &sem->count);
}
/*
--
2.18.1
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes exactly this issue.
However, that commit introduces another issue. Before 45a9c9d909b2,
we were allowed to run the queue during queue cleanup as long as the
queue's kobj refcount was held. After that commit, the queue can't be
run during cleanup; otherwise an oops can be triggered easily because
some fields of hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().
We have invented ways of addressing this kind of issue before, such as:
8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")
But these still can't cover all cases; recently James reported another
issue of this kind:
https://marc.info/?l=linux-scsi&m=155389088124782&w=2
This issue is quite hard to address in the previous way, given that
scsi_run_queue() may run requeues for other LUNs.
Fix the above issue by freeing hctx's resources in its release handler. This
is safe because tags aren't needed for freeing such hctx resources.
This approach follows the typical design pattern for a kobject's release handler.
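The release-handler pattern can be sketched in userspace as follows.
All names here (struct hw_ctx, hw_ctx_alloc(), hw_ctx_exit(),
hw_ctx_put()) are hypothetical stand-ins, the reference count is a plain
int with no locking, and ctx_map merely represents memory owned by the
context; the real code uses kobject reference counting.

```c
#include <assert.h>
#include <stdlib.h>

struct hw_ctx {
	int refs;	/* reference count (single-threaded sketch) */
	int *ctx_map;	/* memory owned by the context */
	int exited;	/* set once the exit step has run */
};

static struct hw_ctx *hw_ctx_alloc(void)
{
	struct hw_ctx *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->ctx_map = calloc(16, sizeof(*h->ctx_map));
	if (!h->ctx_map) {
		free(h);
		return NULL;
	}
	h->refs = 1;
	return h;
}

/* Exit step: detach from external state, but keep owned memory alive. */
static void hw_ctx_exit(struct hw_ctx *h)
{
	h->exited = 1;
}

/* Drop a reference; owned memory is freed only when the last one goes.
 * Returns 1 if the object was released. */
static int hw_ctx_put(struct hw_ctx *h)
{
	assert(h->refs > 0);
	if (--h->refs)
		return 0;
	free(h->ctx_map);
	free(h);
	return 1;
}
```

A holder of an extra reference can still touch ctx_map after
hw_ctx_exit() has run, because nothing is freed until the final
hw_ctx_put().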
Cc: Dongli Zhang <dongli.zhang(a)oracle.com>
Cc: James Smart <james.smart(a)broadcom.com>
Cc: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: linux-scsi(a)vger.kernel.org
Cc: Martin K . Petersen <martin.petersen(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: James E . J . Bottomley <jejb(a)linux.vnet.ibm.com>
Reported-by: James Smart <james.smart(a)broadcom.com>
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable(a)vger.kernel.org
Reviewed-by: Hannes Reinecke <hare(a)suse.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Tested-by: James Smart <james.smart(a)broadcom.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-core.c | 2 +-
block/blk-mq-sysfs.c | 6 ++++++
block/blk-mq.c | 8 ++------
block/blk-mq.h | 2 +-
4 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 93dc588fabe2..2dd94b3e9ece 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -374,7 +374,7 @@ void blk_cleanup_queue(struct request_queue *q)
blk_exit_queue(q);
if (queue_is_mq(q))
- blk_mq_free_queue(q);
+ blk_mq_exit_queue(q);
percpu_ref_exit(&q->q_usage_counter);
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 3f9c3f4ac44c..4040e62c3737 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -10,6 +10,7 @@
#include <linux/smp.h>
#include <linux/blk-mq.h>
+#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
@@ -33,6 +34,11 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
{
struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx,
kobj);
+
+ if (hctx->flags & BLK_MQ_F_BLOCKING)
+ cleanup_srcu_struct(hctx->srcu);
+ blk_free_flush_queue(hctx->fq);
+ sbitmap_free(&hctx->ctx_map);
free_cpumask_var(hctx->cpumask);
kfree(hctx->ctxs);
kfree(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 89781309a108..d98cb9614dfa 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2267,12 +2267,7 @@ static void blk_mq_exit_hctx(struct request_queue *q,
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
- if (hctx->flags & BLK_MQ_F_BLOCKING)
- cleanup_srcu_struct(hctx->srcu);
-
blk_mq_remove_cpuhp(hctx);
- blk_free_flush_queue(hctx->fq);
- sbitmap_free(&hctx->ctx_map);
}
static void blk_mq_exit_hw_queues(struct request_queue *q,
@@ -2907,7 +2902,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
}
EXPORT_SYMBOL(blk_mq_init_allocated_queue);
-void blk_mq_free_queue(struct request_queue *q)
+/* tags can _not_ be used after returning from blk_mq_exit_queue */
+void blk_mq_exit_queue(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 423ea88ab6fb..633a5a77ee8b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,7 +37,7 @@ struct blk_mq_ctx {
struct kobject kobj;
} ____cacheline_aligned_in_smp;
-void blk_mq_free_queue(struct request_queue *q);
+void blk_mq_exit_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
--
2.9.5
From: Philipp Puschmann <philipp.puschmann(a)emlix.com>
[ Upstream commit 82ad759143ed77673db0d93d53c1cde7b99917ee ]
This patch fixes a bug that prevents freeing the reset gpio on unloading
the module.
aic3x_i2c_probe is called when loading the module and it calls list_add
with a probably uninitialized list entry aic3x->list (next = prev = NULL).
So even if list_del is called it does nothing and in the end the gpio_reset
is not freed. A repeated module probe then fails silently because
gpio_request fails.
When moving INIT_LIST_HEAD to aic3x_i2c_probe we also have to move
list_del to aic3x_i2c_remove because aic3x_remove may be called
multiple times without aic3x_i2c_remove being called, which leads to
a NULL pointer dereference.
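Why the order of these calls matters can be shown with a minimal
userspace copy of a circular doubly linked list. This is a sketch under
assumptions: the helper names are hypothetical, the kernel's pointer
poisoning in list_del is omitted, and the sequence shown (add, then
re-initialize, then delete) is one way to get the described symptom,
assuming the component probe runs after the i2c probe.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert entry right after head. */
static void list_add_front(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink entry from whatever its own next/prev pointers refer to. */
static void list_del_entry(struct list_head *entry)
{
	assert(entry->next != NULL && entry->prev != NULL);
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

static int list_is_empty(const struct list_head *head)
{
	return head->next == head;
}
```

If init_list_head() is called on an entry that is already linked, the
entry points only at itself, so a later list_del_entry() rewrites the
entry's own pointers and the head still references it. Initializing
first and deleting exactly once leaves the head empty again.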
Signed-off-by: Philipp Puschmann <philipp.puschmann(a)emlix.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/codecs/tlv320aic3x.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/sound/soc/codecs/tlv320aic3x.c b/sound/soc/codecs/tlv320aic3x.c
index 6aa0edf8c5ef..cea3ebecdb12 100644
--- a/sound/soc/codecs/tlv320aic3x.c
+++ b/sound/soc/codecs/tlv320aic3x.c
@@ -1609,7 +1609,6 @@ static int aic3x_probe(struct snd_soc_component *component)
struct aic3x_priv *aic3x = snd_soc_component_get_drvdata(component);
int ret, i;
- INIT_LIST_HEAD(&aic3x->list);
aic3x->component = component;
for (i = 0; i < ARRAY_SIZE(aic3x->supplies); i++) {
@@ -1692,7 +1691,6 @@ static void aic3x_remove(struct snd_soc_component *component)
struct aic3x_priv *aic3x = snd_soc_component_get_drvdata(component);
int i;
- list_del(&aic3x->list);
for (i = 0; i < ARRAY_SIZE(aic3x->supplies); i++)
regulator_unregister_notifier(aic3x->supplies[i].consumer,
&aic3x->disable_nb[i].nb);
@@ -1890,6 +1888,7 @@ static int aic3x_i2c_probe(struct i2c_client *i2c,
if (ret != 0)
goto err_gpio;
+ INIT_LIST_HEAD(&aic3x->list);
list_add(&aic3x->list, &reset_list);
return 0;
@@ -1906,6 +1905,8 @@ static int aic3x_i2c_remove(struct i2c_client *client)
{
struct aic3x_priv *aic3x = i2c_get_clientdata(client);
+ list_del(&aic3x->list);
+
if (gpio_is_valid(aic3x->gpio_reset) &&
!aic3x_is_shared_reset(aic3x)) {
gpio_set_value(aic3x->gpio_reset, 0);
--
2.19.1
This is a backport of a 5.1rc patchset:
https://patchwork.ozlabs.org/cover/1029418/
Which was backported into 4.19:
https://patchwork.ozlabs.org/cover/1081619/
I had to backport two additional patches into 4.14 to make it work.
John Masinter (captwiggum), could you, please, confirm that this
patchset fixes TAHI tests? (I'm reasonably certain that it does, as
I ran ip_defrag selftest, but given the amount of changes here,
another set of completed tests would be nice to have).
Eric Dumazet (1):
ipv6: frags: fix a lockdep false positive
Florian Westphal (1):
ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module
Peter Oskolkov (3):
net: IP defrag: encapsulate rbtree defrag code into callable functions
net: IP6 defrag: use rbtrees for IPv6 defrag
net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
include/net/inet_frag.h | 16 +-
include/net/ipv6.h | 29 --
include/net/ipv6_frag.h | 111 +++++++
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv4/inet_fragment.c | 293 ++++++++++++++++++
net/ipv4/ip_fragment.c | 290 ++----------------
net/ipv6/netfilter/nf_conntrack_reasm.c | 279 +++++------------
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 3 +-
net/ipv6/reassembly.c | 357 +++++-----------------
net/openvswitch/conntrack.c | 1 +
10 files changed, 616 insertions(+), 765 deletions(-)
create mode 100644 include/net/ipv6_frag.h
--
2.21.0.593.g511ec345e18-goog
Hi,
On 23-04-19 00:29, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.0.9, v4.19.36, v4.14.113, v4.9.170, v4.4.178, v3.18.138.
>
> v5.0.9: Build OK!
> v4.19.36: Build OK!
> v4.14.113: Failed to apply! Possible dependencies:
> b60c75b6a502 ("power: supply: axp288_fuel_gauge: Do not register our psy on (some) HDMI sticks")
>
> v4.9.170: Failed to apply! Possible dependencies:
> b60c75b6a502 ("power: supply: axp288_fuel_gauge: Do not register our psy on (some) HDMI sticks")
>
> v4.4.178: Failed to apply! Possible dependencies:
> b60c75b6a502 ("power: supply: axp288_fuel_gauge: Do not register our psy on (some) HDMI sticks")
>
> v3.18.138: Failed to apply! Possible dependencies:
> 5a5bf49088f4 ("X-Power AXP288 PMIC Fuel Gauge Driver")
> b60c75b6a502 ("power: supply: axp288_fuel_gauge: Do not register our psy on (some) HDMI sticks")
> c1a281e34dae ("power: Add support for DA9150 Charger")
>
>
> How should we proceed with this patch?
Just applying it to 4.19.x and 5.0.x is fine.
Regards,
Hans
Hi,
On 23-04-19 00:29, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.0.9, v4.19.36, v4.14.113, v4.9.170, v4.4.178, v3.18.138.
>
> v5.0.9: Build OK!
> v4.19.36: Failed to apply! Possible dependencies:
> bd1e82bb420a ("brcmfmac: Set board_type from DMI on x86 based machines")
>
> v4.14.113: Failed to apply! Possible dependencies:
> bd1e82bb420a ("brcmfmac: Set board_type from DMI on x86 based machines")
>
> v4.9.170: Failed to apply! Possible dependencies:
> bd1e82bb420a ("brcmfmac: Set board_type from DMI on x86 based machines")
> e457a8a01a19 ("brcmfmac: make brcmf_of_probe more generic")
>
> v4.4.178: Failed to apply! Possible dependencies:
> 1119e23edf25 ("brcmfmac: Cleanup roaming configuration.")
> 3f5893d1b30a ("brcmfmac: Add get_station support for IBSS")
> 44129ed04b2b ("brcmfmac: add arp offload ip address table configuration support")
> 46d703a77539 ("brcmfmac: Unify methods to define and map firmware files.")
> 48ed16e86b28 ("brcmfmac: Add support for scheduled scan mac randomization")
> 4d7928959832 ("brcmfmac: switch to new platform data")
> 6c404f34f2bd ("brcmfmac: Cleanup pmksa cache handling code")
> 7bf65aa9ad3f ("brcmfmac: Add beamforming support.")
> 7d34b0560567 ("brcmfmac: Move all module parameters to one place")
> 8abffd8173a1 ("brcmfmac: Add RSDB support.")
> 8ea56be0869f ("brcmfmac: move platform data retrieval code to common")
> a41286aee42f ("brcmfmac: Move scheduled scan related interface layer structs")
> a7b82d474171 ("brcmfmac: Make TDLS a detectable feature")
> aeb64225aa8e ("brcmfmac: Add wowl wake indication report.")
> bd1e82bb420a ("brcmfmac: Set board_type from DMI on x86 based machines")
> c9c0043894cf ("brcmfmac: Simplify and fix usage of brcmf_ifname.")
> e457a8a01a19 ("brcmfmac: make brcmf_of_probe more generic")
>
> v3.18.138: Failed to apply! Possible dependencies:
> 063d51776bd6 ("brcmfmac: avoid runtime-pm for sdio host controller")
> 4d7928959832 ("brcmfmac: switch to new platform data")
> 888bf76e4111 ("brcmfmac: (clean) Rename sdio related files.")
> 8ea56be0869f ("brcmfmac: move platform data retrieval code to common")
> a8e8ed3446a3 ("brcmfmac: (clean) Rename files dhd_dbg to debug")
> bd1e82bb420a ("brcmfmac: Set board_type from DMI on x86 based machines")
> d14f78b990ec ("brcmfmac: (clean) Rename dhd_bus.h in bus.h")
> e457a8a01a19 ("brcmfmac: make brcmf_of_probe more generic")
>
>
> How should we proceed with this patch?
Just applying it to 5.0.x is fine.
Regards,
Hans