The patch below was submitted to be applied to the 5.8-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1d8d42ba365101fa68d210c0e2ed2bc9582fda6c Mon Sep 17 00:00:00 2001
From: Thomas Zimmermann <tzimmermann(a)suse.de>
Date: Fri, 5 Jun 2020 15:57:50 +0200
Subject: [PATCH] drm/mgag200: Remove declaration of mgag200_mmap() from header
file
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit 94668ac796a5 ("drm/mgag200: Convert mgag200 driver to VRAM MM")
removed the implementation of mgag200_mmap(). Also remove the declaration.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Acked-by: Sam Ravnborg <sam(a)ravnborg.org>
Fixes: 94668ac796a5 ("drm/mgag200: Convert mgag200 driver to VRAM MM")
Cc: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Krzysztof Kozlowski <krzk(a)kernel.org>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: "Noralf Trønnes" <noralf(a)tronnes.org>
Cc: Armijn Hemel <armijn(a)tjaldur.nl>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: Emil Velikov <emil.velikov(a)collabora.com>
Cc: <stable(a)vger.kernel.org> # v5.3+
Link: https://patchwork.freedesktop.org/patch/msgid/20200605135803.19811-2-tzimme…
diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h b/drivers/gpu/drm/mgag200/mgag200_drv.h
index 47df62b1ad29..92b6679029fe 100644
--- a/drivers/gpu/drm/mgag200/mgag200_drv.h
+++ b/drivers/gpu/drm/mgag200/mgag200_drv.h
@@ -198,6 +198,5 @@ void mgag200_i2c_destroy(struct mga_i2c_chan *i2c);
int mgag200_mm_init(struct mga_device *mdev);
void mgag200_mm_fini(struct mga_device *mdev);
-int mgag200_mmap(struct file *filp, struct vm_area_struct *vma);
#endif /* __MGAG200_DRV_H__ */
Before micmute_led_set() was introduced, alc_gpio_micmute_update() set
the gpio value to !micmute_led.led_value, and the machines had the
correct micmute LED status. Since micmute_led_set() was introduced, it
sets the gpio value to !!micmute_led.led_value, so the LED status is no
longer correct and we had to set micmute_led_polarity = 1 to work
around it.
Now we fix micmute_led_set() and remove the micmute_led_polarity = 1
workarounds.
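For illustration, a minimal sketch (not the driver source; the helper
below is hypothetical) of how the polarity bit and the LED brightness
combine into the gpio value:
    /* Hypothetical illustration: polarity == 0 means an active-low
     * LED, so the gpio value is the inverted brightness, which is what
     * the fixed micmute_led_set() passes down via !brightness. */
    static inline bool micmute_gpio_value(bool polarity, bool brightness)
    {
            bool on = !brightness;          /* fixed micmute_led_set() */

            return polarity ? !on : on;     /* assumed polarity handling */
    }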
Fixes: 87dc36482cab ("ALSA: hda/realtek - Add LED class support for micmute LED")
Reported-and-suggested-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/patch_realtek.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 09d93dd88713..073029aeaf3c 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -4125,7 +4125,7 @@ static int micmute_led_set(struct led_classdev *led_cdev,
struct alc_spec *spec = codec->spec;
alc_update_gpio_led(codec, spec->gpio_mic_led_mask,
- spec->micmute_led_polarity, !!brightness);
+ spec->micmute_led_polarity, !brightness);
return 0;
}
@@ -4162,8 +4162,6 @@ static void alc285_fixup_hp_gpio_led(struct hda_codec *codec,
{
struct alc_spec *spec = codec->spec;
- spec->micmute_led_polarity = 1;
-
alc_fixup_hp_gpio_led(codec, action, 0x04, 0x01);
}
@@ -4414,7 +4412,6 @@ static void alc233_fixup_lenovo_line2_mic_hotkey(struct hda_codec *codec,
{
struct alc_spec *spec = codec->spec;
- spec->micmute_led_polarity = 1;
alc_fixup_hp_gpio_led(codec, action, 0, 0x04);
if (action == HDA_FIXUP_ACT_PRE_PROBE) {
spec->init_amp = ALC_INIT_DEFAULT;
--
2.17.1
After installing Ubuntu Linux, the micmute LED status is not correct:
users expect the LED to be on when capture is disabled, but with the
current kernel the LED is off while capture is disabled.
Old kernels such as linux-4.15 don't have this issue, so it looks like
we introduced it when switching to the led_cdev.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/patch_realtek.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index daedcc0adc21..09d93dd88713 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -4414,6 +4414,7 @@ static void alc233_fixup_lenovo_line2_mic_hotkey(struct hda_codec *codec,
{
struct alc_spec *spec = codec->spec;
+ spec->micmute_led_polarity = 1;
alc_fixup_hp_gpio_led(codec, action, 0, 0x04);
if (action == HDA_FIXUP_ACT_PRE_PROBE) {
spec->init_amp = ALC_INIT_DEFAULT;
--
2.17.1
The patch titled
Subject: mm/vunmap: add cond_resched() in vunmap_pmd_range
has been added to the -mm tree. Its filename is
mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-vunmap-add-cond_resched-in-vunm…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-vunmap-add-cond_resched-in-vunm…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/vunmap: add cond_resched() in vunmap_pmd_range
Like zap_pte_range(), add cond_resched() so that we can avoid the
softlockups reported below. On a non-preemptible kernel with a large
I/O map region (like the one we get when using persistent memory in
sector mode), an unmap of the namespace can trigger the softlockup below.
[22724.027334] watchdog: BUG: soft lockup - CPU#49 stuck for 23s! [ndctl:50777]
NIP [c0000000000dc224] plpar_hcall+0x38/0x58
LR [c0000000000d8898] pSeries_lpar_hpte_invalidate+0x68/0xb0
Call Trace:
[c0000004e87a7780] [c0000004fb197c00] 0xc0000004fb197c00 (unreliable)
[c0000004e87a7810] [c00000000007f4e4] flush_hash_page+0x114/0x200
[c0000004e87a7890] [c0000000000833cc] hpte_need_flush+0x2dc/0x540
[c0000004e87a7950] [c0000000003f5798] vunmap_page_range+0x538/0x6f0
[c0000004e87a7a70] [c0000000003f76d0] free_unmap_vmap_area+0x30/0x70
[c0000004e87a7aa0] [c0000000003f7a6c] remove_vm_area+0xfc/0x140
[c0000004e87a7ad0] [c0000000003f7dd8] __vunmap+0x68/0x270
[c0000004e87a7b50] [c000000000079de4] __iounmap.part.0+0x34/0x60
[c0000004e87a7bb0] [c000000000376394] memunmap+0x54/0x70
[c0000004e87a7bd0] [c000000000881d7c] release_nodes+0x28c/0x300
[c0000004e87a7c40] [c00000000087a65c] device_release_driver_internal+0x16c/0x280
[c0000004e87a7c80] [c000000000876fc4] unbind_store+0x124/0x170
[c0000004e87a7cd0] [c000000000875be4] drv_attr_store+0x44/0x60
[c0000004e87a7cf0] [c00000000057c734] sysfs_kf_write+0x64/0x90
[c0000004e87a7d10] [c00000000057bc10] kernfs_fop_write+0x1b0/0x290
[c0000004e87a7d60] [c000000000488e6c] __vfs_write+0x3c/0x70
[c0000004e87a7d80] [c00000000048c868] vfs_write+0xd8/0x260
[c0000004e87a7dd0] [c00000000048ccac] ksys_write+0xdc/0x130
[c0000004e87a7e20] [c00000000000b588] system_call+0x5c/0x70
Link: http://lkml.kernel.org/r/20200807075933.310240-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Harish Sriram <harish(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/vmalloc.c~mm-vunmap-add-cond_resched-in-vunmap_pmd_range
+++ a/mm/vmalloc.c
@@ -104,6 +104,8 @@ static void vunmap_pmd_range(pud_t *pud,
if (pmd_none_or_clear_bad(pmd))
continue;
vunmap_pte_range(pmd, addr, next, mask);
+
+ cond_resched();
} while (pmd++, addr = next, addr != end);
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-vunmap-add-cond_resched-in-vunmap_pmd_range.patch
From: Jakub Kicinski <kuba(a)kernel.org>
When ur_load_imm_any() is inlined into jeq_imm(), it's possible for the
compiler to deduce a case where _val can only have the value of -1 at
compile time. Specifically,
    /* struct bpf_insn: _s32 imm */
    u64 imm = insn->imm; /* sign extend */
    if (imm >> 32) { /* non-zero only if insn->imm is negative */
            /* inlined from ur_load_imm_any */
            u32 __imm = imm >> 32; /* therefore, always 0xffffffff */
            if (__builtin_constant_p(__imm) && __imm > 255)
                    compiletime_assert_XXX()
This can result in tripping a BUILD_BUG_ON() in __BF_FIELD_CHECK() that
checks that a given value is representable in one byte (interpreted as
unsigned).
FIELD_FIT() should return true or false at runtime depending on whether
the value fits. Don't break the build over a value that's too large for
the mask; we'd prefer to keep the inlining and compiler optimizations
even though we know this case will always return false.
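As a hedged usage sketch (the register field and caller below are
hypothetical, not from this patch), the fixed macro stays a pure
runtime predicate:
    #include <linux/bitfield.h>
    #include <linux/bits.h>
    #include <linux/errno.h>
    #include <linux/types.h>

    #define REG_BYTE_FIELD GENMASK(7, 0)    /* hypothetical one-byte field */

    static int pack_byte_field(u32 *reg, u32 val)
    {
            /* With the fix this is simply false for val > 255, even
             * when the compiler can prove the value at build time. */
            if (!FIELD_FIT(REG_BYTE_FIELD, val))
                    return -EINVAL;

            *reg = (*reg & ~REG_BYTE_FIELD) | FIELD_PREP(REG_BYTE_FIELD, val);
            return 0;
    }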
Cc: stable(a)vger.kernel.org
Fixes: 1697599ee301a ("bitfield.h: add FIELD_FIT() helper")
Link: https://lore.kernel.org/kernel-hardening/CAK7LNASvb0UDJ0U5wkYYRzTAdnEs64HjX…
Reported-by: Masahiro Yamada <masahiroy(a)kernel.org>
Debugged-by: Sami Tolvanen <samitolvanen(a)google.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
Changes V1->V2:
* add Fixes tag.
include/linux/bitfield.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
index 48ea093ff04c..4e035aca6f7e 100644
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -77,7 +77,7 @@
*/
#define FIELD_FIT(_mask, _val) \
({ \
- __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_FIT: "); \
+ __BF_FIELD_CHECK(_mask, 0ULL, 0ULL, "FIELD_FIT: "); \
!((((typeof(_mask))_val) << __bf_shf(_mask)) & ~(_mask)); \
})
--
2.28.0.236.gb10cc79966-goog
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault
Recently we found a regression when running the
will_it_scale/page_fault3 test on ARM64: over 70% down for the
multi-process cases and over 20% down for the multi-thread cases. It
turns out the regression is caused by commit
89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").
The test mmaps a file the size of memory and then writes to the
mapping; this makes all memory dirty and triggers the dirty-pages
throttle, and that upstream commit releases mmap_sem and retries the
page fault. The retried page fault sees the correct PTEs installed by
the first try, then updates the dirty bit, clears the read-only bit and
flushes the TLBs on ARM. The regression is caused by the excessive TLB
flushing. x86 is fine since it doesn't clear the read-only bit, so
there is no need to flush the TLB in this case.
The page fault would be retried due to:
1. Waiting for page readahead
2. Waiting for the page to be swapped in
3. Waiting for dirty-page throttling
The first two cases don't have PTEs set up at all, so the retried page
fault installs the PTEs and never reaches that code. But the #3 case
usually has PTEs installed, so the retried page fault does reach the
dirty-bit and read-only-bit update. It seems unnecessary to modify
those bits again for #3, since they should already have been set by the
first page-fault attempt.
Of course a parallel page fault may set up the PTEs, but we only need
to care about the write fault. If the parallel page fault set up a
writable and dirty PTE, the retried fault doesn't need to do anything
extra. If the parallel page fault set up a clean read-only PTE, the
retried fault should just call do_wp_page() and then return, as the
code snippet below shows:
    if (vmf->flags & FAULT_FLAG_WRITE) {
            if (!pte_write(entry))
                    return do_wp_page(vmf);
    }
With this fix the test result gets back to normal.
[yang.shi(a)linux.alibaba.com: incorporate comment from Will Deacon, update commit log per discussion]
Link: http://lkml.kernel.org/r/1594848990-55657-1-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1594148072-91273-1-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Xu Yu <xuyu(a)linux.alibaba.com>
Debugged-by: Xu Yu <xuyu(a)linux.alibaba.com>
Tested-by: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Josef Bacik <josef(a)toxicpanda.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault
+++ a/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struc
if (vmf->flags & FAULT_FLAG_WRITE) {
if (!pte_write(entry))
return do_wp_page(vmf);
- entry = pte_mkdirty(entry);
}
+
+ if (vmf->flags & FAULT_FLAG_TRIED)
+ goto unlock;
+
+ if (vmf->flags & FAULT_FLAG_WRITE)
+ entry = pte_mkdirty(entry);
+
entry = pte_mkyoung(entry);
if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
vmf->flags & FAULT_FLAG_WRITE)) {
_
On Mon, Aug 10, 2020 at 12:01 PM Greg Kroah-Hartman
<gregkh(a)linuxfoundation.org> wrote:
>
> On Mon, Aug 10, 2020 at 12:01:25PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Aug 10, 2020 at 11:52:30AM +0200, Sedat Dilek wrote:
> > > [ Hope I have the correct CC for linux-stable ML ]
> > >
> > > Hi Greg and Sasha,
> > >
> > > The base for <linux-stable-rc.git#queue/5.8> is Linux v5.7.14 where it
> > > should be Linux v5.8.
> >
> > What exactly do you mean by "#queue/5.8"?
> >
> > Is that a branch name? Ah, never seen those before, maybe they are
> > something that Sasha creates?
>
> But yes, you are right, it seems to mirror queue/5.7 at the moment,
> which isn't correct.
>
> thanks,
[ CC correct stable ML ]
Exactly.
With <linux-stable-rc.git#queue/5.8> I mean [1].
- Sedat -
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
From: Abhishek Kumar <abhishek4.kumar(a)intel.com>
For NV12, a display sub-plane is also configured and the driver
internally creates a plane atomic state for it. The driver copies all
of the parameters of the main plane's atomic state to the sub-plane's
atomic state, but the crtc is not added to the sub-plane state. So when
the drm atomic state is configured for commit, a fake commit handler is
created for the sub-plane, and the state is not cleared when the NV12
buffer is no longer displayed.
Fixes: 1f594b209fe1 ("drm/i915: Remove special case slave handling during hw programming")
Change-Id: I447b16bf433dfb5b43b2e4cade258fc775aee065
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Abhishek Kumar <abhishek4.kumar(a)intel.com>
Signed-off-by: Uma Shankar <uma.shankar(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_display.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 522c772a2111..76da2189b01d 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -12502,6 +12502,7 @@ static int icl_check_nv12_planes(struct intel_crtc_state *crtc_state)
struct intel_atomic_state *state = to_intel_atomic_state(crtc_state->uapi.state);
struct intel_plane *plane, *linked;
struct intel_plane_state *plane_state;
+ int ret;
int i;
if (INTEL_GEN(dev_priv) < 11)
@@ -12576,6 +12577,11 @@ static int icl_check_nv12_planes(struct intel_crtc_state *crtc_state)
linked_state->uapi.src = plane_state->uapi.src;
linked_state->uapi.dst = plane_state->uapi.dst;
+ /* Update Linked plane crtc same as of main plane */
+ ret = drm_atomic_set_crtc_for_plane(&linked_state->uapi, plane_state->uapi.crtc);
+ if(ret)
+ return ret;
+
if (icl_is_hdr_plane(dev_priv, plane->id)) {
if (linked->id == PLANE_SPRITE5)
plane_state->cus_ctl |= PLANE_CUS_PLANE_7;
--
2.26.2
Sorry if this is not the right approach to this, but I'd like to request
a backport of 412055398b9e67e07347a936fc4a6adddabe9cf4, "nfsd: Fix NFSv4
READ on RDMA when using readv" to Linux 5.4.
The patch applies cleanly and fixes a rare but severe issue with NFS
over RDMA, which I just spent several days tracking down to that patch,
with major help from linux-nfs/rdma.
I have manually applied it to my 5.4 kernel and can confirm that it
fixes the issue.
Thanks,
Timo
These stable fixes were not correctly noted as fixes when
originally submitted for 5.2-rc1. We are addressing the internal
gap that led to this miss.
Please consider these patches for all stable kernels older than 5.2.0.
I tried on 4.19: 3 out of 4 apply cleanly as cherry-picks from Linus'
tree, but one of them I had to rebase, so I'm just sending the whole
series.
If you'd rather I send one at a time in the format specified at
option 2) of the stable documentation, please just let me know.
Patch 4 depends on patch 1.
I tried to follow the stable commit format for each of the
individual patches, referencing the upstream commit ID. I also
added a "Fixes" to each, trying to assist the automation in
knowing how far back to backport.
Shortlog:
Grzegorz Siwik (1):
i40e: Wrong truncation from u16 to u8
Martyna Szapar (2):
i40e: Fix of memory leak and integer truncation in i40e_virtchnl.c
i40e: Memory leak in i40e_config_iwarp_qvlist
Sergey Nemov (1):
i40e: add num_vectors checker in iwarp handler
.../ethernet/intel/i40e/i40e_virtchnl_pf.c | 51 +++++++++++++------
1 file changed, 36 insertions(+), 15 deletions(-)
--
2.25.4
The patch below does not apply to the 5.7-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41ea93cf7ba4e0f0cc46ebfdda8b6ff27c67bc91 Mon Sep 17 00:00:00 2001
From: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Date: Thu, 2 Jul 2020 11:52:03 +0000
Subject: [PATCH] powerpc/kasan: Fix shadow pages allocation failure
Doing kasan page allocation in MMU_init is too early: the kernel
doesn't yet have access to the entire memory space, and
memblock_alloc() fails when the kernel is a bit big.
Do it from kasan_init() instead.
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Fixes: d2a91cef9bbd ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable(a)vger.kernel.org
Reported-by: Erhard F. <erhard_f(a)mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Link: https://lore.kernel.org/r/63048fcea8a1c02f75429ba3152f80f7853f87fc.15936907…
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 4813c6d50889..019b0c0bbbf3 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -120,11 +120,24 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
void __init kasan_mmu_init(void)
{
int ret;
+
+ if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
+ IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
+ ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+ if (ret)
+ panic("kasan: kasan_init_shadow_page_tables() failed");
+ }
+}
+
+void __init kasan_init(void)
+{
struct memblock_region *reg;
for_each_memblock(memory, reg) {
phys_addr_t base = reg->base;
phys_addr_t top = min(base + reg->size, total_lowmem);
+ int ret;
if (base >= top)
continue;
@@ -134,18 +147,6 @@ void __init kasan_mmu_init(void)
panic("kasan: kasan_init_region() failed");
}
- if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
- IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
- ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
-
- if (ret)
- panic("kasan: kasan_init_shadow_page_tables() failed");
- }
-
-}
-
-void __init kasan_init(void)
-{
kasan_remap_early_shadow_ro();
clear_page(kasan_early_shadow_page);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 41ea93cf7ba4e0f0cc46ebfdda8b6ff27c67bc91 Mon Sep 17 00:00:00 2001
From: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Date: Thu, 2 Jul 2020 11:52:03 +0000
Subject: [PATCH] powerpc/kasan: Fix shadow pages allocation failure
Doing kasan page allocation in MMU_init is too early: the kernel
doesn't yet have access to the entire memory space, and
memblock_alloc() fails when the kernel is a bit big.
Do it from kasan_init() instead.
Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Fixes: d2a91cef9bbd ("powerpc/kasan: Fix shadow pages allocation failure")
Cc: stable(a)vger.kernel.org
Reported-by: Erhard F. <erhard_f(a)mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208181
Link: https://lore.kernel.org/r/63048fcea8a1c02f75429ba3152f80f7853f87fc.15936907…
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c b/arch/powerpc/mm/kasan/kasan_init_32.c
index 4813c6d50889..019b0c0bbbf3 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -120,11 +120,24 @@ static void __init kasan_unmap_early_shadow_vmalloc(void)
void __init kasan_mmu_init(void)
{
int ret;
+
+ if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
+ IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
+ ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
+
+ if (ret)
+ panic("kasan: kasan_init_shadow_page_tables() failed");
+ }
+}
+
+void __init kasan_init(void)
+{
struct memblock_region *reg;
for_each_memblock(memory, reg) {
phys_addr_t base = reg->base;
phys_addr_t top = min(base + reg->size, total_lowmem);
+ int ret;
if (base >= top)
continue;
@@ -134,18 +147,6 @@ void __init kasan_mmu_init(void)
panic("kasan: kasan_init_region() failed");
}
- if (early_mmu_has_feature(MMU_FTR_HPTE_TABLE) ||
- IS_ENABLED(CONFIG_KASAN_VMALLOC)) {
- ret = kasan_init_shadow_page_tables(KASAN_SHADOW_START, KASAN_SHADOW_END);
-
- if (ret)
- panic("kasan: kasan_init_shadow_page_tables() failed");
- }
-
-}
-
-void __init kasan_init(void)
-{
kasan_remap_early_shadow_ro();
clear_page(kasan_early_shadow_page);
commit 4b836a1426cb0f1ef2a6e211d7e553221594f8fc upstream.
Binder is designed such that a binder_proc never has references to
itself. If this rule is violated, memory corruption can occur when a
process sends a transaction to itself; see e.g.
<https://syzkaller.appspot.com/bug?extid=09e05aba06723a94d43d>.
There is a remaining edge case through which such a transaction-to-self
can still occur from the context of a task with BINDER_SET_CONTEXT_MGR
access:
- task A opens /dev/binder twice, creating binder_proc instances P1
and P2
- P1 becomes context manager
- P2 calls ACQUIRE on the magic handle 0, allocating index 0 in its
handle table
- P1 dies (by closing the /dev/binder fd and waiting a bit)
- P2 becomes context manager
- P2 calls ACQUIRE on the magic handle 0, allocating index 1 in its
handle table
[this triggers a warning: "binder: 1974:1974 tried to acquire
reference to desc 0, got 1 instead"]
- task B opens /dev/binder once, creating binder_proc instance P3
- P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way
transaction)
- P2 receives the handle and uses it to call P3 (two-way transaction)
- P3 calls P2 (via magic handle 0) (two-way transaction)
- P2 calls P2 (via handle 1) (two-way transaction)
And then, if P2 does *NOT* accept the incoming transaction work, but
instead closes the binder fd, we get a crash.
Solve it by preventing the context manager from using ACQUIRE on ref 0.
There shouldn't be any legitimate reason for the context manager to do
that.
Additionally, print a warning if someone manages to find another way to
trigger a transaction-to-self bug in the future.
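For context, a minimal userspace sketch (assuming the uapi layout in
<linux/android/binder.h>; error handling omitted) of the operation the
fix now rejects:
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/android/binder.h>

    int acquire_handle_zero(void)
    {
            int fd = open("/dev/binder", O_RDWR);
            uint32_t wr[2] = { BC_ACQUIRE, 0 };     /* command + handle 0 */
            struct binder_write_read bwr = {
                    .write_size   = sizeof(wr),
                    .write_buffer = (uintptr_t)wr,
            };

            ioctl(fd, BINDER_SET_CONTEXT_MGR, 0);   /* become context manager */
            /* With the fix, acquiring desc 0 from the context manager's
             * own process fails with -EINVAL instead of creating a
             * self-reference. */
            return ioctl(fd, BINDER_WRITE_READ, &bwr);
    }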
Cc: stable(a)vger.kernel.org
Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
Acked-by: Todd Kjos <tkjos(a)google.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Martijn Coenen <maco(a)android.com>
Link: https://lore.kernel.org/r/20200727120424.1627555-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[manual backport: remove fine-grained locking and error reporting that
don't exist in <=4.9]
Signed-off-by: Jann Horn <jannh(a)google.com>
---
drivers/android/binder.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index e12288c245b5..f4c0b6295945 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -1427,6 +1427,10 @@ static void binder_transaction(struct binder_proc *proc,
return_error = BR_DEAD_REPLY;
goto err_dead_binder;
}
+ if (WARN_ON(proc == target_proc)) {
+ return_error = BR_FAILED_REPLY;
+ goto err_invalid_target_handle;
+ }
if (security_binder_transaction(proc->tsk,
target_proc->tsk) < 0) {
return_error = BR_FAILED_REPLY;
@@ -1830,6 +1834,11 @@ static int binder_thread_write(struct binder_proc *proc,
ptr += sizeof(uint32_t);
if (target == 0 && binder_context_mgr_node &&
(cmd == BC_INCREFS || cmd == BC_ACQUIRE)) {
+ if (binder_context_mgr_node->proc == proc) {
+ binder_user_error("%d:%d context manager tried to acquire desc 0\n",
+ proc->pid, thread->pid);
+ return -EINVAL;
+ }
ref = binder_get_ref_for_node(proc,
binder_context_mgr_node);
if (ref->desc != target) {
base-commit: 8d6b541290cb9293bd2a7bb00c1d58d01abe183b
--
2.28.0.236.gb10cc79966-goog
Further investigation of the L-R swap problem on the MS2109 reveals that
the problem isn't that the channels are swapped, but rather that they
are swapped and also out of phase by one sample. In other words, the
issue is actually that the very first frame that comes from the hardware
is a half-frame containing only the right channel, and after that
everything becomes offset.
So introduce a new quirk field to drop the very first 2 bytes that come
in after the format is configured and a capture stream starts. This puts
the channels in phase and in the correct order.
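To see why a 2-byte offset reads as swapped channels in 16-bit
interleaved stereo, here is a small standalone illustration
(hypothetical sample values, not captured data):
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            /* The hardware starts with a lone right-channel half-frame:
             * R0 | L1 R1 | L2 R2 | L3 R3 ... */
            int16_t raw[] = { -100, 101, -101, 102, -102, 103, -103 };
            int16_t *aligned = raw + 1;     /* drop 2 bytes = one sample */

            /* Unaligned pairs come out as (R, L): swapped and one sample
             * out of phase.  After the drop they pair up as (L, R). */
            for (int i = 0; i + 1 < 6; i += 2)
                    printf("L=%d R=%d\n", aligned[i], aligned[i + 1]);
            return 0;
    }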
Cc: stable(a)vger.kernel.org
Signed-off-by: Hector Martin <marcan(a)marcan.st>
---
sound/usb/card.h | 1 +
sound/usb/pcm.c | 6 ++++++
sound/usb/quirks.c | 3 +++
sound/usb/stream.c | 1 +
4 files changed, 11 insertions(+)
diff --git a/sound/usb/card.h b/sound/usb/card.h
index de43267b9c8a..5351d7183b1b 100644
--- a/sound/usb/card.h
+++ b/sound/usb/card.h
@@ -137,6 +137,7 @@ struct snd_usb_substream {
unsigned int tx_length_quirk:1; /* add length specifier to transfers */
unsigned int fmt_type; /* USB audio format type (1-3) */
unsigned int pkt_offset_adj; /* Bytes to drop from beginning of packets (for non-compliant devices) */
+ unsigned int stream_offset_adj; /* Bytes to drop from beginning of stream (for non-compliant devices) */
unsigned int running: 1; /* running status */
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 415bfec49a01..5600751803cf 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -1420,6 +1420,12 @@ static void retire_capture_urb(struct snd_usb_substream *subs,
// continue;
}
bytes = urb->iso_frame_desc[i].actual_length;
+ if (subs->stream_offset_adj > 0) {
+ unsigned int adj = min(subs->stream_offset_adj, bytes);
+ cp += adj;
+ bytes -= adj;
+ subs->stream_offset_adj -= adj;
+ }
frames = bytes / stride;
if (!subs->txfr_quirk)
bytes = frames * stride;
diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c
index c551141f337e..abf99b814a0f 100644
--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -1495,6 +1495,9 @@ void snd_usb_set_format_quirk(struct snd_usb_substream *subs,
case USB_ID(0x2b73, 0x000a): /* Pioneer DJ DJM-900NXS2 */
pioneer_djm_set_format_quirk(subs);
break;
+ case USB_ID(0x534d, 0x2109): /* MacroSilicon MS2109 */
+ subs->stream_offset_adj = 2;
+ break;
}
}
diff --git a/sound/usb/stream.c b/sound/usb/stream.c
index 4d1e6579e54d..ca76ba5b5c0b 100644
--- a/sound/usb/stream.c
+++ b/sound/usb/stream.c
@@ -94,6 +94,7 @@ static void snd_usb_init_substream(struct snd_usb_stream *as,
subs->tx_length_quirk = as->chip->tx_length_quirk;
subs->speed = snd_usb_get_speed(subs->dev);
subs->pkt_offset_adj = 0;
+ subs->stream_offset_adj = 0;
snd_usb_set_pcm_ops(as->pcm, stream);
--
2.27.0
Note2myself: Check MAINTAINERS > STABLE BRANCH.
- Sedat -
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/MAI…
On Mon, Aug 10, 2020 at 11:52 AM Sedat Dilek <sedat.dilek(a)gmail.com> wrote:
>
> [ Hope I have the correct CC for linux-stable ML ]
>
> Hi Greg and Sasha,
>
> The base for <linux-stable-rc.git#queue/5.8> is Linux v5.7.14 where it
> should be Linux v5.8.
>
> Can you please look at this?
>
> Thanks.
>
> Regards,
> - Sedat -
The patch titled
Subject: khugepaged: khugepaged_test_exit() check mmget_still_valid()
has been removed from the -mm tree. Its filename was
khugepaged-khugepaged_test_exit-check-mmget_still_valid.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: khugepaged_test_exit() check mmget_still_valid()
Move collapse_huge_page()'s mmget_still_valid() check into
khugepaged_test_exit() itself. collapse_huge_page() is used for anon THP
only, and earned its mmget_still_valid() check because it inserts a huge
pmd entry in place of the page table's pmd entry; whereas
collapse_file()'s retract_page_tables() or collapse_pte_mapped_thp()
merely clears the page table's pmd entry. But core dumping without mmap
lock must have been as open to mistaking a racily cleared pmd entry for a
page table at physical page 0, as exit_mmap() was. And we certainly have
no interest in mapping as a THP once dumping core.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021217020.27773@eggly.anvils
Fixes: 59ea6d06cfa9 ("coredump: fix race condition between collapse_huge_page() and core dumping")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
--- a/mm/khugepaged.c~khugepaged-khugepaged_test_exit-check-mmget_still_valid
+++ a/mm/khugepaged.c
@@ -431,7 +431,7 @@ static void insert_to_mm_slots_hash(stru
static inline int khugepaged_test_exit(struct mm_struct *mm)
{
- return atomic_read(&mm->mm_users) == 0;
+ return atomic_read(&mm->mm_users) == 0 || !mmget_still_valid(mm);
}
static bool hugepage_vma_check(struct vm_area_struct *vma,
@@ -1100,9 +1100,6 @@ static void collapse_huge_page(struct mm
* handled by the anon_vma lock + PG_lock.
*/
mmap_write_lock(mm);
- result = SCAN_ANY_PROCESS;
- if (!mmget_still_valid(mm))
- goto out;
result = hugepage_vma_revalidate(mm, address, &vma);
if (result)
goto out;
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: khugepaged: retract_page_tables() remember to test exit
has been removed from the -mm tree. Its filename was
khugepaged-retract_page_tables-remember-to-test-exit.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: retract_page_tables() remember to test exit
Only once have I seen this scenario (and forgot even to notice what forced
the eventual crash): a sequence of "BUG: Bad page map" alerts from
vm_normal_page(), from zap_pte_range() servicing exit_mmap();
pmd:00000000, pte values corresponding to data in physical page 0.
The pte mappings being zapped in this case were supposed to be from a huge
page of ext4 text (but could as well have been shmem): my belief is that
it was racing with collapse_file()'s retract_page_tables(), found *pmd
pointing to a page table, locked it, but *pmd had become 0 by the time
start_pte was decided.
In most cases, that possibility is excluded by holding mmap lock; but
exit_mmap() proceeds without mmap lock. Most of what's run by khugepaged
checks khugepaged_test_exit() after acquiring mmap lock:
khugepaged_collapse_pte_mapped_thps() and hugepage_vma_revalidate() do so,
for example. But retract_page_tables() did not: fix that.
The fix is for retract_page_tables() to check khugepaged_test_exit(),
after acquiring mmap lock, before doing anything to the page table.
Getting the mmap lock serializes with __mmput(), which briefly takes and
drops it in __khugepaged_exit(); then the khugepaged_test_exit() check on
mm_users makes sure we don't touch the page table once exit_mmap() might
reach it, since exit_mmap() will be proceeding without mmap lock, not
expecting anyone to be racing with it.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021215400.27773@eggly.anvils
Fixes: f3f0e1d2150b ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
--- a/mm/khugepaged.c~khugepaged-retract_page_tables-remember-to-test-exit
+++ a/mm/khugepaged.c
@@ -1532,6 +1532,7 @@ out:
static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
{
struct vm_area_struct *vma;
+ struct mm_struct *mm;
unsigned long addr;
pmd_t *pmd, _pmd;
@@ -1560,7 +1561,8 @@ static void retract_page_tables(struct a
continue;
if (vma->vm_end < addr + HPAGE_PMD_SIZE)
continue;
- pmd = mm_find_pmd(vma->vm_mm, addr);
+ mm = vma->vm_mm;
+ pmd = mm_find_pmd(mm, addr);
if (!pmd)
continue;
/*
@@ -1570,17 +1572,19 @@ static void retract_page_tables(struct a
* mmap_lock while holding page lock. Fault path does it in
* reverse order. Trylock is a way to avoid deadlock.
*/
- if (mmap_write_trylock(vma->vm_mm)) {
- spinlock_t *ptl = pmd_lock(vma->vm_mm, pmd);
- /* assume page table is clear */
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
- spin_unlock(ptl);
- mmap_write_unlock(vma->vm_mm);
- mm_dec_nr_ptes(vma->vm_mm);
- pte_free(vma->vm_mm, pmd_pgtable(_pmd));
+ if (mmap_write_trylock(mm)) {
+ if (!khugepaged_test_exit(mm)) {
+ spinlock_t *ptl = pmd_lock(mm, pmd);
+ /* assume page table is clear */
+ _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ spin_unlock(ptl);
+ mm_dec_nr_ptes(mm);
+ pte_free(mm, pmd_pgtable(_pmd));
+ }
+ mmap_write_unlock(mm);
} else {
/* Try again later */
- khugepaged_add_pte_mapped_thp(vma->vm_mm, addr);
+ khugepaged_add_pte_mapped_thp(mm, addr);
}
}
i_mmap_unlock_write(mapping);
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
has been removed from the -mm tree. Its filename was
khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() protect the pmd lock
When retract_page_tables() removes a page table to make way for a huge
pmd, it holds huge page lock, i_mmap_lock_write, mmap_write_trylock and
pmd lock; but when collapse_pte_mapped_thp() does the same (to handle the
case when the original mmap_write_trylock had failed), only
mmap_write_trylock and pmd lock are held.
That's not enough. One machine has twice crashed under load, with "BUG:
spinlock bad magic" and GPF on 6b6b6b6b6b6b6b6b. Examining the second
crash, page_vma_mapped_walk_done()'s spin_unlock of pvmw->ptl (serving
page_referenced() on a file THP, that had found a page table at *pmd)
discovers that the page table page and its lock have already been freed by
the time it comes to unlock.
Follow the example of retract_page_tables(), but we only need one of huge
page lock or i_mmap_lock_write to secure against this: because it's the
narrower lock, and because it simplifies collapse_pte_mapped_thp() to know
the hpage earlier, choose to rely on huge page lock here.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021213070.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 44 +++++++++++++++++++-------------------------
1 file changed, 19 insertions(+), 25 deletions(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-protect-the-pmd-lock
+++ a/mm/khugepaged.c
@@ -1412,7 +1412,7 @@ void collapse_pte_mapped_thp(struct mm_s
{
unsigned long haddr = addr & HPAGE_PMD_MASK;
struct vm_area_struct *vma = find_vma(mm, haddr);
- struct page *hpage = NULL;
+ struct page *hpage;
pte_t *start_pte, *pte;
pmd_t *pmd, _pmd;
spinlock_t *ptl;
@@ -1432,9 +1432,17 @@ void collapse_pte_mapped_thp(struct mm_s
if (!hugepage_vma_check(vma, vma->vm_flags | VM_HUGEPAGE))
return;
+ hpage = find_lock_page(vma->vm_file->f_mapping,
+ linear_page_index(vma, haddr));
+ if (!hpage)
+ return;
+
+ if (!PageHead(hpage))
+ goto drop_hpage;
+
pmd = mm_find_pmd(mm, haddr);
if (!pmd)
- return;
+ goto drop_hpage;
start_pte = pte_offset_map_lock(mm, pmd, haddr, &ptl);
@@ -1453,30 +1461,11 @@ void collapse_pte_mapped_thp(struct mm_s
page = vm_normal_page(vma, addr, *pte);
- if (!page || !PageCompound(page))
- goto abort;
-
- if (!hpage) {
- hpage = compound_head(page);
- /*
- * The mapping of the THP should not change.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may
- * change the page table, but the new page will
- * not pass PageCompound() check.
- */
- if (WARN_ON(hpage->mapping != vma->vm_file->f_mapping))
- goto abort;
- }
-
/*
- * Confirm the page maps to the correct subpage.
- *
- * Note that uprobe, debugger, or MAP_PRIVATE may change
- * the page table, but the new page will not pass
- * PageCompound() check.
+ * Note that uprobe, debugger, or MAP_PRIVATE may change the
+ * page table, but the new page will not be a subpage of hpage.
*/
- if (WARN_ON(hpage + i != page))
+ if (hpage + i != page)
goto abort;
count++;
}
@@ -1495,7 +1484,7 @@ void collapse_pte_mapped_thp(struct mm_s
pte_unmap_unlock(start_pte, ptl);
/* step 3: set proper refcount and mm_counters. */
- if (hpage) {
+ if (count) {
page_ref_sub(hpage, count);
add_mm_counter(vma->vm_mm, mm_counter_file(hpage), -count);
}
@@ -1506,10 +1495,15 @@ void collapse_pte_mapped_thp(struct mm_s
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
+
+drop_hpage:
+ unlock_page(hpage);
+ put_page(hpage);
return;
abort:
pte_unmap_unlock(start_pte, ptl);
+ goto drop_hpage;
}
static int khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
has been removed from the -mm tree. Its filename was
khugepaged-collapse_pte_mapped_thp-flush-the-right-range.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: khugepaged: collapse_pte_mapped_thp() flush the right range
pmdp_collapse_flush() should be given the start address at which the huge
page is mapped, haddr: it was given addr, which at that point has been
used as a local variable, incremented to the end address of the extent.
Found by source inspection while chasing a hugepage locking bug, which I
then could not explain by this. At first I thought this was very bad;
then saw that all of the page translations that were not flushed would
actually still point to the right pages afterwards, so harmless; then
realized that I know nothing of how different architectures and models
cache intermediate paging structures, so maybe it matters after all -
particularly since the page table concerned is immediately freed.
Much easier to fix than to think about.
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2008021204390.27773@eggly.anvils
Fixes: 27e1f8273113 ("khugepaged: enable collapse pmd for pte-mapped THP")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/khugepaged.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/khugepaged.c~khugepaged-collapse_pte_mapped_thp-flush-the-right-range
+++ a/mm/khugepaged.c
@@ -1502,7 +1502,7 @@ void collapse_pte_mapped_thp(struct mm_s
/* step 4: collapse pmd */
ptl = pmd_lock(vma->vm_mm, pmd);
- _pmd = pmdp_collapse_flush(vma, addr, pmd);
+ _pmd = pmdp_collapse_flush(vma, haddr, pmd);
spin_unlock(ptl);
mm_dec_nr_ptes(mm);
pte_free(mm, pmd_pgtable(_pmd));
_
Patches currently in -mm which might be from hughd(a)google.com are
The patch titled
Subject: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-calculation-of-adjust_range_if_pmd_sharing_possible.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible
This is found by code observation only.
Firstly, the worst case scenario should assume the whole range was covered
by pmd sharing. The old algorithm might not work as expected for ranges
like (1g-2m, 1g+2m): the adjusted range comes out as (0, 1g+2m), where
the expected range is (0, 2g).
While at it, remove the loop, since it should not be required. With
that, the new code should also be faster when the invalidating range is
huge.
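The arithmetic for the example above, as a standalone sketch (local
macros rather than the kernel's ALIGN helpers):
    #include <stdio.h>

    #define PUD_SZ            (1UL << 30)           /* 1G */
    #define ALIGN_DOWN_(x, a) ((x) & ~((a) - 1))
    #define ALIGN_UP_(x, a)   (((x) + (a) - 1) & ~((a) - 1))

    int main(void)
    {
            unsigned long start = PUD_SZ - (2UL << 20);     /* 1g - 2m */
            unsigned long end   = PUD_SZ + (2UL << 20);     /* 1g + 2m */

            /* prints a_start=0 a_end=0x80000000, i.e. (0, 2g) */
            printf("a_start=%#lx a_end=%#lx\n",
                   ALIGN_DOWN_(start, PUD_SZ), ALIGN_UP_(end, PUD_SZ));
            return 0;
    }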
Mike said:
: With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only
: adjust to (0, 1g+2m) which is incorrect.
:
: We should cc stable. The original reason for adjusting the range was to
: prevent data corruption (getting wrong page). Since the range is not
: always adjusted correctly, the potential for corruption still exists.
:
: However, I am fairly confident that adjust_range_if_pmd_sharing_possible
: is only gong to be called in two cases:
:
: 1) for a single page
: 2) for range == entire vma
:
: In those cases, the current code should produce the correct results.
:
: To be safe, let's just cc stable.
Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.com
Fixes: 017b1660df89 ("mm: migration: fix migration of huge PMD shared pages")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-calculation-of-adjust_range_if_pmd_sharing_possible
+++ a/mm/hugetlb.c
@@ -5314,25 +5314,21 @@ static bool vma_shareable(struct vm_area
void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
unsigned long *start, unsigned long *end)
{
- unsigned long check_addr;
+ unsigned long a_start, a_end;
if (!(vma->vm_flags & VM_MAYSHARE))
return;
- for (check_addr = *start; check_addr < *end; check_addr += PUD_SIZE) {
- unsigned long a_start = check_addr & PUD_MASK;
- unsigned long a_end = a_start + PUD_SIZE;
+ /* Extend the range to be PUD aligned for a worst case scenario */
+ a_start = ALIGN_DOWN(*start, PUD_SIZE);
+ a_end = ALIGN(*end, PUD_SIZE);
- /*
- * If sharing is possible, adjust start/end if necessary.
- */
- if (range_in_vma(vma, a_start, a_end)) {
- if (a_start < *start)
- *start = a_start;
- if (a_end > *end)
- *end = a_end;
- }
- }
+ /*
+ * Intersect the range with the vma range, since pmd sharing won't be
+ * across vma after all
+ */
+ *start = max(vma->vm_start, a_start);
+ *end = min(vma->vm_end, a_end);
}
/*
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-do-page-fault-accounting-in-handle_mm_fault.patch
mm-alpha-use-general-page-fault-accounting.patch
mm-arc-use-general-page-fault-accounting.patch
mm-arm-use-general-page-fault-accounting.patch
mm-arm64-use-general-page-fault-accounting.patch
mm-csky-use-general-page-fault-accounting.patch
mm-hexagon-use-general-page-fault-accounting.patch
mm-ia64-use-general-page-fault-accounting.patch
mm-m68k-use-general-page-fault-accounting.patch
mm-microblaze-use-general-page-fault-accounting.patch
mm-mips-use-general-page-fault-accounting.patch
mm-nds32-use-general-page-fault-accounting.patch
mm-nios2-use-general-page-fault-accounting.patch
mm-openrisc-use-general-page-fault-accounting.patch
mm-parisc-use-general-page-fault-accounting.patch
mm-powerpc-use-general-page-fault-accounting.patch
mm-riscv-use-general-page-fault-accounting.patch
mm-s390-use-general-page-fault-accounting.patch
mm-sh-use-general-page-fault-accounting.patch
mm-sparc32-use-general-page-fault-accounting.patch
mm-sparc64-use-general-page-fault-accounting.patch
mm-x86-use-general-page-fault-accounting.patch
mm-xtensa-use-general-page-fault-accounting.patch
mm-clean-up-the-last-pieces-of-page-fault-accountings.patch
mm-gup-remove-task_struct-pointer-for-all-gup-code.patch
The patch titled
Subject: mm/page_counter.c: fix protection usage propagation
has been removed from the -mm tree. Its filename was
mm-fix-protection-usage-propagation.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Koutný <mkoutny(a)suse.com>
Subject: mm/page_counter.c: fix protection usage propagation
When a workload runs in cgroups that aren't directly below the root
cgroup and their parent specifies reclaim protection, the protection
may end up ineffective.
The reason is that propagate_protected_usage() is not called all the
way up the hierarchy. All the protected usage is incorrectly
accumulated in the workload's parent. This means that
siblings_low_usage is overestimated and the effective protection
underestimated. Even though this is a transitional phenomenon (the
uncharge path does correct propagation and fixes the wrong
children_low_usage), it can undermine the intended protection
unexpectedly.
We noticed this problem when we saw a swap-out in a descendant of a
protected memcg (an intermediate node) while the parent was comfortably
under its protection limit and the memory pressure was external to that
hierarchy. Michal pinpointed this to the wrong siblings_low_usage,
which led to the unwanted reclaim.
The fix is simply to update children_low_usage in the respective
ancestors in the charging path as well.
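A hedged sketch of the charge path's ancestor walk (simplified from
mm/page_counter.c; watermark handling elided) showing why the cursor,
not the leaf, must be propagated:
    void page_counter_charge(struct page_counter *counter,
                             unsigned long nr_pages)
    {
            struct page_counter *c;

            /* "counter" is the charged leaf, "c" walks toward the root. */
            for (c = counter; c; c = c->parent) {
                    long new = atomic_long_add_return(nr_pages, &c->usage);

                    /* was propagate_protected_usage(counter, new): that
                     * accumulated everything in the leaf's parent and
                     * left children_low_usage stale higher up. */
                    propagate_protected_usage(c, new);
            }
    }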
Link: http://lkml.kernel.org/r/20200803153231.15477-1-mhocko@kernel.org
Fixes: 230671533d64 ("mm: memory.low hierarchical behavior")
Signed-off-by: Michal Koutný <mkoutny(a)suse.com>
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_counter.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/page_counter.c~mm-fix-protection-usage-propagation
+++ a/mm/page_counter.c
@@ -72,7 +72,7 @@ void page_counter_charge(struct page_cou
long new;
new = atomic_long_add_return(nr_pages, &c->usage);
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* This is indeed racy, but we can live with some
* inaccuracy in the watermark.
@@ -116,7 +116,7 @@ bool page_counter_try_charge(struct page
new = atomic_long_add_return(nr_pages, &c->usage);
if (new > c->max) {
atomic_long_sub(nr_pages, &c->usage);
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* This is racy, but we can live with some
* inaccuracy in the failcnt.
@@ -125,7 +125,7 @@ bool page_counter_try_charge(struct page
*fail = c;
goto failed;
}
- propagate_protected_usage(counter, new);
+ propagate_protected_usage(c, new);
/*
* Just like with failcnt, we can live with some
* inaccuracy in the watermark.
_
Patches currently in -mm which might be from mkoutny(a)suse.com are
proc-pid-smaps-consistent-whitespace-output-format.patch
The patch titled
Subject: ocfs2: change slot number type s16 to u16
has been removed from the -mm tree. Its filename was
ocfs2-change-slot-number-type-s16-to-u16.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: change slot number type s16 to u16
Dan Carpenter reported the following static checker warning.
fs/ocfs2/super.c:1269 ocfs2_parse_options() warn: '(-1)' 65535 can't fit into 32767 'mopt->slot'
fs/ocfs2/suballoc.c:859 ocfs2_init_inode_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_inode_steal_slot'
fs/ocfs2/suballoc.c:867 ocfs2_init_meta_steal_slot() warn: '(-1)' 65535 can't fit into 32767 'osb->s_meta_steal_slot'
That's because OCFS2_INVALID_SLOT is (u16)-1. Slot numbers in ocfs2 can
never be negative, so change s16 to u16.
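A small standalone illustration of the mismatch the checker flags:
(u16)-1 is 65535, beyond the 32767 maximum of a signed 16-bit slot.
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint16_t invalid_slot = (uint16_t)-1;   /* 65535 */
            int16_t  s = (int16_t)invalid_slot;     /* wraps to -1 on
                                                       two's complement */

            printf("u16=%u s16=%d\n", invalid_slot, s);
            return 0;
    }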
Link: http://lkml.kernel.org/r/20200627001259.19757-1-junxiao.bi@oracle.com
Fixes: 9277f8334ffc ("ocfs2: fix value of OCFS2_INVALID_SLOT")
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Reviewed-by: Gang He <ghe(a)suse.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/ocfs2.h | 4 ++--
fs/ocfs2/suballoc.c | 4 ++--
fs/ocfs2/super.c | 4 ++--
3 files changed, 6 insertions(+), 6 deletions(-)
--- a/fs/ocfs2/ocfs2.h~ocfs2-change-slot-number-type-s16-to-u16
+++ a/fs/ocfs2/ocfs2.h
@@ -327,8 +327,8 @@ struct ocfs2_super
spinlock_t osb_lock;
u32 s_next_generation;
unsigned long osb_flags;
- s16 s_inode_steal_slot;
- s16 s_meta_steal_slot;
+ u16 s_inode_steal_slot;
+ u16 s_meta_steal_slot;
atomic_t s_num_inodes_stolen;
atomic_t s_num_meta_stolen;
--- a/fs/ocfs2/suballoc.c~ocfs2-change-slot-number-type-s16-to-u16
+++ a/fs/ocfs2/suballoc.c
@@ -879,9 +879,9 @@ static void __ocfs2_set_steal_slot(struc
{
spin_lock(&osb->osb_lock);
if (type == INODE_ALLOC_SYSTEM_INODE)
- osb->s_inode_steal_slot = slot;
+ osb->s_inode_steal_slot = (u16)slot;
else if (type == EXTENT_ALLOC_SYSTEM_INODE)
- osb->s_meta_steal_slot = slot;
+ osb->s_meta_steal_slot = (u16)slot;
spin_unlock(&osb->osb_lock);
}
--- a/fs/ocfs2/super.c~ocfs2-change-slot-number-type-s16-to-u16
+++ a/fs/ocfs2/super.c
@@ -78,7 +78,7 @@ struct mount_options
unsigned long commit_interval;
unsigned long mount_opt;
unsigned int atime_quantum;
- signed short slot;
+ unsigned short slot;
int localalloc_opt;
unsigned int resv_level;
int dir_resv_level;
@@ -1349,7 +1349,7 @@ static int ocfs2_parse_options(struct su
goto bail;
}
if (option)
- mopt->slot = (s16)option;
+ mopt->slot = (u16)option;
break;
case Opt_commit:
if (match_int(&args[0], &option)) {
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are
The patch titled
Subject: mm: fix kthread_use_mm() vs TLB invalidate
has been removed from the -mm tree. Its filename was
mm-fix-kthread_use_mm-vs-tlb-invalidate.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Peter Zijlstra <peterz(a)infradead.org>
Subject: mm: fix kthread_use_mm() vs TLB invalidate
For SMP systems using IPI based TLB invalidation, looking at
current->active_mm is entirely reasonable. This then presents the
following race condition:
    CPU0                      CPU1
    flush_tlb_mm(mm)          use_mm(mm)
      <send-IPI>
                              tsk->active_mm = mm;
                              <IPI>
                                if (tsk->active_mm == mm)
                                  // flush TLBs
                              </IPI>
                              switch_mm(old_mm,mm,tsk);
Where it is possible the IPI flushed the TLBs for @old_mm, not @mm,
because the IPI lands before we actually switched.
Avoid this by disabling IRQs across changing ->active_mm and
switch_mm().
Of the (SMP) architectures that have IPI based TLB invalidate:
Alpha - checks active_mm
ARC - ASID specific
IA64 - checks active_mm
MIPS - ASID specific flush
OpenRISC - shoots down world
PARISC - shoots down world
SH - ASID specific
SPARC - ASID specific
x86 - N/A
xtensa - checks active_mm
So at the very least Alpha, IA64 and Xtensa are suspect.
On top of this, for scheduler consistency we need at least preemption
disabled across changing tsk->mm and doing switch_mm(), which is
currently provided by task_lock(), but that's not sufficient for
PREEMPT_RT.
[akpm(a)linux-foundation.org: add comment]
Link: http://lkml.kernel.org/r/20200721154106.GE10769@hirez.programming.kicks-ass…
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reported-by: Andy Lutomirski <luto(a)amacapital.net>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/kthread.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/kernel/kthread.c~mm-fix-kthread_use_mm-vs-tlb-invalidate
+++ a/kernel/kthread.c
@@ -1241,13 +1241,16 @@ void kthread_use_mm(struct mm_struct *mm
WARN_ON_ONCE(tsk->mm);
task_lock(tsk);
+ /* Hold off tlb flush IPIs while switching mm's */
+ local_irq_disable();
active_mm = tsk->active_mm;
if (active_mm != mm) {
mmgrab(mm);
tsk->active_mm = mm;
}
tsk->mm = mm;
- switch_mm(active_mm, mm, tsk);
+ switch_mm_irqs_off(active_mm, mm, tsk);
+ local_irq_enable();
task_unlock(tsk);
#ifdef finish_arch_post_lock_switch
finish_arch_post_lock_switch();
@@ -1276,9 +1279,11 @@ void kthread_unuse_mm(struct mm_struct *
task_lock(tsk);
sync_mm_rss(mm);
+ local_irq_disable();
tsk->mm = NULL;
/* active_mm is still 'mm' */
enter_lazy_tlb(mm, tsk);
+ local_irq_enable();
task_unlock(tsk);
}
EXPORT_SYMBOL_GPL(kthread_unuse_mm);
_
Patches currently in -mm which might be from peterz(a)infradead.org are
The patch titled
Subject: mm/shuffle: don't move pages between zones and don't read garbage memmaps
has been removed from the -mm tree. Its filename was
mm-shuffle-dont-move-pages-between-zones-and-dont-read-garbage-memmaps.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/shuffle: don't move pages between zones and don't read garbage memmaps
Especially with memory hotplug, we can have offline sections (with a
garbage memmap) and overlapping zones. We have to make sure to only touch
initialized memmaps (online sections managed by the buddy) and that the
zone matches, to not move pages between zones.
To test if this can actually happen, I added a simple
BUG_ON(page_zone(page_i) != page_zone(page_j));
right before the swap. When hotplugging a 256M DIMM to a 4G x86-64 VM and
onlining the first memory block "online_movable" and the second memory
block "online_kernel", it will trigger the BUG, as both zones (NORMAL and
MOVABLE) overlap.
This might result in all kinds of weird situations (e.g., double
allocations, list corruptions, unmovable allocations ending up in the
movable zone).
Link: http://lkml.kernel.org/r/20200624094741.9918-2-david@redhat.com
Fixes: e900a918b098 ("mm: shuffle initial free memory to improve memory-side-cache utilization")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Wei Yang <richard.weiyang(a)linux.alibaba.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org> [5.2+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shuffle.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
--- a/mm/shuffle.c~mm-shuffle-dont-move-pages-between-zones-and-dont-read-garbage-memmaps
+++ a/mm/shuffle.c
@@ -58,25 +58,25 @@ module_param_call(shuffle, shuffle_store
* For two pages to be swapped in the shuffle, they must be free (on a
* 'free_area' lru), have the same order, and have the same migratetype.
*/
-static struct page * __meminit shuffle_valid_page(unsigned long pfn, int order)
+static struct page * __meminit shuffle_valid_page(struct zone *zone,
+ unsigned long pfn, int order)
{
- struct page *page;
+ struct page *page = pfn_to_online_page(pfn);
/*
* Given we're dealing with randomly selected pfns in a zone we
* need to ask questions like...
*/
- /* ...is the pfn even in the memmap? */
- if (!pfn_valid_within(pfn))
+ /* ... is the page managed by the buddy? */
+ if (!page)
return NULL;
- /* ...is the pfn in a present section or a hole? */
- if (!pfn_in_present_section(pfn))
+ /* ... is the page assigned to the same zone? */
+ if (page_zone(page) != zone)
return NULL;
/* ...is the page free and currently on a free_area list? */
- page = pfn_to_page(pfn);
if (!PageBuddy(page))
return NULL;
@@ -123,7 +123,7 @@ void __meminit __shuffle_zone(struct zon
* page_j randomly selected in the span @zone_start_pfn to
* @spanned_pages.
*/
- page_i = shuffle_valid_page(i, order);
+ page_i = shuffle_valid_page(z, i, order);
if (!page_i)
continue;
@@ -137,7 +137,7 @@ void __meminit __shuffle_zone(struct zon
j = z->zone_start_pfn +
ALIGN_DOWN(get_random_long() % z->spanned_pages,
order_pages);
- page_j = shuffle_valid_page(j, order);
+ page_j = shuffle_valid_page(z, j, order);
if (page_j && page_j != page_i)
break;
}
_
Patches currently in -mm which might be from david(a)redhat.com are
From: Chengming Zhou <zhouchengming(a)bytedance.com>
When a module is loaded and enabled, we use __ftrace_replace_code
for the module if any ftrace_ops referencing it is found. But we get
the wrong ftrace_addr for the module rec in ftrace_get_addr_new,
because rec->flags has not been set up correctly. This can cause the
callback function of a ftrace_ops that has FTRACE_OPS_FL_SAVE_REGS to
be called with pt_regs set to NULL.
So set up the correct FTRACE_FL_REGS flag for rec when we call
referenced_filters to find the ftrace_ops that reference it.
Link: https://lkml.kernel.org/r/20200728180554.65203-1-zhouchengming@bytedance.com
Cc: stable(a)vger.kernel.org
Fixes: 8c4f3c3fa9681 ("ftrace: Check module functions being traced on reload")
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index c141d347f71a..d052f856f1cf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -6198,8 +6198,11 @@ static int referenced_filters(struct dyn_ftrace *rec)
int cnt = 0;
for (ops = ftrace_ops_list; ops != &ftrace_list_end; ops = ops->next) {
- if (ops_references_rec(ops, rec))
- cnt++;
+ if (ops_references_rec(ops, rec)) {
+ cnt++;
+ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS)
+ rec->flags |= FTRACE_FL_REGS;
+ }
}
return cnt;
@@ -6378,8 +6381,8 @@ void ftrace_module_enable(struct module *mod)
if (ftrace_start_up)
cnt += referenced_filters(rec);
- /* This clears FTRACE_FL_DISABLED */
- rec->flags = cnt;
+ rec->flags &= ~FTRACE_FL_DISABLED;
+ rec->flags += cnt;
if (ftrace_start_up && cnt) {
int failed = __ftrace_replace_code(rec, 1);
--
2.26.2
From: Qiushi Wu <wu000273(a)umn.edu>
[ Upstream commit 17ed808ad243192fb923e4e653c1338d3ba06207 ]
When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
Therefore, replace the kfree() call with kobject_put(), and add a
missing kobject_put() in the edac_device_register_sysfs_main_kobj()
error path.
[ bp: Massage and merge into a single patch. ]
Fixes: b2ed215a3338 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273(a)umn.edu>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/edac/edac_device_sysfs.c | 1 +
drivers/edac/edac_pci_sysfs.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/edac/edac_device_sysfs.c b/drivers/edac/edac_device_sysfs.c
index fb68a06ad6837..18991cfec2af4 100644
--- a/drivers/edac/edac_device_sysfs.c
+++ b/drivers/edac/edac_device_sysfs.c
@@ -280,6 +280,7 @@ int edac_device_register_sysfs_main_kobj(struct edac_device_ctl_info *edac_dev)
/* Error exit stack */
err_kobj_reg:
+ kobject_put(&edac_dev->kobj);
module_put(edac_dev->owner);
err_mod_get:
diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 24d877f6e5775..c56128402bc67 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -394,7 +394,7 @@ static int edac_pci_main_kobj_setup(void)
/* Error unwind statck */
kobject_init_and_add_fail:
- kfree(edac_pci_top_main_kobj);
+ kobject_put(edac_pci_top_main_kobj);
kzalloc_fail:
module_put(THIS_MODULE);
--
2.25.1
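For reference, the correct cleanup pattern for the patch above looks
roughly like this (a minimal sketch with hypothetical names; the ktype's
->release() is what ultimately frees the object):

static int example_register(struct example_dev *dev, struct kobject *parent)
{
	int ret;

	ret = kobject_init_and_add(&dev->kobj, &example_ktype, parent, "example");
	if (ret) {
		/*
		 * kobject_init_and_add() took a reference even though it
		 * failed, so drop it with kobject_put() -- which invokes the
		 * ktype's ->release() -- rather than freeing the object with
		 * a bare kfree().
		 */
		kobject_put(&dev->kobj);
		return ret;
	}
	return 0;
}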
From: Qiushi Wu <wu000273(a)umn.edu>
[ Upstream commit 17ed808ad243192fb923e4e653c1338d3ba06207 ]
When kobject_init_and_add() returns an error, it should be handled
because kobject_init_and_add() takes a reference even when it fails. If
this function returns an error, kobject_put() must be called to properly
clean up the memory associated with the object.
Therefore, replace the kfree() call with kobject_put(), and add a
missing kobject_put() in the edac_device_register_sysfs_main_kobj()
error path.
[ bp: Massage and merge into a single patch. ]
Fixes: b2ed215a3338 ("Kobject: change drivers/edac to use kobject_init_and_add")
Signed-off-by: Qiushi Wu <wu000273(a)umn.edu>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Link: https://lkml.kernel.org/r/20200528202238.18078-1-wu000273@umn.edu
Link: https://lkml.kernel.org/r/20200528203526.20908-1-wu000273@umn.edu
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/edac/edac_device_sysfs.c | 1 +
drivers/edac/edac_pci_sysfs.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/edac/edac_device_sysfs.c b/drivers/edac/edac_device_sysfs.c
index 93da1a45c7161..470b02fc2de96 100644
--- a/drivers/edac/edac_device_sysfs.c
+++ b/drivers/edac/edac_device_sysfs.c
@@ -275,6 +275,7 @@ int edac_device_register_sysfs_main_kobj(struct edac_device_ctl_info *edac_dev)
/* Error exit stack */
err_kobj_reg:
+ kobject_put(&edac_dev->kobj);
module_put(edac_dev->owner);
err_out:
diff --git a/drivers/edac/edac_pci_sysfs.c b/drivers/edac/edac_pci_sysfs.c
index 6e3428ba400f3..622d117e25335 100644
--- a/drivers/edac/edac_pci_sysfs.c
+++ b/drivers/edac/edac_pci_sysfs.c
@@ -386,7 +386,7 @@ static int edac_pci_main_kobj_setup(void)
/* Error unwind statck */
kobject_init_and_add_fail:
- kfree(edac_pci_top_main_kobj);
+ kobject_put(edac_pci_top_main_kobj);
kzalloc_fail:
module_put(THIS_MODULE);
--
2.25.1
From: Jakub Kicinski <kuba(a)kernel.org>
When ur_load_imm_any() is inlined into jeq_imm(), it's possible for the
compiler to deduce a case where _val can only have the value of -1 at
compile time. Specifically,
/* struct bpf_insn: _s32 imm */
u64 imm = insn->imm; /* sign extend */
if (imm >> 32) { /* non-zero only if insn->imm is negative */
/* inlined from ur_load_imm_any */
u32 __imm = imm >> 32; /* therefore, always 0xffffffff */
if (__builtin_constant_p(__imm) && __imm > 255)
compiletime_assert_XXX()
This can result in tripping a BUILD_BUG_ON() in __BF_FIELD_CHECK() that
checks that a given value is representable in one byte (interpreted as
unsigned).
FIELD_FIT() should return true or false at runtime depending on whether a
value can fit or not. Don't break the build over a value that's too large
for the mask. We'd prefer to keep the inlining and compiler optimizations,
even though we know this case will always return false.
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/kernel-hardening/CAK7LNASvb0UDJ0U5wkYYRzTAdnEs64HjX…
Reported-by: Masahiro Yamada <masahiroy(a)kernel.org>
Debugged-by: Sami Tolvanen <samitolvanen(a)google.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
Acked-by: Alex Elder <elder(a)linaro.org>
---
Note: resent patch 1/2 as per Jakub on
https://lore.kernel.org/netdev/20200708230402.1644819-1-ndesaulniers@google…
include/linux/bitfield.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h
index 48ea093ff04c..4e035aca6f7e 100644
--- a/include/linux/bitfield.h
+++ b/include/linux/bitfield.h
@@ -77,7 +77,7 @@
*/
#define FIELD_FIT(_mask, _val) \
({ \
- __BF_FIELD_CHECK(_mask, 0ULL, _val, "FIELD_FIT: "); \
+ __BF_FIELD_CHECK(_mask, 0ULL, 0ULL, "FIELD_FIT: "); \
!((((typeof(_mask))_val) << __bf_shf(_mask)) & ~(_mask)); \
})
--
2.27.0.383.g050319c2ae-goog
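As a usage sketch of the intended runtime semantics after the fix above
(the mask and function are hypothetical):

#include <linux/bitfield.h>
#include <linux/bits.h>
#include <linux/types.h>

#define EXAMPLE_MASK	GENMASK(7, 0)	/* hypothetical one-byte field */

static bool example_fits(u64 val)
{
	/*
	 * False at run time for values over 255; with the fix the build
	 * no longer breaks when the compiler can prove (e.g. after
	 * inlining) that val is out of range.
	 */
	return FIELD_FIT(EXAMPLE_MASK, val);
}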
Hi Marc,
On Tue, Aug 04, 2020 at 05:52:36PM -0700, Marc Plumb wrote:
> Seeding two PRNGs with the same entropy causes two problems. The minor one
> is that you're double counting entropy. The major one is that anyone who can
> determine the state of one PRNG can determine the state of the other.
>
> The net_rand_state PRNG is effectively a 113 bit LFSR, so anyone who can see
> any 113 bits of output can determine the complete internal state.
>
> The output of the net_rand_state PRNG is used to determine how data is sent
> to the network, so the output is effectively broadcast to anyone watching
> network traffic. Therefore anyone watching the network traffic can determine
> the seed data being fed to the net_rand_state PRNG.
The problem this patch is trying to work around is that the reporter
(Amit) was able to determine the entire net_rand_state after observing
a certain number of packets due to this trivial LFSR and the fact that
its internal state between two reseedings only depends on the number
of calls to read it. (please note that regarding this point I'll
propose a patch to replace that PRNG to stop directly exposing the
internal state to the network).
If you look closer at the patch, you'll see that in one interrupt
the patch only uses any 32 out of the 128 bits of fast_pool to
update only 32 bits of the net_rand_state. As such, the sequence
observed on the network also depends on the remaining bits of
net_rand_state, while the 96 other bits of the fast_pool are not
exposed there.
> Since this is the same
> seed data being fed to get_random_bytes, it allows an attacker to determine
> the state and the output of /dev/random. I sincerely hope that this was
> not the intended goal. :)
Not only was this obviously not the goal, but I'd be particularly
interested in seeing this reality demonstrated, considering that
the whole 128 bits of fast_pool together count as a single bit of
entropy, and that as such, even if you were able to figure the
value of the 32 bits leaked to net_rand_state, you'd still have to
guess the 96 other bits for each single entropy bit :-/
Regards,
Willy
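For context, the net_rand_state generator discussed above is a four-tap
Tausworthe generator along the lines of the following sketch (modeled on
lib/random32.c; details may vary by kernel version). The component periods
of 2^31-1, 2^29-1, 2^28-1 and 2^25-1 add up to the 113 bits of state
mentioned above:

struct rnd_state {
	u32 s1, s2, s3, s4;
};

#define TAUSWORTHE(s, a, b, c, d) \
	((((s) & (c)) << (d)) ^ ((((s) << (a)) ^ (s)) >> (b)))

static u32 prandom_u32_state(struct rnd_state *state)
{
	state->s1 = TAUSWORTHE(state->s1,  6U, 13U, 4294967294U, 18U);
	state->s2 = TAUSWORTHE(state->s2,  2U, 27U, 4294967288U,  2U);
	state->s3 = TAUSWORTHE(state->s3, 13U, 21U, 4294967280U,  7U);
	state->s4 = TAUSWORTHE(state->s4,  3U, 12U, 4294967168U, 13U);

	return state->s1 ^ state->s2 ^ state->s3 ^ state->s4;
}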
When FITRIM is issued on a group, ext4 marks it as trimmed so that another
FITRIM on the same group has no effect. Ext4 marks a group as trimmed if at
least one block was trimmed, so it is possible for a group to be marked as
trimmed even though some blocks in that group were left untrimmed.
This patch marks a group as trimmed only if there are no more blocks in
that group left to trim.
Fixes: 3d56b8d2c74cc3f375ce332b3ac3519e009d79ee
Tested-by: Lazar Beloica <lazar.beloica(a)nutanix.com>
Signed-off-by: Lazar Beloica <lazar.beloica(a)nutanix.com>
---
fs/ext4/mballoc.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c0a331e..130936b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -5346,6 +5346,7 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count,
{
void *bitmap;
ext4_grpblk_t next, count = 0, free_count = 0;
+ ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
struct ext4_buddy e4b;
int ret = 0;
@@ -5401,7 +5402,9 @@ static int ext4_trim_extent(struct super_block *sb, int start, int count,
if (!ret) {
ret = count;
- EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
+ next = mb_find_next_bit(bitmap, max_blks, max + 1);
+ if (next == max_blks)
+ EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
}
out:
ext4_unlock_group(sb, group);
--
1.8.3.1
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: a95883ce5505 - USB: serial: qcserial: add EM7305 QDL product ID
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
     ,-.   ,-.
    ( C ) ( K )  Continuous
     `-',-.`-'   Kernel
       ( I )     Integration
        `-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 2:
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
🚧 ✅ kdump - kexec_boot
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
🚧 ⚡⚡⚡ kdump - sysrq-c
Host 3:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
🚧 ⚡⚡⚡ kdump - sysrq-c
Host 5:
✅ Boot test
🚧 ✅ kdump - sysrq-c
s390x:
Host 1:
✅ Boot test
✅ selinux-policy: serge-testsuite
✅ stress: stress-ng
🚧 ✅ Storage blktests
Host 2:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ Networking firewall: basic netfilter test
🚧 ✅ audit: audit testsuite test
🚧 ✅ trace: ftrace/tracer
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ❌ CPU: Frequency Driver Test
🚧 ✅ CPU: Idle Test
🚧 ✅ xfstests - btrfs
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 2:
✅ Boot test
🚧 ✅ kdump - sysrq-c
🚧 ✅ kdump - file-load
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ ACPI table test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ kdump - kexec_boot
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ ACPI table test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ kdump - kexec_boot
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ ACPI table test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ kdump - kexec_boot
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
On Fri, Aug 7, 2020 at 10:55 AM Andy Lutomirski <luto(a)amacapital.net> wrote:
>
> I think the real random.c can run plenty fast. It’s ChaCha20 plus ludicrous overhead right now.
I doubt it.
I tried something very much like that in user space to just see how
many cycles it ended up being.
I made a "just raw ChaCha20", and it was already much too slow for
what some of the networking people claim to want.
And maybe they are asking for too much, but if they think it's too
slow, they'll not use it, and then we're back to square one.
Now, what *might* be acceptable is to not do ChaCha20, but simply do a
single double-round of it.
So after doing 10 prandom_u32() calls, you'd have done a full
ChaCha20. I didn't actually try that, but from looking at the costs
from trying the full thing, I think it might be in the right ballpark.
How does that sound to people?
Linus
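For reference, the double-round proposed above as the per-call unit is one
column round plus one diagonal round of ChaCha; a sketch (rol32() as in
the kernel's <linux/bitops.h>):

#define QR(a, b, c, d) do {			\
	a += b; d ^= a; d = rol32(d, 16);	\
	c += d; b ^= c; b = rol32(b, 12);	\
	a += b; d ^= a; d = rol32(d, 8);	\
	c += d; b ^= c; b = rol32(b, 7);	\
} while (0)

/*
 * One double-round over the 16-word ChaCha state. Full ChaCha20 runs
 * ten of these, so one double-round per prandom_u32() call amortizes
 * to a full ChaCha20 every ten calls, as described above.
 */
static void chacha_double_round(u32 x[16])
{
	QR(x[0], x[4], x[8],  x[12]);	/* column round */
	QR(x[1], x[5], x[9],  x[13]);
	QR(x[2], x[6], x[10], x[14]);
	QR(x[3], x[7], x[11], x[15]);
	QR(x[0], x[5], x[10], x[15]);	/* diagonal round */
	QR(x[1], x[6], x[11], x[12]);
	QR(x[2], x[7], x[8],  x[13]);
	QR(x[3], x[4], x[9],  x[14]);
}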
The patch titled
Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault
has been removed from the -mm tree. Its filename was
mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault.patch
This patch was dropped because it was nacked
------------------------------------------------------
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm/memory.c: avoid access flag update TLB flush for retried page fault
Recently we found a regression when running the will_it_scale/page_fault3
test on ARM64: over 70% down for the multi-process cases and over 20% down
for the multi-thread cases. It turns out the regression is caused by
commit 89b15332af7c0312a41e50846819ca6613b58b4c ("mm: drop mmap_sem before
calling balance_dirty_pages() in write fault").
The test mmaps a file the size of memory and then writes to the mapping;
this makes all memory dirty and triggers dirty-page throttling. The above
commit then releases mmap_sem and retries the page fault. The retried page
fault sees the correct PTEs installed by the first try, then updates the
dirty bit, clears the read-only bit, and flushes the TLBs on ARM. The
regression is caused by this excessive TLB flushing. x86 is fine since it
doesn't clear the read-only bit, so there is no need to flush the TLB there.
The page fault can be retried due to:
1. Waiting for page readahead
2. Waiting for the page to be swapped in
3. Waiting for dirty-page throttling
The first two cases don't have PTEs set up at all, so the retried page
fault installs the PTEs and never reaches this path. But case #3 usually
has PTEs installed, so the retried page fault does reach the dirty-bit and
read-only-bit update. It seems unnecessary to modify those bits again for
#3, since they should already have been set by the first page-fault try.
Of course a parallel page fault may set up the PTEs, but we only need to
care about write faults. If the parallel page fault set up a writable and
dirty PTE, the retried fault doesn't need to do anything extra. If it set
up a clean read-only PTE, the retried fault should just call do_wp_page()
and then return, as the code snippet below shows:
if (vmf->flags & FAULT_FLAG_WRITE) {
if (!pte_write(entry))
return do_wp_page(vmf);
}
With this fix the test result get back to normal.
[yang.shi(a)linux.alibaba.com: incorporate comment from Will Deacon, update commit log per discussion]
Link: http://lkml.kernel.org/r/1594848990-55657-1-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1594148072-91273-1-git-send-email-yang.shi@linux.a…
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Xu Yu <xuyu(a)linux.alibaba.com>
Debugged-by: Xu Yu <xuyu(a)linux.alibaba.com>
Tested-by: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Josef Bacik <josef(a)toxicpanda.com>
Cc: Hillf Danton <hdanton(a)sina.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
--- a/mm/memory.c~mm-avoid-access-flag-update-tlb-flush-for-retried-page-fault
+++ a/mm/memory.c
@@ -4241,8 +4241,14 @@ static vm_fault_t handle_pte_fault(struc
if (vmf->flags & FAULT_FLAG_WRITE) {
if (!pte_write(entry))
return do_wp_page(vmf);
- entry = pte_mkdirty(entry);
}
+
+ if (vmf->flags & FAULT_FLAG_TRIED)
+ goto unlock;
+
+ if (vmf->flags & FAULT_FLAG_WRITE)
+ entry = pte_mkdirty(entry);
+
entry = pte_mkyoung(entry);
if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
vmf->flags & FAULT_FLAG_WRITE)) {
_
Patches currently in -mm which might be from yang.shi(a)linux.alibaba.com are
mm-filemap-clear-idle-flag-for-writes.patch
mm-filemap-add-missing-fgp_-flags-in-kerneldoc-comment-for-pagecache_get_page.patch
mm-thp-remove-debug_cow-switch.patch
If the link register was zeroed out, do not attempt to use it for
address calculations for which there are currently no fixup handlers,
which can lead to a panic during unwind. Since panicking triggers
another unwind, this can lead to an infinite loop. If this occurs
during start_kernel(), this can prevent a kernel from booting.
Commit 59b6359dd92d ("ARM: 8702/1: head-common.S: Clear lr before jumping to start_kernel()")
intentionally zeros out the link register in __mmap_switched, which tail
calls into start_kernel(). Test for this condition so that an unwind
initiated within start_kernel() stops correctly.
Cc: stable(a)vger.kernel.org
Fixes: 6dc5fd93b2f1 ("ARM: 8900/1: UNWINDER_FRAME_POINTER implementation for Clang")
Reported-by: Miles Chen <miles.chen(a)mediatek.com>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
arch/arm/lib/backtrace-clang.S | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm/lib/backtrace-clang.S b/arch/arm/lib/backtrace-clang.S
index 6174c45f53a5..5388ac664c12 100644
--- a/arch/arm/lib/backtrace-clang.S
+++ b/arch/arm/lib/backtrace-clang.S
@@ -144,6 +144,8 @@ for_each_frame: tst frame, mask @ Check for address exceptions
*/
1003: ldr sv_lr, [sv_fp, #4] @ get saved lr from next frame
+ cmp sv_lr, #0 @ If there's no previous lr,
+ beq finished_setup @ we're done.
ldr r0, [sv_lr, #-4] @ get call instruction
ldr r3, .Lopcode+4
and r2, r3, r0 @ is this a bl call
--
2.28.0.163.g6104cc2f0b6-goog
Hi Andy,
On Fri, Aug 07, 2020 at 10:55:11AM -0700, Andy Lutomirski wrote:
> >> This is still another non-cryptographic PRNG.
> >
> > Absolutely. During some discussions regarding the possibility of using
> > CSPRNGs, orders around hundreds of CPU cycles were mentioned for them,
> > which can definitely be a huge waste of precious resources for some
> > workloads, possibly causing the addition of a few percent extra machines
> > in certain environments just to keep the average load under a certain
> > threshold.
>
> I think the real random.c can run plenty fast. It's ChaCha20 plus ludicrous
> overhead right now. I'm working (slowly) on making the overhead go away. I'm
> hoping to have something testable in a few days. As it stands, there is a
> ton of indirection, a pile of locks, multiple time comparisons, per-node and
> percpu buffers (why both?), wasted bits due to alignment, and probably other
> things that can be cleaned up. I'm trying to come up with something that is
> fast and has easy-to-understand semantics.
>
> You can follow along at:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=rando…
Thanks, we'll see. I developed a quick test tool that's meant to be easy
to use to measure the performance impact on connect/accept. I have not
yet run it on a modified PRNG to verify if it works. I'll send it once
I've tested. I'd definitely like to see no measurable performance
drop, and ideally even a small performance increase (as Tausworthe isn't
the lightest thing around either so we do have some little margin).
Willy
The patch below does not apply to the 5.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d1719f70d0a5b83b12786a7dbc5b9fe396469016 Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Thu, 30 Jul 2020 13:43:53 -0600
Subject: [PATCH] io_uring: don't touch 'ctx' after installing file descriptor
As soon as we install the file descriptor, we have to assume that it
can get arbitrarily closed. We currently account memory (and note that
we did) after installing the ring fd, which means that it could be a
potential use-after-free condition if the fd is closed right after
being installed, but before we fiddle with the ctx.
In fact, syzbot reported this exact scenario:
BUG: KASAN: use-after-free in io_account_mem fs/io_uring.c:7397 [inline]
BUG: KASAN: use-after-free in io_uring_create fs/io_uring.c:8369 [inline]
BUG: KASAN: use-after-free in io_uring_setup+0x2797/0x2910 fs/io_uring.c:8400
Read of size 1 at addr ffff888087a41044 by task syz-executor.5/18145
CPU: 0 PID: 18145 Comm: syz-executor.5 Not tainted 5.8.0-rc7-next-20200729-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x18f/0x20d lib/dump_stack.c:118
print_address_description.constprop.0.cold+0xae/0x497 mm/kasan/report.c:383
__kasan_report mm/kasan/report.c:513 [inline]
kasan_report.cold+0x1f/0x37 mm/kasan/report.c:530
io_account_mem fs/io_uring.c:7397 [inline]
io_uring_create fs/io_uring.c:8369 [inline]
io_uring_setup+0x2797/0x2910 fs/io_uring.c:8400
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x45c429
Code: 8d b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 5b b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f8f121d0c78 EFLAGS: 00000246 ORIG_RAX: 00000000000001a9
RAX: ffffffffffffffda RBX: 0000000000008540 RCX: 000000000045c429
RDX: 0000000000000000 RSI: 0000000020000040 RDI: 0000000000000196
RBP: 000000000078bf38 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000078bf0c
R13: 00007fff86698cff R14: 00007f8f121d19c0 R15: 000000000078bf0c
Move the accounting of the ring used locked memory before we get and
install the ring file descriptor.
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+9d46305e76057f30c74e(a)syzkaller.appspotmail.com
Fixes: 309758254ea6 ("io_uring: report pinned memory usage")
Reviewed-by: Stefano Garzarella <sgarzare(a)redhat.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index fabf0b692384..33702f3b5af8 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -8329,6 +8329,15 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
ret = -EFAULT;
goto err;
}
+
+ /*
+ * Account memory _before_ installing the file descriptor. Once
+ * the descriptor is installed, it can get closed at any time.
+ */
+ io_account_mem(ctx, ring_pages(p->sq_entries, p->cq_entries),
+ ACCT_LOCKED);
+ ctx->limit_mem = limit_mem;
+
/*
* Install ring fd as the very last thing, so we don't risk someone
* having closed it before we finish setup
@@ -8338,9 +8347,6 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
goto err;
trace_io_uring_create(ret, ctx, p->sq_entries, p->cq_entries, p->flags);
- io_account_mem(ctx, ring_pages(p->sq_entries, p->cq_entries),
- ACCT_LOCKED);
- ctx->limit_mem = limit_mem;
return ret;
err:
io_ring_ctx_wait_and_kill(ctx);
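The general rule behind the fix above: once the file descriptor is
published, a concurrent close() can free the object, so all setup and
accounting must happen first. A minimal sketch (object and helpers are
hypothetical):

static int example_create(struct example_ctx *ctx)
{
	int fd;

	example_account_mem(ctx);	/* finish touching ctx before publishing it */

	fd = anon_inode_getfd("[example]", &example_fops, ctx,
			      O_RDWR | O_CLOEXEC);
	if (fd < 0)
		example_destroy(ctx);
	/* from here on, ctx may already be gone via close(fd) */
	return fd;
}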
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4b836a1426cb0f1ef2a6e211d7e553221594f8fc Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 27 Jul 2020 14:04:24 +0200
Subject: [PATCH] binder: Prevent context manager from incrementing ref 0
Binder is designed such that a binder_proc never has references to
itself. If this rule is violated, memory corruption can occur when a
process sends a transaction to itself; see e.g.
<https://syzkaller.appspot.com/bug?extid=09e05aba06723a94d43d>.
There is a remaining edgecase through which such a transaction-to-self
can still occur from the context of a task with BINDER_SET_CONTEXT_MGR
access:
- task A opens /dev/binder twice, creating binder_proc instances P1
and P2
- P1 becomes context manager
- P2 calls ACQUIRE on the magic handle 0, allocating index 0 in its
handle table
- P1 dies (by closing the /dev/binder fd and waiting a bit)
- P2 becomes context manager
- P2 calls ACQUIRE on the magic handle 0, allocating index 1 in its
handle table
[this triggers a warning: "binder: 1974:1974 tried to acquire
reference to desc 0, got 1 instead"]
- task B opens /dev/binder once, creating binder_proc instance P3
- P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way
transaction)
- P2 receives the handle and uses it to call P3 (two-way transaction)
- P3 calls P2 (via magic handle 0) (two-way transaction)
- P2 calls P2 (via handle 1) (two-way transaction)
And then, if P2 does *NOT* accept the incoming transaction work, but
instead closes the binder fd, we get a crash.
Solve it by preventing the context manager from using ACQUIRE on ref 0.
There shouldn't be any legitimate reason for the context manager to do
that.
Additionally, print a warning if someone manages to find another way to
trigger a transaction-to-self bug in the future.
Cc: stable(a)vger.kernel.org
Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
Acked-by: Todd Kjos <tkjos(a)google.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Martijn Coenen <maco(a)android.com>
Link: https://lore.kernel.org/r/20200727120424.1627555-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f50c5f182bb5..5b310eea9e52 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2982,6 +2982,12 @@ static void binder_transaction(struct binder_proc *proc,
goto err_dead_binder;
}
e->to_node = target_node->debug_id;
+ if (WARN_ON(proc == target_proc)) {
+ return_error = BR_FAILED_REPLY;
+ return_error_param = -EINVAL;
+ return_error_line = __LINE__;
+ goto err_invalid_target_handle;
+ }
if (security_binder_transaction(proc->tsk,
target_proc->tsk) < 0) {
return_error = BR_FAILED_REPLY;
@@ -3635,10 +3641,17 @@ static int binder_thread_write(struct binder_proc *proc,
struct binder_node *ctx_mgr_node;
mutex_lock(&context->context_mgr_node_lock);
ctx_mgr_node = context->binder_context_mgr_node;
- if (ctx_mgr_node)
+ if (ctx_mgr_node) {
+ if (ctx_mgr_node->proc == proc) {
+ binder_user_error("%d:%d context manager tried to acquire desc 0\n",
+ proc->pid, thread->pid);
+ mutex_unlock(&context->context_mgr_node_lock);
+ return -EINVAL;
+ }
ret = binder_inc_ref_for_node(
proc, ctx_mgr_node,
strong, NULL, &rdata);
+ }
mutex_unlock(&context->context_mgr_node_lock);
}
if (ret)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4b836a1426cb0f1ef2a6e211d7e553221594f8fc Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 27 Jul 2020 14:04:24 +0200
Subject: [PATCH] binder: Prevent context manager from incrementing ref 0
Binder is designed such that a binder_proc never has references to
itself. If this rule is violated, memory corruption can occur when a
process sends a transaction to itself; see e.g.
<https://syzkaller.appspot.com/bug?extid=09e05aba06723a94d43d>.
There is a remaining edgecase through which such a transaction-to-self
can still occur from the context of a task with BINDER_SET_CONTEXT_MGR
access:
- task A opens /dev/binder twice, creating binder_proc instances P1
and P2
- P1 becomes context manager
- P2 calls ACQUIRE on the magic handle 0, allocating index 0 in its
handle table
- P1 dies (by closing the /dev/binder fd and waiting a bit)
- P2 becomes context manager
- P2 calls ACQUIRE on the magic handle 0, allocating index 1 in its
handle table
[this triggers a warning: "binder: 1974:1974 tried to acquire
reference to desc 0, got 1 instead"]
- task B opens /dev/binder once, creating binder_proc instance P3
- P3 calls P2 (via magic handle 0) with (void*)1 as argument (two-way
transaction)
- P2 receives the handle and uses it to call P3 (two-way transaction)
- P3 calls P2 (via magic handle 0) (two-way transaction)
- P2 calls P2 (via handle 1) (two-way transaction)
And then, if P2 does *NOT* accept the incoming transaction work, but
instead closes the binder fd, we get a crash.
Solve it by preventing the context manager from using ACQUIRE on ref 0.
There shouldn't be any legitimate reason for the context manager to do
that.
Additionally, print a warning if someone manages to find another way to
trigger a transaction-to-self bug in the future.
Cc: stable(a)vger.kernel.org
Fixes: 457b9a6f09f0 ("Staging: android: add binder driver")
Acked-by: Todd Kjos <tkjos(a)google.com>
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Martijn Coenen <maco(a)android.com>
Link: https://lore.kernel.org/r/20200727120424.1627555-1-jannh@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index f50c5f182bb5..5b310eea9e52 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -2982,6 +2982,12 @@ static void binder_transaction(struct binder_proc *proc,
goto err_dead_binder;
}
e->to_node = target_node->debug_id;
+ if (WARN_ON(proc == target_proc)) {
+ return_error = BR_FAILED_REPLY;
+ return_error_param = -EINVAL;
+ return_error_line = __LINE__;
+ goto err_invalid_target_handle;
+ }
if (security_binder_transaction(proc->tsk,
target_proc->tsk) < 0) {
return_error = BR_FAILED_REPLY;
@@ -3635,10 +3641,17 @@ static int binder_thread_write(struct binder_proc *proc,
struct binder_node *ctx_mgr_node;
mutex_lock(&context->context_mgr_node_lock);
ctx_mgr_node = context->binder_context_mgr_node;
- if (ctx_mgr_node)
+ if (ctx_mgr_node) {
+ if (ctx_mgr_node->proc == proc) {
+ binder_user_error("%d:%d context manager tried to acquire desc 0\n",
+ proc->pid, thread->pid);
+ mutex_unlock(&context->context_mgr_node_lock);
+ return -EINVAL;
+ }
ret = binder_inc_ref_for_node(
proc, ctx_mgr_node,
strong, NULL, &rdata);
+ }
mutex_unlock(&context->context_mgr_node_lock);
}
if (ret)
The only-root-readable /sys/module/$module/sections/$section files
did not truncate their output to the available buffer size. While most
paths into the kernfs read handlers end up using PAGE_SIZE buffers,
it's possible to get there through other paths (e.g. splice, sendfile).
Actually limit the output to the "count" passed into the read function,
and report it back correctly. *sigh*
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/lkml/20200805002015.GE23458@shao2-debian
Fixes: ed66f991bb19 ("module: Refactor section attr into bin attribute")
Cc: stable(a)vger.kernel.org
Cc: Jessica Yu <jeyu(a)kernel.org>
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
kernel/module.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/kernel/module.c b/kernel/module.c
index aa183c9ac0a2..08c46084d8cc 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1520,18 +1520,34 @@ struct module_sect_attrs {
struct module_sect_attr attrs[];
};
+#define MODULE_SECT_READ_SIZE (3 /* "0x", "\n" */ + (BITS_PER_LONG / 4))
static ssize_t module_sect_read(struct file *file, struct kobject *kobj,
struct bin_attribute *battr,
char *buf, loff_t pos, size_t count)
{
struct module_sect_attr *sattr =
container_of(battr, struct module_sect_attr, battr);
+ char bounce[MODULE_SECT_READ_SIZE + 1];
+ size_t wrote;
if (pos != 0)
return -EINVAL;
- return sprintf(buf, "0x%px\n",
- kallsyms_show_value(file->f_cred) ? (void *)sattr->address : NULL);
+ /*
+ * Since we're a binary read handler, we must account for the
+ * trailing NUL byte that sprintf will write: if "buf" is
+ * too small to hold the NUL, or the NUL is exactly the last
+ * byte, the read will look like it got truncated by one byte.
+ * Since there is no way to ask sprintf nicely to not write
+ * the NUL, we have to use a bounce buffer.
+ */
+ wrote = scnprintf(bounce, sizeof(bounce), "0x%px\n",
+ kallsyms_show_value(file->f_cred)
+ ? (void *)sattr->address : NULL);
+ count = min(count, wrote);
+ memcpy(buf, bounce, count);
+
+ return count;
}
static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
@@ -1580,7 +1596,7 @@ static void add_sect_attrs(struct module *mod, const struct load_info *info)
goto out;
sect_attrs->nsections++;
sattr->battr.read = module_sect_read;
- sattr->battr.size = 3 /* "0x", "\n" */ + (BITS_PER_LONG / 4);
+ sattr->battr.size = MODULE_SECT_READ_SIZE;
sattr->battr.attr.mode = 0400;
*(gattr++) = &(sattr++)->battr;
}
--
2.25.1
This patch adds a check to ensure that struct net_device::ml_priv is
allocated, as it is used later by the j1939 stack.
The allocation is done by all mainline CAN network drivers, but not when
bond or team devices are used.
Bail out if no ml_priv is allocated.
Reported-by: syzbot+f03d384f3455d28833eb(a)syzkaller.appspotmail.com
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Cc: linux-stable <stable(a)vger.kernel.org> # >= v5.4
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
---
net/can/j1939/socket.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/net/can/j1939/socket.c b/net/can/j1939/socket.c
index b9a17c2ee16f..27542de233c7 100644
--- a/net/can/j1939/socket.c
+++ b/net/can/j1939/socket.c
@@ -467,6 +467,14 @@ static int j1939_sk_bind(struct socket *sock, struct sockaddr *uaddr, int len)
goto out_release_sock;
}
+ if (!ndev->ml_priv) {
+ netdev_warn_once(ndev,
+ "No CAN mid layer private allocated, please fix your driver and use alloc_candev()!\n");
+ dev_put(ndev);
+ ret = -ENODEV;
+ goto out_release_sock;
+ }
+
priv = j1939_netdev_start(ndev);
dev_put(ndev);
if (IS_ERR(priv)) {
--
2.28.0
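For driver authors, the expected setup is roughly the following sketch
(the private struct and echo-skb count are hypothetical; alloc_candev()
is what populates ml_priv for the CAN mid layer):

struct my_candev_priv {
	struct can_priv can;	/* must be the first member */
	/* driver-specific fields ... */
};

static struct net_device *my_candev_alloc(void)
{
	struct net_device *ndev;

	ndev = alloc_candev(sizeof(struct my_candev_priv), 16 /* echo skbs */);
	if (!ndev)
		return NULL;
	/*
	 * ndev->ml_priv now points at the CAN mid-layer private data,
	 * so the j1939 stack can safely bind to this device.
	 */
	return ndev;
}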
The current stack implementation does not support ECTS requests for blocks
that are not aligned to the TP block size.
If ECTS requests a block whose size and offset span two TP blocks, memcpy()
will read beyond the queued skb (which contains only one TP-sized block).
Sometimes KASAN will detect this read, if the memory region beyond the skb
was previously allocated and freed; in other situations it stays
undetected. Either way, the ETP transfer is corrupted.
This patch adds a sanity check to avoid this kind of read and abort the
session with error J1939_XTP_ABORT_ECTS_TOO_BIG.
Reported-by: syzbot+5322482fe520b02aea30(a)syzkaller.appspotmail.com
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol")
Cc: linux-stable <stable(a)vger.kernel.org> # >= v5.4
Signed-off-by: Oleksij Rempel <o.rempel(a)pengutronix.de>
---
net/can/j1939/transport.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/net/can/j1939/transport.c b/net/can/j1939/transport.c
index b135c5e2a86e..30957c9a8eb7 100644
--- a/net/can/j1939/transport.c
+++ b/net/can/j1939/transport.c
@@ -787,6 +787,18 @@ static int j1939_session_tx_dat(struct j1939_session *session)
if (len > 7)
len = 7;
+ if (offset + len > se_skb->len) {
+ netdev_err_once(priv->ndev,
+ "%s: 0x%p: requested data outside of queued buffer: offset %i, len %i, pkt.tx: %i\n",
+ __func__, session, skcb->offset, se_skb->len, session->pkt.tx);
+ return -EOVERFLOW;
+ }
+
+ if (!len) {
+ ret = -ENOBUFS;
+ break;
+ }
+
memcpy(&dat[1], &tpdat[offset], len);
ret = j1939_tp_tx_dat(session, dat, len + 1);
if (ret < 0) {
@@ -1120,6 +1132,9 @@ static enum hrtimer_restart j1939_tp_txtimer(struct hrtimer *hrtimer)
* cleanup including propagation of the error to user space.
*/
break;
+ case -EOVERFLOW:
+ j1939_session_cancel(session, J1939_XTP_ABORT_ECTS_TOO_BIG);
+ break;
case 0:
session->tx_retry = 0;
break;
--
2.28.0
I'm announcing the release of the 5.7.14 kernel.
All users of the 5.7 kernel series must upgrade.
The updated 5.7.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.7.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/include/asm/percpu.h | 2
arch/arm64/include/asm/archrandom.h | 1
arch/arm64/include/asm/pointer_auth.h | 8 +++
arch/arm64/kernel/kaslr.c | 2
drivers/char/random.c | 1
include/linux/prandom.h | 78 ++++++++++++++++++++++++++++++++++
include/linux/random.h | 63 +--------------------------
kernel/time/timer.c | 8 +++
lib/random32.c | 2
10 files changed, 103 insertions(+), 64 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.7.14
Grygorii Strashko (1):
ARM: percpu.h: fix build error
Linus Torvalds (3):
random32: remove net_rand_state from the latent entropy gcc plugin
random32: move the pseudo-random 32-bit definitions to prandom.h
random: random.h should include archrandom.h, not the other way around
Marc Zyngier (1):
arm64: Workaround circular dependency in pointer_auth.h
Willy Tarreau (2):
random32: update the net random state on interrupt and activity
random: fix circular include dependency on arm64 after addition of percpu.h
I'm announcing the release of the 5.4.57 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/include/asm/percpu.h | 2
arch/arm64/include/asm/pointer_auth.h | 8 ++-
drivers/char/random.c | 1
fs/ext4/inode.c | 5 ++
include/linux/bpf.h | 13 ++++-
include/linux/prandom.h | 78 ++++++++++++++++++++++++++++++++
include/linux/random.h | 63 +------------------------
include/linux/skmsg.h | 13 +++++
kernel/bpf/syscall.c | 4 -
kernel/time/timer.c | 8 +++
lib/random32.c | 2
net/core/sock_map.c | 50 ++++++++++++++++++--
tools/testing/selftests/bpf/test_maps.c | 12 ++--
14 files changed, 184 insertions(+), 77 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.4.57
Grygorii Strashko (1):
ARM: percpu.h: fix build error
Jiang Ying (1):
ext4: fix direct I/O read error
Linus Torvalds (2):
random32: remove net_rand_state from the latent entropy gcc plugin
random32: move the pseudo-random 32-bit definitions to prandom.h
Lorenz Bauer (2):
selftests: bpf: Fix detach from sockmap tests
bpf: sockmap: Require attach_bpf_fd when detaching a program
Marc Zyngier (1):
arm64: Workaround circular dependency in pointer_auth.h
Willy Tarreau (2):
random32: update the net random state on interrupt and activity
random: fix circular include dependency on arm64 after addition of percpu.h