I'm looking at CVE-2015-8553 which is fixed by:
commit 7681f31ec9cdacab4fd10570be924f2cef6669ba
Author: Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
Date: Wed Feb 13 18:21:31 2019 -0500
xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
I'm aware that this change is incompatible with qemu < 2.5, but that's
now quite old. Do you think it makes sense to apply this change to
some stable branches?
Ben.
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
Among other improvements, this patch series fixes a data corruption bug
in the mac_scsi driver and a bug in the EH abort routine in the core
5380 driver.
For consistency I have ignored certain checkpatch.pl complaints about
the indentation in mac_scsi.c. The remaining complaints seem to be
false positives.
Some of these patches are not trivial to backport. Those patches have
been nominated for recent -stable branches only.
Finn Thain (7):
Revert "scsi: ncr5380: Increase register polling limit"
scsi: NCR5380: Always re-enable reselection interrupt
scsi: NCR5380: Handle PDMA failure reliably
scsi: mac_scsi: Increase PIO/PDMA transfer length threshold
scsi: mac_scsi: Fix pseudo DMA implementation, take 2
scsi: mac_scsi: Enable PDMA on Mac IIfx
scsi: mac_scsi: Treat Last Byte Sent time-out as failure
arch/m68k/include/asm/mac_pdma.h | 179 ++++++++++++++++++++++
arch/m68k/mac/config.c | 10 +-
drivers/scsi/NCR5380.c | 18 +--
drivers/scsi/NCR5380.h | 2 +-
drivers/scsi/mac_scsi.c | 249 +++++++++++--------------------
5 files changed, 280 insertions(+), 178 deletions(-)
create mode 100644 arch/m68k/include/asm/mac_pdma.h
--
2.21.0
Note, this is going to be the LAST 5.0.y kernel release. After this one, it is
end-of-life, please move to 5.1.y at this point in time. If there is anything
wrong with the 5.1.y tree, preventing you from moving to 5.1.y, please let me
know.
This is the start of the stable review cycle for the 5.0.21 release.
There are 36 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed 05 Jun 2019 09:04:48 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.0.21-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.0.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.0.21-rc1
Junwei Hu <hujunwei4(a)huawei.com>
tipc: fix modprobe tipc failed after switch order of device registration
David S. Miller <davem(a)davemloft.net>
Revert "tipc: fix modprobe tipc failed after switch order of device registration"
Konrad Rzeszutek Wilk <konrad.wilk(a)oracle.com>
xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
Daniel Axtens <dja(a)axtens.net>
crypto: vmx - ghash: do nosimd fallback manually
Willem de Bruijn <willemb(a)google.com>
net: correct zerocopy refcnt with udp MSG_MORE
Vishal Kulkarni <vishal(a)chelsio.com>
cxgb4: Revert "cxgb4: Remove SGE_HOST_PAGE_SIZE dependency on page size"
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: don't ignore netdev notifications if no TLS features
Jakub Kicinski <jakub.kicinski(a)netronome.com>
net/tls: fix state removal with feature flags off
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Reduce memory usage when running in kdump kernel.
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix possible BUG() condition when calling pci_disable_msix().
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix aggregation buffer leak under OOM condition.
Weifeng Voon <weifeng.voon(a)intel.com>
net: stmmac: dma channel control register need to be init first
Tan, Tee Min <tee.min.tan(a)intel.com>
net: stmmac: fix ethtool flow control not able to get/set
Saeed Mahameed <saeedm(a)mellanox.com>
net/mlx5e: Disable rxhash when CQE compress is enabled
Parav Pandit <parav(a)mellanox.com>
net/mlx5: Allocate root ns memory using kzalloc to match kfree
Chris Packham <chris.packham(a)alliedtelesis.co.nz>
tipc: Avoid copying bytes beyond the supplied data
Parav Pandit <parav(a)mellanox.com>
net/mlx5: Avoid double free in fs init error unwinding path
Kloetzke Jan <Jan.Kloetzke(a)preh.de>
usbnet: fix kernel crash after disconnect
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix MAC address being lost in PCI D3
Jisheng Zhang <Jisheng.Zhang(a)synaptics.com>
net: stmmac: fix reset gpio free missing
Vlad Buslov <vladbu(a)mellanox.com>
net: sched: don't use tc_action->order during action dump
Russell King <rmk+kernel(a)armlinux.org.uk>
net: phy: marvell10g: report if the PHY fails to boot firmware
Antoine Tenart <antoine.tenart(a)bootlin.com>
net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
Jisheng Zhang <Jisheng.Zhang(a)synaptics.com>
net: mvneta: Fix err code path of probe
Eric Dumazet <edumazet(a)google.com>
net-gro: fix use-after-free read in napi_gro_frags()
Andy Duan <fugang.duan(a)nxp.com>
net: fec: fix the clk mismatch in failed_reset path
Rasmus Villemoes <rasmus.villemoes(a)prevas.dk>
net: dsa: mv88e6xxx: fix handling of upper half of STATS_TYPE_PORT
Jiri Pirko <jiri(a)mellanox.com>
mlxsw: spectrum_acl: Avoid warning after identical rules insertion
Eric Dumazet <edumazet(a)google.com>
llc: fix skb leak in llc_build_and_send_ui_pkt()
David Ahern <dsahern(a)gmail.com>
ipv6: Fix redirect with VRF
Mike Manning <mmanning(a)vyatta.att-mail.com>
ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
Eric Dumazet <edumazet(a)google.com>
ipv4/igmp: fix build error if !CONFIG_IP_MULTICAST
Eric Dumazet <edumazet(a)google.com>
ipv4/igmp: fix another memory leak in igmpv3_del_delrec()
Eric Dumazet <edumazet(a)google.com>
inet: switch IP ID generator to siphash
Raju Rangoju <rajur(a)chelsio.com>
cxgb4: offload VLAN flows regardless of VLAN ethtype
Jarod Wilson <jarod(a)redhat.com>
bonding/802.3ad: fix slave link initialization transition states
-------------
Diffstat:
Makefile | 4 +-
drivers/crypto/vmx/ghash.c | 212 +++++++++------------
drivers/net/bonding/bond_main.c | 15 +-
drivers/net/dsa/mv88e6xxx/chip.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 2 +-
.../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 5 +-
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 11 ++
drivers/net/ethernet/freescale/fec_main.c | 2 +-
drivers/net/ethernet/marvell/mvneta.c | 4 +-
drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 10 +-
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 ++
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 6 +-
.../net/ethernet/mellanox/mlxsw/spectrum_acl_erp.c | 11 +-
drivers/net/ethernet/realtek/r8169.c | 3 +
.../net/ethernet/stmicro/stmmac/stmmac_ethtool.c | 4 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 8 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c | 3 +-
drivers/net/phy/marvell10g.c | 13 ++
drivers/net/usb/usbnet.c | 6 +
drivers/xen/xen-pciback/pciback_ops.c | 2 -
include/linux/siphash.h | 5 +
include/net/netns/ipv4.h | 2 +
include/uapi/linux/tipc_config.h | 10 +-
net/core/dev.c | 2 +-
net/core/skbuff.c | 6 +-
net/ipv4/igmp.c | 47 +++--
net/ipv4/ip_output.c | 4 +-
net/ipv4/route.c | 12 +-
net/ipv6/ip6_output.c | 4 +-
net/ipv6/output_core.c | 30 +--
net/ipv6/raw.c | 2 +
net/ipv6/route.c | 6 +
net/llc/llc_output.c | 2 +
net/sched/act_api.c | 3 +-
net/tipc/core.c | 32 ++--
net/tipc/subscr.h | 5 +-
net/tipc/topsrv.c | 14 +-
net/tls/tls_device.c | 9 +-
41 files changed, 313 insertions(+), 245 deletions(-)
Commit 0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in
/proc/PID/stat") stopped reporting eip/esp and commit fd7d56270b52
("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
reintroduced the feature to fix a regression with userspace core dump
handlers (such as minicoredumper).
Because PF_DUMPCORE is only set for the primary thread, this didn't fix
the original problem for secondary threads. Allow reporting the eip/esp
for all threads by checking for PF_EXITING as well. This is set for all
the other threads when they are killed. coredump_wait() waits for all
the tasks to become inactive before proceeding to invoke a core dumper.
Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
Reported-by: Jan Luebbe <jlu(a)pengutronix.de>
Signed-off-by: John Ogness <john.ogness(a)linutronix.de>
---
This is a rework of Jan's v1 patch that allows accessing eip/esp of all
the threads without risk of the task still executing on a CPU.
The code changes are the same as v2. With v3 I included a "Fixes" tag,
fixed a typo in the commit message, and Cc'd stable.
fs/proc/array.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/proc/array.c b/fs/proc/array.c
index 2edbb657f859..55180501b915 100644
--- a/fs/proc/array.c
+++ b/fs/proc/array.c
@@ -462,7 +462,7 @@ static int do_task_stat(struct seq_file *m, struct pid_namespace *ns,
* a program is not able to use ptrace(2) in that case. It is
* safe because the task has stopped executing permanently.
*/
- if (permitted && (task->flags & PF_DUMPCORE)) {
+ if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) {
if (try_get_task_stack(task)) {
eip = KSTK_EIP(task);
esp = KSTK_ESP(task);
--
2.11.0
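To see the effect of the one-line change in isolation, the flag test can be modeled in user space. This is only a sketch under stated assumptions: the flag values below are copied in for illustration, and may_report_regs_old() and may_report_regs_new() are hypothetical names, not kernel functions.

```c
#include <stdbool.h>

/* Assumed values, mirroring the kernel's per-task flag bits for illustration. */
#define PF_EXITING  0x00000004u
#define PF_DUMPCORE 0x00000200u

/* Before the patch: only the thread that dumps core passes the test. */
bool may_report_regs_old(bool permitted, unsigned int flags)
{
	return permitted && (flags & PF_DUMPCORE) != 0;
}

/* After the patch: threads that are exiting pass the test as well. */
bool may_report_regs_new(bool permitted, unsigned int flags)
{
	return permitted && (flags & (PF_EXITING | PF_DUMPCORE)) != 0;
}
```

A secondary thread that carries only PF_EXITING is rejected by the old test and accepted by the new one, which is the case the commit message describes.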
The patch titled
Subject: mm, compaction: make sure we isolate a valid PFN
has been removed from the -mm tree. Its filename was
mm-compaction-make-sure-we-isolate-a-valid-pfn.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Subject: mm, compaction: make sure we isolate a valid PFN
When we have holes in a normal memory zone, we could end up having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled (via __reset_isolation_suitable(),
triggered by kswapd).
Later if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with, as a valid page. This
could lead to NULL pointer dereferences like the one below, due to an
invalid mem_section pointer.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
[0000000000000008] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
...
CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : set_pfnblock_flags_mask+0x58/0xe8
lr : compaction_alloc+0x300/0x950
[...]
Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
Call trace:
set_pfnblock_flags_mask+0x58/0xe8
compaction_alloc+0x300/0x950
migrate_pages+0x1a4/0xbb0
compact_zone+0x750/0xde8
compact_zone_order+0xd8/0x118
try_to_compact_pages+0xb4/0x290
__alloc_pages_direct_compact+0x84/0x1e0
__alloc_pages_nodemask+0x5e0/0xe18
alloc_pages_vma+0x1cc/0x210
do_huge_pmd_anonymous_page+0x108/0x7c8
__handle_mm_fault+0xdd4/0x1190
handle_mm_fault+0x114/0x1c0
__get_user_pages+0x198/0x3c0
get_user_pages_unlocked+0xb4/0x1d8
__gfn_to_pfn_memslot+0x12c/0x3b8
gfn_to_pfn_prot+0x4c/0x60
kvm_handle_guest_abort+0x4b0/0xcd8
handle_exit+0x140/0x1b8
kvm_arch_vcpu_ioctl_run+0x260/0x768
kvm_vcpu_ioctl+0x490/0x898
do_vfs_ioctl+0xc4/0x898
ksys_ioctl+0x8c/0xa0
__arm64_sys_ioctl+0x28/0x38
el0_svc_common+0x74/0x118
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
Code: f8607840 f100001f 8b011401 9a801020 (f9400400)
---[ end trace af6a35219325a9b6 ]---
The issue was reported on an arm64 server with 128GB of memory and holes in the
zone (e.g., [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.
This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fall back to using the lower limit of the scan range upon
failure in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@a…
Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Reported-by: Marc Zyngier <marc.zyngier(a)arm.com>
Reviewed-by: Mel Gorman <mgorman(a)techsingularity.net>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Marc Zyngier <marc.zyngier(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/compaction.c~mm-compaction-make-sure-we-isolate-a-valid-pfn
+++ a/mm/compaction.c
@@ -1399,7 +1399,7 @@ fast_isolate_freepages(struct compact_co
page = pfn_to_page(highest);
cc->free_pfn = highest;
} else {
- if (cc->direct_compaction) {
+ if (cc->direct_compaction && pfn_valid(min_pfn)) {
page = pfn_to_page(min_pfn);
cc->free_pfn = min_pfn;
}
_
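The guard added in the hunk above can be illustrated with a small user-space model. This is a sketch, not the kernel's implementation: pfn_valid_model() and pick_fallback_pfn() are hypothetical names, the memory layout is the one quoted in the report, and 4k pages are assumed. The real pfn_valid() works on the kernel's own memory model rather than a flat range table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4k pages: 1GB of physical address space covers 2^18 page frames. */
#define GB_TO_PFN(gb) ((uint64_t)(gb) << 18)

struct mem_range {
	uint64_t start_pfn;	/* first valid PFN */
	uint64_t end_pfn;	/* one past the last valid PFN */
};

/* Layout from the report: 32GB at 4GB and 96GB at 544GB. */
static const struct mem_range ranges[] = {
	{ GB_TO_PFN(4),   GB_TO_PFN(4 + 32) },
	{ GB_TO_PFN(544), GB_TO_PFN(544 + 96) },
};

/* Hypothetical stand-in for pfn_valid(): true only inside a populated range. */
bool pfn_valid_model(uint64_t pfn)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++) {
		if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
			return true;
	}
	return false;
}

/*
 * Mirrors the patched fallback: use min_pfn only for direct compaction and
 * only when it is backed by memory. Returns false when no fallback is taken.
 */
bool pick_fallback_pfn(bool direct_compaction, uint64_t min_pfn,
		       uint64_t *free_pfn)
{
	if (direct_compaction && pfn_valid_model(min_pfn)) {
		*free_pfn = min_pfn;
		return true;
	}
	return false;
}
```

With this layout, a min_pfn that falls in the hole between the two ranges (for example the frame at 100GB) is rejected instead of being turned into a page pointer.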
The patch titled
Subject: kernel/signal.c: trace_signal_deliver when signal_group_exit
has been removed from the -mm tree. Its filename was
signal-trace_signal_deliver-when-signal_group_exit.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Zhenliang Wei <weizhenliang(a)huawei.com>
Subject: kernel/signal.c: trace_signal_deliver when signal_group_exit
In the commit named in the Fixes tag, removing SIGKILL from each thread's
signal mask and executing "goto fatal" directly skips the call to
"trace_signal_deliver". As a result, the delivery tracking of the
SIGKILL signal is inaccurate.
Therefore, we need to add trace_signal_deliver before "goto fatal" after
executing sigdelset.
Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.
Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com
Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT")
Signed-off-by: Zhenliang Wei <weizhenliang(a)huawei.com>
Reviewed-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: Eric W. Biederman <ebiederm(a)xmission.com>
Cc: Ivan Delalande <colona(a)arista.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/signal.c | 2 ++
1 file changed, 2 insertions(+)
--- a/kernel/signal.c~signal-trace_signal_deliver-when-signal_group_exit
+++ a/kernel/signal.c
@@ -2485,6 +2485,8 @@ relock:
if (signal_group_exit(signal)) {
ksig->info.si_signo = signr = SIGKILL;
sigdelset(&current->pending.signal, SIGKILL);
+ trace_signal_deliver(SIGKILL, SEND_SIG_NOINFO,
+ &sighand->action[SIGKILL - 1]);
recalc_sigpending();
goto fatal;
}
_
The patch titled
Subject: kasan: initialize tag to 0xff in __kasan_kmalloc
has been removed from the -mm tree. Its filename was
kasan-initialize-tag-to-0xff-in-__kasan_kmalloc.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Nathan Chancellor <natechancellor(a)gmail.com>
Subject: kasan: initialize tag to 0xff in __kasan_kmalloc
When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang
warns:
mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when
used here [-Wuninitialized]
kasan_unpoison_shadow(set_tag(object, tag), size);
^~~
set_tag ignores tag in this configuration but clang doesn't realize it at
this point in its pipeline, as it points to arch_kasan_set_tag as being
the point where it is used, which will later be expanded to (void
*)(object) without a use of tag. Initialize tag to 0xff, as it removes
this warning and doesn't change the meaning of the code.
Link: https://github.com/ClangBuiltLinux/linux/issues/465
Link: http://lkml.kernel.org/r/20190502163057.6603-1-natechancellor@gmail.com
Fixes: 7f94ffbc4c6a ("kasan: add hooks implementation for tag-based mode")
Signed-off-by: Nathan Chancellor <natechancellor(a)gmail.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)google.com>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/kasan/common.c~kasan-initialize-tag-to-0xff-in-__kasan_kmalloc
+++ a/mm/kasan/common.c
@@ -464,7 +464,7 @@ static void *__kasan_kmalloc(struct kmem
{
unsigned long redzone_start;
unsigned long redzone_end;
- u8 tag;
+ u8 tag = 0xff;
if (gfpflags_allow_blocking(flags))
quarantine_reduce();
_
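The reasoning in the commit message (the tag is dropped during expansion, so initializing it cannot change behaviour) can be shown with a minimal user-space sketch. The names set_tag_model() and tagged_object() are hypothetical and only imitate the shape of the kernel code when software tags are disabled.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for set_tag() when software tags are disabled:
 * the tag argument disappears during macro expansion.
 */
#define set_tag_model(addr, tag) ((void *)(addr))

/*
 * The tag is initialized up front, as in the patch. The returned pointer is
 * the same whatever the tag holds; *tag_out only exposes it for inspection.
 */
void *tagged_object(void *object, uint8_t *tag_out)
{
	uint8_t tag = 0xff;

	*tag_out = tag;
	return set_tag_model(object, tag);
}
```

Because the macro never evaluates its second argument, the returned pointer is identical to the input, so the initializer only serves to satisfy the compiler's analysis.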
The patch titled
Subject: memcg: make it work on sparse non-0-node systems
has been removed from the -mm tree. Its filename was
memcg-make-it-work-on-sparse-non-0-node-systems.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jiri Slaby <jslaby(a)suse.cz>
Subject: memcg: make it work on sparse non-0-node systems
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far back as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause is the list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but that is not true on some systems, as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz
Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists")
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Suggested-by: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Raghavendra K T <raghavendra.kt(a)linux.vnet.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/list_lru.h | 1 +
mm/list_lru.c | 8 +++-----
2 files changed, 4 insertions(+), 5 deletions(-)
--- a/include/linux/list_lru.h~memcg-make-it-work-on-sparse-non-0-node-systems
+++ a/include/linux/list_lru.h
@@ -54,6 +54,7 @@ struct list_lru {
#ifdef CONFIG_MEMCG_KMEM
struct list_head list;
int shrinker_id;
+ bool memcg_aware;
#endif
};
--- a/mm/list_lru.c~memcg-make-it-work-on-sparse-non-0-node-systems
+++ a/mm/list_lru.c
@@ -38,11 +38,7 @@ static int lru_shrinker_id(struct list_l
static inline bool list_lru_memcg_aware(struct list_lru *lru)
{
- /*
- * This needs node 0 to be always present, even
- * in the systems supporting sparse numa ids.
- */
- return !!lru->node[0].memcg_lrus;
+ return lru->memcg_aware;
}
static inline struct list_lru_one *
@@ -452,6 +448,8 @@ static int memcg_init_list_lru(struct li
{
int i;
+ lru->memcg_aware = memcg_aware;
+
if (!memcg_aware)
return 0;
_
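The difference between the old and new memcg-awareness tests can be reproduced with a two-node user-space model. This is a sketch with hypothetical names (lru_init_model(), memcg_aware_old(), memcg_aware_new()); it assumes, as the description above states, that per-memcg lists are only set up on nodes that are present.

```c
#include <stdbool.h>
#include <stddef.h>

struct lru_node_model {
	void *memcg_lrus;	/* stays NULL for nodes that were never set up */
};

struct list_lru_model {
	struct lru_node_model node[2];
	bool memcg_aware;	/* the flag added by the patch */
};

/* Set up per-memcg lists only on the nodes that are present. */
void lru_init_model(struct list_lru_model *lru, bool memcg_aware,
		    const bool present[2], void *lists)
{
	lru->memcg_aware = memcg_aware;
	for (size_t i = 0; i < 2; i++)
		lru->node[i].memcg_lrus =
			(memcg_aware && present[i]) ? lists : NULL;
}

/* Old test: gives the wrong answer whenever node 0 is absent. */
bool memcg_aware_old(const struct list_lru_model *lru)
{
	return lru->node[0].memcg_lrus != NULL;
}

/* New test: independent of which nodes exist. */
bool memcg_aware_new(const struct list_lru_model *lru)
{
	return lru->memcg_aware;
}
```

On a system like the one in the boot log (node 0 disabled, node 1 present), the old test reports a memcg-aware lru as unaware, while the flag-based test stays correct.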
The patch titled
Subject: mm, memcg: consider subtrees in memory.events
has been removed from the -mm tree. Its filename was
mm-consider-subtrees-in-memoryevents.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Chris Down <chris(a)chrisdown.name>
Subject: mm, memcg: consider subtrees in memory.events
memory.stat and other files already consider subtrees in their output, and
we should too in order to not present an inconsistent interface.
The current situation is fairly confusing, because people interacting with
cgroups expect hierarchical behaviour in the vein of memory.stat,
cgroup.events, and other files. For example, this causes confusion when
debugging reclaim events under low, as currently these always read "0" at
non-leaf memcg nodes, which frequently causes people to misdiagnose breach
behaviour. The same confusion applies to other counters in this file when
debugging issues.
Aggregation is done at write time instead of at read-time since these
counters aren't hot (unlike memory.stat which is per-page, so it does it
at read time), and it makes sense to bundle this with the file
notifications.
After this patch, events are propagated up the hierarchy:
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 0
oom 0
oom_kill 0
[root@ktst ~]# systemd-run -p MemoryMax=1 true
Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 7
oom 1
oom_kill 1
As this is a change in behaviour, this can be reverted to the old
behaviour by mounting with the `memory_localevents' flag set. However, we
use the new behaviour by default as there's a lack of evidence that there
are any current users of memory.events that would find this change
undesirable.
akpm: this is a behaviour change, so Cc:stable. This is so that
forthcoming distros which use cgroup v2 are more likely to pick up the
revised behaviour.
Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Dennis Zhou <dennis(a)kernel.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++
include/linux/cgroup-defs.h | 5 +++++
include/linux/memcontrol.h | 10 ++++++++--
kernel/cgroup/cgroup.c | 16 ++++++++++++++--
4 files changed, 36 insertions(+), 4 deletions(-)
--- a/Documentation/admin-guide/cgroup-v2.rst~mm-consider-subtrees-in-memoryevents
+++ a/Documentation/admin-guide/cgroup-v2.rst
@@ -177,6 +177,15 @@ cgroup v2 currently supports the followi
ignored on non-init namespace mounts. Please refer to the
Delegation section for details.
+ memory_localevents
+
+ Only populate memory.events with data for the current cgroup,
+ and not any subtrees. This is legacy behaviour, the default
+ behaviour without this option is to include subtree counts.
+ This option is system wide and can only be set on mount or
+ modified through remount from the init namespace. The mount
+ option is ignored on non-init namespace mounts.
+
Organizing Processes and Threads
--------------------------------
--- a/include/linux/cgroup-defs.h~mm-consider-subtrees-in-memoryevents
+++ a/include/linux/cgroup-defs.h
@@ -89,6 +89,11 @@ enum {
* Enable cpuset controller in v1 cgroup to use v2 behavior.
*/
CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
+
+ /*
+ * Enable legacy local memory.events.
+ */
+ CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5),
};
/* cftype->flags */
--- a/include/linux/memcontrol.h~mm-consider-subtrees-in-memoryevents
+++ a/include/linux/memcontrol.h
@@ -737,8 +737,14 @@ static inline void count_memcg_event_mm(
static inline void memcg_memory_event(struct mem_cgroup *memcg,
enum memcg_memory_event event)
{
- atomic_long_inc(&memcg->memory_events[event]);
- cgroup_file_notify(&memcg->events_file);
+ do {
+ atomic_long_inc(&memcg->memory_events[event]);
+ cgroup_file_notify(&memcg->events_file);
+
+ if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+ break;
+ } while ((memcg = parent_mem_cgroup(memcg)) &&
+ !mem_cgroup_is_root(memcg));
}
static inline void memcg_memory_event_mm(struct mm_struct *mm,
--- a/kernel/cgroup/cgroup.c~mm-consider-subtrees-in-memoryevents
+++ a/kernel/cgroup/cgroup.c
@@ -1810,11 +1810,13 @@ int cgroup_show_path(struct seq_file *sf
enum cgroup2_param {
Opt_nsdelegate,
+ Opt_memory_localevents,
nr__cgroup2_params
};
static const struct fs_parameter_spec cgroup2_param_specs[] = {
- fsparam_flag ("nsdelegate", Opt_nsdelegate),
+ fsparam_flag("nsdelegate", Opt_nsdelegate),
+ fsparam_flag("memory_localevents", Opt_memory_localevents),
{}
};
@@ -1837,6 +1839,9 @@ static int cgroup2_parse_param(struct fs
case Opt_nsdelegate:
ctx->flags |= CGRP_ROOT_NS_DELEGATE;
return 0;
+ case Opt_memory_localevents:
+ ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
+ return 0;
}
return -EINVAL;
}
@@ -1848,6 +1853,11 @@ static void apply_cgroup_root_flags(unsi
cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
else
cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
+
+ if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+ cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
+ else
+ cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS;
}
}
@@ -1855,6 +1865,8 @@ static int cgroup_show_options(struct se
{
if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
seq_puts(seq, ",nsdelegate");
+ if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
+ seq_puts(seq, ",memory_localevents");
return 0;
}
@@ -6325,7 +6337,7 @@ static struct kobj_attribute cgroup_dele
static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
char *buf)
{
- return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
+ return snprintf(buf, PAGE_SIZE, "nsdelegate\nmemory_localevents\n");
}
static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
_
Patches currently in -mm which might be from chris(a)chrisdown.name are
mm-proportional-memorylowmin-reclaim.patch
mm-make-memoryemin-the-baseline-for-utilisation-determination.patch
mm-make-memoryemin-the-baseline-for-utilisation-determination-fix.patch
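The write-time propagation introduced by the memory.events patch above can be sketched in user space as follows. The struct and function names are hypothetical; the loop only imitates the shape of the patched memcg_memory_event(), with a single counter and no file notification.

```c
#include <stdbool.h>
#include <stddef.h>

struct memcg_model {
	struct memcg_model *parent;	/* NULL for the root */
	long events;			/* a single counter, e.g. "oom" */
};

/*
 * Count one event on memcg. Unless local_events is set, also count it on
 * every ancestor, stopping before the root (as the patched loop does).
 */
void memcg_event_model(struct memcg_model *memcg, bool local_events)
{
	do {
		memcg->events++;
		if (local_events)
			break;
		memcg = memcg->parent;
	} while (memcg && memcg->parent);
}
```

With a root, a middle group and a leaf, one event on the leaf raises both the leaf and the middle counter; with local_events set (the memory_localevents mount flag in the patch), only the leaf counter changes.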
This is a note to let you know that I've just added the patch titled
xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From f7fac17ca925faa03fc5eb854c081a24075f8bad Mon Sep 17 00:00:00 2001
From: Andrey Smirnov <andrew.smirnov(a)gmail.com>
Date: Wed, 22 May 2019 14:34:01 +0300
Subject: xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
xhci_handshake() implements the algorithm already captured by
readl_poll_timeout_atomic(). Convert the former to use the latter to
avoid repetition.
It turned out this patch also fixes a bug on the AMD Stoneyridge platform
where usleep(1) sometimes takes over 10ms.
This means a 5 second timeout can easily take over 15 seconds which will
trigger the watchdog and reboot the system.
[Add info about patch fixing a bug to commit message -Mathias]
Signed-off-by: Andrey Smirnov <andrew.smirnov(a)gmail.com>
Tested-by: Raul E Rangel <rrangel(a)chromium.org>
Reviewed-by: Raul E Rangel <rrangel(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/host/xhci.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 048a675bbc52..20db378a6012 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -9,6 +9,7 @@
*/
#include <linux/pci.h>
+#include <linux/iopoll.h>
#include <linux/irq.h>
#include <linux/log2.h>
#include <linux/module.h>
@@ -52,7 +53,6 @@ static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
return false;
}
-/* TODO: copied from ehci-hcd.c - can this be refactored? */
/*
* xhci_handshake - spin reading hc until handshake completes or fails
* @ptr: address of hc register to be read
@@ -69,18 +69,16 @@ static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec)
{
u32 result;
+ int ret;
- do {
- result = readl(ptr);
- if (result == ~(u32)0) /* card removed */
- return -ENODEV;
- result &= mask;
- if (result == done)
- return 0;
- udelay(1);
- usec--;
- } while (usec > 0);
- return -ETIMEDOUT;
+ ret = readl_poll_timeout_atomic(ptr, result,
+ (result & mask) == done ||
+ result == U32_MAX,
+ 1, usec);
+ if (result == U32_MAX) /* card removed */
+ return -ENODEV;
+
+ return ret;
}
/*
--
2.21.0
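The control flow of the converted xhci_handshake() can be tried out in user space with a scripted fake register. This is a simplified sketch: struct fake_reg, fake_readl() and handshake_model() are hypothetical, a poll count stands in for the microsecond timeout, and the timing behaviour of the real readl_poll_timeout_atomic() is not modeled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Fake register: returns a scripted sequence, then repeats the last value. */
struct fake_reg {
	const uint32_t *values;
	size_t len;
	size_t pos;
};

uint32_t fake_readl(struct fake_reg *reg)
{
	uint32_t v = reg->values[reg->pos];

	if (reg->pos + 1 < reg->len)
		reg->pos++;
	return v;
}

/*
 * Poll until (value & mask) == done or the value reads as all ones.
 * max_polls stands in for the timeout; no real delay is performed here.
 */
int handshake_model(struct fake_reg *reg, uint32_t mask, uint32_t done,
		    int max_polls)
{
	uint32_t result = 0;
	int ret = -ETIMEDOUT;

	for (int i = 0; i < max_polls; i++) {
		result = fake_readl(reg);
		if ((result & mask) == done || result == UINT32_MAX) {
			ret = 0;
			break;
		}
	}
	if (result == UINT32_MAX)	/* card removed */
		return -ENODEV;

	return ret;
}
```

As in the patch, an all-ones read ends the poll early and is then mapped to -ENODEV, so the three outcomes (success, device gone, timeout) stay distinct.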