The patch titled
Subject: seq_file: fix problem when seeking mid-record
has been added to the -mm tree. Its filename is
seq_file-fix-problem-when-seeking-mid-record.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/seq_file-fix-problem-when-seeking-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/seq_file-fix-problem-when-seeking-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: NeilBrown <neilb(a)suse.com>
Subject: seq_file: fix problem when seeking mid-record
If you use lseek or similar (e.g. pread) to access a location in a
seq_file file that is within a record, rather than at a record boundary,
then the first read will return the remainder of the record, and the
second read will return the whole of that same record (instead of the next
record). When seeking to a record boundary, the next record is correctly
returned.
This bug was introduced by a recent patch (identified below). Before that
patch, seq_read() would increment m->index when the last of the buffer was
returned (m->count == 0). After that patch, we rely on ->next to
increment m->index after filling the buffer - but there was one place
where that didn't happen.
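
The ordering problem can be reproduced outside the kernel with a toy model. The sketch below is not the kernel code: struct toy_seq, toy_traverse() and the next_first switch are hypothetical names that mirror only m->index, m->from and m->count, and strlen() stands in for ->show(). It assumes each string is one complete record.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for struct seq_file: only the fields the argument needs. */
struct toy_seq {
    const char *const *records; /* each string is one complete record */
    size_t nrecords;
    size_t index;               /* record the next show() would emit */
    size_t from;                /* bytes of the buffered record to skip */
    size_t count;               /* buffered bytes still to be returned */
};

/*
 * Position the iterator at byte `offset`. With next_first != 0 the index
 * is advanced before the mid-record check (the patched order); with
 * next_first == 0 it is advanced only after a record has been skipped
 * completely (the order the patch removes).
 */
static void toy_traverse(struct toy_seq *m, size_t offset, int next_first)
{
    size_t pos = 0;

    m->index = 0;
    m->from = 0;
    m->count = 0;

    while (pos < offset && m->index < m->nrecords) {
        m->count = strlen(m->records[m->index]); /* models ->show() */
        if (next_first)
            m->index++;                          /* models ->next() */
        if (pos + m->count > offset) {
            /* offset lies inside this record: keep its tail buffered */
            m->from = offset - pos;
            m->count -= m->from;
            return;
        }
        pos += m->count;
        m->count = 0;
        if (!next_first)
            m->index++;
    }
}

static void toy_traverse_selfcheck(void)
{
    static const char *const recs[] = { "aaaa\n", "bbbb\n", "cccc\n" };
    struct toy_seq m = { recs, 3, 0, 0, 0 };

    toy_traverse(&m, 2, 0);               /* old order, mid-record */
    assert(m.count == 3 && m.index == 0); /* record 0 would be shown again */

    toy_traverse(&m, 2, 1);               /* patched order, mid-record */
    assert(m.count == 3 && m.index == 1); /* next read continues at record 1 */

    toy_traverse(&m, 5, 0);               /* record boundary: orders agree */
    assert(m.count == 0 && m.index == 1);
}
```

In both orders the three remaining bytes of the first record stay buffered. Only the index differs, and that index decides which record the second read emits.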
Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb(a)suse.com>
Reported-by: Sergei Turchanov <turchanov(a)farpost.com>
Tested-by: Sergei Turchanov <turchanov(a)farpost.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Markus Elfring <Markus.Elfring(a)web.de>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/seq_file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/seq_file.c~seq_file-fix-problem-when-seeking-mid-record
+++ a/fs/seq_file.c
@@ -119,6 +119,7 @@ static int traverse(struct seq_file *m,
}
if (seq_has_overflowed(m))
goto Eoverflow;
+ p = m->op->next(m, p, &m->index);
if (pos + m->count > offset) {
m->from = offset - pos;
m->count -= m->from;
@@ -126,7 +127,6 @@ static int traverse(struct seq_file *m,
}
pos += m->count;
m->count = 0;
- p = m->op->next(m, p, &m->index);
if (pos == offset)
break;
}
_
Patches currently in -mm which might be from neilb(a)suse.com are
seq_file-fix-problem-when-seeking-mid-record.patch
The patch titled
Subject: memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
has been added to the -mm tree. Its filename is
memcg-oom-dont-require-__gfp_fs-when-invoking-memcg-oom-killer.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/memcg-oom-dont-require-__gfp_fs-wh…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/memcg-oom-dont-require-__gfp_fs-wh…
------------------------------------------------------
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Subject: memcg, oom: don't require __GFP_FS when invoking memcg OOM killer
Masoud Sharbiani noticed that commit 29ef680ae7c21110 ("memcg, oom: move
out_of_memory back to the charge path") broke the memcg OOM killer when it
is invoked from the __xfs_filemap_fault() path. It turned out that
try_charge() retries forever without making forward progress because
mem_cgroup_oom(GFP_NOFS) cannot invoke the OOM killer due to commit
3da88fb3bacfaa33 ("mm, oom: move GFP_NOFS check to out_of_memory").
Allowing a forced charge because the memcg OOM killer cannot be invoked
would lead to a global OOM situation. Simply returning -ENOMEM is also
risky, because the OOM path is lost and some paths (e.g. get_user_pages())
would leak -ENOMEM. Therefore, invoking the memcg OOM killer (despite
GFP_NOFS) is the only option for now.
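
The effect on the early-return check in out_of_memory() can be sketched as a plain predicate. This is a userspace illustration only: toy_oom_bails_out() and TOY_GFP_FS are hypothetical names, the flag value is made up for the example, and the memcg_oom parameter stands in for is_memcg_oom(oc).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bit; not taken from the kernel headers. */
#define TOY_GFP_FS 0x80u

/*
 * Returns true when the toy out_of_memory() would return early without
 * selecting a victim. `patched` selects the old or the new condition.
 */
static bool toy_oom_bails_out(unsigned int gfp_mask, bool memcg_oom,
                              bool patched)
{
    bool fs_less = gfp_mask != 0 && !(gfp_mask & TOY_GFP_FS);

    if (patched)
        return fs_less && !memcg_oom;
    return fs_less;
}

static void toy_oom_selfcheck(void)
{
    unsigned int nofs = 0x40u;              /* non-zero, TOY_GFP_FS clear */
    unsigned int with_fs = 0x40u | TOY_GFP_FS;

    assert(toy_oom_bails_out(nofs, true, false));  /* old: no kill, charge retries */
    assert(!toy_oom_bails_out(nofs, true, true));  /* new: memcg OOM killer runs */
    assert(toy_oom_bails_out(nofs, false, true));  /* global NOFS OOM still bails */
    assert(!toy_oom_bails_out(with_fs, true, true));
    assert(!toy_oom_bails_out(0, false, true));    /* zero mask is unchanged */
}
```

Only the memcg case changes. A global OOM with a mask that lacks the FS bit still returns early, as before.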
Until 29ef680ae7c21110, we were able to invoke memcg OOM killer when
GFP_KERNEL reclaim failed [1]. But since 29ef680ae7c21110, we need to
invoke memcg OOM killer when GFP_NOFS reclaim failed [2]. Although in the
past we did invoke memcg OOM killer for GFP_NOFS [3], we might get
premature memcg OOM reports due to this patch.
[1]
leaker invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
CPU: 0 PID: 2746 Comm: leaker Not tainted 4.18.0+ #19
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Call Trace:
dump_stack+0x63/0x88
dump_header+0x67/0x27a
? mem_cgroup_scan_tasks+0x91/0xf0
oom_kill_process+0x210/0x410
out_of_memory+0x10a/0x2c0
mem_cgroup_out_of_memory+0x46/0x80
mem_cgroup_oom_synchronize+0x2e4/0x310
? high_work_func+0x20/0x20
pagefault_out_of_memory+0x31/0x76
mm_fault_error+0x55/0x115
? handle_mm_fault+0xfd/0x220
__do_page_fault+0x433/0x4e0
do_page_fault+0x22/0x30
? page_fault+0x8/0x30
page_fault+0x1e/0x30
RIP: 0033:0x4009f0
Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
RSP: 002b:00007ffe29ae96f0 EFLAGS: 00010206
RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001ce1000
RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000
RBP: 000000000000000c R08: 0000000000000000 R09: 00007f94be09220d
R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0
R13: 0000000000000003 R14: 00007f949d845000 R15: 0000000002800000
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 158965
memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
kmem: usage 2016kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:844KB rss:521136KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:132KB writeback:0KB inactive_anon:0KB active_anon:521224KB inactive_file:1012KB active_file:8KB unevictable:0KB
Memory cgroup out of memory: Kill process 2746 (leaker) score 998 or sacrifice child
Killed process 2746 (leaker) total-vm:536704kB, anon-rss:521176kB, file-rss:1208kB, shmem-rss:0kB
oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[2]
leaker invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), nodemask=(null), order=0, oom_score_adj=0
CPU: 1 PID: 2746 Comm: leaker Not tainted 4.18.0+ #20
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Call Trace:
dump_stack+0x63/0x88
dump_header+0x67/0x27a
? mem_cgroup_scan_tasks+0x91/0xf0
oom_kill_process+0x210/0x410
out_of_memory+0x109/0x2d0
mem_cgroup_out_of_memory+0x46/0x80
try_charge+0x58d/0x650
? __radix_tree_replace+0x81/0x100
mem_cgroup_try_charge+0x7a/0x100
__add_to_page_cache_locked+0x92/0x180
add_to_page_cache_lru+0x4d/0xf0
iomap_readpages_actor+0xde/0x1b0
? iomap_zero_range_actor+0x1d0/0x1d0
iomap_apply+0xaf/0x130
iomap_readpages+0x9f/0x150
? iomap_zero_range_actor+0x1d0/0x1d0
xfs_vm_readpages+0x18/0x20 [xfs]
read_pages+0x60/0x140
__do_page_cache_readahead+0x193/0x1b0
ondemand_readahead+0x16d/0x2c0
page_cache_async_readahead+0x9a/0xd0
filemap_fault+0x403/0x620
? alloc_set_pte+0x12c/0x540
? _cond_resched+0x14/0x30
__xfs_filemap_fault+0x66/0x180 [xfs]
xfs_filemap_fault+0x27/0x30 [xfs]
__do_fault+0x19/0x40
__handle_mm_fault+0x8e8/0xb60
handle_mm_fault+0xfd/0x220
__do_page_fault+0x238/0x4e0
do_page_fault+0x22/0x30
? page_fault+0x8/0x30
page_fault+0x1e/0x30
RIP: 0033:0x4009f0
Code: 03 00 00 00 e8 71 fd ff ff 48 83 f8 ff 49 89 c6 74 74 48 89 c6 bf c0 0c 40 00 31 c0 e8 69 fd ff ff 45 85 ff 7e 21 31 c9 66 90 <41> 0f be 14 0e 01 d3 f7 c1 ff 0f 00 00 75 05 41 c6 04 0e 2a 48 83
RSP: 002b:00007ffda45c9290 EFLAGS: 00010206
RAX: 000000000000001b RBX: 0000000000000000 RCX: 0000000001a1e000
RDX: 0000000000000000 RSI: 000000007fffffe5 RDI: 0000000000000000
RBP: 000000000000000c R08: 0000000000000000 R09: 00007f6d061ff20d
R10: 0000000000000002 R11: 0000000000000246 R12: 00000000000186a0
R13: 0000000000000003 R14: 00007f6ce59b2000 R15: 0000000002800000
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 7221
memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
kmem: usage 1944kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:3632KB rss:518232KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:518408KB inactive_file:3908KB active_file:12KB unevictable:0KB
Memory cgroup out of memory: Kill process 2746 (leaker) score 992 or sacrifice child
Killed process 2746 (leaker) total-vm:536704kB, anon-rss:518264kB, file-rss:1188kB, shmem-rss:0kB
oom_reaper: reaped process 2746 (leaker), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[3]
leaker invoked oom-killer: gfp_mask=0x50, order=0, oom_score_adj=0
leaker cpuset=/ mems_allowed=0
CPU: 1 PID: 3206 Comm: leaker Not tainted 3.10.0-957.27.2.el7.x86_64 #1
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/13/2018
Call Trace:
[<ffffffffaf364147>] dump_stack+0x19/0x1b
[<ffffffffaf35eb6a>] dump_header+0x90/0x229
[<ffffffffaedbb456>] ? find_lock_task_mm+0x56/0xc0
[<ffffffffaee32a38>] ? try_get_mem_cgroup_from_mm+0x28/0x60
[<ffffffffaedbb904>] oom_kill_process+0x254/0x3d0
[<ffffffffaee36c36>] mem_cgroup_oom_synchronize+0x546/0x570
[<ffffffffaee360b0>] ? mem_cgroup_charge_common+0xc0/0xc0
[<ffffffffaedbc194>] pagefault_out_of_memory+0x14/0x90
[<ffffffffaf35d072>] mm_fault_error+0x6a/0x157
[<ffffffffaf3717c8>] __do_page_fault+0x3c8/0x4f0
[<ffffffffaf371925>] do_page_fault+0x35/0x90
[<ffffffffaf36d768>] page_fault+0x28/0x30
Task in /leaker killed as a result of limit of /leaker
memory: usage 524288kB, limit 524288kB, failcnt 20628
memory+swap: usage 524288kB, limit 9007199254740988kB, failcnt 0
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /leaker: cache:840KB rss:523448KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:523448KB inactive_file:464KB active_file:376KB unevictable:0KB
Memory cgroup out of memory: Kill process 3206 (leaker) score 970 or sacrifice child
Killed process 3206 (leaker) total-vm:536692kB, anon-rss:523304kB, file-rss:412kB, shmem-rss:0kB
Bisected by Masoud Sharbiani.
Link: http://lkml.kernel.org/r/cbe54ed1-b6ba-a056-8899-2dc42526371d@i-love.sakura…
Fixes: 3da88fb3bacfaa33 ("mm, oom: move GFP_NOFS check to out_of_memory") [necessary after 29ef680ae7c21110]
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Reported-by: Masoud Sharbiani <msharbiani(a)apple.com>
Tested-by: Masoud Sharbiani <msharbiani(a)apple.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/oom_kill.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/oom_kill.c~memcg-oom-dont-require-__gfp_fs-when-invoking-memcg-oom-killer
+++ a/mm/oom_kill.c
@@ -1068,9 +1068,10 @@ bool out_of_memory(struct oom_control *o
* The OOM killer does not compensate for IO-less reclaim.
* pagefault_out_of_memory lost its gfp context so we have to
* make sure exclude 0 mask - all other users should have at least
- * ___GFP_DIRECT_RECLAIM to get here.
+ * ___GFP_DIRECT_RECLAIM to get here. But mem_cgroup_oom() has to
+ * invoke the OOM killer even if it is a GFP_NOFS allocation.
*/
- if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS))
+ if (oc->gfp_mask && !(oc->gfp_mask & __GFP_FS) && !is_memcg_oom(oc))
return true;
/*
_
Patches currently in -mm which might be from penguin-kernel(a)i-love.sakura.ne.jp are
memcg-oom-dont-require-__gfp_fs-when-invoking-memcg-oom-killer.patch
kernel-hung_taskc-monitor-killed-tasks.patch
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b617158dc096709d8600c53b6052144d12b89fab Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet(a)google.com>
Date: Fri, 19 Jul 2019 11:52:33 -0700
Subject: [PATCH] tcp: be more careful in tcp_fragment()
Some applications set tiny SO_SNDBUF values and expect
TCP to just work. Recent patches to address CVE-2019-11478
broke them in case of losses, since retransmits might
be prevented.
We should allow these flows to make progress.
This patch allows the first and last skb in retransmit queue
to be split even if memory limits are hit.
It also adds some room because tcp_sendmsg()
and tcp_sendpage() might overshoot sk_wmem_queued by about one full
TSO skb (64KB size). Note this allowance was already present
in stable backports for kernels < 4.15.
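
The resulting decision can be modelled as a small predicate. This is a sketch under assumptions, not the kernel function: toy_fragment_refused() and its parameters are hypothetical, and the allowance is passed in as a plain number because SKB_TRUESIZE() depends on kernel structure sizes that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the toy tcp_fragment() would fail with -ENOMEM. */
static bool toy_fragment_refused(long wmem_queued, long sndbuf,
                                 long allowance, bool in_write_queue,
                                 bool rtx_head, bool rtx_tail)
{
    long limit = sndbuf + allowance;

    return (wmem_queued >> 1) > limit &&
           !in_write_queue && !rtx_head && !rtx_tail;
}

static void toy_fragment_selfcheck(void)
{
    /* Assumed allowance: two 64KB skbs, ignoring per-skb overhead. */
    const long allowance = 2 * 65536L;

    /* Tiny SO_SNDBUF, no allowance: a mid-queue split is refused. */
    assert(toy_fragment_refused(20000, 4096, 0, false, false, false));
    /* Same flow with the allowance: the split goes through. */
    assert(!toy_fragment_refused(20000, 4096, allowance, false, false, false));
    /* Far over the limit: still refused in the middle of the queue... */
    assert(toy_fragment_refused(400000, 4096, allowance, false, false, false));
    /* ...but the first and last retransmit-queue skbs may be split. */
    assert(!toy_fragment_refused(400000, 4096, allowance, false, true, false));
    assert(!toy_fragment_refused(400000, 4096, allowance, false, false, true));
}
```

The head and tail exemptions are what let a flow with a tiny send buffer keep retransmitting once the memory limit is hit.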
Note for < 4.15 backports :
tcp_rtx_queue_tail() will probably look like :
static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
{
struct sk_buff *skb = tcp_send_head(sk);
return skb ? tcp_write_queue_prev(sk, skb) : tcp_write_queue_tail(sk);
}
Fixes: f070ef2ac667 ("tcp: tcp_fragment() should apply sane memory limits")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Andrew Prout <aprout(a)ll.mit.edu>
Tested-by: Andrew Prout <aprout(a)ll.mit.edu>
Tested-by: Jonathan Lemon <jonathan.lemon(a)gmail.com>
Tested-by: Michal Kubecek <mkubecek(a)suse.cz>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Acked-by: Yuchung Cheng <ycheng(a)google.com>
Acked-by: Christoph Paasch <cpaasch(a)apple.com>
Cc: Jonathan Looney <jtl(a)netflix.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f42d300f0cfa..e5cf514ba118 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1709,6 +1709,11 @@ static inline struct sk_buff *tcp_rtx_queue_head(const struct sock *sk)
return skb_rb_first(&sk->tcp_rtx_queue);
}
+static inline struct sk_buff *tcp_rtx_queue_tail(const struct sock *sk)
+{
+ return skb_rb_last(&sk->tcp_rtx_queue);
+}
+
static inline struct sk_buff *tcp_write_queue_head(const struct sock *sk)
{
return skb_peek(&sk->sk_write_queue);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 4af1f5dae9d3..6e4afc48d7bb 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1288,6 +1288,7 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
struct tcp_sock *tp = tcp_sk(sk);
struct sk_buff *buff;
int nsize, old_factor;
+ long limit;
int nlen;
u8 flags;
@@ -1298,8 +1299,16 @@ int tcp_fragment(struct sock *sk, enum tcp_queue tcp_queue,
if (nsize < 0)
nsize = 0;
- if (unlikely((sk->sk_wmem_queued >> 1) > sk->sk_sndbuf &&
- tcp_queue != TCP_FRAG_IN_WRITE_QUEUE)) {
+ /* tcp_sendmsg() can overshoot sk_wmem_queued by one full size skb.
+ * We need some allowance to not penalize applications setting small
+ * SO_SNDBUF values.
+ * Also allow first and last skb in retransmit queue to be split.
+ */
+ limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
+ if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
+ tcp_queue != TCP_FRAG_IN_WRITE_QUEUE &&
+ skb != tcp_rtx_queue_head(sk) &&
+ skb != tcp_rtx_queue_tail(sk))) {
NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
return -ENOMEM;
}
Commit daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in
platform_get_irq()") broke the Embedded Controller driver on most LPC
Chromebooks (i.e., most x86 Chromebooks), because cros_ec_lpc expects
platform_get_irq() to return -ENXIO for non-existent IRQs.
Unfortunately, acpi_dev_gpio_irq_get() doesn't follow this convention
and returns -ENOENT instead. So we get this error from cros_ec_lpc:
couldn't retrieve IRQ number (-2)
I see a variety of drivers that treat -ENXIO specially, so rather than
fix all of them, let's fix up the API to restore its previous behavior.
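
The intended mapping of return values can be illustrated in userspace. In this sketch toy_acpi_irq_fallback() is a hypothetical helper that takes the value acpi_dev_gpio_irq_get() would have returned, and TOY_EPROBE_DEFER is defined locally because EPROBE_DEFER is a kernel-internal code that userspace <errno.h> does not provide.

```c
#include <assert.h>
#include <errno.h>

/* Kernel-internal code, defined here only for illustration. */
#define TOY_EPROBE_DEFER 517

/* Pass through valid IRQs and probe deferral; report the rest as -ENXIO. */
static int toy_acpi_irq_fallback(int ret)
{
    if (ret >= 0 || ret == -TOY_EPROBE_DEFER)
        return ret;
    return -ENXIO;
}

static void toy_irq_selfcheck(void)
{
    assert(toy_acpi_irq_fallback(17) == 17);          /* valid IRQ kept */
    assert(toy_acpi_irq_fallback(-TOY_EPROBE_DEFER) == -TOY_EPROBE_DEFER);
    assert(toy_acpi_irq_fallback(-ENOENT) == -ENXIO); /* the cros_ec_lpc case */
    assert(toy_acpi_irq_fallback(-EINVAL) == -ENXIO);
}
```

A caller that checks only for -ENXIO, as cros_ec_lpc does, then sees a missing IRQ the same way whether or not the ACPI GPIO fallback was tried.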
I reported this on v2 of this patch:
https://lore.kernel.org/lkml/20190220180538.GA42642@google.com/
but apparently the patch had already been merged before v3 got sent out:
https://lore.kernel.org/lkml/20190221193429.161300-1-egranata@chromium.org/
and the result is that the bug landed and remains unfixed.
I differ from the v3 patch by:
* allowing for ret==0, even though acpi_dev_gpio_irq_get() specifically
documents (and enforces) that 0 is not a valid return value (noted on
the v3 review)
* adding a small comment
Reported-by: Brian Norris <briannorris(a)chromium.org>
Reported-by: Salvatore Bellizzi <salvatore.bellizzi(a)linux.seppia.net>
Cc: Enrico Granata <egranata(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Fixes: daaef255dc96 ("driver: platform: Support parsing GpioInt 0 in platform_get_irq()")
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
---
Side note: it might have helped alleviate some of this pain if there
were email notifications to the mailing list when a patch gets applied.
I didn't realize (and I'm not sure if Enrico did) that v2 was already
merged by the time I noted its mistakes. If I had known, I would have
suggested a follow-up patch, not a v3.
I know some maintainers' "tip bots" do this, but not all apparently.
drivers/base/platform.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 506a0175a5a7..ec974ba9c0c4 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -157,8 +157,13 @@ int platform_get_irq(struct platform_device *dev, unsigned int num)
* the device will only expose one IRQ, and this fallback
* allows a common code path across either kind of resource.
*/
- if (num == 0 && has_acpi_companion(&dev->dev))
- return acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
+ if (num == 0 && has_acpi_companion(&dev->dev)) {
+ int ret = acpi_dev_gpio_irq_get(ACPI_COMPANION(&dev->dev), num);
+
+ /* Our callers expect -ENXIO for missing IRQs. */
+ if (ret >= 0 || ret == -EPROBE_DEFER)
+ return ret;
+ }
return -ENXIO;
#endif
--
2.22.0.709.g102302147b-goog