Hello,
I request the following patch from v4.10-rc1 to get cherry-picked into
"stable/linux-4.9.y":
> commit f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> Author: Alexander Duyck <alexander.h.duyck(a)intel.com>
> Date: Tue Oct 25 16:08:46 2016 -0700
>
> i40e: Be much more verbose about what we can and cannot offload
>
> This change makes it so that we are much more robust about defining what we
> can and cannot offload. Previously we were just checking for the L4 tunnel
> header length, however there are other fields we should be verifying as
> there are multiple scenarios in which we cannot perform hardware offloads.
>
> In addition the device only supports GSO as long as the MSS is 64 or
> greater. We were not checking this so an MSS less than that was resulting
> in Tx hangs.
>
> Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b
> Signed-off-by: Alexander Duyck <alexander.h.duyck(a)intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers(a)intel.com>
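The MSS condition from the commit message above can be illustrated with a small userspace sketch. This is not the i40e driver code: the struct, the feature bits and the function name are hypothetical stand-ins, and only the limit of 64 is taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; not the kernel's NETIF_F_* flags. */
#define FEAT_CSUM (1u << 0)
#define FEAT_GSO  (1u << 1)

/* From the commit message: the device supports GSO only for MSS >= 64. */
#define MIN_GSO_MSS 64u

/* Hypothetical, heavily reduced stand-in for a packet descriptor. */
struct pkt {
	bool is_gso;       /* packet asks for segmentation offload */
	uint16_t gso_size; /* requested MSS */
};

/*
 * Return the subset of 'features' that may be offloaded for this packet.
 * If the MSS is too small, GSO is masked out so that the stack segments
 * the packet in software instead of handing it to the hardware.
 */
static uint32_t check_features(const struct pkt *p, uint32_t features)
{
	if (p->is_gso && p->gso_size < MIN_GSO_MSS)
		features &= ~FEAT_GSO;

	return features;
}
```

With this model, a GSO packet with an MSS of 63 loses the GSO bit, while one with an MSS of 64 keeps it.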
Debian has this old bug
<https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=892105> reported
against 4.9.82. It still exists in the current kernel 4.9.258 of Debian's
old-stable 9 "Stretch", and also in the latest stable 4.9.273.
Our environment
===============
- KVM server
- dual port i40e
- classic bridge with enp96s0f0
- VM attached to bridge via veth
- no VLANs
- no MacVLan
> # ethtool -i enp96s0f0
> driver: i40e
> version: 1.6.16-k
> firmware-version: 3.33 0x80000e48 1.1876.0
> expansion-rom-version:
> bus-info: 0000:60:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
> # lspci -s 0000:60:00.0
> 60:00.0 Ethernet controller: Intel Corporation Ethernet Connection X722 for 10GBASE-T (rev 09)
Analysis
========
As soon as we start one of our "Ubuntu" images the bridge stops
receiving unicast packets for *all* VMs on that bridge.
- we still see outgoing traffic leaving the host, e.g. ARP requests
- "tcpdump -i enp96s0f0" shows no incoming unicast traffic, e.g. no ARP
response
- broadcast traffic passes the bridge
- VMs on the same bridge can communicate with each other
Most often I see the following error message after doing `dmesg -n 8`:
> [ +9,376367] i40e 0000:60:00.0: cleared PE_CRITERR
> [ +0,000252] i40e 0000:60:00.0: TX driver issue detected, PF reset issued
> [ +0,859912] i40e 0000:60:00.0: Error I40E_AQ_RC_EINVAL adding RX filters on PF, promiscuous mode forced on
In one case I've seen this also (don't know if it is relevant):
> [ 218.921466] i40e 0000:60:00.0 enp96s0f0: VSI_seid 390, Hung TX queue 43, tx_pending_hw: 1, NTC:0xa6, HWB: 0xa6, NTU: 0xa7, TAIL: 0xa7
> [ 218.921470] i40e 0000:60:00.0 enp96s0f0: VSI_seid 390, Issuing force_wb for TX queue 43, Interrupt Reg: 0x0
After that error the only way to recover from this broken state is to
reboot the host. I've been unable to tear down the bridge and/or remove
the `i40e` driver; trying to do so most often crashes the Linux kernel
(some other bug on `ip link set enp96s0f0 nomaster`).
If you need more data I have a PCAP file, but I still don't know which
packet exactly triggers the bug.
The bug seems to be fixed in 4.10.0, and I bisected the fix down to:
> git bisect start '--' 'drivers/net/ethernet/intel/i40e'
> # new: [c470abd4fde40ea6a0846a2beab642a578c0b8cd] Linux 4.10
> git bisect new c470abd4fde40ea6a0846a2beab642a578c0b8cd
> # old: [69973b830859bc6529a7a0468ba0d80ee5117826] Linux 4.9
> git bisect old 69973b830859bc6529a7a0468ba0d80ee5117826
> # old: [13fd3f9cc3def8b276c7913ae4acbfa2653cb198] i40e: clear mac filter count on reset
> git bisect old 13fd3f9cc3def8b276c7913ae4acbfa2653cb198
> # new: [7ec9ba11b046b4b7fd768c366870ada60d409295] i40e: Driver prints log message on link speed change
> git bisect new 7ec9ba11b046b4b7fd768c366870ada60d409295
> # new: [0b7c8b5d5436317a5f4509e2a150c6cec017f348] i40e: fix trivial typo in naming of i40e_sync_filters_subtask
> git bisect new 0b7c8b5d5436317a5f4509e2a150c6cec017f348
> # new: [f114dca2533ca770aebebffb5ed56e5e7d1fb3fb] i40e: Be much more verbose about what we can and cannot offload
> git bisect new f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
> # old: [81fa7c97bebd6e1a52c4e059eeffe18df5b3f11f] i40e: Implementation of ERROR state for NVM update state machine
> git bisect old 81fa7c97bebd6e1a52c4e059eeffe18df5b3f11f
> # old: [3aa7b74dbeedfb32406fec70cfd76d797209e8c9] i40e: removed unreachable code
> git bisect old 3aa7b74dbeedfb32406fec70cfd76d797209e8c9
> # first new commit: [f114dca2533ca770aebebffb5ed56e5e7d1fb3fb] i40e: Be much more verbose about what we can and cannot offload
I used v4.10 as the basis and only bisected the changes in
drivers/net/ethernet/intel/i40e/, because vanilla v4.9 and several other
versions between that and v4.10 crashed my host. So basically:
git checkout v4.10
git checkout $hash -- drivers/net/ethernet/intel/i40e/
make all modules_install install
git checkout v4.10 -- drivers/net/ethernet/intel/i40e/
git bisect (old|new) $hash
I verified that cherry-picking f114dca2533ca770aebebffb5ed56e5e7d1fb3fb
on top of v4.9.273 fixes the problem and that reverting it brings the
problem back.
Philipp
--
Philipp Hahn
Open Source Software Engineer
Univention GmbH
be open.
Mary-Somerville-Str. 1
D-28359 Bremen
📞 +49-421-22232-57
🖶 +49-421-22232-99
✉️ hahn(a)univention.de
🌐 https://www.univention.de/
Geschäftsführer: Peter H. Ganten
HRB 20755 Amtsgericht Bremen
Steuer-Nr.: 71-597-02876
The standard printk() tries to flush the message to the console
immediately. It tries to take the console lock. If the lock is
already taken then the current owner is responsible for flushing
even the new message.
There is a small race window between checking whether a new message is
available and releasing the console lock. It is solved by re-checking
the state after releasing the console lock. If the check is positive
then console_unlock() tries to take the lock again and process the new
message as well.
Since commit 996e966640ddea7b535c ("printk: remove logbuf_lock"),
console_seq is no longer read atomically. As a result, the re-check might
be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under
the console lock. In the worst case, it will take the lock again only
to realize that the new message has already been processed. But that
was possible even before.
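The idea can be modelled in a few lines of userspace C. This is only a single-threaded sketch under stated assumptions: the struct, the helper names and the 'arrivals_in_window' parameter (which simulates messages stored by another CPU right after the lock is released) are hypothetical and are not the printk ringbuffer API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the log state; not the kernel's data structures. */
struct log_state {
	uint64_t head_seq;    /* sequence number the next new message gets */
	uint64_t console_seq; /* next message to print; stable only under the lock */
	bool locked;          /* stand-in for the console lock */
};

/* Stand-in for the "is a record with this sequence number readable" check. */
static bool record_available(const struct log_state *s, uint64_t seq)
{
	return seq < s->head_seq;
}

/*
 * Flush, release the lock, then re-check for late messages.
 * Returns true when the caller should try to take the lock again.
 */
static bool unlock_and_recheck(struct log_state *s, unsigned int arrivals_in_window)
{
	uint64_t next_seq;

	/* "Print" everything that is visible while we own the lock. */
	while (record_available(s, s->console_seq))
		s->console_seq++;

	/* Take the snapshot while console_seq cannot change under us. */
	next_seq = s->console_seq;
	s->locked = false;

	/* Simulated race window: new messages arrive after the unlock. */
	s->head_seq += arrivals_in_window;

	/* Re-check with the snapshot, never with the shared console_seq. */
	return record_available(s, next_seq);
}
```

With no arrivals in the window the function reports that no retry is needed; with at least one arrival it asks for a retry, which is the behaviour the re-check is meant to provide.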
Fixes: 996e966640ddea7b535c ("printk: remove logbuf_lock")
Cc: stable(a)vger.kernel.org # 5.13
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
---
kernel/printk/printk.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 142a58d124d9..87411084075e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2545,6 +2545,7 @@ void console_unlock(void)
bool do_cond_resched, retry;
struct printk_info info;
struct printk_record r;
+ u64 next_seq;
if (console_suspended) {
up_console_sem();
@@ -2654,8 +2655,10 @@ void console_unlock(void)
cond_resched();
}
- console_locked = 0;
+ /* Get consistent value of the next-to-be-used sequence number. */
+ next_seq = console_seq;
+ console_locked = 0;
up_console_sem();
/*
@@ -2664,7 +2667,7 @@ void console_unlock(void)
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- retry = prb_read_valid(prb, console_seq, NULL);
+ retry = prb_read_valid(prb, next_seq, NULL);
printk_safe_exit_irqrestore(flags);
if (retry && console_trylock())
--
2.26.2
Hi Greg,
Linus has taken in a group of mm/thp commits Cc stable today:
504e070dc08f mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
22061a1ffabd mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page()
31657170deaf mm/thp: fix page_address_in_vma() on file THP tails
494334e43c16 mm/thp: fix vma_address() if virtual address below file offset
732ed55823fc mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
3b77e8c8cde5 mm/thp: make is_huge_zero_pmd() safe and quicker
99fa8a48203d mm/thp: fix __split_huge_pmd_locked() on shmem migration entry
ffc90cbb2970 mm, thp: use head page in __migration_entry_wait()
and I expect some more to follow in a few days' time (thanks Andrew).
No problem with the commits themselves, but I'm aware that some of them
have dependencies on other commits not yet in stable, which I have to
sort out for you now.
I'd prefer to avoid a deluge of "does not apply" messages, so ask you
please to hold off trying to merge these into stable trees for a few days:
I'll get back to you with what's needed for them to apply.
Thanks,
Hugh
If a user program uses userfaultfd on ranges of heap memory, it may
end up passing a tagged pointer to the kernel in the range.start
field of the UFFDIO_REGISTER ioctl. This can happen when using an
MTE-capable allocator, or on Android if using the Tagged Pointers
feature for MTE readiness [1].
When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field
of struct uffd_msg. However, from the application's perspective,
the tagged address *is* the memory address, so if the application
is unaware of memory tags, it may get confused by receiving an
address that is, from its point of view, outside of the bounds of the
allocation. We observed this behavior in the kselftest for userfaultfd
[2] but other applications could have the same problem.
Fix this by remembering which tag was used to originally register the
userfaultfd and passing that tag back in fault.address. In a future
enhancement, we may want to pass back the original fault address,
but like SA_EXPOSE_TAGBITS, this should be guarded by a flag.
[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c
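The tag handling described above can be sketched in userspace C. This is a simplified model, assuming the tag lives in the top byte (bits 63:56) of a user address and that untagging just clears that byte; the function names are hypothetical, and the kernel's untagged_addr() is not implemented this way on every architecture.

```c
#include <stdint.h>

/* Assumption: the address tag occupies the top byte of a 64-bit address. */
#define TAG_SHIFT 56
#define TAG_MASK  (0xffULL << TAG_SHIFT)

/* Simplified stand-in for untagged_addr() on a user address. */
static uint64_t untag(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* Tag recorded at registration time: range.start minus its untagged form. */
static uint64_t registration_tag(uint64_t range_start)
{
	return range_start - untag(range_start);
}

/*
 * Address reported back to the application: the tag of the faulting
 * access is discarded and the tag from registration is added instead.
 */
static uint64_t reported_fault_address(uint64_t fault_addr, uint64_t tag)
{
	return untag(fault_addr) + tag;
}
```

For a range registered at 0x2a00000012345000, a fault at offset 0x10 is reported as 0x2a00000012345010 in this model, regardless of which tag the faulting access carried.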
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0…
Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Cc: <stable(a)vger.kernel.org> # 5.4
---
Documentation/arm64/tagged-pointers.rst | 5 +++++
fs/userfaultfd.c | 17 +++++++++++------
include/linux/mm_types.h | 3 ++-
3 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst
index 19d284b70384..ec8e1f90b744 100644
--- a/Documentation/arm64/tagged-pointers.rst
+++ b/Documentation/arm64/tagged-pointers.rst
@@ -73,6 +73,11 @@ flag setting.
Non-zero tags are never preserved in sigcontext.fault_address
regardless of the SA_EXPOSE_TAGBITS flag setting.
+When using userfaultfd the address tag supplied in the range.start
+field of the UFFDIO_REGISTER ioctl is preserved and returned to
+userspace via the fault.address field of struct uffd_msg, and the
+tag of the original fault address is discarded.
+
The architecture prevents the use of a tagged PC, so the upper byte will
be set to a sign-extension of bit 55 on exception return.
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index dd7a6c62b56f..adb0f7d0638a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -110,15 +110,15 @@ static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode,
struct userfaultfd_wake_range *range = key;
int ret;
struct userfaultfd_wait_queue *uwq;
- unsigned long start, len;
+ unsigned long start, len, addr;
uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
ret = 0;
/* len == 0 means wake all */
start = range->start;
len = range->len;
- if (len && (start > uwq->msg.arg.pagefault.address ||
- start + len <= uwq->msg.arg.pagefault.address))
+ addr = untagged_addr(uwq->msg.arg.pagefault.address);
+ if (len && (start > addr || start + len <= addr))
goto out;
WRITE_ONCE(uwq->waken, true);
/*
@@ -480,8 +480,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function);
uwq.wq.private = current;
- uwq.msg = userfault_msg(vmf->address, vmf->flags, reason,
- ctx->features);
+ uwq.msg = userfault_msg(
+ vmf->address + vmf->vma->vm_userfaultfd_ctx.address_tag,
+ vmf->flags, reason, ctx->features);
uwq.ctx = ctx;
uwq.waken = false;
@@ -1287,7 +1288,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
unsigned long vm_flags, new_flags;
bool found;
bool basic_ioctls;
- unsigned long start, end, vma_end;
+ unsigned long address_tag, start, end, vma_end;
user_uffdio_register = (struct uffdio_register __user *) arg;
@@ -1313,6 +1314,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vm_flags |= VM_UFFD_MINOR;
}
+ address_tag = uffdio_register.range.start -
+ untagged_addr(uffdio_register.range.start);
+
ret = validate_range(mm, &uffdio_register.range.start,
uffdio_register.range.len);
if (ret)
@@ -1462,6 +1466,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
*/
vma->vm_flags = new_flags;
vma->vm_userfaultfd_ctx.ctx = ctx;
+ vma->vm_userfaultfd_ctx.address_tag = address_tag;
if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
hugetlb_unshare_all_pmds(vma);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 8f0fb62e8975..cb93e5b17896 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -286,9 +286,10 @@ struct vm_region {
};
#ifdef CONFIG_USERFAULTFD
-#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, })
+#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, 0, })
struct vm_userfaultfd_ctx {
struct userfaultfd_ctx *ctx;
+ unsigned long address_tag;
};
#else /* CONFIG_USERFAULTFD */
#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) {})
--
2.32.0.93.g670b81a890-goog
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 4.14.238 kernel.
All users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcdGkACgkQ3qZv95d3
LNycCg/+KmyrChXSyZIUeUT5UVNeEEaR1zjJLORWvuHehW9Q4hcnvRlXuEGO7q5g
U+8XHm8H+hIjkwfpLmim1Jn6hMTx9P8fZ0t45YXkkXmPBoMCSySEiPpAKMaDQPxs
EU5ULrkNtTXiengdR6w13ayuSMSIacPyXFmY20OdzAnhtiXwgv5s9HgRDkcDZomh
M/Fqux6b16fXDS12qUdI7RbNUyJnWkBOm9KpE/zAzyMQlj0r/NUs1T02JS8/gWww
SfwgECLfvoFPNuxXI2Y9WEKQ40xx6Hb/Fzatvs18WjwLC+SvUfwPKlOyP6sogq+N
2kn7eFygkZzyDCL8GYv3ZVd/O8Km4kMWWthehJ/SD6MEzbIVlZmjCISivYA9fZVf
rLqWAdymiRDhJqak1pwsW8fVOBxJJYLMUJy3tv5Zjcg1bBWPrE4VufsntZt9YVdr
evpLVKeOU8p6aCdy7FwN+b/dBPriZt6oesNkhO3OMfW73FBesp8bgdH4Rhs9ISkv
lXb0mjYmE2ZJ6S+vKnRuHVDoiOc8u9fZnQqrCwNzI0QFltCYU9AZGI3cbmVH5a/h
/1TFCC0uVpwWquFuszkfvyItjFRpZhhjYsMZOB7jzCR/EDJakx2S5qsCMNW3bdQt
W5emHwENNkmlLEuRQwuoAbwfOPqV8IK9nO5goh0Sg4G45OzXU20=
=KrR2
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 4.19.196 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcdF8ACgkQ3qZv95d3
LNyFdxAAnB9VMQ9XCMIW0KFIuc84l34tHe7bexocQHEGxWFhsJEnyNzGGcZxlY7G
XKlHXzh7QWPWuf82jt7fNSrctyAO6Kun3VF5ucGsixgCStb+0byHHL6F4N9eEdPH
v2h8HA46OTGShHBLsMsFsLLVY335WNSz+YnI3tXmHuMcgckUPDYoS2Zh0GFJTrsF
jW4YCheGMtcJHU80D8pUyEucb4GdyasNHkwW2d5vn7PhKNtr4Tv2rRjF2/EHWdog
RZ18hHzJRI/DwGrVwLXy8hcaVG8qp7b1Se5lRIZIFaYxRjXs7vpqFngabXoq8K1l
Q2OaQNvlKVDNBJnnSumu1mFxZgWdixQ3i9qLVhSvS+yAilP45cHZkkGee+6uchJE
vJc4WVV08NTfUTbPBMdCkHCfLRai06qdkOxf9l4YX5Phs86gi6SCbaKZMA5mUpWr
bw9KJi5UgQLYfETzagJmncCm+BqGEnVzMGn98kUgAcxLWIMerEg7AaIL0Jul0Gig
GElwsTo1O4byu8Ee0sF9nppW6iB+FlBqOVepwy+d7RocaihKyssm87Gwvoy+Py8b
4yH1MGFGP2Un0SyKGUz7t3RY/hD8glmILHs/l+1UTeb8RkgZDy3iw+Cbrcu0CWQq
JM86nlanh0UyklQ2lM51DIZqSWs5J6BHyJyuGNs6CrbjltlGbHo=
=TGQ2
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 5.4.129 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcdE8ACgkQ3qZv95d3
LNwMew/9HLxjRRYhe6jCrc66+H2ekkh7TqhkWrgr4zEzYg2k4Xb34MKsq6/jowkw
BuAQdY2R1Hrg3UgjmXMnYw6aCVOyOFbbsmK1A07CjW36a4IFrKaKLmdL0nKJl5Vr
V+y2v8f9uJfJ9ceN6VdUR7hH1bJLRuoTv0klKOBcGEvvInZiZ2qxzLujVCWYD8CF
a8xwvtzt/QWAsPrfMFPa6voLZkLPfNu5sYiqJsboXyt7a9XJYQZ1sJBDfb3HK+uo
c0rPcrbQTthmjYmmfFpU6EfBpRBqHui3aWPOCR4OFRtw/1KcB3NIUqkNyTLXxIWB
7OSfLESDXTKCd+RX+tctOCaIaMtRWT/o7RqQy/knI6VXpJk42o/w9tmMVxIU1zr3
v+kqJd7wQ/B03zei4XhZBTnPhT2tOFNtMFzYvagGyJXR9u0UjDK4Jf8i8ppFOJD5
USkl54p8xzHqnoSHV4SidmiQr6adUlnZhJFcr/0ODTd74+7+08KjhIiNbO+HY0lx
EKrW7lmcXCbr5zvHHJMODbYkgYf0iQ/RawRxU/VZ4GQzFT+92ep0qKENDcDTmE8F
9OBIgdOw6kt7p41LM/H6XxHQMLRnVne/2btUxmg7nWfESTyn+2iPp749X2qhuPeW
ufUQVGqajlQIklGVVcG0sHxfOFjvZEqNQfqEdcP5mK3g0RoIhPw=
=k7Vi
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 5.10.47 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmDcdEIACgkQ3qZv95d3
LNyCpw/+K96+1ahen7kcs2rt3783nti4S33dpn7vKXeD6B2i5IhdqlFk3aCDqshh
aMoy8kNgtXb90GdgPobfWQGJt+1MMBfgxDd38VhqdBovOM1JNkCrVfrrv1lPmw/C
QUAiwxHeOd7why2UUMJaXb7xsAv/ircK+zi5sVImpf6NCgXeKaJOB74kFYH7VI+R
6LRPBWuOpDc3As1I1MOoC0tIXWI7YCictecr1LTDi75REu58x64ty0HN/b6Gj/Io
Y3EGlTe8GRIsChDAArYScCTYCixyN2pj/Loc7vlZUCHb3puQShO4bSNyPsykYxfR
HMB5F5Jf1HaKN0im5rel1sKd2hn0/tWwla8orRIWubXhBPrSxqsJJn642h2o7ZZN
8axd1K8Gd+zpqHZjjl4mYtcJo3A7Cj71r9XGVmfVMowTLs5wiX/30h98PfevlGGv
mj+ybjue3Gypk3ZTaHRifRLDh5KzTJNSMxm1YJcJ7IhsTBsgQXDMuInSgkSG32yz
Ggk/xj+GU4ob43EU92hSc4Cbh6zSUnQ2ac5vQAeKyYqKtAmEGL+hCf3Mnatk0ItS
UUnwXPflRB1eCbI3JFYZ5A0Igp+60WMARjRWb/MbWX1kwMOKlZKl9fUjfeJV6b3e
xURPJruLnZelsIXS8L9Nx1SWu51HnVhhf7D/58CDByqnBCq6oaQ=
=b4hI
-----END PGP SIGNATURE-----
The tgid_map array records a mapping from pid to tgid, where the index
of an entry within the array is the pid & the value stored at that index
is the tgid.
The saved_tgids_next() function iterates over pointers into the tgid_map
array & dereferences the pointers which results in the tgid, but then it
passes that dereferenced value to trace_find_tgid() which treats it as a
pid & does a further lookup within the tgid_map array. It seems likely
that the intent here was to skip over entries in tgid_map for which the
recorded tgid is zero, but instead we end up skipping over entries for
which the thread group leader hasn't yet had its own tgid recorded in
tgid_map.
A minimal fix would be to remove the call to trace_find_tgid, turning:
if (trace_find_tgid(*ptr))
into:
if (*ptr)
..but it seems like this logic can be much simpler if we simply let
seq_read() iterate over the whole tgid_map array & filter out empty
entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that
approach, removing the incorrect logic here entirely.
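The difference between the two filters can be reproduced with a tiny userspace model. This is only a sketch: MAP_SIZE, lookup_tgid() and the two counting helpers are hypothetical names, and the map is far smaller than the real tgid_map.

```c
#include <stddef.h>

/* Tiny stand-in for the real array, which is indexed by pid. */
#define MAP_SIZE 8

/* Stand-in for trace_find_tgid(): map a pid to its recorded tgid, 0 if none. */
static int lookup_tgid(const int *map, int pid)
{
	if (pid <= 0 || pid >= MAP_SIZE)
		return 0;
	return map[pid];
}

/*
 * Old filter: takes the stored tgid and looks it up again as if it were
 * a pid, so an entry is shown only if its group leader is recorded too.
 */
static int count_shown_old(const int *map)
{
	int shown = 0;

	for (int pid = 0; pid < MAP_SIZE; pid++)
		if (lookup_tgid(map, map[pid]))
			shown++;
	return shown;
}

/* Fixed filter: an entry is shown whenever its own tgid is non-zero. */
static int count_shown_new(const int *map)
{
	int shown = 0;

	for (int pid = 0; pid < MAP_SIZE; pid++)
		if (map[pid] != 0)
			shown++;
	return shown;
}
```

If only thread 5 of thread group 3 has been recorded, the old filter shows nothing while the fixed one shows one entry; once the leader (pid 3) is recorded as well, both show two entries.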
Signed-off-by: Paul Burton <paulburton(a)google.com>
Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks")
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Joel Fernandes <joelaf(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
kernel/trace/trace.c | 38 +++++++++++++-------------------------
1 file changed, 13 insertions(+), 25 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d23a09d3eb37b..9570667310bcc 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -5608,37 +5608,20 @@ static const struct file_operations tracing_readme_fops = {
static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos)
{
- int *ptr = v;
+ int pid = ++(*pos);
- if (*pos || m->count)
- ptr++;
-
- (*pos)++;
-
- for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) {
- if (trace_find_tgid(*ptr))
- return ptr;
- }
+ if (pid > PID_MAX_DEFAULT)
+ return NULL;
- return NULL;
+ return &tgid_map[pid];
}
static void *saved_tgids_start(struct seq_file *m, loff_t *pos)
{
- void *v;
- loff_t l = 0;
-
- if (!tgid_map)
+ if (!tgid_map || *pos > PID_MAX_DEFAULT)
return NULL;
- v = &tgid_map[0];
- while (l <= *pos) {
- v = saved_tgids_next(m, v, &l);
- if (!v)
- return NULL;
- }
-
- return v;
+ return &tgid_map[*pos];
}
static void saved_tgids_stop(struct seq_file *m, void *v)
@@ -5647,9 +5630,14 @@ static void saved_tgids_stop(struct seq_file *m, void *v)
static int saved_tgids_show(struct seq_file *m, void *v)
{
- int pid = (int *)v - tgid_map;
+ int *entry = (int *)v;
+ int pid = entry - tgid_map;
+ int tgid = *entry;
+
+ if (tgid == 0)
+ return SEQ_SKIP;
- seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid));
+ seq_printf(m, "%d %d\n", pid, tgid);
return 0;
}
base-commit: 62fb9874f5da54fdb243003b386128037319b219
--
2.32.0.93.g670b81a890-goog