The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 35e351780fa9d8240dd6f7e4f245f9ea37e96c19
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042320-angled-goldmine-2cd7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
35e351780fa9 ("fork: defer linking file vma until vma is fully initialized")
d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
2820b0f09be9 ("hugetlbfs: close race between MADV_DONTNEED and page fault")
b5df09226450 ("mm: set up vma iterator for vma_iter_prealloc() calls")
f72cf24a8686 ("mm: use vma_iter_clear_gfp() in nommu")
da0892547b10 ("maple_tree: re-introduce entry to mas_preallocate() arguments")
fd892593d44d ("mm: change do_vmi_align_munmap() tracking of VMAs to remove")
5502ea44f5ad ("mm/hugetlb: add page_mask for hugetlb_follow_page_mask()")
dd767aaa2fc8 ("mm/hugetlb: handle FOLL_DUMP well in follow_page_mask()")
1279aa0656bb ("mm: make show_free_areas() static")
408579cd627a ("mm: Update do_vmi_align_munmap() return semantics")
e4bd84c069f2 ("mm: Always downgrade mmap_lock if requested")
43ec8a620b38 ("Merge tag 'unmap-fix-20230629' of git://git.infradead.org/users/dwmw2/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Wed, 10 Apr 2024 17:14:41 +0800
Subject: [PATCH] fork: defer linking file vma until vma is fully initialized
Thorvald reported a WARNING [1]. The root cause is the following race:
CPU 1                                   CPU 2
fork                                    hugetlbfs_fallocate
 dup_mmap                                hugetlbfs_punch_hole
  i_mmap_lock_write(mapping);
  vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
  i_mmap_unlock_write(mapping);
  hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
                                          i_mmap_lock_write(mapping);
                                          hugetlb_vmdelete_list
                                           vma_interval_tree_foreach
                                            hugetlb_vma_trylock_write -- Vma_lock is cleared.
  tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
                                            hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
                                          i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
the i_mmap_rwsem lock while the vma lock can be used at the same time.
Fix this by deferring linking the file vma until the vma is fully
initialized: vmas must be initialized before they can be used.
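The ordering rule behind this fix (an object must be fully initialized before it becomes reachable by other threads) can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: `struct obj`, `shared_slot`, `init_obj()`, `publish_obj()` and `lookup_obj()` are hypothetical names, and a C11 release store stands in for linking the vma into the i_mmap tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a vma with private per-object state. */
struct obj {
	int *private_lock;	/* plays the role of the hugetlb vma_lock */
	bool opened;		/* plays the role of vm_ops->open() having run */
};

/* Hypothetical stand-in for the shared i_mmap tree: one published slot. */
static _Atomic(struct obj *) shared_slot;

/* Finish every piece of private initialization first ... */
static void init_obj(struct obj *o)
{
	o->private_lock = malloc(sizeof(*o->private_lock));
	if (o->private_lock)
		*o->private_lock = 0;
	o->opened = true;
}

/* ... and only then make the object visible to other threads. */
static void publish_obj(struct obj *o)
{
	init_obj(o);
	/* Release ordering: a reader that sees 'o' also sees its init. */
	atomic_store_explicit(&shared_slot, o, memory_order_release);
}

/* Reader side: acquire pairs with the release in publish_obj(). */
static struct obj *lookup_obj(void)
{
	return atomic_load_explicit(&shared_slot, memory_order_acquire);
}
```

In the buggy ordering, the equivalent of `atomic_store_explicit()` would run before `init_obj()`, so a concurrent `lookup_obj()` could observe an object whose lock is about to be replaced.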
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reported-by: Thorvald Natvig <thorvald(a)google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index 39a5046c2f0b..aebb3e6c96dc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ /*
+ * Link the vma into the MT. After using __mt_dup(), memory
+ * allocation is not necessary here, so it cannot fail.
+ */
+ vma_iter_bulk_store(&vmi, tmp);
+
+ mm->map_count++;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
- /*
- * Link the vma into the MT. After using __mt_dup(), memory
- * allocation is not necessary here, so it cannot fail.
- */
- vma_iter_bulk_store(&vmi, tmp);
-
- mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
if (retval) {
mpnt = vma_next(&vmi);
goto loop_out;
The warning described in patch "tracing: Increase PERF_MAX_TRACE_SIZE to
handle Sentinel1 and docker together" can be triggered with a perf probe on
do_execve with a large path. As PATH_MAX is larger than PERF_MAX_TRACE_SIZE
(2048 before the patch), the warning will trigger.
The fix was included in 5.16, hence this backport to 5.15 and earlier LTS
kernels. Also included is a patch that better describes the attempted
allocation size.
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 059a49aa2e25c58f90b50151f109dd3c4cdb3a47
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024041414-humming-alarm-eb41@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
059a49aa2e25 ("virtio_net: Do not send RSS key if it is not supported")
fb6e30a72539 ("net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops")
02cbfba1add5 ("idpf: add ethtool callbacks")
a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll")
3a8845af66ed ("idpf: add RX splitq napi poll support")
c2d548cad150 ("idpf: add TX splitq napi poll support")
6818c4d5b3c2 ("idpf: add splitq start_xmit")
d4d558718266 ("idpf: initialize interrupts and enable vport")
95af467d9a4e ("idpf: configure resources for RX queues")
1c325aac10a8 ("idpf: configure resources for TX queues")
ce1b75d0635c ("idpf: add ptypes and MAC filter support")
0fe45467a104 ("idpf: add create vport and netdev configuration")
4930fbf419a7 ("idpf: add core init and interrupt request")
8077c727561a ("idpf: add controlq init and reset checks")
e850efed5e15 ("idpf: add module register and probe functionality")
b9335a757232 ("net/mlx5e: Make flow classification filters static")
4cab498f33f7 ("hv_netvsc: Allocate rx indirection table size dynamically")
4f3ed1293feb ("ixgbe: Allow flow hash to be set via ethtool")
2b30f8291a30 ("net: ethtool: add support for MAC Merge layer")
8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Wed, 3 Apr 2024 08:43:12 -0700
Subject: [PATCH] virtio_net: Do not send RSS key if it is not supported
There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.
Running the following command in any QEMU virtual machine with virtio-net
will reproduce this problem:
# ethtool -X eth0 hfunc toeplitz
This is how the problem happens:
1) ethtool_set_rxfh() calls virtnet_set_rxfh()
2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather
4) Since the command above does not have a key, the last
scatter-gather entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;
5) This buffer is passed to QEMU, but QEMU rejects a buffer
with zero length and does the following in virtqueue_map_desc() (a QEMU
function):
if (!sz) {
virtio_error(vdev, "virtio: zero sized buffers are not allowed");
6) virtio_error() (also a QEMU function) marks the device as broken:
vdev->broken = true;
7) QEMU bails out and does not respond to the kernel.
8) The kernel is waiting for the response to come back (function
virtnet_send_command()).
9) The kernel waits by doing the following:
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
!virtqueue_is_broken(vi->cvq))
cpu_relax();
10) Neither virtqueue_get_buf() nor virtqueue_is_broken() ever returns
a true value, so the kernel loops here forever. Keep in mind that
virtqueue_is_broken() does not look at QEMU's `vdev->broken`, so it
never realizes that the virtio device is broken on the QEMU side.
Fix it by not sending RSS commands if the feature is not available in
the device.
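The decision logic of the patched virtnet_set_rxfh() can be modelled in userspace to show why the reproducer (`ethtool -X eth0 hfunc toeplitz`, which sets neither an indirection table nor a key) no longer sends any control-queue command. This is a sketch under stated assumptions: `struct rss_caps`, `struct rxfh_req` and `check_rxfh()` are hypothetical, and the hfunc validation that precedes this logic in the driver is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of the virtnet_info feature state. */
struct rss_caps {
	bool has_rss;
	bool has_rss_hash_report;
};

/* Hypothetical request: which fields ethtool asked to change. */
struct rxfh_req {
	bool set_indir;
	bool set_key;
};

/*
 * Models the patched virtnet_set_rxfh(): reject changes the device
 * cannot honour, and report through *commit whether an RSS command
 * would be sent to the device at all.
 */
static int check_rxfh(const struct rss_caps *caps,
		      const struct rxfh_req *req, bool *commit)
{
	bool update = false;

	*commit = false;
	if (req->set_indir) {
		if (!caps->has_rss)
			return -EOPNOTSUPP;
		update = true;
	}
	if (req->set_key) {
		/* Either feature implies a configurable hash key. */
		if (!caps->has_rss && !caps->has_rss_hash_report)
			return -EOPNOTSUPP;
		update = true;
	}
	*commit = update;
	return 0;
}
```

With no features negotiated, an hfunc-only request now returns 0 without committing anything, and a key-only request fails with -EOPNOTSUPP instead of sending a zero-length buffer.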
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: stable(a)vger.kernel.org
Cc: qemu-devel(a)nongnu.org
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Heng Qi <hengqi(a)linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..115c3c5414f2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3807,6 +3807,7 @@ static int virtnet_set_rxfh(struct net_device *dev,
struct netlink_ext_ack *extack)
{
struct virtnet_info *vi = netdev_priv(dev);
+ bool update = false;
int i;
if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
@@ -3814,13 +3815,28 @@ static int virtnet_set_rxfh(struct net_device *dev,
return -EOPNOTSUPP;
if (rxfh->indir) {
+ if (!vi->has_rss)
+ return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = rxfh->indir[i];
+ update = true;
}
- if (rxfh->key)
- memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
- virtnet_commit_rss_command(vi);
+ if (rxfh->key) {
+ /* If either _F_HASH_REPORT or _F_RSS are negotiated, the
+ * device provides hash calculation capabilities, that is,
+ * hash_key is configured.
+ */
+ if (!vi->has_rss && !vi->has_rss_hash_report)
+ return -EOPNOTSUPP;
+
+ memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
+ update = true;
+ }
+
+ if (update)
+ virtnet_commit_rss_command(vi);
return 0;
}
@@ -4729,13 +4745,15 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
- if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+ }
+
+ if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, rss_max_key_size));
From: Shifeng Li <lishifeng(a)sangfor.com.cn>
[ Upstream commit 8f5100da56b3980276234e812ce98d8f075194cd ]
Fix a cmd->ent use after free due to a race on command entry.
Such a race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running the command flush
flow takes a refcount on this command entry. The process which handles the
command flush may see this command as needing to be flushed if the other
process has allocated an ent->idx but has not yet set ent in cmd->ent_arr
in cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr under
the spin lock.
[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
[70013.081968]
[70013.082028] Workqueue: events aer_isr
[70013.082053] Call Trace:
[70013.082067] dump_stack+0x8b/0xbb
[70013.082086] print_address_description+0x6a/0x270
[70013.082102] kasan_report+0x179/0x2c0
[70013.082173] mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082267] mlx5_cmd_flush+0x80/0x180 [mlx5_core]
[70013.082304] mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
[70013.082338] mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
[70013.082377] remove_one+0x200/0x2b0 [mlx5_core]
[70013.082409] pci_device_remove+0xf3/0x280
[70013.082439] device_release_driver_internal+0x1c3/0x470
[70013.082453] pci_stop_bus_device+0x109/0x160
[70013.082468] pci_stop_and_remove_bus_device+0xe/0x20
[70013.082485] pcie_do_fatal_recovery+0x167/0x550
[70013.082493] aer_isr+0x7d2/0x960
[70013.082543] process_one_work+0x65f/0x12d0
[70013.082556] worker_thread+0x87/0xb50
[70013.082571] kthread+0x2e9/0x3a0
[70013.082592] ret_from_fork+0x1f/0x40
The logical relationship of this error is as follows:
aer_recover_work | ent->work
-------------------------------------------+------------------------------
aer_recover_work_func |
|- pcie_do_recovery |
|- report_error_detected |
|- mlx5_pci_err_detected |cmd_work_handler
|- mlx5_enter_error_state | |- cmd_alloc_index
|- enter_error_state | |- lock cmd->alloc_lock
|- mlx5_cmd_flush | |- clear_bit
|- mlx5_cmd_trigger_completions| |- unlock cmd->alloc_lock
|- lock cmd->alloc_lock |
|- vector = ~dev->cmd.vars.bitmask
|- for_each_set_bit |
|- cmd_ent_get(cmd->ent_arr[i]) (UAF)
|- unlock cmd->alloc_lock | |- cmd->ent_arr[ent->idx]=ent
The cmd->ent_arr[ent->idx] assignment and the bit clearing are not
protected by the cmd->alloc_lock in cmd_work_handler().
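The invariant restored by the fix is that claiming a slot in the bitmask and publishing the entry in ent_arr happen in one critical section, so a flusher holding the same lock never sees a busy slot without its entry. A minimal userspace sketch of that invariant, assuming a pthread mutex in place of cmd->alloc_lock; `struct cmd_table`, `alloc_index()` and `count_busy_entries()` are hypothetical names, not the mlx5 API.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_CMDS 8

struct entry { int idx; };

/* Hypothetical model of struct mlx5_cmd: a free-slot bitmask plus an
 * index -> entry table, both guarded by one lock. */
struct cmd_table {
	pthread_mutex_t lock;
	unsigned long free_mask;		/* bit set = slot free */
	struct entry *ent_arr[MAX_CMDS];
};

/* Claim a free slot and publish the entry in the same critical
 * section, as the patched cmd_alloc_index() does. */
static int alloc_index(struct cmd_table *t, struct entry *ent)
{
	int ret = -1;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < MAX_CMDS; i++) {
		if (t->free_mask & (1UL << i)) {
			t->free_mask &= ~(1UL << i);
			ent->idx = i;
			t->ent_arr[i] = ent;
			ret = i;
			break;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}

/* Model of the flush side: every busy slot must have a published entry. */
static int count_busy_entries(struct cmd_table *t)
{
	int n = 0;

	pthread_mutex_lock(&t->lock);
	for (int i = 0; i < MAX_CMDS; i++) {
		if (!(t->free_mask & (1UL << i))) {
			assert(t->ent_arr[i] != NULL);
			n++;
		}
	}
	pthread_mutex_unlock(&t->lock);
	return n;
}
```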
Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
Reviewed-by: Moshe Shemesh <moshe(a)nvidia.com>
Signed-off-by: Shifeng Li <lishifeng(a)sangfor.com.cn>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
[Samasth: backport for 5.4.y]
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda(a)oracle.com>
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/cmd.c
The conflict is caused by the absence of
commit 58db72869a9f ("net/mlx5: Re-organize mlx5_cmd struct"),
which is a structural code change and is not necessary for this
patch.
---
Commit 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
is present from linux-5.4.y onwards, but the commit which fixes
it is only present from linux-6.1.y. It would be nice to get an opinion
from the author or maintainer.
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 93a6597366f5..ebf5b30b2100 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -114,15 +114,18 @@ static u8 alloc_token(struct mlx5_cmd *cmd)
return token;
}
-static int cmd_alloc_index(struct mlx5_cmd *cmd)
+static int cmd_alloc_index(struct mlx5_cmd *cmd, struct mlx5_cmd_work_ent *ent)
{
unsigned long flags;
int ret;
spin_lock_irqsave(&cmd->alloc_lock, flags);
ret = find_first_bit(&cmd->bitmask, cmd->max_reg_cmds);
- if (ret < cmd->max_reg_cmds)
+ if (ret < cmd->max_reg_cmds) {
clear_bit(ret, &cmd->bitmask);
+ ent->idx = ret;
+ cmd->ent_arr[ent->idx] = ent;
+ }
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
return ret < cmd->max_reg_cmds ? ret : -ENOMEM;
@@ -905,7 +908,7 @@ static void cmd_work_handler(struct work_struct *work)
sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
down(sem);
if (!ent->page_queue) {
- alloc_ret = cmd_alloc_index(cmd);
+ alloc_ret = cmd_alloc_index(cmd, ent);
if (alloc_ret < 0) {
mlx5_core_err(dev, "failed to allocate command entry\n");
if (ent->callback) {
@@ -920,15 +923,14 @@ static void cmd_work_handler(struct work_struct *work)
up(sem);
return;
}
- ent->idx = alloc_ret;
} else {
ent->idx = cmd->max_reg_cmds;
spin_lock_irqsave(&cmd->alloc_lock, flags);
clear_bit(ent->idx, &cmd->bitmask);
+ cmd->ent_arr[ent->idx] = ent;
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
}
- cmd->ent_arr[ent->idx] = ent;
lay = get_inst(cmd, ent->idx);
ent->lay = lay;
memset(lay, 0, sizeof(*lay));
--
2.43.0
Please backport the following fixes for selftest/seccomp.
I forgot to send the original patches to the stable list!
They resolve some edge cases in the testing.
commit: 8e3c9f9f3a0742cd12b682a1766674253b33fcf0
selftests/seccomp: user_notification_addfd check nextfd is available
commit: 471dbc547612adeaa769e48498ef591c6c95a57a
selftests/seccomp: Change the syscall used in KILL_THREAD test
commit: ecaaa55c9fa5e8058445a8b891070b12208cdb6d
selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)
Please backport to 6.6 onwards.
Thanks,
Terry.
cc stable(a)vger.kernel.org
On Fri, 2024-04-26 at 10:50 +0200, Greg Kroah-Hartman wrote:
> Hi,
>
> This is the friendly email-bot of Greg Kroah-Hartman's inbox. I've
> detected that you have sent him a direct question that might be
> better sent to a public mailing list which is much faster in
> responding to questions than Greg normally is.
>
> Please try asking one of the following mailing lists for your
> questions:
>
> For udev and hotplug related questions, please ask on the
> linux-hotplug(a)vger.kernel.org mailing list
>
> For USB related questions, please ask on the
> linux-usb(a)vger.kernel.org mailing list
>
> For PCI related questions, please ask on the
> linux-pci(a)vger.kernel.org or linux-kernel(a)vger.kernel.org mailing
> lists
>
> For serial and tty related questions, please ask on the
> linux-serial(a)vger.kernel.org mailing list.
>
> For staging tree related questions, please ask on the
> linux-staging(a)lists.linux.dev mailing list.
>
> For general kernel related questions, please ask on the
> kernelnewbies(a)nl.linux.org or linux-kernel(a)vger.kernel.org mailing
> lists, depending on the type of question. More basic, beginner
> questions are better asked on the kernelnewbies list, after reading
> the wiki at www.kernelnewbies.org.
>
> For Linux stable and longterm kernel release questions or patches to
> be included in the stable or longterm kernel trees, please email
> stable(a)vger.kernel.org and Cc: the linux-kernel(a)vger.kernel.org
> mailing list so all of the stable kernel developers can be notified.
> Also please read Documentation/process/stable-kernel-rules.rst in the
> Linux kernel tree for the proper procedure to get patches accepted
> into the stable or longterm kernel releases.
>
> If you really want to ask Greg the question, please read the following
> two links as to why emailing a single person directly is usually not a
> good thing, and causes extra work on a single person:
> http://www.arm.linux.org.uk/news/?newsitem=11
> http://www.eyrie.org/~eagle/faqs/questions.html
>
> After reading those messages, and you still feel that you want to email
> Greg instead of posting on a mailing list, then resend your message
> within 24 hours and it will go through to him. But be forewarned, his
> email inbox currently looks like:
> 912 messages in /home/greg/mail/INBOX/
> so it might be a while before he gets to the message.
>
> Thank you for your understanding.
>
> The email triggering this response has been automatically discarded.
>
> thanks,
>
> greg k-h's email bot
Please consider the commits below for backporting to v5.15. These
patches are prerequisites for the backport of the x86 EFI stub
refactor that is needed for distros to sign v5.15 images for secure
boot in a way that complies with new MS requirements for memory
protections while running in the EFI firmware.
All patches either predate v6.1 or have been backported to it already.
The remaining ~50 changes will be posted as a patch series in due
time, as they will not apply cleanly to v5.15.
Please apply in the order that they appear below.
Thanks,
Ard.
44f155b4b07b8293472c9797d5b39839b91041ca
4da87c51705815fe1fbd41cc61640bb80da5bc54
7c4146e8885512719a50b641e9277a1712e052ff
176db622573f028f85221873ea4577e096785315
950d00558a920227b5703d1fcc4751cfe03853cd
ec1c66af3a30d45c2420da0974c01d3515dba26e
a9ee679b1f8c3803490ed2eeffb688aaee56583f
3ba75c1316390b2bc39c19cb8f0f85922ab3f9ed
82e0d6d76a2a74bd6a31141d555d53b4cc22c2a3
31f1a0edff78c43e8a3bd3692af0db1b25c21b17
9cf42bca30e98a1c6c9e8abf876940a551eaa3d1
cb8bda8ad4438b4bcfcf89697fc84803fb210017
e2ab9eab324cdf240de89741e4a1aa79919f0196
5c3a85f35b583259cf5ca0344cd79c8899ba1bb7
91592b5c0c2f076ff9d8cc0c14aa563448ac9fc4
73a6dec80e2acedaef3ca603d4b5799049f6e9f8
7f22ca396778fea9332d83ec2359dbe8396e9a06
4b52016247aeaa55ca3e3bc2e03cd91114c145c2
630f337f0c4fd80390e8600adcab31550aea33df
db14655ad7854b69a2efda348e30d02dbc19e8a1
bad267f9e18f8e9e628abd1811d2899b1735a4e1
62b71cd73d41ddac6b1760402bbe8c4932e23531
cc3fdda2876e58a7e83e558ab51853cf106afb6a
d2d7a54f69b67cd0a30e0ebb5307cb2de625baac
I'm announcing the release of the 6.1.89 kernel.
Only users of the 6.1 kernel series that had build problems with 6.1.88 need to upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
arch/arm/mach-omap2/pdata-quirks.c | 10 -----
sound/soc/ti/omap3pandora.c | 63 +++++++++++++++++++++++--------------
3 files changed, 41 insertions(+), 34 deletions(-)
Greg Kroah-Hartman (2):
Revert "ASoC: ti: Convert Pandora ASoC to GPIO descriptors"
Linux 6.1.89
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and `MADV_DONTNEED` semantics). Consider this race:
1. T1 performs page removal in sgx_encl_remove_pages() and stops right
after removing the page table entry and right before re-acquiring the
enclave lock to EREMOVE and xa_erase(&encl->page_array) the page.
2. T2 tries to access the page, and #PF[not_present] is raised. The
condition to EAUG in sgx_vma_fault() is not satisfied because the
page is still present in encl->page_array, thus the SGX driver
assumes that the fault happened because the page was swapped out. The
driver continues on a code path that installs a page table entry
*without* performing EAUG.
3. The enclave page metadata is in an inconsistent state: the PTE is
installed but there was no EAUG. Thus, T2 in userspace endlessly
receives SIGSEGV on this page (and EACCEPT always fails).
Fix this by making sure that T1 (the page-removing thread) always wins
this data race. In particular, the page-being-removed is marked as such,
and T2 retries until the page is fully removed.
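The effect of the new flag on the page lookup can be sketched in isolation: a page carrying either the "being reclaimed" or the new "being removed" bit is reported as -EBUSY, which the fault path treats as "retry later" rather than as a swapped-out page. This is a simplified userspace model, assuming only the bit values from the encl.h hunk of this patch; `struct encl_page` and `load_page()` are hypothetical stand-ins for the driver's structures.

```c
#include <assert.h>
#include <errno.h>

#define BIT(n) (1UL << (n))

/* Bit values as in the encl.h hunk of this patch. */
#define PAGE_BEING_REMOVED	BIT(2)
#define PAGE_BEING_RECLAIMED	BIT(3)

/* Hypothetical, simplified enclave page: only the 'desc' flag word. */
struct encl_page {
	unsigned long desc;
};

/* Mirrors the patched check in __sgx_encl_load_page(): a page that is
 * being reclaimed or removed is reported as busy. */
static int load_page(const struct encl_page *p)
{
	if (p->desc & (PAGE_BEING_RECLAIMED | PAGE_BEING_REMOVED))
		return -EBUSY;
	return 0;
}
```

Setting the flag while still holding encl->lock, before dropping it to zap the PTEs, is what guarantees that T2 sees the page as busy for the whole removal window.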
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 3 ++-
arch/x86/kernel/cpu/sgx/encl.h | 3 +++
arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+ SGX_ENCL_PAGE_BEING_REMOVED))
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@
/* 'desc' bit marking that the page is being reclaimed. */
#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED BIT(2)
+
struct sgx_encl_page {
unsigned long desc;
unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c542d4dd3e64 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
*/
+ entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.34.1
Two enclave threads may try to access the same non-present enclave page
simultaneously (e.g., if the SGX runtime supports lazy allocation). The
threads will end up in sgx_encl_eaug_page(), racing to acquire the
enclave lock. The winning thread will perform EAUG, set up the page
table entry, and insert the page into encl->page_array. The losing
thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed
to the error handling path.
This error handling path contains two bugs: (1) SIGBUS is sent to
userspace even though the enclave page is correctly installed by another
thread, and (2) sgx_encl_free_epc_page() is called, which performs EREMOVE
even though the enclave page was never intended to be removed. The first
bug is less severe because it impacts only the user space; the second
bug is more severe because it also impacts the OS state by ripping the
page (added by the winning thread) from the enclave.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux
fault handler so that no signal is sent to userspace, and (2) by
replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no
EREMOVE is performed.
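The first half of the fix is a small mapping from the xa_insert() error to the fault result. A minimal sketch, assuming the handler's default result on this error path is SIGBUS as the description states; `enum fault_ret` and `eaug_insert_failed()` are hypothetical stand-ins for vm_fault_t handling inside sgx_encl_eaug_page().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for VM_FAULT_SIGBUS and VM_FAULT_NOPAGE. */
enum fault_ret { FAULT_SIGBUS, FAULT_NOPAGE };

/* Models the patched error path taken when xa_insert() fails. */
static enum fault_ret eaug_insert_failed(int err)
{
	/* -EBUSY: another thread won the race and already installed the
	 * page, so no signal is sent and the access is simply retried. */
	if (err == -EBUSY)
		return FAULT_NOPAGE;
	/* Any other failure is still reported to userspace. */
	return FAULT_SIGBUS;
}
```

The second half of the fix (calling sgx_free_epc_page() instead of sgx_encl_free_epc_page()) is not modelled here; it only changes how the unused EPC page is released.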
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.34.1
Before sending Enter Mode for an Alt Mode, there is a gap between Discover
Modes and the Alt Mode driver queueing the Enter Mode VDM for the port
partner to send a message to the port.
If this message results in unregistering Alt Modes such as in a DR_SWAP,
then the following deadlock can occur with respect to the DisplayPort Alt
Mode driver:
1. The DR_SWAP state holds port->lock. Unregistering the Alt Mode driver
results in a cancel_work_sync() that waits for the current dp_altmode_work
to finish.
2. dp_altmode_work makes a call to tcpm_altmode_enter. The deadlock occurs
because tcpm_queue_vdm_unlocked attempts to acquire port->lock.
Before attempting to grab the lock, ensure that the port is in a state
vdm_run_state_machine can run in. Alt Mode unregistration will not occur
in these states.
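The guard added to tcpm_queue_vdm_unlocked() is a plain state whitelist evaluated before port->lock is taken. A minimal sketch of that predicate; only the three state names come from the patch, while `enum port_state`, `OTHER_STATE` and `may_queue_vdm()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the TCPM port states; only the three
 * READY/identity names are taken from the patch. */
enum port_state {
	SRC_READY,
	SNK_READY,
	SRC_VDM_IDENTITY_REQUEST,
	OTHER_STATE,	/* hypothetical: e.g. a DR_SWAP in progress */
};

/* Mirrors the early return added to tcpm_queue_vdm_unlocked(): only
 * queue a VDM (and take port->lock) from states in which
 * vdm_run_state_machine can run. */
static bool may_queue_vdm(enum port_state state)
{
	return state == SRC_READY || state == SNK_READY ||
	       state == SRC_VDM_IDENTITY_REQUEST;
}
```

In any other state the caller returns without touching port->lock, so dp_altmode_work can finish and the cancel_work_sync() in the DR_SWAP path no longer waits forever.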
Fixes: 03eafcfb60c0 ("usb: typec: tcpm: Add tcpm_queue_vdm_unlocked() helper")
Cc: stable(a)vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera(a)google.com>
---
drivers/usb/typec/tcpm/tcpm.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index c26fb70c3ec6..6fa1601ac259 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -1564,6 +1564,10 @@ static void tcpm_queue_vdm(struct tcpm_port *port, const u32 header,
static void tcpm_queue_vdm_unlocked(struct tcpm_port *port, const u32 header,
const u32 *data, int cnt, enum tcpm_transmit_type tx_sop_type)
{
+ if (port->state != SRC_READY && port->state != SNK_READY &&
+ port->state != SRC_VDM_IDENTITY_REQUEST)
+ return;
+
mutex_lock(&port->lock);
tcpm_queue_vdm(port, header, data, cnt, TCPC_TX_SOP);
mutex_unlock(&port->lock);
base-commit: 684e9f5f97eb4b7831298ffad140d5c1d426ff27
--
2.44.0.769.g3c40516874-goog
From: Konstantin Pugin <ria.freelander(a)gmail.com>
When specifying flag SER_RS485_RTS_ON_SEND in RS485 configuration,
we get the following warning after commit 4afeced55baa ("serial: core:
fix sanitizing check for RTS settings"):
invalid RTS setting, using RTS_AFTER_SEND instead
This results in SER_RS485_RTS_AFTER_SEND being set and the
driver always writing to the register field SC16IS7XX_EFCR_RTS_INVERT_BIT,
which breaks some hardware using these chips.
The hardware supports both RTS_ON_SEND and RTS_AFTER_SEND modes, so fix
this by announcing support for RTS_ON_SEND.
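Why the supported-flags mask matters can be shown with a simplified model of the serial core's sanitizing step. This is an assumption about its behaviour, not the serial core source: requested flags are first masked by the driver's `rs485_supported.flags`, and if exactly one RTS mode does not remain, a fallback is picked. The flag values match `<linux/serial.h>`; `sanitize_rs485()` is hypothetical.

```c
#include <assert.h>

/* Flag values as defined in <linux/serial.h>. */
#define RS485_ENABLED		(1U << 0)
#define RS485_RTS_ON_SEND	(1U << 1)
#define RS485_RTS_AFTER_SEND	(1U << 2)

/*
 * Simplified model (an assumption, not the serial core source):
 * unsupported flags are dropped, and if exactly one RTS mode does
 * not remain, RTS_ON_SEND is preferred when supported, otherwise
 * RTS_AFTER_SEND is used.
 */
static unsigned int sanitize_rs485(unsigned int requested,
				   unsigned int supported)
{
	unsigned int flags = requested & supported;
	int on = !!(flags & RS485_RTS_ON_SEND);
	int after = !!(flags & RS485_RTS_AFTER_SEND);

	if (on == after) {
		flags &= ~(RS485_RTS_ON_SEND | RS485_RTS_AFTER_SEND);
		flags |= (supported & RS485_RTS_ON_SEND) ?
			 RS485_RTS_ON_SEND : RS485_RTS_AFTER_SEND;
	}
	return flags;
}
```

Under this model, a request for RTS_ON_SEND against the old supported set silently turns into RTS_AFTER_SEND, while the patched set keeps the request unchanged.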
Cc: stable(a)vger.kernel.org
Fixes: 267913ecf737 ("serial: sc16is7xx: Fill in rs485_supported")
Tested-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Reviewed-by: Andy Shevchenko <andy(a)kernel.org>
Signed-off-by: Konstantin Pugin <ria.freelander(a)gmail.com>
---
drivers/tty/serial/sc16is7xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 03cf30e20b75..dfcc804f558f 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1449,7 +1449,7 @@ static int sc16is7xx_setup_mctrl_ports(struct sc16is7xx_port *s,
}
static const struct serial_rs485 sc16is7xx_rs485_supported = {
- .flags = SER_RS485_ENABLED | SER_RS485_RTS_AFTER_SEND,
+ .flags = SER_RS485_ENABLED | SER_RS485_RTS_ON_SEND | SER_RS485_RTS_AFTER_SEND,
.delay_rts_before_send = 1,
.delay_rts_after_send = 1, /* Not supported but keep returning -EINVAL */
};
--
2.44.0
xen kbdfront registers itself as being able to deliver *any* key since
it doesn't know what keys the backend may produce.
Unfortunately, the generated modalias gets too large and uevent creation
fails with -ENOMEM.
This can lead to gdm not using the keyboard since there is no seat
associated [1] and the debian installer crashing [2].
Trim the ranges of key capabilities by removing some BTN_* ranges.
While doing this, some neighboring undefined ranges are removed to trim
it further.
An upper limit of KEY_KBD_LCD_MENU5 is still too large. Use an upper
limit of KEY_BRIGHTNESS_MENU.
This removes:
BTN_DPAD_UP(0x220)..BTN_DPAD_RIGHT(0x223)
Empty space 0x224..0x229
Empty space 0x28a..0x28f
KEY_MACRO1(0x290)..KEY_MACRO30(0x2ad)
KEY_MACRO_RECORD_START 0x2b0
KEY_MACRO_RECORD_STOP 0x2b1
KEY_MACRO_PRESET_CYCLE 0x2b2
KEY_MACRO_PRESET1(0x2b3)..KEY_MACRO_PRESET3(0x2b5)
Empty space 0x2b6..0x2b7
KEY_KBD_LCD_MENU1(0x2b8)..KEY_KBD_LCD_MENU5(0x2bc)
Empty space 0x2bd..0x2bf
BTN_TRIGGER_HAPPY(0x2c0)..BTN_TRIGGER_HAPPY40(0x2e7)
Empty space 0x2e8..0x2ff
The modalias shrinks from 2082 to 1550 bytes.
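To estimate how much each removed range contributes to the modalias, one can count the text it would produce. This sketch assumes, as an unverified simplification of the input core's modalias formatting, that every set key is printed as its uppercase hex code followed by a comma; `key_text_len()` is a hypothetical helper, and the ranges used below come from the list above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Bytes that key codes lo..hi (inclusive) would add to the key
 * section of an input modalias, ASSUMING each set key is printed
 * as "%X," (uppercase hex code plus a comma).
 */
static int key_text_len(unsigned int lo, unsigned int hi)
{
	int len = 0;

	for (unsigned int code = lo; code <= hi; code++)
		len += snprintf(NULL, 0, "%X,", code);
	return len;
}
```

Under that assumption, KEY_MACRO1..KEY_MACRO30 (0x290..0x2ad) alone account for 120 bytes and BTN_TRIGGER_HAPPY..BTN_TRIGGER_HAPPY40 (0x2c0..0x2e7) for 160 bytes.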
A chunk of keys needs to be removed to allow the keyboard to be used.
This may break some functionality, but the hope is these macro keys are
uncommon and don't affect any users.
[1] https://github.com/systemd/systemd/issues/22944
[2] https://lore.kernel.org/xen-devel/87o8dw52jc.fsf@vps.thesusis.net/T/
Cc: Phillip Susi <phill(a)thesusis.net>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jason Andryuk <jandryuk(a)gmail.com>
Reviewed-by: Mattijs Korpershoek <mkorpershoek(a)baylibre.com>
---
v3:
Add Mattijs R-b
Put /* and */ on separate lines
---
drivers/input/misc/xen-kbdfront.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/input/misc/xen-kbdfront.c b/drivers/input/misc/xen-kbdfront.c
index 67f1c7364c95..d59ba8f9852e 100644
--- a/drivers/input/misc/xen-kbdfront.c
+++ b/drivers/input/misc/xen-kbdfront.c
@@ -256,7 +256,16 @@ static int xenkbd_probe(struct xenbus_device *dev,
__set_bit(EV_KEY, kbd->evbit);
for (i = KEY_ESC; i < KEY_UNKNOWN; i++)
__set_bit(i, kbd->keybit);
- for (i = KEY_OK; i < KEY_MAX; i++)
+ /*
+ * In theory we want to go KEY_OK..KEY_MAX, but that grows the
+ * modalias line too long. There is a gap of buttons from
+ * BTN_DPAD_UP..BTN_DPAD_RIGHT and KEY_ALS_TOGGLE is the next
+ * defined. Then continue up to KEY_BRIGHTNESS_MENU as an upper
+ * limit.
+ */
+ for (i = KEY_OK; i < BTN_DPAD_UP; i++)
+ __set_bit(i, kbd->keybit);
+ for (i = KEY_ALS_TOGGLE; i <= KEY_BRIGHTNESS_MENU; i++)
__set_bit(i, kbd->keybit);
ret = input_register_device(kbd);
--
2.41.0
The clk_alpha_pll_stromer_set_rate() function writes improper
values into the ALPHA_VAL{,_U} registers, which results in wrong
clock rates when the alpha value is used.
The broken behaviour can be seen on IPQ5018, for example, when
dynamic scaling sets the CPU frequency to 800000 kHz. In this
case the CPU cores run at only 792031 kHz:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
800000
# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
792031
This happens because the function ignores the fact that the alpha
value calculated by the alpha_pll_round_rate() function is only
32 bits wide, and it must be extended to 40 bits when it is used on
hardware which supports 40-bit wide values.
Extend the clk_alpha_pll_stromer_set_rate() function to convert
the alpha value to 40 bits before writing it into the registers
in order to ensure that the hardware really uses the requested rate.
After the change the CPU frequency is correct:
# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
800000
# cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
800000
Cc: stable(a)vger.kernel.org
Fixes: e47a4f55f240 ("clk: qcom: clk-alpha-pll: Add support for Stromer PLLs")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
Changes in v3:
- remove constants' comparison (Konrad)
- Link to v2: https://lore.kernel.org/r/20240326-alpha-pll-fix-stromer-set-rate-v2-1-48ae…
Changes in v2:
- fix subject prefix
- rebase on v6.9-rc1
- Link to v1: https://lore.kernel.org/r/20240324-alpha-pll-fix-stromer-set-rate-v1-1-335b…
Depends on the following patch:
https://lore.kernel.org/r/20240315-apss-ipq-pll-ipq5018-hang-v2-1-6fe30ada2…
---
drivers/clk/qcom/clk-alpha-pll.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/clk/qcom/clk-alpha-pll.c b/drivers/clk/qcom/clk-alpha-pll.c
index 8a412ef47e16..8a8abb429577 100644
--- a/drivers/clk/qcom/clk-alpha-pll.c
+++ b/drivers/clk/qcom/clk-alpha-pll.c
@@ -2490,6 +2490,8 @@ static int clk_alpha_pll_stromer_set_rate(struct clk_hw *hw, unsigned long rate,
rate = alpha_pll_round_rate(rate, prate, &l, &a, ALPHA_REG_BITWIDTH);
regmap_write(pll->clkr.regmap, PLL_L_VAL(pll), l);
+
+ a <<= ALPHA_REG_BITWIDTH - ALPHA_BITWIDTH;
regmap_write(pll->clkr.regmap, PLL_ALPHA_VAL(pll), a);
regmap_write(pll->clkr.regmap, PLL_ALPHA_VAL_U(pll),
a >> ALPHA_BITWIDTH);
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240324-alpha-pll-fix-stromer-set-rate-472376e624f0
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
Booting v6.8 results in a hang on various IPQ5018 based boards.
Investigating the problem showed that the hang happens when the
clk_alpha_pll_stromer_plus_set_rate() function tries to write
into the PLL_MODE register of the APSS PLL.
Checking the downstream code revealed that it uses Stromer
specific operations for IPQ5018 [1], whereas the current code
uses the Stromer Plus specific operations.
The ops in the 'ipq_pll_stromer_plus' clock definition can't be
changed since they are needed for IPQ5332, so add a new alpha PLL
clock declaration which uses the correct Stromer ops, and use this
new clock for IPQ5018 to avoid the boot failure.
Also, change pll_type in 'ipq5018_pll_data' to
CLK_ALPHA_PLL_TYPE_STROMER to better reflect that it is a Stromer
PLL and change the apss_ipq_pll_probe() function accordingly.
1. https://git.codelinaro.org/clo/qsdk/oss/kernel/linux-ipq-5.4/-/blob/NHSS.QS…
Cc: stable(a)vger.kernel.org
Fixes: 50492f929486 ("clk: qcom: apss-ipq-pll: add support for IPQ5018")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
Changes in v2:
- extend commit description due to the changes
- add a comment about why CLK_ALPHA_PLL_TYPE_STROMER_PLUS register offsets
are used
- constify hw clock init data (Stephen)
- change pll_type in ipq5018_pll_data to CLK_ALPHA_PLL_TYPE_STROMER (Konrad)
- Link to v1: https://lore.kernel.org/r/20240311-apss-ipq-pll-ipq5018-hang-v1-1-8ed42b7a9…
---
Based on v6.8.
---
drivers/clk/qcom/apss-ipq-pll.c | 30 +++++++++++++++++++++++++++---
1 file changed, 27 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/qcom/apss-ipq-pll.c b/drivers/clk/qcom/apss-ipq-pll.c
index 678b805f13d45..dfffec2f06ae7 100644
--- a/drivers/clk/qcom/apss-ipq-pll.c
+++ b/drivers/clk/qcom/apss-ipq-pll.c
@@ -55,6 +55,29 @@ static struct clk_alpha_pll ipq_pll_huayra = {
},
};
+static struct clk_alpha_pll ipq_pll_stromer = {
+ .offset = 0x0,
+ /*
+ * Reuse CLK_ALPHA_PLL_TYPE_STROMER_PLUS register offsets.
+ * Although this is a bit confusing, but the offset values
+ * are correct nevertheless.
+ */
+ .regs = ipq_pll_offsets[CLK_ALPHA_PLL_TYPE_STROMER_PLUS],
+ .flags = SUPPORTS_DYNAMIC_UPDATE,
+ .clkr = {
+ .enable_reg = 0x0,
+ .enable_mask = BIT(0),
+ .hw.init = &(const struct clk_init_data) {
+ .name = "a53pll",
+ .parent_data = &(const struct clk_parent_data) {
+ .fw_name = "xo",
+ },
+ .num_parents = 1,
+ .ops = &clk_alpha_pll_stromer_ops,
+ },
+ },
+};
+
static struct clk_alpha_pll ipq_pll_stromer_plus = {
.offset = 0x0,
.regs = ipq_pll_offsets[CLK_ALPHA_PLL_TYPE_STROMER_PLUS],
@@ -144,8 +167,8 @@ struct apss_pll_data {
};
static const struct apss_pll_data ipq5018_pll_data = {
- .pll_type = CLK_ALPHA_PLL_TYPE_STROMER_PLUS,
- .pll = &ipq_pll_stromer_plus,
+ .pll_type = CLK_ALPHA_PLL_TYPE_STROMER,
+ .pll = &ipq_pll_stromer,
.pll_config = &ipq5018_pll_config,
};
@@ -203,7 +226,8 @@ static int apss_ipq_pll_probe(struct platform_device *pdev)
if (data->pll_type == CLK_ALPHA_PLL_TYPE_HUAYRA)
clk_alpha_pll_configure(data->pll, regmap, data->pll_config);
- else if (data->pll_type == CLK_ALPHA_PLL_TYPE_STROMER_PLUS)
+ else if (data->pll_type == CLK_ALPHA_PLL_TYPE_STROMER ||
+ data->pll_type == CLK_ALPHA_PLL_TYPE_STROMER_PLUS)
clk_stromer_pll_configure(data->pll, regmap, data->pll_config);
ret = devm_clk_register_regmap(dev, &data->pll->clkr);
---
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
change-id: 20240311-apss-ipq-pll-ipq5018-hang-74d9a8f47136
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ffe3986fece696cf65e0ef99e74c75f848be8e30
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042720-safeness-stowaway-2308@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ffe3986fece6 ("ring-buffer: Only update pages_touched when a new page is touched")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ffe3986fece696cf65e0ef99e74c75f848be8e30 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Tue, 9 Apr 2024 15:13:09 -0400
Subject: [PATCH] ring-buffer: Only update pages_touched when a new page is
touched
The "buffer_percent" logic, which the ring buffer splice code uses to
wake up waiting tasks only once the buffer is filled to the percentage
set in the "buffer_percent" file, depends on three variables that
determine the amount of data in the ring buffer:
1) pages_read - incremented whenever a new sub-buffer is consumed
2) pages_lost - incremented every time a writer overwrites a sub-buffer
3) pages_touched - incremented when a write goes to a new sub-buffer
The percentage is the calculation of:
(pages_touched - (pages_lost + pages_read)) / nr_pages
Basically, the amount of data is the total number of sub-bufs that have been
touched, minus the number of sub-bufs lost and sub-bufs consumed. This is
divided by the total count to give the buffer percentage. When the
percentage is greater than the value in the "buffer_percent" file, it
wakes up splice readers waiting for that amount.
It was observed that the longer the trace ran, the less data the splice
read returned. That is, if one asked for 60%, the reader would get over
60% when tracing first started, but it would then be woken up at under
60%, and the amount of data read after each wakeup would slowly decrease
until it was much less than the buffer percent.
This was due to incorrect accounting of the pages_touched increment.
This value is incremented whenever a writer moves to a new sub-buffer,
but it was incremented in the wrong place. If a writer overflowed the
current sub-buffer, it would go to the next one. If it got preempted by
an interrupt at that time, and the interrupt performed a trace, the
interrupt would also end up going to the next sub-buffer. Only one of
them should increment the counter, but both did.
Change the cmpxchg() that does the real switch of the tail page into a
try_cmpxchg(), and increment pages_touched only on success. This
increments the counter once when the writer moves to a new sub-buffer,
instead of twice when a writer and its preempting writer race and both
move to the same new sub-buffer.
Link: https://lore.kernel.org/linux-trace-kernel/20240409151309.0d0e5056@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 25476ead681b..6511dc3a00da 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1393,7 +1393,6 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
old_write = local_add_return(RB_WRITE_INTCNT, &next_page->write);
old_entries = local_add_return(RB_WRITE_INTCNT, &next_page->entries);
- local_inc(&cpu_buffer->pages_touched);
/*
* Just make sure we have seen our old_write and synchronize
* with any interrupts that come in.
@@ -1430,8 +1429,9 @@ static void rb_tail_page_update(struct ring_buffer_per_cpu *cpu_buffer,
*/
local_set(&next_page->page->commit, 0);
- /* Again, either we update tail_page or an interrupt does */
- (void)cmpxchg(&cpu_buffer->tail_page, tail_page, next_page);
+ /* Either we update tail_page or an interrupt does */
+ if (try_cmpxchg(&cpu_buffer->tail_page, &tail_page, next_page))
+ local_inc(&cpu_buffer->pages_touched);
}
}