The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 35e351780fa9d8240dd6f7e4f245f9ea37e96c19
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042320-angled-goldmine-2cd7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
35e351780fa9 ("fork: defer linking file vma until vma is fully initialized")
d24062914837 ("fork: use __mt_dup() to duplicate maple tree in dup_mmap()")
2820b0f09be9 ("hugetlbfs: close race between MADV_DONTNEED and page fault")
b5df09226450 ("mm: set up vma iterator for vma_iter_prealloc() calls")
f72cf24a8686 ("mm: use vma_iter_clear_gfp() in nommu")
da0892547b10 ("maple_tree: re-introduce entry to mas_preallocate() arguments")
fd892593d44d ("mm: change do_vmi_align_munmap() tracking of VMAs to remove")
5502ea44f5ad ("mm/hugetlb: add page_mask for hugetlb_follow_page_mask()")
dd767aaa2fc8 ("mm/hugetlb: handle FOLL_DUMP well in follow_page_mask()")
1279aa0656bb ("mm: make show_free_areas() static")
408579cd627a ("mm: Update do_vmi_align_munmap() return semantics")
e4bd84c069f2 ("mm: Always downgrade mmap_lock if requested")
43ec8a620b38 ("Merge tag 'unmap-fix-20230629' of git://git.infradead.org/users/dwmw2/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Wed, 10 Apr 2024 17:14:41 +0800
Subject: [PATCH] fork: defer linking file vma until vma is fully initialized
Thorvald reported a WARNING [1]. And the root cause is below race:
CPU 1 CPU 2
fork hugetlbfs_fallocate
dup_mmap hugetlbfs_punch_hole
i_mmap_lock_write(mapping);
vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
i_mmap_lock_write(mapping);
hugetlb_vmdelete_list
vma_interval_tree_foreach
hugetlb_vma_trylock_write -- Vma_lock is cleared.
tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
by deferring linking file vma until vma is fully initialized. Those vmas
should be initialized first before they can be used.
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reported-by: Thorvald Natvig <thorvald(a)google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/kernel/fork.c b/kernel/fork.c
index 39a5046c2f0b..aebb3e6c96dc 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -714,6 +714,23 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ /*
+ * Link the vma into the MT. After using __mt_dup(), memory
+ * allocation is not necessary here, so it cannot fail.
+ */
+ vma_iter_bulk_store(&vmi, tmp);
+
+ mm->map_count++;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -730,25 +747,9 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
- /*
- * Link the vma into the MT. After using __mt_dup(), memory
- * allocation is not necessary here, so it cannot fail.
- */
- vma_iter_bulk_store(&vmi, tmp);
-
- mm->map_count++;
if (!(tmp->vm_flags & VM_WIPEONFORK))
retval = copy_page_range(tmp, mpnt);
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
if (retval) {
mpnt = vma_next(&vmi);
goto loop_out;
The warning described on patch "tracing: Increase PERF_MAX_TRACE_SIZE to
handle Sentinel1 and docker together" can be triggered with a perf probe on
do_execve with a large path. As PATH_MAX is larger than PERF_MAX_TRACE_SIZE
(2048 before the patch), the warning will trigger.
The fix was included in 5.16, so backporting to 5.15 and earlier LTS
kernels. Also included is a patch that better describes the attempted
allocation size.
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 059a49aa2e25c58f90b50151f109dd3c4cdb3a47
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024041414-humming-alarm-eb41@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
059a49aa2e25 ("virtio_net: Do not send RSS key if it is not supported")
fb6e30a72539 ("net: ethtool: pass a pointer to parameters to get/set_rxfh ethtool ops")
02cbfba1add5 ("idpf: add ethtool callbacks")
a5ab9ee0df0b ("idpf: add singleq start_xmit and napi poll")
3a8845af66ed ("idpf: add RX splitq napi poll support")
c2d548cad150 ("idpf: add TX splitq napi poll support")
6818c4d5b3c2 ("idpf: add splitq start_xmit")
d4d558718266 ("idpf: initialize interrupts and enable vport")
95af467d9a4e ("idpf: configure resources for RX queues")
1c325aac10a8 ("idpf: configure resources for TX queues")
ce1b75d0635c ("idpf: add ptypes and MAC filter support")
0fe45467a104 ("idpf: add create vport and netdev configuration")
4930fbf419a7 ("idpf: add core init and interrupt request")
8077c727561a ("idpf: add controlq init and reset checks")
e850efed5e15 ("idpf: add module register and probe functionality")
b9335a757232 ("net/mlx5e: Make flow classification filters static")
4cab498f33f7 ("hv_netvsc: Allocate rx indirection table size dynamically")
4f3ed1293feb ("ixgbe: Allow flow hash to be set via ethtool")
2b30f8291a30 ("net: ethtool: add support for MAC Merge layer")
8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 059a49aa2e25c58f90b50151f109dd3c4cdb3a47 Mon Sep 17 00:00:00 2001
From: Breno Leitao <leitao(a)debian.org>
Date: Wed, 3 Apr 2024 08:43:12 -0700
Subject: [PATCH] virtio_net: Do not send RSS key if it is not supported
There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.
Running the following command in any QEMU virtual machine with virtionet
will reproduce this problem:
# ethtool -X eth0 hfunc toeplitz
This is how the problem happens:
1) ethtool_set_rxfh() calls virtnet_set_rxfh()
2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather
4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;
5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):
if (!sz) {
virtio_error(vdev, "virtio: zero sized buffers are not allowed");
6) virtio_error() (also QEMU function) set the device as broken
vdev->broken = true;
7) Qemu bails out, and do not repond this crazy kernel.
8) The kernel is waiting for the response to come back (function
virtnet_send_command())
9) The kernel is waiting doing the following :
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
!virtqueue_is_broken(vi->cvq))
cpu_relax();
10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.
Fix it by not sending RSS commands if the feature is not available in
the device.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: stable(a)vger.kernel.org
Cc: qemu-devel(a)nongnu.org
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Heng Qi <hengqi(a)linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..115c3c5414f2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3807,6 +3807,7 @@ static int virtnet_set_rxfh(struct net_device *dev,
struct netlink_ext_ack *extack)
{
struct virtnet_info *vi = netdev_priv(dev);
+ bool update = false;
int i;
if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
@@ -3814,13 +3815,28 @@ static int virtnet_set_rxfh(struct net_device *dev,
return -EOPNOTSUPP;
if (rxfh->indir) {
+ if (!vi->has_rss)
+ return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = rxfh->indir[i];
+ update = true;
}
- if (rxfh->key)
- memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
- virtnet_commit_rss_command(vi);
+ if (rxfh->key) {
+ /* If either _F_HASH_REPORT or _F_RSS are negotiated, the
+ * device provides hash calculation capabilities, that is,
+ * hash_key is configured.
+ */
+ if (!vi->has_rss && !vi->has_rss_hash_report)
+ return -EOPNOTSUPP;
+
+ memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
+ update = true;
+ }
+
+ if (update)
+ virtnet_commit_rss_command(vi);
return 0;
}
@@ -4729,13 +4745,15 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
- if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+ }
+
+ if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, rss_max_key_size));
From: Shifeng Li <lishifeng(a)sangfor.com.cn>
[ Upstream commit 8f5100da56b3980276234e812ce98d8f075194cd ]
Fix a cmd->ent use after free due to a race on command entry.
Such race occurs when one of the commands releases its last refcount and
frees its index and entry while another process running command flush
flow takes refcount to this command entry. The process which handles
commands flush may see this command as needed to be flushed if the other
process allocated a ent->idx but didn't set ent to cmd->ent_arr in
cmd_work_handler(). Fix it by moving the assignment of cmd->ent_arr into
the spin lock.
[70013.081955] BUG: KASAN: use-after-free in mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.081967] Write of size 4 at addr ffff88880b1510b4 by task kworker/26:1/1433361
[70013.081968]
[70013.082028] Workqueue: events aer_isr
[70013.082053] Call Trace:
[70013.082067] dump_stack+0x8b/0xbb
[70013.082086] print_address_description+0x6a/0x270
[70013.082102] kasan_report+0x179/0x2c0
[70013.082173] mlx5_cmd_trigger_completions+0x1e2/0x4c0 [mlx5_core]
[70013.082267] mlx5_cmd_flush+0x80/0x180 [mlx5_core]
[70013.082304] mlx5_enter_error_state+0x106/0x1d0 [mlx5_core]
[70013.082338] mlx5_try_fast_unload+0x2ea/0x4d0 [mlx5_core]
[70013.082377] remove_one+0x200/0x2b0 [mlx5_core]
[70013.082409] pci_device_remove+0xf3/0x280
[70013.082439] device_release_driver_internal+0x1c3/0x470
[70013.082453] pci_stop_bus_device+0x109/0x160
[70013.082468] pci_stop_and_remove_bus_device+0xe/0x20
[70013.082485] pcie_do_fatal_recovery+0x167/0x550
[70013.082493] aer_isr+0x7d2/0x960
[70013.082543] process_one_work+0x65f/0x12d0
[70013.082556] worker_thread+0x87/0xb50
[70013.082571] kthread+0x2e9/0x3a0
[70013.082592] ret_from_fork+0x1f/0x40
The logical relationship of this error is as follows:
aer_recover_work | ent->work
-------------------------------------------+------------------------------
aer_recover_work_func |
|- pcie_do_recovery |
|- report_error_detected |
|- mlx5_pci_err_detected |cmd_work_handler
|- mlx5_enter_error_state | |- cmd_alloc_index
|- enter_error_state | |- lock cmd->alloc_lock
|- mlx5_cmd_flush | |- clear_bit
|- mlx5_cmd_trigger_completions| |- unlock cmd->alloc_lock
|- lock cmd->alloc_lock |
|- vector = ~dev->cmd.vars.bitmask
|- for_each_set_bit |
|- cmd_ent_get(cmd->ent_arr[i]) (UAF)
|- unlock cmd->alloc_lock | |- cmd->ent_arr[ent->idx]=ent
The cmd->ent_arr[ent->idx] assignment and the bit clearing are not
protected by the cmd->alloc_lock in cmd_work_handler().
Fixes: 50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
Reviewed-by: Moshe Shemesh <moshe(a)nvidia.com>
Signed-off-by: Shifeng Li <lishifeng(a)sangfor.com.cn>
Signed-off-by: Saeed Mahameed <saeedm(a)nvidia.com>
[Samasth: backport for 5.4.y]
Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda(a)oracle.com>
Conflicts:
drivers/net/ethernet/mellanox/mlx5/core/cmd.c
conflict caused due to the absence of
commit 58db72869a9f ("net/mlx5: Re-organize mlx5_cmd struct")
which is structural change of code and is not necessary for this
patch.
---
commit:
50b2412b7e78 ("net/mlx5: Avoid possible free of command entry while timeout comp handler")
is present from linux-5.4.y onwards but the current commit which fixes
it is only present from linux-6.1.y. Would be nice to get an opinion
from the author or maintainer.
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 93a6597366f5..ebf5b30b2100 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -114,15 +114,18 @@ static u8 alloc_token(struct mlx5_cmd *cmd)
return token;
}
-static int cmd_alloc_index(struct mlx5_cmd *cmd)
+static int cmd_alloc_index(struct mlx5_cmd *cmd, struct mlx5_cmd_work_ent *ent)
{
unsigned long flags;
int ret;
spin_lock_irqsave(&cmd->alloc_lock, flags);
ret = find_first_bit(&cmd->bitmask, cmd->max_reg_cmds);
- if (ret < cmd->max_reg_cmds)
+ if (ret < cmd->max_reg_cmds) {
clear_bit(ret, &cmd->bitmask);
+ ent->idx = ret;
+ cmd->ent_arr[ent->idx] = ent;
+ }
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
return ret < cmd->max_reg_cmds ? ret : -ENOMEM;
@@ -905,7 +908,7 @@ static void cmd_work_handler(struct work_struct *work)
sem = ent->page_queue ? &cmd->pages_sem : &cmd->sem;
down(sem);
if (!ent->page_queue) {
- alloc_ret = cmd_alloc_index(cmd);
+ alloc_ret = cmd_alloc_index(cmd, ent);
if (alloc_ret < 0) {
mlx5_core_err(dev, "failed to allocate command entry\n");
if (ent->callback) {
@@ -920,15 +923,14 @@ static void cmd_work_handler(struct work_struct *work)
up(sem);
return;
}
- ent->idx = alloc_ret;
} else {
ent->idx = cmd->max_reg_cmds;
spin_lock_irqsave(&cmd->alloc_lock, flags);
clear_bit(ent->idx, &cmd->bitmask);
+ cmd->ent_arr[ent->idx] = ent;
spin_unlock_irqrestore(&cmd->alloc_lock, flags);
}
- cmd->ent_arr[ent->idx] = ent;
lay = get_inst(cmd, ent->idx);
ent->lay = lay;
memset(lay, 0, sizeof(*lay));
--
2.43.0
Please backport the following fixes for selftest/seccomp.
I forgot to send the original patches to the stable list!
They resolve some edge cases in the testing.
commit 8e3c9f9f3a0742cd12b682a1766674253b33fcf0
selftests/seccomp: user_notification_addfd check nextfd is available
commit: 471dbc547612adeaa769e48498ef591c6c95a57a
selftests/seccomp: Change the syscall used in KILL_THREAD test
commit: ecaaa55c9fa5e8058445a8b891070b12208cdb6d
selftests/seccomp: Handle EINVAL on unshare(CLONE_NEWPID)
Please backport to 6.6 onwards.
Thanks,
Terry.
cc stable(a)vger.kernel.org
On Fri, 2024-04-26 at 10:50 +0200, Greg Kroah-Hartman wrote:
> Hi,
>
> This is the friendly email-bot of Greg Kroah-Hartman's inbox. I've
> detected that you have sent him a direct question that might be
> better
> sent to a public mailing list which is much faster in responding to
> questions than Greg normally is.
>
> Please try asking one of the following mailing lists for your
> questions:
>
> For udev and hotplug related questions, please ask on the
> linux-hotplug(a)vger.kernel.org mailing list
>
> For USB related questions, please ask on the
> linux-usb(a)vger.kernel.org
> mailing list
>
> For PCI related questions, please ask on the
> linux-pci(a)vger.kernel.org or linux-kernel(a)vger.kernel.org mailing
> lists
>
> For serial and tty related questions, please ask on the
> linux-serial(a)vger.kernel.org mailing list.
>
> For staging tree related questions, please ask on the
> linux-staging(a)lists.linux.dev mailing list.
>
> For general kernel related questions, please ask on the
> kernelnewbies(a)nl.linux.org or linux-kernel(a)vger.kernel.org mailing
> lists, depending on the type of question. More basic, beginner
> questions are better asked on the kernelnewbies list, after reading
> the wiki at www.kernelnewbies.org.
>
> For Linux stable and longterm kernel release questions or patches
> to
> be included in the stable or longterm kernel trees, please email
> stable(a)vger.kernel.org and Cc: the linux-kernel(a)vger.kernel.org
> mailing list so all of the stable kernel developers can be
> notified.
> Also please read Documentation/process/stable-kernel-rules.rst in
> the
> Linux kernel tree for the proper procedure to get patches accepted
> into the stable or longterm kernel releases.
>
> If you really want to ask Greg the question, please read the
> following
> two links as to why emailing a single person directly is usually not
> a
> good thing, and causes extra work on a single person:
> http://www.arm.linux.org.uk/news/?newsitem=11
> http://www.eyrie.org/~eagle/faqs/questions.html
>
> After reading those messages, and you still feel that you want to
> email
> Greg instead of posting on a mailing list, then resend your message
> within 24 hours and it will go through to him. But be forewarned,
> his
> email inbox currently looks like:
> 912 messages in /home/greg/mail/INBOX/
> so it might be a while before he gets to the message.
>
> Thank you for your understanding.
>
> The email triggering this response has been automatically discarded.
>
> thanks,
>
> greg k-h's email bot
Please consider the commits below for backporting to v5.15. These
patches are prerequisites for the backport of the x86 EFI stub
refactor that is needed for distros to sign v5.15 images for secure
boot in a way that complies with new MS requirements for memory
protections while running in the EFI firmware.
All patches either predate v6.1 or have been backported to it already.
The remaining ~50 changes will be posted as a patch series in due
time, as they will not apply cleanly to v5.15.
Please apply in the order that they appear below.
Thanks,
Ard.
44f155b4b07b8293472c9797d5b39839b91041ca
4da87c51705815fe1fbd41cc61640bb80da5bc54
7c4146e8885512719a50b641e9277a1712e052ff
176db622573f028f85221873ea4577e096785315
950d00558a920227b5703d1fcc4751cfe03853cd
ec1c66af3a30d45c2420da0974c01d3515dba26e
a9ee679b1f8c3803490ed2eeffb688aaee56583f
3ba75c1316390b2bc39c19cb8f0f85922ab3f9ed
82e0d6d76a2a74bd6a31141d555d53b4cc22c2a3
31f1a0edff78c43e8a3bd3692af0db1b25c21b17
9cf42bca30e98a1c6c9e8abf876940a551eaa3d1
cb8bda8ad4438b4bcfcf89697fc84803fb210017
e2ab9eab324cdf240de89741e4a1aa79919f0196
5c3a85f35b583259cf5ca0344cd79c8899ba1bb7
91592b5c0c2f076ff9d8cc0c14aa563448ac9fc4
73a6dec80e2acedaef3ca603d4b5799049f6e9f8
7f22ca396778fea9332d83ec2359dbe8396e9a06
4b52016247aeaa55ca3e3bc2e03cd91114c145c2
630f337f0c4fd80390e8600adcab31550aea33df
db14655ad7854b69a2efda348e30d02dbc19e8a1
bad267f9e18f8e9e628abd1811d2899b1735a4e1
62b71cd73d41ddac6b1760402bbe8c4932e23531
cc3fdda2876e58a7e83e558ab51853cf106afb6a
d2d7a54f69b67cd0a30e0ebb5307cb2de625baac
I'm announcing the release of the 6.1.89 kernel.
Only users of the 6.1 kernel series that had build problems with 6.1.88 need to upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
arch/arm/mach-omap2/pdata-quirks.c | 10 -----
sound/soc/ti/omap3pandora.c | 63 +++++++++++++++++++++++--------------
3 files changed, 41 insertions(+), 34 deletions(-)
Greg Kroah-Hartman (2):
Revert "ASoC: ti: Convert Pandora ASoC to GPIO descriptors"
Linux 6.1.89
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and `MADV_DONTNEED` semantics). Consider this race:
1. T1 performs page removal in sgx_encl_remove_pages() and stops right
after removing the page table entry and right before re-acquiring the
enclave lock to EREMOVE and xa_erase(&encl->page_array) the page.
2. T2 tries to access the page, and #PF[not_present] is raised. The
condition to EAUG in sgx_vma_fault() is not satisfied because the
page is still present in encl->page_array, thus the SGX driver
assumes that the fault happened because the page was swapped out. The
driver continues on a code path that installs a page table entry
*without* performing EAUG.
3. The enclave page metadata is in inconsistent state: the PTE is
installed but there was no EAUG. Thus, T2 in userspace infinitely
receives SIGSEGV on this page (and EACCEPT always fails).
Fix this by making sure that T1 (the page-removing thread) always wins
this data race. In particular, the page-being-removed is marked as such,
and T2 retries until the page is fully removed.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 3 ++-
arch/x86/kernel/cpu/sgx/encl.h | 3 +++
arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+ SGX_ENCL_PAGE_BEING_REMOVED))
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@
/* 'desc' bit marking that the page is being reclaimed. */
#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED BIT(2)
+
struct sgx_encl_page {
unsigned long desc;
unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c542d4dd3e64 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
*/
+ entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.34.1
Two enclave threads may try to access the same non-present enclave page
simultaneously (e.g., if the SGX runtime supports lazy allocation). The
threads will end up in sgx_encl_eaug_page(), racing to acquire the
enclave lock. The winning thread will perform EAUG, set up the page
table entry, and insert the page into encl->page_array. The losing
thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed
to error handling path.
This error handling path contains two bugs: (1) SIGBUS is sent to
userspace even though the enclave page is correctly installed by another
thread, and (2) sgx_encl_free_epc_page() is called that performs EREMOVE
even though the enclave page was never intended to be removed. The first
bug is less severe because it impacts only the user space; the second
bug is more severe because it also impacts the OS state by ripping the
page (added by the winning thread) from the enclave.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux
fault handler so that no signal is sent to userspace, and (2) by
replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no
EREMOVE is performed.
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.34.1