From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
One alternative to the fix Christian proposed in
https://lore.kernel.org/dri-devel/20241024124159.4519-3-christian.koenig@am…
is to replace the rather complex open coded sorting loops with the kernel
standard sort followed by a context squashing pass.
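The idea can be sketched in user-space C as follows. This is only an illustration under stated assumptions: struct fence, its seqno field and sort_and_squash() are hypothetical stand-ins, qsort() replaces the kernel's sort(), and reference counting and seqno wraparound handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct dma_fence: only what the sketch needs. */
struct fence {
	uint64_t context;	/* timeline the fence belongs to */
	uint64_t seqno;		/* position on that timeline, larger is later */
};

/* Order by context first, then by seqno so the latest fence comes last. */
static int fence_cmp(const void *pa, const void *pb)
{
	const struct fence *a = *(const struct fence *const *)pa;
	const struct fence *b = *(const struct fence *const *)pb;

	if (a->context != b->context)
		return a->context < b->context ? -1 : 1;
	if (a->seqno != b->seqno)
		return a->seqno < b->seqno ? -1 : 1;
	return 0;
}

/*
 * Sort the array, then keep only the last (latest) entry of each run of
 * equal contexts. Returns the new element count.
 */
static size_t sort_and_squash(struct fence **array, size_t count)
{
	size_t i, j = 0;

	assert(array != NULL || count == 0);

	if (count < 2)
		return count;

	qsort(array, count, sizeof(*array), fence_cmp);

	for (i = 0; i < count; i++) {
		/* Entry i survives if it is the final one of its context. */
		if (i + 1 == count ||
		    array[i + 1]->context != array[i]->context)
			array[j++] = array[i];
	}
	return j;
}
```

The squashing pass is a single linear walk, so the sort dominates the cost. For two or three inputs that is a handful of comparisons.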
The proposed advantage of this would be readability, but one concern
Christian raised was that there could be many fences, that they are
typically mostly sorted, and so the kernel's heap sort would be much worse
than the proposed algorithm.
I had a look running some games and vkcube to see what the typical number
of input fences is. Tested scenarios:
1) Hogwarts Legacy under Gamescope
450 calls per second to __dma_fence_unwrap_merge.
Percentages per bucket of fence counts, before and after checking for
signalled status, sorting and flattening:
   N       Before      After
   0        0.91%
   1       69.40%
  2-3      28.72%      9.4%  (90.6% resolved to one fence)
  4-5       0.93%
  6-9       0.03%
  10+
2) Cyberpunk 2077 under Gamescope
1050 calls per second, amounting to 0.01% CPU time according to perf top.
   N       Before      After
   0        1.13%
   1       52.30%
  2-3      40.34%      55.57%
  4-5       1.46%       0.50%
  6-9       2.44%
  10+       2.34%
3) vkcube under Plasma
90 calls per second.
   N       Before      After
   0
   1
  2-3        100%      0%  (i.e. all resolved to a single fence)
  4-5
  6-9
  10+
In the case of vkcube all invocations in the 2-3 bucket were actually
just two input fences.
From these numbers it looks like the heap sort should not be a
disadvantage, given how the dominant case is <= 2 input fences which heap
sort solves with just one compare and swap. (And for the case of one input
fence we have a fast path in the previous patch.)
A complementary possibility is to implement a different sorting algorithm
under the same API as the kernel's sort() and so keep the simplicity,
potentially moving the new sort under lib/ if it would be found more
widely useful.
v2:
* Hold on to fence references and reduce commentary. (Christian)
* Record and use latest signaled timestamp in the 2nd loop too.
* Consolidate zero or one fences fast paths.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: 245a4a7b531c ("dma-buf: generalize dma_fence unwrap & merging v3")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3617
Cc: Christian König <christian.koenig(a)amd.com>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Gustavo Padovan <gustavo(a)padovan.org>
Cc: Friedrich Vock <friedrich.vock(a)gmx.de>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: <stable(a)vger.kernel.org> # v6.0+
---
drivers/dma-buf/dma-fence-unwrap.c | 129 ++++++++++++++---------------
1 file changed, 64 insertions(+), 65 deletions(-)
diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index 628af51c81af..26cad03340ce 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -12,6 +12,7 @@
#include <linux/dma-fence-chain.h>
#include <linux/dma-fence-unwrap.h>
#include <linux/slab.h>
+#include <linux/sort.h>
/* Internal helper to start new array iteration, don't use directly */
static struct dma_fence *
@@ -59,6 +60,25 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor)
}
EXPORT_SYMBOL_GPL(dma_fence_unwrap_next);
+
+static int fence_cmp(const void *_a, const void *_b)
+{
+ struct dma_fence *a = *(struct dma_fence **)_a;
+ struct dma_fence *b = *(struct dma_fence **)_b;
+
+ if (a->context < b->context)
+ return -1;
+ else if (a->context > b->context)
+ return 1;
+
+ if (dma_fence_is_later(b, a))
+ return -1;
+ else if (dma_fence_is_later(a, b))
+ return 1;
+
+ return 0;
+}
+
/* Implementation for the dma_fence_merge() marco, don't use directly */
struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence **fences,
@@ -67,8 +87,7 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
struct dma_fence_array *result;
struct dma_fence *tmp, **array;
ktime_t timestamp;
- unsigned int i;
- size_t count;
+ int i, j, count;
count = 0;
timestamp = ns_to_ktime(0);
@@ -96,78 +115,58 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
if (!array)
return NULL;
- /*
- * This trashes the input fence array and uses it as position for the
- * following merge loop. This works because the dma_fence_merge()
- * wrapper macro is creating this temporary array on the stack together
- * with the iterators.
- */
- for (i = 0; i < num_fences; ++i)
- fences[i] = dma_fence_unwrap_first(fences[i], &iter[i]);
-
count = 0;
- do {
- unsigned int sel;
-
-restart:
- tmp = NULL;
- for (i = 0; i < num_fences; ++i) {
- struct dma_fence *next;
-
- while (fences[i] && dma_fence_is_signaled(fences[i]))
- fences[i] = dma_fence_unwrap_next(&iter[i]);
-
- next = fences[i];
- if (!next)
- continue;
-
- /*
- * We can't guarantee that inpute fences are ordered by
- * context, but it is still quite likely when this
- * function is used multiple times. So attempt to order
- * the fences by context as we pass over them and merge
- * fences with the same context.
- */
- if (!tmp || tmp->context > next->context) {
- tmp = next;
- sel = i;
-
- } else if (tmp->context < next->context) {
- continue;
-
- } else if (dma_fence_is_later(tmp, next)) {
- fences[i] = dma_fence_unwrap_next(&iter[i]);
- goto restart;
+ for (i = 0; i < num_fences; ++i) {
+ dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
+ if (!dma_fence_is_signaled(tmp)) {
+ array[count++] = dma_fence_get(tmp);
} else {
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
- goto restart;
+ ktime_t t = dma_fence_timestamp(tmp);
+
+ if (ktime_after(t, timestamp))
+ timestamp = t;
}
}
+ }
- if (tmp) {
- array[count++] = dma_fence_get(tmp);
- fences[sel] = dma_fence_unwrap_next(&iter[sel]);
+ if (count == 0 || count == 1)
+ goto return_fastpath;
+
+ sort(array, count, sizeof(*array), fence_cmp, NULL);
+
+ /*
+ * Only keep the most recent fence for each context.
+ */
+ j = 0;
+ tmp = array[0];
+ for (i = 1; i < count; i++) {
+ if (array[i]->context != tmp->context)
+ array[j++] = tmp;
+ else
+ dma_fence_put(tmp);
+ tmp = array[i];
+ }
+ if (j == 0 || tmp->context != array[j - 1]->context) {
+ array[j++] = tmp;
+ }
+ count = j;
+
+ if (count > 1) {
+ result = dma_fence_array_create(count, array,
+ dma_fence_context_alloc(1),
+ 1, false);
+ if (!result) {
+ tmp = NULL;
+ goto return_tmp;
}
- } while (tmp);
-
- if (count == 0) {
- tmp = dma_fence_allocate_private_stub(ktime_get());
- goto return_tmp;
+ return &result->base;
}
- if (count == 1) {
+return_fastpath:
+ if (count == 0)
+ tmp = dma_fence_allocate_private_stub(timestamp);
+ else
tmp = array[0];
- goto return_tmp;
- }
-
- result = dma_fence_array_create(count, array,
- dma_fence_context_alloc(1),
- 1, false);
- if (!result) {
- tmp = NULL;
- goto return_tmp;
- }
- return &result->base;
return_tmp:
kfree(array);
--
2.46.0
When I reworked delayed ref comparison in cf4f04325b2b ("btrfs: move
->parent and ->ref_root into btrfs_delayed_ref_node"), I made a mistake
and returned -1 for the case where ref1->ref_root was greater than
ref2->ref_root. This is a subtle bug that can result in an improper
delayed ref running order, which in turn can cause transaction aborts.
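As a minimal illustration of the property the comparison has to satisfy (cmp_u64() and cmp_is_antisymmetric() are hypothetical helpers, not btrfs code): a three-way comparator must return a negative value for "less", a positive value for "greater", and swapping the arguments must flip the sign.

```c
#include <assert.h>
#include <stdint.h>

/* Three-way comparison of two unsigned 64-bit keys: -1, 0 or 1. */
static int cmp_u64(uint64_t a, uint64_t b)
{
	if (a < b)
		return -1;
	if (a > b)
		return 1;	/* returning -1 here would break the ordering */
	return 0;
}

/* Antisymmetry check: swapping the arguments must flip the sign. */
static int cmp_is_antisymmetric(uint64_t a, uint64_t b)
{
	return cmp_u64(a, b) == -cmp_u64(b, a);
}
```

A comparator that returns -1 in both unequal branches fails this check for every pair of distinct keys, so any ordered structure built on it can be walked or searched in the wrong direction.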
cc: stable(a)vger.kernel.org
Fixes: cf4f04325b2b ("btrfs: move ->parent and ->ref_root into btrfs_delayed_ref_node")
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/delayed-ref.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index 4d2ad5b66928..0d878dbbabba 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -299,7 +299,7 @@ static int comp_refs(struct btrfs_delayed_ref_node *ref1,
if (ref1->ref_root < ref2->ref_root)
return -1;
if (ref1->ref_root > ref2->ref_root)
- return -1;
+ return 1;
if (ref1->type == BTRFS_EXTENT_DATA_REF_KEY)
ret = comp_data_refs(ref1, ref2);
}
--
2.43.0
From: "Darrick J. Wong" <djwong(a)kernel.org>
In commit ca6448aed4f10a, we created an "end_daddr" variable to fix
fsmap reporting when the end of the range requested falls in the middle
of an unknown (aka free on the rmapbt) region. Unfortunately, I didn't
notice that the code sets end_daddr to the last sector of the device
but then uses that quantity to compute the length of the synthesized
mapping.
Zizhi Wo later observed that when end_daddr isn't set, we still don't
report the last fsblock on a device because in that case (aka when
info->last is true), the info->high mapping that we pass to
xfs_getfsmap_group_helper has a startblock that points to the last
fsblock. This is also wrong because the code uses startblock to
compute the length of the synthesized mapping.
Fix the second problem by setting end_daddr unconditionally, and fix the
first problem by setting start_daddr to one past the end of the range to
query.
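The off-by-one can be shown with a small user-space sketch. gap_len() is a hypothetical helper and plain int64_t stands in for xfs_daddr_t. It assumes, as described above, that end_daddr is the last sector of the range (inclusive) and that the length of the synthesized mapping is measured from next_daddr up to the start of whatever follows.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Length of the gap [next_daddr, end_daddr], where end_daddr is the last
 * sector of the range (inclusive). The exclusive end is end_daddr + 1.
 */
static int64_t gap_len(int64_t next_daddr, int64_t end_daddr)
{
	int64_t start_of_next = end_daddr + 1;	/* one past the end */

	assert(next_daddr >= 0 && end_daddr >= 0);

	if (start_of_next <= next_daddr)
		return 0;			/* no gap to report */
	return start_of_next - next_daddr;
}
```

With next_daddr = 0 and end_daddr = 7 the gap covers eight sectors. Using end_daddr itself as the start of the following record would report seven and drop the last one.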
Cc: <stable(a)vger.kernel.org> # v6.11
Fixes: ca6448aed4f10a ("xfs: Fix missing interval for missing_owner in xfs fsmap")
Signed-off-by: Darrick J. Wong <djwong(a)kernel.org>
Reported-by: Zizhi Wo <wozizhi(a)huawei.com>
---
fs/xfs/xfs_fsmap.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c
index 8d5d4d172d15..59b7a8e50414 100644
--- a/fs/xfs/xfs_fsmap.c
+++ b/fs/xfs/xfs_fsmap.c
@@ -165,7 +165,8 @@ struct xfs_getfsmap_info {
xfs_daddr_t next_daddr; /* next daddr we expect */
/* daddr of low fsmap key when we're using the rtbitmap */
xfs_daddr_t low_daddr;
- xfs_daddr_t end_daddr; /* daddr of high fsmap key */
+ /* daddr of high fsmap key, or the last daddr on the device */
+ xfs_daddr_t end_daddr;
u64 missing_owner; /* owner of holes */
u32 dev; /* device id */
/*
@@ -388,8 +389,8 @@ xfs_getfsmap_group_helper(
* we calculated from userspace's high key to synthesize the record.
* Note that if the btree query found a mapping, there won't be a gap.
*/
- if (info->last && info->end_daddr != XFS_BUF_DADDR_NULL)
- frec->start_daddr = info->end_daddr;
+ if (info->last)
+ frec->start_daddr = info->end_daddr + 1;
else
frec->start_daddr = xfs_gbno_to_daddr(xg, startblock);
@@ -737,8 +738,8 @@ xfs_getfsmap_rtdev_rtbitmap_helper(
* we calculated from userspace's high key to synthesize the record.
* Note that if the btree query found a mapping, there won't be a gap.
*/
- if (info->last && info->end_daddr != XFS_BUF_DADDR_NULL) {
- frec.start_daddr = info->end_daddr;
+ if (info->last) {
+ frec.start_daddr = info->end_daddr + 1;
} else {
frec.start_daddr = xfs_rtb_to_daddr(mp, start_rtb);
}
@@ -1108,7 +1109,10 @@ xfs_getfsmap(
struct xfs_trans *tp = NULL;
struct xfs_fsmap dkeys[2]; /* per-dev keys */
struct xfs_getfsmap_dev handlers[XFS_GETFSMAP_DEVS];
- struct xfs_getfsmap_info info = { NULL };
+ struct xfs_getfsmap_info info = {
+ .fsmap_recs = fsmap_recs,
+ .head = head,
+ };
bool use_rmap;
int i;
int error = 0;
@@ -1185,9 +1189,6 @@ xfs_getfsmap(
info.next_daddr = head->fmh_keys[0].fmr_physical +
head->fmh_keys[0].fmr_length;
- info.end_daddr = XFS_BUF_DADDR_NULL;
- info.fsmap_recs = fsmap_recs;
- info.head = head;
/* For each device we support... */
for (i = 0; i < XFS_GETFSMAP_DEVS; i++) {
@@ -1200,17 +1201,23 @@ xfs_getfsmap(
break;
/*
- * If this device number matches the high key, we have
- * to pass the high key to the handler to limit the
- * query results. If the device number exceeds the
- * low key, zero out the low key so that we get
- * everything from the beginning.
+ * If this device number matches the high key, we have to pass
+ * the high key to the handler to limit the query results, and
+ * set the end_daddr so that we can synthesize records at the
+ * end of the query range or device.
*/
if (handlers[i].dev == head->fmh_keys[1].fmr_device) {
dkeys[1] = head->fmh_keys[1];
info.end_daddr = min(handlers[i].nr_sectors - 1,
dkeys[1].fmr_physical);
+ } else {
+ info.end_daddr = handlers[i].nr_sectors - 1;
}
+
+ /*
+ * If the device number exceeds the low key, zero out the low
+ * key so that we get everything from the beginning.
+ */
if (handlers[i].dev > head->fmh_keys[0].fmr_device)
memset(&dkeys[0], 0, sizeof(struct xfs_fsmap));
--
2.45.2
I'm announcing the release of the 5.15.172 kernel.
All users of the 5.15 kernel series must upgrade.
The updated 5.15.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.15.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/boot/dts/rk3036-kylin.dts | 4
arch/arm/boot/dts/rk3036.dtsi | 14 -
arch/arm64/boot/dts/freescale/imx8mp.dtsi | 6
arch/arm64/boot/dts/rockchip/rk3308-roc-cc.dts | 4
arch/arm64/boot/dts/rockchip/rk3328.dtsi | 3
arch/arm64/boot/dts/rockchip/rk3368-lion.dtsi | 1
arch/arm64/boot/dts/rockchip/rk3399-rock960.dtsi | 2
arch/arm64/boot/dts/rockchip/rk3399-sapphire-excavator.dts | 2
drivers/acpi/prmt.c | 2
drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 4
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 8
drivers/hid/hid-core.c | 2
drivers/irqchip/irq-gic-v3.c | 7
drivers/md/dm-cache-target.c | 35 +-
drivers/md/dm-unstripe.c | 4
drivers/media/cec/usb/pulse8/pulse8-cec.c | 2
drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 3
drivers/media/dvb-core/dvb_frontend.c | 4
drivers/media/dvb-core/dvbdev.c | 17 +
drivers/media/dvb-frontends/cx24116.c | 7
drivers/media/dvb-frontends/stb0899_algo.c | 2
drivers/media/i2c/adv7604.c | 26 +
drivers/media/platform/s5p-jpeg/jpeg-core.c | 17 -
drivers/media/usb/uvc/uvc_driver.c | 2
drivers/media/v4l2-core/v4l2-ctrls-api.c | 17 -
drivers/net/can/c_can/c_can_main.c | 7
drivers/net/ethernet/arc/emac_main.c | 27 +-
drivers/net/ethernet/freescale/enetc/enetc_vf.c | 9
drivers/net/ethernet/hisilicon/hns3/hnae3.c | 5
drivers/net/ethernet/intel/i40e/i40e.h | 1
drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 1
drivers/net/ethernet/intel/i40e/i40e_main.c | 12
drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c | 2
drivers/net/ethernet/intel/ice/ice_fdir.h | 3
drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c | 16 +
drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.h | 1
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1
drivers/net/phy/dp83848.c | 2
drivers/pwm/pwm-imx-tpm.c | 4
drivers/scsi/sd_zbc.c | 3
drivers/thermal/qcom/lmh.c | 7
drivers/usb/dwc3/core.c | 25 -
drivers/usb/musb/sunxi.c | 2
drivers/usb/serial/io_edgeport.c | 8
drivers/usb/serial/option.c | 6
drivers/usb/serial/qcserial.c | 2
drivers/usb/typec/ucsi/ucsi_ccg.c | 2
fs/btrfs/delayed-ref.c | 2
fs/nfs/inode.c | 126 ++++++++-
fs/nfs/nfstrace.h | 1
fs/nfs/super.c | 10
fs/ocfs2/xattr.c | 3
fs/proc/vmcore.c | 9
include/linux/fs.h | 36 ++
include/linux/nfs_fs.h | 47 +++
include/linux/tick.h | 8
io_uring/io_uring.c | 50 ++-
kernel/fork.c | 2
kernel/ucount.c | 3
net/bridge/br_device.c | 5
net/core/dst.c | 17 -
net/sctp/sm_statefuns.c | 2
net/vmw_vsock/hyperv_transport.c | 1
net/vmw_vsock/virtio_transport_common.c | 1
security/keys/keyring.c | 7
sound/firewire/tascam/amdtp-tascam.c | 2
sound/pci/hda/patch_conexant.c | 2
sound/soc/stm/stm32_spdifrx.c | 2
sound/usb/mixer.c | 1
sound/usb/mixer_quirks.c | 170 +++++++++++++
sound/usb/quirks.c | 2
72 files changed, 673 insertions(+), 179 deletions(-)
Ahmed Zaki (1):
ice: Add a per-VF limit on number of FDIR filters
Aleksandr Loktionov (1):
i40e: fix race condition by adding filter's intermediate sync state
Alex Deucher (2):
drm/amdgpu: Adjust debugfs eviction and IB access permissions
drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read()
Amelie Delaunay (1):
ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove
Amir Goldstein (3):
io_uring: rename kiocb_end_write() local helper
fs: create kiocb_{start,end}_write() helpers
io_uring: use kiocb_{start,end}_write() helpers
Andrei Vagin (1):
ucounts: fix counter leak in inc_rlimit_get_ucounts()
Andrew Kanner (1):
ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()
Antonio Quartulli (1):
drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported
Benjamin Coddington (1):
NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
Benjamin Segall (1):
posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
Benoit Sevens (1):
media: uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in uvc_parse_format
Benoît Monin (1):
USB: serial: option: add Quectel RG650V
Chen Ridong (1):
security/keys: fix slab-out-of-bounds in key_task_permission
Dan Carpenter (3):
usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd()
USB: serial: io_edgeport: fix use after free in debug printk
ACPI: PRM: Clean up guid type in struct prm_handler_info
Dario Binacchi (1):
can: c_can: fix {rx,tx}_errors statistics
Diederik de Haas (1):
arm64: dts: rockchip: Remove hdmi's 2nd interrupt on rk3328
Diogo Silva (1):
net: phy: ti: add PHY_RST_AFTER_CLK_EN flag
Dmitry Baryshkov (1):
thermal/drivers/qcom/lmh: Remove false lockdep backtrace
Eric Dumazet (1):
net: do not delay dst_entries_add() in dst_release()
Erik Schumacher (1):
pwm: imx-tpm: Use correct MODULO value for EPWM mode
Filipe Manana (1):
btrfs: reinitialize delayed ref list after deleting it from the list
Geert Uytterhoeven (1):
arm64: dts: rockchip: Fix rt5651 compatible value on rk3399-sapphire-excavator
Greg Kroah-Hartman (1):
Linux 5.15.172
Heiko Stuebner (7):
arm64: dts: rockchip: Fix bluetooth properties on Rock960 boards
arm64: dts: rockchip: Remove #cooling-cells from fan on Theobroma lion
arm64: dts: rockchip: Fix LED triggers on rk3308-roc-cc
ARM: dts: rockchip: fix rk3036 acodec node
ARM: dts: rockchip: drop grf reference from rk3036 hdmi
ARM: dts: rockchip: Fix the spi controller on rk3036
ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin
Hyunwoo Kim (2):
hv_sock: Initializing vsk->trans to NULL to prevent a dangling pointer
vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans
Jack Wu (1):
USB: serial: qcserial: add support for Sierra Wireless EM86xx
Jan Schär (3):
ALSA: usb-audio: Support jack detection on Dell dock
ALSA: usb-audio: Add quirks for Dell WD19 dock
ALSA: usb-audio: Add endianness annotations
Jarosław Janik (1):
Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown"
Jens Axboe (1):
io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
Jiri Kosina (1):
HID: core: zero-initialize the report buffer
Johan Jonker (1):
net: arc: fix the device for dma_map_single/dma_unmap_single
Johannes Thumshirn (1):
scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer
Marc Zyngier (1):
irqchip/gic-v3: Force propagation of the active state with a read-back
Mauro Carvalho Chehab (9):
media: stb0899_algo: initialize cfr before using it
media: dvbdev: prevent the risk of out of memory access
media: dvb_frontend: don't play tricks with underflow values
media: adv7604: prevent underflow condition when reporting colorspace
media: s5p-jpeg: prevent buffer overflows
media: cx24116: prevent overflows on SNR calculus
media: pulse8-cec: fix data timestamp at pulse8_setup()
media: v4l2-tpg: prevent the risk of a division by zero
media: v4l2-ctrls-api: fix error handling for v4l2_g_ctrl()
Mike Snitzer (1):
nfs: avoid i_lock contention in nfs_clear_invalid_mapping
Ming-Hung Tsai (4):
dm cache: correct the number of origin blocks to match the target length
dm cache: fix out-of-bounds access to the dirty bitset when resizing
dm cache: optimize dirty bit checking with find_next_bit when resizing
dm cache: fix potential out-of-bounds access on the first resume
Murad Masimov (1):
ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init()
NeilBrown (2):
NFSv3: only use NFS timeout for MOUNT when protocols are compatible
NFSv3: handle out-of-order write replies.
Nikolay Aleksandrov (1):
net: bridge: xmit: make sure we have at least eth header len bytes
Nícolas F. R. A. Prado (1):
net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case
Peiyang Wang (1):
net: hns3: fix kernel crash when uninstalling driver
Peng Fan (1):
arm64: dts: imx8mp: correct sdhc ipg clk
Qi Xi (1):
fs/proc: fix compile warning about variable 'vmcore_mmap_ops'
Reinhard Speyerer (1):
USB: serial: option: add Fibocom FG132 0x0112 composition
Roberto Sassu (1):
nfs: Fix KMSAN warning in decode_getfattr_attrs()
Roger Quadros (1):
usb: dwc3: fix fault at system suspend if device was already runtime suspended
Takashi Iwai (1):
ALSA: usb-audio: Add quirk for HP 320 FHD Webcam
Wei Fang (1):
net: enetc: set MAC address to the VF net_device
Xin Long (1):
sctp: properly validate chunk size in sctp_sf_ootb()
Zichen Xie (1):
dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow
Zijun Hu (1):
usb: musb: sunxi: Fix accessing an released usb phy
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111112-follicle-scapegoat-c6bf@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0fb4a7ad270b3b209e510eb9dc5b07bf02b7edaf Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Date: Tue, 29 Oct 2024 18:11:46 +0000
Subject: [PATCH] mm: refactor map_deny_write_exec()
Refactor the map_deny_write_exec() to not unnecessarily require a VMA
parameter but rather to accept VMA flags parameters, which allows us to
use this function early in mmap_region() in a subsequent commit.
While we're here, we refactor the function to be more readable and add
some additional documentation.
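The decision logic of the refactored function can be modelled in user space roughly as follows. The SK_VM_* values and deny_write_exec() are hypothetical names for this sketch, and the MDWE state is passed in as a plain boolean rather than read from the mm flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for the sketch; the kernel defines its own. */
#define SK_VM_WRITE	0x2UL
#define SK_VM_EXEC	0x4UL

/* Returns true if the proposed flag change should be denied. */
static bool deny_write_exec(bool mdwe_enabled, unsigned long old_flags,
			    unsigned long new_flags)
{
	/* If MDWE is disabled, there is nothing to deny. */
	if (!mdwe_enabled)
		return false;

	/* A mapping that is not becoming executable is always fine. */
	if (!(new_flags & SK_VM_EXEC))
		return false;

	/* Deny mappings that would be writable and executable... */
	if (new_flags & SK_VM_WRITE)
		return true;

	/* ...and non-executable mappings that would become executable. */
	if (!(old_flags & SK_VM_EXEC))
		return true;

	return false;
}
```

Because the function only looks at two flag words, a caller can evaluate it before any VMA object exists, which is the property the commit message relies on.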
Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246…
Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Reviewed-by: Jann Horn <jannh(a)google.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Helge Deller <deller(a)gmx.de>
Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Mark Brown <broonie(a)kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mman.h b/include/linux/mman.h
index bcb201ab7a41..8ddca62d6460 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -188,16 +188,31 @@ static inline bool arch_memory_deny_write_exec_supported(void)
*
* d) mmap(PROT_READ | PROT_EXEC)
* mmap(PROT_READ | PROT_EXEC | PROT_BTI)
+ *
+ * This is only applicable if the user has set the Memory-Deny-Write-Execute
+ * (MDWE) protection mask for the current process.
+ *
+ * @old specifies the VMA flags the VMA originally possessed, and @new the ones
+ * we propose to set.
+ *
+ * Return: false if proposed change is OK, true if not ok and should be denied.
*/
-static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags)
+static inline bool map_deny_write_exec(unsigned long old, unsigned long new)
{
+ /* If MDWE is disabled, we have nothing to deny. */
if (!test_bit(MMF_HAS_MDWE, &current->mm->flags))
return false;
- if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE))
+ /* If the new VMA is not executable, we have nothing to deny. */
+ if (!(new & VM_EXEC))
+ return false;
+
+ /* Under MDWE we do not accept newly writably executable VMAs... */
+ if (new & VM_WRITE)
return true;
- if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC))
+ /* ...nor previously non-executable VMAs becoming executable. */
+ if (!(old & VM_EXEC))
return true;
return false;
diff --git a/mm/mmap.c b/mm/mmap.c
index ac0604f146f6..ab71d4c3464c 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
vma_set_anonymous(vma);
}
- if (map_deny_write_exec(vma, vma->vm_flags)) {
+ if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) {
error = -EACCES;
goto close_and_free_vma;
}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 0c5d6d06107d..6f450af3252e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
break;
}
- if (map_deny_write_exec(vma, newflags)) {
+ if (map_deny_write_exec(vma->vm_flags, newflags)) {
error = -EACCES;
break;
}
diff --git a/mm/vma.h b/mm/vma.h
index 75558b5e9c8c..d58068c0ff2e 100644
--- a/mm/vma.h
+++ b/mm/vma.h
@@ -42,7 +42,7 @@ struct vma_munmap_struct {
int vma_count; /* Number of vmas that will be removed */
bool unlock; /* Unlock after the munmap */
bool clear_ptes; /* If there are outstanding PTE to be cleared */
- /* 1 byte hole */
+ /* 2 byte hole */
unsigned long nr_pages; /* Number of pages being removed */
unsigned long locked_vm; /* Number of locked pages */
unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */
Commit 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of
tpidrro_el0 for native tasks") tried to optimise the context switching
of tpidrro_el0 by eliding the clearing of the register when switching
to a native task with kpti enabled, on the erroneous assumption that
the kpti trampoline entry code would already have taken care of the
write.
Although the kpti trampoline does zero the register on entry from a
native task, the check in tls_thread_switch() is on the *next* task and
so we can end up leaving a stale, non-zero value in the register if the
previous task was 32-bit.
Drop the broken optimisation and zero tpidrro_el0 unconditionally when
switching to a native 64-bit task.
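The failure mode can be reproduced with a toy model. struct task_model and tls_switch_model() are hypothetical; a plain variable stands in for tpidrro_el0, and the elide_on_kpti parameter selects between the old behaviour and the fixed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical task model: only what the sketch needs. */
struct task_model {
	bool compat;		/* 32-bit (compat) task? */
	uint64_t tp_value;	/* user TLS value for compat tasks */
};

/*
 * Model of the tpidrro_el0 update on context switch. 'reg' stands in for
 * the register. With elide_on_kpti set, the write of zero is skipped when
 * kpti is enabled, as in the optimisation being removed.
 */
static void tls_switch_model(uint64_t *reg, const struct task_model *next,
			     bool kpti, bool elide_on_kpti)
{
	assert(reg && next);

	if (next->compat)
		*reg = next->tp_value;
	else if (!(elide_on_kpti && kpti))
		*reg = 0;
}
```

Switching from a compat task to a native task with kpti enabled leaves the compat task's value in the model register when elide_on_kpti is true, and clears it when elide_on_kpti is false.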
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Fixes: 18011eac28c7 ("arm64: tls: Avoid unconditional zeroing of tpidrro_el0 for native tasks")
Signed-off-by: Will Deacon <will(a)kernel.org>
---
You fix one side-channel and introduce another... :(
arch/arm64/kernel/process.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 3e7c8c8195c3..2bbcbb11d844 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -442,7 +442,7 @@ static void tls_thread_switch(struct task_struct *next)
if (is_compat_thread(task_thread_info(next)))
write_sysreg(next->thread.uw.tp_value, tpidrro_el0);
- else if (!arm64_kernel_unmapped_at_el0())
+ else
write_sysreg(0, tpidrro_el0);
write_sysreg(*task_user_tls(next), tpidr_el0);
--
2.47.0.277.g8800431eea-goog