If 'list_limit' is set to a very high value, the 'lsize' computation can
overflow if 'head.count' is big enough.
In such a case, udmabuf_create() will access memory beyond the end of 'list'.
Use size_mul() to saturate the value, and have memdup_user() fail.
Fixes: fbb0de795078 ("Add udmabuf misc device")
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
---
drivers/dma-buf/udmabuf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index c40645999648..fb4c4b5b3332 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -314,13 +314,13 @@ static long udmabuf_ioctl_create_list(struct file *filp, unsigned long arg)
struct udmabuf_create_list head;
struct udmabuf_create_item *list;
int ret = -EINVAL;
- u32 lsize;
+ size_t lsize;
if (copy_from_user(&head, (void __user *)arg, sizeof(head)))
return -EFAULT;
if (head.count > list_limit)
return -EINVAL;
- lsize = sizeof(struct udmabuf_create_item) * head.count;
+ lsize = size_mul(sizeof(struct udmabuf_create_item), head.count);
list = memdup_user((void __user *)(arg + sizeof(head)), lsize);
if (IS_ERR(list))
return PTR_ERR(list);
--
2.34.1
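For illustration only (not part of the patch), a rough sketch of the failure
mode on a 64-bit kernel, where the product itself fits in size_t but is
silently truncated when stored in a u32; the count below is made up and
assumes sizeof(struct udmabuf_create_item) == 24:

	u32    bad  = sizeof(struct udmabuf_create_item) * 0x0aaaaaab;
		/* 24 * 0x0aaaaaab == 0x100000008, truncated to 8 as a u32 */
	size_t good = size_mul(sizeof(struct udmabuf_create_item), 0x0aaaaaab);
		/* full 64-bit product; saturates to SIZE_MAX on 32-bit overflow */

	/* memdup_user(ptr, bad) succeeds with a tiny copy and udmabuf_create()
	 * then reads past the end of 'list'; with 'good' the copy is either
	 * correctly sized or memdup_user() fails. */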
On Tue, Sep 19, 2023 at 10:26 PM CK Hu (胡俊光) <ck.hu(a)mediatek.com> wrote:
>
> Hi, Jason:
>
> On Tue, 2023-09-19 at 11:03 +0800, Jason-JH.Lin wrote:
> > The patch series provides drm driver support for enabling secure
> > video
> > path (SVP) playback on MediaTek hardware in the Linux kernel.
> >
> > Memory Definitions:
> > secure memory - Memory allocated in the TEE (Trusted Execution
> > Environment) which is inaccessible in the REE (Rich Execution
> > Environment, i.e. linux kernel/userspace).
> > secure handle - Integer value which acts as reference to 'secure
> > memory'. Used in communication between TEE and REE to reference
> > 'secure memory'.
> > secure buffer - 'secure memory' that is used to store decrypted,
> > compressed video or for other general purposes in the TEE.
> > secure surface - 'secure memory' that is used to store graphic
> > buffers.
> >
> > Memory Usage in SVP:
> > The overall flow of SVP starts with encrypted video coming in from an
> > outside source into the REE. The REE will then allocate a 'secure
> > buffer' and send the corresponding 'secure handle' along with the
> > encrypted, compressed video data to the TEE. The TEE will then
> > decrypt
> > the video and store the result in the 'secure buffer'. The REE will
> > then allocate a 'secure surface'. The REE will pass the 'secure
> > handles' for both the 'secure buffer' and 'secure surface' into the
> > TEE for video decoding. The video decoder HW will then decode the
> > contents of the 'secure buffer' and place the result in the 'secure
> > surface'. The REE will then attach the 'secure surface' to the
> > overlay
> > plane for rendering of the video.
> >
> > Everything relating to ensuring security of the actual contents of
> > the
> > 'secure buffer' and 'secure surface' is out of scope for the REE and
> > is the responsibility of the TEE.
> >
> > The DRM driver handles allocation of gem objects that are backed by a
> > 'secure surface' and the displaying of a 'secure surface' on the
> > overlay plane.
> > This introduces a new flag for object creation called
> > DRM_MTK_GEM_CREATE_ENCRYPTED which indicates it should be a 'secure
> > surface'. All changes here are in MediaTek specific code.
>
> How do you define SVP? Is there a standard requirement we could refer to?
> If the secure video buffer is read by display hardware and output to
> HDMI without any protection, and a user could capture the HDMI signal, is
> this secure?
SVP (Secure Video Path) essentially means the video content is kept completely
isolated from the kernel/userspace. The specific requirements for it
vary between implementations.
Regarding HDMI/HDCP output: it is the responsibility of the TEE to
enforce that. Nothing on the kernel/userspace side needs to be
concerned with enforcing HDCP. The only thing userspace is involved
in there is actually turning on HDCP via the kernel drivers, and then
the TEE ensures that it is active if the policy for the encrypted
content requires it.
>
> Regards,
> CK
>
> >
> > ---
> > Based on 2 series:
> > [1] Add CMDQ secure driver for SVP
> > - https://patchwork.kernel.org/project/linux-mediatek/list/?series=785332
> >
> >
> > [2] dma-buf: heaps: Add MediaTek secure heap
> > - https://patchwork.kernel.org/project/linux-mediatek/list/?series=782776
> >
> > ---
> >
> > CK Hu (1):
> > drm/mediatek: Add interface to allocate MediaTek GEM buffer.
> >
> > Jason-JH.Lin (9):
> > drm/mediatek/uapi: Add DRM_MTK_GEM_CREATED_ENCRYPTTED flag
> > drm/mediatek: Add secure buffer control flow to mtk_drm_gem
> > drm/mediatek: Add secure identify flag and funcution to
> > mtk_drm_plane
> > drm/mediatek: Add mtk_ddp_sec_write to config secure buffer info
> > drm/mediatek: Add get_sec_port interface to mtk_ddp_comp
> > drm/mediatek: Add secure layer config support for ovl
> > drm/mediatek: Add secure layer config support for ovl_adaptor
> > drm/mediatek: Add secure flow support to mediatek-drm
> > arm64: dts: mt8195-cherry: Add secure mbox settings for vdosys
> >
> > .../boot/dts/mediatek/mt8195-cherry.dtsi | 10 +
> > drivers/gpu/drm/mediatek/mtk_disp_drv.h | 3 +
> > drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 31 +-
> > .../gpu/drm/mediatek/mtk_disp_ovl_adaptor.c | 15 +
> > drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 271 +++++++++++++++++-
> > drivers/gpu/drm/mediatek/mtk_drm_crtc.h | 1 +
> > drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 14 +
> > drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h | 13 +
> > drivers/gpu/drm/mediatek/mtk_drm_drv.c | 16 +-
> > drivers/gpu/drm/mediatek/mtk_drm_gem.c | 121 ++++++++
> > drivers/gpu/drm/mediatek/mtk_drm_gem.h | 16 ++
> > drivers/gpu/drm/mediatek/mtk_drm_plane.c | 7 +
> > drivers/gpu/drm/mediatek/mtk_drm_plane.h | 2 +
> > drivers/gpu/drm/mediatek/mtk_mdp_rdma.c | 11 +-
> > drivers/gpu/drm/mediatek/mtk_mdp_rdma.h | 2 +
> > include/uapi/drm/mediatek_drm.h | 59 ++++
> > 16 files changed, 575 insertions(+), 17 deletions(-)
> > create mode 100644 include/uapi/drm/mediatek_drm.h
> >
The patch series provides drm driver support for enabling secure video
path (SVP) playback on MediaTek hardware in the Linux kernel.
Memory Definitions:
secure memory - Memory allocated in the TEE (Trusted Execution
Environment) which is inaccessible in the REE (Rich Execution
Environment, i.e. linux kernel/userspace).
secure handle - Integer value which acts as reference to 'secure
memory'. Used in communication between TEE and REE to reference
'secure memory'.
secure buffer - 'secure memory' that is used to store decrypted,
compressed video or for other general purposes in the TEE.
secure surface - 'secure memory' that is used to store graphic buffers.
Memory Usage in SVP:
The overall flow of SVP starts with encrypted video coming in from an
outside source into the REE. The REE will then allocate a 'secure
buffer' and send the corresponding 'secure handle' along with the
encrypted, compressed video data to the TEE. The TEE will then decrypt
the video and store the result in the 'secure buffer'. The REE will
then allocate a 'secure surface'. The REE will pass the 'secure
handles' for both the 'secure buffer' and 'secure surface' into the
TEE for video decoding. The video decoder HW will then decode the
contents of the 'secure buffer' and place the result in the 'secure
surface'. The REE will then attach the 'secure surface' to the overlay
plane for rendering of the video.
Everything relating to ensuring security of the actual contents of the
'secure buffer' and 'secure surface' is out of scope for the REE and
is the responsibility of the TEE.
The DRM driver handles allocation of gem objects that are backed by a 'secure
surface' and the displaying of a 'secure surface' on the overlay plane.
This introduces a new flag for object creation called
DRM_MTK_GEM_CREATE_ENCRYPTED which indicates it should be a 'secure
surface'. All changes here are in MediaTek specific code.
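For illustration only, a hypothetical userspace sketch of how such a flag
might be passed at GEM creation time; the ioctl name and struct layout below
are assumptions made for readability and are not taken from this series:

	#include <linux/types.h>
	#include <sys/ioctl.h>

	/* Hypothetical uapi shape -- names and fields are assumed. */
	struct mtk_gem_create_args {
		__u64 size;	/* requested buffer size */
		__u32 flags;	/* e.g. DRM_MTK_GEM_CREATE_ENCRYPTED for a 'secure surface' */
		__u32 handle;	/* GEM handle returned by the kernel */
	};

	struct mtk_gem_create_args args = {
		.size  = width * height * 4,
		.flags = DRM_MTK_GEM_CREATE_ENCRYPTED,
	};
	/* drm_fd and MTK_IOCTL_GEM_CREATE are placeholders. */
	ioctl(drm_fd, MTK_IOCTL_GEM_CREATE, &args);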
---
Based on 2 series:
[1] Add CMDQ secure driver for SVP
- https://patchwork.kernel.org/project/linux-mediatek/list/?series=785332
[2] dma-buf: heaps: Add MediaTek secure heap
- https://patchwork.kernel.org/project/linux-mediatek/list/?series=782776
---
CK Hu (1):
drm/mediatek: Add interface to allocate MediaTek GEM buffer.
Jason-JH.Lin (9):
drm/mediatek/uapi: Add DRM_MTK_GEM_CREATED_ENCRYPTTED flag
drm/mediatek: Add secure buffer control flow to mtk_drm_gem
drm/mediatek: Add secure identify flag and funcution to mtk_drm_plane
drm/mediatek: Add mtk_ddp_sec_write to config secure buffer info
drm/mediatek: Add get_sec_port interface to mtk_ddp_comp
drm/mediatek: Add secure layer config support for ovl
drm/mediatek: Add secure layer config support for ovl_adaptor
drm/mediatek: Add secure flow support to mediatek-drm
arm64: dts: mt8195-cherry: Add secure mbox settings for vdosys
.../boot/dts/mediatek/mt8195-cherry.dtsi | 10 +
drivers/gpu/drm/mediatek/mtk_disp_drv.h | 3 +
drivers/gpu/drm/mediatek/mtk_disp_ovl.c | 31 +-
.../gpu/drm/mediatek/mtk_disp_ovl_adaptor.c | 15 +
drivers/gpu/drm/mediatek/mtk_drm_crtc.c | 271 +++++++++++++++++-
drivers/gpu/drm/mediatek/mtk_drm_crtc.h | 1 +
drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.c | 14 +
drivers/gpu/drm/mediatek/mtk_drm_ddp_comp.h | 13 +
drivers/gpu/drm/mediatek/mtk_drm_drv.c | 16 +-
drivers/gpu/drm/mediatek/mtk_drm_gem.c | 121 ++++++++
drivers/gpu/drm/mediatek/mtk_drm_gem.h | 16 ++
drivers/gpu/drm/mediatek/mtk_drm_plane.c | 7 +
drivers/gpu/drm/mediatek/mtk_drm_plane.h | 2 +
drivers/gpu/drm/mediatek/mtk_mdp_rdma.c | 11 +-
drivers/gpu/drm/mediatek/mtk_mdp_rdma.h | 2 +
include/uapi/drm/mediatek_drm.h | 59 ++++
16 files changed, 575 insertions(+), 17 deletions(-)
create mode 100644 include/uapi/drm/mediatek_drm.h
--
2.18.0
Hi all,
This is v2 of the linked patch series; thanks to everyone for reviewing
the initial version. I've moved this out of a pure DRM scope and into
the general userspace-API design section. Hopefully it helps others and
answers a bunch of questions.
I think it'd be great to have input/links/reflections from other
subsystems as well here.
Cheers,
Daniel
From: Rob Clark <robdclark(a)chromium.org>
If a signal callback releases the sw_sync fence, that will trigger a
deadlock as the timeline_fence_release recurses onto the fence->lock
(used both for signaling and the timeline tree).
To avoid that, temporarily hold an extra reference to the signalled
fences until after we drop the lock.
(This is an alternative implementation of https://patchwork.kernel.org/patch/11664717/
which avoids some potential UAF issues with the original patch.)
v2: Remove now obsolete comment, use list_move_tail() and
list_del_init()
Reported-by: Bas Nieuwenhuizen <bas(a)basnieuwenhuizen.nl>
Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
---
drivers/dma-buf/sw_sync.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 63f0aeb66db6..f0a35277fd84 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -191,6 +191,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
*/
static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
{
+ LIST_HEAD(signalled);
struct sync_pt *pt, *next;
trace_sync_timeline(obj);
@@ -203,21 +204,20 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
if (!timeline_fence_signaled(&pt->base))
break;
- list_del_init(&pt->link);
+ dma_fence_get(&pt->base);
+
+ list_move_tail(&pt->link, &signalled);
rb_erase(&pt->node, &obj->pt_tree);
- /*
- * A signal callback may release the last reference to this
- * fence, causing it to be freed. That operation has to be
- * last to avoid a use after free inside this loop, and must
- * be after we remove the fence from the timeline in order to
- * prevent deadlocking on timeline->lock inside
- * timeline_fence_release().
- */
dma_fence_signal_locked(&pt->base);
}
spin_unlock_irq(&obj->lock);
+
+ list_for_each_entry_safe(pt, next, &signalled, link) {
+ list_del_init(&pt->link);
+ dma_fence_put(&pt->base);
+ }
}
/**
--
2.41.0
Hi,
This is your friendly bug reporter.
The environment is a vanilla torvalds tree kernel on Ubuntu 22.04 LTS running on a Ryzen 7950X box.
Please find attached the complete dmesg output from the ring buffer and lshw output.
NOTE: The kernel is reported as tainted, but to my knowledge there are no
proprietary modules loaded; the taint was set by the previous bugs.
dmesg excerpt:
[ 8791.864576] ==================================================================
[ 8791.864648] BUG: KCSAN: data-race in drm_sched_entity_is_ready [gpu_sched] / drm_sched_entity_push_job [gpu_sched]
[ 8791.864776] write (marked) to 0xffff9b74491b7c40 of 8 bytes by task 3807 on cpu 18:
[ 8791.864788] drm_sched_entity_push_job+0xf4/0x2a0 [gpu_sched]
[ 8791.864852] amdgpu_cs_ioctl+0x3888/0x3de0 [amdgpu]
[ 8791.868731] drm_ioctl_kernel+0x127/0x210 [drm]
[ 8791.869222] drm_ioctl+0x38f/0x6f0 [drm]
[ 8791.869711] amdgpu_drm_ioctl+0x7e/0xe0 [amdgpu]
[ 8791.873660] __x64_sys_ioctl+0xd2/0x120
[ 8791.873676] do_syscall_64+0x58/0x90
[ 8791.873688] entry_SYSCALL_64_after_hwframe+0x73/0xdd
[ 8791.873710] read to 0xffff9b74491b7c40 of 8 bytes by task 1119 on cpu 27:
[ 8791.873722] drm_sched_entity_is_ready+0x16/0x50 [gpu_sched]
[ 8791.873786] drm_sched_select_entity+0x1c7/0x220 [gpu_sched]
[ 8791.873849] drm_sched_main+0xd2/0x500 [gpu_sched]
[ 8791.873912] kthread+0x18b/0x1d0
[ 8791.873924] ret_from_fork+0x43/0x70
[ 8791.873939] ret_from_fork_asm+0x1b/0x30
[ 8791.873955] value changed: 0x0000000000000000 -> 0xffff9b750ebcfc00
[ 8791.873971] Reported by Kernel Concurrency Sanitizer on:
[ 8791.873980] CPU: 27 PID: 1119 Comm: gfx_0.0.0 Tainted: G L 6.5.0-rc6-net-cfg-kcsan-00038-g16931859a650 #35
[ 8791.873994] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023
[ 8791.874002] ==================================================================
Best regards,
Mirsad Todorovac
Documentation for drm_crtc_init_with_planes() in
drivers/gpu/drm/drm_crtc.c states: «The crtc structure should not be
allocated with devm_kzalloc()».
However, in drivers/gpu/drm/stm/ltdc.c
the 2nd argument of the function drm_crtc_init_with_planes()
is a structure allocated with devm_kzalloc()
Also, in
drivers/gpu/drm/mediatek/mtk_drm_crtc.c
drivers/gpu/drm/hisilicon/kirin/kirin_drm_drv.c
drivers/gpu/drm/logicvc/logicvc_crtc.c
drivers/gpu/drm/meson/meson_crtc.c
drivers/gpu/drm/mxsfb/lcdif_kms.c
drivers/gpu/drm/mxsfb/mxsfb_kms.c
drivers/gpu/drm/renesas/shmobile/shmob_drm_crtc.c
drivers/gpu/drm/rockchip/rockchip_drm_vop.c
drivers/gpu/drm/rockchip/rockchip_drm_vop2.c
drivers/gpu/drm/sun4i/sun4i_crtc.c
drivers/gpu/drm/tegra/dc.c
drivers/gpu/drm/tilcdc/tilcdc_crtc.c
the 2nd argument of the function drm_crtc_init_with_planes()
is a field of the structure allocated with devm_kzalloc()
Is it correct or can it lead to any problems?
--
Ekaterina Orlova
Linux Verification Center, ISPRAS
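For reference, a minimal sketch of the allocation pattern the kernel-doc
quoted above steers drivers toward instead: embedding the CRTC in a
drm-managed allocation (the driver and struct names below are made up):

	#include <drm/drm_crtc.h>
	#include <drm/drm_device.h>

	/* Hypothetical driver CRTC embedded in a drm-managed allocation. */
	struct foo_crtc {
		struct drm_crtc base;
		/* driver-private state ... */
	};

	static const struct drm_crtc_funcs foo_crtc_funcs = {
		/* no .destroy needed: the memory is drm-managed */
	};

	static int foo_crtc_create(struct drm_device *drm,
				   struct drm_plane *primary,
				   struct drm_plane *cursor)
	{
		struct foo_crtc *fcrtc;

		/* Lifetime is tied to the drm_device, not the platform device,
		 * which avoids the devm_kzalloc() lifetime mismatch. */
		fcrtc = drmm_crtc_alloc_with_planes(drm, struct foo_crtc, base,
						    primary, cursor,
						    &foo_crtc_funcs, NULL);
		if (IS_ERR(fcrtc))
			return PTR_ERR(fcrtc);

		return 0;
	}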
From: Rob Clark <robdclark(a)chromium.org>
If a signal callback releases the sw_sync fence, that will trigger a
deadlock as the timeline_fence_release recurses onto the fence->lock
(used both for signaling and the timeline tree).
To avoid that, temporarily hold an extra reference to the signalled
fences until after we drop the lock.
(This is an alternative implementation of https://patchwork.kernel.org/patch/11664717/
which avoids some potential UAF issues with the original patch.)
Reported-by: Bas Nieuwenhuizen <bas(a)basnieuwenhuizen.nl>
Fixes: d3c6dd1fb30d ("dma-buf/sw_sync: Synchronize signal vs syncpt free")
Signed-off-by: Rob Clark <robdclark(a)chromium.org>
---
drivers/dma-buf/sw_sync.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 63f0aeb66db6..ceb6a0408624 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -191,6 +191,7 @@ static const struct dma_fence_ops timeline_fence_ops = {
*/
static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
{
+ LIST_HEAD(signalled);
struct sync_pt *pt, *next;
trace_sync_timeline(obj);
@@ -203,9 +204,13 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
if (!timeline_fence_signaled(&pt->base))
break;
+ dma_fence_get(&pt->base);
+
list_del_init(&pt->link);
rb_erase(&pt->node, &obj->pt_tree);
+ list_add_tail(&pt->link, &signalled);
+
/*
* A signal callback may release the last reference to this
* fence, causing it to be freed. That operation has to be
@@ -218,6 +223,11 @@ static void sync_timeline_signal(struct sync_timeline *obj, unsigned int inc)
}
spin_unlock_irq(&obj->lock);
+
+ list_for_each_entry_safe(pt, next, &signalled, link) {
+ list_del(&pt->link);
+ dma_fence_put(&pt->base);
+ }
}
/**
--
2.41.0
Hi Pintu,
On Sat, Jul 29, 2023 at 08:05:15AM +0530, Pintu Kumar wrote:
> The current global cma region name, "reserved", is misleading,
> creates confusion and is too generic.
>
> Also, the default cma allocation happens from the global cma region,
> so a clearer name makes it easier to figure out which allocations
> come from it.
>
> Thus, change the name from "reserved" to "global-cma-region".
I agree that reserved is not a very useful name. Unfortunately the
name of the region leaks to userspace through cma_heap.
So I think we need prep patches to hardcode "reserved" in
add_default_cma_heap first, and then remove the cma_get_name
usage.
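A rough sketch of the kind of prep change meant here, assuming the cma_heap
name is currently derived from cma_get_name(); the helper below and its
signature are illustrative and may not match the actual code in the tree:

	/* Hypothetical prep step in drivers/dma-buf/heaps/cma_heap.c:
	 * pin the userspace-visible heap name so the CMA region itself
	 * can be renamed later without changing /dev/dma_heap/reserved. */
	static int __init add_default_cma_heap(void)
	{
		struct cma *default_cma = dev_get_cma_area(NULL);

		if (!default_cma)
			return 0;

		/* was: heap name taken from cma_get_name(default_cma) */
		return __add_cma_heap(default_cma, "reserved");
	}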
From: Boris Brezillon <boris.brezillon(a)collabora.com>
[ Upstream commit e30cb0599799aac099209e3b045379613c80730e ]
drm_sched_entity_kill_jobs_cb() logic is omitting the last fence popped
from the dependency array that was waited upon before
drm_sched_entity_kill() was called (drm_sched_entity::dependency field),
so we're basically waiting for all dependencies except one.
In theory, this wait shouldn't be needed because resources should have
their users registered to the dma_resv object, thus guaranteeing that
future jobs wanting to access these resources wait on all the previous
users (depending on the access type, of course). But we want to keep
these explicit waits in the kill entity path just in case.
Let's make sure we keep all dependencies in the array in
drm_sched_job_dependency(), so we can iterate over the array and wait
in drm_sched_entity_kill_jobs_cb().
We also make sure we wait on drm_sched_fence::finished if we were
originally asked to wait on drm_sched_fence::scheduled. In that case,
we assume the intent was to delegate the wait to the firmware/GPU or
rely on the pipelining done at the entity/scheduler level, but when
killing jobs, we really want to wait for completion not just scheduling.
v2:
- Don't evict deps in drm_sched_job_dependency()
v3:
- Always wait for drm_sched_fence::finished fences in
drm_sched_entity_kill_jobs_cb() when we see a sched_fence
v4:
- Fix commit message
- Fix a use-after-free bug
v5:
- Flag deps on which we should only wait for the scheduled event
at insertion time
v6:
- Back to v4 implementation
- Add Christian's R-b
Cc: Frank Binns <frank.binns(a)imgtec.com>
Cc: Sarah Walker <sarah.walker(a)imgtec.com>
Cc: Donald Robson <donald.robson(a)imgtec.com>
Cc: Luben Tuikov <luben.tuikov(a)amd.com>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: "Christian König" <christian.koenig(a)amd.com>
Signed-off-by: Boris Brezillon <boris.brezillon(a)collabora.com>
Suggested-by: "Christian König" <christian.koenig(a)amd.com>
Reviewed-by: "Christian König" <christian.koenig(a)amd.com>
Acked-by: Luben Tuikov <luben.tuikov(a)amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230619071921.3465992-1-bori…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/scheduler/sched_entity.c | 41 +++++++++++++++++++-----
1 file changed, 33 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index e0a8890a62e23..42021d1f7e016 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -155,16 +155,32 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence *f,
{
struct drm_sched_job *job = container_of(cb, struct drm_sched_job,
finish_cb);
- int r;
+ unsigned long index;
dma_fence_put(f);
/* Wait for all dependencies to avoid data corruptions */
- while (!xa_empty(&job->dependencies)) {
- f = xa_erase(&job->dependencies, job->last_dependency++);
- r = dma_fence_add_callback(f, &job->finish_cb,
- drm_sched_entity_kill_jobs_cb);
- if (!r)
+ xa_for_each(&job->dependencies, index, f) {
+ struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
+
+ if (s_fence && f == &s_fence->scheduled) {
+ /* The dependencies array had a reference on the scheduled
+ * fence, and the finished fence refcount might have
+ * dropped to zero. Use dma_fence_get_rcu() so we get
+ * a NULL fence in that case.
+ */
+ f = dma_fence_get_rcu(&s_fence->finished);
+
+ /* Now that we have a reference on the finished fence,
+ * we can release the reference the dependencies array
+ * had on the scheduled fence.
+ */
+ dma_fence_put(&s_fence->scheduled);
+ }
+
+ xa_erase(&job->dependencies, index);
+ if (f && !dma_fence_add_callback(f, &job->finish_cb,
+ drm_sched_entity_kill_jobs_cb))
return;
dma_fence_put(f);
@@ -394,8 +410,17 @@ static struct dma_fence *
drm_sched_job_dependency(struct drm_sched_job *job,
struct drm_sched_entity *entity)
{
- if (!xa_empty(&job->dependencies))
- return xa_erase(&job->dependencies, job->last_dependency++);
+ struct dma_fence *f;
+
+ /* We keep the fence around, so we can iterate over all dependencies
+ * in drm_sched_entity_kill_jobs_cb() to ensure all deps are signaled
+ * before killing the job.
+ */
+ f = xa_load(&job->dependencies, job->last_dependency);
+ if (f) {
+ job->last_dependency++;
+ return dma_fence_get(f);
+ }
if (job->sched->ops->prepare_job)
return job->sched->ops->prepare_job(job, entity);
--
2.40.1
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large number of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time; improving data transfer throughput & efficiency can help
improve GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs; much of this data does not require host processing.
Today, the majority of Device-to-Device data transfers over the network are
implemented as the following low-level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
With this proposal we're able to reach ~96.6% line rate speeds with data sent
and received directly from/to device memory.
* Patch overview:
** Part 1: struct paged device memory
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, the networking stack (skbs, drivers,
and page pool) operates on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to understand a new memory type.
This proposal implements option #1. We implement a small framework to generate
struct pages for an sg_table returned from dma_buf_map_attachment(). The support
added here should be generic and easily extended to other use cases interested
in struct paged device memory. We use this framework to generate pages that can
be used in the networking stack.
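For context, a minimal importer-side sketch of how the sg_table mentioned
above is normally obtained (standard dma-buf importer calls; error handling
is trimmed, and the RFC's own page-generation entry points are not shown):

	#include <linux/dma-buf.h>
	#include <linux/dma-direction.h>
	#include <linux/scatterlist.h>

	/* dmabuf_fd and dev are assumed to be provided by the caller. */
	static struct sg_table *demo_map_dmabuf(int dmabuf_fd, struct device *dev,
						struct dma_buf_attachment **attach_out)
	{
		struct dma_buf *dbuf = dma_buf_get(dmabuf_fd);
		struct dma_buf_attachment *attach;

		if (IS_ERR(dbuf))
			return ERR_CAST(dbuf);

		attach = dma_buf_attach(dbuf, dev);
		if (IS_ERR(attach)) {
			dma_buf_put(dbuf);
			return ERR_CAST(attach);
		}

		*attach_out = attach;
		/* Part 1 of this series would generate struct pages backing
		 * the entries of the returned sg_table. */
		return dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	}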
** Part 2: recvmsg() & sendmsg() APIs
We define user APIs for the user to send and receive these dmabuf pages.
** Part 3: support for unreadable skb frags
Dmabuf pages are not accessible by the host; we implement changes throughout
the networking stack to correctly handle skbs with unreadable frags.
** Part 4: page pool support
We piggyback on Jakub's page pool memory providers idea:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory TCP
changes from the driver.
This is not strictly necessary; the driver can choose to allocate dmabuf pages
and use them directly without going through the page pool (if acceptable to
their maintainers).
Not included with this RFC is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This RFC is built on top of v6.4-rc7 with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP requires the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using dmabuf pages
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
Not included in this series is our devmem TCP benchmark, which
transfers data to/from GPU dmabufs directly.
With this implementation & benchmark we're able to reach ~96.6% line rate
speeds with 4 GPU/NIC pairs running bidirectional traffic, with all the
packet payloads going straight to the GPU memory (no host buffer bounce).
** Test Setup
Kernel: v6.4-rc7, with this RFC and Jakub's memory provider API
cherry-picked locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Benchmark: custom devmem TCP benchmark not yet open sourced.
Mina Almasry (10):
dma-buf: add support for paged attachment mappings
dma-buf: add support for NET_RX pages
dma-buf: add support for NET_TX pages
net: add support for skbs with unreadable frags
tcp: implement recvmsg() RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX pages
tcp: implement sendmsg() TX path for for devmem tcp
selftests: add ncdevmem, netcat for devmem TCP
memory-provider: updates core provider API for devmem TCP
memory-provider: add dmabuf devmem provider
drivers/dma-buf/dma-buf.c | 444 ++++++++++++++++
include/linux/dma-buf.h | 142 +++++
include/linux/netdevice.h | 1 +
include/linux/skbuff.h | 34 +-
include/linux/socket.h | 1 +
include/net/page_pool.h | 21 +
include/net/sock.h | 4 +
include/net/tcp.h | 6 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/dma-buf.h | 12 +
include/uapi/linux/uio.h | 10 +
net/core/datagram.c | 3 +
net/core/page_pool.c | 111 +++-
net/core/skbuff.c | 81 ++-
net/core/sock.c | 47 ++
net/ipv4/tcp.c | 262 +++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 8 +
net/ipv4/tcp_output.c | 5 +-
net/packet/af_packet.c | 4 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/ncdevmem.c | 693 +++++++++++++++++++++++++
23 files changed, 1868 insertions(+), 42 deletions(-)
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.41.0.390.g38632f3daf-goog
From: Luc Ma <luc(a)sietium.com>
The kernel-doc for DMA-BUF statistics mentions /sys/kernel/dma-buf/buffers,
but the correct path is /sys/kernel/dmabuf/buffers instead.
Signed-off-by: Luc Ma <luc(a)sietium.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
---
drivers/dma-buf/dma-buf-sysfs-stats.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
index 6cfbbf0720bd..b5b62e40ccc1 100644
--- a/drivers/dma-buf/dma-buf-sysfs-stats.c
+++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
@@ -33,7 +33,7 @@
* into their address space. This necessitated the creation of the DMA-BUF sysfs
* statistics interface to provide per-buffer information on production systems.
*
- * The interface at ``/sys/kernel/dma-buf/buffers`` exposes information about
+ * The interface at ``/sys/kernel/dmabuf/buffers`` exposes information about
* every DMA-BUF when ``CONFIG_DMABUF_SYSFS_STATS`` is enabled.
*
* The following stats are exposed by the interface:
--
2.25.1
As &ndlp->lock is acquired by the timer lpfc_els_retry_delay() in softirq
context, process-context code acquiring &ndlp->lock should disable irqs
or bhs; otherwise a deadlock can happen if the timer preempts execution
while the lock is held in process context on the same CPU.
The two lock acquisitions inside lpfc_cleanup_pending_mbox() do not
disable irqs or softirqs.
[Deadlock Scenario]
lpfc_cmpl_els_fdisc()
-> lpfc_cleanup_pending_mbox()
-> spin_lock(&ndlp->lock);
<irq>
-> lpfc_els_retry_delay()
-> lpfc_nlp_get()
-> spin_lock_irqsave(&ndlp->lock, flags); (deadlock here)
This flaw was found by an experimental static analysis tool I am
developing for irq-related deadlocks.
The patch fixes the potential deadlock by using spin_lock_irq() to
disable irqs.
Signed-off-by: Chengfeng Ye <dg573847474(a)gmail.com>
---
drivers/scsi/lpfc/lpfc_sli.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c
index 58d10f8f75a7..8555f6bb9742 100644
--- a/drivers/scsi/lpfc/lpfc_sli.c
+++ b/drivers/scsi/lpfc/lpfc_sli.c
@@ -21049,9 +21049,9 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
mb->mbox_flag |= LPFC_MBX_IMED_UNREG;
restart_loop = 1;
spin_unlock_irq(&phba->hbalock);
- spin_lock(&ndlp->lock);
+ spin_lock_irq(&ndlp->lock);
ndlp->nlp_flag &= ~NLP_IGNR_REG_CMPL;
- spin_unlock(&ndlp->lock);
+ spin_unlock_irq(&ndlp->lock);
spin_lock_irq(&phba->hbalock);
break;
}
@@ -21067,9 +21067,9 @@ lpfc_cleanup_pending_mbox(struct lpfc_vport *vport)
ndlp = (struct lpfc_nodelist *)mb->ctx_ndlp;
mb->ctx_ndlp = NULL;
if (ndlp) {
- spin_lock(&ndlp->lock);
+ spin_lock_irq(&ndlp->lock);
ndlp->nlp_flag &= ~NLP_IGNR_REG_CMPL;
- spin_unlock(&ndlp->lock);
+ spin_unlock_irq(&ndlp->lock);
lpfc_nlp_put(ndlp);
}
}
--
2.17.1
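As a general illustration of the pattern being fixed (a generic sketch, not
lpfc code): a lock shared between process context and a timer callback has to
be taken with interrupts (or at least bottom halves) disabled in process
context, otherwise the timer can fire on the same CPU while the lock is held
and deadlock on it:

	#include <linux/spinlock.h>
	#include <linux/timer.h>

	static DEFINE_SPINLOCK(demo_lock);

	/* Runs in softirq context when the timer expires. */
	static void demo_timer_fn(struct timer_list *t)
	{
		unsigned long flags;

		spin_lock_irqsave(&demo_lock, flags);
		/* ... touch state shared with process context ... */
		spin_unlock_irqrestore(&demo_lock, flags);
	}

	static DEFINE_TIMER(demo_timer, demo_timer_fn);

	static void demo_process_context(void)
	{
		/* A plain spin_lock() here could deadlock against
		 * demo_timer_fn() firing on the same CPU. */
		spin_lock_irq(&demo_lock);
		/* ... touch the shared state ... */
		spin_unlock_irq(&demo_lock);
	}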