Processes can pin shared memory by keeping a handle to it through a
file descriptor; for instance dmabufs, memfd, and ashsmem (in Android).
In the case of a memory leak, to identify the process pinning the
memory, userspace needs to:
- Iterate the /proc/<pid>/fd/* for each process
- Do a readlink on each entry to identify the type of memory from
the file path.
- stat() each entry to get the size of the memory.
The file permissions on /proc/<pid>/fd/* only allows for the owner
or root to perform the operations above; and so is not suitable for
capturing the system-wide state in a production environment.
This issue was addressed for dmabufs by making /proc/*/fdinfo/*
accessible to a process with PTRACE_MODE_READ_FSCREDS credentials[1]
To allow the same kind of tracking for other types of shared memory,
add the following fields to /proc/<pid>/fdinfo/<fd>:
path - This allows identifying the type of memory based on common
prefixes: e.g. "/memfd...", "/dmabuf...", "/dev/ashmem..."
This was not an issued when dmabuf tracking was introduced
because the exp_name field of dmabuf fdinfo could be used
to distinguish dmabuf fds from other types.
size - To track the amount of memory that is being pinned.
dmabufs expose size as an additional field in fdinfo. Remove
this and make it a common field for all fds.
Access to /proc/<pid>/fdinfo is governed by PTRACE_MODE_READ_FSCREDS
-- the same as for /proc/<pid>/maps which also exposes the path and
size for mapped memory regions.
This allows for a system process with PTRACE_MODE_READ_FSCREDS to
account the pinned per-process memory via fdinfo.
[1] https://lore.kernel.org/lkml/20210308170651.919148-1-kaleshsingh@google.com/
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
---
Documentation/filesystems/proc.rst | 22 ++++++++++++++++++++--
drivers/dma-buf/dma-buf.c | 1 -
fs/proc/fd.c | 9 +++++++--
3 files changed, 27 insertions(+), 5 deletions(-)
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 061744c436d9..ad66d78aca51 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1922,13 +1922,16 @@ if precise results are needed.
3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
---------------------------------------------------------------
This file provides information associated with an opened file. The regular
-files have at least four fields -- 'pos', 'flags', 'mnt_id' and 'ino'.
+files have at least six fields -- 'pos', 'flags', 'mnt_id', 'ino', 'size',
+and 'path'.
+
The 'pos' represents the current offset of the opened file in decimal
form [see lseek(2) for details], 'flags' denotes the octal O_xxx mask the
file has been created with [see open(2) for details] and 'mnt_id' represents
mount ID of the file system containing the opened file [see 3.5
/proc/<pid>/mountinfo for details]. 'ino' represents the inode number of
-the file.
+the file, 'size' represents the size of the file in bytes, and 'path'
+represents the file path.
A typical output is::
@@ -1936,6 +1939,8 @@ A typical output is::
flags: 0100002
mnt_id: 19
ino: 63107
+ size: 0
+ path: /dev/null
All locks associated with a file descriptor are shown in its fdinfo too::
@@ -1953,6 +1958,8 @@ Eventfd files
flags: 04002
mnt_id: 9
ino: 63107
+ size: 0
+ path: anon_inode:[eventfd]
eventfd-count: 5a
where 'eventfd-count' is hex value of a counter.
@@ -1966,6 +1973,8 @@ Signalfd files
flags: 04002
mnt_id: 9
ino: 63107
+ size: 0
+ path: anon_inode:[signalfd]
sigmask: 0000000000000200
where 'sigmask' is hex value of the signal mask associated
@@ -1980,6 +1989,8 @@ Epoll files
flags: 02
mnt_id: 9
ino: 63107
+ size: 0
+ path: anon_inode:[eventpoll]
tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7
where 'tfd' is a target file descriptor number in decimal form,
@@ -1998,6 +2009,8 @@ For inotify files the format is the following::
flags: 02000000
mnt_id: 9
ino: 63107
+ size: 0
+ path: anon_inode:inotify
inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
where 'wd' is a watch descriptor in decimal form, i.e. a target file
@@ -2021,6 +2034,8 @@ For fanotify files the format is::
flags: 02
mnt_id: 9
ino: 63107
+ size: 0
+ path: anon_inode:[fanotify]
fanotify flags:10 event-flags:0
fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
@@ -2046,6 +2061,8 @@ Timerfd files
flags: 02
mnt_id: 9
ino: 63107
+ size: 0
+ path: anon_inode:[timerfd]
clockid: 0
ticks: 0
settime flags: 01
@@ -2070,6 +2087,7 @@ DMA Buffer files
mnt_id: 9
ino: 63107
size: 32768
+ path: /dmabuf:
count: 2
exp_name: system-heap
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index b1e25ae98302..d61183ff3c30 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -377,7 +377,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file)
{
struct dma_buf *dmabuf = file->private_data;
- seq_printf(m, "size:\t%zu\n", dmabuf->size);
/* Don't count the temporary reference taken inside procfs seq_show */
seq_printf(m, "count:\t%ld\n", file_count(dmabuf->file) - 1);
seq_printf(m, "exp_name:\t%s\n", dmabuf->exp_name);
diff --git a/fs/proc/fd.c b/fs/proc/fd.c
index 913bef0d2a36..a8a968bc58f0 100644
--- a/fs/proc/fd.c
+++ b/fs/proc/fd.c
@@ -54,10 +54,15 @@ static int seq_show(struct seq_file *m, void *v)
if (ret)
return ret;
- seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\n",
+ seq_printf(m, "pos:\t%lli\nflags:\t0%o\nmnt_id:\t%i\nino:\t%lu\nsize:\t%zu\n",
(long long)file->f_pos, f_flags,
real_mount(file->f_path.mnt)->mnt_id,
- file_inode(file)->i_ino);
+ file_inode(file)->i_ino,
+ file_inode(file)->i_size);
+
+ seq_puts(m, "path:\t");
+ seq_file_path(m, file, "\n");
+ seq_putc(m, '\n');
/* show_fd_locks() never deferences files so a stale value is safe */
show_fd_locks(m, file, files);
base-commit: b015dcd62b86d298829990f8261d5d154b8d7af5
--
2.36.1.124.g0e6072fb45-goog
The dmabuf file uses get_next_ino()(through dma_buf_getfile() ->
alloc_anon_inode()) to get an inode number and uses the same as a
directory name under /sys/kernel/dmabuf/buffers/<ino>. This directory is
used to collect the dmabuf stats and it is created through
dma_buf_stats_setup(). At current, failure to create this directory
entry can make the dma_buf_export() to fail.
Now, as the get_next_ino() can definitely give a repetitive inode no
causing the directory entry creation to fail with -EEXIST. This is a
problem on the systems where dmabuf stats functionality is enabled on
the production builds can make the dma_buf_export(), though the dmabuf
memory is allocated successfully, to fail just because it couldn't
create stats entry.
This issue we are able to see on the snapdragon system within 13 days
where there already exists a directory with inode no "122602" so
dma_buf_stats_setup() failed with -EEXIST as it is trying to create
the same directory entry.
To make the dentry name as unique, use the dmabuf fs specific inode
which is based on the simple atomic variable increment. There is tmpfs
subsystem too which relies on its own inode generation rather than
relying on the get_next_ino() for the same reason of avoiding the
duplicate inodes[1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=…
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: <stable(a)vger.kernel.org> # 5.15.x+
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
Changes in V3-resend:
-- Collect all the tags and apply stable tag.
Changes in V3:
-- Used the atomic64 variable to have dmabuf files its own inodes.
-- Ensured no UAPI breakage as suggested by Christian.
Changes in V2:
-- Used the atomic64_t variable to generate a unique_id to be appended to inode
to have an unique directory with name <inode_number-unique_id> -- Suggested by christian
-- Updated the ABI documentation -- Identified by Greg.
-- Massaged the commit log.
-- https://lore.kernel.org/all/1652191562-18700-1-git-send-email-quic_charante…
Changes in V1:
-- Used the inode->i_ctime->tv_secs as an id appended to inode to create the
unique directory with name <inode_number-time_in_secs>.
-- https://lore.kernel.org/all/1652178212-22383-1-git-send-email-quic_charante…
drivers/dma-buf/dma-buf.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index a6fc96e..0ad5039 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -407,6 +407,7 @@ static inline int is_dma_buf_file(struct file *file)
static struct file *dma_buf_getfile(struct dma_buf *dmabuf, int flags)
{
+ static atomic64_t dmabuf_inode = ATOMIC64_INIT(0);
struct file *file;
struct inode *inode = alloc_anon_inode(dma_buf_mnt->mnt_sb);
@@ -416,6 +417,13 @@ static struct file *dma_buf_getfile(struct dma_buf *dmabuf, int flags)
inode->i_size = dmabuf->size;
inode_set_bytes(inode, dmabuf->size);
+ /*
+ * The ->i_ino acquired from get_next_ino() is not unique thus
+ * not suitable for using it as dentry name by dmabuf stats.
+ * Override ->i_ino with the unique and dmabuffs specific
+ * value.
+ */
+ inode->i_ino = atomic64_add_return(1, &dmabuf_inode);
file = alloc_file_pseudo(inode, dma_buf_mnt, "dmabuf",
flags, &dma_buf_fops);
if (IS_ERR(file))
--
2.7.4
The dmabuf file uses get_next_ino()(through dma_buf_getfile() ->
alloc_anon_inode()) to get an inode number and uses the same as a
directory name under /sys/kernel/dmabuf/buffers/<ino>. This directory is
used to collect the dmabuf stats and it is created through
dma_buf_stats_setup(). At current, failure to create this directory
entry can make the dma_buf_export() to fail.
Now, as the get_next_ino() can definitely give a repetitive inode no
causing the directory entry creation to fail with -EEXIST. This is a
problem on the systems where dmabuf stats functionality is enabled on
the production builds can make the dma_buf_export(), though the dmabuf
memory is allocated successfully, to fail just because it couldn't
create stats entry.
This issue we are able to see on the snapdragon system within 13 days
where there already exists a directory with inode no "122602" so
dma_buf_stats_setup() failed with -EEXIST as it is trying to create
the same directory entry.
To make the dentry name as unique, use the dmabuf fs specific inode
which is based on the simple atomic variable increment. There is tmpfs
subsystem too which relies on its own inode generation rather than
relying on the get_next_ino() for the same reason of avoiding the
duplicate inodes[1].
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/?id=…
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
---
Changes in V3:
-- Used the atomic64 variable to have dmabuf files its own inodes.
-- Ensured no UAPI breakage as suggested by Christian.
Changes in V2:
-- Used the atomic64_t variable to generate a unique_id to be appended to inode
to have an unique directory with name <inode_number-unique_id> -- Suggested by christian
-- Updated the ABI documentation -- Identified by Greg.
-- Massaged the commit log.
-- https://lore.kernel.org/all/1652191562-18700-1-git-send-email-quic_charante…
Changes in V1:
-- Used the inode->i_ctime->tv_secs as an id appended to inode to create the
unique directory with name <inode_number-time_in_secs>.
-- https://lore.kernel.org/all/1652178212-22383-1-git-send-email-quic_charante…
drivers/dma-buf/dma-buf.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index a6fc96e..0ad5039 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -407,6 +407,7 @@ static inline int is_dma_buf_file(struct file *file)
static struct file *dma_buf_getfile(struct dma_buf *dmabuf, int flags)
{
+ static atomic64_t dmabuf_inode = ATOMIC64_INIT(0);
struct file *file;
struct inode *inode = alloc_anon_inode(dma_buf_mnt->mnt_sb);
@@ -416,6 +417,13 @@ static struct file *dma_buf_getfile(struct dma_buf *dmabuf, int flags)
inode->i_size = dmabuf->size;
inode_set_bytes(inode, dmabuf->size);
+ /*
+ * The ->i_ino acquired from get_next_ino() is not unique thus
+ * not suitable for using it as dentry name by dmabuf stats.
+ * Override ->i_ino with the unique and dmabuffs specific
+ * value.
+ */
+ inode->i_ino = atomic64_add_return(1, &dmabuf_inode);
file = alloc_file_pseudo(inode, dma_buf_mnt, "dmabuf",
flags, &dma_buf_fops);
if (IS_ERR(file))
--
2.7.4
The dmabuf file uses get_next_ino()(through dma_buf_getfile() ->
alloc_anon_inode()) to get an inode number and uses the same as a
directory name under /sys/kernel/dmabuf/buffers/<ino>. This directory is
used to collect the dmabuf stats and it is created through
dma_buf_stats_setup(). At current, failure to create this directory
entry can make the dma_buf_export() to fail.
Now, as the get_next_ino() can definitely give a repetitive inode no
causing the directory entry creation to fail with -EEXIST. This is a
problem on the systems where dmabuf stats functionality is enabled on
the production builds can make the dma_buf_export(), though the dmabuf
memory is allocated successfully, to fail just because it couldn't
create stats entry.
This issue we are able to see on the snapdragon system within 13 days
where there already exists a directory with inode no "122602" so
dma_buf_stats_setup() failed with -EEXIST as it is trying to create
the same directory entry.
To make the directory entry as unique, append the unique_id for every
inode. With this change the stats directory entries will be in the
format of: /sys/kernel/dmabuf/buffers/<inode_number-unique_id>.
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
---
Changes in V2:
-- Used the atomic64_t variable to generate a unique_id to be appended to inode
to have an unique directory with name <inode_number-unique_id> -- Suggested by christian
-- Updated the ABI documentation -- Identified by Greg.
-- Massaged the commit log.
Changes in V1:
-- Used the inode->i_ctime->tv_secs as an id appended to inode to create the
unique directory with name <inode_number-time_in_secs>.
-- https://lore.kernel.org/all/1652178212-22383-1-git-send-email-quic_charante…
Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers | 10 +++++-----
drivers/dma-buf/Kconfig | 6 +++---
drivers/dma-buf/dma-buf-sysfs-stats.c | 8 +++++---
3 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers b/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
index 5d3bc99..9fffbd3 100644
--- a/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
+++ b/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
@@ -4,19 +4,19 @@ KernelVersion: v5.13
Contact: Hridya Valsaraju <hridya(a)google.com>
Description: The /sys/kernel/dmabuf/buffers directory contains a
snapshot of the internal state of every DMA-BUF.
- /sys/kernel/dmabuf/buffers/<inode_number> will contain the
- statistics for the DMA-BUF with the unique inode number
- <inode_number>
+ /sys/kernel/dmabuf/buffers/<inode_number-unique_id> will
+ contain the statistics for the DMA-BUF with the unique
+ pair <inode_number-unique_id>
Users: kernel memory tuning/debugging tools
-What: /sys/kernel/dmabuf/buffers/<inode_number>/exporter_name
+What: /sys/kernel/dmabuf/buffers/<inode_number-unique_id>/exporter_name
Date: May 2021
KernelVersion: v5.13
Contact: Hridya Valsaraju <hridya(a)google.com>
Description: This file is read-only and contains the name of the exporter of
the DMA-BUF.
-What: /sys/kernel/dmabuf/buffers/<inode_number>/size
+What: /sys/kernel/dmabuf/buffers/<inode_number-unique_id>/size
Date: May 2021
KernelVersion: v5.13
Contact: Hridya Valsaraju <hridya(a)google.com>
diff --git a/drivers/dma-buf/Kconfig b/drivers/dma-buf/Kconfig
index 541efe0..5bcbdb1 100644
--- a/drivers/dma-buf/Kconfig
+++ b/drivers/dma-buf/Kconfig
@@ -81,9 +81,9 @@ menuconfig DMABUF_SYSFS_STATS
Choose this option to enable DMA-BUF sysfs statistics
in location /sys/kernel/dmabuf/buffers.
- /sys/kernel/dmabuf/buffers/<inode_number> will contain
- statistics for the DMA-BUF with the unique inode number
- <inode_number>.
+ /sys/kernel/dmabuf/buffers/<inode_number-unique_id> will contain
+ statistics for the DMA-BUF with the unique pair
+ <inode_number-unique_id>.
source "drivers/dma-buf/heaps/Kconfig"
diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
index 2bba0ba..29e9e23 100644
--- a/drivers/dma-buf/dma-buf-sysfs-stats.c
+++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
@@ -38,8 +38,8 @@
*
* The following stats are exposed by the interface:
*
- * * ``/sys/kernel/dmabuf/buffers/<inode_number>/exporter_name``
- * * ``/sys/kernel/dmabuf/buffers/<inode_number>/size``
+ * * ``/sys/kernel/dmabuf/buffers/<inode_number-unique_id>/exporter_name``
+ * * ``/sys/kernel/dmabuf/buffers/<inode_number-unique_id>/size``
*
* The information in the interface can also be used to derive per-exporter
* statistics. The data from the interface can be gathered on error conditions
@@ -172,6 +172,7 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
{
struct dma_buf_sysfs_entry *sysfs_entry;
int ret;
+ static atomic64_t unique_id = ATOMIC_INIT(0);
if (!dmabuf || !dmabuf->file)
return -EINVAL;
@@ -192,7 +193,8 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
/* create the directory for buffer stats */
ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
- "%lu", file_inode(dmabuf->file)->i_ino);
+ "%lu-%lu", file_inode(dmabuf->file)->i_ino,
+ atomic64_add_return(1, &unique_id));
if (ret)
goto err_sysfs_dmabuf;
--
2.7.4
The dmabuf file uses get_next_ino()(through dma_buf_getfile() ->
alloc_anon_inode()) to get an inode number and uses the same as a
directory name under /sys/kernel/dmabuf/buffers/<ino>. This directory is
used to collect the dmabuf stats and it is created through
dma_buf_stats_setup(). At current, failure to create this directory
entry can make the dma_buf_export() to fail.
Now, as the get_next_ino() can definitely give a repetitive inode no
causing the directory entry creation to fail with -EEXIST. This is a
problem on the systems where dmabuf stats functionality is enabled on
the production builds can make the dma_buf_export(), though the dmabuf
memory is allocated successfully, to fail just because it couldn't
create stats entry.
This issue we are able to see on the snapdragon system within 13 days
where there already exists a directory with inode no "122602" so
dma_buf_stats_setup() failed with -EEXIST as it is trying to create
the same directory entry.
To make the directory entry as unique, append the inode creation time to
the inode. With this change the stats directory entries will be in the
format of: /sys/kernel/dmabuf/buffers/<inode no>-<inode creation time in
secs>.
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
---
drivers/dma-buf/dma-buf-sysfs-stats.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c
index 2bba0ba..292cb31 100644
--- a/drivers/dma-buf/dma-buf-sysfs-stats.c
+++ b/drivers/dma-buf/dma-buf-sysfs-stats.c
@@ -192,7 +192,8 @@ int dma_buf_stats_setup(struct dma_buf *dmabuf)
/* create the directory for buffer stats */
ret = kobject_init_and_add(&sysfs_entry->kobj, &dma_buf_ktype, NULL,
- "%lu", file_inode(dmabuf->file)->i_ino);
+ "%lu-%lu", file_inode(dmabuf->file)->i_ino,
+ file_inode(dmabuf->file)->i_ctime.tv_sec);
if (ret)
goto err_sysfs_dmabuf;
--
2.7.4
From: Charan Teja Reddy <quic_charante(a)quicinc.com>
When dma_buf_stats_setup() fails, it closes the dmabuf file which
results into the calling of dma_buf_file_release() where it does
list_del(&dmabuf->list_node) with out first adding it to the proper
list. This is resulting into panic in the below path:
__list_del_entry_valid+0x38/0xac
dma_buf_file_release+0x74/0x158
__fput+0xf4/0x428
____fput+0x14/0x24
task_work_run+0x178/0x24c
do_notify_resume+0x194/0x264
work_pending+0xc/0x5f0
Fix it by moving the dma_buf_stats_setup() after dmabuf is added to the
list.
Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs")
Signed-off-by: Charan Teja Reddy <quic_charante(a)quicinc.com>
---
drivers/dma-buf/dma-buf.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 602b12d..a6fc96e 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -543,10 +543,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
file->f_mode |= FMODE_LSEEK;
dmabuf->file = file;
- ret = dma_buf_stats_setup(dmabuf);
- if (ret)
- goto err_sysfs;
-
mutex_init(&dmabuf->lock);
INIT_LIST_HEAD(&dmabuf->attachments);
@@ -554,6 +550,10 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info)
list_add(&dmabuf->list_node, &db_list.head);
mutex_unlock(&db_list.lock);
+ ret = dma_buf_stats_setup(dmabuf);
+ if (ret)
+ goto err_sysfs;
+
return dmabuf;
err_sysfs:
--
2.7.4
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.
Changelog:
v6:
Move documentation into cgroup-v2.rst per Tejun Heo.
Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.
Return error on transfer failure per Carlos Llamas.
v5:
Rebase on top of v5.18-rc3
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.
Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.
Remove all GPU cgroup code except what's necessary to support charge transfer
from dma_buf. Previously charging was done in export, but for non-Android
graphics use-cases this is not ideal since there may be a delay between
allocation and export, during which time there is no accounting.
Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
result of above.
Put the charge and uncharge code in the same file (system_heap_allocate,
system_heap_dma_buf_release) instead of splitting them between the heap and
the dma_buf_release. This avoids asymmetric management of the gpucg charges.
Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.
Support all strings for gpucg_register_bucket instead of just string
literals.
Enforce globally unique gpucg_bucket names.
Constrain gpucg_bucket name lengths to 64 bytes.
Append "-heap" to gpucg_bucket names from dmabuf-heaps.
Drop patch 7 from the series, which changed the types of
binder_transaction_data's sender_pid and sender_euid fields. This was
done in another commit here:
https://lore.kernel.org/all/20220210021129.3386083-4-masahiroy@kernel.org/
Rename:
gpucg_try_charge -> gpucg_charge
find_cg_rpool_locked -> cg_rpool_find_locked
init_cg_rpool -> cg_rpool_init
get_cg_rpool_locked -> cg_rpool_get_locked
"gpu cgroup controller" -> "GPU controller"
gpucg_device -> gpucg_bucket
usage -> size
Tests:
Support both binder_fd_array_object and binder_fd_object. This is
necessary because new versions of Android will use binder_fd_object
instead of binder_fd_array_object, and we need to support both.
Tests for both binder_fd_array_object and binder_fd_object.
For binder_utils return error codes instead of
struct binder{fs}_ctx.
Use ifdef __ANDROID__ to choose platform-dependent temp path instead
of a runtime fallback.
Ensure binderfs_mntpt ends with a trailing '/' character instead of
prepending it where used.
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (3):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
binder: Add flags to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
selftests: Add binder cgroup gpu memory transfer tests
Documentation/admin-guide/cgroup-v2.rst | 24 +
drivers/android/binder.c | 31 +-
drivers/dma-buf/dma-buf.c | 80 ++-
drivers/dma-buf/dma-heap.c | 39 ++
drivers/dma-buf/heaps/system_heap.c | 28 +-
include/linux/cgroup_gpu.h | 137 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 49 +-
include/linux/dma-heap.h | 15 +
include/uapi/linux/android/binder.h | 23 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 386 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 250 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 526 ++++++++++++++++++
18 files changed, 1621 insertions(+), 23 deletions(-)
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.36.0.464.gb9c8b46e94-goog
krealloc_array() ignores attempts to reduce the array size, so the attempt
to save memory is completely pointless here.
Also move testing for the no fence case into sync_file_set_fence(), this
way we don't even touch the fence array when we don't have any fences.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
drivers/dma-buf/sync_file.c | 21 +++++++--------------
1 file changed, 7 insertions(+), 14 deletions(-)
diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 514d213261df..0fe564539166 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -157,9 +157,15 @@ static int sync_file_set_fence(struct sync_file *sync_file,
* we already own a new reference to the fence. For num_fence > 1
* we own the reference of the dma_fence_array creation.
*/
- if (num_fences == 1) {
+
+ if (num_fences == 0) {
+ sync_file->fence = dma_fence_get_stub();
+ kfree(fences);
+
+ } else if (num_fences == 1) {
sync_file->fence = fences[0];
kfree(fences);
+
} else {
array = dma_fence_array_create(num_fences, fences,
dma_fence_context_alloc(1),
@@ -261,19 +267,6 @@ static struct sync_file *sync_file_merge(const char *name, struct sync_file *a,
}
}
- if (index == 0)
- fences[index++] = dma_fence_get_stub();
-
- if (num_fences > index) {
- struct dma_fence **tmp;
-
- /* Keep going even when reducing the size failed */
- tmp = krealloc_array(fences, index, sizeof(*fences),
- GFP_KERNEL);
- if (tmp)
- fences = tmp;
- }
-
if (sync_file_set_fence(sync_file, fences, index) < 0)
goto err_put_fences;
--
2.25.1
'cat /sys/kernel/debug/dma_buf/bufinfo' will print the Dma-buf
Objects' information when the CONFIG_DEBUG_FS=y.
However, the printed table header information does not contain
the name field. So we need to add the name field to the table
header and use the '<none>' to replace the empty buf_obj->name.
Signed-off-by: Yuanzheng Song <songyuanzheng(a)huawei.com>
---
drivers/dma-buf/dma-buf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 79795857be3e..a2f9a1815e38 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -1351,7 +1351,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
return ret;
seq_puts(s, "\nDma-buf Objects:\n");
- seq_printf(s, "%-8s\t%-8s\t%-8s\t%-8s\texp_name\t%-8s\n",
+ seq_printf(s, "%-8s\t%-8s\t%-8s\t%-8s\texp_name\t%-8s\tname\n",
"size", "flags", "mode", "count", "ino");
list_for_each_entry(buf_obj, &db_list.head, list_node) {
@@ -1368,7 +1368,7 @@ static int dma_buf_debug_show(struct seq_file *s, void *unused)
file_count(buf_obj->file),
buf_obj->exp_name,
file_inode(buf_obj->file)->i_ino,
- buf_obj->name ?: "");
+ buf_obj->name ?: "<none>");
spin_unlock(&buf_obj->name_lock);
dma_resv_describe(buf_obj->resv, s);
--
2.25.1
It will cause null-ptr-deref when using 'res', if platform_get_resource()
returns NULL, so move using 'res' after devm_ioremap_resource() that
will check it to avoid null-ptr-deref.
And use devm_platform_get_and_ioremap_resource() to simplify code.
Fixes: ec4ba01e894d ("mtd: rawnand: Add new Cadence NAND driver to MTD subsystem")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
---
drivers/mtd/nand/raw/cadence-nand-controller.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/nand/raw/cadence-nand-controller.c b/drivers/mtd/nand/raw/cadence-nand-controller.c
index 7eec60ea9056..0d72672f8b64 100644
--- a/drivers/mtd/nand/raw/cadence-nand-controller.c
+++ b/drivers/mtd/nand/raw/cadence-nand-controller.c
@@ -2983,11 +2983,10 @@ static int cadence_nand_dt_probe(struct platform_device *ofdev)
if (IS_ERR(cdns_ctrl->reg))
return PTR_ERR(cdns_ctrl->reg);
- res = platform_get_resource(ofdev, IORESOURCE_MEM, 1);
- cdns_ctrl->io.dma = res->start;
- cdns_ctrl->io.virt = devm_ioremap_resource(&ofdev->dev, res);
+ cdns_ctrl->io.virt = devm_platform_get_and_ioremap_resource(ofdev, 1, &res);
if (IS_ERR(cdns_ctrl->io.virt))
return PTR_ERR(cdns_ctrl->io.virt);
+ cdns_ctrl->io.dma = res->start;
dt->clk = devm_clk_get(cdns_ctrl->dev, "nf_clk");
if (IS_ERR(dt->clk))
--
2.25.1