On Wed, Dec 02, 2020 at 07:51:07PM +0100, Michal Hocko wrote:
> On Wed 02-12-20 09:54:29, Minchan Kim wrote:
> > On Wed, Dec 02, 2020 at 05:48:34PM +0100, Michal Hocko wrote:
> > > On Wed 02-12-20 08:15:49, Minchan Kim wrote:
> > > > On Wed, Dec 02, 2020 at 04:49:15PM +0100, Michal Hocko wrote:
> > > [...]
> > > > > Well, what I can see is that this new interface is an antipatern to our
> > > > > allocation routines. We tend to control allocations by gfp mask yet you
> > > > > are introducing a bool parameter to make something faster... What that
> > > > > really means is rather arbitrary. Would it make more sense to teach
> > > > > cma_alloc resp. alloc_contig_range to recognize GFP_NOWAIT, GFP_NORETRY resp.
> > > > > GFP_RETRY_MAYFAIL instead?
> > > >
> > > > If we use cma_alloc, that interface requires "allocate one big memory
> > > > chunk". IOW, return value is just struct page and expected that the page
> > > > is a big contiguos memory. That means it couldn't have a hole in the
> > > > range.
> > > > However the idea here, what we asked is much smaller chunk rather
> > > > than a big contiguous memory so we could skip some of pages if they are
> > > > randomly pinned(long-term/short-term whatever) and search other pages
> > > > in the CMA area to avoid long stall. Thus, it couldn't work with exising
> > > > cma_alloc API with simple gfp_mak.
> > >
> > > I really do not see that as something really alient to the cma_alloc
> > > interface. All you should care about, really, is what size of the object
> > > you want and how hard the system should try. If you have a problem with
> > > an internal implementation of CMA and how it chooses a range and deal
> > > with pinned pages then it should be addressed inside the CMA allocator.
> > > I suspect that you are effectivelly trying to workaround those problems
> > > by a side implementation with a slightly different API. Or maybe I still
> > > do not follow the actual problem.
> > >
> > > > > I am not deeply familiar with the cma allocator so sorry for a
> > > > > potentially stupid question. Why does a bulk interface performs better
> > > > > than repeated calls to cma_alloc? Is this because a failure would help
> > > > > to move on to the next pfn range while a repeated call would have to
> > > > > deal with the same range?
> > > >
> > > > Yub, true with other overheads(e.g., migration retrial, waiting writeback
> > > > PCP/LRU draining IPI)
> > >
> > > Why cannot this be implemented in the cma_alloc layer? I mean you can
> > > cache failed cases and optimize the proper pfn range search.
> >
> > So do you suggest this?
> >
> > enum cma_alloc_mode {
> > CMA_ALLOC_NORMAL,
> > CMA_ALLOC_FAIL_FAST,
> > };
> >
> > struct page *cma_alloc(struct cma *cma, size_t count, unsigned int
> > align, enum cma_alloc_mode mode);
> >
> > From now on, cma_alloc will keep last failed pfn and then start to
> > search from the next pfn for both CMA_ALLOC_NORMAL and
> > CMA_ALLOC_FAIL_FAST if requested size from the cached pfn is okay
> > within CMA area and then wraparound it couldn't find right pages
> > from the cached pfn. Othewise, the cached pfn will reset to the zero
> > so that it starts the search from the 0. I like the idea since it's
> > general improvement, I think.
>
> Yes something like that. There are more options to be clever here - e.g.
> track ranges etc. but I am not sure this is worth the complexity.
Agreed. Just caching the last pfn would be good enough as a simple start.
>
> > Furthemore, With CMA_ALLOC_FAIL_FAST, it could avoid several overheads
> > at the cost of sacrificing allocation success ratio like GFP_NORETRY.
>
> I am still not sure a specific flag is a good interface. Really can this
> be gfp_mask instead?
I don't feel strongly about it (I even did it with GFP_NORETRY before), but
David wanted a special mode, and I agreed when he mentioned
ALLOC_CONTIG_HARD as one of the options in the future (it would be hard to
indicate that mode with gfp flags).
>
> > I think that would solve the issue with making the API more flexible.
> > Before diving into it, I'd like to confirm we are on same page.
> > Please correct me if I misunderstood.
>
> I am not sure you are still thinking about a bulk interface.
No, I am thinking of just using the cma_alloc API, with the cached pfn
as an internal improvement, and adding a new fail-fast mode to the API
so that a driver could call the API repeatedly until it gets enough
pages.
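
A minimal user-space sketch of that calling pattern, not kernel code. The
enum mirrors the one proposed in this thread; everything else (toy_area,
toy_alloc, gather_chunks, the chunk states) is a hypothetical stand-in, and
"migration" in normal mode is modelled as simply taking a busy chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cma_alloc_mode {            /* mirrors the enum proposed in the thread */
    CMA_ALLOC_NORMAL,
    CMA_ALLOC_FAIL_FAST,
};

#define NR_CHUNKS 8

/* Hypothetical toy area: each chunk is free, temporarily busy, or taken. */
enum chunk_state { CHUNK_FREE, CHUNK_BUSY, CHUNK_TAKEN };

struct toy_area {
    enum chunk_state chunk[NR_CHUNKS];
    size_t next;                 /* cached position: resume the scan here */
};

/*
 * Stand-in for cma_alloc() with a mode argument. Scans once around the area
 * starting at the cached position. FAIL_FAST skips busy chunks; NORMAL
 * "migrates" them (modelled as taking the chunk), which costs time in reality.
 * Returns the chunk index, or -1 if nothing could be allocated.
 */
static int toy_alloc(struct toy_area *a, enum cma_alloc_mode mode)
{
    for (size_t tried = 0; tried < NR_CHUNKS; tried++) {
        size_t i = (a->next + tried) % NR_CHUNKS;
        bool ok = a->chunk[i] == CHUNK_FREE ||
                  (a->chunk[i] == CHUNK_BUSY && mode == CMA_ALLOC_NORMAL);
        if (ok) {
            a->chunk[i] = CHUNK_TAKEN;
            a->next = (i + 1) % NR_CHUNKS;
            return (int)i;
        }
    }
    return -1;
}

/* Driver side: gather up to `want` chunks, cheap attempts first. */
static size_t gather_chunks(struct toy_area *a, size_t want, int *out)
{
    size_t got = 0;
    int idx;

    assert(out != NULL);
    while (got < want && (idx = toy_alloc(a, CMA_ALLOC_FAIL_FAST)) >= 0)
        out[got++] = idx;
    /* Only pay for the expensive mode if the fast passes fell short. */
    while (got < want && (idx = toy_alloc(a, CMA_ALLOC_NORMAL)) >= 0)
        out[got++] = idx;
    return got;
}
```

The fallback to the normal mode follows the patch description ("call it
with false to increase success ratio"); whether a real driver would want
that fallback is a policy choice left to the caller.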
On Wed, Dec 02, 2020 at 05:48:34PM +0100, Michal Hocko wrote:
> On Wed 02-12-20 08:15:49, Minchan Kim wrote:
> > On Wed, Dec 02, 2020 at 04:49:15PM +0100, Michal Hocko wrote:
> [...]
> > > Well, what I can see is that this new interface is an antipatern to our
> > > allocation routines. We tend to control allocations by gfp mask yet you
> > > are introducing a bool parameter to make something faster... What that
> > > really means is rather arbitrary. Would it make more sense to teach
> > > cma_alloc resp. alloc_contig_range to recognize GFP_NOWAIT, GFP_NORETRY resp.
> > > GFP_RETRY_MAYFAIL instead?
> >
> > If we use cma_alloc, that interface requires "allocate one big memory
> > chunk". IOW, return value is just struct page and expected that the page
> > is a big contiguos memory. That means it couldn't have a hole in the
> > range.
> > However the idea here, what we asked is much smaller chunk rather
> > than a big contiguous memory so we could skip some of pages if they are
> > randomly pinned(long-term/short-term whatever) and search other pages
> > in the CMA area to avoid long stall. Thus, it couldn't work with exising
> > cma_alloc API with simple gfp_mak.
>
> I really do not see that as something really alient to the cma_alloc
> interface. All you should care about, really, is what size of the object
> you want and how hard the system should try. If you have a problem with
> an internal implementation of CMA and how it chooses a range and deal
> with pinned pages then it should be addressed inside the CMA allocator.
> I suspect that you are effectivelly trying to workaround those problems
> by a side implementation with a slightly different API. Or maybe I still
> do not follow the actual problem.
>
> > > I am not deeply familiar with the cma allocator so sorry for a
> > > potentially stupid question. Why does a bulk interface performs better
> > > than repeated calls to cma_alloc? Is this because a failure would help
> > > to move on to the next pfn range while a repeated call would have to
> > > deal with the same range?
> >
> > Yub, true with other overheads(e.g., migration retrial, waiting writeback
> > PCP/LRU draining IPI)
>
> Why cannot this be implemented in the cma_alloc layer? I mean you can
> cache failed cases and optimize the proper pfn range search.
So do you suggest this?

    enum cma_alloc_mode {
            CMA_ALLOC_NORMAL,
            CMA_ALLOC_FAIL_FAST,
    };

    struct page *cma_alloc(struct cma *cma, size_t count,
                           unsigned int align, enum cma_alloc_mode mode);

From now on, cma_alloc will remember the last failed pfn and start the
search from the next pfn, for both CMA_ALLOC_NORMAL and
CMA_ALLOC_FAIL_FAST, as long as the requested size still fits within the
CMA area from the cached pfn; it wraps around if it can't find suitable
pages from the cached pfn onward. Otherwise, the cached pfn is reset to
zero so that the search starts from 0. I like the idea since it's a
general improvement, I think.

Furthermore, with CMA_ALLOC_FAIL_FAST, it could avoid several overheads
at the cost of a lower allocation success ratio, like GFP_NORETRY.

I think that would solve the issue while making the API more flexible.
Before diving into it, I'd like to confirm we are on the same page.
Please correct me if I misunderstood.
David, any objection?
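
The cached-pfn search described above could look roughly like the following
user-space sketch. It is only an illustration under assumptions: toy_cma,
range_free and toy_cma_alloc are hypothetical names, the area is a plain
array of page flags, and advancing the cache past a successful allocation
is an extra choice that the description above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AREA_PAGES 64            /* size of the toy CMA area, in pages */

/* Toy stand-in for a CMA area; all names here are hypothetical. */
struct toy_cma {
    bool busy[AREA_PAGES];       /* true: page is pinned or already allocated */
    size_t cached_pfn;           /* where the next search starts */
};

/* Return true if [start, start + count) is entirely free. */
static bool range_free(const struct toy_cma *cma, size_t start, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (cma->busy[start + i])
            return false;
    return true;
}

/*
 * Allocate `count` contiguous pages. The search starts at the cached pfn
 * and wraps around once; returns the first pfn, or -1 on failure.
 */
static long toy_cma_alloc(struct toy_cma *cma, size_t count)
{
    assert(count > 0 && count <= AREA_PAGES);

    /* Reset the cache if the request no longer fits after it. */
    if (cma->cached_pfn + count > AREA_PAGES)
        cma->cached_pfn = 0;

    size_t start = cma->cached_pfn;
    size_t nr_slots = AREA_PAGES - count + 1;   /* valid start positions */

    for (size_t tried = 0; tried < nr_slots; tried++) {
        size_t pfn = (start + tried) % nr_slots;

        if (range_free(cma, pfn, count)) {
            for (size_t i = 0; i < count; i++)
                cma->busy[pfn + i] = true;
            cma->cached_pfn = pfn + count;      /* assumption: skip what we took */
            return (long)pfn;
        }
        /* Remember the failure so the next call skips this position. */
        cma->cached_pfn = pfn + 1;
    }
    return -1;
}
```

With page 2 pinned, a 4-page request fails at pfns 0, 1 and 2 and succeeds
at pfn 3; the next request then starts beyond that range instead of
re-scanning the pinned page.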
On Wed, Dec 02, 2020 at 04:49:15PM +0100, Michal Hocko wrote:
> On Wed 02-12-20 10:14:41, David Hildenbrand wrote:
> > On 01.12.20 18:51, Minchan Kim wrote:
> > > There is a need for special HW to require bulk allocation of
> > > high-order pages. For example, 4800 * order-4 pages, which
> > > would be minimum, sometimes, it requires more.
> > >
> > > To meet the requirement, a option reserves 300M CMA area and
> > > requests the whole 300M contiguous memory. However, it doesn't
> > > work if even one of those pages in the range is long-term pinned
> > > directly or indirectly. The other option is to ask higher-order
> >
> > My latest knowledge is that pages in the CMA area are never long term
> > pinned.
> >
> > https://lore.kernel.org/lkml/20201123090129.GD27488@dhcp22.suse.cz/
> >
> > "gup already tries to deal with long term pins on CMA regions and migrate
> > to a non CMA region. Have a look at __gup_longterm_locked."
> >
> > We should rather identify ways how that is still possible and get rid of
> > them.
> >
> >
> > Now, short-term pinnings and PCP are other issues where
> > alloc_contig_range() could be improved (e.g., in contrast to a FAST
> > mode, a HARD mode which temporarily disables the PCP, ...).
>
> Agreed!
>
> > > size (e.g., 2M) than requested order(64K) repeatedly until driver
> > > could gather necessary amount of memory. Basically, this approach
> > > makes the allocation very slow due to cma_alloc's function
> > > slowness and it could be stuck on one of the pageblocks if it
> > > encounters unmigratable page.
> > >
> > > To solve the issue, this patch introduces cma_alloc_bulk.
> > >
> > > int cma_alloc_bulk(struct cma *cma, unsigned int align,
> > > bool fast, unsigned int order, size_t nr_requests,
> > > struct page **page_array, size_t *nr_allocated);
> > >
> > > Most parameters are same with cma_alloc but it additionally passes
> > > vector array to store allocated memory. What's different with cma_alloc
> > > is it will skip pageblocks without waiting/stopping if it has unmovable
> > > page so that API continues to scan other pageblocks to find requested
> > > order page.
> > >
> > > cma_alloc_bulk is best effort approach in that it skips some pageblocks
> > > if they have unmovable pages unlike cma_alloc. It doesn't need to be
> > > perfect from the beginning at the cost of performance. Thus, the API
> > > takes "bool fast parameter" which is propagated into alloc_contig_range to
> > > avoid significat overhead functions to inrecase CMA allocation success
> > > ratio(e.g., migration retrial, PCP, LRU draining per pageblock)
> > > at the cost of less allocation success ratio. If the caller couldn't
> > > allocate enough, they could call it with "false" to increase success ratio
> > > if they are okay to expense the overhead for the success ratio.
> >
> > Just so I understand what the idea is:
> >
> > alloc_contig_range() sometimes fails on CMA regions when trying to
> > allocate big chunks (e.g., 300M). Instead of tackling that issue, you
> > rather allocate plenty of small chunks, and make these small allocations
> > fail faster/ make the allocations less reliable. Correct?
> >
> > I don't really have a strong opinion on that. Giving up fast rather than
> > trying for longer sounds like a useful thing to have - but I wonder if
> > it's strictly necessary for the use case you describe.
> >
> > I'd like to hear Michals opinion on that.
>
> Well, what I can see is that this new interface is an antipatern to our
> allocation routines. We tend to control allocations by gfp mask yet you
> are introducing a bool parameter to make something faster... What that
> really means is rather arbitrary. Would it make more sense to teach
> cma_alloc resp. alloc_contig_range to recognize GFP_NOWAIT, GFP_NORETRY resp.
> GFP_RETRY_MAYFAIL instead?
If we use cma_alloc, that interface requires "allocate one big memory
chunk". IOW, the return value is just a struct page, and the page is expected
to be one big contiguous memory region. That means it can't have a hole in
the range. However, the idea here is that what we ask for is a much smaller
chunk rather than one big contiguous region, so we can skip some pages if
they are randomly pinned (long-term/short-term, whatever) and search other
pages in the CMA area to avoid a long stall. Thus, it can't work with the
existing cma_alloc API and a simple gfp_mask.
>
> I am not deeply familiar with the cma allocator so sorry for a
> potentially stupid question. Why does a bulk interface performs better
> than repeated calls to cma_alloc? Is this because a failure would help
> to move on to the next pfn range while a repeated call would have to
> deal with the same range?
Yup, true, along with other overheads (e.g., migration retries, waiting for
writeback, PCP/LRU draining IPIs).
>
> > > Signed-off-by: Minchan Kim <minchan(a)kernel.org>
> > > ---
> > > include/linux/cma.h | 5 ++
> > > include/linux/gfp.h | 2 +
> > > mm/cma.c | 126 ++++++++++++++++++++++++++++++++++++++++++--
> > > mm/page_alloc.c | 19 ++++---
> > > 4 files changed, 140 insertions(+), 12 deletions(-)
> > >
> --
> Michal Hocko
> SUSE Labs
On 02.12.20 16:49, Michal Hocko wrote:
> On Wed 02-12-20 10:14:41, David Hildenbrand wrote:
>> On 01.12.20 18:51, Minchan Kim wrote:
>>> There is a need for special HW to require bulk allocation of
>>> high-order pages. For example, 4800 * order-4 pages, which
>>> would be minimum, sometimes, it requires more.
>>>
>>> To meet the requirement, a option reserves 300M CMA area and
>>> requests the whole 300M contiguous memory. However, it doesn't
>>> work if even one of those pages in the range is long-term pinned
>>> directly or indirectly. The other option is to ask higher-order
>>
>> My latest knowledge is that pages in the CMA area are never long term
>> pinned.
>>
>> https://lore.kernel.org/lkml/20201123090129.GD27488@dhcp22.suse.cz/
>>
>> "gup already tries to deal with long term pins on CMA regions and migrate
>> to a non CMA region. Have a look at __gup_longterm_locked."
>>
>> We should rather identify ways how that is still possible and get rid of
>> them.
>>
>>
>> Now, short-term pinnings and PCP are other issues where
>> alloc_contig_range() could be improved (e.g., in contrast to a FAST
>> mode, a HARD mode which temporarily disables the PCP, ...).
>
> Agreed!
>
>>> size (e.g., 2M) than requested order(64K) repeatedly until driver
>>> could gather necessary amount of memory. Basically, this approach
>>> makes the allocation very slow due to cma_alloc's function
>>> slowness and it could be stuck on one of the pageblocks if it
>>> encounters unmigratable page.
>>>
>>> To solve the issue, this patch introduces cma_alloc_bulk.
>>>
>>> int cma_alloc_bulk(struct cma *cma, unsigned int align,
>>> bool fast, unsigned int order, size_t nr_requests,
>>> struct page **page_array, size_t *nr_allocated);
>>>
>>> Most parameters are same with cma_alloc but it additionally passes
>>> vector array to store allocated memory. What's different with cma_alloc
>>> is it will skip pageblocks without waiting/stopping if it has unmovable
>>> page so that API continues to scan other pageblocks to find requested
>>> order page.
>>>
>>> cma_alloc_bulk is best effort approach in that it skips some pageblocks
>>> if they have unmovable pages unlike cma_alloc. It doesn't need to be
>>> perfect from the beginning at the cost of performance. Thus, the API
>>> takes "bool fast parameter" which is propagated into alloc_contig_range to
>>> avoid significat overhead functions to inrecase CMA allocation success
>>> ratio(e.g., migration retrial, PCP, LRU draining per pageblock)
>>> at the cost of less allocation success ratio. If the caller couldn't
>>> allocate enough, they could call it with "false" to increase success ratio
>>> if they are okay to expense the overhead for the success ratio.
>>
>> Just so I understand what the idea is:
>>
>> alloc_contig_range() sometimes fails on CMA regions when trying to
>> allocate big chunks (e.g., 300M). Instead of tackling that issue, you
>> rather allocate plenty of small chunks, and make these small allocations
>> fail faster/ make the allocations less reliable. Correct?
>>
>> I don't really have a strong opinion on that. Giving up fast rather than
>> trying for longer sounds like a useful thing to have - but I wonder if
>> it's strictly necessary for the use case you describe.
>>
>> I'd like to hear Michals opinion on that.
>
> Well, what I can see is that this new interface is an antipatern to our
> allocation routines. We tend to control allocations by gfp mask yet you
> are introducing a bool parameter to make something faster... What that
> really means is rather arbitrary. Would it make more sense to teach
> cma_alloc resp. alloc_contig_range to recognize GFP_NOWAIT, GFP_NORETRY resp.
> GFP_RETRY_MAYFAIL instead?
Minchan did that before, but I disliked gluing things like "don't drain
lru, don't drain pcp" to GFP_NORETRY and shifting responsibility to the
user.
--
Thanks,
David / dhildenb
This patchset introduces a new dma heap, the chunk heap, which makes it
easy to perform bulk allocation of high-order pages.
It was created to help optimize 4K/8K HDR video playback with secure
DRM HW that protects content in memory. The HW needs physically
contiguous memory chunks, up to several hundred MB of memory.
This patchset is against next-20201130.
The patchset includes the following:
- cma_alloc_bulk API
- export dma-heap API to register kernel module dma heap.
- add chunk heap implementation.
* Since v1 -
https://lore.kernel.org/linux-mm/20201117181935.3613581-1-minchan@kernel.or…
* introduce alloc_contig_mode - David
* use default CMA instead of device tree - John
Hyesoo Yu (2):
dma-buf: add export symbol for dma-heap
dma-buf: heaps: add chunk heap to dmabuf heaps
Minchan Kim (2):
mm: introduce alloc_contig_mode
mm: introduce cma_alloc_bulk API
drivers/dma-buf/dma-heap.c | 2 +
drivers/dma-buf/heaps/Kconfig | 15 +
drivers/dma-buf/heaps/Makefile | 1 +
drivers/dma-buf/heaps/chunk_heap.c | 429 +++++++++++++++++++++++++++++
drivers/virtio/virtio_mem.c | 2 +-
include/linux/cma.h | 5 +
include/linux/gfp.h | 10 +-
kernel/dma/contiguous.c | 1 +
mm/cma.c | 134 ++++++++-
mm/page_alloc.c | 25 +-
10 files changed, 607 insertions(+), 17 deletions(-)
create mode 100644 drivers/dma-buf/heaps/chunk_heap.c
--
2.29.2.454.gaff20da3a2-goog
On Tue, Nov 24, 2020 at 2:45 PM Lee Jones <lee.jones(a)linaro.org> wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:403: warning: Function parameter or member 'job' not described in 'sdma_v5_0_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:403: warning: Function parameter or member 'flags' not described in 'sdma_v5_0_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:480: warning: Function parameter or member 'addr' not described in 'sdma_v5_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:480: warning: Function parameter or member 'seq' not described in 'sdma_v5_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:480: warning: Function parameter or member 'flags' not described in 'sdma_v5_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:480: warning: Excess function parameter 'fence' description in 'sdma_v5_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:967: warning: Function parameter or member 'timeout' not described in 'sdma_v5_0_ring_test_ib'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1074: warning: Function parameter or member 'value' not described in 'sdma_v5_0_vm_write_pte'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1074: warning: Excess function parameter 'addr' description in 'sdma_v5_0_vm_write_pte'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1074: warning: Excess function parameter 'flags' description in 'sdma_v5_0_vm_write_pte'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1126: warning: Function parameter or member 'ring' not described in 'sdma_v5_0_ring_pad_ib'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1180: warning: Function parameter or member 'vmid' not described in 'sdma_v5_0_ring_emit_vm_flush'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1180: warning: Function parameter or member 'pd_addr' not described in 'sdma_v5_0_ring_emit_vm_flush'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1180: warning: Excess function parameter 'vm' description in 'sdma_v5_0_ring_emit_vm_flush'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1703: warning: Function parameter or member 'ib' not described in 'sdma_v5_0_emit_copy_buffer'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1703: warning: Function parameter or member 'tmz' not described in 'sdma_v5_0_emit_copy_buffer'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1703: warning: Excess function parameter 'ring' description in 'sdma_v5_0_emit_copy_buffer'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1729: warning: Function parameter or member 'ib' not described in 'sdma_v5_0_emit_fill_buffer'
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c:1729: warning: Excess function parameter 'ring' description in 'sdma_v5_0_emit_fill_buffer'
>
> Cc: Alex Deucher <alexander.deucher(a)amd.com>
> Cc: "Christian König" <christian.koenig(a)amd.com>
> Cc: David Airlie <airlied(a)linux.ie>
> Cc: Daniel Vetter <daniel(a)ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> Cc: amd-gfx(a)lists.freedesktop.org
> Cc: dri-devel(a)lists.freedesktop.org
> Cc: linux-media(a)vger.kernel.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
Applied with minor fixes. Thanks!
Alex
> ---
> drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> index 9c72b95b74639..5180a52a79a54 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c
> @@ -392,7 +392,9 @@ static void sdma_v5_0_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count)
> * sdma_v5_0_ring_emit_ib - Schedule an IB on the DMA engine
> *
> * @ring: amdgpu ring pointer
> + * @job: job to retrive vmid from
> * @ib: IB object to schedule
> + * @flags: unused
> *
> * Schedule an IB in the DMA ring (NAVI10).
> */
> @@ -469,7 +471,9 @@ static void sdma_v5_0_ring_emit_hdp_flush(struct amdgpu_ring *ring)
> * sdma_v5_0_ring_emit_fence - emit a fence on the DMA ring
> *
> * @ring: amdgpu ring pointer
> - * @fence: amdgpu fence object
> + * @addr: address
> + * @seq: sequence number
> + * @flags: fence related flags
> *
> * Add a DMA fence packet to the ring to write
> * the fence seq number and DMA trap packet to generate
> @@ -959,6 +963,7 @@ static int sdma_v5_0_ring_test_ring(struct amdgpu_ring *ring)
> * sdma_v5_0_ring_test_ib - test an IB on the DMA engine
> *
> * @ring: amdgpu_ring structure holding ring information
> + * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
> *
> * Test a simple IB in the DMA ring (NAVI10).
> * Returns 0 on success, error on failure.
> @@ -1061,10 +1066,9 @@ static void sdma_v5_0_vm_copy_pte(struct amdgpu_ib *ib,
> *
> * @ib: indirect buffer to fill with commands
> * @pe: addr of the page entry
> - * @addr: dst addr to write into pe
> + * @value: dst addr to write into pe
> * @count: number of page entries to update
> * @incr: increase next addr by incr bytes
> - * @flags: access flags
> *
> * Update PTEs by writing them manually using sDMA (NAVI10).
> */
> @@ -1118,6 +1122,7 @@ static void sdma_v5_0_vm_set_pte_pde(struct amdgpu_ib *ib,
>
> /**
> * sdma_v5_0_ring_pad_ib - pad the IB
> + * @ring: amdgpu_ring structure holding ring information
> * @ib: indirect buffer to fill with padding
> *
> * Pad the IB with NOPs to a boundary multiple of 8.
> @@ -1170,7 +1175,8 @@ static void sdma_v5_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
> * sdma_v5_0_ring_emit_vm_flush - vm flush using sDMA
> *
> * @ring: amdgpu_ring pointer
> - * @vm: amdgpu_vm pointer
> + * @vmid: vmid number to use
> + * @pd_addr: address
> *
> * Update the page table base and flush the VM TLB
> * using sDMA (NAVI10).
> @@ -1686,10 +1692,11 @@ static void sdma_v5_0_set_irq_funcs(struct amdgpu_device *adev)
> /**
> * sdma_v5_0_emit_copy_buffer - copy buffer using the sDMA engine
> *
> - * @ring: amdgpu_ring structure holding ring information
> + * @ib: indirect buffer to copy to
> * @src_offset: src GPU address
> * @dst_offset: dst GPU address
> * @byte_count: number of bytes to xfer
> + * @tmz: if a secure copy should be used
> *
> * Copy GPU buffers using the DMA engine (NAVI10).
> * Used by the amdgpu ttm implementation to move pages if
> @@ -1715,7 +1722,7 @@ static void sdma_v5_0_emit_copy_buffer(struct amdgpu_ib *ib,
> /**
> * sdma_v5_0_emit_fill_buffer - fill buffer using the sDMA engine
> *
> - * @ring: amdgpu_ring structure holding ring information
> + * @ib: indirect buffer to fill
> * @src_data: value to write to buffer
> * @dst_offset: dst GPU address
> * @byte_count: number of bytes to xfer
> --
> 2.25.1
>
> _______________________________________________
> dri-devel mailing list
> dri-devel(a)lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Nov 24, 2020 at 2:45 PM Lee Jones <lee.jones(a)linaro.org> wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:219: warning: Function parameter or member 'bo' not described in 'uvd_v7_0_enc_get_create_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:219: warning: Excess function parameter 'adev' description in 'uvd_v7_0_enc_get_create_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:282: warning: Function parameter or member 'bo' not described in 'uvd_v7_0_enc_get_destroy_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:282: warning: Excess function parameter 'adev' description in 'uvd_v7_0_enc_get_destroy_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:339: warning: Function parameter or member 'timeout' not described in 'uvd_v7_0_enc_ring_test_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:527: warning: Function parameter or member 'handle' not described in 'uvd_v7_0_hw_init'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:527: warning: Excess function parameter 'adev' description in 'uvd_v7_0_hw_init'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:605: warning: Function parameter or member 'handle' not described in 'uvd_v7_0_hw_fini'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:605: warning: Excess function parameter 'adev' description in 'uvd_v7_0_hw_fini'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Function parameter or member 'addr' not described in 'uvd_v7_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Function parameter or member 'seq' not described in 'uvd_v7_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Function parameter or member 'flags' not described in 'uvd_v7_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1156: warning: Excess function parameter 'fence' description in 'uvd_v7_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Function parameter or member 'addr' not described in 'uvd_v7_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Function parameter or member 'seq' not described in 'uvd_v7_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Function parameter or member 'flags' not described in 'uvd_v7_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1195: warning: Excess function parameter 'fence' description in 'uvd_v7_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1293: warning: Function parameter or member 'job' not described in 'uvd_v7_0_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1293: warning: Function parameter or member 'flags' not described in 'uvd_v7_0_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1324: warning: Function parameter or member 'job' not described in 'uvd_v7_0_enc_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c:1324: warning: Function parameter or member 'flags' not described in 'uvd_v7_0_enc_ring_emit_ib'
>
> Cc: Alex Deucher <alexander.deucher(a)amd.com>
> Cc: "Christian König" <christian.koenig(a)amd.com>
> Cc: David Airlie <airlied(a)linux.ie>
> Cc: Daniel Vetter <daniel(a)ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> Cc: amd-gfx(a)lists.freedesktop.org
> Cc: dri-devel(a)lists.freedesktop.org
> Cc: linux-media(a)vger.kernel.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
Applied with minor fixes. Thanks!
Alex
> ---
> drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> index b44c8677ce8d5..9911ff80a6776 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c
> @@ -206,9 +206,9 @@ static int uvd_v7_0_enc_ring_test_ring(struct amdgpu_ring *ring)
> /**
> * uvd_v7_0_enc_get_create_msg - generate a UVD ENC create msg
> *
> - * @adev: amdgpu_device pointer
> * @ring: ring we should submit the msg to
> * @handle: session handle to use
> + * @bo: amdgpu object for which we query the offset
> * @fence: optional fence to return
> *
> * Open up a stream for HW test
> @@ -269,9 +269,9 @@ static int uvd_v7_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
> /**
> * uvd_v7_0_enc_get_destroy_msg - generate a UVD ENC destroy msg
> *
> - * @adev: amdgpu_device pointer
> * @ring: ring we should submit the msg to
> * @handle: session handle to use
> + * @bo: amdgpu object for which we query the offset
> * @fence: optional fence to return
> *
> * Close up a stream for HW test or if userspace failed to do so
> @@ -333,6 +333,7 @@ static int uvd_v7_0_enc_get_destroy_msg(struct amdgpu_ring *ring, uint32_t handl
> * uvd_v7_0_enc_ring_test_ib - test if UVD ENC IBs are working
> *
> * @ring: the engine to test on
> + * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
> *
> */
> static int uvd_v7_0_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout)
> @@ -519,7 +520,7 @@ static int uvd_v7_0_sw_fini(void *handle)
> /**
> * uvd_v7_0_hw_init - start and test UVD block
> *
> - * @adev: amdgpu_device pointer
> + * @handle: handle used to pass amdgpu_device pointer
> *
> * Initialize the hardware, boot up the VCPU and do some testing
> */
> @@ -597,7 +598,7 @@ static int uvd_v7_0_hw_init(void *handle)
> /**
> * uvd_v7_0_hw_fini - stop the hardware block
> *
> - * @adev: amdgpu_device pointer
> + * @handle: handle used to pass amdgpu_device pointer
> *
> * Stop the UVD block, mark ring as not ready any more
> */
> @@ -1147,7 +1148,9 @@ static void uvd_v7_0_stop(struct amdgpu_device *adev)
> * uvd_v7_0_ring_emit_fence - emit an fence & trap command
> *
> * @ring: amdgpu_ring pointer
> - * @fence: fence to emit
> + * @addr: address
> + * @seq: sequence number
> + * @flags: fence related flags
> *
> * Write a fence and a trap command to the ring.
> */
> @@ -1186,7 +1189,9 @@ static void uvd_v7_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr, u64 seq
> * uvd_v7_0_enc_ring_emit_fence - emit an enc fence & trap command
> *
> * @ring: amdgpu_ring pointer
> - * @fence: fence to emit
> + * @addr: address
> + * @seq: sequence number
> + * @flags: fence related flags
> *
> * Write enc a fence and a trap command to the ring.
> */
> @@ -1282,7 +1287,9 @@ static int uvd_v7_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
> * uvd_v7_0_ring_emit_ib - execute indirect buffer
> *
> * @ring: amdgpu_ring pointer
> + * @job: job to retrive vmid from
> * @ib: indirect buffer to execute
> + * @flags: unused
> *
> * Write ring commands to execute the indirect buffer
> */
> @@ -1313,7 +1320,9 @@ static void uvd_v7_0_ring_emit_ib(struct amdgpu_ring *ring,
> * uvd_v7_0_enc_ring_emit_ib - enc execute indirect buffer
> *
> * @ring: amdgpu_ring pointer
> + * @job: job to retrive vmid from
> * @ib: indirect buffer to execute
> + * @flags: unused
> *
> * Write enc ring commands to execute the indirect buffer
> */
> --
> 2.25.1
On Tue, Nov 24, 2020 at 2:44 PM Lee Jones <lee.jones(a)linaro.org> wrote:
>
> Fixes the following W=1 kernel build warning(s):
>
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:211: warning: Function parameter or member 'bo' not described in 'uvd_v6_0_enc_get_create_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:211: warning: Excess function parameter 'adev' description in 'uvd_v6_0_enc_get_create_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:275: warning: Function parameter or member 'bo' not described in 'uvd_v6_0_enc_get_destroy_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:275: warning: Excess function parameter 'adev' description in 'uvd_v6_0_enc_get_destroy_msg'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:332: warning: Function parameter or member 'timeout' not described in 'uvd_v6_0_enc_ring_test_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:472: warning: Function parameter or member 'handle' not described in 'uvd_v6_0_hw_init'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:472: warning: Excess function parameter 'adev' description in 'uvd_v6_0_hw_init'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:541: warning: Function parameter or member 'handle' not described in 'uvd_v6_0_hw_fini'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:541: warning: Excess function parameter 'adev' description in 'uvd_v6_0_hw_fini'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Function parameter or member 'addr' not described in 'uvd_v6_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Function parameter or member 'seq' not described in 'uvd_v6_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Function parameter or member 'flags' not described in 'uvd_v6_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:900: warning: Excess function parameter 'fence' description in 'uvd_v6_0_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Function parameter or member 'addr' not described in 'uvd_v6_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Function parameter or member 'seq' not described in 'uvd_v6_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Function parameter or member 'flags' not described in 'uvd_v6_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:930: warning: Excess function parameter 'fence' description in 'uvd_v6_0_enc_ring_emit_fence'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:997: warning: Function parameter or member 'job' not described in 'uvd_v6_0_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:997: warning: Function parameter or member 'flags' not described in 'uvd_v6_0_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:1023: warning: Function parameter or member 'job' not described in 'uvd_v6_0_enc_ring_emit_ib'
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c:1023: warning: Function parameter or member 'flags' not described in 'uvd_v6_0_enc_ring_emit_ib'
>
> Cc: Alex Deucher <alexander.deucher(a)amd.com>
> Cc: "Christian König" <christian.koenig(a)amd.com>
> Cc: David Airlie <airlied(a)linux.ie>
> Cc: Daniel Vetter <daniel(a)ffwll.ch>
> Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
> Cc: amd-gfx(a)lists.freedesktop.org
> Cc: dri-devel(a)lists.freedesktop.org
> Cc: linux-media(a)vger.kernel.org
> Cc: linaro-mm-sig(a)lists.linaro.org
> Signed-off-by: Lee Jones <lee.jones(a)linaro.org>
Applied with minor fixes. Thanks!
Alex
> ---
> drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> index 666bfa4a0b8ea..69cf7edf4cc61 100644
> --- a/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c
> @@ -198,9 +198,9 @@ static int uvd_v6_0_enc_ring_test_ring(struct amdgpu_ring *ring)
> /**
> * uvd_v6_0_enc_get_create_msg - generate a UVD ENC create msg
> *
> - * @adev: amdgpu_device pointer
> * @ring: ring we should submit the msg to
> * @handle: session handle to use
> + * @bo: amdgpu object for which we query the offset
> * @fence: optional fence to return
> *
> * Open up a stream for HW test
> @@ -261,9 +261,9 @@ static int uvd_v6_0_enc_get_create_msg(struct amdgpu_ring *ring, uint32_t handle
> /**
> * uvd_v6_0_enc_get_destroy_msg - generate a UVD ENC destroy msg
> *
> - * @adev: amdgpu_device pointer
> * @ring: ring we should submit the msg to
> * @handle: session handle to use
> + * @bo: amdgpu object for which we query the offset
> * @fence: optional fence to return
> *
> * Close up a stream for HW test or if userspace failed to do so
> @@ -326,6 +326,7 @@ static int uvd_v6_0_enc_get_destroy_msg(struct amdgpu_ring *ring,
> * uvd_v6_0_enc_ring_test_ib - test if UVD ENC IBs are working
> *
> * @ring: the engine to test on
> + * @timeout: timeout value in jiffies, or MAX_SCHEDULE_TIMEOUT
> *
> */
> static int uvd_v6_0_enc_ring_test_ib(struct amdgpu_ring *ring, long timeout)
> @@ -464,7 +465,7 @@ static int uvd_v6_0_sw_fini(void *handle)
> /**
> * uvd_v6_0_hw_init - start and test UVD block
> *
> - * @adev: amdgpu_device pointer
> + * @handle: handle used to pass amdgpu_device pointer
> *
> * Initialize the hardware, boot up the VCPU and do some testing
> */
> @@ -533,7 +534,7 @@ static int uvd_v6_0_hw_init(void *handle)
> /**
> * uvd_v6_0_hw_fini - stop the hardware block
> *
> - * @adev: amdgpu_device pointer
> + * @handle: handle used to pass amdgpu_device pointer
> *
> * Stop the UVD block, mark ring as not ready any more
> */
> @@ -891,7 +892,9 @@ static void uvd_v6_0_stop(struct amdgpu_device *adev)
> * uvd_v6_0_ring_emit_fence - emit an fence & trap command
> *
> * @ring: amdgpu_ring pointer
> - * @fence: fence to emit
> + * @addr: address
> + * @seq: sequence number
> + * @flags: fence related flags
> *
> * Write a fence and a trap command to the ring.
> */
> @@ -921,7 +924,9 @@ static void uvd_v6_0_ring_emit_fence(struct amdgpu_ring *ring, u64 addr, u64 seq
> * uvd_v6_0_enc_ring_emit_fence - emit an enc fence & trap command
> *
> * @ring: amdgpu_ring pointer
> - * @fence: fence to emit
> + * @addr: address
> + * @seq: sequence number
> + * @flags: fence related flags
> *
> * Write enc a fence and a trap command to the ring.
> */
> @@ -986,7 +991,9 @@ static int uvd_v6_0_ring_test_ring(struct amdgpu_ring *ring)
> * uvd_v6_0_ring_emit_ib - execute indirect buffer
> *
> * @ring: amdgpu_ring pointer
> + * @job: job to retrieve vmid from
> * @ib: indirect buffer to execute
> + * @flags: unused
> *
> * Write ring commands to execute the indirect buffer
> */
> @@ -1012,7 +1019,9 @@ static void uvd_v6_0_ring_emit_ib(struct amdgpu_ring *ring,
> * uvd_v6_0_enc_ring_emit_ib - enc execute indirect buffer
> *
> * @ring: amdgpu_ring pointer
> + * @job: job to retrieve vmid from
> * @ib: indirect buffer to execute
> + * @flags: unused
> *
> * Write enc ring commands to execute the indirect buffer
> */
> --
> 2.25.1
>
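For readers unfamiliar with the W=1 kernel-doc warnings listed in the commit message, here is a minimal sketch in C of the convention the patch restores. The names demo_ring and demo_ring_emit_fence are hypothetical and are not taken from the amdgpu driver. The only point is that every parameter in the signature needs a matching "@name:" line in the comment, and that no "@name:" line may describe a parameter that no longer exists.

```c
#include <stdint.h>

/* Hypothetical stand-in for a ring; not the amdgpu_ring structure. */
struct demo_ring {
	uint64_t last_addr;
	uint64_t last_seq;
	unsigned int last_flags;
};

/**
 * demo_ring_emit_fence - record a fence on a demo ring
 *
 * @ring: demo_ring pointer
 * @addr: address
 * @seq: sequence number
 * @flags: fence related flags
 *
 * Each parameter in the signature below has exactly one @name line
 * above, so a kernel-doc style check has nothing to complain about.
 * A stale "@fence:" line here would be reported as an excess
 * parameter, and a missing "@addr:" line as an undescribed one.
 */
static void demo_ring_emit_fence(struct demo_ring *ring, uint64_t addr,
				 uint64_t seq, unsigned int flags)
{
	/* Store the arguments so a caller can observe the effect. */
	ring->last_addr = addr;
	ring->last_seq = seq;
	ring->last_flags = flags;
}
```

Calling demo_ring_emit_fence(&ring, 0x1000, 42, 1) on a zero-initialized struct demo_ring simply stores the three values in the ring. The behaviour is incidental; the comment layout is what the example is meant to show.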