Hi,
I'm working on the Android Linux kernel, version 3.0.15, and I am seeing a
deadlock in the ashmem driver while handling an mmap request. I seek
your support in finding the correct fix.
The two locks involved in the deadlock are
1) mm->mmap_sem
2) ashmem_mutex
The following is the sequence of events that leads to the deadlock.
There are two threads A and B that belong to the same process
(system_server) and hence share the mm struct.
A1) In A's context, an mmap system call is made on an ashmem fd.
A2) The system call sys_mmap_pgoff acquires the mmap_sem of the "mm"
and sleeps before calling ashmem's .mmap handler, i.e. before calling
ashmem_mmap.
Now thread B runs and proceeds to do the following:
B1) In B's context, an ashmem ioctl with option ASHMEM_SET_NAME is called.
B2) The code proceeds to acquire the ashmem_mutex and performs a
"copy_from_user".
B3) copy_from_user takes a valid page fault while copying the data from
user space and proceeds to handle it gracefully: do_DataAbort -->
do_page_fault.
B4) In do_page_fault, B finds that mm->mmap_sem is not available (note
that A and B share the mm), since A holds it, and sleeps.
Now thread A runs again.
A3) It proceeds to call ashmem_mmap and tries to acquire the
ashmem_mutex, which is not available (B holds it), and sleeps.
Now A holds mmap_sem and waits for B to release the ashmem_mutex, while
B holds the ashmem_mutex and waits for mmap_sem, which is held by A.
This deadlocks the system.
I'm not sure how to use these locks in such a way as to prevent this
scenario. Any suggestions would be of great help.
Workaround:
One possible workaround is to replace the mutex_lock call made in
ashmem_mmap with mutex_trylock; if it fails, wait a few milliseconds
and retry, finally giving up after a few iterations. This would bring
the system out of the deadlock if the scenario happens. I myself feel
that this suggestion is not clean, but I'm unable to think of anything
better.
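To illustrate, a minimal sketch of that trylock workaround (the retry
count and delay below are made-up values, not code from the actual
driver):

#include <linux/mutex.h>
#include <linux/delay.h>

/* Called from ashmem_mmap(), where mmap_sem is already held. */
static int ashmem_lock_with_retry(struct mutex *lock)
{
	int retries;

	for (retries = 0; retries < 10; retries++) {
		if (mutex_trylock(lock))
			return 0;	/* got ashmem_mutex, mmap can proceed */
		msleep(5);		/* back off a few milliseconds */
	}
	return -EAGAIN;			/* give up instead of deadlocking */
}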
Is there any suggestion to avoid this scenario?
Warm Regards,
Shankar
Hello,
This is the last missing piece to let us efficiently use large DMA
buffers on systems with lots of memory that have highmem support
enabled. The first patch adds support for CMA regions placed in the
high memory zone, and the second one also enables allocation of
individual pages from the high memory zone for IOMMU-mapped devices.
Those two changes let us save a significant amount of low memory for
other tasks.
Best regards
Marek Szyprowski
Samsung Poland R&D Center
Patch summary:
Marek Szyprowski (2):
ARM: dma-mapping: add support for CMA regions placed in highmem zone
ARM: dma-mapping: use himem for DMA buffers for IOMMU-mapped devices
arch/arm/mm/dma-mapping.c | 70 +++++++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 18 deletions(-)
--
1.7.9.5
Hi,
On 2013-02-06 00:27, Laurent Pinchart wrote:
> Hello,
>
> We've hosted a CDF meeting at the FOSDEM on Sunday morning. Here's a summary
> of the discussions.
Thanks for the summary. I've been on a longish leave, and just got back,
so I haven't read the recent CDF discussions on the lists yet. I
thought I'd start by replying to this summary first =).
> 0. Abbreviations
> ----------------
>
> DBI - Display Bus Interface, a parallel video control and data bus that
> transmits data using parallel data, read/write, chip select and address
> signals, similarly to 8051-style microcontroller parallel busses. This is a
> mixed video control and data bus.
>
> DPI - Display Pixel Interface, a parallel video data bus that transmits data
> using parallel data, h/v sync and clock signals. This is a video data bus
> only.
>
> DSI - Display Serial Interface, a serial video control and data bus that
> transmits data using one or more differential serial lines. This is a mixed
> video control and data bus.
In case you'll re-use these abbrevs in later posts, I think it would be
good to mention that DPI is a one-way bus, whereas DBI and DSI are
two-way (perhaps that's implicit with control bus, though).
> 1. Goals
> --------
>
> The meeting started with a brief discussion about the CDF goals.
>
> Tomi Valkeinen and Tomasz Figa have sent RFC patches to show their views of
> what CDF could/should be. Many others have provided very valuable feedback.
> Given the early development stage, propositions were sometimes contradictory,
> and focused on different areas of interest. We have thus started the meeting
> with a discussion about what CDF should try to achieve, and what it shouldn't.
>
> CDF has two main purposes. The original goal was to support display panels in
> a platform- and subsystem-independent way. While mostly useful for embedded
> systems, the emergence of platforms such as Intel Medfield and ARM-based PCs
> that blend the embedded and PC worlds makes panel support useful for the PC
> world as well.
>
> The second purpose is to provide a cross-subsystem interface to support video
> encoders. The idea originally came from a generalisation of the original RFC
> that supported panels only. While encoder support is considered as lower
> priority than display panel support by developers focused on display
> controller drivers (Intel, Renesas, ST Ericsson, TI), companies that produce
> video encoders (Analog Devices, and likely others) don't share that point of
> view and would like to provide a single encoder driver that can be used in
> both KMS and V4L2 drivers.
What is an encoder? Something that takes a video signal in, and lets the
CPU store the received data to memory? Isn't that a decoder?
Or do you mean something that takes a video signal in, and outputs a
video signal in another format? (transcoder?)
If the latter, I don't see them as lower priority. If we use CDF also
for SoC internal components (which I think would be great), then every
OMAP board has transcoders.
I'm not sure about the vocabulary in this area, but a normal OMAP
scenario could have the following video pipeline:
1. encoder (OMAP's DISPC, reads pixels from memory and outputs parallel RGB)
2. transcoder (OMAP's DSI, gets parallel RGB and outputs DSI)
3. transcoder (external DSI-to-LVDS chip)
4. panel (LVDS panel)
Even in the case where a panel would be connected directly to the OMAP,
there would be the internal transcoder.
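As a purely illustrative sketch (not a proposed CDF API), such a
pipeline could be modelled as a chain of entities, each pointing at its
video source:

struct display_entity {
	const char *name;
	struct display_entity *source;	/* where this entity's video comes from */
};

/* the four-stage OMAP pipeline from above */
static struct display_entity dispc    = { .name = "omap-dispc" };
static struct display_entity dsi      = { .name = "omap-dsi",   .source = &dispc };
static struct display_entity dsi2lvds = { .name = "dsi2lvds",   .source = &dsi };
static struct display_entity panel    = { .name = "lvds-panel", .source = &dsi2lvds };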
> 2. Subsystems
> -------------
>
> Display panels are used in conjunction with FBDEV and KMS drivers. There was,
> to the audience's knowledge, no V4L2 driver that needs to explicitly handle
> display panels. Even though at least one V4L2 output driver (omap_vout) can
> output video to a display panel, it does so in conjunction with the KMS and/or
> FBDEV APIs that handle panel configuration. Panels are thus not exposed to
> V4L2 drivers.
Hmm, I'm no expert on omap_vout, but it uses neither KMS nor omapfb. It
uses omapdss directly, and thus accesses the panels.
That said, I'm fine with leaving omap_vout out from the equation.
> 3. KMS Extensions
> -----------------
>
> The usefulness of V4L2 for output devices was questioned, and the possibility
> of using KMS for complex video devices usually associated with V4L2 was
> raised. The TI DaVinci 8xxx family is an example of chips that could benefit
> from KMS support.
>
> The KMS API is lacking support for deep-pipelining ("framebuffers" that are
> sourced from a data stream instead of a memory buffer) today. Extending the
> KMS API with deep-pipelining support was considered as a sensible goal that
> would mostly require the creation of a new KMS source object. Exposing the
> topology of the whole device would then be handled by the Media Controller
> API.
Isn't there also the problem that KMS doesn't support arbitrarily long
chains of display devices? That actually sounds more like
"deep-pipelining" than what you said, getting the source data from a
data stream.
> 5. Bus Model
> ------------
>
> Display panels are connected to a video bus that transmits video data and
> optionally to a control bus. Those two busses can be separate physical
> interfaces or combined into a single physical interface.
>
> The Linux device model represents the system as a tree of devices (not to be
> confused with the device tree, abbreviated as DT). The tree is organized around
> control busses, with every device being a child of its control bus master. For
> instance an I2C device will be a child of its I2C controller device, which can
> itself be a child of its parent PCI device.
>
> Display panels will be represented as Linux devices. They will have a single
> parent from the Linux device model point of view, but will be potentially
> connected to multiple physical busses. CDF thus needs to define what bus to
> select as the Linux parent bus.
>
> In theory any physical bus that the device is attached to can be selected as
> the parent bus. However, selecting a video data bus would depart from the
> traditional Linux device model that uses control busses only. This caused
> concern among several people who argued that not presenting the device to the
> kernel as attached to its control bus would bring issues in embedded systems.
> Unlike on PC systems where the control bus master is usually the same physical
> device as the data bus master, embedded systems are made of a potentially
> complex assembly of completely unrelated devices. Not representing an I2C-
> controlled panel as a child of its I2C master in DT was thus frowned upon, even
> though no clear agreement was reached on the subject.
I've been thinking that a good rule of thumb would be that the device
must be somewhat usable after the parent bus is ready. So for, say, a
DPI+SPI panel, when the SPI is set up the driver can send messages to
the panel, perhaps read an ID or such, even if the actual video cannot
be shown yet (presuming DPI bus is still missing).
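As a hedged sketch of what I mean (the driver and the ID register are
hypothetical):

#include <linux/spi/spi.h>

static int example_panel_probe(struct spi_device *spi)
{
	int id;

	/* The control bus (SPI) is up, so the panel is already
	 * "somewhat usable": read a made-up ID register over SPI. */
	id = spi_w8r8(spi, 0x04);
	if (id < 0)
		return id;

	/* The DPI video bus may still be missing at this point; video
	 * output gets enabled later by the display controller side. */
	return 0;
}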
Of course there are the funny cases, as always. Say, a DSI panel,
controlled via i2c, and the panel gets its functional clock from the DSI
bus's clock. In that case both busses need to be up and running before
the panel can do anything.
> - Combined video and control busses
>
> When the two busses are combined in a single physical bus the panel device
> will obviously be represented as a child of that single physical bus.
>
> In such cases the control bus could expose video bus control methods. This
> would remove the need for a video source as proposed by Tomi Valkeinen in his
> CDF model. However, if the bus can be used for video data transfer in
> combination with a different control bus, a video source corresponding to the
> data bus will be needed.
I think this is always the case. If a bus can be used for control and
video data, you can always use it only for video data.
> No decision has been taken on whether to use a video source in addition to the
> control bus in the combined busses case. Experimentation will be needed, and
> the right solution might depend on the bus type.
>
> - Multiple control busses
>
> One panel was mentioned as being connected to a DSI bus and an I2C bus. The
> DSI bus is used for both control and video, and the I2C bus for control only.
> Configuring the panel requires sending commands through both DSI and I2C. The
I have luckily not encountered such a device. However, many of the DSI
devices do have i2c control as an option. From the device's point of
view, both can be used at the same time, but I think usually it's saner
to just pick one and use it.
The driver for the device should support both control busses, though.
Probably 99% of the driver code is common for both cases.
> 6. Miscellaneous
> ----------------
>
> - If the OMAP3 DSS driver is used as a model for the DSI support
> implementation, Daniel Vetter requested the DSI bus lock semaphore to be
> killed as it prevents lockdep from working correctly (reference needed ;-)).
I don't think OMAP DSS should be used as a model. It has too much legacy
crap that should be rewritten. However, it can be used as a reference to
see what kind of features are needed, as it does support both video and
command mode DSI modes, and has been used with many different kinds of
DSI panels and DSI transcoders.
As for the semaphore, sure, it can be removed, although I'm not aware of
this lockdep problem. If there's a problem it should be fixed in any case.
> - Do we need to support chaining several encoders ? We can come up with
> several theoretical use cases, some of them probably exist in real hardware,
> but the details are still a bit fuzzy.
If encoder means the same as the "transcoder" term I used earlier, then
yes, I think so.
As I wrote, I'd like to model the OMAP DSS internal components with CDF.
The internal IP blocks are in no way different than external IP blocks,
they just happen to be integrated into OMAP. The same display IPs are
used with multiple different TI SoCs.
Also, the IPs vary between TI SoCs (for ex, omap2 doesn't have DSI,
omap3 has one DSI, omap4 has two DSIs), so we'll anyway need to have
some kind of dynamic system inside omapdss driver. If I can't use CDF
for that, I'll need to implement a custom one, which I believe would
resemble CDF in many ways.
I'm guessing that having multiple external transcoders is quite rare on
production hardware, but is a very useful feature with development
boards. It's not just once or twice that we've used a transcoder or two
between a SoC and a panel, because we haven't had the final panel yet.
Also, sometimes there are small simple chips in the video pipeline, that
do things like level shifting or ESD protection. In some cases these
chips just work automatically, but in some cases one needs to setup
regulators and gpios to get them up and running (for example,
http://www.ti.com/product/tpd12s015). And if that's the case, then I
believe having a CDF "transcoder" driver for the chip is the easiest
solution.
Tomi
The alignment order of a DMA IOMMU buffer is set by the buffer size.
For large buffers this wastes IOMMU address space, so a configurable
parameter that limits the maximum alignment order can reduce the waste.
Signed-off-by: Seung-Woo Kim <sw0312.kim(a)samsung.com>
Signed-off-by: Kyungmin.park <kyungmin.park(a)samsung.com>
---
There were some silly mistakes in the previous patch, so I am resending a fixed version.
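A worked example of the effect, assuming 4 KiB pages and the default
cap of 8: a 16 MiB buffer spans 4096 pages, i.e. alignment order 12,
so its IOVA would normally be aligned to a 16 MiB boundary. With the
cap the order is clamped to 8, aligning the IOVA to only 2^8 pages =
1 MiB and leaving far more of a small (e.g. 64 MiB) window usable:

	unsigned int order = get_order(SZ_16M);		/* order 12 */

	if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT)	/* 12 > 8 */
		order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT;	/* clamped to 8 */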
arch/arm/Kconfig | 21 +++++++++++++++++++++
arch/arm/mm/dma-mapping.c | 3 +++
2 files changed, 24 insertions(+), 0 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 67874b8..4e89112 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -75,6 +75,27 @@ config ARM_DMA_USE_IOMMU
select ARM_HAS_SG_CHAIN
select NEED_SG_DMA_LENGTH
+if ARM_DMA_USE_IOMMU
+
+config ARM_DMA_IOMMU_ALIGNMENT
+ int "Maximum PAGE_SIZE order of alignment for DMA IOMMU buffers"
+ range 4 9
+ default 8
+ help
+ The DMA mapping framework by default aligns all buffers to the smallest
+ PAGE_SIZE order which is greater than or equal to the requested buffer
+ size. This works well for buffers up to a few hundred kilobytes, but
+ for larger buffers it is just a waste of address space. Drivers which
+ have a relatively small addressing window (like 64 MiB) might run out
+ of virtual space with just a few allocations.
+
+ With this parameter you can specify the maximum PAGE_SIZE order for
+ DMA IOMMU buffers. Larger buffers will be aligned only to this
+ specified order. The order is expressed as a power of two multiplied
+ by the PAGE_SIZE.
+
+endif
+
config HAVE_PWM
bool
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 076c26d..4cce854 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1002,6 +1002,9 @@ static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
unsigned int count, start;
unsigned long flags;
+ if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT)
+ order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT;
+
count = ((PAGE_ALIGN(size) >> PAGE_SHIFT) +
(1 << mapping->order) - 1) >> mapping->order;
--
1.7.4.1
So I'm resending the patch series for reservations. This is identical to my git
tree at
http://cgit.freedesktop.org/~mlankhorst/linux/
Some changes have been made since my last version. Most notable is the use of
mutexes now instead of creating my own locking primitive, which would have
ended up being a duplicate anyway.
The git tree also has a version of i915 and radeon working together like that.
It's probably a bit hacky, but it works on my macbook pro 8.2. :)
I didn't get any replies to the mutex extensions when I sent them out
separately, so I'm including them in this series.
The idea is that for lockdep purposes, the seqno is tied to a locking
class. This locking class is not exclusive, but as can be seen from the
last patch in the series, it catches all the violations we care about.
Hi everybody,
Would anyone be interested in meeting at the FOSDEM to discuss the Common
Display Framework? There will be a CDF meeting at the ELC at the end of
February; the FOSDEM would be a good venue for European developers.
--
Regards,
Laurent Pinchart
Hi all,
I've just confirmed the schedule for ELC
(http://events.linuxfoundation.org/events/embedded-linux-conference/schedule),
and we're on for Thursday at 4pm. I requested 2 slots to give us some
flexibility (that's the "Part I & Part II" on the schedule).
Whether you are attending the rest of ELC or not, if you care about CDF,
please come by.
cheers,
Jesse
Add a new DMA attribute which platform drivers can use to avoid
creating IOMMU mappings.
In some cases buffers are allocated by a display controller driver
using the DMA alloc APIs but are never used for scanout; they are only
used for sharing among different devices.
With this attribute, platform drivers can choose not to create the
IOMMU mapping at buffer allocation time and create it only when they
actually access the buffer.
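A hedged usage sketch (the device and size below are made up; only the
attribute name comes from this patch). Note that with the attribute set,
no dma address is produced at allocation time:

	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t handle;
	void *buf;

	/* allocate the pages, but skip creating the IOMMU mapping */
	dma_set_attr(DMA_ATTR_NO_IOMMU_MAPPING, &attrs);
	buf = dma_alloc_attrs(dev, SZ_1M, &handle, GFP_KERNEL, &attrs);
	/* handle is not valid here; the mapping is created later,
	 * when the device actually accesses the buffer */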
Change-Id: I2178b3756170982d814e085ca62474d07b616a21
Signed-off-by: Abhinav Kochhar <abhinav.k(a)samsung.com>
---
arch/arm/mm/dma-mapping.c | 8 +++++---
include/linux/dma-attrs.h | 1 +
2 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index c0f0f43..e73003c 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1279,9 +1279,11 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
if (!pages)
return NULL;
- *handle = __iommu_create_mapping(dev, pages, size);
- if (*handle == DMA_ERROR_CODE)
- goto err_buffer;
+ if (!dma_get_attr(DMA_ATTR_NO_IOMMU_MAPPING, attrs)) {
+ *handle = __iommu_create_mapping(dev, pages, size);
+ if (*handle == DMA_ERROR_CODE)
+ goto err_buffer;
+ }
if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
return pages;
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index c8e1831..1f04419 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -15,6 +15,7 @@ enum dma_attr {
DMA_ATTR_WEAK_ORDERING,
DMA_ATTR_WRITE_COMBINE,
DMA_ATTR_NON_CONSISTENT,
+ DMA_ATTR_NO_IOMMU_MAPPING,
DMA_ATTR_NO_KERNEL_MAPPING,
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
--
1.7.8.6
Hi all,
With all of the attention that the Common Display Framework is getting, I
was wondering if it was worth having a BoF discussion at ELC next month in
San Francisco. This will be only a couple of weeks after FOSDEM, but given
the pace that things seem to be moving, that could be a great opportunity
either to have a follow-on discussion, or simply to involve a slightly
different cross-section of the community in a face-to-face discussion. If
folks could let me know that they'll be at ELC and are interested in a BoF
there, I'll look into getting it set up.
cheers,
Jesse
This patch adds EXPORT_SYMBOL_GPL calls to the three ARM IOMMU
functions - arm_iommu_create_mapping, arm_iommu_release_mapping
and arm_iommu_attach_device. These three functions are ARM-specific
wrapper functions for creating/freeing/using an IOMMU mapping and
they are called by various drivers. If any of these drivers needs
to be built as a dynamic module, these functions need to be exported.
Changelog v2: using EXPORT_SYMBOL_GPL as suggested by Marek.
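For reference, a hedged sketch of how a modular driver would use the
newly exported functions (the IOVA base, size and order are made-up
values):

	#include <linux/err.h>
	#include <linux/platform_device.h>
	#include <linux/sizes.h>
	#include <asm/dma-iommu.h>

	static struct dma_iommu_mapping *mapping;

	static int example_probe(struct device *dev)
	{
		/* 128 MiB of IOVA space at 0x20000000, 2^4-page granularity */
		mapping = arm_iommu_create_mapping(&platform_bus_type,
						   0x20000000, SZ_128M, 4);
		if (IS_ERR(mapping))
			return PTR_ERR(mapping);

		return arm_iommu_attach_device(dev, mapping);
	}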
Signed-off-by: Prathyush K <prathyush.k(a)samsung.com>
---
arch/arm/mm/dma-mapping.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b2fb87..226ebcf 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1797,6 +1797,7 @@ err2:
err:
return ERR_PTR(err);
}
+EXPORT_SYMBOL_GPL(arm_iommu_create_mapping);
static void release_iommu_mapping(struct kref *kref)
{
@@ -1813,6 +1814,7 @@ void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
if (mapping)
kref_put(&mapping->kref, release_iommu_mapping);
}
+EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);
/**
* arm_iommu_attach_device
@@ -1841,5 +1843,6 @@ int arm_iommu_attach_device(struct device *dev,
pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
return 0;
}
+EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
#endif
--
1.8.0
From: Inki Dae <inki.dae(a)samsung.com>
This patch adds a new attribute, DMA_ATTR_SKIP_BUFFER_CLEAR,
to skip buffer clearing. The buffer clearing also flushes the CPU
cache, so the operation costs a bit of performance.
With this patch, the allocated buffer region is still cleared by
default; set this attribute if you want to skip the clearing. The flag
should be used carefully, because skipping the clear might give access
to some vulnerable content such as security data, so the caller must
make sure that all pages are somehow cleared before being exposed to
userspace.
For example, let's say that security data had been stored in some
memory and freed without clearing it. A malicious process could then
allocate that region through some buffer allocator such as gem or ion,
again without clearing it, and request a blit operation with another,
cleared buffer through the GPU or other drivers. At that point, the
malicious process could read the security data.
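A hedged usage sketch (the device and size are made up; only the
attribute name comes from this patch). The flag only makes sense when
the caller is about to overwrite the whole buffer anyway, e.g. as a
blit destination:

	DEFINE_DMA_ATTRS(attrs);
	dma_addr_t handle;
	void *buf;

	/* skip the clear and the cache flush; every byte will be
	 * overwritten by the blit before anyone can read it */
	dma_set_attr(DMA_ATTR_SKIP_BUFFER_CLEAR, &attrs);
	buf = dma_alloc_attrs(dev, SZ_4M, &handle, GFP_KERNEL, &attrs);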
Signed-off-by: Inki Dae <inki.dae(a)samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park(a)samsung.com>
---
arch/arm/mm/dma-mapping.c | 6 ++++--
include/linux/dma-attrs.h | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b2fb87..fbe9dff 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1058,7 +1058,8 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
if (!page)
goto error;
- __dma_clear_buffer(page, size);
+ if (!dma_get_attr(DMA_ATTR_SKIP_BUFFER_CLEAR, attrs))
+ __dma_clear_buffer(page, size);
for (i = 0; i < count; i++)
pages[i] = page + i;
@@ -1082,7 +1083,8 @@ static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
pages[i + j] = pages[i] + j;
}
- __dma_clear_buffer(pages[i], PAGE_SIZE << order);
+ if (!dma_get_attr(DMA_ATTR_SKIP_BUFFER_CLEAR, attrs))
+ __dma_clear_buffer(pages[i], PAGE_SIZE << order);
i += 1 << order;
count -= 1 << order;
}
diff --git a/include/linux/dma-attrs.h b/include/linux/dma-attrs.h
index c8e1831..2592c05 100644
--- a/include/linux/dma-attrs.h
+++ b/include/linux/dma-attrs.h
@@ -18,6 +18,7 @@ enum dma_attr {
DMA_ATTR_NO_KERNEL_MAPPING,
DMA_ATTR_SKIP_CPU_SYNC,
DMA_ATTR_FORCE_CONTIGUOUS,
+ DMA_ATTR_SKIP_BUFFER_CLEAR,
DMA_ATTR_MAX,
};
--
1.7.4.1
All drivers which implement this need to have some sort of refcount to
allow concurrent vmap usage. Hence implement this in the dma-buf core.
To protect against concurrent calls we need a lock, which potentially
causes new funny locking inversions. But this shouldn't be a problem
for exporters with statically allocated backing storage, and more
dynamic drivers have decent issues already anyway.
Inspired by some refactoring patches from Aaron Plattner, who
implemented the same idea, but only for drm/prime drivers.
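An illustrative sketch of the resulting semantics (the dmabuf pointer
is assumed to be obtained elsewhere):

	void *a, *b;

	a = dma_buf_vmap(dmabuf);	/* exporter ->vmap() runs, counter = 1 */
	b = dma_buf_vmap(dmabuf);	/* cached pointer returned, counter = 2 */
	/* a == b */
	dma_buf_vunmap(dmabuf, b);	/* counter = 1, no exporter call */
	dma_buf_vunmap(dmabuf, a);	/* counter = 0, exporter ->vunmap() runs */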
v2: Check in dma_buf_release that no dangling vmaps are left.
Suggested by Aaron Plattner. We might want to do similar checks for
attachments, but that's for another patch. Also fix up ERR_PTR return
for vmap.
Cc: Aaron Plattner <aplattner(a)nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
---
Compile-tested only - Aaron has been bugging me a bit too often
about this on irc.
Cheers, Daniel
---
Documentation/dma-buf-sharing.txt | 6 +++++-
drivers/base/dma-buf.c | 42 ++++++++++++++++++++++++++++++++++-----
include/linux/dma-buf.h | 4 +++-
3 files changed, 45 insertions(+), 7 deletions(-)
diff --git a/Documentation/dma-buf-sharing.txt b/Documentation/dma-buf-sharing.txt
index 0188903..4966b1b 100644
--- a/Documentation/dma-buf-sharing.txt
+++ b/Documentation/dma-buf-sharing.txt
@@ -302,7 +302,11 @@ Access to a dma_buf from the kernel context involves three steps:
void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
The vmap call can fail if there is no vmap support in the exporter, or if it
- runs out of vmalloc space. Fallback to kmap should be implemented.
+ runs out of vmalloc space. Fallback to kmap should be implemented. Note that
+ the dma-buf layer keeps a reference count for all vmap access and calls down
+ into the exporter's vmap function only when no vmapping exists, and only
+ unmaps it once. Protection against concurrent vmap/vunmap calls is provided
+ by taking the dma_buf->lock mutex.
3. Finish access
diff --git a/drivers/base/dma-buf.c b/drivers/base/dma-buf.c
index a3f79c4..36af5de 100644
--- a/drivers/base/dma-buf.c
+++ b/drivers/base/dma-buf.c
@@ -39,6 +39,8 @@ static int dma_buf_release(struct inode *inode, struct file *file)
dmabuf = file->private_data;
+ BUG_ON(dmabuf->vmapping_counter);
+
dmabuf->ops->release(dmabuf);
kfree(dmabuf);
return 0;
@@ -482,12 +484,34 @@ EXPORT_SYMBOL_GPL(dma_buf_mmap);
*/
void *dma_buf_vmap(struct dma_buf *dmabuf)
{
+ void *ptr;
+
if (WARN_ON(!dmabuf))
return NULL;
- if (dmabuf->ops->vmap)
- return dmabuf->ops->vmap(dmabuf);
- return NULL;
+ if (!dmabuf->ops->vmap)
+ return NULL;
+
+ mutex_lock(&dmabuf->lock);
+ if (dmabuf->vmapping_counter) {
+ dmabuf->vmapping_counter++;
+ BUG_ON(!dmabuf->vmap_ptr);
+ ptr = dmabuf->vmap_ptr;
+ goto out_unlock;
+ }
+
+ BUG_ON(dmabuf->vmap_ptr);
+
+ ptr = dmabuf->ops->vmap(dmabuf);
+ if (IS_ERR_OR_NULL(ptr))
+ goto out_unlock;
+
+ dmabuf->vmap_ptr = ptr;
+ dmabuf->vmapping_counter = 1;
+
+out_unlock:
+ mutex_unlock(&dmabuf->lock);
+ return ptr;
}
EXPORT_SYMBOL_GPL(dma_buf_vmap);
@@ -501,7 +525,15 @@ void dma_buf_vunmap(struct dma_buf *dmabuf, void *vaddr)
if (WARN_ON(!dmabuf))
return;
- if (dmabuf->ops->vunmap)
- dmabuf->ops->vunmap(dmabuf, vaddr);
+ BUG_ON(!dmabuf->vmap_ptr);
+ BUG_ON(dmabuf->vmapping_counter == 0);
+
+ mutex_lock(&dmabuf->lock);
+ if (--dmabuf->vmapping_counter == 0) {
+ if (dmabuf->ops->vunmap)
+ dmabuf->ops->vunmap(dmabuf, vaddr);
+ dmabuf->vmap_ptr = NULL;
+ }
+ mutex_unlock(&dmabuf->lock);
}
EXPORT_SYMBOL_GPL(dma_buf_vunmap);
diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h
index bd2e52c..e3bf2f6 100644
--- a/include/linux/dma-buf.h
+++ b/include/linux/dma-buf.h
@@ -119,8 +119,10 @@ struct dma_buf {
struct file *file;
struct list_head attachments;
const struct dma_buf_ops *ops;
- /* mutex to serialize list manipulation and attach/detach */
+ /* mutex to serialize list manipulation, attach/detach and vmap/unmap */
struct mutex lock;
+ unsigned vmapping_counter;
+ void *vmap_ptr;
void *priv;
};
--
1.7.11.7
Hi Linus,
A fairly small dma-buf pull request for 3.8 - only 2 patches. Could
you please pull?
Thanks!
~Sumit.
The following changes since commit f01af9f85855e38fbd601e033a8eac204cc4cc1c:
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
(2012-12-19 20:31:02 -0800)
are available in the git repository at:
git://git.linaro.org/people/sumitsemwal/linux-dma-buf.git
tags/tag-for-linus-3.8
for you to fetch changes up to ada65c74059f8c104f1b467c126205471634c435:
dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER (2012-12-20
12:05:06 +0530)
----------------------------------------------------------------
3.8: dma-buf minor updates
----------------------------------------------------------------
Maarten Lankhorst (1):
dma-buf: remove fallback for !CONFIG_DMA_SHARED_BUFFER
Rob Clark (1):
dma-buf: might_sleep() in dma_buf_unmap_attachment()
drivers/base/dma-buf.c | 2 +
include/linux/dma-buf.h | 99 -----------------------------------------------
2 files changed, 2 insertions(+), 99 deletions(-)
So I've gotten back to playing with prime for a day, and found some
old intel/radeon tests of mine failing.
I tracked it down to a lifetime issue in the current code, and can
think of two fixes.
The problem scenario is
1. i915: create gem object
2. i915: export gem object to prime
3. radeon: import gem object
4. close prime fd
5. radeon: unref object
6. i915: unref object
So we end up at this point with an imported-buffer record for the
dma_buf left on the i915 file private.
Now if a subsequent test (without closing the drm fd) reallocates a
dma_buf at the same address, we can end up looking up that stale record.
So why doesn't that record get cleaned up? The reference gets added
above at step 2, and when radeon unrefs the object, i915 gets the
dma-buf release callback. However, at that stage we don't actually have
the file priv to remove the pointer from, so it dangles there.
Possible fixes:
a) take a reference on the dma_buf attached to the gem handle when we
export it, and keep it until the gem handle goes away (see the sketch
after this list). I'm unsure if this could create zombie objects, since
the dma-buf has a reference on the gem object, but since the gem handle
is separate from the object it might work.
b) don't keep track of the dma_buf, keep track of gem objects: when we
get a lookup, check inside the gem object, since currently we NULL out
export_dma_buf when we hit the release path - apart from the fact that
I'm sure the locking is foobar.
c) scan all the file privs for all the devices, no.
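For fix (a), a hedged sketch of the idea (the export helper shown is
illustrative, not the exact drm code):

	buf = dev->driver->gem_prime_export(dev, obj, flags);
	if (IS_ERR(buf))
		return PTR_ERR(buf);

	/* the handle's own reference, dropped only when the gem handle
	 * is destroyed, so the dma_buf can't be freed and reallocated
	 * while a record for it still sits on the file priv */
	get_dma_buf(buf);
	obj->export_dma_buf = buf;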
Anyone else any better plans?
Dave.
The debugfs show functions for client and heap were showing info
for the heap type instead of info for the individual heap.
Change-Id: Id5afe7963c8ddfafae1f959ce48dd5c2a5fcca07
Signed-off-by: Nishanth Peethambaran <nishanth(a)broadcom.com>
---
drivers/gpu/ion/ion.c | 18 +++++++++---------
include/linux/ion.h | 4 ++--
2 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
index 6aa817a..48cda5d 100644
--- a/drivers/gpu/ion/ion.c
+++ b/drivers/gpu/ion/ion.c
@@ -413,7 +413,7 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
/* if the client doesn't support this heap type */
if (!((1 << heap->type) & client->heap_mask))
continue;
- /* if the caller didn't specify this heap type */
+ /* if the caller didn't specify this heap id */
if (!((1 << heap->id) & heap_mask))
continue;
buffer = ion_buffer_create(heap, dev, len, align, flags);
@@ -597,11 +597,11 @@ static int ion_debug_client_show(struct seq_file *s, void *unused)
for (n = rb_first(&client->handles); n; n = rb_next(n)) {
struct ion_handle *handle = rb_entry(n, struct ion_handle,
node);
- enum ion_heap_type type = handle->buffer->heap->type;
+ int id = handle->buffer->heap->id;
- if (!names[type])
- names[type] = handle->buffer->heap->name;
- sizes[type] += handle->buffer->size;
+ if (!names[id])
+ names[id] = handle->buffer->heap->name;
+ sizes[id] += handle->buffer->size;
}
mutex_unlock(&client->lock);
@@ -1176,7 +1176,7 @@ static const struct file_operations ion_fops = {
};
static size_t ion_debug_heap_total(struct ion_client *client,
- enum ion_heap_type type)
+ int id)
{
size_t size = 0;
struct rb_node *n;
@@ -1186,7 +1186,7 @@ static size_t ion_debug_heap_total(struct ion_client *client,
struct ion_handle *handle = rb_entry(n,
struct ion_handle,
node);
- if (handle->buffer->heap->type == type)
+ if (handle->buffer->heap->id == id)
size += handle->buffer->size;
}
mutex_unlock(&client->lock);
@@ -1207,7 +1207,7 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
struct ion_client *client = rb_entry(n, struct ion_client,
node);
- size_t size = ion_debug_heap_total(client, heap->type);
+ size_t size = ion_debug_heap_total(client, heap->id);
if (!size)
continue;
if (client->task) {
@@ -1228,7 +1228,7 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
for (n = rb_first(&dev->buffers); n; n = rb_next(n)) {
struct ion_buffer *buffer = rb_entry(n, struct ion_buffer,
node);
- if (buffer->heap->type != heap->type)
+ if (buffer->heap->id != heap->id)
continue;
total_size += buffer->size;
if (!buffer->handle_count) {
diff --git a/include/linux/ion.h b/include/linux/ion.h
index a7d399c..d8168fb 100644
--- a/include/linux/ion.h
+++ b/include/linux/ion.h
@@ -135,7 +135,7 @@ void ion_client_destroy(struct ion_client *client);
* @len: size of the allocation
* @align: requested allocation alignment, lots of hardware blocks have
* alignment requirements of some kind
- * @heap_mask: mask of heaps to allocate from, if multiple bits are set
+ * @heap_mask: mask of heap ids to allocate from, if multiple bits are set
* heaps will be tried in order from lowest to highest order bit
* @flags: heap flags, the low 16 bits are consumed by ion, the high 16
* bits are passed on to the respective heap and can be heap
@@ -236,7 +236,7 @@ struct ion_handle *ion_import_dma_buf(struct ion_client *client, int fd);
* struct ion_allocation_data - metadata passed from userspace for allocations
* @len: size of the allocation
* @align: required alignment of the allocation
- * @heap_mask: mask of heaps to allocate from
+ * @heap_mask: mask of heap ids to allocate from
* @flags: flags passed to heap
* @handle: pointer that will be populated with a cookie to use to refer
* to this allocation
--
1.7.0.4
The heap id is compared against the heap mask in ion_alloc, which is
incorrect; the heap type should be used.
Signed-off-by: Johan Mossberg <johan.mossberg(a)stericsson.com>
---
drivers/gpu/ion/ion.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
index fc152b9..c811380f 100644
--- a/drivers/gpu/ion/ion.c
+++ b/drivers/gpu/ion/ion.c
@@ -414,7 +414,7 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
if (!((1 << heap->type) & client->heap_mask))
continue;
/* if the caller didn't specify this heap type */
- if (!((1 << heap->id) & heap_mask))
+ if (!((1 << heap->type) & heap_mask))
continue;
buffer = ion_buffer_create(heap, dev, len, align, flags);
if (!IS_ERR_OR_NULL(buffer))
--
1.8.0
The heap mask used during client create was a bitmask of heap types
and the heap mask used during buffer allocation was a bitmask of
heap ids. Now, both of them assume a bitmask of heap ids.
The debugfs show functions for client and heap were showing info
for the heap type instead of info for the individual heap.
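To illustrate the difference (the heap ids below are made up): two
heaps can share a type but must have unique ids, so only an id bitmask
can tell them apart:

	/* two carveout heaps: same type, different ids */
	struct ion_heap heap_a = { .type = ION_HEAP_TYPE_CARVEOUT, .id = 4 };
	struct ion_heap heap_b = { .type = ION_HEAP_TYPE_CARVEOUT, .id = 5 };

	unsigned int type_mask = 1 << ION_HEAP_TYPE_CARVEOUT;	/* matches both */
	unsigned int id_mask = 1 << 5;				/* matches only heap_b */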
Change-Id: I9c2c142a63865f3d4250cd681dc5cde9638ca09f
Signed-off-by: Nishanth Peethambaran <nishanth(a)broadcom.com>
---
drivers/gpu/ion/ion.c | 26 +++++++++++++-------------
include/linux/ion.h | 8 ++++----
2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/gpu/ion/ion.c b/drivers/gpu/ion/ion.c
index 6aa817a..b808972 100644
--- a/drivers/gpu/ion/ion.c
+++ b/drivers/gpu/ion/ion.c
@@ -62,7 +62,7 @@ struct ion_device {
* @dev: backpointer to ion device
* @handles: an rb tree of all the handles in this client
* @lock: lock protecting the tree of handles
- * @heap_mask: mask of all supported heaps
+ * @heap_mask: mask of all supported heap ids
* @name: used for debugging
* @task: used for debugging
*
@@ -398,7 +398,7 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
align, heap_mask, flags);
/*
* traverse the list of heaps available in this system in priority
- * order. If the heap type is supported by the client, and matches the
+ * order. If the heap id is supported by the client, and matches the
* request of the caller allocate from it. Repeat until allocate has
* succeeded or all heaps have been tried
*/
@@ -410,10 +410,10 @@ struct ion_handle *ion_alloc(struct ion_client *client, size_t len,
down_read(&dev->lock);
for (n = rb_first(&dev->heaps); n != NULL; n = rb_next(n)) {
struct ion_heap *heap = rb_entry(n, struct ion_heap, node);
- /* if the client doesn't support this heap type */
- if (!((1 << heap->type) & client->heap_mask))
+ /* if the client doesn't support this heap id */
+ if (!((1 << heap->id) & client->heap_mask))
continue;
- /* if the caller didn't specify this heap type */
+ /* if the caller didn't specify this heap id */
if (!((1 << heap->id) & heap_mask))
continue;
buffer = ion_buffer_create(heap, dev, len, align, flags);
@@ -597,11 +597,11 @@ static int ion_debug_client_show(struct seq_file *s, void *unused)
for (n = rb_first(&client->handles); n; n = rb_next(n)) {
struct ion_handle *handle = rb_entry(n, struct ion_handle,
node);
- enum ion_heap_type type = handle->buffer->heap->type;
+ int id = handle->buffer->heap->id;
- if (!names[type])
- names[type] = handle->buffer->heap->name;
- sizes[type] += handle->buffer->size;
+ if (!names[id])
+ names[id] = handle->buffer->heap->name;
+ sizes[id] += handle->buffer->size;
}
mutex_unlock(&client->lock);
@@ -1176,7 +1176,7 @@ static const struct file_operations ion_fops = {
};
static size_t ion_debug_heap_total(struct ion_client *client,
- enum ion_heap_type type)
+ int id)
{
size_t size = 0;
struct rb_node *n;
@@ -1186,7 +1186,7 @@ static size_t ion_debug_heap_total(struct ion_client *client,
struct ion_handle *handle = rb_entry(n,
struct ion_handle,
node);
- if (handle->buffer->heap->type == type)
+ if (handle->buffer->heap->id == id)
size += handle->buffer->size;
}
mutex_unlock(&client->lock);
@@ -1207,7 +1207,7 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
for (n = rb_first(&dev->clients); n; n = rb_next(n)) {
struct ion_client *client = rb_entry(n, struct ion_client,
node);
- size_t size = ion_debug_heap_total(client, heap->type);
+ size_t size = ion_debug_heap_total(client, heap->id);
if (!size)
continue;
if (client->task) {
@@ -1228,7 +1228,7 @@ static int ion_debug_heap_show(struct seq_file *s, void *unused)
for (n = rb_first(&dev->buffers); n; n = rb_next(n)) {
struct ion_buffer *buffer = rb_entry(n, struct ion_buffer,
node);
- if (buffer->heap->type != heap->type)
+ if (buffer->heap->id != heap->id)
continue;
total_size += buffer->size;
if (!buffer->handle_count) {
diff --git a/include/linux/ion.h b/include/linux/ion.h
index a7d399c..f0399ae 100644
--- a/include/linux/ion.h
+++ b/include/linux/ion.h
@@ -72,7 +72,7 @@ struct ion_buffer;
/**
* struct ion_platform_heap - defines a heap in the given platform
* @type: type of the heap from ion_heap_type enum
- * @id: unique identifier for heap. When allocating (lower numbers
+ * @id: unique identifier for heap. When allocating (lower numbers
* will be allocated from first)
* @name: used for debug purposes
* @base: base address of heap in physical memory if applicable
@@ -114,7 +114,7 @@ void ion_reserve(struct ion_platform_data *data);
/**
* ion_client_create() - allocate a client and returns it
* @dev: the global ion device
- * @heap_mask: mask of heaps this client can allocate from
+ * @heap_mask: mask of heap ids this client can allocate from
* @name: used for debugging
*/
struct ion_client *ion_client_create(struct ion_device *dev,
@@ -135,7 +135,7 @@ void ion_client_destroy(struct ion_client *client);
* @len: size of the allocation
* @align: requested allocation alignment, lots of hardware blocks have
* alignment requirements of some kind
- * @heap_mask: mask of heaps to allocate from, if multiple bits are set
+ * @heap_mask: mask of heap ids to allocate from, if multiple bits are set
* heaps will be tried in order from lowest to highest order bit
* @flags: heap flags, the low 16 bits are consumed by ion, the high 16
* bits are passed on to the respective heap and can be heap
@@ -236,7 +236,7 @@ struct ion_handle *ion_import_dma_buf(struct ion_client *client, int fd);
* struct ion_allocation_data - metadata passed from userspace for allocations
* @len: size of the allocation
* @align: required alignment of the allocation
- * @heap_mask: mask of heaps to allocate from
+ * @heap_mask: mask of heap ids to allocate from
* @flags: flags passed to heap
* @handle: pointer that will be populated with a cookie to use to refer
* to this allocation
--
1.7.0.4