From: yangge <yangge1116@126.com>
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
Signed-off-by: yangge <yangge1116@126.com>
Cc: stable@vger.kernel.org
---
 include/linux/cma.h |  2 ++
 mm/cma.c            | 22 ++++++++++++++++++++--
 mm/cma.h            |  1 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index d15b64f..2384624 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 
 extern void cma_reserve_pages_on_error(struct cma *cma);
 
+extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
+
 #ifdef CONFIG_CMA
 struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
 bool cma_free_folio(struct cma *cma, const struct folio *folio);
diff --git a/mm/cma.c b/mm/cma.c
index de5bc0c..49a7186 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
 		spin_unlock_irq(&cma->lock);
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
-		mutex_lock(&cma_mutex);
+
+		/*
+		 * If the user sets the concurr_alloc of CMA to true, concurrent
+		 * memory allocation is allowed. If the user sets it to false or
+		 * does not set it, concurrent memory allocation is not allowed.
+		 */
+		if (!cma->concurr_alloc)
+			mutex_lock(&cma_mutex);
 		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
-		mutex_unlock(&cma_mutex);
+		if (!cma->concurr_alloc)
+			mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
 			break;
@@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 
 	return 0;
 }
+
+bool cma_set_concurrency(struct cma *cma, bool concurrency)
+{
+	if (!cma)
+		return false;
+
+	cma->concurr_alloc = concurrency;
+
+	return true;
+}
diff --git a/mm/cma.h b/mm/cma.h
index 8485ef8..30f489d 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -16,6 +16,7 @@ struct cma {
 	unsigned long   *bitmap;
 	unsigned int order_per_bit; /* Order of pages represented by one bit */
 	spinlock_t	lock;
+	bool concurr_alloc;
 #ifdef CONFIG_CMA_DEBUGFS
 	struct hlist_head mem_head;
 	spinlock_t mem_head_lock;
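The conditional-locking logic of the patch can be modeled in plain userspace C (a pthread mutex stands in for cma_mutex, and the stub allocator for alloc_contig_range(); this is an illustrative sketch of the patch's control flow, not kernel code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of the patch: global mutex standing in for cma_mutex. */
static pthread_mutex_t cma_mutex = PTHREAD_MUTEX_INITIALIZER;

struct cma {
	bool concurr_alloc;	/* mirrors the new field added to mm/cma.h */
};

/* Mirrors cma_set_concurrency() from the patch. */
bool cma_set_concurrency(struct cma *cma, bool concurrency)
{
	if (!cma)
		return false;

	cma->concurr_alloc = concurrency;

	return true;
}

/* Stand-in for alloc_contig_range(); always succeeds in this model. */
static int alloc_contig_range_stub(void)
{
	return 0;
}

/*
 * Mirrors the __cma_alloc() hunk: the global mutex is taken only when
 * the area has NOT opted in to concurrent allocation.
 */
int cma_alloc_model(struct cma *cma)
{
	int ret;

	if (!cma->concurr_alloc)
		pthread_mutex_lock(&cma_mutex);
	ret = alloc_contig_range_stub();
	if (!cma->concurr_alloc)
		pthread_mutex_unlock(&cma_mutex);

	return ret;
}
```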
On Fri, 24 Jan 2025 19:21:27 +0800 yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
The term "users" tends to refer to userspace code. Here I'm thinking you mean in-kernel code, so a better term to use is "callers".
This new interface has no callers. We prefer not to merge unused code! Please send along the patch which calls cma_set_concurrency() so we can better understand this proposal and so that the new code is testable. In fact the patch has cc:stable, which makes things stranger. Why should the -stable maintainers merge a patch which doesn't do anything?
And please quantify the benefit. "negatively impact" is too vague. How much benefit can we expect our users to see from this? Some runtime testing results would be good.
And please describe in more detail why this particular caller doesn't require concurrency protection. And help other developers understand when it is safe for them to use concurr_alloc==false.
On 2025/1/28 7:04, Andrew Morton wrote:
On Fri, 24 Jan 2025 19:21:27 +0800 yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
The term "users" tends to refer to userspace code. Here I'm thinking you mean in-kernel code, so a better term to use is "callers".
Ok, thank you. I will change it in the next version.
This new interface has no callers. We prefer not to merge unused code! Please send along the patch which calls cma_set_concurrency() so we can better understand this proposal and so that the new code is testable.
Ok, thank you. I will add the caller in the next version.
In fact the patch has cc:stable, which makes things stranger. Why should the -stable maintainers merge a patch which doesn't do anything?
And please quantify the benefit. "negatively impact" is too vague. How much benefit can we expect our users to see from this? Some runtime testing results would be good.
And please describe in more detail why this particular caller doesn't require concurrency protection. And help other developers understand when it is safe for them to use concurr_alloc==false.
Ok, thank you.
On Fri, Jan 24, 2025 at 07:21:27PM +0800, yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") Signed-off-by: yangge yangge1116@126.com Cc: stable@vger.kernel.org
Umm, you're adding new unused functions while not even reporting what the problem is. This looks sketchy as hell and surely is not a stable candidate.
On Sat, Jan 25, 2025 at 12:21 AM yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
Do we have some data?
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Who is the intended user of cma_set_concurrency? I also feel it is somewhat unsafe since cma->concurr_alloc is not protected by any locks.
Will a user setting cma->concurr_alloc = 1 encounter the original issue that commit 60a60e32cf91 was attempting to fix?
[patch quoted in full snipped]
Thanks Barry
On 2025/1/28 17:58, Barry Song wrote:
On Sat, Jan 25, 2025 at 12:21 AM yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
Do we have some data?
Yes, I will add it in the next version, thanks.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Who is the intended user of cma_set_concurrency?
We have some drivers that use cma_set_concurrency(), but they have not yet been merged into the mainline. The cma_alloc_mem() function in the mainline also supports concurrent allocation of CMA memory. By applying this patch, we can also achieve significant performance improvements in certain scenarios. I will provide performance data in the next version.

I also feel it is somewhat unsafe since cma->concurr_alloc is not protected by any locks.

Ok, thanks.
Will a user setting cma->concurr_alloc = 1 encounter the original issue that commit 60a60e32cf91 was attempting to fix?
Yes, if a user encounters the issue described in commit 60a60e32cf91, they will not be able to set cma->concurr_alloc to 1.
[patch quoted in full snipped]
Thanks Barry
On Sat, Feb 8, 2025 at 9:50 PM Ge Yang yangge1116@126.com wrote:
On 2025/1/28 17:58, Barry Song wrote:
On Sat, Jan 25, 2025 at 12:21 AM yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
Do we have some data?
Yes, I will add it in the next version, thanks.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Who is the intended user of cma_set_concurrency?
We have some drivers that use cma_set_concurrency(), but they have not yet been merged into the mainline. The cma_alloc_mem() function in the mainline also supports concurrent allocation of CMA memory. By applying this patch, we can also achieve significant performance improvements in certain scenarios. I will provide performance data in the next version.

I also feel it is somewhat unsafe since cma->concurr_alloc is not protected by any locks.

Ok, thanks.
Will a user setting cma->concurr_alloc = 1 encounter the original issue that commit 60a60e32cf91 was attempting to fix?
Yes, if a user encounters the issue described in commit 60a60e32cf91, they will not be able to set cma->concurr_alloc to 1.
A user who hasn't encountered a problem yet doesn't mean they won't encounter it; it most likely just means the testing time hasn't been long enough.
Is it possible to implement a per-CMA lock or range lock that simultaneously improves performance and prevents the original issue that commit 60a60e32cf91 aimed to fix?
I strongly believe that cma->concurr_alloc is not the right approach. Let's not waste our time on this kind of hack or workaround. Instead, we should find a proper fix that remains transparent to users.
[patch quoted in full snipped]
Thanks Barry
On 2025/2/9 5:34, Barry Song wrote:
On Sat, Feb 8, 2025 at 9:50 PM Ge Yang yangge1116@126.com wrote:
On 2025/1/28 17:58, Barry Song wrote:
On Sat, Jan 25, 2025 at 12:21 AM yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
Do we have some data?
Yes, I will add it in the next version, thanks.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Who is the intended user of cma_set_concurrency?
We have some drivers that use cma_set_concurrency(), but they have not yet been merged into the mainline. The cma_alloc_mem() function in the mainline also supports concurrent allocation of CMA memory. By applying this patch, we can also achieve significant performance improvements in certain scenarios. I will provide performance data in the next version.

I also feel it is somewhat unsafe since cma->concurr_alloc is not protected by any locks.

Ok, thanks.
Will a user setting cma->concurr_alloc = 1 encounter the original issue that commit 60a60e32cf91 was attempting to fix?
Yes, if a user encounters the issue described in commit 60a60e32cf91, they will not be able to set cma->concurr_alloc to 1.
A user who hasn't encountered a problem yet doesn't mean they won't encounter it; it most likely just means the testing time hasn't been long enough.
Is it possible to implement a per-CMA lock or range lock that simultaneously improves performance and prevents the original issue that commit 60a60e32cf91 aimed to fix?
Using per-CMA locks can improve performance and prevent the original issue. I am currently preparing the patch. Thanks.
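The per-CMA lock idea can be sketched the same way: move the mutex from a file-scope global into the per-area structure, so allocations in different CMA areas no longer serialize against each other, while allocations within one area (where the isolation race lives) still do. This is an illustrative userspace model, not the patch being prepared:

```c
#include <assert.h>
#include <pthread.h>

/*
 * Illustrative model of a per-CMA lock: each area serializes its own
 * allocations, but two different areas can allocate concurrently.
 */
struct cma_area {
	pthread_mutex_t alloc_mutex;	/* replaces the global cma_mutex */
	long allocations;		/* toy state guarded by the lock */
};

void cma_area_init(struct cma_area *cma)
{
	pthread_mutex_init(&cma->alloc_mutex, NULL);
	cma->allocations = 0;
}

/*
 * Allocation path: only this area's lock is held. Pageblocks belong to
 * exactly one area, so serializing per area still prevents two racing
 * allocations from isolating the same pageblock.
 */
int cma_area_alloc(struct cma_area *cma)
{
	pthread_mutex_lock(&cma->alloc_mutex);
	cma->allocations++;
	pthread_mutex_unlock(&cma->alloc_mutex);

	return 0;
}
```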
I strongly believe that cma->concurr_alloc is not the right approach. Let's not waste our time on this kind of hack or workaround. Instead, we should find a proper fix that remains transparent to users.
[patch quoted in full snipped]
Thanks Barry
On 08.02.25 22:34, Barry Song wrote:
On Sat, Feb 8, 2025 at 9:50 PM Ge Yang yangge1116@126.com wrote:
On 2025/1/28 17:58, Barry Song wrote:
On Sat, Jan 25, 2025 at 12:21 AM yangge1116@126.com wrote:
From: yangge yangge1116@126.com
Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"") simply reverts to the original method of using the cma_mutex to ensure that alloc_contig_range() runs sequentially. This change was made to avoid concurrency allocation failures. However, it can negatively impact performance when concurrent allocation of CMA memory is required.
Do we have some data?
Yes, I will add it in the next version, thanks.
To address this issue, we could introduce an API for concurrency settings, allowing users to decide whether their CMA can perform concurrent memory allocations or not.
Who is the intended user of cma_set_concurrency?
We have some drivers that use cma_set_concurrency(), but they have not yet been merged into the mainline. The cma_alloc_mem() function in the mainline also supports concurrent allocation of CMA memory. By applying this patch, we can also achieve significant performance improvements in certain scenarios. I will provide performance data in the next version.

I also feel it is somewhat unsafe since cma->concurr_alloc is not protected by any locks.

Ok, thanks.
Will a user setting cma->concurr_alloc = 1 encounter the original issue that commit 60a60e32cf91 was attempting to fix?
Yes, if a user encounters the issue described in commit 60a60e32cf91, they will not be able to set cma->concurr_alloc to 1.
A user who hasn't encountered a problem yet doesn't mean they won't encounter it; it most likely just means the testing time hasn't been long enough.
Is it possible to implement a per-CMA lock or range lock that simultaneously improves performance and prevents the original issue that commit 60a60e32cf91 aimed to fix?
I strongly believe that cma->concurr_alloc is not the right approach. Let's not waste our time on this kind of hack or workaround. Instead, we should find a proper fix that remains transparent to users.
Fully agreed.
IIUC, the problem is that we find a pageblock is already isolated. It might be sufficient to return -EAGAIN in that case from alloc_contig_range_noprof()->start_isolate_page_range() and retry in CMA. Ideally, we'd have a way to wait on some event (e.g., any pageblock transitioning from isolated -> !isolated).
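That retry approach might look roughly like the following userspace model. The -EAGAIN propagation and the retry loop are the point; the helper names are stand-ins for start_isolate_page_range() and alloc_contig_range(), not real kernel signatures:

```c
#include <assert.h>
#include <errno.h>

/*
 * Stand-in for start_isolate_page_range(): report -EAGAIN when some
 * pageblock in the range is already isolated by a concurrent caller.
 */
static int start_isolate_range_stub(int already_isolated)
{
	return already_isolated ? -EAGAIN : 0;
}

/*
 * Model of alloc_contig_range() returning -EAGAIN instead of
 * hard-failing when it races with another allocator.
 */
static int alloc_contig_range_stub(int *contended)
{
	int ret = start_isolate_range_stub(*contended);

	if (ret == -EAGAIN)
		*contended = 0;	/* pretend the other allocator finished */

	return ret;
}

/* CMA side: retry on -EAGAIN rather than reporting failure upward. */
int cma_alloc_retry_model(int contended)
{
	int ret;

	do {
		ret = alloc_contig_range_stub(&contended);
		/* real code would wait on an event here, not spin */
	} while (ret == -EAGAIN);

	return ret;
}
```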