From: Matthew Wilcox <mawilcox@microsoft.com>
__GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
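To make the conflict concrete, here is a minimal, hypothetical sketch of the misuse pattern the warning catches (struct foo, foo_ctor and foo_cachep are invented for illustration; kmem_cache_create() and kmem_cache_alloc() are the real slab API):

struct foo {
	spinlock_t lock;
	struct list_head list;
};

static struct kmem_cache *foo_cachep;

/* The constructor establishes a non-zero pattern in every object. */
static void foo_ctor(void *obj)
{
	struct foo *f = obj;

	spin_lock_init(&f->lock);
	INIT_LIST_HEAD(&f->list);
}

/* At init time: */
foo_cachep = kmem_cache_create("foo", sizeof(struct foo), 0,
			       SLAB_HWCACHE_ALIGN, foo_ctor);

/* Buggy: the __GFP_ZERO memset wipes out everything foo_ctor() wrote. */
struct foo *f = kmem_cache_alloc(foo_cachep, GFP_KERNEL | __GFP_ZERO);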
Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: stable@vger.kernel.org
---
 mm/slab.c | 6 ++++--
 mm/slob.c | 4 +++-
 mm/slub.c | 6 ++++--
 3 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/mm/slab.c b/mm/slab.c
index 38d3f4fd17d7..8b2cb7db85db 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3313,8 +3313,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);

-	if (unlikely(flags & __GFP_ZERO) && ptr)
-		memset(ptr, 0, cachep->object_size);
+	if (unlikely(flags & __GFP_ZERO) && ptr) {
+		if (!WARN_ON_ONCE(cachep->ctor))
+			memset(ptr, 0, cachep->object_size);
+	}

 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
 	return ptr;
diff --git a/mm/slob.c b/mm/slob.c
index 1a46181b675c..958173fd7c24 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -556,8 +556,10 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t flags, int node)
 					    flags, node);
 	}

-	if (b && c->ctor)
+	if (b && c->ctor) {
+		WARN_ON_ONCE(flags & __GFP_ZERO);
 		c->ctor(b);
+	}

 	kmemleak_alloc_recursive(b, c->size, 1, c->flags, flags);
 	return b;
diff --git a/mm/slub.c b/mm/slub.c
index 9e1100f9298f..0f55f0a0dcaa 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2714,8 +2714,10 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		stat(s, ALLOC_FASTPATH);
 	}

-	if (unlikely(gfpflags & __GFP_ZERO) && object)
-		memset(object, 0, s->object_size);
+	if (unlikely(gfpflags & __GFP_ZERO) && object) {
+		if (!WARN_ON_ONCE(s->ctor))
+			memset(object, 0, s->object_size);
+	}

 	slab_post_alloc_hook(s, gfpflags, 1, &object);
From: Matthew Wilcox <mawilcox@microsoft.com>
The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
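As a sketch of how the flag leaks through (mapping_set_gfp_mask(), mapping_gfp_mask() and radix_tree_maybe_preload() are the real APIs; the f2fs flags are paraphrased from GFP_F2FS_ZERO in fs/f2fs/f2fs.h):

/* The mapping's gfp mask is meant for allocating the *pages* themselves,
 * where __GFP_ZERO is harmless; f2fs sets roughly this on its node
 * mapping: */
mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS | __GFP_ZERO);

/* The page cache then reused the same mask for radix tree *nodes*, where
 * __GFP_ZERO destroys the constructor-initialised node.  Masking with
 * GFP_RECLAIM_MASK keeps only the reclaim-policy bits: */
error = radix_tree_maybe_preload(mapping_gfp_mask(mapping) & GFP_RECLAIM_MASK);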
Fixes: 19f99cee206c ("f2fs: add core inode operations")
Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: stable@vger.kernel.org
---
 mm/filemap.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index c2147682f4c3..1a4bfc5ed3dc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -785,7 +785,7 @@ int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask)
 	VM_BUG_ON_PAGE(!PageLocked(new), new);
 	VM_BUG_ON_PAGE(new->mapping, new);

-	error = radix_tree_preload(gfp_mask & ~__GFP_HIGHMEM);
+	error = radix_tree_preload(gfp_mask & GFP_RECLAIM_MASK);
 	if (!error) {
 		struct address_space *mapping = old->mapping;
 		void (*freepage)(struct page *);
@@ -841,7 +841,7 @@ static int __add_to_page_cache_locked(struct page *page,
 		return error;
 	}

-	error = radix_tree_maybe_preload(gfp_mask & ~__GFP_HIGHMEM);
+	error = radix_tree_maybe_preload(gfp_mask & GFP_RECLAIM_MASK);
 	if (error) {
 		if (!huge)
 			mem_cgroup_cancel_charge(page, memcg, false);
@@ -1574,8 +1574,7 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset,
 		if (fgp_flags & FGP_ACCESSED)
 			__SetPageReferenced(page);

-		err = add_to_page_cache_lru(page, mapping, offset,
-				gfp_mask & GFP_RECLAIM_MASK);
+		err = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
 		if (unlikely(err)) {
 			put_page(page);
 			page = NULL;
@@ -2378,7 +2377,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
 	if (!page)
 		return -ENOMEM;

-	ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask & GFP_KERNEL);
+	ret = add_to_page_cache_lru(page, mapping, offset, gfp_mask);
 	if (ret == 0)
 		ret = mapping->a_ops->readpage(file, page);
 	else if (ret == -EEXIST)
On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
Could you please mention the NULL-pointer crash here, maybe even in the patch subject? That makes it much easier to find this patch when you run into that bug, or when evaluating backport candidates.
Other than that,
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
> Reported-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Tue 10-04-18 05:53:51, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
>
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
> Reported-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
I would push this into __radix_tree_preload... Anyway

Acked-by: Michal Hocko <mhocko@suse.com>
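Presumably something like this (an untested sketch of Michal's suggestion; note that GFP_RECLAIM_MASK is currently defined in mm/internal.h, so it would first have to be made visible to lib/radix-tree.c):

/* lib/radix-tree.c, sketch: sanitise the caller's mask once, centrally,
 * so no caller can forget to do it. */
static __must_check int __radix_tree_preload(gfp_t gfp_mask, unsigned nr)
{
	gfp_mask &= GFP_RECLAIM_MASK;	/* hypothetical placement */
	/* ... existing preload logic unchanged ... */
}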
On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
>
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
Why does this patch fix 19f99cee206c instead of 449dd6984d0e? F2FS didn't have any problem before 449dd6984d0e was introduced, did it?
On Tue, Apr 10, 2018 at 10:45:45PM +0900, Minchan Kim wrote:
> On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> >
> > The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
> >
> > Fixes: 19f99cee206c ("f2fs: add core inode operations")
>
> Why does this patch fix 19f99cee206c instead of 449dd6984d0e? F2FS didn't have any problem before 449dd6984d0e was introduced, did it?
Well, there's the problem. This bug is the combination of three different things:

1. The working set code relying on list_empty.
2. The page cache not filtering out the bad flags.
3. F2FS specifying a flag nobody had ever specified before.

So which single commit does this patch fix? I don't think it really matters.
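(The crash mechanism, for reference: the radix tree node constructor runs INIT_LIST_HEAD() on node->private_list, and a later __GFP_ZERO allocation memsets that back to zeroes. A zero-filled list_head looks *non-empty* to list_empty(), as this sketch shows, so the workingset code tries to unlink a node that is on no list -- the __list_del_entry crash quoted later in the thread:)

/* What __GFP_ZERO leaves behind: */
struct list_head zeroed = { .next = NULL, .prev = NULL };

/* list_empty() tests head->next == head; NULL != &zeroed, so: */
bool looks_on_list = !list_empty(&zeroed);	/* true, wrongly */
/* ...and list_lru_del() then dereferences the NULL next/prev pointers. */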
On 04/10, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 10:45:45PM +0900, Minchan Kim wrote:
> > On Tue, Apr 10, 2018 at 05:53:51AM -0700, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <mawilcox@microsoft.com>
> > >
> > > The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
> > >
> > > Fixes: 19f99cee206c ("f2fs: add core inode operations")
> >
> > Why does this patch fix 19f99cee206c instead of 449dd6984d0e? F2FS didn't have any problem before 449dd6984d0e was introduced, did it?
>
> Well, there's the problem. This bug is the combination of three different things:
>
> 1. The working set code relying on list_empty.
> 2. The page cache not filtering out the bad flags.
> 3. F2FS specifying a flag nobody had ever specified before.
>
> So which single commit does this patch fix? I don't think it really matters.
I hope someone still cares about the patch description, though. IMHO, this fixes the MM regression introduced by 449dd6984d0e ("mm: keep page cache radix tree nodes in check"), merged in v3.15, 2014.

19f99cee206c ("f2fs: add core inode operations"), merged in v3.8, 2012, merely exposed it. In fact, I've never hit this bug on old kernels.

From the user's viewpoint, may I suggest describing what kind of symptom this bug produces?
Something like:
[ 7858.792946] [<ffffff80086f4de0>] __list_del_entry+0x30/0xd0
[ 7858.792951] [<ffffff8008362018>] list_lru_del+0xac/0x1ac
[ 7858.792957] [<ffffff800830f04c>] page_cache_tree_insert+0xd8/0x110
[ 7858.792962] [<ffffff8008310188>] __add_to_page_cache_locked+0xf8/0x4e0
[ 7858.792967] [<ffffff800830ff34>] add_to_page_cache_lru+0x50/0x1ac
[ 7858.792972] [<ffffff800830fdd0>] pagecache_get_page+0x468/0x57c
[ 7858.792979] [<ffffff80085d081c>] __get_node_page+0x84/0x764
[ 7858.792986] [<ffffff800859cd94>] f2fs_iget+0x264/0xdc8
[ 7858.792991] [<ffffff800859ee00>] f2fs_lookup+0x3b4/0x660
[ 7858.792998] [<ffffff80083d2540>] lookup_slow+0x1e4/0x348
[ 7858.793003] [<ffffff80083d0eb8>] walk_component+0x21c/0x320
[ 7858.793008] [<ffffff80083d0010>] path_lookupat+0x90/0x1bc
[ 7858.793013] [<ffffff80083cfe6c>] filename_lookup+0x8c/0x1a0
[ 7858.793018] [<ffffff80083c52d0>] vfs_fstatat+0x84/0x10c
[ 7858.793023] [<ffffff80083c5b00>] SyS_newfstatat+0x28/0x64
Thanks,
On Tue 10-04-18 05:53:51, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> The page cache has used the mapping's GFP flags for allocating radix tree nodes for a long time. It took care to always mask off the __GFP_HIGHMEM flag, and masked off other flags in other paths, but the __GFP_ZERO flag was still able to sneak through. The __GFP_DMA and __GFP_DMA32 flags would also have been able to sneak through if they were ever used. Fix them all by using GFP_RECLAIM_MASK at the innermost location, and remove it from earlier in the callchain.
>
> Fixes: 19f99cee206c ("f2fs: add core inode operations")
> Reported-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
Looks good to me. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>
Honza
On Tue, Apr 10, 2018 at 05:53:50AM -0700, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
On Tue 10-04-18 05:53:50, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
> ---
>  mm/slab.c | 6 ++++--
>  mm/slob.c | 4 +++-
>  mm/slub.c | 6 ++++--
>  3 files changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 38d3f4fd17d7..8b2cb7db85db 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3313,8 +3313,10 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
>  	local_irq_restore(save_flags);
>  	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>
> -	if (unlikely(flags & __GFP_ZERO) && ptr)
> -		memset(ptr, 0, cachep->object_size);
> +	if (unlikely(flags & __GFP_ZERO) && ptr) {
> +		if (!WARN_ON_ONCE(cachep->ctor))
> +			memset(ptr, 0, cachep->object_size);
> +	}
>
>  	slab_post_alloc_hook(cachep, flags, 1, &ptr);
>  	return ptr;
Why don't we need to cover this in slab_alloc and kmem_cache_alloc_bulk as well?
Other than that this patch makes sense to me.
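For concreteness, the equivalent guard in slab_alloc() would presumably mirror the slab_alloc_node() hunk quoted above (an untested sketch; slab_alloc()'s local variable is objp):

	if (unlikely(flags & __GFP_ZERO) && objp) {
		if (!WARN_ON_ONCE(cachep->ctor))
			memset(objp, 0, cachep->object_size);
	}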
On 04/10/2018 02:53 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
It doesn't fix any known problem, does it? Then the stable tag seems too much IMHO. Especially for fixing a 2007 commit.
Otherwise,

Acked-by: Vlastimil Babka <vbabka@suse.cz>
On 04/10/2018 05:53 AM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
>
> Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> Cc: stable@vger.kernel.org
Since there is probably no bug to fix, what about adding the extra check only under some DEBUG option?

How many caches are still using a ctor these days?
On Tue, Apr 10, 2018 at 06:53:04AM -0700, Eric Dumazet wrote:
> On 04/10/2018 05:53 AM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> >
> > __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
> >
> > Fixes: d07dbea46405 ("Slab allocators: support __GFP_ZERO in all allocators")
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> > Cc: stable@vger.kernel.org
>
> Since there is probably no bug to fix, what about adding the extra check only under some DEBUG option?
>
> How many caches are still using a ctor these days?
That's a really good question, and strangely hard to find out. I settled on "git grep -A4 kmem_cache_alloc" and then searching the 'less' output with '[^L]);'.
--
arch/powerpc/kvm/book3s_64_mmu_radix.c:	kvm_pte_cache = kmem_cache_create("kvm-pte", size, size, 0, pte_ctor);
--
arch/powerpc/mm/init-common.c:	new = kmem_cache_create(name, table_size, align, 0, ctor);
--
arch/powerpc/platforms/cell/spufs/inode.c:	spufs_inode_cache = kmem_cache_create("spufs_inode_cache",
arch/powerpc/platforms/cell/spufs/inode.c-			sizeof(struct spufs_inode_info), 0,
arch/powerpc/platforms/cell/spufs/inode.c-			SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT, spufs_init_once);
--
arch/sh/mm/pgtable.c:	pgd_cachep = kmem_cache_create("pgd_cache",
arch/sh/mm/pgtable.c-					PTRS_PER_PGD * (1<<PTE_MAGNITUDE),
arch/sh/mm/pgtable.c-					PAGE_SIZE, SLAB_PANIC, pgd_ctor);
--
arch/sparc/mm/tsb.c:	pgtable_cache = kmem_cache_create("pgtable_cache",
arch/sparc/mm/tsb.c-					PAGE_SIZE, PAGE_SIZE,
arch/sparc/mm/tsb.c-					0,
arch/sparc/mm/tsb.c-					_clear_page);
--
drivers/dax/super.c:	dax_cache = kmem_cache_create("dax_cache", sizeof(struct dax_device), 0,
drivers/dax/super.c-			(SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
drivers/dax/super.c-			 SLAB_MEM_SPREAD|SLAB_ACCOUNT),
drivers/dax/super.c-			init_once);
--
drivers/staging/ncpfs/inode.c:	ncp_inode_cachep = kmem_cache_create("ncp_inode_cache",
drivers/staging/ncpfs/inode.c-					sizeof(struct ncp_inode_info),
drivers/staging/ncpfs/inode.c-					0, (SLAB_RECLAIM_ACCOUNT|
drivers/staging/ncpfs/inode.c-					SLAB_MEM_SPREAD|SLAB_ACCOUNT),
drivers/staging/ncpfs/inode.c-					init_once);
--
drivers/usb/mon/mon_text.c:	rp->e_slab = kmem_cache_create(rp->slab_name,
drivers/usb/mon/mon_text.c-	    sizeof(struct mon_event_text), sizeof(long), 0,
drivers/usb/mon/mon_text.c-	    mon_text_ctor);
--
fs/9p/v9fs.c:	v9fs_inode_cache = kmem_cache_create("v9fs_inode_cache",
fs/9p/v9fs.c-					sizeof(struct v9fs_inode),
fs/9p/v9fs.c-					0, (SLAB_RECLAIM_ACCOUNT|
fs/9p/v9fs.c-					SLAB_MEM_SPREAD|SLAB_ACCOUNT),
fs/9p/v9fs.c-					v9fs_inode_init_once);
--
fs/adfs/super.c:	adfs_inode_cachep = kmem_cache_create("adfs_inode_cache",
fs/adfs/super.c-					sizeof(struct adfs_inode_info),
fs/adfs/super.c-					0, (SLAB_RECLAIM_ACCOUNT|
fs/adfs/super.c-					SLAB_MEM_SPREAD|SLAB_ACCOUNT),
fs/adfs/super.c-					init_once);
... snip a huge number of filesystems ...
--
ipc/mqueue.c:	mqueue_inode_cachep = kmem_cache_create("mqueue_inode_cache",
ipc/mqueue.c-				sizeof(struct mqueue_inode_info), 0,
ipc/mqueue.c-				SLAB_HWCACHE_ALIGN|SLAB_ACCOUNT, init_once);
--
kernel/fork.c:	sighand_cachep = kmem_cache_create("sighand_cache",
kernel/fork.c-			sizeof(struct sighand_struct), 0,
kernel/fork.c-			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_TYPESAFE_BY_RCU|
kernel/fork.c-			SLAB_ACCOUNT, sighand_ctor);
--
lib/radix-tree.c:	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
lib/radix-tree.c-			sizeof(struct radix_tree_node), 0,
lib/radix-tree.c-			SLAB_PANIC | SLAB_RECLAIM_ACCOUNT,
lib/radix-tree.c-			radix_tree_node_ctor);
--
mm/rmap.c:	anon_vma_cachep = kmem_cache_create("anon_vma", sizeof(struct anon_vma),
mm/rmap.c-			0, SLAB_TYPESAFE_BY_RCU|SLAB_PANIC|SLAB_ACCOUNT,
mm/rmap.c-			anon_vma_ctor);
--
mm/shmem.c:	shmem_inode_cachep = kmem_cache_create("shmem_inode_cache",
mm/shmem.c-				sizeof(struct shmem_inode_info),
mm/shmem.c-				0, SLAB_PANIC|SLAB_ACCOUNT, shmem_init_inode);
--
net/sunrpc/rpc_pipe.c:	rpc_inode_cachep = kmem_cache_create("rpc_inode_cache",
net/sunrpc/rpc_pipe.c-				sizeof(struct rpc_inode),
net/sunrpc/rpc_pipe.c-				0, (SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT|
net/sunrpc/rpc_pipe.c-				SLAB_MEM_SPREAD|SLAB_ACCOUNT),
net/sunrpc/rpc_pipe.c-				init_once);
--
security/integrity/iint.c:	kmem_cache_create("iint_cache", sizeof(struct integrity_iint_cache),
security/integrity/iint.c-			0, SLAB_PANIC, init_once);
So aside from the filesystems, about fourteen places use it in the kernel.
If we want to get rid of the concept of constructors, it's doable, but somebody needs to do the work to show what the effects will be.
For example, I took a quick look at the sighand_struct in kernel/fork.c. That initialises the spinlock and waitqueue head which are at the end of sighand_struct. The caller who allocates sighand_struct touches the head of the struct. So if we removed the ctor, we'd touch two cachelines on allocation instead of one ... but we could rearrange the sighand_struct to put all the initialised bits in the first cacheline (and we probably should).
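For reference, the constructor in question only initialises those two fields; this is kernel/fork.c's sighand_ctor() from that era:

static void sighand_ctor(void *data)
{
	struct sighand_struct *sighand = data;

	spin_lock_init(&sighand->siglock);
	init_waitqueue_head(&sighand->signalfd_wqh);
}

So the rearrangement suggested above amounts to moving siglock and signalfd_wqh to the front of sighand_struct.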
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> If we want to get rid of the concept of constructors, it's doable, but somebody needs to do the work to show what the effects will be.

How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches? Those must have a defined state of the objects at all times, and a constructor is required for that. And their use of RCU is required for numerous lockless lookup algorithms in the kernel.
On Tue, Apr 10, 2018 at 12:30:23PM -0500, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > If we want to get rid of the concept of constructors, it's doable, but somebody needs to do the work to show what the effects will be.
>
> How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches? Those must have a defined state of the objects at all times, and a constructor is required for that. And their use of RCU is required for numerous lockless lookup algorithms in the kernel.

Not at all times. Only once they've been used. Re-constructing them once they've been used might break the RCU typesafety, I suppose ... would need to examine the callers.
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches? Those must have a defined state of the objects at all times, and a constructor is required for that. And their use of RCU is required for numerous lockless lookup algorithms in the kernel.
>
> Not at all times. Only once they've been used. Re-constructing them once they've been used might break the RCU typesafety, I suppose ... would need to examine the callers.

Objects can be freed and reused and still be accessed from code that thinks the object is the old and not the new object....
On Tue, Apr 10, 2018 at 12:45:56PM -0500, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > > How do you envision dealing with the SLAB_TYPESAFE_BY_RCU slab caches? Those must have a defined state of the objects at all times, and a constructor is required for that. And their use of RCU is required for numerous lockless lookup algorithms in the kernel.
> >
> > Not at all times. Only once they've been used. Re-constructing them once they've been used might break the RCU typesafety, I suppose ... would need to examine the callers.
>
> Objects can be freed and reused and still be accessed from code that thinks the object is the old and not the new object....
Yes, I know, that's the point of RCU typesafety. My point is that an object *which has never been used* can't be accessed. So you don't *need* a constructor.
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > Objects can be freed and reused and still be accessed from code that thinks the object is the old and not the new object....
>
> Yes, I know, that's the point of RCU typesafety. My point is that an object *which has never been used* can't be accessed. So you don't *need* a constructor.
But the object needs to have the proper contents after it was released and re-allocated. Some objects may rely on contents (like list heads) surviving the realloc process because access must always be possible.
validate_slab() checks on proper metadata content in a slab, although it does not access the payload. So that may work if you separate the payload init from the metadata init.
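(For readers new to the constraint: the canonical SLAB_TYPESAFE_BY_RCU lookup, paraphrased from the comment in include/linux/slab.h, goes roughly as below. lockless_lookup(), try_get_ref(), put_ref() and the key field are placeholders for whatever the cache's users define; the point is that the lock and refcount consulted here must remain in a defined state across free and re-allocation, which is exactly what the constructor guarantees:)

again:
	rcu_read_lock();
	obj = lockless_lookup(key);
	if (obj) {
		if (!try_get_ref(obj)) {	/* may fail for freed objects */
			rcu_read_unlock();
			goto again;
		}
		if (obj->key != key) {		/* freed and reused already */
			put_ref(obj);
			rcu_read_unlock();
			goto again;
		}
	}
	rcu_read_unlock();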
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
Can we move this check out of the critical paths and check for a ctor and GFP_ZERO when calling the page allocator? F.e. in allocate_slab()?
On Tue, 10 Apr 2018, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
>
> Can we move this check out of the critical paths and check for a ctor and GFP_ZERO when calling the page allocator? F.e. in allocate_slab()?
The ctors are run at that point from setup_object(), so it makes sense.
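i.e. presumably something like this in mm/slub.c's allocate_slab(), next to where setup_object() runs the constructors (an untested sketch):

	/* Off the fast path: checked once per slab page, not per object. */
	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));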
On Tue, Apr 10, 2018 at 09:21:20AM -0500, Christopher Lameter wrote:
> On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> > __GFP_ZERO requests that the object be initialised to all-zeroes, while the purpose of a constructor is to initialise an object to a particular pattern. We cannot do both. Add a warning to catch any users who mistakenly pass a __GFP_ZERO flag when allocating a slab with a constructor.
>
> Can we move this check out of the critical paths and check for a ctor and GFP_ZERO when calling the page allocator? F.e. in allocate_slab()?
Are you willing to have this kind of bug go uncaught for a while? In this specific case, __GFP_ZERO was only being passed on a few of the calls to kmem_cache_alloc. So we'd happily trash the constructed object any time we didn't allocate a page.
I appreciate it's a tradeoff, and we don't want to clutter the critical path unnecessarily.
On Tue, 10 Apr 2018, Matthew Wilcox wrote:
> Are you willing to have this kind of bug go uncaught for a while?
There will be frequent allocations and this will show up at some point.
Also, you could put this into the debug-only portions somewhere, so we always catch it when debugging is on.
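Something like this, presumably (an untested sketch using SLUB's existing kmem_cache_debug() helper, so the check only fires for caches with debugging enabled):

	if (kmem_cache_debug(s))
		WARN_ON_ONCE(s->ctor && (gfpflags & __GFP_ZERO));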