Hi, Daniel & Greg
This patch (979d63d50c0c bpf: prevent out of bounds speculation on
pointer arithmetic) was assigned a CVE (CVE-2019-7308) with a high score:
CVSS v3.0 Severity and Metrics:
Base Score: 9.8 CRITICAL
This patch is not in stable-4.4. Would you please backport it to
4.4?
Thanks,
Jason
The commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded
memory to zones until online") introduced move_pfn_range_to_zone() which
calls memmap_init_zone() during onlining a memory block.
memmap_init_zone() resets the pagetype flags and sets the migrate type
to MOVABLE.
However, __offline_pages() also calls undo_isolate_page_range() after
offline_isolated_pages() to do the same thing. Because commit
2ce13640b3f4 ("mm: __first_valid_page skip over offline pages") changed
__first_valid_page() to skip offline pages, undo_isolate_page_range()
here just wastes CPU cycles looping over the offlining PFN range while
doing nothing: __first_valid_page() returns NULL because
offline_isolated_pages() has already marked all memory sections within
the pfn range as offline via offline_mem_sections().
Also, after calling the "useless" undo_isolate_page_range() here, it
reaches the point of no return by notifying MEM_OFFLINE. Those pages
will be marked as MIGRATE_MOVABLE again once onlined. The only thing
left to do is to decrease the zone's counter of isolated pageblocks.
Left undone, the stale counter makes some page allocation paths slower,
which is the regression the above commit introduced.
Even if alloc_contig_range() can be used to isolate 16GB-hugetlb pages
on ppc64, an "int" should still be enough to represent the number of
pageblocks there. Fix an incorrect comment along the way.
Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
Cc: <stable(a)vger.kernel.org> # v4.13+
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Signed-off-by: Qian Cai <cai(a)lca.pw>
---
v4: Further consolidate comments.
Turn on kernel-doc and add a stable tag per Michal.
v3: Reconstruct the kernel-doc comments.
Use a more meaningful variable name per Oscar.
Update the commit log a bit.
v2: Return the number of isolated pageblocks in start_isolate_page_range() per
Oscar; take the zone lock when undoing zone->nr_isolate_pageblock per
Michal.
include/linux/page-isolation.h | 10 -------
mm/memory_hotplug.c | 17 +++++++++---
mm/page_alloc.c | 2 +-
mm/page_isolation.c | 48 +++++++++++++++++++++-------------
mm/sparse.c | 2 +-
5 files changed, 45 insertions(+), 34 deletions(-)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 4eb26d278046..280ae96dc4c3 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -41,16 +41,6 @@ int move_freepages_block(struct zone *zone, struct page *page,
/*
* Changes migrate type in [start_pfn, end_pfn) to be MIGRATE_ISOLATE.
- * If specified range includes migrate types other than MOVABLE or CMA,
- * this will fail with -EBUSY.
- *
- * For isolating all pages in the range finally, the caller have to
- * free all pages in the range. test_page_isolated() can be used for
- * test it.
- *
- * The following flags are allowed (they can be combined in a bit mask)
- * SKIP_HWPOISON - ignore hwpoison pages
- * REPORT_FAILURE - report details about the failure to isolate the range
*/
int
start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d63c5a2959cf..cd1a8c4c6183 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1580,7 +1580,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
{
unsigned long pfn, nr_pages;
long offlined_pages;
- int ret, node;
+ int ret, node, nr_isolate_pageblock;
unsigned long flags;
unsigned long valid_start, valid_end;
struct zone *zone;
@@ -1606,10 +1606,11 @@ static int __ref __offline_pages(unsigned long start_pfn,
ret = start_isolate_page_range(start_pfn, end_pfn,
MIGRATE_MOVABLE,
SKIP_HWPOISON | REPORT_FAILURE);
- if (ret) {
+ if (ret < 0) {
reason = "failure to isolate range";
goto failed_removal;
}
+ nr_isolate_pageblock = ret;
arg.start_pfn = start_pfn;
arg.nr_pages = nr_pages;
@@ -1661,8 +1662,16 @@ static int __ref __offline_pages(unsigned long start_pfn,
/* Ok, all of our target is isolated.
We cannot do rollback at this point. */
offline_isolated_pages(start_pfn, end_pfn);
- /* reset pagetype flags and makes migrate type to be MOVABLE */
- undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
+
+ /*
+ * Onlining will reset pagetype flags and makes migrate type
+ * MOVABLE, so just need to decrease the number of isolated
+ * pageblocks zone counter here.
+ */
+ spin_lock_irqsave(&zone->lock, flags);
+ zone->nr_isolate_pageblock -= nr_isolate_pageblock;
+ spin_unlock_irqrestore(&zone->lock, flags);
+
/* removal success */
adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
zone->present_pages -= offlined_pages;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 03fcf73d47da..d96ca5bc555b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8233,7 +8233,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
ret = start_isolate_page_range(pfn_max_align_down(start),
pfn_max_align_up(end), migratetype, 0);
- if (ret)
+ if (ret < 0)
return ret;
/*
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index e8baab91b1d1..019280712e1b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -161,27 +161,36 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
return NULL;
}
-/*
- * start_isolate_page_range() -- make page-allocation-type of range of pages
- * to be MIGRATE_ISOLATE.
- * @start_pfn: The lower PFN of the range to be isolated.
- * @end_pfn: The upper PFN of the range to be isolated.
- * @migratetype: migrate type to set in error recovery.
+/**
+ * start_isolate_page_range() - make page-allocation-type of range of pages to
+ * be MIGRATE_ISOLATE.
+ * @start_pfn: The lower PFN of the range to be isolated.
+ * @end_pfn: The upper PFN of the range to be isolated.
+ * start_pfn/end_pfn must be aligned to pageblock_order.
+ * @migratetype: Migrate type to set in error recovery.
+ * @flags: The following flags are allowed (they can be combined in
+ * a bit mask)
+ * SKIP_HWPOISON - ignore hwpoison pages
+ * REPORT_FAILURE - report details about the failure to
+ * isolate the range
*
* Making page-allocation-type to be MIGRATE_ISOLATE means free pages in
* the range will never be allocated. Any free pages and pages freed in the
- * future will not be allocated again.
- *
- * start_pfn/end_pfn must be aligned to pageblock_order.
- * Return 0 on success and -EBUSY if any part of range cannot be isolated.
+ * future will not be allocated again. If specified range includes migrate types
+ * other than MOVABLE or CMA, this will fail with -EBUSY. For isolating all
+ * pages in the range finally, the caller have to free all pages in the range.
+ * test_page_isolated() can be used for test it.
*
* There is no high level synchronization mechanism that prevents two threads
- * from trying to isolate overlapping ranges. If this happens, one thread
+ * from trying to isolate overlapping ranges. If this happens, one thread
* will notice pageblocks in the overlapping range already set to isolate.
* This happens in set_migratetype_isolate, and set_migratetype_isolate
- * returns an error. We then clean up by restoring the migration type on
- * pageblocks we may have modified and return -EBUSY to caller. This
+ * returns an error. We then clean up by restoring the migration type on
+ * pageblocks we may have modified and return -EBUSY to caller. This
* prevents two threads from simultaneously working on overlapping ranges.
+ *
+ * Return: the number of isolated pageblocks on success and -EBUSY if any part
+ * of range cannot be isolated.
*/
int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
unsigned migratetype, int flags)
@@ -189,6 +198,7 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
unsigned long pfn;
unsigned long undo_pfn;
struct page *page;
+ int nr_isolate_pageblock = 0;
BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
BUG_ON(!IS_ALIGNED(end_pfn, pageblock_nr_pages));
@@ -197,13 +207,15 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
pfn < end_pfn;
pfn += pageblock_nr_pages) {
page = __first_valid_page(pfn, pageblock_nr_pages);
- if (page &&
- set_migratetype_isolate(page, migratetype, flags)) {
- undo_pfn = pfn;
- goto undo;
+ if (page) {
+ if (set_migratetype_isolate(page, migratetype, flags)) {
+ undo_pfn = pfn;
+ goto undo;
+ }
+ nr_isolate_pageblock++;
}
}
- return 0;
+ return nr_isolate_pageblock;
undo:
for (pfn = start_pfn;
pfn < undo_pfn;
diff --git a/mm/sparse.c b/mm/sparse.c
index 69904aa6165b..56e057c432f9 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -567,7 +567,7 @@ void online_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
}
#ifdef CONFIG_MEMORY_HOTREMOVE
-/* Mark all memory sections within the pfn range as online */
+/* Mark all memory sections within the pfn range as offline */
void offline_mem_sections(unsigned long start_pfn, unsigned long end_pfn)
{
unsigned long pfn;
--
2.17.2 (Apple Git-113)
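For readers following the change in return convention above, here is a minimal user-space sketch, not the kernel code. It assumes a toy zone that holds only the counter; `toy_zone`, `toy_start_isolate()`, `toy_offline()`, the `busy_pfn` parameter and `PAGEBLOCK_NR_PAGES` are hypothetical names made up for illustration. It shows why callers must switch from `if (ret)` to `if (ret < 0)` once the function returns a count on success, and how that count is later subtracted from the zone counter.

```c
#include <assert.h>
#include <errno.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* hypothetical pageblock size for this model */

/* Toy zone: only the counter that the patch adjusts. */
struct toy_zone {
	int nr_isolate_pageblock;
};

/*
 * Toy stand-in for start_isolate_page_range(): returns the number of
 * pageblocks isolated on success, or -EBUSY if the pageblock starting at
 * busy_pfn refuses isolation (pass (unsigned long)-1 for "none").
 */
static int toy_start_isolate(struct toy_zone *zone, unsigned long start_pfn,
			     unsigned long end_pfn, unsigned long busy_pfn)
{
	unsigned long pfn;
	int nr_isolated = 0;

	/* Models the BUG_ON() alignment checks. */
	assert(start_pfn % PAGEBLOCK_NR_PAGES == 0);
	assert(end_pfn % PAGEBLOCK_NR_PAGES == 0);

	for (pfn = start_pfn; pfn < end_pfn; pfn += PAGEBLOCK_NR_PAGES) {
		if (pfn == busy_pfn) {
			/* Roll back what this call isolated so far. */
			zone->nr_isolate_pageblock -= nr_isolated;
			return -EBUSY;
		}
		zone->nr_isolate_pageblock++;
		nr_isolated++;
	}
	return nr_isolated;
}

/* Caller in the style of the patched __offline_pages(). */
static int toy_offline(struct toy_zone *zone, unsigned long start_pfn,
		       unsigned long end_pfn, unsigned long busy_pfn)
{
	int ret = toy_start_isolate(zone, start_pfn, end_pfn, busy_pfn);

	/* A positive count now means success, so "if (ret)" would be wrong. */
	if (ret < 0)
		return ret;

	/* ... pages would be migrated and offlined here ... */

	/* Only the counter needs undoing; the kernel does this under zone->lock. */
	zone->nr_isolate_pageblock -= ret;
	return 0;
}
```

In this model, offlining four pageblocks leaves the counter at zero afterwards, and a failed isolation leaves it untouched.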
Hi,
> Hi,
>
> the cpu frequency scaling never worked right (only on the 4.4 kernel from
> marvell). If you use the 1000 MHz firmware you are running with just 800
> MHz (this is the case on my board with a current firmware).
>
> Just have a look what the kernel thinks it is running at (frequency).
Ok, probably my bad here. By 'worked fine' I mean that this didn't lead to any
freezes or panics. I know the actual frequency wasn't set properly.
Regards
/Ilias
>
> Regards,
> Christian
>
> Ilias Apalodimas <ilias.apalodimas(a)linaro.org> schrieb am Do., 14. März
> 2019, 14:44:
>
> > Hello Christian,
> > > Hi,
> > >
> > > I assume you use the 1000 MHz firmware. This does also not work on my
> > Rev 7
> > > board. But I'm pretty sure this is not a problem of the patches, because
> > if
> > > I take a newer kernel (4.19.20/27) without the patches it also does not
> > > work. A kernel 4.19.17 does work for me. My opinion on that is that this
> > is
> > > another problem which does just occure now because now the cpu frequency
> > > scaling is working with the right frequencies.
> > I am not sure which firmware i am running, i did all my tests on 5.0.0 and
> > changing between governors worked fine without the patches
> >
> > Regards
> > /Ilias
> > >
> > > Ilias Apalodimas <ilias.apalodimas(a)linaro.org> schrieb am Do., 14. März
> > > 2019, 13:15:
> > >
> > > > Hi Gregory,
> > > > > The clock parenting was not setup properly when DVFS was enabled. It
> > was
> > > > > expected that the same clock source was used with and without DVFS
> > which
> > > > > was not the case.
> > > > >
> > > > > This patch fixes this issue, allowing to make the cpufreq support
> > work
> > > > > when the CPU clocks source are not the default ones.
> > > > >
> > > > > Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
> > > > > Cc: <stable(a)vger.kernel.org>
> > > > > Reported-by: Christian Neubert <christian.neubert.86(a)gmail.com>
> > > > > Reported-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
> > > > > Signed-off-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
> > > > > ---
> > > > > drivers/clk/mvebu/armada-37xx-periph.c | 11 +++++++++++
> > > > > 1 file changed, 11 insertions(+)
> > > > >
> > > > > diff --git a/drivers/clk/mvebu/armada-37xx-periph.c
> > > > b/drivers/clk/mvebu/armada-37xx-periph.c
> > > > > index 1f1cff428d78..26ed3c18a239 100644
> > > > > --- a/drivers/clk/mvebu/armada-37xx-periph.c
> > > > > +++ b/drivers/clk/mvebu/armada-37xx-periph.c
> > > > > @@ -671,6 +671,17 @@ static int armada_3700_add_composite_clk(const
> > > > struct clk_periph_data *data,
> > > > > map = syscon_regmap_lookup_by_compatible(
> > > > > "marvell,armada-3700-nb-pm");
> > > > > pmcpu_clk->nb_pm_base = map;
> > > > > +
> > > > > + /*
> > > > > + * Use the same parent when DVFS is enabled that the
> > > > > + * default parent received at boot time. When this
> > > > > + * function is called, DVFS is not enabled yet, so we
> > > > > + * get the default parent and we can set the parent
> > > > > + * for DVFS.
> > > > > + */
> > > > > + if (clk_pm_cpu_set_parent(muxrate_hw,
> > > > > +
> > > > clk_pm_cpu_get_parent(muxrate_hw)))
> > > > > + dev_warn(dev, "Failed to setup default parent
> > > > clock for DVFS\n");
> > > > > }
> > > > >
> > > > > *hw = clk_hw_register_composite(dev, data->name,
> > > > data->parent_names,
> > > > > --
> > > > > 2.20.1
> > > > >
> > > > Applied this and selected only
> > > >
> > > > CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
> > > > CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
> > > > CONFIG_CPU_FREQ_GOV_POWERSAVE=y
> > > >
> > > > After changing the governor from 'powersave' to 'performance' the board
> > > > completely froze (i even lost access to the serial port)
> > > >
> > > > Cheers
> > > > /Ilias
> > > >
> >
The clock parenting was not set up properly when DVFS was enabled. It was
expected that the same clock source was used with and without DVFS, which
was not the case.
This patch fixes the issue, so that cpufreq support works when the CPU
clock sources are not the default ones.
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Cc: <stable(a)vger.kernel.org>
Reported-by: Christian Neubert <christian.neubert.86(a)gmail.com>
Reported-by: Ilias Apalodimas <ilias.apalodimas(a)linaro.org>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
---
drivers/clk/mvebu/armada-37xx-periph.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/clk/mvebu/armada-37xx-periph.c b/drivers/clk/mvebu/armada-37xx-periph.c
index 1f1cff428d78..26ed3c18a239 100644
--- a/drivers/clk/mvebu/armada-37xx-periph.c
+++ b/drivers/clk/mvebu/armada-37xx-periph.c
@@ -671,6 +671,17 @@ static int armada_3700_add_composite_clk(const struct clk_periph_data *data,
map = syscon_regmap_lookup_by_compatible(
"marvell,armada-3700-nb-pm");
pmcpu_clk->nb_pm_base = map;
+
+ /*
+ * Use the same parent when DVFS is enabled that the
+ * default parent received at boot time. When this
+ * function is called, DVFS is not enabled yet, so we
+ * get the default parent and we can set the parent
+ * for DVFS.
+ */
+ if (clk_pm_cpu_set_parent(muxrate_hw,
+ clk_pm_cpu_get_parent(muxrate_hw)))
+ dev_warn(dev, "Failed to setup default parent clock for DVFS\n");
}
*hw = clk_hw_register_composite(dev, data->name, data->parent_names,
--
2.20.1
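As a rough illustration of the parenting problem described in the patch above, the sketch below models the CPU clock with two separate parent selections: one used while DVFS is off and one used once DVFS is on. This is a toy model under that assumption, not the driver: `toy_cpu_clk`, `toy_get_parent()`, `toy_set_parent()`, `toy_enable_dvfs()`, `toy_sync_parent()` and `TOY_NUM_PARENTS` are hypothetical names.

```c
#include <assert.h>

#define TOY_NUM_PARENTS 2	/* hypothetical number of selectable parents */

struct toy_cpu_clk {
	unsigned int boot_mux;	/* parent index chosen at boot, used while DVFS is off */
	unsigned int dvfs_mux;	/* parent index used once DVFS is on */
	int dvfs_enabled;
};

/* Reads whichever selection is currently in effect. */
static unsigned int toy_get_parent(const struct toy_cpu_clk *clk)
{
	return clk->dvfs_enabled ? clk->dvfs_mux : clk->boot_mux;
}

/* Only ever writes the DVFS selection, as the patch comment describes. */
static int toy_set_parent(struct toy_cpu_clk *clk, unsigned int index)
{
	if (index >= TOY_NUM_PARENTS)
		return -1;
	clk->dvfs_mux = index;
	return 0;
}

static void toy_enable_dvfs(struct toy_cpu_clk *clk)
{
	clk->dvfs_enabled = 1;
}

/*
 * What the patch does at registration time: DVFS is still off, so
 * get_parent returns the boot-time parent, which is then copied into
 * the DVFS selection.
 */
static int toy_sync_parent(struct toy_cpu_clk *clk)
{
	assert(!clk->dvfs_enabled);
	return toy_set_parent(clk, toy_get_parent(clk));
}
```

Without the sync step, a clock booted with parent 1 would silently switch to parent 0 in this model as soon as DVFS is enabled; with it, the parent stays the same.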
From: Christian Neubert <christian.neubert.86(a)gmail.com>
The clock parenting was not set up properly when DVFS was enabled. It was
expected that the same clock source was used with and without DVFS, which
was not the case.
This patch fixes the issue, so that cpufreq support works when the CPU
clock sources are not the default ones.
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Cc: <stable(a)vger.kernel.org>
[gregory: extract from a larger patch, modify comments and commit log]
Signed-off-by: Christian Neubert <christian.neubert.86(a)gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
---
drivers/cpufreq/armada-37xx-cpufreq.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
index 75491fc841a6..ad4463e4266e 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -162,11 +162,25 @@ static void __init armada37xx_cpufreq_dvfs_setup(struct regmap *base,
}
/*
- * Set cpu clock source, for all the level we keep the same
- * clock source that the one already configured. For this one
- * we need to use the clock framework
+ * Set CPU clock source, for all the level we keep the same
+ * clock source that the one already configured with DVS
+ * disabled. For this one we need to use the clock framework
*/
parent = clk_get_parent(clk);
+
+ /*
+ * Unset parent clock to force the clock framework setting again
+ * the clock parent
+ */
+ clk_set_parent(clk, NULL);
+
+ /*
+ * For the Armada 37xx CPU clocks, setting the parent will
+ * actually configure the parent when DVFS is enabled. At
+ * hardware level it will be a different register from the one
+ * read when doing clk_get_parent that will be set with
+ * clk_set_parent.
+ */
clk_set_parent(clk, parent);
}
--
2.20.1
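The unset-then-set sequence in the patch above can be sketched as follows. This is a toy model, and it assumes, as the patch comment implies, that the framework skips the hardware write when the requested parent equals the one it already has cached. `toy_clk`, `toy_clk_set_parent()`, `toy_dvfs_setup()` and `TOY_NO_PARENT` are hypothetical names, not the clock framework API.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_PARENT (-1)

struct toy_clk {
	int cached_parent;	/* what the framework believes the parent is */
	int dvfs_parent;	/* what the DVFS-side selection actually holds */
	int hw_writes;		/* number of times the hardware was written */
};

static int toy_clk_set_parent(struct toy_clk *clk, int parent)
{
	/* Assumed short-circuit: unchanged parent means no hardware access. */
	if (parent == clk->cached_parent)
		return 0;

	clk->cached_parent = parent;
	if (parent != TOY_NO_PARENT) {
		clk->dvfs_parent = parent;
		clk->hw_writes++;
	}
	return 0;
}

/* Mirrors the sequence added by the patch: get, unset, set again. */
static void toy_dvfs_setup(struct toy_clk *clk)
{
	int parent;

	assert(clk != NULL);
	parent = clk->cached_parent;

	/* Drop the cached parent so that the next call is not skipped. */
	toy_clk_set_parent(clk, TOY_NO_PARENT);
	/* This call now reaches the hardware and programs the DVFS side. */
	toy_clk_set_parent(clk, parent);
}
```

In this model, calling `toy_clk_set_parent()` with the already cached parent leaves `dvfs_parent` stale, while `toy_dvfs_setup()` forces exactly one hardware write.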
The FUSE filesystem server and the kernel client negotiate, during the
initialization phase, the maximum write size the client will ever issue.
Correspondingly, the filesystem server then queues sys_read calls to read
requests with a buffer capacity large enough to carry the request header
+ that max_write bytes. A filesystem server is free to set its max_write
anywhere in the range [1·page, fc->max_pages·page]. In particular,
go-fuse[2] sets max_write to 64K by default, whereas the default
fc->max_pages corresponds to 128K. Libfuse also allows users to
configure max_write, but by default presets it to the possible maximum.
If max_write is < fc->max_pages·page, and the NOTIFY_RETRIEVE handler
allows retrieving more than max_write bytes, the corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with the server/client
contract, will only be queuing sys_read with ~max_write buffer capacity,
and fuse_dev_do_read throws away requests that cannot fit into the
server request buffer. In turn, the filesystem server could get stuck
waiting indefinitely for NOTIFY_REPLY, since the NOTIFY_RETRIEVE handler
returned OK, which clients understand to mean that NOTIFY_REPLY was
queued and will be sent back.
-> Cap the requested size to the negotiated max_write to avoid the problem.
This aligns with the way the NOTIFY_RETRIEVE handler works, which already
unconditionally caps the requested retrieve size to fuse_conn->max_pages.
This way it should not hurt NOTIFY_RETRIEVE semantics if we return less
data than was originally requested.
Please see [1] for the context in which the stuck-filesystem problem was
hit for real, how the situation was traced, and a more involved patch
that did not make it into the tree.
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse
Signed-off-by: Kirill Smelkov <kirr(a)nexedi.com>
Cc: Han-Wen Nienhuys <hanwen(a)google.com>
Cc: Jakob Unterwurzacher <jakobunt(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # v2.6.36+
---
fs/fuse/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8a63e52785e9..38e94bc43053 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1749,7 +1749,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
offset = outarg->offset & ~PAGE_MASK;
file_size = i_size_read(inode);
- num = outarg->size;
+ num = min(outarg->size, fc->max_write);
if (outarg->offset > file_size)
num = 0;
else if (outarg->offset + num > file_size)
--
2.21.0.225.g810b269d1a
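The size computation touched by the one-line change above can be sketched in user space as below. This is a minimal model, not the kernel function: `toy_retrieve_size()` is a hypothetical name, and its plain integer parameters stand in for outarg->offset, outarg->size, fc->max_write and the inode size.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of bytes a NOTIFY_RETRIEVE would hand back in this model:
 * first cap the request to the negotiated max_write, then clip it to
 * what the file actually contains past the offset.
 */
static uint32_t toy_retrieve_size(uint64_t offset, uint32_t size,
				  uint32_t max_write, uint64_t file_size)
{
	uint64_t num = size < max_write ? size : max_write;

	/* Guard against wraparound of offset + num in this toy model. */
	assert(offset <= UINT64_MAX - num);

	if (offset > file_size)
		num = 0;
	else if (offset + num > file_size)
		num = file_size - offset;

	return (uint32_t)num;
}
```

With a 64K max_write, a 128K retrieve request against a 1M file yields 64K in this model, so the reply fits a server buffer sized for the request header plus max_write bytes.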