<gregkh(a)linuxfoundation.org> writes:
> This is a note to let you know that I've just added the patch titled
>
> net-sysfs: Fix reference count leak
>
> to the 4.4-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> net-sysfs-fix-reference-count-leak.patch
> and it can be found in the queue-4.4 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
This patch shouldn't be taken into 4.4 or 4.9 stable branches. Memory
leak it's fixing doesn't exist in 4.4 or 4.9. It's introduced by these two
patches which are not merged into 4.4 or 4.9 branches:
commit e331c9066901dfe40bea4647521b86e9fb9901bb
Author: YueHaibing <yuehaibing(a)huawei.com>
Date: Tue Mar 19 10:16:53 2019 +0800
net-sysfs: call dev_hold if kobject_init_and_add success
[ Upstream commit a3e23f719f5c4a38ffb3d30c8d7632a4ed8ccd9e ]
In netdev_queue_add_kobject and rx_queue_add_kobject,
if sysfs_create_group failed, kobject_put will call
netdev_queue_release to decrease dev refcont, however
dev_hold has not be called. So we will see this while
unregistering dev:
unregister_netdevice: waiting for bcsh0 to become free. Usage count = -1
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Fixes: d0d668371679 ("net: don't decrement kobj reference count on init fail
ure")
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
commit d0d6683716791b2a2761a1bb025c613eb73da6c3
Author: stephen hemminger <stephen(a)networkplumber.org>
Date: Fri Aug 18 13:46:19 2017 -0700
net: don't decrement kobj reference count on init failure
If kobject_init_and_add failed, then the failure path would
decrement the reference count of the queue kobject whose reference
count was already zero.
Fixes: 114cf5802165 ("bql: Byte queue limits")
Signed-off-by: Stephen Hemminger <sthemmin(a)microsoft.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
>
>
> From foo@baz Mon 27 Jan 2020 04:14:17 PM CET
> From: Jouni Hogander <jouni.hogander(a)unikie.com>
> Date: Mon, 20 Jan 2020 09:51:03 +0200
> Subject: net-sysfs: Fix reference count leak
>
> From: Jouni Hogander <jouni.hogander(a)unikie.com>
>
> [ Upstream commit cb626bf566eb4433318d35681286c494f04fedcc ]
>
> Netdev_register_kobject is calling device_initialize. In case of error
> reference taken by device_initialize is not given up.
>
> Drivers are supposed to call free_netdev in case of error. In non-error
> case the last reference is given up there and device release sequence
> is triggered. In error case this reference is kept and the release
> sequence is never started.
>
> Fix this by setting reg_state as NETREG_UNREGISTERED if registering
> fails.
>
> This is the rootcause for couple of memory leaks reported by Syzkaller:
>
> BUG: memory leak unreferenced object 0xffff8880675ca008 (size 256):
> comm "netdev_register", pid 281, jiffies 4294696663 (age 6.808s)
> hex dump (first 32 bytes):
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<0000000058ca4711>] kmem_cache_alloc_trace+0x167/0x280
> [<000000002340019b>] device_add+0x882/0x1750
> [<000000001d588c3a>] netdev_register_kobject+0x128/0x380
> [<0000000011ef5535>] register_netdevice+0xa1b/0xf00
> [<000000007fcf1c99>] __tun_chr_ioctl+0x20d5/0x3dd0
> [<000000006a5b7b2b>] tun_chr_ioctl+0x2f/0x40
> [<00000000f30f834a>] do_vfs_ioctl+0x1c7/0x1510
> [<00000000fba062ea>] ksys_ioctl+0x99/0xb0
> [<00000000b1c1b8d2>] __x64_sys_ioctl+0x78/0xb0
> [<00000000984cabb9>] do_syscall_64+0x16f/0x580
> [<000000000bde033d>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<00000000e6ca2d9f>] 0xffffffffffffffff
>
> BUG: memory leak
> unreferenced object 0xffff8880668ba588 (size 8):
> comm "kobject_set_nam", pid 286, jiffies 4294725297 (age 9.871s)
> hex dump (first 8 bytes):
> 6e 72 30 00 cc be df 2b nr0....+
> backtrace:
> [<00000000a322332a>] __kmalloc_track_caller+0x16e/0x290
> [<00000000236fd26b>] kstrdup+0x3e/0x70
> [<00000000dd4a2815>] kstrdup_const+0x3e/0x50
> [<0000000049a377fc>] kvasprintf_const+0x10e/0x160
> [<00000000627fc711>] kobject_set_name_vargs+0x5b/0x140
> [<0000000019eeab06>] dev_set_name+0xc0/0xf0
> [<0000000069cb12bc>] netdev_register_kobject+0xc8/0x320
> [<00000000f2e83732>] register_netdevice+0xa1b/0xf00
> [<000000009e1f57cc>] __tun_chr_ioctl+0x20d5/0x3dd0
> [<000000009c560784>] tun_chr_ioctl+0x2f/0x40
> [<000000000d759e02>] do_vfs_ioctl+0x1c7/0x1510
> [<00000000351d7c31>] ksys_ioctl+0x99/0xb0
> [<000000008390040a>] __x64_sys_ioctl+0x78/0xb0
> [<0000000052d196b7>] do_syscall_64+0x16f/0x580
> [<0000000019af9236>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<00000000bc384531>] 0xffffffffffffffff
>
> v3 -> v4:
> Set reg_state to NETREG_UNREGISTERED if registering fails
>
> v2 -> v3:
> * Replaced BUG_ON with WARN_ON in free_netdev and netdev_release
>
> v1 -> v2:
> * Relying on driver calling free_netdev rather than calling
> put_device directly in error path
>
> Reported-by: syzbot+ad8ca40ecd77896d51e2(a)syzkaller.appspotmail.com
> Cc: David Miller <davem(a)davemloft.net>
> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Cc: Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
> Signed-off-by: Jouni Hogander <jouni.hogander(a)unikie.com>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> net/core/dev.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -6806,8 +6806,10 @@ int register_netdevice(struct net_device
> goto err_uninit;
>
> ret = netdev_register_kobject(dev);
> - if (ret)
> + if (ret) {
> + dev->reg_state = NETREG_UNREGISTERED;
> goto err_uninit;
> + }
> dev->reg_state = NETREG_REGISTERED;
>
> __netdev_update_features(dev);
>
>
> Patches currently in stable-queue which might be from jouni.hogander(a)unikie.com are
>
> queue-4.4/net-sysfs-fix-reference-count-leak.patch
BR,
Jouni Högander
[ Upstream commit 730766bae3280a25d40ea76a53dc6342e84e6513 ]
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2997aa4063d97fdb39 ("coresight: etb10: implementing AUX API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # v4.9 to v4.19
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-5-mathieu.poirier@linaro.org
---
drivers/hwtracing/coresight/coresight-etb10.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index 0dad8626bcfb..6cf28b049635 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -275,9 +275,7 @@ static void *etb_alloc_buffer(struct coresight_device *csdev, int cpu,
int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (cpu == -1) ? NUMA_NO_NODE : cpu_to_node(cpu);
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
if (!buf)
--
2.24.1
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 883f616530692d81cb70f8a32d85c0d2afc05f69 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lars=20M=C3=B6llendorf?= <lars.moellendorf(a)plating.de>
Date: Fri, 13 Dec 2019 14:50:55 +0100
Subject: [PATCH] iio: buffer: align the size of scan bytes to size of the
largest element
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Previous versions of `iio_compute_scan_bytes` only aligned each element
to its own length (i.e. its own natural alignment). Because multiple
consecutive sets of scan elements are buffered this does not work in
case the computed scan bytes do not align with the natural alignment of
the first scan element in the set.
This commit fixes this by aligning the scan bytes to the natural
alignment of the largest scan element in the set.
Fixes: 959d2952d124 ("staging:iio: make iio_sw_buffer_preenable much more general.")
Signed-off-by: Lars Möllendorf <lars.moellendorf(a)plating.de>
Reviewed-by: Lars-Peter Clausen <lars(a)metafoo.de>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index c193d64e5217..112225c0e486 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -566,7 +566,7 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
const unsigned long *mask, bool timestamp)
{
unsigned bytes = 0;
- int length, i;
+ int length, i, largest = 0;
/* How much space will the demuxed element take? */
for_each_set_bit(i, mask,
@@ -574,13 +574,17 @@ static int iio_compute_scan_bytes(struct iio_dev *indio_dev,
length = iio_storage_bytes_for_si(indio_dev, i);
bytes = ALIGN(bytes, length);
bytes += length;
+ largest = max(largest, length);
}
if (timestamp) {
length = iio_storage_bytes_for_timestamp(indio_dev);
bytes = ALIGN(bytes, length);
bytes += length;
+ largest = max(largest, length);
}
+
+ bytes = ALIGN(bytes, largest);
return bytes;
}
From: Will Deacon <will.deacon(a)arm.com>
commit 2a355ec25729053bb9a1a89b6c1d1cdd6c3b3fb1 upstream.
While the CSV3 field of the ID_AA64_PFR0 CPU ID register can be checked
to see if a CPU is susceptible to Meltdown and therefore requires kpti
to be enabled, existing CPUs do not implement this field.
We therefore whitelist all unaffected Cortex-A CPUs that do not implement
the CSV3 field.
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
[florian: adjust whilelist location and table to stable-4.9.y]
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
arch/arm64/kernel/cpufeature.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9a8e45dc36bd..8cf001baee21 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -789,6 +789,11 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
switch (read_cpuid_id() & MIDR_CPU_MODEL_MASK) {
case MIDR_CAVIUM_THUNDERX2:
case MIDR_BRCM_VULCAN:
+ case MIDR_CORTEX_A53:
+ case MIDR_CORTEX_A55:
+ case MIDR_CORTEX_A57:
+ case MIDR_CORTEX_A72:
+ case MIDR_CORTEX_A73:
return false;
}
--
2.17.1
From: Jeremy Linton <jeremy.linton(a)arm.com>
commit de19055564c8f8f9d366f8db3395836da0b2176c upstream
For a while Arm64 has been capable of force enabling
or disabling the kpti mitigations. Lets make sure the
documentation reflects that.
Signed-off-by: Jeremy Linton <jeremy.linton(a)arm.com>
Reviewed-by: Andre Przywara <andre.przywara(a)arm.com>
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
[florian: patch the correct file]
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
Documentation/kernel-parameters.txt | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1bc12619bedd..b2d2f4539a3f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
+ kpti= [ARM64] Control page table isolation of user
+ and kernel address spaces.
+ Default: enabled on cores which need mitigation.
+ 0: force disabled
+ 1: force enabled
+
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
--
2.17.1
ZBC/ZAC report zones command may return less bytes than requested if the
number of matching zones for the report request is small. However, unlike
read or write commands, the remainder of incomplete report zones commands
cannot be automatically requested by the block layer: the start sector of
the next report cannot be known, and the report reply may not be 512B
aligned for SAS drives (a report zone reply size is always a multiple of
64B). The regular request completion code executing bio_advance() and
restart of the command remainder part currently causes invalid zone
descriptor data to be reported to the caller if the report zone size is
smaller than 512B (a case that can happen easily for a report of the last
zones of a SAS drive for example).
Since blkdev_report_zones() handles report zone command processing in a
loop until completion (no more zones are being reported), we can safely
avoid that the block layer performs an incorrect bio_advance() call and
restart of the remainder of incomplete report zone BIOs. To do so, always
indicate a full completion of REQ_OP_ZONE_REPORT by setting good_bytes to
the request buffer size and by setting the command resid to 0. This does
not affect the post processing of the report zone reply done by
sd_zbc_complete() since the reply header indicates the number of zones
reported.
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: <stable(a)vger.kernel.org> # 4.19
Cc: <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Masato Suzuki <masato.suzuki(a)wdc.com>
---
drivers/scsi/sd.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2955b856e9ec..e8c2afbb82e9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1981,9 +1981,13 @@ static int sd_done(struct scsi_cmnd *SCpnt)
}
break;
case REQ_OP_ZONE_REPORT:
+ /* To avoid that the block layer performs an incorrect
+ * bio_advance() call and restart of the remainder of
+ * incomplete report zone BIOs, always indicate a full
+ * completion of REQ_OP_ZONE_REPORT.
+ */
if (!result) {
- good_bytes = scsi_bufflen(SCpnt)
- - scsi_get_resid(SCpnt);
+ good_bytes = scsi_bufflen(SCpnt);
scsi_set_resid(SCpnt, 0);
} else {
good_bytes = 0;
--
2.24.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8bcebc77e85f3d7536f96845a0fe94b1dddb6af0 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 20 Jan 2020 13:07:31 -0500
Subject: [PATCH] tracing: Fix histogram code when expression has same var as
value
While working on a tool to convert SQL syntex into the histogram language of
the kernel, I discovered the following bug:
# echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
# echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
Would not display any histograms in the sched_switch histogram side.
But if I were to swap the location of
"delta=common_timestamp-$start" with "start2=$start"
Such that the last line had:
# echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
The histogram works as expected.
What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:
delta=common_timestamp-$start
The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.
When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.
>From Tom Zanussi:
"Just to clarify some more about what the problem was is that without
your patch, we would have two separate references to the same variable,
and during resolve_var_refs(), they'd both want to be resolved
separately, so in this case, since the first reference to start wasn't
part of an expression, it wouldn't get the read-once flag set, so would
be read normally, and then the second reference would do the read-once
read and also be read but using read-once. So everything worked and
you didn't see a problem:
from: start2=$start,delta=common_timestamp-$start
In the second case, when you switched them around, the first reference
would be resolved by doing the read-once, and following that the second
reference would try to resolve and see that the variable had already
been read, so failed as unset, which caused it to short-circuit out and
not do the trigger action to generate the synthetic event:
to: delta=common_timestamp-$start,start2=$start
With your patch, we only have the single resolution which happens
correctly the one time it's resolved, so this can't happen."
Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home
Cc: stable(a)vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanuss <zanussi(a)kernel.org>
Tested-by: Tom Zanussi <zanussi(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index d33b046f985a..6ac35b9e195d 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -116,6 +116,7 @@ struct hist_field {
struct ftrace_event_field *field;
unsigned long flags;
hist_field_fn_t fn;
+ unsigned int ref;
unsigned int size;
unsigned int offset;
unsigned int is_signed;
@@ -2427,8 +2428,16 @@ static int contains_operator(char *str)
return field_op;
}
+static void get_hist_field(struct hist_field *hist_field)
+{
+ hist_field->ref++;
+}
+
static void __destroy_hist_field(struct hist_field *hist_field)
{
+ if (--hist_field->ref > 1)
+ return;
+
kfree(hist_field->var.name);
kfree(hist_field->name);
kfree(hist_field->type);
@@ -2470,6 +2479,8 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (!hist_field)
return NULL;
+ hist_field->ref = 1;
+
hist_field->hist_data = hist_data;
if (flags & HIST_FIELD_FL_EXPR || flags & HIST_FIELD_FL_ALIAS)
@@ -2665,6 +2676,17 @@ static struct hist_field *create_var_ref(struct hist_trigger_data *hist_data,
{
unsigned long flags = HIST_FIELD_FL_VAR_REF;
struct hist_field *ref_field;
+ int i;
+
+ /* Check if the variable already exists */
+ for (i = 0; i < hist_data->n_var_refs; i++) {
+ ref_field = hist_data->var_refs[i];
+ if (ref_field->var.idx == var_field->var.idx &&
+ ref_field->var.hist_data == var_field->hist_data) {
+ get_hist_field(ref_field);
+ return ref_field;
+ }
+ }
ref_field = create_hist_field(var_field->hist_data, NULL, flags, NULL);
if (ref_field) {
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
the semantic of move_pages() has changed to return the number of
non-migrated pages if they were result of a non-fatal reasons (usually a
busy page). This was an unintentional change that hasn't been noticed
except for LTP tests which checked for the documented behavior.
There are two ways to go around this change. We can even get back to the
original behavior and return -EAGAIN whenever migrate_pages is not able
to migrate pages due to non-fatal reasons. Another option would be to
simply continue with the changed semantic and extend move_pages
documentation to clarify that -errno is returned on an invalid input or
when migration simply cannot succeed (e.g. -ENOMEM, -EBUSY) or the
number of pages that couldn't have been migrated due to ephemeral
reasons (e.g. page is pinned or locked for other reasons).
This patch implements the second option because this behavior is in
place for some time without anybody complaining and possibly new users
depending on it. Also it allows to have a slightly easier error handling
as the caller knows that it is worth to retry when err > 0.
But since the new semantic would be aborted immediately if migration is
failed due to ephemeral reasons, need include the number of non-attempted
pages in the return value too.
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Suggested-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
---
v4: Fixed some typo and grammar errors caught by Willy
v3: Rephrased the commit log per Michal and added Michal's Acked-by
v2: Rebased on top of the latest mainline kernel per Andrew
mm/migrate.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 86873b6..2530860 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1627,8 +1627,19 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
start = i;
} else if (node != current_node) {
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ /*
+ * Positive err means the number of failed
+ * pages to migrate. Since we are going to
+ * abort and return the number of non-migrated
+ * pages, so need to incude the rest of the
+ * nr_pages that have not been attempted as
+ * well.
+ */
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
err = store_status(status, start, current_node, i - start);
if (err)
goto out;
@@ -1659,8 +1670,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
goto out_flush;
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
if (i > start) {
err = store_status(status, start, current_node, i - start);
if (err)
@@ -1674,6 +1688,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
/* Make sure we do not overwrite the existing error */
err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+ /*
+ * Don't have to report non-attempted pages here since:
+ * - If the above loop is done gracefully all pages have been
+ * attempted.
+ * - If the above loop is aborted it means a fatal error
+ * happened, should return ret.
+ */
if (!err1)
err1 = store_status(status, start, current_node, i - start);
if (!err)
--
1.8.3.1
From: Suren Baghdasaryan <surenb(a)google.com>
When ashmem file is mmapped, the resulting vma->vm_file points to the
backing shmem file with the generic fops that do not check ashmem
permissions like fops of ashmem do. If an mremap is done on the ashmem
region, then the permission checks will be skipped. Fix that by disallowing
mapping operation on the backing shmem file.
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable <stable(a)vger.kernel.org> # 4.4,4.9,4.14,4.18,5.4
Signed-off-by: Todd Kjos <tkjos(a)google.com>
---
drivers/staging/android/ashmem.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
v2: update commit message as suggested by joelaf(a)google.com.
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 74d497d39c5a..c6695354b123 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -351,8 +351,23 @@ static inline vm_flags_t calc_vm_may_flags(unsigned long prot)
_calc_vm_trans(prot, PROT_EXEC, VM_MAYEXEC);
}
+static int ashmem_vmfile_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ /* do not allow to mmap ashmem backing shmem file directly */
+ return -EPERM;
+}
+
+static unsigned long
+ashmem_vmfile_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags)
+{
+ return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
{
+ static struct file_operations vmfile_fops;
struct ashmem_area *asma = file->private_data;
int ret = 0;
@@ -393,6 +408,19 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
}
vmfile->f_mode |= FMODE_LSEEK;
asma->file = vmfile;
+ /*
+ * override mmap operation of the vmfile so that it can't be
+ * remapped which would lead to creation of a new vma with no
+ * asma permission checks. Have to override get_unmapped_area
+ * as well to prevent VM_BUG_ON check for f_ops modification.
+ */
+ if (!vmfile_fops.mmap) {
+ vmfile_fops = *vmfile->f_op;
+ vmfile_fops.mmap = ashmem_vmfile_mmap;
+ vmfile_fops.get_unmapped_area =
+ ashmem_vmfile_get_unmapped_area;
+ }
+ vmfile->f_op = &vmfile_fops;
}
get_file(asma->file);
--
2.25.0.341.g760bfbb309-goog