Building ia64:defconfig ... failed
--------------
Error log:
drivers/acpi/numa.c: In function 'pxm_to_node':
drivers/acpi/numa.c:49:43: error: 'numa_off' undeclared
Building powerpc:defconfig ... failed
--------------
Error log:
arch/powerpc/kvm/book3s_hv.c: In function ‘kvm_arch_vm_ioctl_hv’:
arch/powerpc/kvm/book3s_hv.c:3161:7: error: implicit declaration of function ‘kvmhv_on_pseries’
Guenter
The patch titled
Subject: mm/zsmalloc: include sparsemem.h for MAX_PHYSMEM_BITS
has been removed from the -mm tree. Its filename was
mm-zsmalloc-include-sparsememh-for-max_physmem_bits.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Stefan Agner <stefan(a)agner.ch>
Subject: mm/zsmalloc: include sparsemem.h for MAX_PHYSMEM_BITS
Most architectures define MAX_PHYSMEM_BITS in asm/sparsemem.h and don't
include it in asm/pgtable.h.
If MAX_PHYSMEM_BITS is not set in mm/zsmalloc.c it will set
MAX_PHYSMEM_BITS to BITS_PER_LONG. And this is 32-bit, too short when
LPAE is in use.
So include asm/sparsemem.h directly to get the MAX_PHYSMEM_BITS define
on all architectures.
This fixes a crash when accessing zram on 32-bit ARM platform with LPAE and
more than 4GB of memory:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = a27bd01c
[00000000] *pgd=236a0003, *pmd=1ffa64003
Internal error: Oops: 207 [#1] SMP ARM
Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
Hardware name: BCM2711
PC is at zs_map_object+0x94/0x338
LR is at zram_bvec_rw.constprop.0+0x330/0xa64
pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013
sp : e376bbe0 ip : 00000000 fp : c1e2921c
r10: 00000002 r9 : c1dda730 r8 : 00000000
r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000
r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5383d Table: 235c2a80 DAC: fffffffd
Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
Stack: (0xe376bbe0 to 0xe376c000)
...
[<c0602b38>] (zs_map_object) from [<c0bda6a0>] (zram_bvec_rw.constprop.0+0x330/0xa64)
[<c0bda6a0>] (zram_bvec_rw.constprop.0) from [<c0bdaf78>] (zram_submit_bio+0x1a4/0x40c)
[<c0bdaf78>] (zram_submit_bio) from [<c085806c>] (submit_bio_noacct+0xd0/0x3c8)
[<c085806c>] (submit_bio_noacct) from [<c08583b0>] (submit_bio+0x4c/0x190)
[<c08583b0>] (submit_bio) from [<c06496b4>] (submit_bh_wbc+0x188/0x1b8)
[<c06496b4>] (submit_bh_wbc) from [<c064ce98>] (__block_write_full_page+0x340/0x5e4)
[<c064ce98>] (__block_write_full_page) from [<c064d3ec>] (block_write_full_page+0x128/0x170)
[<c064d3ec>] (block_write_full_page) from [<c0591ae8>] (__writepage+0x14/0x68)
[<c0591ae8>] (__writepage) from [<c0593efc>] (write_cache_pages+0x1bc/0x494)
[<c0593efc>] (write_cache_pages) from [<c059422c>] (generic_writepages+0x58/0x8c)
[<c059422c>] (generic_writepages) from [<c0594c24>] (do_writepages+0x48/0xec)
[<c0594c24>] (do_writepages) from [<c0589330>] (__filemap_fdatawrite_range+0xf0/0x128)
[<c0589330>] (__filemap_fdatawrite_range) from [<c05894bc>] (file_write_and_wait_range+0x48/0x98)
[<c05894bc>] (file_write_and_wait_range) from [<c064f3f8>] (blkdev_fsync+0x1c/0x44)
[<c064f3f8>] (blkdev_fsync) from [<c064408c>] (do_fsync+0x3c/0x70)
[<c064408c>] (do_fsync) from [<c0400374>] (__sys_trace_return+0x0/0x2c)
Exception stack(0xe376bfa8 to 0xe376bff0)
bfa0: 0003d2e0 b6f7b6f0 00000003 00046e40 00001000 00000000
bfc0: 0003d2e0 b6f7b6f0 00000000 00000076 00000000 00000000 befcbb20 befcbb28
bfe0: b6f4e060 befcbad8 b6f23e0c b6dc4a80
Code: e5927000 e0050391 e0871005 e5918018 (e5983000)
Link: https://lkml.kernel.org/r/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.16047621…
Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Signed-off-by: Stefan Agner <stefan(a)agner.ch>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/zsmalloc.c~mm-zsmalloc-include-sparsememh-for-max_physmem_bits
+++ a/mm/zsmalloc.c
@@ -40,6 +40,7 @@
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/pgtable.h>
+#include <asm/sparsemem.h>
#include <asm/tlbflush.h>
#include <linux/cpumask.h>
#include <linux/cpu.h>
_
Patches currently in -mm which might be from stefan(a)agner.ch are
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cf1ad559a20d1930aa7b47a52f54e1f8718de301 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= <mirq-linux(a)rere.qmqm.pl>
Date: Mon, 2 Nov 2020 22:27:27 +0100
Subject: [PATCH] regulator: defer probe when trying to get voltage from
unresolved supply
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
regulator_get_voltage_rdev() is called in regulator probe() when
applying machine constraints. The "fixed" commit exposed the problem
that non-bypassed regulators can forward the request to its parent
(like bypassed ones) supply. Return -EPROBE_DEFER when the supply
is expected but not resolved yet.
Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
Cc: stable(a)vger.kernel.org
Signed-off-by: Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
Reported-by: Ondřej Jirman <megous(a)megous.com>
Reported-by: Corentin Labbe <clabbe.montjoie(a)gmail.com>
Tested-by: Ondřej Jirman <megous(a)megous.com>
Link: https://lore.kernel.org/r/a9041d68b4d35e4a2dd71629c8a6422662acb5ee.16043519…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index a4ffd71696da..a5ad553da8cd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4165,6 +4165,8 @@ int regulator_get_voltage_rdev(struct regulator_dev *rdev)
ret = rdev->desc->fixed_uV;
} else if (rdev->supply) {
ret = regulator_get_voltage_rdev(rdev->supply->rdev);
+ } else if (rdev->supply_name) {
+ return -EPROBE_DEFER;
} else {
return -EINVAL;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cf1ad559a20d1930aa7b47a52f54e1f8718de301 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= <mirq-linux(a)rere.qmqm.pl>
Date: Mon, 2 Nov 2020 22:27:27 +0100
Subject: [PATCH] regulator: defer probe when trying to get voltage from
unresolved supply
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
regulator_get_voltage_rdev() is called in regulator probe() when
applying machine constraints. The "fixed" commit exposed the problem
that non-bypassed regulators can forward the request to its parent
(like bypassed ones) supply. Return -EPROBE_DEFER when the supply
is expected but not resolved yet.
Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
Cc: stable(a)vger.kernel.org
Signed-off-by: Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
Reported-by: Ondřej Jirman <megous(a)megous.com>
Reported-by: Corentin Labbe <clabbe.montjoie(a)gmail.com>
Tested-by: Ondřej Jirman <megous(a)megous.com>
Link: https://lore.kernel.org/r/a9041d68b4d35e4a2dd71629c8a6422662acb5ee.16043519…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index a4ffd71696da..a5ad553da8cd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4165,6 +4165,8 @@ int regulator_get_voltage_rdev(struct regulator_dev *rdev)
ret = rdev->desc->fixed_uV;
} else if (rdev->supply) {
ret = regulator_get_voltage_rdev(rdev->supply->rdev);
+ } else if (rdev->supply_name) {
+ return -EPROBE_DEFER;
} else {
return -EINVAL;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From cf1ad559a20d1930aa7b47a52f54e1f8718de301 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= <mirq-linux(a)rere.qmqm.pl>
Date: Mon, 2 Nov 2020 22:27:27 +0100
Subject: [PATCH] regulator: defer probe when trying to get voltage from
unresolved supply
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
regulator_get_voltage_rdev() is called in regulator probe() when
applying machine constraints. The "fixed" commit exposed the problem
that non-bypassed regulators can forward the request to its parent
(like bypassed ones) supply. Return -EPROBE_DEFER when the supply
is expected but not resolved yet.
Fixes: aea6cb99703e ("regulator: resolve supply after creating regulator")
Cc: stable(a)vger.kernel.org
Signed-off-by: Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
Reported-by: Ondřej Jirman <megous(a)megous.com>
Reported-by: Corentin Labbe <clabbe.montjoie(a)gmail.com>
Tested-by: Ondřej Jirman <megous(a)megous.com>
Link: https://lore.kernel.org/r/a9041d68b4d35e4a2dd71629c8a6422662acb5ee.16043519…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index a4ffd71696da..a5ad553da8cd 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -4165,6 +4165,8 @@ int regulator_get_voltage_rdev(struct regulator_dev *rdev)
ret = rdev->desc->fixed_uV;
} else if (rdev->supply) {
ret = regulator_get_voltage_rdev(rdev->supply->rdev);
+ } else if (rdev->supply_name) {
+ return -EPROBE_DEFER;
} else {
return -EINVAL;
}
When the command feature of the ADC is used, it is possible to program
the ADC, and specify at each step what input should be processed, and in
comparison to what reference.
This broke the AUX and battery readings when the touchscreen was
enabled, most likely because the CMD feature would change the VREF all
the time.
Now, when AUX or battery are read, we temporarily disable the CMD
feature, which means that we won't get touchscreen readings in that time
frame. But it now gives correct values for AUX / battery, and the
touchscreen isn't disabled for long enough to be an actual issue.
Fixes: b96952f498db ("IIO: Ingenic JZ47xx: Add touchscreen mode.")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Paul Cercueil <paul(a)crapouillou.net>
---
drivers/iio/adc/ingenic-adc.c | 33 +++++++++++++++++++++++++++------
1 file changed, 27 insertions(+), 6 deletions(-)
diff --git a/drivers/iio/adc/ingenic-adc.c b/drivers/iio/adc/ingenic-adc.c
index 92b25083e23f..ecaff6a9b716 100644
--- a/drivers/iio/adc/ingenic-adc.c
+++ b/drivers/iio/adc/ingenic-adc.c
@@ -177,13 +177,12 @@ static void ingenic_adc_set_config(struct ingenic_adc *adc,
mutex_unlock(&adc->lock);
}
-static void ingenic_adc_enable(struct ingenic_adc *adc,
- int engine,
- bool enabled)
+static void ingenic_adc_enable_unlocked(struct ingenic_adc *adc,
+ int engine,
+ bool enabled)
{
u8 val;
- mutex_lock(&adc->lock);
val = readb(adc->base + JZ_ADC_REG_ENABLE);
if (enabled)
@@ -192,20 +191,42 @@ static void ingenic_adc_enable(struct ingenic_adc *adc,
val &= ~BIT(engine);
writeb(val, adc->base + JZ_ADC_REG_ENABLE);
+}
+
+static void ingenic_adc_enable(struct ingenic_adc *adc,
+ int engine,
+ bool enabled)
+{
+ mutex_lock(&adc->lock);
+ ingenic_adc_enable_unlocked(adc, engine, enabled);
mutex_unlock(&adc->lock);
}
static int ingenic_adc_capture(struct ingenic_adc *adc,
int engine)
{
+ u32 cfg;
u8 val;
int ret;
- ingenic_adc_enable(adc, engine, true);
+ /*
+ * Disable CMD_SEL temporarily, because it causes wrong VBAT readings,
+ * probably due to the switch of VREF. We must keep the lock here to
+ * avoid races with the buffer enable/disable functions.
+ */
+ mutex_lock(&adc->lock);
+ cfg = readl(adc->base + JZ_ADC_REG_CFG);
+ writel(cfg & ~JZ_ADC_REG_CFG_CMD_SEL, adc->base + JZ_ADC_REG_CFG);
+
+
+ ingenic_adc_enable_unlocked(adc, engine, true);
ret = readb_poll_timeout(adc->base + JZ_ADC_REG_ENABLE, val,
!(val & BIT(engine)), 250, 1000);
if (ret)
- ingenic_adc_enable(adc, engine, false);
+ ingenic_adc_enable_unlocked(adc, engine, false);
+
+ writel(cfg, adc->base + JZ_ADC_REG_CFG);
+ mutex_unlock(&adc->lock);
return ret;
}
--
2.28.0
Hi,
I get the following build warning in v5.4.75.
drivers/hwtracing/coresight/coresight-etm-perf.c: In function 'etm_setup_aux':
drivers/hwtracing/coresight/coresight-etm-perf.c:226:37: warning:
passing argument 1 of 'coresight_get_enabled_sink' makes pointer from integer without a cast
Actually, the warning is fatal, since the call is
sink = coresight_get_enabled_sink(true);
However, the argument to coresight_get_enabled_sink() is now a pointer.
The parameter change was introduced with commit 8fd52a21ab57
("coresight: Make sysfs functional on topologies with per core sink").
In the upstream kernel, the call is removed with commit bb1860efc817
("coresight: etm: perf: Sink selection using sysfs is deprecated").
That commit alone would, however, likely not solve the problem.
It looks like at least two more commits would be needed.
716f5652a131 coresight: etm: perf: Fix warning caused by etm_setup_aux failure
8e264c52e1da coresight: core: Allow the coresight core driver to be built as a module
39a7661dcf65 coresight: Fix uninitialised pointer bug in etm_setup_aux()
Looking into the coresight code, I see several additional commits affecting
the sysfs interface since v5.4. I have no idea what would actually be needed
for stable code in v5.4.y, short of applying them all.
With all this in mind, I would suggest to revert commit 8fd52a21ab57
("coresight: Make sysfs functional on topologies with per core sink")
from v5.4.y, especially since it is not marked as bug fix or for stable.
Thanks,
Guenter
The patch titled
Subject: mm/zsmalloc: include sparsemem.h for MAX_PHYSMEM_BITS
has been added to the -mm tree. Its filename is
mm-zsmalloc-include-sparsememh-for-max_physmem_bits.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-zsmalloc-include-sparsememh-fo…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-zsmalloc-include-sparsememh-fo…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Stefan Agner <stefan(a)agner.ch>
Subject: mm/zsmalloc: include sparsemem.h for MAX_PHYSMEM_BITS
Most architectures define MAX_PHYSMEM_BITS in asm/sparsemem.h and don't
include it in asm/pgtable.h. Include asm/sparsemem.h directly to get the
MAX_PHYSMEM_BITS define on all architectures.
This fixes a crash when accessing zram on 32-bit ARM platform with LPAE and
more than 4GB of memory:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = a27bd01c
[00000000] *pgd=236a0003, *pmd=1ffa64003
Internal error: Oops: 207 [#1] SMP ARM
Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
Hardware name: BCM2711
PC is at zs_map_object+0x94/0x338
LR is at zram_bvec_rw.constprop.0+0x330/0xa64
pc : [<c0602b38>] lr : [<c0bda6a0>] psr: 60000013
sp : e376bbe0 ip : 00000000 fp : c1e2921c
r10: 00000002 r9 : c1dda730 r8 : 00000000
r7 : e8ff7a00 r6 : 00000000 r5 : 02f9ffa0 r4 : e3710000
r3 : 000fdffe r2 : c1e0ce80 r1 : ebf979a0 r0 : 00000000
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 30c5383d Table: 235c2a80 DAC: fffffffd
Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
Stack: (0xe376bbe0 to 0xe376c000)
...
[<c0602b38>] (zs_map_object) from [<c0bda6a0>] (zram_bvec_rw.constprop.0+0x330/0xa64)
[<c0bda6a0>] (zram_bvec_rw.constprop.0) from [<c0bdaf78>] (zram_submit_bio+0x1a4/0x40c)
[<c0bdaf78>] (zram_submit_bio) from [<c085806c>] (submit_bio_noacct+0xd0/0x3c8)
[<c085806c>] (submit_bio_noacct) from [<c08583b0>] (submit_bio+0x4c/0x190)
[<c08583b0>] (submit_bio) from [<c06496b4>] (submit_bh_wbc+0x188/0x1b8)
[<c06496b4>] (submit_bh_wbc) from [<c064ce98>] (__block_write_full_page+0x340/0x5e4)
[<c064ce98>] (__block_write_full_page) from [<c064d3ec>] (block_write_full_page+0x128/0x170)
[<c064d3ec>] (block_write_full_page) from [<c0591ae8>] (__writepage+0x14/0x68)
[<c0591ae8>] (__writepage) from [<c0593efc>] (write_cache_pages+0x1bc/0x494)
[<c0593efc>] (write_cache_pages) from [<c059422c>] (generic_writepages+0x58/0x8c)
[<c059422c>] (generic_writepages) from [<c0594c24>] (do_writepages+0x48/0xec)
[<c0594c24>] (do_writepages) from [<c0589330>] (__filemap_fdatawrite_range+0xf0/0x128)
[<c0589330>] (__filemap_fdatawrite_range) from [<c05894bc>] (file_write_and_wait_range+0x48/0x98)
[<c05894bc>] (file_write_and_wait_range) from [<c064f3f8>] (blkdev_fsync+0x1c/0x44)
[<c064f3f8>] (blkdev_fsync) from [<c064408c>] (do_fsync+0x3c/0x70)
[<c064408c>] (do_fsync) from [<c0400374>] (__sys_trace_return+0x0/0x2c)
Exception stack(0xe376bfa8 to 0xe376bff0)
bfa0: 0003d2e0 b6f7b6f0 00000003 00046e40 00001000 00000000
bfc0: 0003d2e0 b6f7b6f0 00000000 00000076 00000000 00000000 befcbb20 befcbb28
bfe0: b6f4e060 befcbad8 b6f23e0c b6dc4a80
Code: e5927000 e0050391 e0871005 e5918018 (e5983000)
Link: https://lkml.kernel.org/r/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.16047621…
Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Signed-off-by: Stefan Agner <stefan(a)agner.ch>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/zsmalloc.c~mm-zsmalloc-include-sparsememh-for-max_physmem_bits
+++ a/mm/zsmalloc.c
@@ -40,6 +40,7 @@
#include <linux/string.h>
#include <linux/slab.h>
#include <linux/pgtable.h>
+#include <asm/sparsemem.h>
#include <asm/tlbflush.h>
#include <linux/cpumask.h>
#include <linux/cpu.h>
_
Patches currently in -mm which might be from stefan(a)agner.ch are
mm-zsmalloc-include-sparsememh-for-max_physmem_bits.patch
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: 9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Gitweb: https://git.kernel.org/tip/9f5d1c336a10c0d24e83e40b4c1b9539f7dba627
Author: Mike Galbraith <efault(a)gmx.de>
AuthorDate: Wed, 04 Nov 2020 16:12:44 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Sat, 07 Nov 2020 22:07:04 +01:00
futex: Handle transient "ownerless" rtmutex state correctly
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:
Task Prio Operation
T1 120 lock(F)
T2 120 lock(F) -> blocks (top waiter)
T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter)
XX timeout/ -> wakes T2
signal
T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter
and the lower priority T2 cannot steal the lock.
-> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
[ tglx: Wrote comment and changelog ]
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan(a)ni.com>
Signed-off-by: Mike Galbraith <efault(a)gmx.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
---
kernel/futex.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index f8614ef..ac32887 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@ retry:
}
/*
- * Since we just failed the trylock; there must be an owner.
+ * The trylock just failed, so either there is an owner or
+ * there is a higher priority waiter than this one.
*/
newowner = rt_mutex_owner(&pi_state->pi_mutex);
- BUG_ON(!newowner);
+ /*
+ * If the higher priority waiter has not yet taken over the
+ * rtmutex then newowner is NULL. We can't return here with
+ * that state because it's inconsistent vs. the user space
+ * state. So drop the locks and try again. It's a valid
+ * situation and not any different from the other retry
+ * conditions.
+ */
+ if (unlikely(!newowner)) {
+ err = -EAGAIN;
+ goto handle_err;
+ }
} else {
WARN_ON_ONCE(argowner != current);
if (oldowner == current) {
The following commit has been merged into the locking/urgent branch of tip:
Commit-ID: 63c1b4db662a0967dd7839a2fbaa5300e553901d
Gitweb: https://git.kernel.org/tip/63c1b4db662a0967dd7839a2fbaa5300e553901d
Author: Mike Galbraith <efault(a)gmx.de>
AuthorDate: Wed, 04 Nov 2020 16:12:44 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Fri, 06 Nov 2020 22:24:58 +01:00
futex: Handle transient "ownerless" rtmutex state correctly
Gratian managed to trigger the BUG_ON(!newowner) in fixup_pi_state_owner().
This is one possible chain of events leading to this:
Task Prio Operation
T1 120 lock(F)
T2 120 lock(F) -> blocks (top waiter)
T3 50 (RT) lock(F) -> boosts T1 and blocks (new top waiter)
XX timeout/ -> wakes T2
signal
T1 50 unlock(F) -> wakes T3 (rtmutex->owner == NULL, waiter bit is set)
T2 120 cleanup -> try_to_take_mutex() fails because T3 is the top waiter
and the lower priority T2 cannot steal the lock.
-> fixup_pi_state_owner() sees newowner == NULL -> BUG_ON()
The comment states that this is invalid and rt_mutex_real_owner() must
return a non NULL owner when the trylock failed, but in case of a queued
and woken up waiter rt_mutex_real_owner() == NULL is a valid transient
state. The higher priority waiter has simply not yet managed to take over
the rtmutex.
The BUG_ON() is therefore wrong and this is just another retry condition in
fixup_pi_state_owner().
Drop the locks, so that T3 can make progress, and then try the fixup again.
Gratian provided a great analysis, traces and a reproducer. The analysis is
to the point, but it confused the hell out of that tglx dude who had to
page in all the futex horrors again. Condensed version is above.
[ tglx: Wrote comment and changelog ]
Fixes: c1e2f0eaf015 ("futex: Avoid violating the 10th rule of futex")
Reported-by: Gratian Crisan <gratian.crisan(a)ni.com>
Signed-off-by: Mike Galbraith <efault(a)gmx.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87a6w6x7bb.fsf@ni.com
Link: https://lore.kernel.org/r/87sg9pkvf7.fsf@nanos.tec.linutronix.de
---
kernel/futex.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index f8614ef..7406914 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2380,10 +2380,22 @@ retry:
}
/*
- * Since we just failed the trylock; there must be an owner.
+ * The trylock just failed, so either there is an owner or
+ * there is a higher priority waiter than this one.
*/
newowner = rt_mutex_owner(&pi_state->pi_mutex);
- BUG_ON(!newowner);
+ /*
+ * If the higher priority waiter has not yet taken over the
+ * rtmutex then newowner is NULL. We can't return here with
+ * that state because it's inconsistent vs. the user space
+ * state. So drop the locks and try again. It's a valid
+ * situation and not any different from the other retry
+ * conditions.
+ */
+ if (unlikely(!newowner)) {
+ ret = -EAGAIN;
+ goto handle_err;
+ }
} else {
WARN_ON_ONCE(argowner != current);
if (oldowner == current) {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b02414c8f045ab3b9afc816c3735bc98c5c3d262 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 2 Nov 2020 15:31:27 -0500
Subject: [PATCH] ring-buffer: Fix recursion protection transitions between
interrupt context
The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.
Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.
Cc: stable(a)vger.kernel.org
Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7f45fd9d5a45..dc83b3fa9fe7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -438,14 +438,16 @@ enum {
};
/*
* Used for which event context the event is in.
- * NMI = 0
- * IRQ = 1
- * SOFTIRQ = 2
- * NORMAL = 3
+ * TRANSITION = 0
+ * NMI = 1
+ * IRQ = 2
+ * SOFTIRQ = 3
+ * NORMAL = 4
*
* See trace_recursive_lock() comment below for more details.
*/
enum {
+ RB_CTX_TRANSITION,
RB_CTX_NMI,
RB_CTX_IRQ,
RB_CTX_SOFTIRQ,
@@ -3014,10 +3016,10 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
* a bit of overhead in something as critical as function tracing,
* we use a bitmask trick.
*
- * bit 0 = NMI context
- * bit 1 = IRQ context
- * bit 2 = SoftIRQ context
- * bit 3 = normal context.
+ * bit 1 = NMI context
+ * bit 2 = IRQ context
+ * bit 3 = SoftIRQ context
+ * bit 4 = normal context.
*
* This works because this is the order of contexts that can
* preempt other contexts. A SoftIRQ never preempts an IRQ
@@ -3040,6 +3042,30 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
* The least significant bit can be cleared this way, and it
* just so happens that it is the same bit corresponding to
* the current context.
+ *
+ * Now the TRANSITION bit breaks the above slightly. The TRANSITION bit
+ * is set when a recursion is detected at the current context, and if
+ * the TRANSITION bit is already set, it will fail the recursion.
+ * This is needed because there's a lag between the changing of
+ * interrupt context and updating the preempt count. In this case,
+ * a false positive will be found. To handle this, one extra recursion
+ * is allowed, and this is done by the TRANSITION bit. If the TRANSITION
+ * bit is already set, then it is considered a recursion and the function
+ * ends. Otherwise, the TRANSITION bit is set, and that bit is returned.
+ *
+ * On the trace_recursive_unlock(), the TRANSITION bit will be the first
+ * to be cleared. Even if it wasn't the context that set it. That is,
+ * if an interrupt comes in while NORMAL bit is set and the ring buffer
+ * is called before preempt_count() is updated, since the check will
+ * be on the NORMAL bit, the TRANSITION bit will then be set. If an
+ * NMI then comes in, it will set the NMI bit, but when the NMI code
+ * does the trace_recursive_unlock() it will clear the TRANSTION bit
+ * and leave the NMI bit set. But this is fine, because the interrupt
+ * code that set the TRANSITION bit will then clear the NMI bit when it
+ * calls trace_recursive_unlock(). If another NMI comes in, it will
+ * set the TRANSITION bit and continue.
+ *
+ * Note: The TRANSITION bit only handles a single transition between context.
*/
static __always_inline int
@@ -3055,8 +3081,16 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
bit = pc & NMI_MASK ? RB_CTX_NMI :
pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
- if (unlikely(val & (1 << (bit + cpu_buffer->nest))))
- return 1;
+ if (unlikely(val & (1 << (bit + cpu_buffer->nest)))) {
+ /*
+ * It is possible that this was called by transitioning
+ * between interrupt context, and preempt_count() has not
+ * been updated yet. In this case, use the TRANSITION bit.
+ */
+ bit = RB_CTX_TRANSITION;
+ if (val & (1 << (bit + cpu_buffer->nest)))
+ return 1;
+ }
val |= (1 << (bit + cpu_buffer->nest));
cpu_buffer->current_context = val;
@@ -3071,8 +3105,8 @@ trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->current_context - (1 << cpu_buffer->nest);
}
-/* The recursive locking above uses 4 bits */
-#define NESTED_BITS 4
+/* The recursive locking above uses 5 bits */
+#define NESTED_BITS 5
/**
* ring_buffer_nest_start - Allow to trace while nested
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b02414c8f045ab3b9afc816c3735bc98c5c3d262 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 2 Nov 2020 15:31:27 -0500
Subject: [PATCH] ring-buffer: Fix recursion protection transitions between
interrupt context
The recursion protection of the ring buffer depends on preempt_count() to be
correct. But it is possible that the ring buffer gets called after an
interrupt comes in but before it updates the preempt_count(). This will
trigger a false positive in the recursion code.
Use the same trick from the ftrace function callback recursion code which
uses a "transition" bit that gets set, to allow for a single recursion for
to handle transitions between contexts.
Cc: stable(a)vger.kernel.org
Fixes: 567cd4da54ff4 ("ring-buffer: User context bit recursion checking")
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 7f45fd9d5a45..dc83b3fa9fe7 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -438,14 +438,16 @@ enum {
};
/*
* Used for which event context the event is in.
- * NMI = 0
- * IRQ = 1
- * SOFTIRQ = 2
- * NORMAL = 3
+ * TRANSITION = 0
+ * NMI = 1
+ * IRQ = 2
+ * SOFTIRQ = 3
+ * NORMAL = 4
*
* See trace_recursive_lock() comment below for more details.
*/
enum {
+ RB_CTX_TRANSITION,
RB_CTX_NMI,
RB_CTX_IRQ,
RB_CTX_SOFTIRQ,
@@ -3014,10 +3016,10 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
* a bit of overhead in something as critical as function tracing,
* we use a bitmask trick.
*
- * bit 0 = NMI context
- * bit 1 = IRQ context
- * bit 2 = SoftIRQ context
- * bit 3 = normal context.
+ * bit 1 = NMI context
+ * bit 2 = IRQ context
+ * bit 3 = SoftIRQ context
+ * bit 4 = normal context.
*
* This works because this is the order of contexts that can
* preempt other contexts. A SoftIRQ never preempts an IRQ
@@ -3040,6 +3042,30 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
* The least significant bit can be cleared this way, and it
* just so happens that it is the same bit corresponding to
* the current context.
+ *
+ * Now the TRANSITION bit breaks the above slightly. The TRANSITION bit
+ * is set when a recursion is detected at the current context, and if
+ * the TRANSITION bit is already set, it will fail the recursion.
+ * This is needed because there's a lag between the changing of
+ * interrupt context and updating the preempt count. In this case,
+ * a false positive will be found. To handle this, one extra recursion
+ * is allowed, and this is done by the TRANSITION bit. If the TRANSITION
+ * bit is already set, then it is considered a recursion and the function
+ * ends. Otherwise, the TRANSITION bit is set, and that bit is returned.
+ *
+ * On the trace_recursive_unlock(), the TRANSITION bit will be the first
+ * to be cleared. Even if it wasn't the context that set it. That is,
+ * if an interrupt comes in while NORMAL bit is set and the ring buffer
+ * is called before preempt_count() is updated, since the check will
+ * be on the NORMAL bit, the TRANSITION bit will then be set. If an
+ * NMI then comes in, it will set the NMI bit, but when the NMI code
+ * does the trace_recursive_unlock() it will clear the TRANSTION bit
+ * and leave the NMI bit set. But this is fine, because the interrupt
+ * code that set the TRANSITION bit will then clear the NMI bit when it
+ * calls trace_recursive_unlock(). If another NMI comes in, it will
+ * set the TRANSITION bit and continue.
+ *
+ * Note: The TRANSITION bit only handles a single transition between context.
*/
static __always_inline int
@@ -3055,8 +3081,16 @@ trace_recursive_lock(struct ring_buffer_per_cpu *cpu_buffer)
bit = pc & NMI_MASK ? RB_CTX_NMI :
pc & HARDIRQ_MASK ? RB_CTX_IRQ : RB_CTX_SOFTIRQ;
- if (unlikely(val & (1 << (bit + cpu_buffer->nest))))
- return 1;
+ if (unlikely(val & (1 << (bit + cpu_buffer->nest)))) {
+ /*
+ * It is possible that this was called by transitioning
+ * between interrupt context, and preempt_count() has not
+ * been updated yet. In this case, use the TRANSITION bit.
+ */
+ bit = RB_CTX_TRANSITION;
+ if (val & (1 << (bit + cpu_buffer->nest)))
+ return 1;
+ }
val |= (1 << (bit + cpu_buffer->nest));
cpu_buffer->current_context = val;
@@ -3071,8 +3105,8 @@ trace_recursive_unlock(struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->current_context - (1 << cpu_buffer->nest);
}
-/* The recursive locking above uses 4 bits */
-#define NESTED_BITS 4
+/* The recursive locking above uses 5 bits */
+#define NESTED_BITS 5
/**
* ring_buffer_nest_start - Allow to trace while nested
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From da7d554f7c62d0c17c1ac3cc2586473c2d99f0bd Mon Sep 17 00:00:00 2001
From: Alexander Aring <aahringo(a)redhat.com>
Date: Mon, 26 Oct 2020 10:52:29 -0400
Subject: [PATCH] gfs2: Wake up when sd_glock_disposal becomes zero
Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero. As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up. Add the missing wakeup.
Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race")
Cc: stable(a)vger.kernel.org # v2.6.39+
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 5441c17562c5..d98a2e5dab9f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1078,7 +1078,8 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
out_free:
kfree(gl->gl_lksb.sb_lvbptr);
kmem_cache_free(cachep, gl);
- atomic_dec(&sdp->sd_glock_disposal);
+ if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+ wake_up(&sdp->sd_glock_wait);
out:
return ret;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From da7d554f7c62d0c17c1ac3cc2586473c2d99f0bd Mon Sep 17 00:00:00 2001
From: Alexander Aring <aahringo(a)redhat.com>
Date: Mon, 26 Oct 2020 10:52:29 -0400
Subject: [PATCH] gfs2: Wake up when sd_glock_disposal becomes zero
Commit fc0e38dae645 ("GFS2: Fix glock deallocation race") fixed a
sd_glock_disposal accounting bug by adding a missing atomic_dec
statement, but it failed to wake up sd_glock_wait when that decrement
causes sd_glock_disposal to reach zero. As a consequence,
gfs2_gl_hash_clear can now run into a 10-minute timeout instead of
being woken up. Add the missing wakeup.
Fixes: fc0e38dae645 ("GFS2: Fix glock deallocation race")
Cc: stable(a)vger.kernel.org # v2.6.39+
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba(a)redhat.com>
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 5441c17562c5..d98a2e5dab9f 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1078,7 +1078,8 @@ int gfs2_glock_get(struct gfs2_sbd *sdp, u64 number,
out_free:
kfree(gl->gl_lksb.sb_lvbptr);
kmem_cache_free(cachep, gl);
- atomic_dec(&sdp->sd_glock_disposal);
+ if (atomic_dec_and_test(&sdp->sd_glock_disposal))
+ wake_up(&sdp->sd_glock_wait);
out:
return ret;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f08842098e842c51e3b97d0dcdebf810b32558e Mon Sep 17 00:00:00 2001
From: Shijie Luo <luoshijie1(a)huawei.com>
Date: Sun, 1 Nov 2020 17:07:40 -0800
Subject: [PATCH] mm: mempolicy: fix potential pte_unmap_unlock pte error
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.
queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1(a)huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Feilong Lin <linfeilong(a)huawei.com>
Cc: Shijie Luo <luoshijie1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3fde772ef5ef..3ca4898f3f24 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,7 +525,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long flags = qp->flags;
int ret;
bool has_unmovable = false;
- pte_t *pte;
+ pte_t *pte, *mapped_pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
@@ -539,7 +539,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
if (pmd_trans_unstable(pmd))
return 0;
- pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+ mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
if (!pte_present(*pte))
continue;
@@ -571,7 +571,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
} else
break;
}
- pte_unmap_unlock(pte - 1, ptl);
+ pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
if (has_unmovable)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f08842098e842c51e3b97d0dcdebf810b32558e Mon Sep 17 00:00:00 2001
From: Shijie Luo <luoshijie1(a)huawei.com>
Date: Sun, 1 Nov 2020 17:07:40 -0800
Subject: [PATCH] mm: mempolicy: fix potential pte_unmap_unlock pte error
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.
queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1(a)huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Feilong Lin <linfeilong(a)huawei.com>
Cc: Shijie Luo <luoshijie1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3fde772ef5ef..3ca4898f3f24 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,7 +525,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long flags = qp->flags;
int ret;
bool has_unmovable = false;
- pte_t *pte;
+ pte_t *pte, *mapped_pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
@@ -539,7 +539,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
if (pmd_trans_unstable(pmd))
return 0;
- pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+ mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
if (!pte_present(*pte))
continue;
@@ -571,7 +571,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
} else
break;
}
- pte_unmap_unlock(pte - 1, ptl);
+ pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
if (has_unmovable)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3f08842098e842c51e3b97d0dcdebf810b32558e Mon Sep 17 00:00:00 2001
From: Shijie Luo <luoshijie1(a)huawei.com>
Date: Sun, 1 Nov 2020 17:07:40 -0800
Subject: [PATCH] mm: mempolicy: fix potential pte_unmap_unlock pte error
When flags in queue_pages_pte_range don't have MPOL_MF_MOVE or
MPOL_MF_MOVE_ALL bits, code breaks and passing origin pte - 1 to
pte_unmap_unlock seems like not a good idea.
queue_pages_pte_range can run in MPOL_MF_MOVE_ALL mode which doesn't
migrate misplaced pages but returns with EIO when encountering such a
page. Since commit a7f40cfe3b7a ("mm: mempolicy: make mbind() return
-EIO when MPOL_MF_STRICT is specified") and early break on the first pte
in the range results in pte_unmap_unlock on an underflow pte. This can
lead to lockups later on when somebody tries to lock the pte resp.
page_table_lock again..
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Shijie Luo <luoshijie1(a)huawei.com>
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Feilong Lin <linfeilong(a)huawei.com>
Cc: Shijie Luo <luoshijie1(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20201019074853.50856-1-luoshijie1@huawei.com
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3fde772ef5ef..3ca4898f3f24 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -525,7 +525,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
unsigned long flags = qp->flags;
int ret;
bool has_unmovable = false;
- pte_t *pte;
+ pte_t *pte, *mapped_pte;
spinlock_t *ptl;
ptl = pmd_trans_huge_lock(pmd, vma);
@@ -539,7 +539,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
if (pmd_trans_unstable(pmd))
return 0;
- pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+ mapped_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
for (; addr != end; pte++, addr += PAGE_SIZE) {
if (!pte_present(*pte))
continue;
@@ -571,7 +571,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
} else
break;
}
- pte_unmap_unlock(pte - 1, ptl);
+ pte_unmap_unlock(mapped_pte, ptl);
cond_resched();
if (has_unmovable)
The patch below does not apply to the 5.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8de15e920dc85d1705ab9c202c95d56845bc2d48 Mon Sep 17 00:00:00 2001
From: Roman Gushchin <guro(a)fb.com>
Date: Sun, 1 Nov 2020 17:07:34 -0800
Subject: [PATCH] mm: memcg: link page counters to root if use_hierarchy is
false
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Richard reported a warning which can be reproduced by running the LTP
madvise6 test (cgroup v1 in the non-hierarchical mode should be used):
WARNING: CPU: 0 PID: 12 at mm/page_counter.c:57 page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
Modules linked in:
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.9.0-rc7-22-default #77
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-48-gd9c812d-rebuilt.opensuse.org 04/01/2014
Workqueue: events drain_local_stock
RIP: 0010:page_counter_uncharge (mm/page_counter.c:57 mm/page_counter.c:50 mm/page_counter.c:156)
Call Trace:
__memcg_kmem_uncharge (mm/memcontrol.c:3022)
drain_obj_stock (./include/linux/rcupdate.h:689 mm/memcontrol.c:3114)
drain_local_stock (mm/memcontrol.c:2255)
process_one_work (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/workqueue.h:108 kernel/workqueue.c:2274)
worker_thread (./include/linux/list.h:282 kernel/workqueue.c:2416)
kthread (kernel/kthread.c:292)
ret_from_fork (arch/x86/entry/entry_64.S:300)
The problem occurs because in the non-hierarchical mode non-root page
counters are not linked to root page counters, so the charge is not
propagated to the root memory cgroup.
After the removal of the original memory cgroup and reparenting of the
object cgroup, the root cgroup might be uncharged by draining a objcg
stock, for example. It leads to an eventual underflow of the charge and
triggers a warning.
Fix it by linking all page counters to corresponding root page counters
in the non-hierarchical mode.
Please note, that in the non-hierarchical mode all objcgs are always
reparented to the root memory cgroup, even if the hierarchy has more
than 1 level. This patch doesn't change it.
The patch also doesn't affect how the hierarchical mode is working,
which is the only sane and truly supported mode now.
Thanks to Richard for reporting, debugging and providing an alternative
version of the fix!
Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API")
Reported-by: <ltp(a)lists.linux.it>
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Michal Koutný <mkoutny(a)suse.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Link: https://lkml.kernel.org/r/20201026231326.3212225-1-guro@fb.com
Debugged-by: Richard Palethorpe <rpalethorpe(a)suse.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c3b6dc7d5c94..3dcbf24d2227 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5345,17 +5345,22 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
memcg->swappiness = mem_cgroup_swappiness(parent);
memcg->oom_kill_disable = parent->oom_kill_disable;
}
- if (parent && parent->use_hierarchy) {
+ if (!parent) {
+ page_counter_init(&memcg->memory, NULL);
+ page_counter_init(&memcg->swap, NULL);
+ page_counter_init(&memcg->kmem, NULL);
+ page_counter_init(&memcg->tcpmem, NULL);
+ } else if (parent->use_hierarchy) {
memcg->use_hierarchy = true;
page_counter_init(&memcg->memory, &parent->memory);
page_counter_init(&memcg->swap, &parent->swap);
page_counter_init(&memcg->kmem, &parent->kmem);
page_counter_init(&memcg->tcpmem, &parent->tcpmem);
} else {
- page_counter_init(&memcg->memory, NULL);
- page_counter_init(&memcg->swap, NULL);
- page_counter_init(&memcg->kmem, NULL);
- page_counter_init(&memcg->tcpmem, NULL);
+ page_counter_init(&memcg->memory, &root_mem_cgroup->memory);
+ page_counter_init(&memcg->swap, &root_mem_cgroup->swap);
+ page_counter_init(&memcg->kmem, &root_mem_cgroup->kmem);
+ page_counter_init(&memcg->tcpmem, &root_mem_cgroup->tcpmem);
/*
* Deeper hierachy with use_hierarchy == false doesn't make
* much sense so let cgroup subsystem know about this
Here are backports of some fixes to the 4.14 stable branch.
I tested the blktrace fix with the script referenced in the commit
message.
I wasn't able to test the i40e changes (no hardware and no reproducer
available).
Ben.
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
Here are backports of some fixes to the 4.19 stable branch.
I tested the blktrace fix with the script referenced in the commit
message.
I tested the btrfs changes with the reproducers for CVE-2019-19039,
CVE-2019-19377, and CVE-2019-19816, and checked for regressions with
xfstests.
Ben.
--
Ben Hutchings, Software Developer Codethink Ltd
https://www.codethink.co.uk/ Dale House, 35 Dale Street
Manchester, M1 2HF, United Kingdom
This is a note to let you know that I've just added the patch titled
USB: serial: cyberjack: fix write-URB completion race
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 985616f0457d9f555fff417d0da56174f70cc14f Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 26 Oct 2020 09:25:48 +0100
Subject: USB: serial: cyberjack: fix write-URB completion race
The write-URB busy flag was being cleared before the completion handler
was done with the URB, something which could lead to corrupt transfers
due to a racing write request if the URB is resubmitted.
Fixes: 507ca9bc0476 ("[PATCH] USB: add ability for usb-serial drivers to determine if their write urb is currently being used.")
Cc: stable <stable(a)vger.kernel.org> # 2.6.13
Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/cyberjack.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/serial/cyberjack.c b/drivers/usb/serial/cyberjack.c
index 821970609695..2e40908963da 100644
--- a/drivers/usb/serial/cyberjack.c
+++ b/drivers/usb/serial/cyberjack.c
@@ -357,11 +357,12 @@ static void cyberjack_write_bulk_callback(struct urb *urb)
struct device *dev = &port->dev;
int status = urb->status;
unsigned long flags;
+ bool resubmitted = false;
- set_bit(0, &port->write_urbs_free);
if (status) {
dev_dbg(dev, "%s - nonzero write bulk status received: %d\n",
__func__, status);
+ set_bit(0, &port->write_urbs_free);
return;
}
@@ -394,6 +395,8 @@ static void cyberjack_write_bulk_callback(struct urb *urb)
goto exit;
}
+ resubmitted = true;
+
dev_dbg(dev, "%s - priv->wrsent=%d\n", __func__, priv->wrsent);
dev_dbg(dev, "%s - priv->wrfilled=%d\n", __func__, priv->wrfilled);
@@ -410,6 +413,8 @@ static void cyberjack_write_bulk_callback(struct urb *urb)
exit:
spin_unlock_irqrestore(&priv->lock, flags);
+ if (!resubmitted)
+ set_bit(0, &port->write_urbs_free);
usb_serial_port_softint(port);
}
--
2.29.2
We can get a crash when disconnecting the iSCSI session,
the call trace like this:
[ffff00002a00fb70] kfree at ffff00000830e224
[ffff00002a00fba0] ses_intf_remove at ffff000001f200e4
[ffff00002a00fbd0] device_del at ffff0000086b6a98
[ffff00002a00fc50] device_unregister at ffff0000086b6d58
[ffff00002a00fc70] __scsi_remove_device at ffff00000870608c
[ffff00002a00fca0] scsi_remove_device at ffff000008706134
[ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4
[ffff00002a00fd10] scsi_remove_target at ffff0000087064c0
[ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4
[ffff00002a00fdb0] process_one_work at ffff00000810f35c
[ffff00002a00fe00] worker_thread at ffff00000810f648
[ffff00002a00fe70] kthread at ffff000008116e98
In ses_intf_add, components count can be 0, and kcalloc 0 size scomp,
but not saved at edev->component[i].scratch
In this situation, edev->component[0].scratch is an invalid pointer,
when kfree it in ses_intf_remove_enclosure, a crash like above would happen
The call trace also could be other random cases when kfree cannot detect
the invalid pointer
We should not use edev->component[] array when we get components count is 0
We also need check index when use edev->component[] array in
ses_enclosure_data_process
Tested-by: Zeng Zhicong <timmyzeng(a)163.com>
Cc: stable <stable(a)vger.kernel.org> # 2.6.25+
Signed-off-by: Ding Hui <dinghui(a)sangfor.com.cn>
---
drivers/scsi/ses.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/ses.c b/drivers/scsi/ses.c
index c2afba2a5414..f5ef0a91f0eb 100644
--- a/drivers/scsi/ses.c
+++ b/drivers/scsi/ses.c
@@ -477,9 +477,6 @@ static int ses_enclosure_find_by_addr(struct enclosure_device *edev,
int i;
struct ses_component *scomp;
- if (!edev->component[0].scratch)
- return 0;
-
for (i = 0; i < edev->components; i++) {
scomp = edev->component[i].scratch;
if (scomp->addr != efd->addr)
@@ -565,8 +562,10 @@ static void ses_enclosure_data_process(struct enclosure_device *edev,
components++,
type_ptr[0],
name);
- else
+ else if (components < edev->components)
ecomp = &edev->component[components++];
+ else
+ ecomp = ERR_PTR(-EINVAL);
if (!IS_ERR(ecomp)) {
if (addl_desc_ptr)
@@ -731,9 +730,11 @@ static int ses_intf_add(struct device *cdev,
buf = NULL;
}
page2_not_supported:
- scomp = kcalloc(components, sizeof(struct ses_component), GFP_KERNEL);
- if (!scomp)
- goto err_free;
+ if (components > 0) {
+ scomp = kcalloc(components, sizeof(struct ses_component), GFP_KERNEL);
+ if (!scomp)
+ goto err_free;
+ }
edev = enclosure_register(cdev->parent, dev_name(&sdev->sdev_gendev),
components, &ses_enclosure_callbacks);
@@ -813,7 +814,8 @@ static void ses_intf_remove_enclosure(struct scsi_device *sdev)
kfree(ses_dev->page2);
kfree(ses_dev);
- kfree(edev->component[0].scratch);
+ if (edev->components > 0)
+ kfree(edev->component[0].scratch);
put_device(&edev->edev);
enclosure_unregister(edev);
--
2.17.1
The patch titled
Subject: hugetlbfs: fix anon huge page migration race
has been added to the -mm tree. Its filename is
hugetlbfs-fix-anon-huge-page-migration-race.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/hugetlbfs-fix-anon-huge-page-migr…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/hugetlbfs-fix-anon-huge-page-migr…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlbfs: fix anon huge page migration race
Qian Cai reported the following BUG in [1]
[ 6147.019063][T45242] LTP: starting move_pages12
[ 6147.475680][T64921] BUG: unable to handle page fault for address: ffffffffffffffe0
...
[ 6147.525866][T64921] RIP: 0010:anon_vma_interval_tree_iter_first+0xa2/0x170
avc_start_pgoff at mm/interval_tree.c:63
[ 6147.620914][T64921] Call Trace:
[ 6147.624078][T64921] rmap_walk_anon+0x141/0xa30
rmap_walk_anon at mm/rmap.c:1864
[ 6147.628639][T64921] try_to_unmap+0x209/0x2d0
try_to_unmap at mm/rmap.c:1763
[ 6147.633026][T64921] ? rmap_walk_locked+0x140/0x140
[ 6147.637936][T64921] ? page_remove_rmap+0x1190/0x1190
[ 6147.643020][T64921] ? page_not_mapped+0x10/0x10
[ 6147.647668][T64921] ? page_get_anon_vma+0x290/0x290
[ 6147.652664][T64921] ? page_mapcount_is_zero+0x10/0x10
[ 6147.657838][T64921] ? hugetlb_page_mapping_lock_write+0x97/0x180
[ 6147.663972][T64921] migrate_pages+0x1005/0x1fb0
[ 6147.668617][T64921] ? remove_migration_pte+0xac0/0xac0
[ 6147.673875][T64921] move_pages_and_store_status.isra.47+0xd7/0x1a0
[ 6147.680181][T64921] ? migrate_pages+0x1fb0/0x1fb0
[ 6147.685002][T64921] __x64_sys_move_pages+0xa5c/0x1100
[ 6147.690176][T64921] ? trace_hardirqs_on+0x20/0x1b5
[ 6147.695084][T64921] ? move_pages_and_store_status.isra.47+0x1a0/0x1a0
[ 6147.701653][T64921] ? rcu_read_lock_sched_held+0xaa/0xd0
[ 6147.707088][T64921] ? switch_fpu_return+0x196/0x400
[ 6147.712083][T64921] ? lockdep_hardirqs_on_prepare+0x38c/0x550
[ 6147.717954][T64921] ? do_syscall_64+0x24/0x310
[ 6147.722513][T64921] do_syscall_64+0x5f/0x310
[ 6147.726897][T64921] ? trace_hardirqs_off+0x12/0x1a0
[ 6147.731894][T64921] ? asm_exc_page_fault+0x8/0x30
[ 6147.736714][T64921] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Hugh Dickens diagnosed this as a migration bug caused by code introduced
to use i_mmap_rwsem for pmd sharing synchronization. Specifically, the
routine unmap_and_move_huge_page() is always passing the TTU_RMAP_LOCKED
flag to try_to_unmap() while holding i_mmap_rwsem. This is wrong for
anon pages as the anon_vma_lock should be held in this case. Further
analysis suggested that i_mmap_rwsem was not required to he held at all
when calling try_to_unmap for anon pages as an anon page could never be
part of a shared pmd mapping.
Discussion also revealed that the hack in hugetlb_page_mapping_lock_write
to drop page lock and acquire i_mmap_rwsem is wrong. There is no way to
keep mapping valid while dropping page lock.
This patch does the following:
- Do not take i_mmap_rwsem and set TTU_RMAP_LOCKED for anon pages when
calling try_to_unmap.
- Remove the hacky code in hugetlb_page_mapping_lock_write. The routine
will now simply do a 'trylock' while still holding the page lock. If
the trylock fails, it will return NULL. This could impact the callers:
- migration calling code will receive -EAGAIN and retry up to the hard
coded limit (10).
- memory error code will treat the page as BUSY. This will force
killing (SIGKILL) instead of SIGBUS any mapping tasks.
Do note that this change in behavior only happens when there is a race.
None of the standard kernel testing suites actually hit this race, but
it is possible.
[1] https://lore.kernel.org/lkml/20200708012044.GC992@lca.pw/
[2] https://lore.kernel.org/linux-mm/alpine.LSU.2.11.2010071833100.2214@eggly.a…
Link: https://lkml.kernel.org/r/20201105195058.78401-1-mike.kravetz@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Qian Cai <cai(a)lca.pw>
Suggested-by: Hugh Dickins <hughd(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 90 ++----------------------------------------
mm/memory-failure.c | 36 +++++++---------
mm/migrate.c | 46 +++++++++++----------
mm/rmap.c | 5 --
4 files changed, 48 insertions(+), 129 deletions(-)
--- a/mm/hugetlb.c~hugetlbfs-fix-anon-huge-page-migration-race
+++ a/mm/hugetlb.c
@@ -1568,103 +1568,23 @@ int PageHeadHuge(struct page *page_head)
}
/*
- * Find address_space associated with hugetlbfs page.
- * Upon entry page is locked and page 'was' mapped although mapped state
- * could change. If necessary, use anon_vma to find vma and associated
- * address space. The returned mapping may be stale, but it can not be
- * invalid as page lock (which is held) is required to destroy mapping.
- */
-static struct address_space *_get_hugetlb_page_mapping(struct page *hpage)
-{
- struct anon_vma *anon_vma;
- pgoff_t pgoff_start, pgoff_end;
- struct anon_vma_chain *avc;
- struct address_space *mapping = page_mapping(hpage);
-
- /* Simple file based mapping */
- if (mapping)
- return mapping;
-
- /*
- * Even anonymous hugetlbfs mappings are associated with an
- * underlying hugetlbfs file (see hugetlb_file_setup in mmap
- * code). Find a vma associated with the anonymous vma, and
- * use the file pointer to get address_space.
- */
- anon_vma = page_lock_anon_vma_read(hpage);
- if (!anon_vma)
- return mapping; /* NULL */
-
- /* Use first found vma */
- pgoff_start = page_to_pgoff(hpage);
- pgoff_end = pgoff_start + pages_per_huge_page(page_hstate(hpage)) - 1;
- anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
- pgoff_start, pgoff_end) {
- struct vm_area_struct *vma = avc->vma;
-
- mapping = vma->vm_file->f_mapping;
- break;
- }
-
- anon_vma_unlock_read(anon_vma);
- return mapping;
-}
-
-/*
* Find and lock address space (mapping) in write mode.
*
- * Upon entry, the page is locked which allows us to find the mapping
- * even in the case of an anon page. However, locking order dictates
- * the i_mmap_rwsem be acquired BEFORE the page lock. This is hugetlbfs
- * specific. So, we first try to lock the sema while still holding the
- * page lock. If this works, great! If not, then we need to drop the
- * page lock and then acquire i_mmap_rwsem and reacquire page lock. Of
- * course, need to revalidate state along the way.
+ * Upon entry, the page is locked which means that page_mapping() is
+ * stable. Due to locking order, we can only trylock_write. If we can
+ * not get the lock, simply return NULL to caller.
*/
struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage)
{
- struct address_space *mapping, *mapping2;
+ struct address_space *mapping = page_mapping(hpage);
- mapping = _get_hugetlb_page_mapping(hpage);
-retry:
if (!mapping)
return mapping;
- /*
- * If no contention, take lock and return
- */
if (i_mmap_trylock_write(mapping))
return mapping;
- /*
- * Must drop page lock and wait on mapping sema.
- * Note: Once page lock is dropped, mapping could become invalid.
- * As a hack, increase map count until we lock page again.
- */
- atomic_inc(&hpage->_mapcount);
- unlock_page(hpage);
- i_mmap_lock_write(mapping);
- lock_page(hpage);
- atomic_add_negative(-1, &hpage->_mapcount);
-
- /* verify page is still mapped */
- if (!page_mapped(hpage)) {
- i_mmap_unlock_write(mapping);
- return NULL;
- }
-
- /*
- * Get address space again and verify it is the same one
- * we locked. If not, drop lock and retry.
- */
- mapping2 = _get_hugetlb_page_mapping(hpage);
- if (mapping2 != mapping) {
- i_mmap_unlock_write(mapping);
- mapping = mapping2;
- goto retry;
- }
-
- return mapping;
+ return NULL;
}
pgoff_t __basepage_index(struct page *page)
--- a/mm/memory-failure.c~hugetlbfs-fix-anon-huge-page-migration-race
+++ a/mm/memory-failure.c
@@ -1057,27 +1057,25 @@ static bool hwpoison_user_mappings(struc
if (!PageHuge(hpage)) {
unmap_success = try_to_unmap(hpage, ttu);
} else {
- /*
- * For hugetlb pages, try_to_unmap could potentially call
- * huge_pmd_unshare. Because of this, take semaphore in
- * write mode here and set TTU_RMAP_LOCKED to indicate we
- * have taken the lock at this higer level.
- *
- * Note that the call to hugetlb_page_mapping_lock_write
- * is necessary even if mapping is already set. It handles
- * ugliness of potentially having to drop page lock to obtain
- * i_mmap_rwsem.
- */
- mapping = hugetlb_page_mapping_lock_write(hpage);
-
- if (mapping) {
- unmap_success = try_to_unmap(hpage,
+ if (!PageAnon(hpage)) {
+ /*
+ * For hugetlb pages in shared mappings, try_to_unmap
+ * could potentially call huge_pmd_unshare. Because of
+ * this, take semaphore in write mode here and set
+ * TTU_RMAP_LOCKED to indicate we have taken the lock
+ * at this higer level.
+ */
+ mapping = hugetlb_page_mapping_lock_write(hpage);
+ if (mapping) {
+ unmap_success = try_to_unmap(hpage,
ttu|TTU_RMAP_LOCKED);
- i_mmap_unlock_write(mapping);
+ i_mmap_unlock_write(mapping);
+ } else {
+ pr_info("Memory failure: %#lx: could not lock mapping for mapped huge page\n", pfn);
+ unmap_success = false;
+ }
} else {
- pr_info("Memory failure: %#lx: could not find mapping for mapped huge page\n",
- pfn);
- unmap_success = false;
+ unmap_success = try_to_unmap(hpage, ttu);
}
}
if (!unmap_success)
--- a/mm/migrate.c~hugetlbfs-fix-anon-huge-page-migration-race
+++ a/mm/migrate.c
@@ -1328,34 +1328,38 @@ static int unmap_and_move_huge_page(new_
goto put_anon;
if (page_mapped(hpage)) {
- /*
- * try_to_unmap could potentially call huge_pmd_unshare.
- * Because of this, take semaphore in write mode here and
- * set TTU_RMAP_LOCKED to let lower levels know we have
- * taken the lock.
- */
- mapping = hugetlb_page_mapping_lock_write(hpage);
- if (unlikely(!mapping))
- goto unlock_put_anon;
-
- try_to_unmap(hpage,
- TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
- TTU_RMAP_LOCKED);
+ bool mapping_locked = false;
+ enum ttu_flags ttu = TTU_MIGRATION|TTU_IGNORE_MLOCK|
+ TTU_IGNORE_ACCESS;
+
+ if (!PageAnon(hpage)) {
+ /*
+ * In shared mappings, try_to_unmap could potentially
+ * call huge_pmd_unshare. Because of this, take
+ * semaphore in write mode here and set TTU_RMAP_LOCKED
+ * to let lower levels know we have taken the lock.
+ */
+ mapping = hugetlb_page_mapping_lock_write(hpage);
+ if (unlikely(!mapping))
+ goto unlock_put_anon;
+
+ mapping_locked = true;
+ ttu |= TTU_RMAP_LOCKED;
+ }
+
+ try_to_unmap(hpage, ttu);
page_was_mapped = 1;
- /*
- * Leave mapping locked until after subsequent call to
- * remove_migration_ptes()
- */
+
+ if (mapping_locked)
+ i_mmap_unlock_write(mapping);
}
if (!page_mapped(hpage))
rc = move_to_new_page(new_hpage, hpage, mode);
- if (page_was_mapped) {
+ if (page_was_mapped)
remove_migration_ptes(hpage,
- rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, true);
- i_mmap_unlock_write(mapping);
- }
+ rc == MIGRATEPAGE_SUCCESS ? new_hpage : hpage, false);
unlock_put_anon:
unlock_page(new_hpage);
--- a/mm/rmap.c~hugetlbfs-fix-anon-huge-page-migration-race
+++ a/mm/rmap.c
@@ -1413,9 +1413,6 @@ static bool try_to_unmap_one(struct page
/*
* If sharing is possible, start and end will be adjusted
* accordingly.
- *
- * If called for a huge page, caller must hold i_mmap_rwsem
- * in write mode as it is possible to call huge_pmd_unshare.
*/
adjust_range_if_pmd_sharing_possible(vma, &range.start,
&range.end);
@@ -1462,7 +1459,7 @@ static bool try_to_unmap_one(struct page
subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte);
address = pvmw.address;
- if (PageHuge(page)) {
+ if (PageHuge(page) && !PageAnon(page)) {
/*
* To call huge_pmd_unshare, i_mmap_rwsem must be
* held in write mode. Caller needs to explicitly
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlbfs-fix-anon-huge-page-migration-race.patch
bpftrace parses the kernel headers and uses Clang under the hood. Remove
the version check when __BPF_TRACING__ is defined (as bpftrace does) so
that this tool can continue to parse kernel headers, even with older
clang sources.
Cc: <stable(a)vger.kernel.org>
Fixes: commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1")
Reported-by: Chen Yu <yu.chen.surf(a)gmail.com>
Reported-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers(a)google.com>
---
include/linux/compiler-clang.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index dd7233c48bf3..98cff1b4b088 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -8,8 +8,10 @@
+ __clang_patchlevel__)
#if CLANG_VERSION < 100001
+#ifndef __BPF_TRACING__
# error Sorry, your version of Clang is too old - please use 10.0.1 or newer.
#endif
+#endif
/* Compiler specific definitions for Clang compiler */
--
2.29.1.341.ge80a0c044ae-goog
From: Andrew Jones <drjones(a)redhat.com>
ID registers are RAZ until they've been allocated a purpose, but
that doesn't mean they should be removed from the KVM_GET_REG_LIST
list. So far we only have one register, SYS_ID_AA64ZFR0_EL1, that
is hidden from userspace when its function, SVE, is not present.
Expose SYS_ID_AA64ZFR0_EL1 to userspace as RAZ when SVE is not
implemented. Removing the userspace visibility checks is enough
to reexpose it, as it will already return zero to userspace when
SVE is not present. The register already behaves as RAZ for the
guest when SVE is not present.
Fixes: 73433762fcae ("KVM: arm64/sve: System register context switch and access support")
Reported-by: 张东旭 <xu910121(a)sina.com>
Signed-off-by: Andrew Jones <drjones(a)redhat.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org#v5.2+
Link: https://lore.kernel.org/r/20201105091022.15373-2-drjones@redhat.com
---
arch/arm64/kvm/sys_regs.c | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 983994f01a63..3af306e6b9cd 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1193,16 +1193,6 @@ static unsigned int sve_visibility(const struct kvm_vcpu *vcpu,
return REG_HIDDEN_USER | REG_HIDDEN_GUEST;
}
-/* Visibility overrides for SVE-specific ID registers */
-static unsigned int sve_id_visibility(const struct kvm_vcpu *vcpu,
- const struct sys_reg_desc *rd)
-{
- if (vcpu_has_sve(vcpu))
- return 0;
-
- return REG_HIDDEN_USER;
-}
-
/* Generate the emulated ID_AA64ZFR0_EL1 value exposed to the guest */
static u64 guest_id_aa64zfr0_el1(const struct kvm_vcpu *vcpu)
{
@@ -1229,9 +1219,6 @@ static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
{
u64 val;
- if (WARN_ON(!vcpu_has_sve(vcpu)))
- return -ENOENT;
-
val = guest_id_aa64zfr0_el1(vcpu);
return reg_to_user(uaddr, &val, reg->id);
}
@@ -1244,9 +1231,6 @@ static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
int err;
u64 val;
- if (WARN_ON(!vcpu_has_sve(vcpu)))
- return -ENOENT;
-
err = reg_from_user(&val, uaddr, id);
if (err)
return err;
@@ -1509,7 +1493,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
ID_SANITISED(ID_AA64PFR1_EL1),
ID_UNALLOCATED(4,2),
ID_UNALLOCATED(4,3),
- { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = sve_id_visibility },
+ { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, },
ID_UNALLOCATED(4,5),
ID_UNALLOCATED(4,6),
ID_UNALLOCATED(4,7),
--
2.28.0
This is a note to let you know that I've just added the patch titled
serial: txx9: add missing platform_driver_unregister() on error in
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 0c5fc92622ed5531ff324b20f014e9e3092f0187 Mon Sep 17 00:00:00 2001
From: Qinglang Miao <miaoqinglang(a)huawei.com>
Date: Tue, 3 Nov 2020 16:49:42 +0800
Subject: serial: txx9: add missing platform_driver_unregister() on error in
serial_txx9_init
Add the missing platform_driver_unregister() before return
from serial_txx9_init in the error handling case when failed
to register serial_txx9_pci_driver with macro ENABLE_SERIAL_TXX9_PCI
defined.
Fixes: ab4382d27412 ("tty: move drivers/serial/ to drivers/tty/serial/")
Signed-off-by: Qinglang Miao <miaoqinglang(a)huawei.com>
Link: https://lore.kernel.org/r/20201103084942.109076-1-miaoqinglang@huawei.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/serial_txx9.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/serial_txx9.c b/drivers/tty/serial/serial_txx9.c
index b4d89e31730e..7a07e7272de1 100644
--- a/drivers/tty/serial/serial_txx9.c
+++ b/drivers/tty/serial/serial_txx9.c
@@ -1280,6 +1280,9 @@ static int __init serial_txx9_init(void)
#ifdef ENABLE_SERIAL_TXX9_PCI
ret = pci_register_driver(&serial_txx9_pci_driver);
+ if (ret) {
+ platform_driver_unregister(&serial_txx9_plat_driver);
+ }
#endif
if (ret == 0)
goto out;
--
2.29.2
This is a note to let you know that I've just added the patch titled
tty: fix crash in release_tty if tty->port is not set
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 4466d6d2f80c1193e0845d110277c56da77a6418 Mon Sep 17 00:00:00 2001
From: Matthias Reichl <hias(a)horus.com>
Date: Thu, 5 Nov 2020 13:34:32 +0100
Subject: tty: fix crash in release_tty if tty->port is not set
Commit 2ae0b31e0face ("tty: don't crash in tty_init_dev when missing
tty_port") didn't fully prevent the crash as the cleanup path in
tty_init_dev() calls release_tty() which dereferences tty->port
without checking it for non-null.
Add tty->port checks to release_tty to avoid the kernel crash.
Fixes: 2ae0b31e0face ("tty: don't crash in tty_init_dev when missing tty_port")
Signed-off-by: Matthias Reichl <hias(a)horus.com>
Link: https://lore.kernel.org/r/20201105123432.4448-1-hias@horus.com
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/tty_io.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index 7a4c02548fb3..9f8b9a567b35 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1515,10 +1515,12 @@ static void release_tty(struct tty_struct *tty, int idx)
tty->ops->shutdown(tty);
tty_save_termios(tty);
tty_driver_remove_tty(tty->driver, tty);
- tty->port->itty = NULL;
+ if (tty->port)
+ tty->port->itty = NULL;
if (tty->link)
tty->link->port->itty = NULL;
- tty_buffer_cancel_work(tty->port);
+ if (tty->port)
+ tty_buffer_cancel_work(tty->port);
if (tty->link)
tty_buffer_cancel_work(tty->link->port);
--
2.29.2
This is a note to let you know that I've just added the patch titled
tty: serial: imx: enable earlycon by default if IMX_SERIAL_CONSOLE is
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 427627a23c3e86e31113f9db9bfdca41698a0ee5 Mon Sep 17 00:00:00 2001
From: Lucas Stach <l.stach(a)pengutronix.de>
Date: Thu, 5 Nov 2020 21:40:26 +0100
Subject: tty: serial: imx: enable earlycon by default if IMX_SERIAL_CONSOLE is
enabled
Since 699cc4dfd140 (tty: serial: imx: add imx earlycon driver), the earlycon
part of imx serial is a separate driver and isn't necessarily enabled anymore
when the console is enabled. This causes users to loose the earlycon
functionality when upgrading their kenrel configuration via oldconfig.
Enable earlycon by default when IMX_SERIAL_CONSOLE is enabled.
Fixes: 699cc4dfd140 (tty: serial: imx: add imx earlycon driver)
Reviewed-by: Fabio Estevam <festevam(a)gmail.com>
Reviewed-by: Fugang Duan <fugang.duan(a)nxp.com>
Signed-off-by: Lucas Stach <l.stach(a)pengutronix.de>
Link: https://lore.kernel.org/r/20201105204026.1818219-1-l.stach@pengutronix.de
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/tty/serial/Kconfig b/drivers/tty/serial/Kconfig
index 1044fc387691..28f22e58639c 100644
--- a/drivers/tty/serial/Kconfig
+++ b/drivers/tty/serial/Kconfig
@@ -522,6 +522,7 @@ config SERIAL_IMX_EARLYCON
depends on OF
select SERIAL_EARLYCON
select SERIAL_CORE_CONSOLE
+ default y if SERIAL_IMX_CONSOLE
help
If you have enabled the earlycon on the Freescale IMX
CPU you can make it the earlycon by answering Y to this option.
--
2.29.2
This is a note to let you know that I've just added the patch titled
serial: 8250_mtk: Fix uart_get_baud_rate warning
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 912ab37c798770f21b182d656937072b58553378 Mon Sep 17 00:00:00 2001
From: Claire Chang <tientzu(a)chromium.org>
Date: Mon, 2 Nov 2020 20:07:49 +0800
Subject: serial: 8250_mtk: Fix uart_get_baud_rate warning
Mediatek 8250 port supports speed higher than uartclk / 16. If the baud
rates in both the new and the old termios setting are higher than
uartclk / 16, the WARN_ON in uart_get_baud_rate() will be triggered.
Passing NULL as the old termios so uart_get_baud_rate() will use
uartclk / 16 - 1 as the new baud rate which will be replaced by the
original baud rate later by tty_termios_encode_baud_rate() in
mtk8250_set_termios().
Fixes: 551e553f0d4a ("serial: 8250_mtk: Fix high-speed baud rates clamping")
Signed-off-by: Claire Chang <tientzu(a)chromium.org>
Link: https://lore.kernel.org/r/20201102120749.374458-1-tientzu@chromium.org
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/serial/8250/8250_mtk.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index 41f4120abdf2..fa876e2c13e5 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -317,7 +317,7 @@ mtk8250_set_termios(struct uart_port *port, struct ktermios *termios,
*/
baud = tty_termios_baud_rate(termios);
- serial8250_do_set_termios(port, termios, old);
+ serial8250_do_set_termios(port, termios, NULL);
tty_termios_encode_baud_rate(termios, baud, baud);
--
2.29.2
This is a note to let you know that I've just added the patch titled
staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From aee9dccc5b64e878cf1b18207436e73f66d74157 Mon Sep 17 00:00:00 2001
From: Brian O'Keefe <bokeefe(a)alum.wpi.edu>
Date: Fri, 6 Nov 2020 10:10:34 -0500
Subject: staging: rtl8723bs: Add 024c:0627 to the list of SDIO device-ids
Add 024c:0627 to the list of SDIO device-ids, based on hardware found in
the wild. This hardware exists on at least some Acer SW1-011 tablets.
Signed-off-by: Brian O'Keefe <bokeefe(a)alum.wpi.edu>
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/b9e1523f-2ba7-fb82-646a-37f095b4440e@alum.wpi.edu
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/rtl8723bs/os_dep/sdio_intf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/rtl8723bs/os_dep/sdio_intf.c b/drivers/staging/rtl8723bs/os_dep/sdio_intf.c
index 79b55ec827a4..b2208e5f190a 100644
--- a/drivers/staging/rtl8723bs/os_dep/sdio_intf.c
+++ b/drivers/staging/rtl8723bs/os_dep/sdio_intf.c
@@ -20,6 +20,7 @@ static const struct sdio_device_id sdio_ids[] = {
{ SDIO_DEVICE(0x024c, 0x0525), },
{ SDIO_DEVICE(0x024c, 0x0623), },
{ SDIO_DEVICE(0x024c, 0x0626), },
+ { SDIO_DEVICE(0x024c, 0x0627), },
{ SDIO_DEVICE(0x024c, 0xb723), },
{ /* end: all zeroes */ },
};
--
2.29.2
The patch below does not apply to the 5.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f5449e74802c1112dea984aec8af7a33c4516af1 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)ziepe.ca>
Date: Mon, 14 Sep 2020 08:59:56 -0300
Subject: [PATCH] RDMA/ucma: Rework ucma_migrate_id() to avoid races with
destroy
ucma_destroy_id() assumes that all things accessing the ctx will do so via
the xarray. This assumption violated only in the case the FD is being
closed, then the ctx is reached via the ctx_list. Normally this is OK
since ucma_destroy_id() cannot run concurrenty with release(), however
with ucma_migrate_id() is involved this can violated as the close of the
2nd FD can run concurrently with destroy on the first:
CPU0 CPU1
ucma_destroy_id(fda)
ucma_migrate_id(fda -> fdb)
ucma_get_ctx()
xa_lock()
_ucma_find_context()
xa_erase()
xa_unlock()
xa_lock()
ctx->file = new_file
list_move()
xa_unlock()
ucma_put_ctx()
ucma_close(fdb)
_destroy_id()
kfree(ctx)
_destroy_id()
wait_for_completion()
// boom, ctx was freed
The ctx->file must be modified under the handler and xa_lock, and prior to
modification the ID must be rechecked that it is still reachable from
cur_file, ie there is no parallel destroy or migrate.
To make this work remove the double locking and streamline the control
flow. The double locking was obsoleted by the handler lock now directly
preventing new uevents from being created, and the ctx_list cannot be read
while holding fgets on both files. Removing the double locking also
removes the need to check for the same file.
Fixes: 88314e4dda1e ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvid…
Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c(a)syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index a5595bd489b0..b3a7dbb12f25 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1587,45 +1587,15 @@ static ssize_t ucma_leave_multicast(struct ucma_file *file,
return ret;
}
-static void ucma_lock_files(struct ucma_file *file1, struct ucma_file *file2)
-{
- /* Acquire mutex's based on pointer comparison to prevent deadlock. */
- if (file1 < file2) {
- mutex_lock(&file1->mut);
- mutex_lock_nested(&file2->mut, SINGLE_DEPTH_NESTING);
- } else {
- mutex_lock(&file2->mut);
- mutex_lock_nested(&file1->mut, SINGLE_DEPTH_NESTING);
- }
-}
-
-static void ucma_unlock_files(struct ucma_file *file1, struct ucma_file *file2)
-{
- if (file1 < file2) {
- mutex_unlock(&file2->mut);
- mutex_unlock(&file1->mut);
- } else {
- mutex_unlock(&file1->mut);
- mutex_unlock(&file2->mut);
- }
-}
-
-static void ucma_move_events(struct ucma_context *ctx, struct ucma_file *file)
-{
- struct ucma_event *uevent, *tmp;
-
- list_for_each_entry_safe(uevent, tmp, &ctx->file->event_list, list)
- if (uevent->ctx == ctx)
- list_move_tail(&uevent->list, &file->event_list);
-}
-
static ssize_t ucma_migrate_id(struct ucma_file *new_file,
const char __user *inbuf,
int in_len, int out_len)
{
struct rdma_ucm_migrate_id cmd;
struct rdma_ucm_migrate_resp resp;
+ struct ucma_event *uevent, *tmp;
struct ucma_context *ctx;
+ LIST_HEAD(event_list);
struct fd f;
struct ucma_file *cur_file;
int ret = 0;
@@ -1641,43 +1611,52 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
ret = -EINVAL;
goto file_put;
}
+ cur_file = f.file->private_data;
/* Validate current fd and prevent destruction of id. */
- ctx = ucma_get_ctx(f.file->private_data, cmd.id);
+ ctx = ucma_get_ctx(cur_file, cmd.id);
if (IS_ERR(ctx)) {
ret = PTR_ERR(ctx);
goto file_put;
}
rdma_lock_handler(ctx->cm_id);
- cur_file = ctx->file;
- if (cur_file == new_file) {
- mutex_lock(&cur_file->mut);
- resp.events_reported = ctx->events_reported;
- mutex_unlock(&cur_file->mut);
- goto response;
- }
-
/*
- * Migrate events between fd's, maintaining order, and avoiding new
- * events being added before existing events.
+ * ctx->file can only be changed under the handler & xa_lock. xa_load()
+ * must be checked again to ensure the ctx hasn't begun destruction
+ * since the ucma_get_ctx().
*/
- ucma_lock_files(cur_file, new_file);
xa_lock(&ctx_table);
-
- list_move_tail(&ctx->list, &new_file->ctx_list);
- ucma_move_events(ctx, new_file);
+ if (_ucma_find_context(cmd.id, cur_file) != ctx) {
+ xa_unlock(&ctx_table);
+ ret = -ENOENT;
+ goto err_unlock;
+ }
ctx->file = new_file;
+ xa_unlock(&ctx_table);
+
+ mutex_lock(&cur_file->mut);
+ list_del(&ctx->list);
+ /*
+ * At this point lock_handler() prevents addition of new uevents for
+ * this ctx.
+ */
+ list_for_each_entry_safe(uevent, tmp, &cur_file->event_list, list)
+ if (uevent->ctx == ctx)
+ list_move_tail(&uevent->list, &event_list);
resp.events_reported = ctx->events_reported;
+ mutex_unlock(&cur_file->mut);
- xa_unlock(&ctx_table);
- ucma_unlock_files(cur_file, new_file);
+ mutex_lock(&new_file->mut);
+ list_add_tail(&ctx->list, &new_file->ctx_list);
+ list_splice_tail(&event_list, &new_file->event_list);
+ mutex_unlock(&new_file->mut);
-response:
if (copy_to_user(u64_to_user_ptr(cmd.response),
&resp, sizeof(resp)))
ret = -EFAULT;
+err_unlock:
rdma_unlock_handler(ctx->cm_id);
ucma_put_ctx(ctx);
file_put:
Option "mediatek,str-clock-on" means to keep clock on during system
suspend and resume. Some platform will flush register settings if clock has
been disabled when system is suspended. Set this option to avoid clock off.
Fixes: 0cbd4b34cda9 ("xhci: mediatek: support MTK xHCI host controller")
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
Cc: stable(a)vger.kernel.org
---
Changes for v3:
- Remove unnecessary Change-Id in commit message.
- Add "Fixes" tag as a bug fix on phone system.
Changes for v2:
- Rename "mediatek,keep-clock-on" to "mediatek,str-clock-on" which implies
this option related to STR functions.
- After discussion with Chunfeng, resend dt-bindings descritption based on
mediatek,mtk-xhci.txt instead of yaml format.
.../devicetree/bindings/usb/mediatek,mtk-xhci.txt | 3 +++
1 file changed, 3 insertions(+)
diff --git a/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.txt b/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.txt
index 42d8814..fc93bcf 100644
--- a/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.txt
+++ b/Documentation/devicetree/bindings/usb/mediatek,mtk-xhci.txt
@@ -37,6 +37,9 @@ Required properties:
Optional properties:
- wakeup-source : enable USB remote wakeup;
+ - mediatek,str-clock-on: Keep clock on during system suspend and resume.
+ Some platform will flush register settings if clock has been disabled
+ when system is suspended.
- mediatek,syscon-wakeup : phandle to syscon used to access the register
of the USB wakeup glue layer between xHCI and SPM; it depends on
"wakeup-source", and has two arguments:
--
1.7.9.5
Commit eddd2a4c675c ("staging: comedi: cb_pcidas: refactor
write_calibration_bitstream()") inadvertently removed one of the
`udelay(1)` calls when writing to the calibration register in
`cb_pcidas_calib_write()`. Reinstate the delay. It may seem strange
that the delay is placed before the register write, but this function is
called in a loop so the extra delay can make a difference.
This _might_ solve reported issues reading analog inputs on a
PCIe-DAS1602/16 card where the analog input values "were scaled in a
strange way that didn't make sense". On the same hardware running a
system with a 3.13 kernel, and then a system with a 4.4 kernel, but with
the same application software, the system with the 3.13 kernel was fine,
but the one with the 4.4 kernel exhibited the problem. Of the 90
changes to the driver between those kernel versions, this change looked
like the most likely culprit.
Fixes: eddd2a4c675c ("staging: comedi: cb_pcidas: refactor write_calibration_bitstream()")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/cb_pcidas.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/comedi/drivers/cb_pcidas.c b/drivers/staging/comedi/drivers/cb_pcidas.c
index 48ec2ee953dc..4f2ac39aa619 100644
--- a/drivers/staging/comedi/drivers/cb_pcidas.c
+++ b/drivers/staging/comedi/drivers/cb_pcidas.c
@@ -529,6 +529,7 @@ static void cb_pcidas_calib_write(struct comedi_device *dev,
if (trimpot) {
/* select trimpot */
calib_bits |= PCIDAS_CALIB_TRIM_SEL;
+ udelay(1);
outw(calib_bits, devpriv->pcibar1 + PCIDAS_CALIB_REG);
}
--
2.28.0
We were relying on GNU ld's ability to re-link executable files in order
to extract our VDSO symbols. This behavior was deemed a bug as of
binutils-2.35 (specifically the binutils-gdb commit a87e1817a4 ("Have
the linker fail if any attempt to link in an executable is made."), but as that
has been backported to at least Debian's binutils-2.34 in may manifest in other
places.
The previous version of this was a bit of a mess: we were linking a
static executable version of the VDSO, containing only a subset of the
input symbols, which we then linked into the kernel. This worked, but
certainly wasn't a supported path through the toolchain. Instead this
new version parses the textual output of nm to produce a symbol table.
Both rely on near-zero addresses being linkable, but as we rely on weak
undefined symbols being linkable elsewhere I don't view this as a major
issue.
Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API")
Cc: clang-built-linux(a)googlegroups.com
Cc: stable(a)vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt(a)google.com>
---
Changes since v2 <20201019235630.762886-1-palmerdabbelt(a)google.com>:
* Uses $(srctree)/$(src) to allow for out-of-tree builds.
Changes since v1 <20201017002500.503011-1-palmerdabbelt(a)google.com>:
* Uses $(NM) instead of $(CROSS_COMPILE)nm. We use the $(CROSS_COMPILE) form
elsewhere in this file, but we'll fix that later.
* Removed the unnecesary .map file creation.
---
arch/riscv/kernel/vdso/.gitignore | 1 +
arch/riscv/kernel/vdso/Makefile | 17 ++++++++---------
arch/riscv/kernel/vdso/so2s.sh | 6 ++++++
3 files changed, 15 insertions(+), 9 deletions(-)
create mode 100755 arch/riscv/kernel/vdso/so2s.sh
diff --git a/arch/riscv/kernel/vdso/.gitignore b/arch/riscv/kernel/vdso/.gitignore
index 11ebee9e4c1d..3a19def868ec 100644
--- a/arch/riscv/kernel/vdso/.gitignore
+++ b/arch/riscv/kernel/vdso/.gitignore
@@ -1,3 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
vdso.lds
*.tmp
+vdso-syms.S
diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
index 478e7338ddc1..a8ecf102e09b 100644
--- a/arch/riscv/kernel/vdso/Makefile
+++ b/arch/riscv/kernel/vdso/Makefile
@@ -43,19 +43,14 @@ $(obj)/vdso.o: $(obj)/vdso.so
SYSCFLAGS_vdso.so.dbg = $(c_flags)
$(obj)/vdso.so.dbg: $(src)/vdso.lds $(obj-vdso) FORCE
$(call if_changed,vdsold)
+SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \
+ -Wl,--build-id -Wl,--hash-style=both
# We also create a special relocatable object that should mirror the symbol
# table and layout of the linked DSO. With ld --just-symbols we can then
# refer to these symbols in the kernel code rather than hand-coded addresses.
-
-SYSCFLAGS_vdso.so.dbg = -shared -s -Wl,-soname=linux-vdso.so.1 \
- -Wl,--build-id -Wl,--hash-style=both
-$(obj)/vdso-dummy.o: $(src)/vdso.lds $(obj)/rt_sigreturn.o FORCE
- $(call if_changed,vdsold)
-
-LDFLAGS_vdso-syms.o := -r --just-symbols
-$(obj)/vdso-syms.o: $(obj)/vdso-dummy.o FORCE
- $(call if_changed,ld)
+$(obj)/vdso-syms.S: $(obj)/vdso.so FORCE
+ $(call if_changed,so2s)
# strip rule for the .so file
$(obj)/%.so: OBJCOPYFLAGS := -S
@@ -73,6 +68,10 @@ quiet_cmd_vdsold = VDSOLD $@
$(patsubst %, -G __vdso_%, $(vdso-syms)) $@.tmp $@ && \
rm $@.tmp
+# Extracts
+quiet_cmd_so2s = SO2S $@
+ cmd_so2s = $(NM) -D $< | $(srctree)/$(src)/so2s.sh > $@
+
# install commands for the unstripped file
quiet_cmd_vdso_install = INSTALL $@
cmd_vdso_install = cp $(obj)/$@.dbg $(MODLIB)/vdso/$@
diff --git a/arch/riscv/kernel/vdso/so2s.sh b/arch/riscv/kernel/vdso/so2s.sh
new file mode 100755
index 000000000000..3c5b43207658
--- /dev/null
+++ b/arch/riscv/kernel/vdso/so2s.sh
@@ -0,0 +1,6 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0+
+# Copyright 2020 Palmer Dabbelt <palmerdabbelt(a)google.com>
+
+sed 's!\([0-9a-f]*\) T \([a-z0-9_]*\)@@LINUX_4.15!.global \2\n.set \2,0x\1!' \
+| grep '^\.'
--
2.29.0.rc1.297.gfa9743e501-goog