On Tue, Jul 16, 2019 at 3:27 AM Sasha Levin <sashal(a)kernel.org> wrote:
> v5.2.1: Build OK!
> v5.1.18: Failed to apply! Possible dependencies:
(...)
> How should we proceed with this patch?
Only apply it to 5.2.y once upstream.
Thanks!
Linus Walleij
This reverts commit fbbf145a0e0a0177e089c52275fbfa55763e7d1d.
It seems I was misguided in my fixup, which was working at the
time but did not work on the final v5.2.
The patch exempted "spi-gpio" CS GPIOs from the gpiolib quirk that
treats CS GPIOs as "special" by enforcing them to be active
low, in the belief that since the "spi-gpio" driver was
parsing the device tree on its own, it did not care to inspect
the "spi-cs-high" attribute on the device nodes.
That's wrong. The SPI core was inspecting them inside the
of_spi_parse_dt() function and setting SPI_CS_HIGH on the
nodes, and the driver inspected this flag when driving the
line.
As of now, the core handles the GPIO and it will consistently
set the GPIO descriptor to 1 to enable CS, strictly requiring
the gpiolib to invert it. And the gpiolib should indeed
enforce active low on the CS line.
Device trees should of course put the right flag on the GPIO
handles, but it used to not matter. If we don't enforce active
low on "spi-gpio" we may run into ABI backward compatibility
issues, so revert this.
Cc: linux-spi(a)vger.kernel.org
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
I am sorry: at one point this fixed a problem for me, but it
doesn't anymore, and I don't know why it ever did. :(
---
drivers/gpio/gpiolib-of.c | 9 +--------
1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c
index f974075ff00e..a8f02f551d6b 100644
--- a/drivers/gpio/gpiolib-of.c
+++ b/drivers/gpio/gpiolib-of.c
@@ -118,15 +118,8 @@ static void of_gpio_flags_quirks(struct device_node *np,
* Legacy handling of SPI active high chip select. If we have a
* property named "cs-gpios" we need to inspect the child node
* to determine if the flags should have inverted semantics.
- *
- * This does not apply to an SPI device named "spi-gpio", because
- * these have traditionally obtained their own GPIOs by parsing
- * the device tree directly and did not respect any "spi-cs-high"
- * property on the SPI bus children.
*/
- if (IS_ENABLED(CONFIG_SPI_MASTER) &&
- !strcmp(propname, "cs-gpios") &&
- !of_device_is_compatible(np, "spi-gpio") &&
+ if (IS_ENABLED(CONFIG_SPI_MASTER) && !strcmp(propname, "cs-gpios") &&
of_property_read_bool(np, "cs-gpios")) {
struct device_node *child;
u32 cs;
--
2.21.0
The SPI chip select to the display on the DIR-685 is active low; we were
just saved by the SPI library enforcing active low on everything
before, so set it as active low to avoid ambiguity.
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
ARM SoC folks: please apply this directly to fixes.
---
arch/arm/boot/dts/gemini-dlink-dir-685.dts | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/gemini-dlink-dir-685.dts b/arch/arm/boot/dts/gemini-dlink-dir-685.dts
index 3613f05f8a80..bfaa2de63a10 100644
--- a/arch/arm/boot/dts/gemini-dlink-dir-685.dts
+++ b/arch/arm/boot/dts/gemini-dlink-dir-685.dts
@@ -64,7 +64,7 @@
gpio-sck = <&gpio1 5 GPIO_ACTIVE_HIGH>;
gpio-miso = <&gpio1 8 GPIO_ACTIVE_HIGH>;
gpio-mosi = <&gpio1 7 GPIO_ACTIVE_HIGH>;
- cs-gpios = <&gpio0 20 GPIO_ACTIVE_HIGH>;
+ cs-gpios = <&gpio0 20 GPIO_ACTIVE_LOW>;
num-chipselects = <1>;
panel: display@0 {
--
2.21.0
When testing with a device which uses the drm/udl driver, KASAN shows
that on hot-remove we have a use-after-free:
==================================================================
BUG: KASAN: use-after-free in do_raw_spin_lock+0x1c/0xdd
Read of size 4 at addr ffff88841863668c by task kworker/2:3/2085
CPU: 2 PID: 2085 Comm: kworker/2:3 Tainted: G U W 4.19.59-dirty #15
Hardware name: GOOGLE Samus, BIOS Google_Samus.6300.276.0 08/17/2016
Workqueue: events drm_mode_rmfb_work_fn
Call Trace:
dump_stack+0x62/0x8b
print_address_description+0x80/0x2d2
? do_raw_spin_lock+0x1c/0xdd
kasan_report+0x26a/0x2aa
do_raw_spin_lock+0x1c/0xdd
_raw_spin_lock_irqsave+0x19/0x1f
down_timeout+0x19/0x58
udl_get_urb+0x3d/0x13c
? udl_crtc_dpms+0x2e/0x270
udl_crtc_dpms+0x45/0x270
__drm_helper_disable_unused_functions+0xed/0x150
drm_crtc_helper_set_config+0x214/0xf25
? ___slab_alloc.constprop.75+0xdd/0x48c
? drm_modeset_lock_all+0x33/0xbb
? ___might_sleep+0x80/0x1b6
__drm_mode_set_config_internal+0x103/0x22c
drm_crtc_force_disable+0x4e/0x69
drm_framebuffer_remove+0x169/0x508
? do_raw_spin_unlock+0xd4/0xde
? mmdrop+0x16/0x29
drm_mode_rmfb_work_fn+0x8d/0x9b
process_one_work+0x309/0x4df
worker_thread+0x369/0x447
? create_worker+0x2f1/0x2f1
kthread+0x223/0x233
? kthread_worker_fn+0x29c/0x29c
ret_from_fork+0x35/0x40
Allocated by task 2085:
kasan_kmalloc+0x99/0xa8
kmem_cache_alloc_trace+0x105/0x12b
udl_driver_load+0x52/0x776
drm_dev_register+0x151/0x2d6
udl_usb_probe+0x4f/0xa6
usb_probe_interface+0x25e/0x311
really_probe+0x1f1/0x3ee
driver_probe_device+0xd6/0x112
bus_for_each_drv+0xbb/0xe2
__device_attach+0xdb/0x159
bus_probe_device+0x5a/0x100
device_add+0x4bf/0x847
usb_set_configuration+0x972/0x9df
generic_probe+0x45/0x77
really_probe+0x1f1/0x3ee
driver_probe_device+0xd6/0x112
bus_for_each_drv+0xbb/0xe2
__device_attach+0xdb/0x159
bus_probe_device+0x5a/0x100
device_add+0x4bf/0x847
usb_new_device+0x540/0x6ba
hub_event+0x1017/0x161c
process_one_work+0x309/0x4df
worker_thread+0x2de/0x447
kthread+0x223/0x233
ret_from_fork+0x35/0x40
Freed by task 2085:
__kasan_slab_free+0x102/0x126
slab_free_freelist_hook+0x4d/0x9d
kfree+0x127/0x1bd
drm_dev_unregister+0xae/0x167
drm_dev_unplug+0x2e/0x38
usb_unbind_interface+0xc5/0x2be
device_release_driver_internal+0x229/0x381
bus_remove_device+0x1a2/0x1cd
device_del+0x26b/0x42c
usb_disable_device+0x112/0x2c9
usb_disconnect+0xed/0x28c
usb_disconnect+0xde/0x28c
hub_event+0x7eb/0x161c
process_one_work+0x309/0x4df
worker_thread+0x2de/0x447
kthread+0x223/0x233
ret_from_fork+0x35/0x40
The buggy address belongs to the object at ffff888418636600
which belongs to the cache kmalloc-2048 of size 2048
The buggy address is located 140 bytes inside of
2048-byte region [ffff888418636600, ffff888418636e00)
The buggy address belongs to the page:
page:ffffea0010618c00 count:1 mapcount:0 mapping:ffff88842d403040 index:0x0 compound_mapcount: 0
flags: 0x8000000000008100(slab|head)
raw: 8000000000008100 dead000000000100 dead000000000200 ffff88842d403040
raw: 0000000000000000 00000000000f000f 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff888418636580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff888418636600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888418636680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff888418636700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff888418636780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taint
[drm] wait for urb interrupted: ffffffc2 available: 4
This happens 100% of the time and is resolved by the following patch
upstream:
commit 6ecac85eadb9 ("drm/udl: move to embedding drm device inside udl device.")
This patch is the third in this series, and requires the first two
patches as dependencies. All three were clean cherry-picks on top of
v4.19.59.
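For context, the core of the fix is a lifetime change; a rough sketch
based on the upstream commit messages (field names assumed, not the exact
backported code):

	/*
	 * Before: udl_driver_load() kmalloc'ed a struct udl_device that
	 * could be kfree'd on unplug while the struct drm_device (and the
	 * semaphore udl_get_urb() sleeps on) was still reachable. After:
	 * the drm_device is embedded, so both are freed together on the
	 * final drm_dev_put().
	 */
	struct udl_device {
		struct drm_device drm;	/* embedded, not separately allocated */
		/* ... URB list, semaphore, etc. ... */
	};

	#define to_udl(x) container_of(x, struct udl_device, drm)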
Dave Airlie (2):
drm/udl: introduce a macro to convert dev to udl.
drm/udl: move to embedding drm device inside udl device.
Thomas Zimmermann (1):
drm/udl: Replace drm_dev_unref with drm_dev_put
drivers/gpu/drm/udl/udl_drv.c | 56 +++++++++++++++++++++++++++-------
drivers/gpu/drm/udl/udl_drv.h | 9 +++---
drivers/gpu/drm/udl/udl_fb.c | 12 ++++----
drivers/gpu/drm/udl/udl_gem.c | 2 +-
drivers/gpu/drm/udl/udl_main.c | 35 ++++++---------------
5 files changed, 66 insertions(+), 48 deletions(-)
--
2.22.0.510.g264f2c817a-goog
Since commit ed194d136769 ("usb: core: remove local_irq_save() around
->complete() handler") the handler rt2x00usb_interrupt_rxdone() is
not running with interrupts disabled anymore. So this completion handler
is not guaranteed to run completely before workqueue processing starts
for the same queue entry.
Be sure to set all other flags in the entry correctly before marking
this entry ready for workqueue processing. This way we cannot miss error
conditions that need to be signalled from the completion handler to the
worker thread.
Note that rt2x00usb_work_rxdone() processes all available entries, not
only those for which queue_work() was called.
This patch is similar to what commit df71c9cfceea ("rt2x00: fix order
of entry flags modification") did for TX processing.
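As an illustration of the ordering rule both patches rely on, here is a
minimal userspace sketch (plain C11 atomics, not rt2x00 code; the flag
names merely mirror ENTRY_DATA_IO_FAILED and the DMA-done handover):

	#include <stdatomic.h>
	#include <stdbool.h>

	struct entry {
		atomic_bool io_failed;	/* status flag set by the completion handler */
		atomic_bool dma_done;	/* handover flag the worker tests */
	};

	/* Completion handler: publish all status bits, then hand over. */
	static void complete_rx(struct entry *e, bool failed)
	{
		atomic_store_explicit(&e->io_failed, failed, memory_order_relaxed);
		/* release: io_failed is visible to whoever observes dma_done */
		atomic_store_explicit(&e->dma_done, true, memory_order_release);
	}

	/* Worker: may scan every entry, so only trust published ones. */
	static bool worker_sees_failure(struct entry *e)
	{
		if (!atomic_load_explicit(&e->dma_done, memory_order_acquire))
			return false;	/* entry still owned by the device */
		return atomic_load_explicit(&e->io_failed, memory_order_relaxed);
	}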
This fixes a regression on an RT5370-based wifi stick in AP mode, which
suddenly stopped data transmission after some period of heavy load. Also,
stopping the hanging hostapd resulted in the error message "ieee80211
phy0: rt2x00queue_flush_queue: Warning - Queue 14 failed to flush".
Other operation modes are probably affected as well; this just happened
to be the test case used.
Fixes: ed194d136769 ("usb: core: remove local_irq_save() around ->complete() handler")
Cc: stable(a)vger.kernel.org # 4.20+
Signed-off-by: Soeren Moch <smoch(a)web.de>
---
Changes in v2:
complete rework
Cc: Stanislaw Gruszka <sgruszka(a)redhat.com>
Cc: Helmut Schaa <helmut.schaa(a)googlemail.com>
Cc: Kalle Valo <kvalo(a)codeaurora.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: linux-wireless(a)vger.kernel.org
Cc: netdev(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
---
drivers/net/wireless/ralink/rt2x00/rt2x00usb.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
index 67b81c7221c4..7e3a621b9c0d 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00usb.c
@@ -372,14 +372,9 @@ static void rt2x00usb_interrupt_rxdone(struct urb *urb)
struct queue_entry *entry = (struct queue_entry *)urb->context;
struct rt2x00_dev *rt2x00dev = entry->queue->rt2x00dev;
- if (!test_and_clear_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags))
+ if (!test_bit(ENTRY_OWNER_DEVICE_DATA, &entry->flags))
return;
- /*
- * Report the frame as DMA done
- */
- rt2x00lib_dmadone(entry);
-
/*
* Check if the received data is simply too small
* to be actually valid, or if the urb is signaling
@@ -388,6 +383,11 @@ static void rt2x00usb_interrupt_rxdone(struct urb *urb)
if (urb->actual_length < entry->queue->desc_size || urb->status)
set_bit(ENTRY_DATA_IO_FAILED, &entry->flags);
+ /*
+ * Report the frame as DMA done
+ */
+ rt2x00lib_dmadone(entry);
+
/*
* Schedule the delayed work for reading the RX status
* from the device.
--
2.17.1
From: Surabhi Vishnoi <svishnoi(a)codeaurora.org>
[ Upstream commit 97354f2c432788e3163134df6bb144f4b6289d87 ]
Currently mac80211 does not support probe response templates for
mesh points. When WMI_SERVICE_BEACON_OFFLOAD is enabled, the host
driver tries to configure a probe response template for mesh, but
it fails because the interface type is not NL80211_IFTYPE_AP but
NL80211_IFTYPE_MESH_POINT.
To avoid this failure, skip sending the probe response template to
firmware for mesh points.
Tested HW: WCN3990/QCA6174/QCA9984
Signed-off-by: Surabhi Vishnoi <svishnoi(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/mac.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 398068ad0b62..5a0138c1c045 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -1502,6 +1502,10 @@ static int ath10k_mac_setup_prb_tmpl(struct ath10k_vif *arvif)
if (arvif->vdev_type != WMI_VDEV_TYPE_AP)
return 0;
+ /* For mesh, probe response and beacon share the same template */
+ if (ieee80211_vif_is_mesh(vif))
+ return 0;
+
prb = ieee80211_proberesp_get(hw, vif);
if (!prb) {
ath10k_warn(ar, "failed to get probe resp template from mac80211\n");
--
2.20.1
From: Surabhi Vishnoi <svishnoi(a)codeaurora.org>
[ Upstream commit 97354f2c432788e3163134df6bb144f4b6289d87 ]
Currently mac80211 does not support probe response templates for
mesh points. When WMI_SERVICE_BEACON_OFFLOAD is enabled, the host
driver tries to configure a probe response template for mesh, but
it fails because the interface type is not NL80211_IFTYPE_AP but
NL80211_IFTYPE_MESH_POINT.
To avoid this failure, skip sending the probe response template to
firmware for mesh points.
Tested HW: WCN3990/QCA6174/QCA9984
Signed-off-by: Surabhi Vishnoi <svishnoi(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/mac.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index fb632a454fc2..1588fe8110d0 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -1596,6 +1596,10 @@ static int ath10k_mac_setup_prb_tmpl(struct ath10k_vif *arvif)
if (arvif->vdev_type != WMI_VDEV_TYPE_AP)
return 0;
+ /* For mesh, probe response and beacon share the same template */
+ if (ieee80211_vif_is_mesh(vif))
+ return 0;
+
prb = ieee80211_proberesp_get(hw, vif);
if (!prb) {
ath10k_warn(ar, "failed to get probe resp template from mac80211\n");
--
2.20.1
From: Yingying Tang <yintang(a)codeaurora.org>
[ Upstream commit 9e7251fa38978b85108c44743e1436d48e8d0d76 ]
tx_stats will be freed and set to NULL before the debugfs_sta node is
removed in the station disconnection process, so reading the debugfs_sta
node may trigger a NULL pointer dereference. Add a check for tx_stats
before using it to resolve this issue.
Signed-off-by: Yingying Tang <yintang(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/debugfs_sta.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/debugfs_sta.c b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
index c704ae371c4d..42931a669b02 100644
--- a/drivers/net/wireless/ath/ath10k/debugfs_sta.c
+++ b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
@@ -663,6 +663,13 @@ static ssize_t ath10k_dbg_sta_dump_tx_stats(struct file *file,
mutex_lock(&ar->conf_mutex);
+ if (!arsta->tx_stats) {
+ ath10k_warn(ar, "failed to get tx stats");
+ mutex_unlock(&ar->conf_mutex);
+ kfree(buf);
+ return 0;
+ }
+
spin_lock_bh(&ar->data_lock);
for (k = 0; k < ATH10K_STATS_TYPE_MAX; k++) {
for (j = 0; j < ATH10K_COUNTER_TYPE_MAX; j++) {
--
2.20.1
From: Yingying Tang <yintang(a)codeaurora.org>
[ Upstream commit 9e7251fa38978b85108c44743e1436d48e8d0d76 ]
tx_stats will be freed and set to NULL before the debugfs_sta node is
removed in the station disconnection process, so reading the debugfs_sta
node may trigger a NULL pointer dereference. Add a check for tx_stats
before using it to resolve this issue.
Signed-off-by: Yingying Tang <yintang(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/debugfs_sta.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/debugfs_sta.c b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
index c704ae371c4d..42931a669b02 100644
--- a/drivers/net/wireless/ath/ath10k/debugfs_sta.c
+++ b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
@@ -663,6 +663,13 @@ static ssize_t ath10k_dbg_sta_dump_tx_stats(struct file *file,
mutex_lock(&ar->conf_mutex);
+ if (!arsta->tx_stats) {
+ ath10k_warn(ar, "failed to get tx stats");
+ mutex_unlock(&ar->conf_mutex);
+ kfree(buf);
+ return 0;
+ }
+
spin_lock_bh(&ar->data_lock);
for (k = 0; k < ATH10K_STATS_TYPE_MAX; k++) {
for (j = 0; j < ATH10K_COUNTER_TYPE_MAX; j++) {
--
2.20.1
Please pick the revert
commit caff422ea81e144842bc44bab408d85ac449377b
("Revert "e1000e: fix cyclic resets at link up with active tx"")
and (optionally) the proper fix
commit d17ba0f616a08f597d9348c372d89b8c0405ccf3
("e1000e: start network tx queue only when link is up")
into 4.9, 4.14, 4.19, 5.1, 5.2.
The buggy commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61
("e1000e: fix cyclic resets at link up with active tx")
was committed in 5.0 and was picked into 4.9, 4.14, 4.19.
It generates annoying false-positive hung-hardware warnings:
https://bugzilla.kernel.org/show_bug.cgi?id=203175
The original problem isn't severe, so feel free to skip the second commit if it doesn't apply cleanly.
buffer_migrate_page_norefs() can race with bh users in the following way:
CPU1                                    CPU2
buffer_migrate_page_norefs()
  buffer_migrate_lock_buffers()
  checks bh refs
  spin_unlock(&mapping->private_lock)
                                        __find_get_block()
                                          spin_lock(&mapping->private_lock)
                                          grab bh ref
                                          spin_unlock(&mapping->private_lock)
  move page                             do bh work
This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use-after-free issues for the old page.
Closing this race window is relatively difficult. We could hold
mapping->private_lock in buffer_migrate_page_norefs() until we are
finished with migrating the page, but the lock hold times would be rather
long. So let's revert to a more careful variant of page migration requiring
eviction of buffers on the migrated page. This is effectively
fallback_migrate_page() that additionally invalidates bh LRUs in case
try_to_free_buffers() failed.
CC: stable(a)vger.kernel.org
Fixes: 89cb0888ca14 ("mm: migrate: provide buffer_migrate_page_norefs()")
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
mm/migrate.c | 161 +++++++++++++++++++++++++++++------------------------------
1 file changed, 78 insertions(+), 83 deletions(-)
I've lightly tested this with config-workload-thpfioscale, which didn't
show any obvious issues, but the patch probably needs more testing
(especially to verify that memory unplug is still able to succeed in
reasonable time). That's why this is an RFC.
diff --git a/mm/migrate.c b/mm/migrate.c
index e9594bc0d406..893698d37d50 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -697,6 +697,47 @@ int migrate_page(struct address_space *mapping,
}
EXPORT_SYMBOL(migrate_page);
+/*
+ * Writeback a page to clean the dirty state
+ */
+static int writeout(struct address_space *mapping, struct page *page)
+{
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_NONE,
+ .nr_to_write = 1,
+ .range_start = 0,
+ .range_end = LLONG_MAX,
+ .for_reclaim = 1
+ };
+ int rc;
+
+ if (!mapping->a_ops->writepage)
+ /* No write method for the address space */
+ return -EINVAL;
+
+ if (!clear_page_dirty_for_io(page))
+ /* Someone else already triggered a write */
+ return -EAGAIN;
+
+ /*
+ * A dirty page may imply that the underlying filesystem has
+ * the page on some queue. So the page must be clean for
+ * migration. Writeout may mean we loose the lock and the
+ * page state is no longer what we checked for earlier.
+ * At this point we know that the migration attempt cannot
+ * be successful.
+ */
+ remove_migration_ptes(page, page, false);
+
+ rc = mapping->a_ops->writepage(page, &wbc);
+
+ if (rc != AOP_WRITEPAGE_ACTIVATE)
+ /* unlocked. Relock */
+ lock_page(page);
+
+ return (rc < 0) ? -EIO : -EAGAIN;
+}
+
#ifdef CONFIG_BLOCK
/* Returns true if all buffers are successfully locked */
static bool buffer_migrate_lock_buffers(struct buffer_head *head,
@@ -736,9 +777,14 @@ static bool buffer_migrate_lock_buffers(struct buffer_head *head,
return true;
}
-static int __buffer_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, enum migrate_mode mode,
- bool check_refs)
+/*
+ * Migration function for pages with buffers. This function can only be used
+ * if the underlying filesystem guarantees that no other references to "page"
+ * exist. For example attached buffer heads are accessed only under page lock.
+ */
+int buffer_migrate_page(struct address_space *mapping,
+ struct page *newpage, struct page *page,
+ enum migrate_mode mode)
{
struct buffer_head *bh, *head;
int rc;
@@ -756,33 +802,6 @@ static int __buffer_migrate_page(struct address_space *mapping,
if (!buffer_migrate_lock_buffers(head, mode))
return -EAGAIN;
- if (check_refs) {
- bool busy;
- bool invalidated = false;
-
-recheck_buffers:
- busy = false;
- spin_lock(&mapping->private_lock);
- bh = head;
- do {
- if (atomic_read(&bh->b_count)) {
- busy = true;
- break;
- }
- bh = bh->b_this_page;
- } while (bh != head);
- spin_unlock(&mapping->private_lock);
- if (busy) {
- if (invalidated) {
- rc = -EAGAIN;
- goto unlock_buffers;
- }
- invalidate_bh_lrus();
- invalidated = true;
- goto recheck_buffers;
- }
- }
-
rc = migrate_page_move_mapping(mapping, newpage, page, mode, 0);
if (rc != MIGRATEPAGE_SUCCESS)
goto unlock_buffers;
@@ -818,72 +837,48 @@ static int __buffer_migrate_page(struct address_space *mapping,
return rc;
}
-
-/*
- * Migration function for pages with buffers. This function can only be used
- * if the underlying filesystem guarantees that no other references to "page"
- * exist. For example attached buffer heads are accessed only under page lock.
- */
-int buffer_migrate_page(struct address_space *mapping,
- struct page *newpage, struct page *page, enum migrate_mode mode)
-{
- return __buffer_migrate_page(mapping, newpage, page, mode, false);
-}
EXPORT_SYMBOL(buffer_migrate_page);
/*
- * Same as above except that this variant is more careful and checks that there
- * are also no buffer head references. This function is the right one for
- * mappings where buffer heads are directly looked up and referenced (such as
- * block device mappings).
+ * Same as above except that this variant is more careful. This function is
+ * the right one for mappings where buffer heads are directly looked up and
+ * referenced (such as block device mappings).
*/
int buffer_migrate_page_norefs(struct address_space *mapping,
struct page *newpage, struct page *page, enum migrate_mode mode)
{
- return __buffer_migrate_page(mapping, newpage, page, mode, true);
-}
-#endif
-
-/*
- * Writeback a page to clean the dirty state
- */
-static int writeout(struct address_space *mapping, struct page *page)
-{
- struct writeback_control wbc = {
- .sync_mode = WB_SYNC_NONE,
- .nr_to_write = 1,
- .range_start = 0,
- .range_end = LLONG_MAX,
- .for_reclaim = 1
- };
- int rc;
-
- if (!mapping->a_ops->writepage)
- /* No write method for the address space */
- return -EINVAL;
+ bool invalidated = false;
- if (!clear_page_dirty_for_io(page))
- /* Someone else already triggered a write */
- return -EAGAIN;
+ if (PageDirty(page)) {
+ /* Only writeback pages in full synchronous migration */
+ switch (mode) {
+ case MIGRATE_SYNC:
+ case MIGRATE_SYNC_NO_COPY:
+ break;
+ default:
+ return -EBUSY;
+ }
+ return writeout(mapping, page);
+ }
+retry:
/*
- * A dirty page may imply that the underlying filesystem has
- * the page on some queue. So the page must be clean for
- * migration. Writeout may mean we loose the lock and the
- * page state is no longer what we checked for earlier.
- * At this point we know that the migration attempt cannot
- * be successful.
+ * Buffers may be managed in a filesystem specific way.
+ * We must have no buffers or drop them.
*/
- remove_migration_ptes(page, page, false);
-
- rc = mapping->a_ops->writepage(page, &wbc);
-
- if (rc != AOP_WRITEPAGE_ACTIVATE)
- /* unlocked. Relock */
- lock_page(page);
+ if (page_has_private(page) &&
+ !try_to_release_page(page, GFP_KERNEL)) {
+ if (!invalidated) {
+ invalidate_bh_lrus();
+ invalidated = true;
+ goto retry;
+ }
+ return mode == MIGRATE_SYNC ? -EAGAIN : -EBUSY;
+ }
- return (rc < 0) ? -EIO : -EAGAIN;
+ return migrate_page(mapping, newpage, page, mode);
}
+#endif
/*
* Default handling if a filesystem does not provide a migration function.
--
2.16.4
From: Maarten ter Huurne <maarten(a)treewalker.org>
The SADC component can run at up to 8 MHz on JZ4725B, but is fed
a 12 MHz input clock (EXT). Divide it by two to get 6 MHz, then
set up another divider to match, to produce a 10us clock.
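For example, with the 12 MHz EXT clock and the divider logic from the
patch below: div_main = DIV_ROUND_UP(12000000, 8000000) = 2, giving a
6 MHz ADC clock, and div_10us = DIV_ROUND_UP(6000000, 100000) = 60, so
60 cycles of the 6 MHz clock span exactly 10 us. Each register field is
programmed with the divider value minus one.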
If the clock dividers are left at their power-on defaults (a divider
of 1), the SADC mostly works, but will occasionally produce erroneous
readings. This led to button presses being detected out of nowhere on
the RS90 every few minutes. With this change, no ghost button presses
were logged in almost a day's worth of testing.
The ADCLK register for configuring clock dividers doesn't exist on
JZ4740, so avoid writing it there.
A function has been introduced rather than a flag because there is a lot
of variation between the ADCLK registers on JZ47xx SoCs, both in
the internal layout of the register and in the frequency range
supported by the SADC. So this solution should make it easier
to add support for other JZ47xx SoCs later.
Fixes: 1a78daea107d ("iio: adc: probe should set clock divider")
Signed-off-by: Maarten ter Huurne <maarten(a)treewalker.org>
Signed-off-by: Artur Rojek <contact(a)artur-rojek.eu>
---
Changes:
v2: Add the fixes tag.
drivers/iio/adc/ingenic-adc.c | 54 +++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/drivers/iio/adc/ingenic-adc.c b/drivers/iio/adc/ingenic-adc.c
index 92b1d5037ac9..e234970b7150 100644
--- a/drivers/iio/adc/ingenic-adc.c
+++ b/drivers/iio/adc/ingenic-adc.c
@@ -11,6 +11,7 @@
#include <linux/iio/iio.h>
#include <linux/io.h>
#include <linux/iopoll.h>
+#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/platform_device.h>
@@ -22,8 +23,11 @@
#define JZ_ADC_REG_ADTCH 0x18
#define JZ_ADC_REG_ADBDAT 0x1c
#define JZ_ADC_REG_ADSDAT 0x20
+#define JZ_ADC_REG_ADCLK 0x28
#define JZ_ADC_REG_CFG_BAT_MD BIT(4)
+#define JZ_ADC_REG_ADCLK_CLKDIV_LSB 0
+#define JZ_ADC_REG_ADCLK_CLKDIV10US_LSB 16
#define JZ_ADC_AUX_VREF 3300
#define JZ_ADC_AUX_VREF_BITS 12
@@ -34,6 +38,8 @@
#define JZ4740_ADC_BATTERY_HIGH_VREF (7500 * 0.986)
#define JZ4740_ADC_BATTERY_HIGH_VREF_BITS 12
+struct ingenic_adc;
+
struct ingenic_adc_soc_data {
unsigned int battery_high_vref;
unsigned int battery_high_vref_bits;
@@ -41,6 +47,7 @@ struct ingenic_adc_soc_data {
size_t battery_raw_avail_size;
const int *battery_scale_avail;
size_t battery_scale_avail_size;
+ int (*init_clk_div)(struct device *dev, struct ingenic_adc *adc);
};
struct ingenic_adc {
@@ -151,6 +158,42 @@ static const int jz4740_adc_battery_scale_avail[] = {
JZ_ADC_BATTERY_LOW_VREF, JZ_ADC_BATTERY_LOW_VREF_BITS,
};
+static int jz4725b_adc_init_clk_div(struct device *dev, struct ingenic_adc *adc)
+{
+ struct clk *parent_clk;
+ unsigned long parent_rate, rate;
+ unsigned int div_main, div_10us;
+
+ parent_clk = clk_get_parent(adc->clk);
+ if (!parent_clk) {
+ dev_err(dev, "ADC clock has no parent\n");
+ return -ENODEV;
+ }
+ parent_rate = clk_get_rate(parent_clk);
+
+ /*
+ * The JZ4725B ADC works at 500 kHz to 8 MHz.
+ * We pick the highest rate possible.
+ * In practice we typically get 6 MHz, half of the 12 MHz EXT clock.
+ */
+ div_main = DIV_ROUND_UP(parent_rate, 8000000);
+ div_main = clamp(div_main, 1u, 64u);
+ rate = parent_rate / div_main;
+ if (rate < 500000 || rate > 8000000) {
+ dev_err(dev, "No valid divider for ADC main clock\n");
+ return -EINVAL;
+ }
+
+ /* We also need a divider that produces a 10us clock. */
+ div_10us = DIV_ROUND_UP(rate, 100000);
+
+ writel(((div_10us - 1) << JZ_ADC_REG_ADCLK_CLKDIV10US_LSB) |
+ (div_main - 1) << JZ_ADC_REG_ADCLK_CLKDIV_LSB,
+ adc->base + JZ_ADC_REG_ADCLK);
+
+ return 0;
+}
+
static const struct ingenic_adc_soc_data jz4725b_adc_soc_data = {
.battery_high_vref = JZ4725B_ADC_BATTERY_HIGH_VREF,
.battery_high_vref_bits = JZ4725B_ADC_BATTERY_HIGH_VREF_BITS,
@@ -158,6 +201,7 @@ static const struct ingenic_adc_soc_data jz4725b_adc_soc_data = {
.battery_raw_avail_size = ARRAY_SIZE(jz4725b_adc_battery_raw_avail),
.battery_scale_avail = jz4725b_adc_battery_scale_avail,
.battery_scale_avail_size = ARRAY_SIZE(jz4725b_adc_battery_scale_avail),
+ .init_clk_div = jz4725b_adc_init_clk_div,
};
static const struct ingenic_adc_soc_data jz4740_adc_soc_data = {
@@ -167,6 +211,7 @@ static const struct ingenic_adc_soc_data jz4740_adc_soc_data = {
.battery_raw_avail_size = ARRAY_SIZE(jz4740_adc_battery_raw_avail),
.battery_scale_avail = jz4740_adc_battery_scale_avail,
.battery_scale_avail_size = ARRAY_SIZE(jz4740_adc_battery_scale_avail),
+ .init_clk_div = NULL, /* no ADCLK register on JZ4740 */
};
static int ingenic_adc_read_avail(struct iio_dev *iio_dev,
@@ -317,6 +362,15 @@ static int ingenic_adc_probe(struct platform_device *pdev)
return ret;
}
+ /* Set clock dividers. */
+ if (soc_data->init_clk_div) {
+ ret = soc_data->init_clk_div(dev, adc);
+ if (ret) {
+ clk_disable_unprepare(adc->clk);
+ return ret;
+ }
+ }
+
/* Put hardware in a known passive state. */
writeb(0x00, adc->base + JZ_ADC_REG_ENABLE);
writeb(0xff, adc->base + JZ_ADC_REG_CTRL);
--
2.22.0
On Sun, Jul 14, 2019 at 06:55:15AM +0800, kbuild test robot wrote:
> CC: kbuild-all(a)01.org
> TO: Dianzhang Chen <dianzhangchen0(a)gmail.com>
> CC: "Greg Kroah-Hartman" <gregkh(a)linuxfoundation.org>
> CC: Thomas Gleixner <tglx(a)linutronix.de>
>
> tree: https://kernel.googlesource.com/pub/scm/linux/kernel/git/stable/linux-stabl… linux-4.14.y
> head: 728f3eef5bdde0f9516277b4c4519fa5436e7e5d
> commit: 55ac552ebd34f9687cc1bdcb07006bf7f104dc99 [9981/9999] x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
> config: x86_64-rhel-7.2 (attached as .config)
> compiler: clang version 9.0.0 (git://gitmirror/llvm_project 87856e739c8e55f3b4e0f37baaf93308ec2dbd47)
> reproduce:
> git checkout 55ac552ebd34f9687cc1bdcb07006bf7f104dc99
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot <lkp(a)intel.com>
>
> All warnings (new ones prefixed by >>):
>
> >> arch/x86/kernel/ptrace.c:659:22: warning: ISO C90 forbids mixing declarations and code [-Wdeclaration-after-statement]
> struct perf_event *bp = thread->ptrace_bps[index];
> ^
> 1 warning generated.
>
> vim +659 arch/x86/kernel/ptrace.c
>
> ---
> 0-DAY kernel test infrastructure Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all Intel Corporation
Hi Greg and Sasha,
I was going to reply to this on the GCC version of the thread but I
don't really see a way to get the original message or the message ID
from the web archive since I'm not subscribed to that list :(
https://lists.01.org/pipermail/kbuild-all/2019-July/062379.html
This is not an issue in Linus' tree because he fixed it manually during
the merge:
https://lore.kernel.org/lkml/CAHk-=whhq5RQYNKzHOLqC+gzSjmcEGNJjbC=Psc_vQaCx…
I would say that it isn't unreasonable to fold that fixup into the
original patch, with a note that it came from Linus' merge upstream:
223cea6a4f05 ("Merge branch 'x86-pti-for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip").
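For reference, the resolved code hoists the declaration so no statement
precedes it; roughly this shape (a sketch of the upstream resolution, not
the exact hunk):

	if (n < HBP_NUM) {
		int index = array_index_nospec(n, HBP_NUM);
		struct perf_event *bp = thread->ptrace_bps[index];

		if (bp)
			val = bp->hw.info.address;
	}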
Cheers,
Nathan
Commit-ID: e4557c1a46b0d32746bd309e1941914b5a6912b4
Gitweb: https://git.kernel.org/tip/e4557c1a46b0d32746bd309e1941914b5a6912b4
Author: Kan Liang <kan.liang(a)linux.intel.com>
AuthorDate: Tue, 25 Jun 2019 07:21:35 -0700
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Sat, 13 Jul 2019 11:21:29 +0200
perf/x86/intel: Fix spurious NMI on fixed counter
If a user first samples a PEBS event on a fixed counter, then samples a
non-PEBS event on the same fixed counter on Icelake, it will trigger a
spurious NMI. For example:
perf record -e 'cycles:p' -a
perf record -e 'cycles' -a
The error message for spurious NMI:
[June 21 15:38] Uhhuh. NMI received for unknown reason 30 on CPU 2.
[ +0.000000] Do you have a strange power saving mode enabled?
[ +0.000000] Dazed and confused, but trying to continue
The bug was introduced by the following commit:
commit 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
The commit moves intel_pmu_pebs_disable() after intel_pmu_disable_fixed(),
which returns immediately, so the related bit of the PEBS_ENABLE MSR will
never be cleared for the fixed counter. When a non-PEBS event later runs on
the fixed counter, the bit in PEBS_ENABLE is still set, which triggers
spurious NMIs.
Check and disable PEBS for fixed counters after intel_pmu_disable_fixed().
Reported-by: Yi, Ammy <ammy.yi(a)intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Jiri Olsa <jolsa(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Fixes: 6f55967ad9d9 ("perf/x86/intel: Fix race in intel_pmu_disable_event()")
Link: https://lkml.kernel.org/r/20190625142135.22112-1-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
arch/x86/events/intel/core.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index bda450ff51ee..9e911a96972b 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2161,12 +2161,10 @@ static void intel_pmu_disable_event(struct perf_event *event)
cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx);
cpuc->intel_cp_status &= ~(1ull << hwc->idx);
- if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
+ if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL))
intel_pmu_disable_fixed(hwc);
- return;
- }
-
- x86_pmu_disable_event(event);
+ else
+ x86_pmu_disable_event(event);
/*
* Needs to be called after x86_pmu_disable_event,
Commit-ID: 8a58ddae23796c733c5dfbd717538d89d036c5bd
Gitweb: https://git.kernel.org/tip/8a58ddae23796c733c5dfbd717538d89d036c5bd
Author: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
AuthorDate: Mon, 1 Jul 2019 14:07:55 +0300
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Sat, 13 Jul 2019 11:21:28 +0200
perf/core: Fix exclusive events' grouping
So far, we tried to disallow grouping exclusive events for fear of the
complications they would cause with moving between contexts. Specifically,
moving a software group to a hardware context would violate the exclusivity
rules if both groups contain matching exclusive events.
This attempt was, however, unsuccessful: the check that we have in the
perf_event_open() syscall is both wrong (looks at wrong PMU) and
insufficient (group leader may still be exclusive), as can be illustrated
by running:
$ perf record -e '{intel_pt//,cycles}' uname
$ perf record -e '{cycles,intel_pt//}' uname
ultimately successfully.
Furthermore, we are completely free to trigger the exclusivity violation
by:
perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'
even though the helpful perf record will not allow that, the ABI will.
The warning later in the perf_event_open() path will also not trigger, because
it's also wrong.
Fix all this by validating the original group before moving, getting rid
of broken safeguards and placing a useful one to perf_install_in_context().
Signed-off-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Cc: mathieu.poirier(a)linaro.org
Cc: will.deacon(a)arm.com
Fixes: bed5b25ad9c8a ("perf: Add a pmu capability for "exclusive" events")
Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.i…
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
include/linux/perf_event.h | 5 +++++
kernel/events/core.c | 34 ++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 16e38c286d46..e8ad3c590a23 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1055,6 +1055,11 @@ static inline int in_software_context(struct perf_event *event)
return event->ctx->pmu->task_ctx_nr == perf_sw_context;
}
+static inline int is_exclusive_pmu(struct pmu *pmu)
+{
+ return pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE;
+}
+
extern struct static_key perf_swevent_enabled[PERF_COUNT_SW_MAX];
extern void ___perf_sw_event(u32, u64, struct pt_regs *, u64);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5dd19bedbf64..eea9d52b010c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2553,6 +2553,9 @@ unlock:
return ret;
}
+static bool exclusive_event_installable(struct perf_event *event,
+ struct perf_event_context *ctx);
+
/*
* Attach a performance event to a context.
*
@@ -2567,6 +2570,8 @@ perf_install_in_context(struct perf_event_context *ctx,
lockdep_assert_held(&ctx->mutex);
+ WARN_ON_ONCE(!exclusive_event_installable(event, ctx));
+
if (event->cpu != -1)
event->cpu = cpu;
@@ -4360,7 +4365,7 @@ static int exclusive_event_init(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
- if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE))
+ if (!is_exclusive_pmu(pmu))
return 0;
/*
@@ -4391,7 +4396,7 @@ static void exclusive_event_destroy(struct perf_event *event)
{
struct pmu *pmu = event->pmu;
- if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE))
+ if (!is_exclusive_pmu(pmu))
return;
/* see comment in exclusive_event_init() */
@@ -4411,14 +4416,15 @@ static bool exclusive_event_match(struct perf_event *e1, struct perf_event *e2)
return false;
}
-/* Called under the same ctx::mutex as perf_install_in_context() */
static bool exclusive_event_installable(struct perf_event *event,
struct perf_event_context *ctx)
{
struct perf_event *iter_event;
struct pmu *pmu = event->pmu;
- if (!(pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE))
+ lockdep_assert_held(&ctx->mutex);
+
+ if (!is_exclusive_pmu(pmu))
return true;
list_for_each_entry(iter_event, &ctx->event_list, event_entry) {
@@ -10947,11 +10953,6 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_alloc;
}
- if ((pmu->capabilities & PERF_PMU_CAP_EXCLUSIVE) && group_leader) {
- err = -EBUSY;
- goto err_context;
- }
-
/*
* Look up the group leader (we will attach this event to it):
*/
@@ -11039,6 +11040,18 @@ SYSCALL_DEFINE5(perf_event_open,
move_group = 0;
}
}
+
+ /*
+ * Failure to create exclusive events returns -EBUSY.
+ */
+ err = -EBUSY;
+ if (!exclusive_event_installable(group_leader, ctx))
+ goto err_locked;
+
+ for_each_sibling_event(sibling, group_leader) {
+ if (!exclusive_event_installable(sibling, ctx))
+ goto err_locked;
+ }
} else {
mutex_lock(&ctx->mutex);
}
@@ -11075,9 +11088,6 @@ SYSCALL_DEFINE5(perf_event_open,
* because we need to serialize with concurrent event creation.
*/
if (!exclusive_event_installable(event, ctx)) {
- /* exclusive and group stuff are assumed mutually exclusive */
- WARN_ON_ONCE(move_group);
-
err = -EBUSY;
goto err_locked;
}
From: Kim Phillips <kim.phillips(a)amd.com>
Commit d7cbbe49a930 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask
for L3 Cache perf events") enables L3 PMC events for all threads and
slices by writing 1s in ChL3PmcCfg (L3 PMC PERF_CTL) register fields.
Those bitfields overlap with high order event select bits in the Data
Fabric PMC control register, however.
So when a user requests raw Data Fabric events (-e amd_df/event=0xYYY/),
the two highest order bits get inadvertently set, changing the counter
select to events that don't exist, and for which no counts are read.
This patch changes the logic to write the L3 masks only when dealing
with L3 PMC counters.
AMD Family 16h and below Northbridge (NB) counters were not affected.
Signed-off-by: Kim Phillips <kim.phillips(a)amd.com>
Cc: <stable(a)vger.kernel.org> # v4.19+
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Arnaldo Carvalho de Melo <acme(a)kernel.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Martin Liska <mliska(a)suse.cz>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit(a)amd.com>
Cc: Janakarajan Natarajan <Janakarajan.Natarajan(a)amd.com>
Cc: Gary Hook <Gary.Hook(a)amd.com>
Cc: Pu Wen <puwen(a)hygon.cn>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Cc: x86(a)kernel.org
Fixes: d7cbbe49a930 ("perf/x86/amd/uncore: Set ThreadMask and SliceMask for L3 Cache perf events")
---
RESEND3: file sent with header:
Content-Type: text/plain; charset="us-ascii"
to work around a bug in the Microsoft Outlook SMTP servers.
arch/x86/events/amd/uncore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/amd/uncore.c b/arch/x86/events/amd/uncore.c
index 85e6984c560b..c2c4ae5fbbfc 100644
--- a/arch/x86/events/amd/uncore.c
+++ b/arch/x86/events/amd/uncore.c
@@ -206,7 +206,7 @@ static int amd_uncore_event_init(struct perf_event *event)
* SliceMask and ThreadMask need to be set for certain L3 events in
* Family 17h. For other events, the two fields do not affect the count.
*/
- if (l3_mask)
+ if (l3_mask && is_llc_event(event))
hwc->config |= (AMD64_L3_SLICE_MASK | AMD64_L3_THREAD_MASK);
if (event->cpu < 0)
--
2.22.0
Commit-ID: 1cf8dfe8a661f0462925df943140e9f6d1ea5233
Gitweb: https://git.kernel.org/tip/1cf8dfe8a661f0462925df943140e9f6d1ea5233
Author: Peter Zijlstra <peterz(a)infradead.org>
AuthorDate: Sat, 13 Jul 2019 11:21:25 +0200
Committer: Ingo Molnar <mingo(a)kernel.org>
CommitDate: Sat, 13 Jul 2019 11:21:25 +0200
perf/core: Fix race between close() and fork()
Syzkaller reported the following use-after-free bug:
close()                                 clone()
                                          copy_process()
                                            perf_event_init_task()
                                              perf_event_init_context()
                                                mutex_lock(parent_ctx->mutex)
                                                inherit_task_group()
                                                  inherit_group()
                                                    inherit_event()
                                                      mutex_lock(event->child_mutex)
                                                      // expose event on child list
                                                      list_add_tail()
                                                      mutex_unlock(event->child_mutex)
                                                mutex_unlock(parent_ctx->mutex)

                                            ...
                                            goto bad_fork_*

                                          bad_fork_cleanup_perf:
                                            perf_event_free_task()

  perf_release()
    perf_event_release_kernel()
      list_for_each_entry()
        mutex_lock(ctx->mutex)
        mutex_lock(event->child_mutex)
        // event is from the failing inherit
        // on the other CPU
        perf_remove_from_context()
        list_move()
        mutex_unlock(event->child_mutex)
        mutex_unlock(ctx->mutex)

                                              mutex_lock(ctx->mutex)
                                              list_for_each_entry_safe()
                                                // event already stolen
                                              mutex_unlock(ctx->mutex)

                                            delayed_free_task()
                                              free_task()

      list_for_each_entry_safe()
        list_del()
        free_event()
          _free_event()
            // and so event->hw.target
            // is the already freed failed clone()
            if (event->hw.target)
              put_task_struct(event->hw.target)
            // WHOOPSIE, already quite dead
Which puts the lie to the comment on perf_event_free_task():
'unexposed, unused context' not so much.
Which is a 'fun' confluence of fail; copy_process() doing an
unconditional free_task() and not respecting refcounts, and perf having
creative locking. In particular:
82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
seems to have overlooked this 'fun' parade.
Solve it by using the fact that detached events still have a reference
count on their (previous) context. With this perf_event_free_task()
can detect when events have escaped and wait for their destruction.
Debugged-by: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Reported-by: syzbot+a24c397a29ad22d86c98(a)syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Acked-by: Mark Rutland <mark.rutland(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Stephane Eranian <eranian(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vince Weaver <vincent.weaver(a)maine.edu>
Fixes: 82d94856fa22 ("perf/core: Fix lock inversion between perf,trace,cpuhp")
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
---
kernel/events/core.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 41 insertions(+), 8 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 785d708f8553..5dd19bedbf64 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4465,12 +4465,20 @@ static void _free_event(struct perf_event *event)
if (event->destroy)
event->destroy(event);
- if (event->ctx)
- put_ctx(event->ctx);
-
+ /*
+ * Must be after ->destroy(), due to uprobe_perf_close() using
+ * hw.target.
+ */
if (event->hw.target)
put_task_struct(event->hw.target);
+ /*
+ * perf_event_free_task() relies on put_ctx() being 'last', in particular
+ * all task references must be cleaned up.
+ */
+ if (event->ctx)
+ put_ctx(event->ctx);
+
exclusive_event_destroy(event);
module_put(event->pmu->module);
@@ -4650,8 +4658,17 @@ again:
mutex_unlock(&event->child_mutex);
list_for_each_entry_safe(child, tmp, &free_list, child_list) {
+ void *var = &child->ctx->refcount;
+
list_del(&child->child_list);
free_event(child);
+
+ /*
+ * Wake any perf_event_free_task() waiting for this event to be
+ * freed.
+ */
+ smp_mb(); /* pairs with wait_var_event() */
+ wake_up_var(var);
}
no_ctx:
@@ -11527,11 +11544,11 @@ static void perf_free_event(struct perf_event *event,
}
/*
- * Free an unexposed, unused context as created by inheritance by
- * perf_event_init_task below, used by fork() in case of fail.
+ * Free a context as created by inheritance by perf_event_init_task() below,
+ * used by fork() in case of fail.
*
- * Not all locks are strictly required, but take them anyway to be nice and
- * help out with the lockdep assertions.
+ * Even though the task has never lived, the context and events have been
+ * exposed through the child_list, so we must take care tearing it all down.
*/
void perf_event_free_task(struct task_struct *task)
{
@@ -11561,7 +11578,23 @@ void perf_event_free_task(struct task_struct *task)
perf_free_event(event, ctx);
mutex_unlock(&ctx->mutex);
- put_ctx(ctx);
+
+ /*
+ * perf_event_release_kernel() could've stolen some of our
+ * child events and still have them on its free_list. In that
+ * case we must wait for these events to have been freed (in
+ * particular all their references to this task must've been
+ * dropped).
+ *
+ * Without this copy_process() will unconditionally free this
+ * task (irrespective of its reference count) and
+ * _free_event()'s put_task_struct(event->hw.target) will be a
+ * use-after-free.
+ *
+ * Wait for all events to drop their context reference.
+ */
+ wait_var_event(&ctx->refcount, refcount_read(&ctx->refcount) == 1);
+ put_ctx(ctx); /* must be last */
}
}
The GTCO tablet input driver configures itself from an HID report sent
via USB during the initial enumeration process. Some debugging messages
are generated during the parsing. A debugging message indentation
counter is not bounds checked, allowing a specially crafted HID report
to cause '-' and null bytes to be written past the end
of the indentation array. As long as the kernel has CONFIG_DYNAMIC_DEBUG
enabled, this code will not be optimized out. This was discovered
during code review after a previous syzkaller bug was found in this
driver.
Cc: stable(a)vger.kernel.org
Signed-off-by: Grant Hernandez <granthernandez(a)google.com>
---
drivers/input/tablet/gtco.c | 19 ++++++++++++++++---
1 file changed, 16 insertions(+), 3 deletions(-)
diff --git a/drivers/input/tablet/gtco.c b/drivers/input/tablet/gtco.c
index 4b8b9d7aa75e..9771052ed027 100644
--- a/drivers/input/tablet/gtco.c
+++ b/drivers/input/tablet/gtco.c
@@ -78,6 +78,7 @@ Scott Hill shill(a)gtcocalcomp.com
/* Max size of a single report */
#define REPORT_MAX_SIZE 10
+#define MAX_COLLECTION_LEVELS 10
/* Bitmask whether pen is in range */
@@ -223,8 +224,7 @@ static void parse_hid_report_descriptor(struct gtco *device, char * report,
char maintype = 'x';
char globtype[12];
int indent = 0;
- char indentstr[10] = "";
-
+ char indentstr[MAX_COLLECTION_LEVELS+1] = {0};
dev_dbg(ddev, "======>>>>>>PARSE<<<<<<======\n");
@@ -350,6 +350,12 @@ static void parse_hid_report_descriptor(struct gtco *device, char * report,
case TAG_MAIN_COL_START:
maintype = 'S';
+ if (indent == MAX_COLLECTION_LEVELS) {
+ dev_err(ddev, "Collection level %d would exceed limit of %d\n",
+ indent+1, MAX_COLLECTION_LEVELS);
+ break;
+ }
+
if (data == 0) {
dev_dbg(ddev, "======>>>>>> Physical\n");
strcpy(globtype, "Physical");
@@ -369,8 +375,15 @@ static void parse_hid_report_descriptor(struct gtco *device, char * report,
break;
case TAG_MAIN_COL_END:
- dev_dbg(ddev, "<<<<<<======\n");
maintype = 'E';
+
+ if (indent == 0) {
+ dev_err(ddev, "Collection level already at zero\n");
+ break;
+ }
+
+ dev_dbg(ddev, "<<<<<<======\n");
+
indent--;
for (x = 0; x < indent; x++)
indentstr[x] = '-';
--
2.22.0.410.gd8fdbe21b5-goog
The patch titled
Subject: mm/z3fold.c: lock z3fold page before __SetPageMovable()
has been removed from the -mm tree. Its filename was
mm-z3foldc-lock-z3fold-page-before-__setpagemovable.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Henry Burns <henryburns(a)google.com>
Subject: mm/z3fold.c: lock z3fold page before __SetPageMovable()
Following zsmalloc.c's example, we call trylock_page() and unlock_page().
Also make z3fold_page_migrate() assert that newpage is passed in locked,
as per the documentation.
[akpm(a)linux-foundation.org: fix trylock_page return value test, per Shakeel]
Link: http://lkml.kernel.org/r/20190702005122.41036-1-henryburns@google.com
Link: http://lkml.kernel.org/r/20190702233538.52793-1-henryburns@google.com
Signed-off-by: Henry Burns <henryburns(a)google.com>
Suggested-by: Vitaly Wool <vitalywool(a)gmail.com>
Acked-by: Vitaly Wool <vitalywool(a)gmail.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Vitaly Vul <vitaly.vul(a)sony.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Xidong Wang <wangxidong_97(a)163.com>
Cc: Jonathan Adams <jwadams(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/z3fold.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
--- a/mm/z3fold.c~mm-z3foldc-lock-z3fold-page-before-__setpagemovable
+++ a/mm/z3fold.c
@@ -924,7 +924,16 @@ retry:
set_bit(PAGE_HEADLESS, &page->private);
goto headless;
}
- __SetPageMovable(page, pool->inode->i_mapping);
+ if (can_sleep) {
+ lock_page(page);
+ __SetPageMovable(page, pool->inode->i_mapping);
+ unlock_page(page);
+ } else {
+ if (trylock_page(page)) {
+ __SetPageMovable(page, pool->inode->i_mapping);
+ unlock_page(page);
+ }
+ }
z3fold_page_lock(zhdr);
found:
@@ -1331,6 +1340,7 @@ static int z3fold_page_migrate(struct ad
VM_BUG_ON_PAGE(!PageMovable(page), page);
VM_BUG_ON_PAGE(!PageIsolated(page), page);
+ VM_BUG_ON_PAGE(!PageLocked(newpage), newpage);
zhdr = page_address(page);
pool = zhdr_to_pool(zhdr);
_
Patches currently in -mm which might be from henryburns(a)google.com are
mm-z3foldc-remove-z3fold_migration-trylock.patch
The patch titled
Subject: mm/memcontrol: fix wrong statistics in memory.stat
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-wrong-statistics-in-memorystat.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: mm/memcontrol: fix wrong statistics in memory.stat
When we calculate total statistics for memcg1_stats and memcg1_events, we
use the index 'i' in the for loop as the event index. Actually we should
use memcg1_stats[i] and memcg1_events[i] as the event indices: the loop
counter is a position in those arrays, not an event ID, so passing it
straight to memcg_page_state()/memcg_events() reads whichever counter
happens to share that numeric value.
Link: http://lkml.kernel.org/r/1562116978-19539-1-git-send-email-laoar.shao@gmail…
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Yafang Shao <shaoyafang(a)didiglobal.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-wrong-statistics-in-memorystat
+++ a/mm/memcontrol.c
@@ -3523,12 +3523,13 @@ static int memcg_stat_show(struct seq_fi
if (memcg1_stats[i] == MEMCG_SWAP && !do_memsw_account())
continue;
seq_printf(m, "total_%s %llu\n", memcg1_stat_names[i],
- (u64)memcg_page_state(memcg, i) * PAGE_SIZE);
+ (u64)memcg_page_state(memcg, memcg1_stats[i]) *
+ PAGE_SIZE);
}
for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
seq_printf(m, "total_%s %llu\n", memcg1_event_names[i],
- (u64)memcg_events(memcg, i));
+ (u64)memcg_events(memcg, memcg1_events[i]));
for (i = 0; i < NR_LRU_LISTS; i++)
seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i],
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
mm-vmscan-expose-cgroup_ino-for-memcg-reclaim-tracepoints.patch
mm-memcontrol-keep-local-vm-counters-in-sync-with-the-hierarchical-ones.patch
mm-vmscan-add-a-new-member-reclaim_state-in-struct-shrink_control.patch
mm-vmscan-add-a-new-member-reclaim_state-in-struct-shrink_control-fix.patch
mm-vmscan-calculate-reclaimed-slab-caches-in-all-reclaim-paths.patch
The patch titled
Subject: mm/nvdimm: add is_ioremap_addr and use that to check ioremap address
has been removed from the -mm tree. Its filename was
mm-nvdimm-add-is_ioremap_addr-and-use-that-to-check-ioremap-address.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/nvdimm: add is_ioremap_addr and use that to check ioremap address
Architectures like powerpc use different address ranges to map ioremap
and vmalloc ranges. The memunmap() check used by the nvdimm layer was
wrongly using is_vmalloc_addr() to check for the ioremap range, which fails
on ppc64. This results in ppc64 not freeing the ioremap mapping. The side
effect of this is an unbind failure during module unload with the papr_scm
nvdimm driver.
Link: http://lkml.kernel.org/r/20190701134038.14165-1-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions")
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/powerpc/include/asm/pgtable.h | 14 ++++++++++++++
include/linux/mm.h | 5 +++++
kernel/iomem.c | 2 +-
3 files changed, 20 insertions(+), 1 deletion(-)
--- a/arch/powerpc/include/asm/pgtable.h~mm-nvdimm-add-is_ioremap_addr-and-use-that-to-check-ioremap-address
+++ a/arch/powerpc/include/asm/pgtable.h
@@ -140,6 +140,20 @@ static inline void pte_frag_set(mm_conte
}
#endif
+#ifdef CONFIG_PPC64
+#define is_ioremap_addr is_ioremap_addr
+static inline bool is_ioremap_addr(const void *x)
+{
+#ifdef CONFIG_MMU
+ unsigned long addr = (unsigned long)x;
+
+ return addr >= IOREMAP_BASE && addr < IOREMAP_END;
+#else
+ return false;
+#endif
+}
+#endif /* CONFIG_PPC64 */
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_POWERPC_PGTABLE_H */
--- a/include/linux/mm.h~mm-nvdimm-add-is_ioremap_addr-and-use-that-to-check-ioremap-address
+++ a/include/linux/mm.h
@@ -633,6 +633,11 @@ static inline bool is_vmalloc_addr(const
return false;
#endif
}
+
+#ifndef is_ioremap_addr
+#define is_ioremap_addr(x) is_vmalloc_addr(x)
+#endif
+
#ifdef CONFIG_MMU
extern int is_vmalloc_or_module_addr(const void *x);
#else
--- a/kernel/iomem.c~mm-nvdimm-add-is_ioremap_addr-and-use-that-to-check-ioremap-address
+++ a/kernel/iomem.c
@@ -121,7 +121,7 @@ EXPORT_SYMBOL(memremap);
void memunmap(void *addr)
{
- if (is_vmalloc_addr(addr))
+ if (is_ioremap_addr(addr))
iounmap((void __iomem *) addr);
}
EXPORT_SYMBOL(memunmap);
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-move-map_sync-to-asm-generic-mman-commonh.patch
mm-mmap-move-common-defines-to-mman-commonh.patch
The patch titled
Subject: mm: vmscan: scan anonymous pages on file refaults
has been removed from the -mm tree. Its filename was
mm-vmscan-scan-anonymous-pages-on-file-refaults.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Kuo-Hsin Yang <vovoy(a)chromium.org>
Subject: mm: vmscan: scan anonymous pages on file refaults
When file refaults are detected and there are many inactive file pages,
the system never reclaims anonymous pages; the file pages are dropped
aggressively while there are still a lot of cold anonymous pages, and
the system thrashes. This issue impacts the performance of applications
with large executables, e.g. chrome.
With this patch, when a file refault is detected, inactive_list_is_low()
always returns true for file pages in get_scan_count() to enable
scanning of anonymous pages.
The problem can be reproduced by the following test program.
---8<---
/* headers added for a standalone build */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/time.h>

void fallocate_file(const char *filename, off_t size)
{
	struct stat st;
	int fd;

	if (!stat(filename, &st) && st.st_size >= size)
		return;

	fd = open(filename, O_WRONLY | O_CREAT, 0600);
	if (fd < 0) {
		perror("create file");
		exit(1);
	}
	if (posix_fallocate(fd, 0, size)) {
		perror("fallocate");
		exit(1);
	}
	close(fd);
}

long *alloc_anon(long size)
{
	long *start = malloc(size);
	memset(start, 1, size);
	return start;
}

long access_file(const char *filename, long size, long rounds)
{
	int fd, i;
	volatile char *start1, *end1, *start2;
	const int page_size = getpagesize();
	long sum = 0;

	fd = open(filename, O_RDONLY);
	if (fd == -1) {
		perror("open");
		exit(1);
	}

	/*
	 * Some applications, e.g. chrome, use a lot of executable file
	 * pages, map some of the pages with PROT_EXEC flag to simulate
	 * the behavior.
	 */
	start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED,
		      fd, 0);
	if (start1 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	end1 = start1 + size / 2;

	start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2);
	if (start2 == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (i = 0; i < rounds; ++i) {
		struct timeval before, after;
		volatile char *ptr1 = start1, *ptr2 = start2;

		gettimeofday(&before, NULL);
		for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size)
			sum += *ptr1 + *ptr2;
		gettimeofday(&after, NULL);
		printf("File access time, round %d: %f (sec)\n", i,
		       (after.tv_sec - before.tv_sec) +
		       (after.tv_usec - before.tv_usec) / 1000000.0);
	}
	return sum;
}

int main(int argc, char *argv[])
{
	const long MB = 1024 * 1024;
	long anon_mb, file_mb, file_rounds;
	const char filename[] = "large";
	long *ret1;
	long ret2;

	if (argc != 4) {
		printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS\n");
		exit(0);
	}
	anon_mb = atoi(argv[1]);
	file_mb = atoi(argv[2]);
	file_rounds = atoi(argv[3]);

	fallocate_file(filename, file_mb * MB);
	printf("Allocate %ld MB anonymous pages\n", anon_mb);
	ret1 = alloc_anon(anon_mb * MB);
	printf("Access %ld MB file pages\n", file_mb);
	ret2 = access_file(filename, file_mb * MB, file_rounds);
	printf("Print result to prevent optimization: %ld\n",
	       *ret1 + ret2);
	return 0;
}
---8<---
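For reference (this note is not part of the original patch), the program
above builds as a standalone binary, e.g.:

$ gcc -O2 -o thrash thrash.c
$ ./thrash 2048 200 10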
Running the test program on a 2GB RAM VM with kernel 5.2.0-rc5, the
program fills RAM with 2048 MB of anonymous pages and accesses a 200 MB
file 10 times. Without this patch, the file cache is dropped
aggressively and every access to the file is served from disk.
$ ./thrash 2048 200 10
Allocate 2048 MB anonymous pages
Access 200 MB file pages
File access time, round 0: 2.489316 (sec)
File access time, round 1: 2.581277 (sec)
File access time, round 2: 2.487624 (sec)
File access time, round 3: 2.449100 (sec)
File access time, round 4: 2.420423 (sec)
File access time, round 5: 2.343411 (sec)
File access time, round 6: 2.454833 (sec)
File access time, round 7: 2.483398 (sec)
File access time, round 8: 2.572701 (sec)
File access time, round 9: 2.493014 (sec)
With this patch, these file pages can be cached.
$ ./thrash 2048 200 10
Allocate 2048 MB anonymous pages
Access 200 MB file pages
File access time, round 0: 2.475189 (sec)
File access time, round 1: 2.440777 (sec)
File access time, round 2: 2.411671 (sec)
File access time, round 3: 1.955267 (sec)
File access time, round 4: 0.029924 (sec)
File access time, round 5: 0.000808 (sec)
File access time, round 6: 0.000771 (sec)
File access time, round 7: 0.000746 (sec)
File access time, round 8: 0.000738 (sec)
File access time, round 9: 0.000747 (sec)
I checked the swap-out stats during the test [1]: 19006 pages were
swapped out with this patch, 3418 pages without it. There is more
swap-out, but I think it's within a reasonable range when the
file-backed data set doesn't fit into memory.
$ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS PROCESSES
Allocate 2000 MB anonymous pages
active_anon: 1613644, inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB)
pswpout: 7972443, pgpgin: 478615246
Access 100 MB executable file pages
Access 2100 MB regular file pages
File access time, round 0: 12.165, (sec)
active_anon: 1433788, inactive_anon: 478116, active_file: 17896, inactive_file: 24328 (kB)
File access time, round 1: 11.493, (sec)
active_anon: 1430576, inactive_anon: 477144, active_file: 25440, inactive_file: 26172 (kB)
File access time, round 2: 11.455, (sec)
active_anon: 1427436, inactive_anon: 476060, active_file: 21112, inactive_file: 28808 (kB)
File access time, round 3: 11.454, (sec)
active_anon: 1420444, inactive_anon: 473632, active_file: 23216, inactive_file: 35036 (kB)
File access time, round 4: 11.479, (sec)
active_anon: 1413964, inactive_anon: 471460, active_file: 31728, inactive_file: 32224 (kB)
pswpout: 7991449 (+ 19006), pgpgin: 489924366 (+ 11309120)
With 4 processes accessing non-overlapping parts of a large file, 30316
pages were swapped out with this patch, 5152 pages without it. The
swap-out number is small compared to pgpgin.
[1]: https://github.com/vovo/testing/blob/master/mem_thrash.c
Link: http://lkml.kernel.org/r/20190701081038.GA83398@google.com
Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty")
Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty")
Signed-off-by: Kuo-Hsin Yang <vovoy(a)chromium.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Sonny Rao <sonnyrao(a)chromium.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [4.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/vmscan.c~mm-vmscan-scan-anonymous-pages-on-file-refaults
+++ a/mm/vmscan.c
@@ -2125,7 +2125,7 @@ static void shrink_active_list(unsigned
* 10TB 320 32GB
*/
static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
- struct scan_control *sc, bool actual_reclaim)
+ struct scan_control *sc, bool trace)
{
enum lru_list active_lru = file * LRU_FILE + LRU_ACTIVE;
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
@@ -2151,7 +2151,7 @@ static bool inactive_list_is_low(struct
* rid of the stale workingset quickly.
*/
refaults = lruvec_page_state_local(lruvec, WORKINGSET_ACTIVATE);
- if (file && actual_reclaim && lruvec->refaults != refaults) {
+ if (file && lruvec->refaults != refaults) {
inactive_ratio = 0;
} else {
gb = (inactive + active) >> (30 - PAGE_SHIFT);
@@ -2161,7 +2161,7 @@ static bool inactive_list_is_low(struct
inactive_ratio = 1;
}
- if (actual_reclaim)
+ if (trace)
trace_mm_vmscan_inactive_list_is_low(pgdat->node_id, sc->reclaim_idx,
lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,
_
Patches currently in -mm which might be from vovoy(a)chromium.org are
Hole punching currently evicts pages from the page cache and then goes
on to remove blocks from the inode. This happens under both
XFS_IOLOCK_EXCL and XFS_MMAPLOCK_EXCL, which provides appropriate
serialization with racing reads or page faults. However, there is
currently nothing that prevents readahead triggered by fadvise() or
madvise() from racing with the hole punch and instantiating a page cache
page after hole punching has evicted the page cache in
xfs_flush_unmap_range() but before it has removed blocks from the inode.
This page cache page will be mapping a soon-to-be-freed block, and that
can lead to returning stale data to userspace or even filesystem
corruption.
Fix the problem by protecting the handling of readahead requests with
XFS_IOLOCK_SHARED, similarly to how we protect reads.
CC: stable(a)vger.kernel.org
Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmah…
Reported-by: Amir Goldstein <amir73il(a)gmail.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/xfs/xfs_file.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 76748255f843..88fe3dbb3ba2 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -33,6 +33,7 @@
#include <linux/pagevec.h>
#include <linux/backing-dev.h>
#include <linux/mman.h>
+#include <linux/fadvise.h>
static const struct vm_operations_struct xfs_file_vm_ops;
@@ -939,6 +940,24 @@ xfs_file_fallocate(
return error;
}
+STATIC int
+xfs_file_fadvise(
+ struct file *file,
+ loff_t start,
+ loff_t end,
+ int advice)
+{
+ struct xfs_inode *ip = XFS_I(file_inode(file));
+ int ret;
+
+ /* Readahead needs protection from hole punching and similar ops */
+ if (advice == POSIX_FADV_WILLNEED)
+ xfs_ilock(ip, XFS_IOLOCK_SHARED);
+ ret = generic_fadvise(file, start, end, advice);
+ if (advice == POSIX_FADV_WILLNEED)
+ xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+ return ret;
+}
STATIC loff_t
xfs_file_remap_range(
@@ -1235,6 +1254,7 @@ const struct file_operations xfs_file_operations = {
.fsync = xfs_file_fsync,
.get_unmapped_area = thp_get_unmapped_area,
.fallocate = xfs_file_fallocate,
+ .fadvise = xfs_file_fadvise,
.remap_file_range = xfs_file_remap_range,
};
--
2.16.4
In Linux 4.4, a 32-bit process may fail to allocate 64M of hugepage
memory via shmat() even though there is a 64M memory gap in the
process address space.
It is the adjusted search length that causes the problem, introduced in
commit db4fbfb9523c935 ("mm: vm_unmapped_area() lookup function").
To account for worst-case alignment overhead, unmapped_area() and
unmapped_area_topdown() adjust the search length before searching
for an available VMA gap. This is an estimated length, the sum of the
desired length and the longest alignment offset, which can cause
misjudgement if the system has very little virtual memory left. For
example, if the longest memory gap available is 64M, we can't get it
from the system by allocating 64M of hugepage memory via shmat(),
because the search requires a longer length: the sum of the desired
length (64M) and the longest alignment offset.
To fix this error, we can calculate the alignment offset of gap_start
or gap_end to get the desired gap_start or gap_end value before
searching for the available gap. This way, we don't need to adjust the
search length.
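To make the arithmetic concrete, here is a minimal standalone sketch
(hypothetical values: a 16M alignment mask and the gap start taken from
the maps below; not part of the patch) of the per-gap adjustment that
gap_start_offset() computes instead of padding the search length:

#include <stdio.h>

int main(void)
{
	unsigned long align_mask = (16UL << 20) - 1;	/* 16M alignment */
	unsigned long align_offset = 0;
	unsigned long gap_start = 0xf7a00000UL;

	/* same expression as gap_start_offset() in the patch */
	unsigned long off = (align_offset - gap_start) & align_mask;

	/* prints 0x600000: a 6M bump up to the next 16M boundary, so a
	 * 64M request fits in this gap only if the gap is at least 70M */
	printf("alignment bump: %#lx (%lu MB)\n", off, off >> 20);
	return 0;
}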
Problem reproduction procedure:
1. allocate a lot of virtual memory segments via shmat and malloc
2. release one of the biggest memory segment via shmdt
3. attach the biggest memory segment via shmat
e.g.
process maps:
00008000-00009000 r-xp 00000000 00:12 3385 /tmp/memory_mmap
00011000-00012000 rw-p 00001000 00:12 3385 /tmp/memory_mmap
27536000-f756a000 rw-p 00000000 00:00 0
f756a000-f7691000 r-xp 00000000 01:00 560 /lib/libc-2.11.1.so
f7691000-f7699000 ---p 00127000 01:00 560 /lib/libc-2.11.1.so
f7699000-f769b000 r--p 00127000 01:00 560 /lib/libc-2.11.1.so
f769b000-f769c000 rw-p 00129000 01:00 560 /lib/libc-2.11.1.so
f769c000-f769f000 rw-p 00000000 00:00 0
f769f000-f76c0000 r-xp 00000000 01:00 583 /lib/libgcc_s.so.1
f76c0000-f76c7000 ---p 00021000 01:00 583 /lib/libgcc_s.so.1
f76c7000-f76c8000 rw-p 00020000 01:00 583 /lib/libgcc_s.so.1
f76c8000-f76e5000 r-xp 00000000 01:00 543 /lib/ld-2.11.1.so
f76e9000-f76ea000 rw-p 00000000 00:00 0
f76ea000-f76ec000 rw-p 00000000 00:00 0
f76ec000-f76ed000 r--p 0001c000 01:00 543 /lib/ld-2.11.1.so
f76ed000-f76ee000 rw-p 0001d000 01:00 543 /lib/ld-2.11.1.so
f7800000-f7a00000 rw-s 00000000 00:0e 0 /SYSV000000ea (deleted)
fba00000-fca00000 rw-s 00000000 00:0e 65538 /SYSV000000ec (deleted)
fca00000-fce00000 rw-s 00000000 00:0e 98307 /SYSV000000ed (deleted)
fce00000-fd800000 rw-s 00000000 00:0e 131076 /SYSV000000ee (deleted)
ff913000-ff934000 rw-p 00000000 00:00 0 [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0 [vectors]
From 0xf7a00000 to 0xfba00000 there is a 64M memory gap, but we can't
get it from the kernel.
Signed-off-by: jianhong chen <chenjianhong2(a)huawei.com>
Cc: stable(a)vger.kernel.org
---
mm/mmap.c | 43 +++++++++++++++++++++++++++++--------------
1 file changed, 29 insertions(+), 14 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index bd7b9f2..c5a5782 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1865,6 +1865,22 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
return error;
}
+static inline unsigned long gap_start_offset(struct vm_unmapped_area_info *info,
+ unsigned long addr)
+{
+ /* get gap_start offset to adjust gap address to the
+ * desired alignment
+ */
+ return (info->align_offset - addr) & info->align_mask;
+}
+
+static inline unsigned long gap_end_offset(struct vm_unmapped_area_info *info,
+ unsigned long addr)
+{
+ /* get gap_end offset to adjust gap address to the desired alignment */
+ return (addr - info->align_offset) & info->align_mask;
+}
+
unsigned long unmapped_area(struct vm_unmapped_area_info *info)
{
/*
@@ -1879,10 +1895,7 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
struct vm_area_struct *vma;
unsigned long length, low_limit, high_limit, gap_start, gap_end;
- /* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
- if (length < info->length)
- return -ENOMEM;
+ length = info->length;
/* Adjust search limits by the desired length */
if (info->high_limit < length)
@@ -1914,6 +1927,7 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
}
gap_start = vma->vm_prev ? vm_end_gap(vma->vm_prev) : 0;
+ gap_start += gap_start_offset(info, gap_start);
check_current:
/* Check if current node has a suitable gap */
if (gap_start > high_limit)
@@ -1942,6 +1956,7 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
struct vm_area_struct, vm_rb);
if (prev == vma->vm_rb.rb_left) {
gap_start = vm_end_gap(vma->vm_prev);
+ gap_start += gap_start_offset(info, gap_start);
gap_end = vm_start_gap(vma);
goto check_current;
}
@@ -1951,17 +1966,17 @@ unsigned long unmapped_area(struct vm_unmapped_area_info *info)
check_highest:
/* Check highest gap, which does not precede any rbtree node */
gap_start = mm->highest_vm_end;
+ gap_start += gap_start_offset(info, gap_start);
gap_end = ULONG_MAX; /* Only for VM_BUG_ON below */
if (gap_start > high_limit)
return -ENOMEM;
found:
/* We found a suitable gap. Clip it with the original low_limit. */
- if (gap_start < info->low_limit)
+ if (gap_start < info->low_limit) {
gap_start = info->low_limit;
-
- /* Adjust gap address to the desired alignment */
- gap_start += (info->align_offset - gap_start) & info->align_mask;
+ gap_start += gap_start_offset(info, gap_start);
+ }
VM_BUG_ON(gap_start + info->length > info->high_limit);
VM_BUG_ON(gap_start + info->length > gap_end);
@@ -1974,16 +1989,14 @@ unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
struct vm_area_struct *vma;
unsigned long length, low_limit, high_limit, gap_start, gap_end;
- /* Adjust search length to account for worst case alignment overhead */
- length = info->length + info->align_mask;
- if (length < info->length)
- return -ENOMEM;
+ length = info->length;
/*
* Adjust search limits by the desired length.
* See implementation comment at top of unmapped_area().
*/
gap_end = info->high_limit;
+ gap_end -= gap_end_offset(info, gap_end);
if (gap_end < length)
return -ENOMEM;
high_limit = gap_end - length;
@@ -2020,6 +2033,7 @@ unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
check_current:
/* Check if current node has a suitable gap */
gap_end = vm_start_gap(vma);
+ gap_end -= gap_end_offset(info, gap_end);
if (gap_end < low_limit)
return -ENOMEM;
if (gap_start <= high_limit &&
@@ -2054,13 +2068,14 @@ unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info)
found:
/* We found a suitable gap. Clip it with the original high_limit. */
- if (gap_end > info->high_limit)
+ if (gap_end > info->high_limit) {
gap_end = info->high_limit;
+ gap_end -= gap_end_offset(info, gap_end);
+ }
found_highest:
/* Compute highest gap address at the desired alignment */
gap_end -= info->length;
- gap_end -= (gap_end - info->align_offset) & info->align_mask;
VM_BUG_ON(gap_end < info->low_limit);
VM_BUG_ON(gap_end < gap_start);
--
1.8.5.6
This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564.
That commit caused a severe memory leak in nfs_readdir_make_qstr().
When listing a directory with more than 100 files (this is how many
struct nfs_cache_array_entry elements fit in one 4kB page), all
allocated file name strings past those 100 leak.
The root of the leakage is that those string pointers are managed in
pages which are never linked into the page cache.
fs/nfs/dir.c puts pages into the page cache by calling
read_cache_page(); the callback function nfs_readdir_filler() will
then fill the given page struct which was passed to it, which is
already linked in the page cache (by do_read_cache_page() calling
add_to_page_cache_lru()).
Commit be4c2d4723a4 added another (local) array of allocated pages, to
be filled with more data, instead of discarding excess items received
from the NFS server. Those additional pages can be used by the next
nfs_readdir_filler() call (from within the same nfs_readdir() call).
The leak happens when some of those additional pages are never used
(copied to the page cache using copy_highpage()). The pages will be
freed by nfs_readdir_free_pages(), but their contents will not. The
commit did not invoke nfs_readdir_clear_array() (and doing so would
have been dangerous, because it did not track which of those pages
were already copied to the page cache, risking double free bugs).
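As a rough userspace analogue (hypothetical structures, not the actual
NFS code), the leak pattern looks like this:

#include <stdlib.h>
#include <string.h>

struct entry {
	char *name;	/* stands in for the kmalloc'ed file name */
};

int main(void)
{
	/* the "page": an array whose entries point at separate allocations */
	struct entry *page = calloc(100, sizeof(*page));
	int i;

	for (i = 0; i < 100; i++)
		page[i].name = strdup("0123456789abcdef");

	/* analogue of nfs_readdir_clear_array(); without this loop the
	 * 100 names leak even though the page itself is freed below */
	for (i = 0; i < 100; i++)
		free(page[i].name);

	free(page);
	return 0;
}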
How to reproduce the leak:
- Use a kernel with CONFIG_SLUB_DEBUG_ON.
- Create a directory on a NFS mount with more than 100 files with
names long enough to use the "kmalloc-32" slab (so we can easily
look up the allocation counts):
for i in `seq 110`; do touch ${i}_0123456789abcdef; done
- Drop all caches:
echo 3 >/proc/sys/vm/drop_caches
- Check the allocation counter:
grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1
- Request a directory listing and check the allocation counters again:
ls
[...]
grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1
There are now 120 new allocations.
- Drop all caches and check the counters again:
echo 3 >/proc/sys/vm/drop_caches
grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls
30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1
110 allocations are gone, but 10 have leaked and will never be freed.
Unhelpfully, those allocations are explicitly excluded from KMEMLEAK;
that's why my initial attempts with KMEMLEAK were not successful:
/*
* Avoid a kmemleak false positive. The pointer to the name is stored
* in a page cache page which kmemleak does not scan.
*/
kmemleak_not_leak(string->name);
It would be possible to solve this bug without reverting the whole
commit:
- keep track of which pages were not used, and call
nfs_readdir_clear_array() on them, or
- manually link those pages into the page cache
But for now I have decided to just revert the commit, because the real
fix would require complex changes, risking more dangerous (crash) bugs,
which seems unsuitable for the stable branches.
Signed-off-by: Max Kellermann <mk(a)cm4all.com>
Cc: stable(a)vger.kernel.org
---
fs/nfs/dir.c | 90 ++++-------------------------------------------
fs/nfs/internal.h | 3 +-
2 files changed, 7 insertions(+), 86 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 57b6a45576ad..9f44ddc34c7b 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -140,19 +140,12 @@ struct nfs_cache_array {
struct nfs_cache_array_entry array[0];
};
-struct readdirvec {
- unsigned long nr;
- unsigned long index;
- struct page *pages[NFS_MAX_READDIR_RAPAGES];
-};
-
typedef int (*decode_dirent_t)(struct xdr_stream *, struct nfs_entry *, bool);
typedef struct {
struct file *file;
struct page *page;
struct dir_context *ctx;
unsigned long page_index;
- struct readdirvec pvec;
u64 *dir_cookie;
u64 last_cookie;
loff_t current_index;
@@ -532,10 +525,6 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
struct nfs_cache_array *array;
unsigned int count = 0;
int status;
- int max_rapages = NFS_MAX_READDIR_RAPAGES;
-
- desc->pvec.index = desc->page_index;
- desc->pvec.nr = 0;
scratch = alloc_page(GFP_KERNEL);
if (scratch == NULL)
@@ -560,40 +549,20 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
if (desc->plus)
nfs_prime_dcache(file_dentry(desc->file), entry);
- status = nfs_readdir_add_to_array(entry, desc->pvec.pages[desc->pvec.nr]);
- if (status == -ENOSPC) {
- desc->pvec.nr++;
- if (desc->pvec.nr == max_rapages)
- break;
- status = nfs_readdir_add_to_array(entry, desc->pvec.pages[desc->pvec.nr]);
- }
+ status = nfs_readdir_add_to_array(entry, page);
if (status != 0)
break;
} while (!entry->eof);
- /*
- * page and desc->pvec.pages[0] are valid, don't need to check
- * whether or not to be NULL.
- */
- copy_highpage(page, desc->pvec.pages[0]);
-
out_nopages:
if (count == 0 || (status == -EBADCOOKIE && entry->eof != 0)) {
- array = kmap_atomic(desc->pvec.pages[desc->pvec.nr]);
+ array = kmap(page);
array->eof_index = array->size;
status = 0;
- kunmap_atomic(array);
+ kunmap(page);
}
put_page(scratch);
-
- /*
- * desc->pvec.nr > 0 means at least one page was completely filled,
- * we should return -ENOSPC. Otherwise function
- * nfs_readdir_xdr_to_array will enter infinite loop.
- */
- if (desc->pvec.nr > 0)
- return -ENOSPC;
return status;
}
@@ -627,24 +596,6 @@ int nfs_readdir_alloc_pages(struct page **pages, unsigned int npages)
return -ENOMEM;
}
-/*
- * nfs_readdir_rapages_init initialize rapages by nfs_cache_array structure.
- */
-static
-void nfs_readdir_rapages_init(nfs_readdir_descriptor_t *desc)
-{
- struct nfs_cache_array *array;
- int max_rapages = NFS_MAX_READDIR_RAPAGES;
- int index;
-
- for (index = 0; index < max_rapages; index++) {
- array = kmap_atomic(desc->pvec.pages[index]);
- memset(array, 0, sizeof(struct nfs_cache_array));
- array->eof_index = -1;
- kunmap_atomic(array);
- }
-}
-
static
int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page, struct inode *inode)
{
@@ -655,12 +606,6 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
int status = -ENOMEM;
unsigned int array_size = ARRAY_SIZE(pages);
- /*
- * This means we hit readdir rdpages miss, the preallocated rdpages
- * are useless, the preallocate rdpages should be reinitialized.
- */
- nfs_readdir_rapages_init(desc);
-
entry.prev_cookie = 0;
entry.cookie = desc->last_cookie;
entry.eof = 0;
@@ -721,24 +666,9 @@ int nfs_readdir_filler(void *data, struct page* page)
struct inode *inode = file_inode(desc->file);
int ret;
- /*
- * If desc->page_index in range desc->pvec.index and
- * desc->pvec.index + desc->pvec.nr, we get readdir cache hit.
- */
- if (desc->page_index >= desc->pvec.index &&
- desc->page_index < (desc->pvec.index + desc->pvec.nr)) {
- /*
- * page and desc->pvec.pages[x] are valid, don't need to check
- * whether or not to be NULL.
- */
- copy_highpage(page, desc->pvec.pages[desc->page_index - desc->pvec.index]);
- ret = 0;
- } else {
- ret = nfs_readdir_xdr_to_array(desc, page, inode);
- if (ret < 0)
- goto error;
- }
-
+ ret = nfs_readdir_xdr_to_array(desc, page, inode);
+ if (ret < 0)
+ goto error;
SetPageUptodate(page);
if (invalidate_inode_pages2_range(inode->i_mapping, page->index + 1, -1) < 0) {
@@ -903,7 +833,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
*desc = &my_desc;
struct nfs_open_dir_context *dir_ctx = file->private_data;
int res = 0;
- int max_rapages = NFS_MAX_READDIR_RAPAGES;
dfprintk(FILE, "NFS: readdir(%pD2) starting at cookie %llu\n",
file, (long long)ctx->pos);
@@ -923,12 +852,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
desc->decode = NFS_PROTO(inode)->decode_dirent;
desc->plus = nfs_use_readdirplus(inode, ctx);
- res = nfs_readdir_alloc_pages(desc->pvec.pages, max_rapages);
- if (res < 0)
- return -ENOMEM;
-
- nfs_readdir_rapages_init(desc);
-
if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
res = nfs_revalidate_mapping(inode, file->f_mapping);
if (res < 0)
@@ -964,7 +887,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
break;
} while (!desc->eof);
out:
- nfs_readdir_free_pages(desc->pvec.pages, max_rapages);
if (res > 0)
res = 0;
dfprintk(FILE, "NFS: readdir(%pD2) returns %d\n", file, res);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 498fab72f70b..81e2fdff227e 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -69,8 +69,7 @@ struct nfs_clone_mount {
* Maximum number of pages that readdir can use for creating
* a vmapped array of pages.
*/
-#define NFS_MAX_READDIR_PAGES 64
-#define NFS_MAX_READDIR_RAPAGES 8
+#define NFS_MAX_READDIR_PAGES 8
struct nfs_client_initdata {
unsigned long init_flags;
--
2.20.1
Hi,
Please backport the following patch to Linux stable. OpenWrt now ships
it backported to kernels 4.14 and 4.19, but it should also work with
older kernel versions.
Upstream commit 1287533d3d95d5ad8b02773733044500b1be06bc
Hauke
On 11/16/18 5:09 PM, Sean Young wrote:
> When building BPF code using "clang -target bpf -c", clang does not
> define __linux__.
>
> To build BPF IR decoders the include linux/lirc.h is needed which
> includes linux/types.h. Currently this workaround is needed:
>
> https://git.linuxtv.org/v4l-utils.git/commit/?id=dd3ff81f58c4e1e6f33765dc61…
>
> This check might otherwise be useful to stop users from using a non-linux
> compiler, but if you're doing that you are going to have a lot more
> trouble anyway.
>
> Signed-off-by: Sean Young <sean(a)mess.org>
> ---
> arch/mips/include/uapi/asm/sgidefs.h | 8 --------
> 1 file changed, 8 deletions(-)
>
> diff --git a/arch/mips/include/uapi/asm/sgidefs.h b/arch/mips/include/uapi/asm/sgidefs.h
> index 26143e3b7c26..69c3de90c536 100644
> --- a/arch/mips/include/uapi/asm/sgidefs.h
> +++ b/arch/mips/include/uapi/asm/sgidefs.h
> @@ -11,14 +11,6 @@
> #ifndef __ASM_SGIDEFS_H
> #define __ASM_SGIDEFS_H
>
> -/*
> - * Using a Linux compiler for building Linux seems logic but not to
> - * everybody.
> - */
> -#ifndef __linux__
> -#error Use a Linux compiler or give up.
> -#endif
> -
> /*
> * Definitions for the ISA levels
> *
>
pnv_tce() returns a pointer to a TCE entry and originally a TCE table
would be pre-allocated. For the default case of a 2GB window the table
needs only a single level and that is fine. However, if more levels are
requested, it is possible to get a race when two threads want a pointer
to a TCE entry from the same page of TCEs.
This adds cmpxchg to handle the race. Note that once a TCE is non-zero,
it cannot become zero again.
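As an illustration, here is a minimal userspace sketch of the same
allocate-then-publish pattern using C11 atomics (hypothetical function
and types; the real code additionally byte-swaps big-endian TCE entries
and frees the losing page via pnv_pci_ioda2_table_do_free_pages()):

#include <stdatomic.h>
#include <stdlib.h>

static void *get_or_alloc_level(_Atomic(void *) *slot)
{
	void *cur = atomic_load(slot);
	void *expected = NULL;
	void *mine;

	if (cur)
		return cur;	/* already populated; never goes back to NULL */

	mine = calloc(1, 4096);
	if (!mine)
		return NULL;

	if (atomic_compare_exchange_strong(slot, &expected, mine))
		return mine;	/* we won the race and published our page */

	free(mine);		/* another thread won; use its page instead */
	return expected;	/* cmpxchg stored the winner's value here */
}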
CC: stable(a)vger.kernel.org # v4.19+
Fixes: a68bd1267b72 ("powerpc/powernv/ioda: Allocate indirect TCE levels on demand")
Signed-off-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
---
The race occurs about 30 times in the first 3 minutes of copying files
via rsync and that's about it.
This fixes EEH's from
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=110810
---
Changes:
v2:
* replaced spin_lock with cmpxchg+readonce
---
arch/powerpc/platforms/powernv/pci-ioda-tce.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/platforms/powernv/pci-ioda-tce.c b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
index e28f03e1eb5e..8d6569590161 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda-tce.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda-tce.c
@@ -48,6 +48,9 @@ static __be64 *pnv_alloc_tce_level(int nid, unsigned int shift)
return addr;
}
+static void pnv_pci_ioda2_table_do_free_pages(__be64 *addr,
+ unsigned long size, unsigned int levels);
+
static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
{
__be64 *tmp = user ? tbl->it_userspace : (__be64 *) tbl->it_base;
@@ -57,9 +60,9 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
while (level) {
int n = (idx & mask) >> (level * shift);
- unsigned long tce;
+ unsigned long oldtce, tce = be64_to_cpu(READ_ONCE(tmp[n]));
- if (tmp[n] == 0) {
+ if (!tce) {
__be64 *tmp2;
if (!alloc)
@@ -70,10 +73,15 @@ static __be64 *pnv_tce(struct iommu_table *tbl, bool user, long idx, bool alloc)
if (!tmp2)
return NULL;
- tmp[n] = cpu_to_be64(__pa(tmp2) |
- TCE_PCI_READ | TCE_PCI_WRITE);
+ tce = __pa(tmp2) | TCE_PCI_READ | TCE_PCI_WRITE;
+ oldtce = be64_to_cpu(cmpxchg(&tmp[n], 0,
+ cpu_to_be64(tce)));
+ if (oldtce) {
+ pnv_pci_ioda2_table_do_free_pages(tmp2,
+ ilog2(tbl->it_level_size) + 3, 1);
+ tce = oldtce;
+ }
}
- tce = be64_to_cpu(tmp[n]);
tmp = __va(tce & ~(TCE_PCI_READ | TCE_PCI_WRITE));
idx &= ~mask;
--
2.17.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 024c1fd9dbcc1d8a847f1311f999d35783921b7f Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:35 -0600
Subject: [PATCH] coresight: tmc-etf: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
caller is tmc_alloc_etf_buffer+0x5c/0x60
CPU: 2 PID: 2544 Comm: perf Not tainted 5.1.0-rc6-147786-g116841e #344
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x14/0x20
dump_stack+0x9c/0xc4
debug_smp_processor_id+0x10c/0x110
tmc_alloc_etf_buffer+0x5c/0x60
etm_setup_aux+0x1c4/0x230
rb_alloc_aux+0x1b8/0x2b8
perf_mmap+0x35c/0x478
mmap_region+0x34c/0x4f0
do_mmap+0x2d8/0x418
vm_mmap_pgoff+0xd0/0xf8
ksys_mmap_pgoff+0x88/0xf8
__arm64_sys_mmap+0x28/0x38
el0_svc_handler+0xd8/0x138
el0_svc+0x8/0xc
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2e499bbc1a929ac ("coresight: tmc: implementing TMC-ETF AUX space API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.7+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-4-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index b89e29c5b39d..23b7ff00af5c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -377,12 +377,10 @@ static void *tmc_alloc_etf_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
/* Allocate memory structure for interaction with Perf */
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 024c1fd9dbcc1d8a847f1311f999d35783921b7f Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:35 -0600
Subject: [PATCH] coresight: tmc-etf: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
caller is tmc_alloc_etf_buffer+0x5c/0x60
CPU: 2 PID: 2544 Comm: perf Not tainted 5.1.0-rc6-147786-g116841e #344
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x14/0x20
dump_stack+0x9c/0xc4
debug_smp_processor_id+0x10c/0x110
tmc_alloc_etf_buffer+0x5c/0x60
etm_setup_aux+0x1c4/0x230
rb_alloc_aux+0x1b8/0x2b8
perf_mmap+0x35c/0x478
mmap_region+0x34c/0x4f0
do_mmap+0x2d8/0x418
vm_mmap_pgoff+0xd0/0xf8
ksys_mmap_pgoff+0x88/0xf8
__arm64_sys_mmap+0x28/0x38
el0_svc_handler+0xd8/0x138
el0_svc+0x8/0xc
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2e499bbc1a929ac ("coresight: tmc: implementing TMC-ETF AUX space API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.7+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-4-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index b89e29c5b39d..23b7ff00af5c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -377,12 +377,10 @@ static void *tmc_alloc_etf_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
/* Allocate memory structure for interaction with Perf */
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 024c1fd9dbcc1d8a847f1311f999d35783921b7f Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:35 -0600
Subject: [PATCH] coresight: tmc-etf: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
caller is tmc_alloc_etf_buffer+0x5c/0x60
CPU: 2 PID: 2544 Comm: perf Not tainted 5.1.0-rc6-147786-g116841e #344
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x14/0x20
dump_stack+0x9c/0xc4
debug_smp_processor_id+0x10c/0x110
tmc_alloc_etf_buffer+0x5c/0x60
etm_setup_aux+0x1c4/0x230
rb_alloc_aux+0x1b8/0x2b8
perf_mmap+0x35c/0x478
mmap_region+0x34c/0x4f0
do_mmap+0x2d8/0x418
vm_mmap_pgoff+0xd0/0xf8
ksys_mmap_pgoff+0x88/0xf8
__arm64_sys_mmap+0x28/0x38
el0_svc_handler+0xd8/0x138
el0_svc+0x8/0xc
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2e499bbc1a929ac ("coresight: tmc: implementing TMC-ETF AUX space API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.7+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-4-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index b89e29c5b39d..23b7ff00af5c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -377,12 +377,10 @@ static void *tmc_alloc_etf_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
/* Allocate memory structure for interaction with Perf */
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 024c1fd9dbcc1d8a847f1311f999d35783921b7f Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:35 -0600
Subject: [PATCH] coresight: tmc-etf: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
caller is tmc_alloc_etf_buffer+0x5c/0x60
CPU: 2 PID: 2544 Comm: perf Not tainted 5.1.0-rc6-147786-g116841e #344
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x14/0x20
dump_stack+0x9c/0xc4
debug_smp_processor_id+0x10c/0x110
tmc_alloc_etf_buffer+0x5c/0x60
etm_setup_aux+0x1c4/0x230
rb_alloc_aux+0x1b8/0x2b8
perf_mmap+0x35c/0x478
mmap_region+0x34c/0x4f0
do_mmap+0x2d8/0x418
vm_mmap_pgoff+0xd0/0xf8
ksys_mmap_pgoff+0x88/0xf8
__arm64_sys_mmap+0x28/0x38
el0_svc_handler+0xd8/0x138
el0_svc+0x8/0xc
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2e499bbc1a929ac ("coresight: tmc: implementing TMC-ETF AUX space API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.7+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-4-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etf.c b/drivers/hwtracing/coresight/coresight-tmc-etf.c
index b89e29c5b39d..23b7ff00af5c 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etf.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etf.c
@@ -377,12 +377,10 @@ static void *tmc_alloc_etf_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
/* Allocate memory structure for interaction with Perf */
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a8710392db2c70f74aed6f06b16e8bec0f05a35 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:34 -0600
Subject: [PATCH] coresight: tmc-etr: alloc_perf_buf: Do not call
smp_processor_id from preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/1743
caller is tmc_alloc_etr_buffer+0x1bc/0x1f0
CPU: 1 PID: 1743 Comm: perf Not tainted 5.1.0-rc6-147786-g116841e #344
Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Feb 1 2019
Call trace:
dump_backtrace+0x0/0x150
show_stack+0x14/0x20
dump_stack+0x9c/0xc4
debug_smp_processor_id+0x10c/0x110
tmc_alloc_etr_buffer+0x1bc/0x1f0
etm_setup_aux+0x1c4/0x230
rb_alloc_aux+0x1b8/0x2b8
perf_mmap+0x35c/0x478
mmap_region+0x34c/0x4f0
do_mmap+0x2d8/0x418
vm_mmap_pgoff+0xd0/0xf8
ksys_mmap_pgoff+0x88/0xf8
__arm64_sys_mmap+0x28/0x38
el0_svc_handler+0xd8/0x138
el0_svc+0x8/0xc
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 22f429f19c4135d51e9 ("coresight: etm-perf: Add support for ETR backend")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.20+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-3-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 7c81f634ecb4..5d2bf6d18961 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1184,14 +1184,11 @@ static struct etr_buf *
alloc_etr_buf(struct tmc_drvdata *drvdata, struct perf_event *event,
int nr_pages, void **pages, bool snapshot)
{
- int node, cpu = event->cpu;
+ int node;
struct etr_buf *etr_buf;
unsigned long size;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
-
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
/*
* Try to match the perf ring buffer size if it is larger
* than the size requested via sysfs.
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 730766bae3280a25d40ea76a53dc6342e84e6513 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:36 -0600
Subject: [PATCH] coresight: etb10: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2997aa4063d97fdb39 ("coresight: etb10: implementing AUX API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.6+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-5-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index d5b9edecf76e..3810290e6d07 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -374,12 +374,10 @@ static void *etb_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
if (!buf)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 730766bae3280a25d40ea76a53dc6342e84e6513 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:36 -0600
Subject: [PATCH] coresight: etb10: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2997aa4063d97fdb39 ("coresight: etb10: implementing AUX API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.6+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-5-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index d5b9edecf76e..3810290e6d07 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -374,12 +374,10 @@ static void *etb_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
if (!buf)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 730766bae3280a25d40ea76a53dc6342e84e6513 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:36 -0600
Subject: [PATCH] coresight: etb10: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2997aa4063d97fdb39 ("coresight: etb10: implementing AUX API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.6+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-5-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index d5b9edecf76e..3810290e6d07 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -374,12 +374,10 @@ static void *etb_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
if (!buf)
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 730766bae3280a25d40ea76a53dc6342e84e6513 Mon Sep 17 00:00:00 2001
From: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Date: Thu, 20 Jun 2019 16:12:36 -0600
Subject: [PATCH] coresight: etb10: Do not call smp_processor_id from
preemptible
During a perf session we try to allocate buffers on the "node" associated
with the CPU the event is bound to. If it is not bound to a CPU, we
use the current CPU node, using smp_processor_id(). However this is unsafe
in a pre-emptible context and could generate the splats as below :
BUG: using smp_processor_id() in preemptible [00000000] code: perf/2544
Use NUMA_NO_NODE hint instead of using the current node for events
not bound to CPUs.
Fixes: 2997aa4063d97fdb39 ("coresight: etb10: implementing AUX API")
Cc: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose(a)arm.com>
Cc: stable <stable(a)vger.kernel.org> # 4.6+
Signed-off-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20190620221237.3536-5-mathieu.poirier@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/hwtracing/coresight/coresight-etb10.c b/drivers/hwtracing/coresight/coresight-etb10.c
index d5b9edecf76e..3810290e6d07 100644
--- a/drivers/hwtracing/coresight/coresight-etb10.c
+++ b/drivers/hwtracing/coresight/coresight-etb10.c
@@ -374,12 +374,10 @@ static void *etb_alloc_buffer(struct coresight_device *csdev,
struct perf_event *event, void **pages,
int nr_pages, bool overwrite)
{
- int node, cpu = event->cpu;
+ int node;
struct cs_buffers *buf;
- if (cpu == -1)
- cpu = smp_processor_id();
- node = cpu_to_node(cpu);
+ node = (event->cpu == -1) ? NUMA_NO_NODE : cpu_to_node(event->cpu);
buf = kzalloc_node(sizeof(struct cs_buffers), GFP_KERNEL, node);
if (!buf)
According to the Bspec, the clock divisor registers on GeminiLake
should be initialized by shifting 1 left by an amount derived from the
corresponding divisor, while i915 had been writing the raw divisor
value as-is all along; with this change a divisor of 8, for example, is
written as 1 << 7 = 0x80. Surprisingly, this worked by accident until
we ran into issues with the Microtech Etab.
v2: Added Fixes tag and cc
v3: Added stable to cc as well.
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
Reviewed-by: Vandita Kulkarni <vandita.kulkarni(a)intel.com>
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=108826
Fixes: bcc657004841 ("drm/i915/glk: Program txesc clock divider for GLK")
Cc: Deepak M <m.deepak(a)intel.com>
Cc: Madhav Chauhan <madhav.chauhan(a)intel.com>
Cc: Jani Nikula <jani.nikula(a)intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/display/vlv_dsi_pll.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/vlv_dsi_pll.c b/drivers/gpu/drm/i915/display/vlv_dsi_pll.c
index 99cc3e2e9c2c..f016a776a39e 100644
--- a/drivers/gpu/drm/i915/display/vlv_dsi_pll.c
+++ b/drivers/gpu/drm/i915/display/vlv_dsi_pll.c
@@ -396,8 +396,8 @@ static void glk_dsi_program_esc_clock(struct drm_device *dev,
else
txesc2_div = 10;
- I915_WRITE(MIPIO_TXESC_CLK_DIV1, txesc1_div & GLK_TX_ESC_CLK_DIV1_MASK);
- I915_WRITE(MIPIO_TXESC_CLK_DIV2, txesc2_div & GLK_TX_ESC_CLK_DIV2_MASK);
+ I915_WRITE(MIPIO_TXESC_CLK_DIV1, (1 << (txesc1_div - 1)) & GLK_TX_ESC_CLK_DIV1_MASK);
+ I915_WRITE(MIPIO_TXESC_CLK_DIV2, (1 << (txesc2_div - 1)) & GLK_TX_ESC_CLK_DIV2_MASK);
}
/* Program BXT Mipi clocks and dividers */
--
2.17.1
of_get_next_child() increments the reference count of the returned
device_node. Decrement it again in the check that determines whether we
are using the old or the new DTB.
Fixes: ba1f1f70c2c0 ("[media] media: mtk-mdp: Fix mdp device tree")
Signed-off-by: Matthias Brugger <matthias.bgg(a)gmail.com>
---
drivers/media/platform/mtk-mdp/mtk_mdp_core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
index bbb24fb95b95..bafe53c5d54a 100644
--- a/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
+++ b/drivers/media/platform/mtk-mdp/mtk_mdp_core.c
@@ -118,7 +118,9 @@ static int mtk_mdp_probe(struct platform_device *pdev)
mutex_init(&mdp->vpulock);
/* Old dts had the components as child nodes */
- if (of_get_next_child(dev->of_node, NULL)) {
+ parent = of_get_next_child(dev->of_node, NULL);
+ if (parent) {
+ of_node_put(parent);
parent = dev->of_node;
dev_warn(dev, "device tree is out of date\n");
} else {
--
2.21.0
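The underlying rule is that of_get_next_child() hands back the child
with its refcount already raised, so even a pure "does a child exist?"
test must drop that reference before discarding the pointer. A
standalone helper sketching the balanced form (kernel-style, name
illustrative):

#include <linux/of.h>
#include <linux/types.h>

/*
 * Returns true if @np has at least one child node, without leaking
 * the reference that of_get_next_child() takes on the child.
 */
static bool node_has_children(struct device_node *np)
{
	struct device_node *child = of_get_next_child(np, NULL);

	if (!child)
		return false;

	of_node_put(child);	/* balance the implicit of_node_get() */
	return true;
}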
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: mm/memcontrol: fix wrong statistics in memory.stat
When we calculate the total statistics for memcg1_stats and
memcg1_events, we use the index 'i' in the for loop directly as the
event index. We should actually use memcg1_stats[i] and
memcg1_events[i] as the event index.
Link: http://lkml.kernel.org/r/1562116978-19539-1-git-send-email-laoar.shao@gmail…
Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Yafang Shao <shaoyafang(a)didiglobal.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-wrong-statistics-in-memorystat
+++ a/mm/memcontrol.c
@@ -3523,12 +3523,13 @@ static int memcg_stat_show(struct seq_fi
if (memcg1_stats[i] == MEMCG_SWAP && !do_memsw_account())
continue;
seq_printf(m, "total_%s %llu\n", memcg1_stat_names[i],
- (u64)memcg_page_state(memcg, i) * PAGE_SIZE);
+ (u64)memcg_page_state(memcg, memcg1_stats[i]) *
+ PAGE_SIZE);
}
for (i = 0; i < ARRAY_SIZE(memcg1_events); i++)
seq_printf(m, "total_%s %llu\n", memcg1_event_names[i],
- (u64)memcg_events(memcg, i));
+ (u64)memcg_events(memcg, memcg1_events[i]));
for (i = 0; i < NR_LRU_LISTS; i++)
seq_printf(m, "total_%s %llu\n", mem_cgroup_lru_names[i],
_
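The bug is the classic dense-loop-over-sparse-ids mistake: the
memcg1_stats[] and memcg1_events[] arrays translate a row's position in
the user-visible table into the sparse internal counter index, so 'i'
alone reads unrelated slots. A self-contained userspace illustration,
with made-up ids, names, and values:

#include <stdio.h>

/* Made-up mapping: table row -> sparse internal counter index. */
static const int item_index[] = { 3, 7, 12 };
static const char * const item_name[] = { "cache", "rss", "shmem" };
static unsigned long long counters[16];	/* sparse internal state */

int main(void)
{
	counters[3] = 100;
	counters[7] = 200;
	counters[12] = 300;

	for (int i = 0; i < 3; i++) {
		/*
		 * Buggy form: counters[i] would read slots 0..2, which
		 * belong to different counters entirely. The fix routes
		 * the loop counter through the index array first.
		 */
		printf("total_%s %llu\n", item_name[i],
		       counters[item_index[i]]);
	}
	return 0;
}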
From: Hongjie Fang <hongjiefang(a)asrmicro.com>
commit 5858bdad4d0d0fc18bf29f34c3ac836e0b59441f upstream.
[Please apply to 4.9-stable.]
The directory may have been removed by the time we enter
fscrypt_ioctl_set_policy(). If so, the empty_dir() check will return
an error for the ext4 filesystem.
ext4_rmdir() sets i_size = 0, and ext4_empty_dir() then reports an
error because 'inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2)'.
If the fs is mounted with errors=panic, this triggers a kernel panic.
Add an IS_DEADDIR() check to fix this problem.
Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support")
Cc: <stable(a)vger.kernel.org> # v4.1+
Signed-off-by: Hongjie Fang <hongjiefang(a)asrmicro.com>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/crypto/policy.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index c160d2d0e18d77..57a97b38a2fa2c 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -114,6 +114,8 @@ int fscrypt_process_policy(struct file *filp,
if (!inode_has_encryption_context(inode)) {
if (!S_ISDIR(inode->i_mode))
ret = -ENOTDIR;
+ else if (IS_DEADDIR(inode))
+ ret = -ENOENT;
else if (!inode->i_sb->s_cop->empty_dir)
ret = -EOPNOTSUPP;
else if (!inode->i_sb->s_cop->empty_dir(inode))
--
2.22.0.410.gd8fdbe21b5-goog
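Condensed, the fixed ordering of the checks looks like the sketch below
(simplified from the hunk; the empty-directory and encryption-context
steps are elided, and the function name is illustrative). IS_DEADDIR()
tests the S_DEAD flag that the rmdir path sets, which is exactly the
state that makes the later empty-directory probe misread i_size == 0 as
corruption:

#include <linux/errno.h>
#include <linux/fs.h>

static int check_policy_target(struct inode *inode)
{
	if (!S_ISDIR(inode->i_mode))
		return -ENOTDIR;
	if (IS_DEADDIR(inode))		/* directory raced with rmdir */
		return -ENOENT;
	/* ... empty_dir() and encryption-context checks follow ... */
	return 0;
}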
From: Hongjie Fang <hongjiefang(a)asrmicro.com>
commit 5858bdad4d0d0fc18bf29f34c3ac836e0b59441f upstream.
[Please apply to 4.4-stable.]
The directory may have been removed by the time we enter
fscrypt_ioctl_set_policy(). If so, the empty_dir() check will return
an error for the ext4 filesystem.
ext4_rmdir() sets i_size = 0, and ext4_empty_dir() then reports an
error because 'inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2)'.
If the fs is mounted with errors=panic, this triggers a kernel panic.
Add an IS_DEADDIR() check to fix this problem.
Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support")
Cc: <stable(a)vger.kernel.org> # v4.1+
Signed-off-by: Hongjie Fang <hongjiefang(a)asrmicro.com>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/ext4/crypto_policy.c | 2 ++
fs/f2fs/crypto_policy.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/fs/ext4/crypto_policy.c b/fs/ext4/crypto_policy.c
index e4f4fc4e56abee..77bd7bfb632913 100644
--- a/fs/ext4/crypto_policy.c
+++ b/fs/ext4/crypto_policy.c
@@ -111,6 +111,8 @@ int ext4_process_policy(const struct ext4_encryption_policy *policy,
if (!ext4_inode_has_encryption_context(inode)) {
if (!S_ISDIR(inode->i_mode))
return -EINVAL;
+ if (IS_DEADDIR(inode))
+ return -ENOENT;
if (!ext4_empty_dir(inode))
return -ENOTEMPTY;
return ext4_create_encryption_context_from_policy(inode,
diff --git a/fs/f2fs/crypto_policy.c b/fs/f2fs/crypto_policy.c
index 884f3f0fe29d32..613ca32ec24887 100644
--- a/fs/f2fs/crypto_policy.c
+++ b/fs/f2fs/crypto_policy.c
@@ -99,6 +99,8 @@ int f2fs_process_policy(const struct f2fs_encryption_policy *policy,
return -EINVAL;
if (!f2fs_inode_has_encryption_context(inode)) {
+ if (IS_DEADDIR(inode))
+ return -ENOENT;
if (!f2fs_empty_dir(inode))
return -ENOTEMPTY;
return f2fs_create_encryption_context_from_policy(inode,
--
2.22.0.410.gd8fdbe21b5-goog