From: Yunfeng Ye <yeyunfeng(a)huawei.com>
[ Upstream commit 01341fbd0d8d4e717fc1231cdffe00343088ce0b ]
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
alloc_workqueue
pwq_adjust_max_active
wake_up_worker
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng(a)huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/workqueue.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 3fb2d45c0b42f..6b293804cd734 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3361,17 +3361,24 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* is updated and visible.
*/
if (!freezable || !workqueue_freezing) {
+ bool kick = false;
+
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
- pwq->nr_active < pwq->max_active)
+ pwq->nr_active < pwq->max_active) {
pwq_activate_first_delayed(pwq);
+ kick = true;
+ }
/*
* Need to kick a worker after thawed or an unbound wq's
- * max_active is bumped. It's a slow path. Do it always.
+ * max_active is bumped. In realtime scenarios, always kicking a
+ * worker will cause interference on the isolated cpu cores, so
+ * let's kick iff work items were activated.
*/
- wake_up_worker(pwq->pool);
+ if (kick)
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
--
2.27.0
From: Yunfeng Ye <yeyunfeng(a)huawei.com>
[ Upstream commit 01341fbd0d8d4e717fc1231cdffe00343088ce0b ]
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
alloc_workqueue
pwq_adjust_max_active
wake_up_worker
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng(a)huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/workqueue.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 00c295d3104bb..205c3131f8b05 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3448,17 +3448,24 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* is updated and visible.
*/
if (!freezable || !workqueue_freezing) {
+ bool kick = false;
+
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
- pwq->nr_active < pwq->max_active)
+ pwq->nr_active < pwq->max_active) {
pwq_activate_first_delayed(pwq);
+ kick = true;
+ }
/*
* Need to kick a worker after thawed or an unbound wq's
- * max_active is bumped. It's a slow path. Do it always.
+ * max_active is bumped. In realtime scenarios, always kicking a
+ * worker will cause interference on the isolated cpu cores, so
+ * let's kick iff work items were activated.
*/
- wake_up_worker(pwq->pool);
+ if (kick)
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
--
2.27.0
From: Yunfeng Ye <yeyunfeng(a)huawei.com>
[ Upstream commit 01341fbd0d8d4e717fc1231cdffe00343088ce0b ]
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
alloc_workqueue
pwq_adjust_max_active
wake_up_worker
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng(a)huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/workqueue.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 18fae55713b0a..79fcec674485f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3494,17 +3494,24 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* is updated and visible.
*/
if (!freezable || !workqueue_freezing) {
+ bool kick = false;
+
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
- pwq->nr_active < pwq->max_active)
+ pwq->nr_active < pwq->max_active) {
pwq_activate_first_delayed(pwq);
+ kick = true;
+ }
/*
* Need to kick a worker after thawed or an unbound wq's
- * max_active is bumped. It's a slow path. Do it always.
+ * max_active is bumped. In realtime scenarios, always kicking a
+ * worker will cause interference on the isolated cpu cores, so
+ * let's kick iff work items were activated.
*/
- wake_up_worker(pwq->pool);
+ if (kick)
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
--
2.27.0
From: Yunfeng Ye <yeyunfeng(a)huawei.com>
[ Upstream commit 01341fbd0d8d4e717fc1231cdffe00343088ce0b ]
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
alloc_workqueue
pwq_adjust_max_active
wake_up_worker
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng(a)huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/workqueue.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index eef77c82d2e19..cd98ef48345e1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3554,17 +3554,24 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* is updated and visible.
*/
if (!freezable || !workqueue_freezing) {
+ bool kick = false;
+
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
- pwq->nr_active < pwq->max_active)
+ pwq->nr_active < pwq->max_active) {
pwq_activate_first_delayed(pwq);
+ kick = true;
+ }
/*
* Need to kick a worker after thawed or an unbound wq's
- * max_active is bumped. It's a slow path. Do it always.
+ * max_active is bumped. In realtime scenarios, always kicking a
+ * worker will cause interference on the isolated cpu cores, so
+ * let's kick iff work items were activated.
*/
- wake_up_worker(pwq->pool);
+ if (kick)
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
--
2.27.0
From: Yunfeng Ye <yeyunfeng(a)huawei.com>
[ Upstream commit 01341fbd0d8d4e717fc1231cdffe00343088ce0b ]
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
alloc_workqueue
pwq_adjust_max_active
wake_up_worker
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng(a)huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/workqueue.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4aa268582a225..28e52657e0930 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3718,17 +3718,24 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* is updated and visible.
*/
if (!freezable || !workqueue_freezing) {
+ bool kick = false;
+
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
- pwq->nr_active < pwq->max_active)
+ pwq->nr_active < pwq->max_active) {
pwq_activate_first_delayed(pwq);
+ kick = true;
+ }
/*
* Need to kick a worker after thawed or an unbound wq's
- * max_active is bumped. It's a slow path. Do it always.
+ * max_active is bumped. In realtime scenarios, always kicking a
+ * worker will cause interference on the isolated cpu cores, so
+ * let's kick iff work items were activated.
*/
- wake_up_worker(pwq->pool);
+ if (kick)
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
--
2.27.0
From: Yunfeng Ye <yeyunfeng(a)huawei.com>
[ Upstream commit 01341fbd0d8d4e717fc1231cdffe00343088ce0b ]
In realtime scenario, We do not want to have interference on the
isolated cpu cores. but when invoking alloc_workqueue() for percpu wq
on the housekeeping cpu, it kick a kworker on the isolated cpu.
alloc_workqueue
pwq_adjust_max_active
wake_up_worker
The comment in pwq_adjust_max_active() said:
"Need to kick a worker after thawed or an unbound wq's
max_active is bumped"
So it is unnecessary to kick a kworker for percpu's wq when invoking
alloc_workqueue(). this patch only kick a worker based on the actual
activation of delayed works.
Signed-off-by: Yunfeng Ye <yeyunfeng(a)huawei.com>
Reviewed-by: Lai Jiangshan <jiangshanlai(a)gmail.com>
Signed-off-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/workqueue.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 437935e7a1991..0695c7895c892 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3728,17 +3728,24 @@ static void pwq_adjust_max_active(struct pool_workqueue *pwq)
* is updated and visible.
*/
if (!freezable || !workqueue_freezing) {
+ bool kick = false;
+
pwq->max_active = wq->saved_max_active;
while (!list_empty(&pwq->delayed_works) &&
- pwq->nr_active < pwq->max_active)
+ pwq->nr_active < pwq->max_active) {
pwq_activate_first_delayed(pwq);
+ kick = true;
+ }
/*
* Need to kick a worker after thawed or an unbound wq's
- * max_active is bumped. It's a slow path. Do it always.
+ * max_active is bumped. In realtime scenarios, always kicking a
+ * worker will cause interference on the isolated cpu cores, so
+ * let's kick iff work items were activated.
*/
- wake_up_worker(pwq->pool);
+ if (kick)
+ wake_up_worker(pwq->pool);
} else {
pwq->max_active = 0;
}
--
2.27.0
It sounds unwise to let user space pass an unchecked 32-bit
offset into a kernel structure in an ioctl. This is an unsigned
variable, so checking the upper bound for the size of the structure
it points into is sufficient to avoid data corruption, but as
the pointer might also be unaligned, it has to be written carefully
as well.
While I stumbled over this problem by reading the code, I did not
continue checking the function for further problems like it.
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 861f7140f52e..c3de69f3bee8 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -8095,7 +8095,7 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
int error = 0, i;
void *sense = NULL;
dma_addr_t sense_handle;
- unsigned long *sense_ptr;
+ void *sense_ptr;
u32 opcode = 0;
int ret = DCMD_SUCCESS;
@@ -8218,6 +8218,12 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
}
if (ioc->sense_len) {
+ /* make sure the pointer is part of the frame */
+ if (ioc->sense_off > (sizeof(union megasas_frame) - sizeof(__le64))) {
+ error = -EINVAL;
+ goto out;
+ }
+
sense = dma_alloc_coherent(&instance->pdev->dev, ioc->sense_len,
&sense_handle, GFP_KERNEL);
if (!sense) {
@@ -8225,12 +8231,11 @@ megasas_mgmt_fw_ioctl(struct megasas_instance *instance,
goto out;
}
- sense_ptr =
- (unsigned long *) ((unsigned long)cmd->frame + ioc->sense_off);
+ sense_ptr = (void *)cmd->frame + ioc->sense_off;
if (instance->consistent_mask_64bit)
- *sense_ptr = cpu_to_le64(sense_handle);
+ put_unaligned_le64(sense_handle, sense_ptr);
else
- *sense_ptr = cpu_to_le32(sense_handle);
+ put_unaligned_le32(sense_handle, sense_ptr);
}
/*
--
2.27.0
On 2020-12-29 9:54 a.m., Deucher, Alexander wrote:
> [AMD Public Use]
>
>
> I don't know if these fixes related to modifiers make sense in the
> pre-modifier code base. Bas, Nick?
>
> Alex
Mesa should be the only userspace trying to make use of DCC and it
doesn't do it for video formats. From the kernel side of things we've
also never supported this and you'd get corruption on the screen if you
tried.
It's a "fix" for both pre-modifiers and post-modifiers code.
Regards,
Nicholas Kazlauskas
> ------------------------------------------------------------------------
> *From:* amd-gfx <amd-gfx-bounces(a)lists.freedesktop.org> on behalf of
> Sasha Levin <sashal(a)kernel.org>
> *Sent:* Tuesday, December 22, 2020 9:16 PM
> *To:* linux-kernel(a)vger.kernel.org <linux-kernel(a)vger.kernel.org>;
> stable(a)vger.kernel.org <stable(a)vger.kernel.org>
> *Cc:* Sasha Levin <sashal(a)kernel.org>; dri-devel(a)lists.freedesktop.org
> <dri-devel(a)lists.freedesktop.org>; amd-gfx(a)lists.freedesktop.org
> <amd-gfx(a)lists.freedesktop.org>; Bas Nieuwenhuizen
> <bas(a)basnieuwenhuizen.nl>; Deucher, Alexander
> <Alexander.Deucher(a)amd.com>; Kazlauskas, Nicholas
> <Nicholas.Kazlauskas(a)amd.com>
> *Subject:* [PATCH AUTOSEL 5.4 006/130] drm/amd/display: Do not silently
> accept DCC for multiplane formats.
> From: Bas Nieuwenhuizen <bas(a)basnieuwenhuizen.nl>
>
> [ Upstream commit b35ce7b364ec80b54f48a8fdf9fb74667774d2da ]
>
> Silently accepting it could result in corruption.
>
> Signed-off-by: Bas Nieuwenhuizen <bas(a)basnieuwenhuizen.nl>
> Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
> ---
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index d2dd387c95d86..ce70c42a2c3ec 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -2734,7 +2734,7 @@ fill_plane_dcc_attributes(struct amdgpu_device *adev,
> return 0;
>
> if (format >= SURFACE_PIXEL_FORMAT_VIDEO_BEGIN)
> - return 0;
> + return -EINVAL;
>
> if (!dc->cap_funcs.get_dcc_compression_cap)
> return -EINVAL;
> --
> 2.27.0
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx(a)lists.freedesktop.org
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre…
> <https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.fre…>
Commit 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in
shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
some infrastructure we have in place to flush inodes that we use for
device replace and snapshot. However this introduced a pretty serious
performance regression. The root cause is because before we would
generally use the normal writeback path to reclaim delalloc space, and
for this we would provide it with the number of pages we wanted to
flush. The referenced commit changed this to flush that many inodes,
which drastically increased the amount of space we were flushing in
certain cases, which severely affected performance.
We cannot revert this patch unfortunately, because Filipe has another
fix that requires the ability to skip flushing inodes that are being
cloned in certain scenarios, which means we need to keep using our
flushing infrastructure or risk re-introducing the deadlock.
Instead to fix this problem we can go back to providing
btrfs_start_delalloc_roots with a number of pages to flush, and then set
up a writeback_control and utilize sync_inode() to handle the flushing
for us. This gives us the same behavior we had prior to the fix, while
still allowing us to avoid the deadlock that was fixed by Filipe. The
user reported reproducer was a simple untarring of a large tarball of
the source code for Firefox. The results are as follows
5.9 0m54.258s
5.10 1m26.212s
Patch 0m38.800s
We are significantly faster because of the work I did around improving
ENOSPC flushing in 5.10 and 5.11, so reverting to the previous write out
behavior gave us a pretty big boost.
CC: stable(a)vger.kernel.org # 5.10
Reported-by: René Rebe <rene(a)exactcode.de>
Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/inode.c | 60 +++++++++++++++++++++++++++++++------------
fs/btrfs/space-info.c | 4 ++-
2 files changed, 46 insertions(+), 18 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 070716650df8..a8e0a6b038d3 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -9390,7 +9390,8 @@ static struct btrfs_delalloc_work *btrfs_alloc_delalloc_work(struct inode *inode
* some fairly slow code that needs optimization. This walks the list
* of all the inodes with pending delalloc and forces them to disk.
*/
-static int start_delalloc_inodes(struct btrfs_root *root, u64 *nr, bool snapshot,
+static int start_delalloc_inodes(struct btrfs_root *root,
+ struct writeback_control *wbc, bool snapshot,
bool in_reclaim_context)
{
struct btrfs_inode *binode;
@@ -9399,6 +9400,7 @@ static int start_delalloc_inodes(struct btrfs_root *root, u64 *nr, bool snapshot
struct list_head works;
struct list_head splice;
int ret = 0;
+ bool full_flush = wbc->nr_to_write == LONG_MAX;
INIT_LIST_HEAD(&works);
INIT_LIST_HEAD(&splice);
@@ -9427,18 +9429,24 @@ static int start_delalloc_inodes(struct btrfs_root *root, u64 *nr, bool snapshot
if (snapshot)
set_bit(BTRFS_INODE_SNAPSHOT_FLUSH,
&binode->runtime_flags);
- work = btrfs_alloc_delalloc_work(inode);
- if (!work) {
- iput(inode);
- ret = -ENOMEM;
- goto out;
- }
- list_add_tail(&work->list, &works);
- btrfs_queue_work(root->fs_info->flush_workers,
- &work->work);
- if (*nr != U64_MAX) {
- (*nr)--;
- if (*nr == 0)
+ if (full_flush) {
+ work = btrfs_alloc_delalloc_work(inode);
+ if (!work) {
+ iput(inode);
+ ret = -ENOMEM;
+ goto out;
+ }
+ list_add_tail(&work->list, &works);
+ btrfs_queue_work(root->fs_info->flush_workers,
+ &work->work);
+ } else {
+ ret = sync_inode(inode, wbc);
+ if (!ret &&
+ test_bit(BTRFS_INODE_HAS_ASYNC_EXTENT,
+ &BTRFS_I(inode)->runtime_flags))
+ ret = sync_inode(inode, wbc);
+ btrfs_add_delayed_iput(inode);
+ if (ret || wbc->nr_to_write <= 0)
goto out;
}
cond_resched();
@@ -9464,18 +9472,29 @@ static int start_delalloc_inodes(struct btrfs_root *root, u64 *nr, bool snapshot
int btrfs_start_delalloc_snapshot(struct btrfs_root *root)
{
+ struct writeback_control wbc = {
+ .nr_to_write = LONG_MAX,
+ .sync_mode = WB_SYNC_NONE,
+ .range_start = 0,
+ .range_end = LLONG_MAX,
+ };
struct btrfs_fs_info *fs_info = root->fs_info;
- u64 nr = U64_MAX;
if (test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state))
return -EROFS;
- return start_delalloc_inodes(root, &nr, true, false);
+ return start_delalloc_inodes(root, &wbc, true, false);
}
int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
bool in_reclaim_context)
{
+ struct writeback_control wbc = {
+ .nr_to_write = (nr == U64_MAX) ? LONG_MAX : (unsigned long)nr,
+ .sync_mode = WB_SYNC_NONE,
+ .range_start = 0,
+ .range_end = LLONG_MAX,
+ };
struct btrfs_root *root;
struct list_head splice;
int ret;
@@ -9489,6 +9508,13 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
spin_lock(&fs_info->delalloc_root_lock);
list_splice_init(&fs_info->delalloc_roots, &splice);
while (!list_empty(&splice) && nr) {
+ /*
+ * Reset nr_to_write here so we know that we're doing a full
+ * flush.
+ */
+ if (nr == U64_MAX)
+ wbc.nr_to_write = LONG_MAX;
+
root = list_first_entry(&splice, struct btrfs_root,
delalloc_root);
root = btrfs_grab_root(root);
@@ -9497,9 +9523,9 @@ int btrfs_start_delalloc_roots(struct btrfs_fs_info *fs_info, u64 nr,
&fs_info->delalloc_roots);
spin_unlock(&fs_info->delalloc_root_lock);
- ret = start_delalloc_inodes(root, &nr, false, in_reclaim_context);
+ ret = start_delalloc_inodes(root, &wbc, false, in_reclaim_context);
btrfs_put_root(root);
- if (ret < 0)
+ if (ret < 0 || wbc.nr_to_write <= 0)
goto out;
spin_lock(&fs_info->delalloc_root_lock);
}
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 67e55c5479b8..e8347461c8dd 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -532,7 +532,9 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
loops = 0;
while ((delalloc_bytes || dio_bytes) && loops < 3) {
- btrfs_start_delalloc_roots(fs_info, items, true);
+ u64 nr_pages = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
+
+ btrfs_start_delalloc_roots(fs_info, nr_pages, true);
loops++;
if (wait_ordered && !trans) {
--
2.26.2
There is one GSWIP_MII_CFG register for each switch-port except the CPU
port. The register offset for the first port is 0x0, 0x02 for the
second, 0x04 for the third and so on.
Update the driver to not only restrict the GSWIP_MII_CFG registers to
ports 0, 1 and 5. Handle ports 0..5 instead but skip the CPU port. This
means we are not overwriting the configuration for the third port (port
two since we start counting from zero) with the settings for the sixth
port (with number five) anymore.
The GSWIP_MII_PCDU(p) registers are not updated because there's really
only three (one for each of the following ports: 0, 1, 5).
Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200")
Cc: stable(a)vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
---
drivers/net/dsa/lantiq_gswip.c | 23 ++++++-----------------
1 file changed, 6 insertions(+), 17 deletions(-)
diff --git a/drivers/net/dsa/lantiq_gswip.c b/drivers/net/dsa/lantiq_gswip.c
index 5d378c8026f0..4b36d89bec06 100644
--- a/drivers/net/dsa/lantiq_gswip.c
+++ b/drivers/net/dsa/lantiq_gswip.c
@@ -92,9 +92,7 @@
GSWIP_MDIO_PHY_FDUP_MASK)
/* GSWIP MII Registers */
-#define GSWIP_MII_CFG0 0x00
-#define GSWIP_MII_CFG1 0x02
-#define GSWIP_MII_CFG5 0x04
+#define GSWIP_MII_CFGp(p) (0x2 * (p))
#define GSWIP_MII_CFG_EN BIT(14)
#define GSWIP_MII_CFG_LDCLKDIS BIT(12)
#define GSWIP_MII_CFG_MODE_MIIP 0x0
@@ -392,17 +390,9 @@ static void gswip_mii_mask(struct gswip_priv *priv, u32 clear, u32 set,
static void gswip_mii_mask_cfg(struct gswip_priv *priv, u32 clear, u32 set,
int port)
{
- switch (port) {
- case 0:
- gswip_mii_mask(priv, clear, set, GSWIP_MII_CFG0);
- break;
- case 1:
- gswip_mii_mask(priv, clear, set, GSWIP_MII_CFG1);
- break;
- case 5:
- gswip_mii_mask(priv, clear, set, GSWIP_MII_CFG5);
- break;
- }
+ /* There's no MII_CFG register for the CPU port */
+ if (!dsa_is_cpu_port(priv->ds, port))
+ gswip_mii_mask(priv, clear, set, GSWIP_MII_CFGp(port));
}
static void gswip_mii_mask_pcdu(struct gswip_priv *priv, u32 clear, u32 set,
@@ -822,9 +812,8 @@ static int gswip_setup(struct dsa_switch *ds)
gswip_mdio_mask(priv, 0xff, 0x09, GSWIP_MDIO_MDC_CFG1);
/* Disable the xMII link */
- gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, 0);
- gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, 1);
- gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, 5);
+ for (i = 0; i < priv->hw_info->max_ports; i++)
+ gswip_mii_mask_cfg(priv, GSWIP_MII_CFG_EN, 0, i);
/* enable special tag insertion on cpu port */
gswip_switch_mask(priv, 0, GSWIP_FDMA_PCTRL_STEN,
--
2.30.0
This is a revert for 38d715f494f2 ("btrfs: use
btrfs_start_delalloc_roots in shrink_delalloc"). A user reported a
problem where performance was significantly worse with this patch
applied. The problem needs to be fixed with proper pre-flushing, and
changes to how we deal with the work queues for the inodes. However
that work is much more complicated than is acceptable for stable, and
simply reverting this patch fixes the problem. The original patch was
a cleanup of the code, so it's fine to revert it. My numbers for the
original reported test, which was untarring a copy of the firefox
sources, are as follows
5.9 0m54.258s
5.10 1m26.212s
Fix 0m35.038s
cc: stable(a)vger.kernel.org # 5.10
Reported-by: René Rebe <rene(a)exactcode.de>
Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
Dave, this is ontop of linus's branch, because we've changed the arguments for
btrfs_start_delalloc_roots in misc-next, and this needs to go back to 5.10 ASAP.
I can send a misc-next version if you want to have it there as well while we're
waiting for it to go into linus's tree, just let me know.
fs/btrfs/space-info.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 53 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 64099565ab8f..a2b322275b8d 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -465,6 +465,28 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
up_read(&info->groups_sem);
}
+static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info,
+ unsigned long nr_pages, u64 nr_items)
+{
+ struct super_block *sb = fs_info->sb;
+
+ if (down_read_trylock(&sb->s_umount)) {
+ writeback_inodes_sb_nr(sb, nr_pages, WB_REASON_FS_FREE_SPACE);
+ up_read(&sb->s_umount);
+ } else {
+ /*
+ * We needn't worry the filesystem going from r/w to r/o though
+ * we don't acquire ->s_umount mutex, because the filesystem
+ * should guarantee the delalloc inodes list be empty after
+ * the filesystem is readonly(all dirty pages are written to
+ * the disk).
+ */
+ btrfs_start_delalloc_roots(fs_info, nr_items);
+ if (!current->journal_info)
+ btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1);
+ }
+}
+
static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
u64 to_reclaim)
{
@@ -490,8 +512,10 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
struct btrfs_trans_handle *trans;
u64 delalloc_bytes;
u64 dio_bytes;
+ u64 async_pages;
u64 items;
long time_left;
+ unsigned long nr_pages;
int loops;
/* Calc the number of the pages we need flush for space reservation */
@@ -532,8 +556,36 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
loops = 0;
while ((delalloc_bytes || dio_bytes) && loops < 3) {
- btrfs_start_delalloc_roots(fs_info, items);
+ nr_pages = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
+
+ /*
+ * Triggers inode writeback for up to nr_pages. This will invoke
+ * ->writepages callback and trigger delalloc filling
+ * (btrfs_run_delalloc_range()).
+ */
+ btrfs_writeback_inodes_sb_nr(fs_info, nr_pages, items);
+ /*
+ * We need to wait for the compressed pages to start before
+ * we continue.
+ */
+ async_pages = atomic_read(&fs_info->async_delalloc_pages);
+ if (!async_pages)
+ goto skip_async;
+
+ /*
+ * Calculate how many compressed pages we want to be written
+ * before we continue. I.e if there are more async pages than we
+ * require wait_event will wait until nr_pages are written.
+ */
+ if (async_pages <= nr_pages)
+ async_pages = 0;
+ else
+ async_pages -= nr_pages;
+ wait_event(fs_info->async_submit_wait,
+ atomic_read(&fs_info->async_delalloc_pages) <=
+ (int)async_pages);
+skip_async:
loops++;
if (wait_ordered && !trans) {
btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);
--
2.26.2
This is a note to let you know that I've just added the patch titled
usb: gadget: configfs: Preserve function ordering after bind failure
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 6cd0fe91387917be48e91385a572a69dfac2f3f7 Mon Sep 17 00:00:00 2001
From: Chandana Kishori Chiluveru <cchiluve(a)codeaurora.org>
Date: Tue, 29 Dec 2020 14:44:43 -0800
Subject: usb: gadget: configfs: Preserve function ordering after bind failure
When binding the ConfigFS gadget to a UDC, the functions in each
configuration are added in list order. However, if usb_add_function()
fails, the failed function is put back on its configuration's
func_list and purge_configs_funcs() is called to further clean up.
purge_configs_funcs() iterates over the configurations and functions
in forward order, calling unbind() on each of the previously added
functions. But after doing so, each function gets moved to the
tail of the configuration's func_list. This results in reshuffling
the original order of the functions within a configuration such
that the failed function now appears first even though it may have
originally appeared in the middle or even end of the list. At this
point if the ConfigFS gadget is attempted to re-bind to the UDC,
the functions will be added in a different order than intended,
with the only recourse being to remove and relink the functions all
over again.
An example of this as follows:
ln -s functions/mass_storage.0 configs/c.1
ln -s functions/ncm.0 configs/c.1
ln -s functions/ffs.adb configs/c.1 # oops, forgot to start adbd
echo "<udc device>" > UDC # fails
start adbd
echo "<udc device>" > UDC # now succeeds, but...
# bind order is
# "ADB", mass_storage, ncm
[30133.118289] configfs-gadget gadget: adding 'Mass Storage Function'/ffffff810af87200 to config 'c'/ffffff817d6a2520
[30133.119875] configfs-gadget gadget: adding 'cdc_network'/ffffff80f48d1a00 to config 'c'/ffffff817d6a2520
[30133.119974] using random self ethernet address
[30133.120002] using random host ethernet address
[30133.139604] usb0: HOST MAC 3e:27:46:ba:3e:26
[30133.140015] usb0: MAC 6e:28:7e:42:66:6a
[30133.140062] configfs-gadget gadget: adding 'Function FS Gadget'/ffffff80f3868438 to config 'c'/ffffff817d6a2520
[30133.140081] configfs-gadget gadget: adding 'Function FS Gadget'/ffffff80f3868438 --> -19
[30133.140098] configfs-gadget gadget: unbind function 'Mass Storage Function'/ffffff810af87200
[30133.140119] configfs-gadget gadget: unbind function 'cdc_network'/ffffff80f48d1a00
[30133.173201] configfs-gadget a600000.dwc3: failed to start g1: -19
[30136.661933] init: starting service 'adbd'...
[30136.700126] read descriptors
[30136.700413] read strings
[30138.574484] configfs-gadget gadget: adding 'Function FS Gadget'/ffffff80f3868438 to config 'c'/ffffff817d6a2520
[30138.575497] configfs-gadget gadget: adding 'Mass Storage Function'/ffffff810af87200 to config 'c'/ffffff817d6a2520
[30138.575554] configfs-gadget gadget: adding 'cdc_network'/ffffff80f48d1a00 to config 'c'/ffffff817d6a2520
[30138.575631] using random self ethernet address
[30138.575660] using random host ethernet address
[30138.595338] usb0: HOST MAC 2e:cf:43:cd:ca:c8
[30138.597160] usb0: MAC 6a:f0:9f:ee:82:a0
[30138.791490] configfs-gadget gadget: super-speed config #1: c
Fix this by reversing the iteration order of the functions in
purge_config_funcs() when unbinding them, and adding them back to
the config's func_list at the head instead of the tail. This
ensures that we unbind and unwind back to the original list order.
Fixes: 88af8bbe4ef7 ("usb: gadget: the start of the configfs interface")
Signed-off-by: Chandana Kishori Chiluveru <cchiluve(a)codeaurora.org>
Signed-off-by: Jack Pham <jackp(a)codeaurora.org>
Reviewed-by: Peter Chen <peter.chen(a)nxp.com>
Link: https://lore.kernel.org/r/20201229224443.31623-1-jackp@codeaurora.org
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/configfs.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
index d9743f4b56c3..408a5332a975 100644
--- a/drivers/usb/gadget/configfs.c
+++ b/drivers/usb/gadget/configfs.c
@@ -1255,9 +1255,9 @@ static void purge_configs_funcs(struct gadget_info *gi)
cfg = container_of(c, struct config_usb_cfg, c);
- list_for_each_entry_safe(f, tmp, &c->functions, list) {
+ list_for_each_entry_safe_reverse(f, tmp, &c->functions, list) {
- list_move_tail(&f->list, &cfg->func_list);
+ list_move(&f->list, &cfg->func_list);
if (f->unbind) {
dev_dbg(&gi->cdev.gadget->dev,
"unbind function '%s'/%p\n",
--
2.30.0
This is a note to let you know that I've just added the patch titled
usb: gadget: select CONFIG_CRC32
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From d7889c2020e08caab0d7e36e947f642d91015bd0 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann <arnd(a)arndb.de>
Date: Sun, 3 Jan 2021 22:42:17 +0100
Subject: usb: gadget: select CONFIG_CRC32
Without crc32 support, this driver fails to link:
arm-linux-gnueabi-ld: drivers/usb/gadget/function/f_eem.o: in function `eem_unwrap':
f_eem.c:(.text+0x11cc): undefined reference to `crc32_le'
arm-linux-gnueabi-ld: drivers/usb/gadget/function/f_ncm.o:f_ncm.c:(.text+0x1e40):
more undefined references to `crc32_le' follow
Fixes: 6d3865f9d41f ("usb: gadget: NCM: Add transmit multi-frame.")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210103214224.1996535-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
index 7e47e6223089..2d152571a7de 100644
--- a/drivers/usb/gadget/Kconfig
+++ b/drivers/usb/gadget/Kconfig
@@ -265,6 +265,7 @@ config USB_CONFIGFS_NCM
depends on NET
select USB_U_ETHER
select USB_F_NCM
+ select CRC32
help
NCM is an advanced protocol for Ethernet encapsulation, allows
grouping of several ethernet frames into one USB transfer and
@@ -314,6 +315,7 @@ config USB_CONFIGFS_EEM
depends on NET
select USB_U_ETHER
select USB_F_EEM
+ select CRC32
help
CDC EEM is a newer USB standard that is somewhat simpler than CDC ECM
and therefore can be supported by more hardware. Technically ECM and
--
2.30.0
This is a note to let you know that I've just added the patch titled
usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From a1383b3537a7bea1c213baa7878ccc4ecf4413b5 Mon Sep 17 00:00:00 2001
From: Wesley Cheng <wcheng(a)codeaurora.org>
Date: Tue, 29 Dec 2020 15:00:37 -0800
Subject: usb: dwc3: gadget: Restart DWC3 gadget when enabling pullup
usb_gadget_deactivate/usb_gadget_activate does not execute the UDC start
operation, which may leave EP0 disabled and event IRQs disabled when
re-activating the function. Move the enabling/disabling of USB EP0 and
device event IRQs to be performed in the pullup routine.
Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller")
Tested-by: Michael Tretter <m.tretter(a)pengutronix.de>
Cc: stable <stable(a)vger.kernel.org>
Reported-by: Michael Tretter <m.tretter(a)pengutronix.de>
Signed-off-by: Wesley Cheng <wcheng(a)codeaurora.org>
Link: https://lore.kernel.org/r/1609282837-21666-1-git-send-email-wcheng@codeauro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/dwc3/gadget.c | 14 +++-----------
1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 78cb4db8a6e4..25f654b79e48 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2083,6 +2083,7 @@ static int dwc3_gadget_run_stop(struct dwc3 *dwc, int is_on, int suspend)
static void dwc3_gadget_disable_irq(struct dwc3 *dwc);
static void __dwc3_gadget_stop(struct dwc3 *dwc);
+static int __dwc3_gadget_start(struct dwc3 *dwc);
static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
{
@@ -2145,6 +2146,8 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) %
dwc->ev_buf->length;
}
+ } else {
+ __dwc3_gadget_start(dwc);
}
ret = dwc3_gadget_run_stop(dwc, is_on, false);
@@ -2319,10 +2322,6 @@ static int dwc3_gadget_start(struct usb_gadget *g,
}
dwc->gadget_driver = driver;
-
- if (pm_runtime_active(dwc->dev))
- __dwc3_gadget_start(dwc);
-
spin_unlock_irqrestore(&dwc->lock, flags);
return 0;
@@ -2348,13 +2347,6 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
unsigned long flags;
spin_lock_irqsave(&dwc->lock, flags);
-
- if (pm_runtime_suspended(dwc->dev))
- goto out;
-
- __dwc3_gadget_stop(dwc);
-
-out:
dwc->gadget_driver = NULL;
spin_unlock_irqrestore(&dwc->lock, flags);
--
2.30.0
This is a note to let you know that I've just added the patch titled
usb: usbip: vhci_hcd: protect shift size
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From 718bf42b119de652ebcc93655a1f33a9c0d04b3c Mon Sep 17 00:00:00 2001
From: Randy Dunlap <rdunlap(a)infradead.org>
Date: Mon, 28 Dec 2020 23:13:09 -0800
Subject: usb: usbip: vhci_hcd: protect shift size
Fix shift out-of-bounds in vhci_hcd.c:
UBSAN: shift-out-of-bounds in ../drivers/usb/usbip/vhci_hcd.c:399:41
shift exponent 768 is too large for 32-bit type 'int'
Fixes: 03cd00d538a6 ("usbip: vhci-hcd: Set the vhci structure up to work")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Reported-by: syzbot+297d20e437b79283bf6d(a)syzkaller.appspotmail.com
Cc: Yuyang Du <yuyang.du(a)intel.com>
Cc: Shuah Khan <shuahkh(a)osg.samsung.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20201229071309.18418-1-rdunlap@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/usbip/vhci_hcd.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c
index 66cde5e5f796..3209b5ddd30c 100644
--- a/drivers/usb/usbip/vhci_hcd.c
+++ b/drivers/usb/usbip/vhci_hcd.c
@@ -396,6 +396,8 @@ static int vhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
default:
usbip_dbg_vhci_rh(" ClearPortFeature: default %x\n",
wValue);
+ if (wValue >= 32)
+ goto error;
vhci_hcd->port_status[rhport] &= ~(1 << wValue);
break;
}
--
2.30.0
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit d6d51a96c7d63b7450860a3037f2d62388286a52 ]
Functions like memset()/memmove()/memcpy() do a lot of memory
accesses.
If a bad pointer is passed to one of these functions it is important
to catch this. Compiler instrumentation cannot do this since these
functions are written in assembly.
KASan replaces these memory functions with instrumented variants.
The original functions are declared as weak symbols so that
the strong definitions in mm/kasan/kasan.c can replace them.
The original functions have aliases with a '__' prefix in their
name, so we can call the non-instrumented variant if needed.
We must use __memcpy()/__memset() in place of memcpy()/memset()
when we copy .data to RAM and when we clear .bss, because
kasan_early_init cannot be called before the initialization of
.data and .bss.
For the kernel compression and EFI libstub's custom string
libraries we need a special quirk: even if these are built
without KASan enabled, they rely on the global headers for their
custom string libraries, which means that e.g. memcpy()
will be defined to __memcpy() and we get link failures.
Since these implementations are written i C rather than
assembly we use e.g. __alias(memcpy) to redirected any
users back to the local implementation.
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: kasan-dev(a)googlegroups.com
Reviewed-by: Ard Biesheuvel <ardb(a)kernel.org>
Tested-by: Ard Biesheuvel <ardb(a)kernel.org> # QEMU/KVM/mach-virt/LPAE/8G
Tested-by: Florian Fainelli <f.fainelli(a)gmail.com> # Brahma SoCs
Tested-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de> # i.MX6Q
Reported-by: Russell King - ARM Linux <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Ahmad Fatoum <a.fatoum(a)pengutronix.de>
Signed-off-by: Abbott Liu <liuwenliang(a)huawei.com>
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/boot/compressed/string.c | 19 +++++++++++++++++++
arch/arm/include/asm/string.h | 26 ++++++++++++++++++++++++++
arch/arm/kernel/head-common.S | 4 ++--
arch/arm/lib/memcpy.S | 3 +++
arch/arm/lib/memmove.S | 5 ++++-
arch/arm/lib/memset.S | 3 +++
6 files changed, 57 insertions(+), 3 deletions(-)
diff --git a/arch/arm/boot/compressed/string.c b/arch/arm/boot/compressed/string.c
index ade5079bebbf9..8c0fa276d9946 100644
--- a/arch/arm/boot/compressed/string.c
+++ b/arch/arm/boot/compressed/string.c
@@ -7,6 +7,25 @@
#include <linux/string.h>
+/*
+ * The decompressor is built without KASan but uses the same redirects as the
+ * rest of the kernel when CONFIG_KASAN is enabled, defining e.g. memcpy()
+ * to __memcpy() but since we are not linking with the main kernel string
+ * library in the decompressor, that will lead to link failures.
+ *
+ * Undefine KASan's versions, define the wrapped functions and alias them to
+ * the right names so that when e.g. __memcpy() appear in the code, it will
+ * still be linked to this local version of memcpy().
+ */
+#ifdef CONFIG_KASAN
+#undef memcpy
+#undef memmove
+#undef memset
+void *__memcpy(void *__dest, __const void *__src, size_t __n) __alias(memcpy);
+void *__memmove(void *__dest, __const void *__src, size_t count) __alias(memmove);
+void *__memset(void *s, int c, size_t count) __alias(memset);
+#endif
+
void *memcpy(void *__dest, __const void *__src, size_t __n)
{
int i = 0;
diff --git a/arch/arm/include/asm/string.h b/arch/arm/include/asm/string.h
index 111a1d8a41ddf..6c607c68f3ad7 100644
--- a/arch/arm/include/asm/string.h
+++ b/arch/arm/include/asm/string.h
@@ -5,6 +5,9 @@
/*
* We don't do inline string functions, since the
* optimised inline asm versions are not small.
+ *
+ * The __underscore versions of some functions are for KASan to be able
+ * to replace them with instrumented versions.
*/
#define __HAVE_ARCH_STRRCHR
@@ -15,15 +18,18 @@ extern char * strchr(const char * s, int c);
#define __HAVE_ARCH_MEMCPY
extern void * memcpy(void *, const void *, __kernel_size_t);
+extern void *__memcpy(void *dest, const void *src, __kernel_size_t n);
#define __HAVE_ARCH_MEMMOVE
extern void * memmove(void *, const void *, __kernel_size_t);
+extern void *__memmove(void *dest, const void *src, __kernel_size_t n);
#define __HAVE_ARCH_MEMCHR
extern void * memchr(const void *, int, __kernel_size_t);
#define __HAVE_ARCH_MEMSET
extern void * memset(void *, int, __kernel_size_t);
+extern void *__memset(void *s, int c, __kernel_size_t n);
#define __HAVE_ARCH_MEMSET32
extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
@@ -39,4 +45,24 @@ static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
return __memset64(p, v, n * 8, v >> 32);
}
+/*
+ * For files that are not instrumented (e.g. mm/slub.c) we
+ * must use non-instrumented versions of the mem*
+ * functions named __memcpy() etc. All such kernel code has
+ * been tagged with KASAN_SANITIZE_file.o = n, which means
+ * that the address sanitization argument isn't passed to the
+ * compiler, and __SANITIZE_ADDRESS__ is not set. As a result
+ * these defines kick in.
+ */
+#if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
+#define memcpy(dst, src, len) __memcpy(dst, src, len)
+#define memmove(dst, src, len) __memmove(dst, src, len)
+#define memset(s, c, n) __memset(s, c, n)
+
+#ifndef __NO_FORTIFY
+#define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
+#endif
+
+#endif
+
#endif
diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
index 4a3982812a401..6840c7c60a858 100644
--- a/arch/arm/kernel/head-common.S
+++ b/arch/arm/kernel/head-common.S
@@ -95,7 +95,7 @@ __mmap_switched:
THUMB( ldmia r4!, {r0, r1, r2, r3} )
THUMB( mov sp, r3 )
sub r2, r2, r1
- bl memcpy @ copy .data to RAM
+ bl __memcpy @ copy .data to RAM
#endif
ARM( ldmia r4!, {r0, r1, sp} )
@@ -103,7 +103,7 @@ __mmap_switched:
THUMB( mov sp, r3 )
sub r2, r1, r0
mov r1, #0
- bl memset @ clear .bss
+ bl __memset @ clear .bss
ldmia r4, {r0, r1, r2, r3}
str r9, [r0] @ Save processor ID
diff --git a/arch/arm/lib/memcpy.S b/arch/arm/lib/memcpy.S
index 09a333153dc66..ad4625d16e117 100644
--- a/arch/arm/lib/memcpy.S
+++ b/arch/arm/lib/memcpy.S
@@ -58,6 +58,8 @@
/* Prototype: void *memcpy(void *dest, const void *src, size_t n); */
+.weak memcpy
+ENTRY(__memcpy)
ENTRY(mmiocpy)
ENTRY(memcpy)
@@ -65,3 +67,4 @@ ENTRY(memcpy)
ENDPROC(memcpy)
ENDPROC(mmiocpy)
+ENDPROC(__memcpy)
diff --git a/arch/arm/lib/memmove.S b/arch/arm/lib/memmove.S
index b50e5770fb44d..fd123ea5a5a4a 100644
--- a/arch/arm/lib/memmove.S
+++ b/arch/arm/lib/memmove.S
@@ -24,12 +24,14 @@
* occurring in the opposite direction.
*/
+.weak memmove
+ENTRY(__memmove)
ENTRY(memmove)
UNWIND( .fnstart )
subs ip, r0, r1
cmphi r2, ip
- bls memcpy
+ bls __memcpy
stmfd sp!, {r0, r4, lr}
UNWIND( .fnend )
@@ -222,3 +224,4 @@ ENTRY(memmove)
18: backward_copy_shift push=24 pull=8
ENDPROC(memmove)
+ENDPROC(__memmove)
diff --git a/arch/arm/lib/memset.S b/arch/arm/lib/memset.S
index 6ca4535c47fb6..0e7ff0423f50b 100644
--- a/arch/arm/lib/memset.S
+++ b/arch/arm/lib/memset.S
@@ -13,6 +13,8 @@
.text
.align 5
+.weak memset
+ENTRY(__memset)
ENTRY(mmioset)
ENTRY(memset)
UNWIND( .fnstart )
@@ -132,6 +134,7 @@ UNWIND( .fnstart )
UNWIND( .fnend )
ENDPROC(memset)
ENDPROC(mmioset)
+ENDPROC(__memset)
ENTRY(__memset32)
UNWIND( .fnstart )
--
2.27.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 78af4dc949daaa37b3fcd5f348f373085b4e858f Mon Sep 17 00:00:00 2001
From: "peterz(a)infradead.org" <peterz(a)infradead.org>
Date: Fri, 28 Aug 2020 14:37:20 +0200
Subject: [PATCH] perf: Break deadlock involving exec_update_mutex
Syzbot reported a lock inversion involving perf. The sore point being
perf holding exec_update_mutex() for a very long time, specifically
across a whole bunch of filesystem ops in pmu::event_init() (uprobes)
and anon_inode_getfile().
This then inverts against procfs code trying to take
exec_update_mutex.
Move the permission checks later, such that we need to hold the mutex
over less code.
Reported-by: syzbot+db9cdf3dd1f64252c6ef(a)syzkaller.appspotmail.com
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a21b0be2f22c..19ae6c931c52 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11832,24 +11832,6 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_task;
}
- if (task) {
- err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
- if (err)
- goto err_task;
-
- /*
- * Preserve ptrace permission check for backwards compatibility.
- *
- * We must hold exec_update_mutex across this and any potential
- * perf_install_in_context() call for this new event to
- * serialize against exec() altering our credentials (and the
- * perf_event_exit_task() that could imply).
- */
- err = -EACCES;
- if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
- goto err_cred;
- }
-
if (flags & PERF_FLAG_PID_CGROUP)
cgroup_fd = pid;
@@ -11857,7 +11839,7 @@ SYSCALL_DEFINE5(perf_event_open,
NULL, NULL, cgroup_fd);
if (IS_ERR(event)) {
err = PTR_ERR(event);
- goto err_cred;
+ goto err_task;
}
if (is_sampling_event(event)) {
@@ -11976,6 +11958,24 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_context;
}
+ if (task) {
+ err = mutex_lock_interruptible(&task->signal->exec_update_mutex);
+ if (err)
+ goto err_file;
+
+ /*
+ * Preserve ptrace permission check for backwards compatibility.
+ *
+ * We must hold exec_update_mutex across this and any potential
+ * perf_install_in_context() call for this new event to
+ * serialize against exec() altering our credentials (and the
+ * perf_event_exit_task() that could imply).
+ */
+ err = -EACCES;
+ if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS))
+ goto err_cred;
+ }
+
if (move_group) {
gctx = __perf_event_ctx_lock_double(group_leader, ctx);
@@ -12151,7 +12151,10 @@ SYSCALL_DEFINE5(perf_event_open,
if (move_group)
perf_event_ctx_unlock(group_leader, gctx);
mutex_unlock(&ctx->mutex);
-/* err_file: */
+err_cred:
+ if (task)
+ mutex_unlock(&task->signal->exec_update_mutex);
+err_file:
fput(event_file);
err_context:
perf_unpin_context(ctx);
@@ -12163,9 +12166,6 @@ SYSCALL_DEFINE5(perf_event_open,
*/
if (!event_file)
free_event(event);
-err_cred:
- if (task)
- mutex_unlock(&task->signal->exec_update_mutex);
err_task:
if (task)
put_task_struct(task);
There are several reports about the tps6598x causing
interrupt flood on boards with the INT3515 ACPI node, which
then causes instability. There appears to be several
problems with the interrupt. One problem is that the
I2CSerialBus resources do not always map to the Interrupt
resource with the same index, but that is not the only
problem. We have not been able to come up with a solution
for all the issues, and because of that disabling the device
for now.
The PD controller on these platforms is autonomous, and the
purpose for the driver is primarily to supply status to the
userspace, so this will not affect any functionality.
Reported-by: Moody Salem <moody(a)uniswap.org>
Fixes: a3dd034a1707 ("ACPI / scan: Create platform device for INT3515 ACPI nodes")
Cc: stable(a)vger.kernel.org
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1883511
Signed-off-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
---
drivers/platform/x86/i2c-multi-instantiate.c | 31 +++++++++++++++-----
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/drivers/platform/x86/i2c-multi-instantiate.c b/drivers/platform/x86/i2c-multi-instantiate.c
index 6acc8457866e1..e1df665d3ad31 100644
--- a/drivers/platform/x86/i2c-multi-instantiate.c
+++ b/drivers/platform/x86/i2c-multi-instantiate.c
@@ -166,13 +166,29 @@ static const struct i2c_inst_data bsg2150_data[] = {
{}
};
-static const struct i2c_inst_data int3515_data[] = {
- { "tps6598x", IRQ_RESOURCE_APIC, 0 },
- { "tps6598x", IRQ_RESOURCE_APIC, 1 },
- { "tps6598x", IRQ_RESOURCE_APIC, 2 },
- { "tps6598x", IRQ_RESOURCE_APIC, 3 },
- {}
-};
+/*
+ * Device with _HID INT3515 (TI PD controllers) has some unresolved interrupt
+ * issues. The most common problem seen is interrupt flood.
+ *
+ * There are at least two known causes. Firstly, on some boards, the
+ * I2CSerialBus resource index does not match the Interrupt resource, i.e. they
+ * are not one-to-one mapped like in the array below. Secondly, on some boards
+ * the irq line from the PD controller is not actually connected at all. But the
+ * interrupt flood is also seen on some boards where those are not a problem, so
+ * there are some other problems as well.
+ *
+ * Because of the issues with the interrupt, the device is disabled for now. If
+ * you wish to debug the issues, uncomment the below, and add an entry for the
+ * INT3515 device to the i2c_multi_instance__ids table.
+ *
+ * static const struct i2c_inst_data int3515_data[] = {
+ * { "tps6598x", IRQ_RESOURCE_APIC, 0 },
+ * { "tps6598x", IRQ_RESOURCE_APIC, 1 },
+ * { "tps6598x", IRQ_RESOURCE_APIC, 2 },
+ * { "tps6598x", IRQ_RESOURCE_APIC, 3 },
+ * { }
+ * };
+ */
/*
* Note new device-ids must also be added to i2c_multi_instantiate_ids in
@@ -181,7 +197,6 @@ static const struct i2c_inst_data int3515_data[] = {
static const struct acpi_device_id i2c_multi_inst_acpi_ids[] = {
{ "BSG1160", (unsigned long)bsg1160_data },
{ "BSG2150", (unsigned long)bsg2150_data },
- { "INT3515", (unsigned long)int3515_data },
{ }
};
MODULE_DEVICE_TABLE(acpi, i2c_multi_inst_acpi_ids);
--
2.29.2
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a9d2e9ae953f0ddd0327479c81a085adaa76d903 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg(a)nvidia.com>
Date: Fri, 6 Nov 2020 10:00:49 -0400
Subject: [PATCH] RDMA/siw,rxe: Make emulated devices virtual in the device
tree
This moves siw and rxe to be virtual devices in the device tree:
lrwxrwxrwx 1 root root 0 Nov 6 13:55 /sys/class/infiniband/rxe0 -> ../../devices/virtual/infiniband/rxe0/
Previously they were trying to parent themselves to the physical device of
their attached netdev, which doesn't make alot of sense.
My hope is this will solve some weird syzkaller hits related to sysfs as
it could be possible that the parent of a netdev is another netdev, eg
under bonding or some other syzkaller found netdev configuration.
Nesting a ib_device under anything but a physical device is going to cause
inconsistencies in sysfs during destructions.
Link: https://lore.kernel.org/r/0-v1-dcbfc68c4b4a+d6-virtual_dev_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/sw/rxe/rxe_net.c b/drivers/infiniband/sw/rxe/rxe_net.c
index ee3cf3af7cab..c4b06ced30a7 100644
--- a/drivers/infiniband/sw/rxe/rxe_net.c
+++ b/drivers/infiniband/sw/rxe/rxe_net.c
@@ -19,15 +19,6 @@
static struct rxe_recv_sockets recv_sockets;
-struct device *rxe_dma_device(struct rxe_dev *rxe)
-{
- struct net_device *ndev;
-
- ndev = rxe->ndev;
-
- return ndev->dev.parent;
-}
-
int rxe_mcast_add(struct rxe_dev *rxe, union ib_gid *mgid)
{
int err;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index a2bd91aaa5de..2fbea2b2d72a 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -1134,7 +1134,6 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
dev->node_type = RDMA_NODE_IB_CA;
dev->phys_port_cnt = 1;
dev->num_comp_vectors = num_possible_cpus();
- dev->dev.parent = rxe_dma_device(rxe);
dev->local_dma_lkey = 0;
addrconf_addr_eui48((unsigned char *)&dev->node_guid,
rxe->ndev->dev_addr);
diff --git a/drivers/infiniband/sw/siw/siw_main.c b/drivers/infiniband/sw/siw/siw_main.c
index d9de062852c4..ee95cf29179d 100644
--- a/drivers/infiniband/sw/siw/siw_main.c
+++ b/drivers/infiniband/sw/siw/siw_main.c
@@ -305,24 +305,8 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
{
struct siw_device *sdev = NULL;
struct ib_device *base_dev;
- struct device *parent = netdev->dev.parent;
int rv;
- if (!parent) {
- /*
- * The loopback device has no parent device,
- * so it appears as a top-level device. To support
- * loopback device connectivity, take this device
- * as the parent device. Skip all other devices
- * w/o parent device.
- */
- if (netdev->type != ARPHRD_LOOPBACK) {
- pr_warn("siw: device %s error: no parent device\n",
- netdev->name);
- return NULL;
- }
- parent = &netdev->dev;
- }
sdev = ib_alloc_device(siw_device, base_dev);
if (!sdev)
return NULL;
@@ -359,7 +343,6 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
* per physical port.
*/
base_dev->phys_port_cnt = 1;
- base_dev->dev.parent = parent;
base_dev->num_comp_vectors = num_possible_cpus();
xa_init_flags(&sdev->qp_xa, XA_FLAGS_ALLOC1);
@@ -401,7 +384,7 @@ static struct siw_device *siw_device_create(struct net_device *netdev)
atomic_set(&sdev->num_mr, 0);
atomic_set(&sdev->num_pd, 0);
- sdev->numa_node = dev_to_node(parent);
+ sdev->numa_node = dev_to_node(&netdev->dev);
spin_lock_init(&sdev->lock);
return sdev;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd14bf0e4a084514aa62d24d2109e0f09a93822f Mon Sep 17 00:00:00 2001
From: Stanley Chu <stanley.chu(a)mediatek.com>
Date: Tue, 8 Dec 2020 21:56:34 +0800
Subject: [PATCH] scsi: ufs: Re-enable WriteBooster after device reset
UFS 3.1 specification mentions that the WriteBooster flags listed below
will be set to their default values, i.e. disabled, after power cycle or
any type of reset event. Thus we need to reset the flag variables kept in
struct hba to align with the device status and ensure that
WriteBooster-related functions are configured properly after device reset.
Without this fix, WriteBooster will not be enabled successfully after by
ufshcd_wb_ctrl() after device reset because hba->wb_enabled remains true.
Flags required to be reset to default values:
- fWriteBoosterEn: hba->wb_enabled
- fWriteBoosterBufferFlushEn: hba->wb_buf_flush_enabled
- fWriteBoosterBufferFlushDuringHibernate: No variable mapped
Link: https://lore.kernel.org/r/20201208135635.15326-2-stanley.chu@mediatek.com
Fixes: 3d17b9b5ab11 ("scsi: ufs: Add write booster feature support")
Reviewed-by: Bean Huo <beanhuo(a)micron.com>
Signed-off-by: Stanley Chu <stanley.chu(a)mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/ufs/ufshcd.h b/drivers/scsi/ufs/ufshcd.h
index 08c8a591e6b0..36d367eb8139 100644
--- a/drivers/scsi/ufs/ufshcd.h
+++ b/drivers/scsi/ufs/ufshcd.h
@@ -1221,8 +1221,13 @@ static inline void ufshcd_vops_device_reset(struct ufs_hba *hba)
if (hba->vops && hba->vops->device_reset) {
int err = hba->vops->device_reset(hba);
- if (!err)
+ if (!err) {
ufshcd_set_ufs_dev_active(hba);
+ if (ufshcd_is_wb_allowed(hba)) {
+ hba->wb_enabled = false;
+ hba->wb_buf_flush_enabled = false;
+ }
+ }
if (err != -EOPNOTSUPP)
ufshcd_update_evt_hist(hba, UFS_EVT_DEV_RESET, err);
}
The dentries such as /proc/<pid>/ns/ have the DCACHE_OP_DELETE flag, they
should be deleted when the process exits.
Suppose the following race appears:
release_task dput
-> proc_flush_task
-> dentry->d_op->d_delete(dentry)
-> __exit_signal
-> dentry->d_lockref.count-- and return.
In the proc_flush_task(), if another process is using this dentry, it will
not be deleted. At the same time, in dput(), d_op->d_delete() can be executed
before __exit_signal(pid has not been hashed), d_delete returns false, so
this dentry still cannot be deleted.
This dentry will always be cached (although its count is 0 and the
DCACHE_OP_DELETE flag is set), its parent denry will also be cached too, and
these dentries can only be deleted when drop_caches is manually triggered.
This will result in wasted memory. What's more troublesome is that these
dentries reference pid, according to the commit f333c700c610 ("pidns: Add a
limit on the number of pid namespaces"), if the pid cannot be released, it
may result in the inability to create a new pid_ns.
This problem occurred in our cluster environment (Linux 4.9 LTS).
We could reproduce it by manually constructing a test program + adding some
debugging switches in the kernel:
* A test program to open the directory (/proc/<pid>/ns) [1]
* Adding some debugging switches to the kernel, adding a delay between
proc_flush_task and __exit_signal in release_task() [2]
The test process is as follows:
A, terminal #1
Turn on the debug switch:
echo 1> /proc/sys/vm/dentry_debug_trace
Execute the following unshare command:
sudo unshare --pid --fork --mount-proc bash
B, terminal #2
Find the pid of the unshare process:
# pstree -p | grep unshare
| `-sshd(716)---bash(718)--sudo(816)---unshare(817)---bash(818)
Find the corresponding dentry:
# dmesg | grep pid=818
[70.424722] XXX proc_pid_instantiate:3119 pid=818 tid=818 entry=818/ffff8802c7b670e8
C, terminal #3
Execute the opendir program, it will always open the /proc/818/ns/ directory:
# ./a.out /proc/818/ns/
pid: 876
.
..
net
uts
ipc
pid
user
mnt
cgroup
D, go back to terminal #2
Turn on the debugging switches to construct the race:
# echo 818> /proc/sys/vm/dentry_debug_pid
# echo 1> /proc/sys/vm/dentry_debug_delay
Kill the unshare process (pid 818). Since the debugging switches have been
turned on, it will get stuck in release_task():
# kill -9 818
Then kill the process that opened the /proc/818/ns/ directory:
# kill -9 876
Then turn off these debugging switches to allow the 818 process to exit:
# echo 0> /proc/sys/vm/dentry_debug_delay
# echo 0> /proc/sys/vm/dentry_debug_pid
Checking the dmesg, we will find that the dentry(/proc/818/ns) ’s count is 0,
and the flag is 2800cc (#define DCACHE_OP_DELETE 0x00000008), but it is still
cached:
# dmesg | grep ffff8802a3999548
…
[565.559156] XXX dput:853 dentry=ns/ffff8802bea7b528, flag=2800cc, cnt=0, inode=ffff8802b38c2010, pdentry=818/ffff8802c7b670e8, pflag=20008c, pcnt=1, pinode=ffff8802c7812010, keywords: be cached
It could also be verified via the crash tool:
crash> dentry.d_flags,d_iname,d_inode,d_lockref -x ffff8802bea7b528
d_flags = 0x2800cc
d_iname = "ns\000kkkkkkkkkkkkkkkkkkkkkkkkkkkk"
d_inode = 0xffff8802b38c2010
d_lockref = {
{
lock_count = 0x0,
{
lock = {
{
rlock = {
raw_lock = {
{
val = {
counter = 0x0
},
{
locked = 0x0,
pending = 0x0
},
{
locked_pending = 0x0,
tail = 0x0
}
}
}
}
}
},
count = 0x0
}
}
}
crash> kmem ffff8802bea7b528
CACHE OBJSIZE ALLOCATED TOTAL SLABS SSIZE NAME
ffff8802dd5f5900 192 23663 26130 871 16k dentry
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea000afa9e00 ffff8802bea78000 0 30 25 5
FREE / [ALLOCATED]
[ffff8802bea7b520]
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea000afa9ec0 2bea7b000 dead000000000400 0 0 2fffff80000000
crash>
This series of patches is to fix this issue.
Regards,
Wen
Alexey Dobriyan (1):
proc: use %u for pid printing and slightly less stack
Andreas Gruenbacher (1):
proc: Pass file mode to proc_pid_make_inode
Christian Brauner (1):
clone: add CLONE_PIDFD
Eric W. Biederman (6):
proc: Better ownership of files for non-dumpable tasks in user
namespaces
proc: Rename in proc_inode rename sysctl_inodes sibling_inodes
proc: Generalize proc_sys_prune_dcache into proc_prune_siblings_dcache
proc: Clear the pieces of proc_inode that proc_evict_inode cares about
proc: Use d_invalidate in proc_prune_siblings_dcache
proc: Use a list of inodes to flush from proc
Joel Fernandes (Google) (1):
pidfd: add polling support
fs/proc/base.c | 242 ++++++++++++++++++++-------------------------
fs/proc/fd.c | 20 +---
fs/proc/inode.c | 67 ++++++++++++-
fs/proc/internal.h | 22 ++---
fs/proc/namespaces.c | 3 +-
fs/proc/proc_sysctl.c | 45 ++-------
fs/proc/self.c | 6 +-
fs/proc/thread_self.c | 5 +-
include/linux/pid.h | 5 +
include/linux/proc_fs.h | 4 +-
include/uapi/linux/sched.h | 1 +
kernel/exit.c | 5 +-
kernel/fork.c | 145 ++++++++++++++++++++++++++-
kernel/pid.c | 3 +
kernel/signal.c | 11 +++
security/selinux/hooks.c | 1 +
16 files changed, 357 insertions(+), 228 deletions(-)
[1] A test program to open the directory (/proc/<pid>/ns)
#include <stdio.h>
#include <sys/types.h>
#include <dirent.h>
#include <errno.h>
int main(int argc, char *argv[])
{
DIR *dip;
struct dirent *dit;
if (argc < 2) {
printf("Usage :%s <directory>\n", argv[0]);
return -1;
}
if ((dip = opendir(argv[1])) == NULL) {
perror("opendir");
return -1;
}
printf("pid: %d\n", getpid());
while((dit = readdir (dip)) != NULL) {
printf("%s\n", dit->d_name);
}
while (1)
sleep (1);
return 0;
}
[2] Adding some debugging switches to the kernel, also adding a delay between
proc_flush_task and __exit_signal in release_task():
diff --git a/fs/dcache.c b/fs/dcache.c
index 05bad55..fafad37 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -84,6 +84,9 @@
int sysctl_vfs_cache_pressure __read_mostly = 100;
EXPORT_SYMBOL_GPL(sysctl_vfs_cache_pressure);
+int sysctl_dentry_debug_trace __read_mostly = 0;
+EXPORT_SYMBOL_GPL(sysctl_dentry_debug_trace);
+
__cacheline_aligned_in_smp DEFINE_SEQLOCK(rename_lock);
EXPORT_SYMBOL(rename_lock);
@@ -758,6 +761,26 @@ static inline bool fast_dput(struct dentry *dentry)
return 0;
}
+#define DENTRY_DEBUG_TRACE(dentry, keywords) \
+do { \
+ if (sysctl_dentry_debug_trace) \
+ printk("XXX %s:%d " \
+ "dentry=%s/%p, flag=%x, cnt=%d, inode=%p, " \
+ "pdentry=%s/%p, pflag=%x, pcnt=%d, pinode=%p, " \
+ "keywords: %s\n", \
+ __func__, __LINE__, \
+ dentry->d_name.name, \
+ dentry, \
+ dentry->d_flags, \
+ dentry->d_lockref.count, \
+ dentry->d_inode, \
+ dentry->d_parent->d_name.name, \
+ dentry->d_parent, \
+ dentry->d_parent->d_flags, \
+ dentry->d_parent->d_lockref.count, \
+ dentry->d_parent->d_inode, \
+ keywords); \
+} while (0)
/*
* This is dput
@@ -804,6 +827,8 @@ void dput(struct dentry *dentry)
WARN_ON(d_in_lookup(dentry));
+ DENTRY_DEBUG_TRACE(dentry, "be checked");
+
/* Unreachable? Get rid of it */
if (unlikely(d_unhashed(dentry)))
goto kill_it;
@@ -812,8 +837,10 @@ void dput(struct dentry *dentry)
goto kill_it;
if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
- if (dentry->d_op->d_delete(dentry))
+ if (dentry->d_op->d_delete(dentry)) {
+ DENTRY_DEBUG_TRACE(dentry, "be killed");
goto kill_it;
+ }
}
if (!(dentry->d_flags & DCACHE_REFERENCED))
@@ -822,6 +849,9 @@ void dput(struct dentry *dentry)
dentry->d_lockref.count--;
spin_unlock(&dentry->d_lock);
+
+ DENTRY_DEBUG_TRACE(dentry, "be cached");
+
return;
kill_it:
diff --git a/fs/proc/base.c b/fs/proc/base.c
index b9e4183..419a409 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3090,6 +3090,8 @@ void proc_flush_task(struct task_struct *task)
}
}
+extern int sysctl_dentry_debug_trace;
+
static int proc_pid_instantiate(struct inode *dir,
struct dentry * dentry,
struct task_struct *task, const void *ptr)
@@ -3111,6 +3113,12 @@ static int proc_pid_instantiate(struct inode *dir,
d_set_d_op(dentry, &pid_dentry_operations);
d_add(dentry, inode);
+
+ if (sysctl_dentry_debug_trace)
+ printk("XXX %s:%d pid=%d tid=%d entry=%s/%p\n",
+ __func__, __LINE__, task->pid, task->tgid,
+ dentry->d_name.name, dentry);
+
/* Close the race of the process dying before we return the dentry */
if (pid_revalidate(dentry, 0))
return 0;
diff --git a/kernel/exit.c b/kernel/exit.c
index 27f4168..2b3e1b6 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -55,6 +55,8 @@
#include <linux/shm.h>
#include <linux/kcov.h>
+#include <linux/delay.h>
+
#include <asm/uaccess.h>
#include <asm/unistd.h>
#include <asm/pgtable.h>
@@ -164,6 +166,8 @@ static void delayed_put_task_struct(struct rcu_head *rhp)
put_task_struct(tsk);
}
+int sysctl_dentry_debug_delay __read_mostly = 0;
+int sysctl_dentry_debug_pid __read_mostly = 0;
void release_task(struct task_struct *p)
{
@@ -178,6 +182,11 @@ void release_task(struct task_struct *p)
proc_flush_task(p);
+ if (sysctl_dentry_debug_delay && p->pid == sysctl_dentry_debug_pid) {
+ while (sysctl_dentry_debug_delay)
+ mdelay(1);
+ }
+
write_lock_irq(&tasklist_lock);
ptrace_release_task(p);
__exit_signal(p);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 513e6da..27f1395 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -282,6 +282,10 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
static int max_extfrag_threshold = 1000;
#endif
+extern int sysctl_dentry_debug_trace;
+extern int sysctl_dentry_debug_delay;
+extern int sysctl_dentry_debug_pid;
+
static struct ctl_table kern_table[] = {
{
.procname = "sched_child_runs_first",
@@ -1498,6 +1502,30 @@ static int sysrq_sysctl_handler(struct ctl_table *table, int write,
.proc_handler = proc_dointvec,
.extra1 = &zero,
},
+ {
+ .procname = "dentry_debug_trace",
+ .data = &sysctl_dentry_debug_trace,
+ .maxlen = sizeof(sysctl_dentry_debug_trace),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
+ {
+ .procname = "dentry_debug_delay",
+ .data = &sysctl_dentry_debug_delay,
+ .maxlen = sizeof(sysctl_dentry_debug_delay),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
+ {
+ .procname = "dentry_debug_pid",
+ .data = &sysctl_dentry_debug_pid,
+ .maxlen = sizeof(sysctl_dentry_debug_pid),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ .extra1 = &zero,
+ },
#ifdef HAVE_ARCH_PICK_MMAP_LAYOUT
{
.procname = "legacy_va_layout",
Signed-off-by: Wen Yang <wenyang(a)linux.alibaba.com>
Cc: Pavel Emelyanov <xemul(a)openvz.org>
Cc: Oleg Nesterov <oleg(a)tv-sign.ru>
Cc: Sukadev Bhattiprolu <sukadev(a)us.ibm.com>
Cc: Paul Menage <menage(a)google.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
--
1.8.3.1
From: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
From: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
commit 56ec43d8b02719402c9fcf984feb52ec2300f8a5 upstream.
As best as I can tell the meminit_pfn_in_nid call is completely redundant.
The deferred memory initialization is already making use of
for_each_free_mem_range which in turn will call into __next_mem_range
which will only return a memory range if it matches the node ID provided
assuming it is not NUMA_NO_NODE.
I am operating on the assumption that there are no zones or pgdata_t
structures that have a NUMA node of NUMA_NO_NODE associated with them. If
that is the case then __next_mem_range will never return a memory range
that doesn't match the zone's node ID and as such the check is redundant.
So one piece I would like to verify on this is if this works for ia64.
Technically it was using a different approach to get the node ID, but it
seems to have the node ID also encoded into the memblock. So I am
assuming this is okay, but would like to get confirmation on that.
On my x86_64 test system with 384GB of memory per node I saw a reduction
in initialization time from 2.80s to 1.85s as a result of this patch.
Link: http://lkml.kernel.org/r/20190405221219.12227.93957.stgit@localhost.localdo…
Signed-off-by: Alexander Duyck <alexander.h.duyck(a)linux.intel.com>
Reviewed-by: Pavel Tatashin <pavel.tatashin(a)microsoft.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Khalid Aziz <khalid.aziz(a)oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Laurent Dufour <ldufour(a)linux.vnet.ibm.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <yi.z.zhang(a)linux.intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Pavel Tatashin <pasha.tatashin(a)soleen.com>
---
mm/page_alloc.c | 51 ++++++++++++++-----------------------------------
1 file changed, 14 insertions(+), 37 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d8c3051387d1..c86a117acb5b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1321,36 +1321,22 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
#endif
#ifdef CONFIG_NODES_SPAN_OTHER_NODES
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
- struct mminit_pfnnid_cache *state)
+/* Only safe to use early in boot when initialisation is single-threaded */
+static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
{
int nid;
- nid = __early_pfn_to_nid(pfn, state);
+ nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache);
if (nid >= 0 && nid != node)
return false;
return true;
}
-/* Only safe to use early in boot when initialisation is single-threaded */
-static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
-{
- return meminit_pfn_in_nid(pfn, node, &early_pfnnid_cache);
-}
-
#else
-
static inline bool __meminit early_pfn_in_nid(unsigned long pfn, int node)
{
return true;
}
-static inline bool __meminit __maybe_unused
-meminit_pfn_in_nid(unsigned long pfn, int node,
- struct mminit_pfnnid_cache *state)
-{
- return true;
-}
#endif
@@ -1480,21 +1466,13 @@ static inline void __init pgdat_init_report_one_done(void)
*
* Then, we check if a current large page is valid by only checking the validity
* of the head pfn.
- *
- * Finally, meminit_pfn_in_nid is checked on systems where pfns can interleave
- * within a node: a pfn is between start and end of a node, but does not belong
- * to this memory node.
*/
-static inline bool __init
-deferred_pfn_valid(int nid, unsigned long pfn,
- struct mminit_pfnnid_cache *nid_init_state)
+static inline bool __init deferred_pfn_valid(unsigned long pfn)
{
if (!pfn_valid_within(pfn))
return false;
if (!(pfn & (pageblock_nr_pages - 1)) && !pfn_valid(pfn))
return false;
- if (!meminit_pfn_in_nid(pfn, nid, nid_init_state))
- return false;
return true;
}
@@ -1502,15 +1480,14 @@ deferred_pfn_valid(int nid, unsigned long pfn,
* Free pages to buddy allocator. Try to free aligned pages in
* pageblock_nr_pages sizes.
*/
-static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
+static void __init deferred_free_pages(unsigned long pfn,
unsigned long end_pfn)
{
- struct mminit_pfnnid_cache nid_init_state = { };
unsigned long nr_pgmask = pageblock_nr_pages - 1;
unsigned long nr_free = 0;
for (; pfn < end_pfn; pfn++) {
- if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+ if (!deferred_pfn_valid(pfn)) {
deferred_free_range(pfn - nr_free, nr_free);
nr_free = 0;
} else if (!(pfn & nr_pgmask)) {
@@ -1530,17 +1507,18 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
* by performing it only once every pageblock_nr_pages.
* Return number of pages initialized.
*/
-static unsigned long __init deferred_init_pages(int nid, int zid,
+static unsigned long __init deferred_init_pages(struct zone *zone,
unsigned long pfn,
unsigned long end_pfn)
{
- struct mminit_pfnnid_cache nid_init_state = { };
unsigned long nr_pgmask = pageblock_nr_pages - 1;
+ int nid = zone_to_nid(zone);
unsigned long nr_pages = 0;
+ int zid = zone_idx(zone);
struct page *page = NULL;
for (; pfn < end_pfn; pfn++) {
- if (!deferred_pfn_valid(nid, pfn, &nid_init_state)) {
+ if (!deferred_pfn_valid(pfn)) {
page = NULL;
continue;
} else if (!page || !(pfn & nr_pgmask)) {
@@ -1603,12 +1581,12 @@ static int __init deferred_init_memmap(void *data)
for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
- nr_pages += deferred_init_pages(nid, zid, spfn, epfn);
+ nr_pages += deferred_init_pages(zone, spfn, epfn);
}
for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
- deferred_free_pages(nid, zid, spfn, epfn);
+ deferred_free_pages(spfn, epfn);
}
pgdat_resize_unlock(pgdat, &flags);
@@ -1640,7 +1618,6 @@ static int __init deferred_init_memmap(void *data)
static noinline bool __init
deferred_grow_zone(struct zone *zone, unsigned int order)
{
- int zid = zone_idx(zone);
int nid = zone_to_nid(zone);
pg_data_t *pgdat = NODE_DATA(nid);
unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION);
@@ -1690,7 +1667,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
while (spfn < epfn && nr_pages < nr_pages_needed) {
t = ALIGN(spfn + PAGES_PER_SECTION, PAGES_PER_SECTION);
first_deferred_pfn = min(t, epfn);
- nr_pages += deferred_init_pages(nid, zid, spfn,
+ nr_pages += deferred_init_pages(zone, spfn,
first_deferred_pfn);
spfn = first_deferred_pfn;
}
@@ -1702,7 +1679,7 @@ deferred_grow_zone(struct zone *zone, unsigned int order)
for_each_free_mem_range(i, nid, MEMBLOCK_NONE, &spa, &epa, NULL) {
spfn = max_t(unsigned long, first_init_pfn, PFN_UP(spa));
epfn = min_t(unsigned long, first_deferred_pfn, PFN_DOWN(epa));
- deferred_free_pages(nid, zid, spfn, epfn);
+ deferred_free_pages(spfn, epfn);
if (first_deferred_pfn == epfn)
break;
--
2.25.1
This is the start of the stable review cycle for the 5.2.6 release.
There are 20 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun 04 Aug 2019 09:19:34 AM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.6-rc1.…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.2.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.2.6-rc1
Yan, Zheng <zyan(a)redhat.com>
ceph: hold i_ceph_lock when removing caps for freeing inode
Yoshinori Sato <ysato(a)users.sourceforge.jp>
Fix allyesconfig output.
Miroslav Lichvar <mlichvar(a)redhat.com>
drivers/pps/pps.c: clear offset flags in PPS_SETPARAMS ioctl
Linus Torvalds <torvalds(a)linux-foundation.org>
/proc/<pid>/cmdline: add back the setproctitle() special case
Linus Torvalds <torvalds(a)linux-foundation.org>
/proc/<pid>/cmdline: remove all the special cases
Jann Horn <jannh(a)google.com>
sched/fair: Use RCU accessors consistently for ->numa_group
Jann Horn <jannh(a)google.com>
sched/fair: Don't free p->numa_faults with concurrent readers
Vladis Dronov <vdronov(a)redhat.com>
Bluetooth: hci_uart: check for missing tty operations
Marta Rybczynska <mrybczyn(a)kalray.eu>
nvme: fix multipath crash when ANA is deactivated
Florian Westphal <fw(a)strlen.de>
xfrm: policy: fix bydst hlist corruption on hash rebuild
Luke Nowakowski-Krijger <lnowakow(a)eng.ucsd.edu>
media: radio-raremono: change devm_k*alloc to k*alloc
Benjamin Coddington <bcodding(a)redhat.com>
NFS: Cleanup if nfs_match_client is interrupted
Andrey Konovalov <andreyknvl(a)google.com>
media: pvrusb2: use a different format for warnings
Oliver Neukum <oneukum(a)suse.com>
media: cpia2_usb: first wake up, then free in disconnect
Fabio Estevam <festevam(a)gmail.com>
ath10k: Change the warning message string
Sean Young <sean(a)mess.org>
media: au0828: fix null dereference in error path
Stanislav Fomichev <sdf(a)google.com>
bpf: fix NULL deref in btf_type_is_resolve_source_only
Takashi Iwai <tiwai(a)suse.de>
ALSA: usb-audio: Sanity checks for each pipe and EP types
Phong Tran <tranmanphong(a)gmail.com>
ISDN: hfcsusb: checking idx of ep configuration
Sunil Muthuswamy <sunilmut(a)microsoft.com>
vsock: correct removal of socket from the list
-------------
Diffstat:
Makefile | 4 +-
arch/sh/boards/Kconfig | 14 +--
drivers/bluetooth/hci_ath.c | 3 +
drivers/bluetooth/hci_bcm.c | 3 +
drivers/bluetooth/hci_intel.c | 3 +
drivers/bluetooth/hci_ldisc.c | 13 +++
drivers/bluetooth/hci_mrvl.c | 3 +
drivers/bluetooth/hci_qca.c | 3 +
drivers/bluetooth/hci_uart.h | 1 +
drivers/isdn/hardware/mISDN/hfcsusb.c | 3 +
drivers/media/radio/radio-raremono.c | 30 ++++--
drivers/media/usb/au0828/au0828-core.c | 12 +--
drivers/media/usb/cpia2/cpia2_usb.c | 3 +-
drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 4 +-
drivers/media/usb/pvrusb2/pvrusb2-i2c-core.c | 6 +-
drivers/media/usb/pvrusb2/pvrusb2-std.c | 2 +-
drivers/net/wireless/ath/ath10k/usb.c | 2 +-
drivers/nvme/host/multipath.c | 8 +-
drivers/nvme/host/nvme.h | 6 +-
drivers/pps/pps.c | 8 ++
fs/ceph/caps.c | 10 +-
fs/ceph/inode.c | 2 +-
fs/ceph/super.h | 2 +-
fs/exec.c | 2 +-
fs/nfs/client.c | 4 +-
fs/proc/base.c | 132 +++++++++++++-----------
include/linux/sched.h | 10 +-
include/linux/sched/numa_balancing.h | 4 +-
kernel/bpf/btf.c | 12 +--
kernel/fork.c | 2 +-
kernel/sched/fair.c | 144 +++++++++++++++++++--------
net/vmw_vsock/af_vsock.c | 38 ++-----
net/xfrm/xfrm_policy.c | 12 ++-
sound/usb/helper.c | 17 ++++
sound/usb/helper.h | 1 +
sound/usb/quirks.c | 18 +++-
tools/testing/selftests/net/xfrm_policy.sh | 27 ++++-
37 files changed, 368 insertions(+), 200 deletions(-)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 5812b32e01c6d86ba7a84110702b46d8a8531fe9 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Mon, 23 Nov 2020 11:23:12 +0100
Subject: [PATCH] of: fix linker-section match-table corruption
Specify type alignment when declaring linker-section match-table entries
to prevent gcc from increasing alignment and corrupting the various
tables with padding (e.g. timers, irqchips, clocks, reserved memory).
This is specifically needed on x86 where gcc (typically) aligns larger
objects like struct of_device_id with static extent on 32-byte
boundaries which at best prevents matching on anything but the first
entry. Specifying alignment when declaring variables suppresses this
optimisation.
Here's a 64-bit example where all entries are corrupt as 16 bytes of
padding has been inserted before the first entry:
ffffffff8266b4b0 D __clk_of_table
ffffffff8266b4c0 d __of_table_fixed_factor_clk
ffffffff8266b5a0 d __of_table_fixed_clk
ffffffff8266b680 d __clk_of_table_sentinel
And here's a 32-bit example where the 8-byte-aligned table happens to be
placed on a 32-byte boundary so that all but the first entry are corrupt
due to the 28 bytes of padding inserted between entries:
812b3ec0 D __irqchip_of_table
812b3ec0 d __of_table_irqchip1
812b3fa0 d __of_table_irqchip2
812b4080 d __of_table_irqchip3
812b4160 d irqchip_of_match_end
Verified on x86 using gcc-9.3 and gcc-4.9 (which uses 64-byte
alignment), and on arm using gcc-7.2.
Note that there are no in-tree users of these tables on x86 currently
(even if they are included in the image).
Fixes: 54196ccbe0ba ("of: consolidate linker section OF match table declarations")
Fixes: f6e916b82022 ("irqchip: add basic infrastructure")
Cc: stable <stable(a)vger.kernel.org> # 3.9
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Link: https://lore.kernel.org/r/20201123102319.8090-2-johan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/include/linux/of.h b/include/linux/of.h
index 5d51891cbf1a..af655d264f10 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1300,6 +1300,7 @@ static inline int of_get_available_child_count(const struct device_node *np)
#define _OF_DECLARE(table, name, compat, fn, fn_type) \
static const struct of_device_id __of_table_##name \
__used __section("__" #table "_of_table") \
+ __aligned(__alignof__(struct of_device_id)) \
= { .compatible = compat, \
.data = (fn == (fn_type)NULL) ? fn : fn }
#else
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Fri, 20 Nov 2020 10:55:11 +0900
Subject: [PATCH] null_blk: Fix zone size initialization
For a null_blk device with zoned mode enabled is currently initialized
with a number of zones equal to the device capacity divided by the zone
size, without considering if the device capacity is a multiple of the
zone size. If the zone size is not a divisor of the capacity, the zones
end up not covering the entire capacity, potentially resulting is out
of bounds accesses to the zone array.
Fix this by adding one last smaller zone with a size equal to the
remainder of the disk capacity divided by the zone size if the capacity
is not a multiple of the zone size. For such smaller last zone, the zone
capacity is also checked so that it does not exceed the smaller zone
size.
Reported-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index beb34b4f76b0..1d0370d91fe7 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -6,8 +6,7 @@
#define CREATE_TRACE_POINTS
#include "null_blk_trace.h"
-/* zone_size in MBs to sectors. */
-#define ZONE_SIZE_SHIFT 11
+#define MB_TO_SECTS(mb) (((sector_t)mb * SZ_1M) >> SECTOR_SHIFT)
static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
{
@@ -16,7 +15,7 @@ static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
{
- sector_t dev_size = (sector_t)dev->size * 1024 * 1024;
+ sector_t dev_capacity_sects, zone_capacity_sects;
sector_t sector = 0;
unsigned int i;
@@ -38,9 +37,13 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
return -EINVAL;
}
- dev->zone_size_sects = dev->zone_size << ZONE_SIZE_SHIFT;
- dev->nr_zones = dev_size >>
- (SECTOR_SHIFT + ilog2(dev->zone_size_sects));
+ zone_capacity_sects = MB_TO_SECTS(dev->zone_capacity);
+ dev_capacity_sects = MB_TO_SECTS(dev->size);
+ dev->zone_size_sects = MB_TO_SECTS(dev->zone_size);
+ dev->nr_zones = dev_capacity_sects >> ilog2(dev->zone_size_sects);
+ if (dev_capacity_sects & (dev->zone_size_sects - 1))
+ dev->nr_zones++;
+
dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct blk_zone),
GFP_KERNEL | __GFP_ZERO);
if (!dev->zones)
@@ -101,8 +104,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
struct blk_zone *zone = &dev->zones[i];
zone->start = zone->wp = sector;
- zone->len = dev->zone_size_sects;
- zone->capacity = dev->zone_capacity << ZONE_SIZE_SHIFT;
+ if (zone->start + dev->zone_size_sects > dev_capacity_sects)
+ zone->len = dev_capacity_sects - zone->start;
+ else
+ zone->len = dev->zone_size_sects;
+ zone->capacity =
+ min_t(sector_t, zone->len, zone_capacity_sects);
zone->type = BLK_ZONE_TYPE_SEQWRITE_REQ;
zone->cond = BLK_ZONE_COND_EMPTY;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0ebcdd702f49aeb0ad2e2d894f8c124a0acc6e23 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)wdc.com>
Date: Fri, 20 Nov 2020 10:55:11 +0900
Subject: [PATCH] null_blk: Fix zone size initialization
For a null_blk device with zoned mode enabled is currently initialized
with a number of zones equal to the device capacity divided by the zone
size, without considering if the device capacity is a multiple of the
zone size. If the zone size is not a divisor of the capacity, the zones
end up not covering the entire capacity, potentially resulting is out
of bounds accesses to the zone array.
Fix this by adding one last smaller zone with a size equal to the
remainder of the disk capacity divided by the zone size if the capacity
is not a multiple of the zone size. For such smaller last zone, the zone
capacity is also checked so that it does not exceed the smaller zone
size.
Reported-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Fixes: ca4b2a011948 ("null_blk: add zone support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal(a)wdc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/block/null_blk_zoned.c b/drivers/block/null_blk_zoned.c
index beb34b4f76b0..1d0370d91fe7 100644
--- a/drivers/block/null_blk_zoned.c
+++ b/drivers/block/null_blk_zoned.c
@@ -6,8 +6,7 @@
#define CREATE_TRACE_POINTS
#include "null_blk_trace.h"
-/* zone_size in MBs to sectors. */
-#define ZONE_SIZE_SHIFT 11
+#define MB_TO_SECTS(mb) (((sector_t)mb * SZ_1M) >> SECTOR_SHIFT)
static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
{
@@ -16,7 +15,7 @@ static inline unsigned int null_zone_no(struct nullb_device *dev, sector_t sect)
int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
{
- sector_t dev_size = (sector_t)dev->size * 1024 * 1024;
+ sector_t dev_capacity_sects, zone_capacity_sects;
sector_t sector = 0;
unsigned int i;
@@ -38,9 +37,13 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
return -EINVAL;
}
- dev->zone_size_sects = dev->zone_size << ZONE_SIZE_SHIFT;
- dev->nr_zones = dev_size >>
- (SECTOR_SHIFT + ilog2(dev->zone_size_sects));
+ zone_capacity_sects = MB_TO_SECTS(dev->zone_capacity);
+ dev_capacity_sects = MB_TO_SECTS(dev->size);
+ dev->zone_size_sects = MB_TO_SECTS(dev->zone_size);
+ dev->nr_zones = dev_capacity_sects >> ilog2(dev->zone_size_sects);
+ if (dev_capacity_sects & (dev->zone_size_sects - 1))
+ dev->nr_zones++;
+
dev->zones = kvmalloc_array(dev->nr_zones, sizeof(struct blk_zone),
GFP_KERNEL | __GFP_ZERO);
if (!dev->zones)
@@ -101,8 +104,12 @@ int null_init_zoned_dev(struct nullb_device *dev, struct request_queue *q)
struct blk_zone *zone = &dev->zones[i];
zone->start = zone->wp = sector;
- zone->len = dev->zone_size_sects;
- zone->capacity = dev->zone_capacity << ZONE_SIZE_SHIFT;
+ if (zone->start + dev->zone_size_sects > dev_capacity_sects)
+ zone->len = dev_capacity_sects - zone->start;
+ else
+ zone->len = dev->zone_size_sects;
+ zone->capacity =
+ min_t(sector_t, zone->len, zone_capacity_sects);
zone->type = BLK_ZONE_TYPE_SEQWRITE_REQ;
zone->cond = BLK_ZONE_COND_EMPTY;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b08070eca9e247f60ab39d79b2c25d274750441f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 27 Nov 2020 12:33:54 +0100
Subject: [PATCH] ext4: don't remount read-only with errors=continue on reboot
ext4_handle_error() with errors=continue mount option can accidentally
remount the filesystem read-only when the system is rebooting. Fix that.
Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Andreas Dilger <adilger(a)dilger.ca>
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20201127113405.26867-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 872d45a131ca..3ef84e8ab1ae 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -666,19 +666,17 @@ static bool system_going_down(void)
static void ext4_handle_error(struct super_block *sb)
{
+ journal_t *journal = EXT4_SB(sb)->s_journal;
+
if (test_opt(sb, WARN_ON_ERROR))
WARN_ON_ONCE(1);
- if (sb_rdonly(sb))
+ if (sb_rdonly(sb) || test_opt(sb, ERRORS_CONT))
return;
- if (!test_opt(sb, ERRORS_CONT)) {
- journal_t *journal = EXT4_SB(sb)->s_journal;
-
- ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED);
- if (journal)
- jbd2_journal_abort(journal, -EIO);
- }
+ ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED);
+ if (journal)
+ jbd2_journal_abort(journal, -EIO);
/*
* We force ERRORS_RO behavior when system is rebooting. Otherwise we
* could panic during 'reboot -f' as the underlying device got already
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b08070eca9e247f60ab39d79b2c25d274750441f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 27 Nov 2020 12:33:54 +0100
Subject: [PATCH] ext4: don't remount read-only with errors=continue on reboot
ext4_handle_error() with errors=continue mount option can accidentally
remount the filesystem read-only when the system is rebooting. Fix that.
Fixes: 1dc1097ff60e ("ext4: avoid panic during forced reboot")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Andreas Dilger <adilger(a)dilger.ca>
Cc: stable(a)kernel.org
Link: https://lore.kernel.org/r/20201127113405.26867-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 872d45a131ca..3ef84e8ab1ae 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -666,19 +666,17 @@ static bool system_going_down(void)
static void ext4_handle_error(struct super_block *sb)
{
+ journal_t *journal = EXT4_SB(sb)->s_journal;
+
if (test_opt(sb, WARN_ON_ERROR))
WARN_ON_ONCE(1);
- if (sb_rdonly(sb))
+ if (sb_rdonly(sb) || test_opt(sb, ERRORS_CONT))
return;
- if (!test_opt(sb, ERRORS_CONT)) {
- journal_t *journal = EXT4_SB(sb)->s_journal;
-
- ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED);
- if (journal)
- jbd2_journal_abort(journal, -EIO);
- }
+ ext4_set_mount_flag(sb, EXT4_MF_FS_ABORTED);
+ if (journal)
+ jbd2_journal_abort(journal, -EIO);
/*
* We force ERRORS_RO behavior when system is rebooting. Otherwise we
* could panic during 'reboot -f' as the underlying device got already
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 779055842da5b2e508f3ccf9a8153cb1f704f566 Mon Sep 17 00:00:00 2001
From: Souptick Joarder <jrdr.linux(a)gmail.com>
Date: Sun, 6 Sep 2020 12:21:53 +0530
Subject: [PATCH] xen/gntdev.c: Mark pages as dirty
There seems to be a bug in the original code when gntdev_get_page()
is called with writeable=true then the page needs to be marked dirty
before being put.
To address this, a bool writeable is added in gnt_dev_copy_batch, set
it in gntdev_grant_copy_seg() (and drop `writeable` argument to
gntdev_get_page()) and then, based on batch->writeable, use
set_page_dirty_lock().
Fixes: a4cdb556cae0 (xen/gntdev: add ioctl for grant copy)
Suggested-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Signed-off-by: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: David Vrabel <david.vrabel(a)citrix.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/1599375114-32360-1-git-send-email-jrdr.linux@gmai…
Reviewed-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky(a)oracle.com>
diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
index 64a9025a87be..1f32db7b72b2 100644
--- a/drivers/xen/gntdev.c
+++ b/drivers/xen/gntdev.c
@@ -720,17 +720,18 @@ struct gntdev_copy_batch {
s16 __user *status[GNTDEV_COPY_BATCH];
unsigned int nr_ops;
unsigned int nr_pages;
+ bool writeable;
};
static int gntdev_get_page(struct gntdev_copy_batch *batch, void __user *virt,
- bool writeable, unsigned long *gfn)
+ unsigned long *gfn)
{
unsigned long addr = (unsigned long)virt;
struct page *page;
unsigned long xen_pfn;
int ret;
- ret = get_user_pages_fast(addr, 1, writeable ? FOLL_WRITE : 0, &page);
+ ret = get_user_pages_fast(addr, 1, batch->writeable ? FOLL_WRITE : 0, &page);
if (ret < 0)
return ret;
@@ -746,9 +747,13 @@ static void gntdev_put_pages(struct gntdev_copy_batch *batch)
{
unsigned int i;
- for (i = 0; i < batch->nr_pages; i++)
+ for (i = 0; i < batch->nr_pages; i++) {
+ if (batch->writeable && !PageDirty(batch->pages[i]))
+ set_page_dirty_lock(batch->pages[i]);
put_page(batch->pages[i]);
+ }
batch->nr_pages = 0;
+ batch->writeable = false;
}
static int gntdev_copy(struct gntdev_copy_batch *batch)
@@ -837,8 +842,9 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
virt = seg->source.virt + copied;
off = (unsigned long)virt & ~XEN_PAGE_MASK;
len = min(len, (size_t)XEN_PAGE_SIZE - off);
+ batch->writeable = false;
- ret = gntdev_get_page(batch, virt, false, &gfn);
+ ret = gntdev_get_page(batch, virt, &gfn);
if (ret < 0)
return ret;
@@ -856,8 +862,9 @@ static int gntdev_grant_copy_seg(struct gntdev_copy_batch *batch,
virt = seg->dest.virt + copied;
off = (unsigned long)virt & ~XEN_PAGE_MASK;
len = min(len, (size_t)XEN_PAGE_SIZE - off);
+ batch->writeable = true;
- ret = gntdev_get_page(batch, virt, true, &gfn);
+ ret = gntdev_get_page(batch, virt, &gfn);
if (ret < 0)
return ret;
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: rc: ensure that uevent can be read directly after rc device register
Author: Sean Young <sean(a)mess.org>
Date: Sun Dec 20 13:29:54 2020 +0100
There is a race condition where if the /sys/class/rc0/uevent file is read
before rc_dev->registered is set to true, -ENODEV will be returned.
Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1901089
Cc: stable(a)vger.kernel.org
Fixes: a2e2d73fa281 ("media: rc: do not access device via sysfs after rc_unregister_device()")
Signed-off-by: Sean Young <sean(a)mess.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/rc/rc-main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
---
diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c
index 1d811e5ffb55..29d4d01896ff 100644
--- a/drivers/media/rc/rc-main.c
+++ b/drivers/media/rc/rc-main.c
@@ -1928,6 +1928,8 @@ int rc_register_device(struct rc_dev *dev)
goto out_raw;
}
+ dev->registered = true;
+
rc = device_add(&dev->dev);
if (rc)
goto out_rx_free;
@@ -1937,8 +1939,6 @@ int rc_register_device(struct rc_dev *dev)
dev->device_name ?: "Unspecified device", path ?: "N/A");
kfree(path);
- dev->registered = true;
-
/*
* once the the input device is registered in rc_setup_rx_device,
* userspace can open the input device and rc_open() will be called
ksys_umount was refactored to into split into another function
(path_umount) to enable sharing code. This changed the order that flags and
permissions are validated in, and made it so that user_path_at was called
before validating flags and capabilities.
Unfortunately, libfuse2[1] and libmount[2] rely on the old flag validation
behaviour to determine whether or not the kernel supports UMOUNT_NOFOLLOW.
The other path that this validation is being checked on is
init_umount->path_umount->can_umount. That's all internal to the kernel.
[1]: https://github.com/libfuse/libfuse/blob/9bfbeb576c5901b62a171d35510f0d1a922…
[2]: https://github.com/karelzak/util-linux/blob/7ed579523b556b1270f28dbdb7ee07d…
Signed-off-by: Sargun Dhillon <sargun(a)sargun.me>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: stable(a)vger.kernel.org
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: linux-fsdevel(a)vger.kernel.org
Fixes: 41525f56e256 ("fs: refactor ksys_umount")
---
fs/namespace.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/namespace.c b/fs/namespace.c
index cebaa3e81794..dc76f1cb89f4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1710,10 +1710,6 @@ static int can_umount(const struct path *path, int flags)
{
struct mount *mnt = real_mount(path->mnt);
- if (flags & ~(MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW))
- return -EINVAL;
- if (!may_mount())
- return -EPERM;
if (path->dentry != path->mnt->mnt_root)
return -EINVAL;
if (!check_mnt(mnt))
@@ -1746,6 +1742,12 @@ static int ksys_umount(char __user *name, int flags)
struct path path;
int ret;
+ if (flags & ~(MNT_FORCE | MNT_DETACH | MNT_EXPIRE | UMOUNT_NOFOLLOW))
+ return -EINVAL;
+
+ if (!may_mount())
+ return -EPERM;
+
if (!(flags & UMOUNT_NOFOLLOW))
lookup_flags |= LOOKUP_FOLLOW;
ret = user_path_at(AT_FDCWD, name, lookup_flags, &path);
--
2.25.1
The function ovl_dir_real_file() currently uses the semaphore of the
inode to synchronize write to the upperfile cache field.
However, this function will get called by ovl_ioctl_set_flags(), which
utilizes the inode semaphore too. In this case ovl_dir_real_file() will
try to claim a lock that is owned by a function in its call stack, which
won't get released before ovl_dir_real_file() returns.
Define a dedicated semaphore for the upperfile cache, so that the
deadlock won't happen.
Fixes: 61536bed2149 ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories")
Cc: stable(a)vger.kernel.org # v5.10
Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
---
fs/overlayfs/readdir.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 01620ebae1bd..f10701aabb71 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -56,6 +56,7 @@ struct ovl_dir_file {
struct list_head *cursor;
struct file *realfile;
struct file *upperfile;
+ struct semaphore upperfile_sem;
};
static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
@@ -883,7 +884,7 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
ovl_path_upper(dentry, &upperpath);
realfile = ovl_dir_open_realfile(file, &upperpath);
- inode_lock(inode);
+ down(&od->upperfile_sem);
if (!od->upperfile) {
if (IS_ERR(realfile)) {
inode_unlock(inode);
@@ -896,7 +897,7 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
fput(realfile);
realfile = od->upperfile;
}
- inode_unlock(inode);
+ up(&od->upperfile_sem);
}
}
@@ -959,6 +960,7 @@ static int ovl_dir_open(struct inode *inode, struct file *file)
od->realfile = realfile;
od->is_real = ovl_dir_is_real(file->f_path.dentry);
od->is_upper = OVL_TYPE_UPPER(type);
+ sema_init(&od->upperfile_sem, 1);
file->private_data = od;
return 0;
--
2.28.0
The function ovl_dir_real_file() currently uses the semaphore of the
inode to synchronize write to the upperfile cache field.
However, this function will get called by ovl_ioctl_set_flags(), which
utilizes the inode semaphore too. In this case ovl_dir_real_file() will
try to claim a lock that is owned by a function in its call stack, which
won't get released before ovl_dir_real_file() returns.
Define a dedicated semaphore for the upperfile cache, so that the
deadlock won't happen.
Fixes: 61536bed2149 ("ovl: support [S|G]ETFLAGS and FS[S|G]ETXATTR ioctls for directories")
Cc: stable(a)vger.kernel.org # v5.10
Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
---
fs/overlayfs/readdir.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 01620ebae1bd..fa1844ff8db4 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -56,6 +56,7 @@ struct ovl_dir_file {
struct list_head *cursor;
struct file *realfile;
struct file *upperfile;
+ struct semaphore upperfile_sem;
};
static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
@@ -874,8 +875,6 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
* Need to check if we started out being a lower dir, but got copied up
*/
if (!od->is_upper) {
- struct inode *inode = file_inode(file);
-
realfile = READ_ONCE(od->upperfile);
if (!realfile) {
struct path upperpath;
@@ -883,10 +882,10 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
ovl_path_upper(dentry, &upperpath);
realfile = ovl_dir_open_realfile(file, &upperpath);
- inode_lock(inode);
+ down(&od->upperfile_sem);
if (!od->upperfile) {
if (IS_ERR(realfile)) {
- inode_unlock(inode);
+ up(&od->upperfile_sem);
return realfile;
}
smp_store_release(&od->upperfile, realfile);
@@ -896,7 +895,7 @@ struct file *ovl_dir_real_file(const struct file *file, bool want_upper)
fput(realfile);
realfile = od->upperfile;
}
- inode_unlock(inode);
+ up(&od->upperfile_sem);
}
}
@@ -959,6 +958,7 @@ static int ovl_dir_open(struct inode *inode, struct file *file)
od->realfile = realfile;
od->is_real = ovl_dir_is_real(file->f_path.dentry);
od->is_upper = OVL_TYPE_UPPER(type);
+ sema_init(&od->upperfile_sem, 1);
file->private_data = od;
return 0;
--
2.28.0
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.9+
---
drivers/md/bcache/features.c | 2 +-
drivers/md/bcache/features.h | 11 ++++++++---
drivers/md/bcache/super.c | 22 +++++++++++++++++++---
include/uapi/linux/bcache.h | 2 +-
4 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/md/bcache/features.c b/drivers/md/bcache/features.c
index 6469223f0b77..d636b7b2d070 100644
--- a/drivers/md/bcache/features.c
+++ b/drivers/md/bcache/features.c
@@ -17,7 +17,7 @@ struct feature {
};
static struct feature feature_list[] = {
- {BCH_FEATURE_INCOMPAT, BCH_FEATURE_INCOMPAT_LARGE_BUCKET,
+ {BCH_FEATURE_INCOMPAT, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE,
"large_bucket"},
{0, 0, 0 },
};
diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h
index e73724c2b49b..84fc2c0f0101 100644
--- a/drivers/md/bcache/features.h
+++ b/drivers/md/bcache/features.h
@@ -13,11 +13,15 @@
/* Feature set definition */
/* Incompat feature set */
-#define BCH_FEATURE_INCOMPAT_LARGE_BUCKET 0x0001 /* 32bit bucket size */
+/* 32bit bucket size, obsoleted */
+#define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET 0x0001
+/* real bucket size is (1 << bucket_size) */
+#define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE 0x0002
#define BCH_FEATURE_COMPAT_SUPP 0
#define BCH_FEATURE_RO_COMPAT_SUPP 0
-#define BCH_FEATURE_INCOMPAT_SUPP BCH_FEATURE_INCOMPAT_LARGE_BUCKET
+#define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
+ BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE)
#define BCH_HAS_COMPAT_FEATURE(sb, mask) \
((sb)->feature_compat & (mask))
@@ -77,7 +81,8 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \
~BCH##_FEATURE_INCOMPAT_##flagname; \
}
-BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET);
+BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET);
+BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE);
static inline bool bch_has_unknown_compat_features(struct cache_sb *sb)
{
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index f4674a3298af..3999641f1775 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -64,9 +64,25 @@ static unsigned int get_bucket_size(struct cache_sb *sb, struct cache_sb_disk *s
{
unsigned int bucket_size = le16_to_cpu(s->bucket_size);
- if (sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES &&
- bch_has_feature_large_bucket(sb))
- bucket_size |= le16_to_cpu(s->bucket_size_hi) << 16;
+ if (sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES) {
+ if (bch_has_feature_large_bucket(sb)) {
+ unsigned int max, order;
+
+ max = sizeof(unsigned int) * BITS_PER_BYTE - 1;
+ order = le16_to_cpu(s->bucket_size);
+ /*
+ * bcache tool will make sure the overflow won't
+ * happen, an error message here is enough.
+ */
+ if (order > max)
+ pr_err("Bucket size (1 << %u) overflows\n",
+ order);
+ bucket_size = 1 << order;
+ } else if (bch_has_feature_obso_large_bucket(sb)) {
+ bucket_size +=
+ le16_to_cpu(s->obso_bucket_size_hi) << 16;
+ }
+ }
return bucket_size;
}
diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h
index 52e8bcb33981..cf7399f03b71 100644
--- a/include/uapi/linux/bcache.h
+++ b/include/uapi/linux/bcache.h
@@ -213,7 +213,7 @@ struct cache_sb_disk {
__le16 keys;
};
__le64 d[SB_JOURNAL_BUCKETS]; /* journal buckets */
- __le16 bucket_size_hi;
+ __le16 obso_bucket_size_hi; /* obsoleted */
};
/*
--
2.26.2
If regmap_read() fails, the product_id local variable will contain
random value from the stack. Do not try to parse such value and fail
the ASV driver probe.
Fixes: 5ea428595cc5 ("soc: samsung: Add Exynos Adaptive Supply Voltage driver")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
drivers/soc/samsung/exynos-asv.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/soc/samsung/exynos-asv.c b/drivers/soc/samsung/exynos-asv.c
index f653e3533f0f..5daeadc36382 100644
--- a/drivers/soc/samsung/exynos-asv.c
+++ b/drivers/soc/samsung/exynos-asv.c
@@ -129,7 +129,13 @@ static int exynos_asv_probe(struct platform_device *pdev)
return PTR_ERR(asv->chipid_regmap);
}
- regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_PRO_ID, &product_id);
+ ret = regmap_read(asv->chipid_regmap, EXYNOS_CHIPID_REG_PRO_ID,
+ &product_id);
+ if (ret < 0) {
+ dev_err(&pdev->dev, "Cannot read revision from ChipID: %d\n",
+ ret);
+ return -ENODEV;
+ }
switch (product_id & EXYNOS_MASK) {
case 0xE5422000:
--
2.25.1
For cancelling io_uring requests it needs either to be able to run
currently enqueued task_wokrs or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Cc: stable(a)vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
fs/file.c | 2 --
kernel/exit.c | 2 ++
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index c0b60961c672..dab120b71e44 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -21,7 +21,6 @@
#include <linux/rcupdate.h>
#include <linux/close_range.h>
#include <net/sock.h>
-#include <linux/io_uring.h>
unsigned int sysctl_nr_open __read_mostly = 1024*1024;
unsigned int sysctl_nr_open_min = BITS_PER_LONG;
@@ -428,7 +427,6 @@ void exit_files(struct task_struct *tsk)
struct files_struct * files = tsk->files;
if (files) {
- io_uring_files_cancel(files);
task_lock(tsk);
tsk->files = NULL;
task_unlock(tsk);
diff --git a/kernel/exit.c b/kernel/exit.c
index 3594291a8542..04029e35e69a 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -63,6 +63,7 @@
#include <linux/random.h>
#include <linux/rcuwait.h>
#include <linux/compat.h>
+#include <linux/io_uring.h>
#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -776,6 +777,7 @@ void __noreturn do_exit(long code)
schedule();
}
+ io_uring_files_cancel(tsk->files);
exit_signals(tsk); /* sets PF_EXITING */
/* sync mm's RSS info before statistics gathering */
--
2.24.0
When large bucket feature was added, BCH_FEATURE_INCOMPAT_LARGE_BUCKET
was introduced into the incompat feature set. It used bucket_size_hi
(which was added at the tail of struct cache_sb_disk) to extend current
16bit bucket size to 32bit with existing bucket_size in struct
cache_sb_disk.
This is not a good idea, there are two obvious problems,
- Bucket size is always value power of 2, if store log2(bucket size) in
existing bucket_size of struct cache_sb_disk, it is unnecessary to add
bucket_size_hi.
- Macro csum_set() assumes d[SB_JOURNAL_BUCKETS] is the last member in
struct cache_sb_disk, bucket_size_hi was added after d[] which makes
csum_set calculate an unexpected super block checksum.
To fix the above problems, this patch introduces a new incompat feature
bit BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE, when this bit is set, it
means bucket_size in struct cache_sb_disk stores the order of power-of-2
bucket size value. When user specifies a bucket size larger than 32768
sectors, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE will be set to
incompat feature set, and bucket_size stores log2(bucket size) more
than store the real bucket size value.
The obsoleted BCH_FEATURE_INCOMPAT_LARGE_BUCKET won't be used anymore,
it is renamed to BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET and still only
recognized by kernel driver for legacy compatible purpose. The previous
bucket_size_hi is renmaed to obso_bucket_size_hi in struct cache_sb_disk
and not used in bcache-tools anymore.
For cache device created with BCH_FEATURE_INCOMPAT_LARGE_BUCKET feature,
bcache-tools and kernel driver still recognize the feature string and
display it as "obso_large_bucket".
With this change, the unnecessary extra space extend of bcache on-disk
super block can be avoided, and csum_set() may generate expected check
sum as well.
Fixes: ffa470327572 ("bcache: add bucket_size_hi into struct cache_sb_disk for large bucket")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.9+
---
drivers/md/bcache/features.c | 2 +-
drivers/md/bcache/features.h | 11 ++++++++---
drivers/md/bcache/super.c | 22 +++++++++++++++++++---
include/uapi/linux/bcache.h | 2 +-
4 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/drivers/md/bcache/features.c b/drivers/md/bcache/features.c
index 6469223f0b77..d636b7b2d070 100644
--- a/drivers/md/bcache/features.c
+++ b/drivers/md/bcache/features.c
@@ -17,7 +17,7 @@ struct feature {
};
static struct feature feature_list[] = {
- {BCH_FEATURE_INCOMPAT, BCH_FEATURE_INCOMPAT_LARGE_BUCKET,
+ {BCH_FEATURE_INCOMPAT, BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE,
"large_bucket"},
{0, 0, 0 },
};
diff --git a/drivers/md/bcache/features.h b/drivers/md/bcache/features.h
index e73724c2b49b..84fc2c0f0101 100644
--- a/drivers/md/bcache/features.h
+++ b/drivers/md/bcache/features.h
@@ -13,11 +13,15 @@
/* Feature set definition */
/* Incompat feature set */
-#define BCH_FEATURE_INCOMPAT_LARGE_BUCKET 0x0001 /* 32bit bucket size */
+/* 32bit bucket size, obsoleted */
+#define BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET 0x0001
+/* real bucket size is (1 << bucket_size) */
+#define BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE 0x0002
#define BCH_FEATURE_COMPAT_SUPP 0
#define BCH_FEATURE_RO_COMPAT_SUPP 0
-#define BCH_FEATURE_INCOMPAT_SUPP BCH_FEATURE_INCOMPAT_LARGE_BUCKET
+#define BCH_FEATURE_INCOMPAT_SUPP (BCH_FEATURE_INCOMPAT_OBSO_LARGE_BUCKET| \
+ BCH_FEATURE_INCOMPAT_LOG_LARGE_BUCKET_SIZE)
#define BCH_HAS_COMPAT_FEATURE(sb, mask) \
((sb)->feature_compat & (mask))
@@ -77,7 +81,8 @@ static inline void bch_clear_feature_##name(struct cache_sb *sb) \
~BCH##_FEATURE_INCOMPAT_##flagname; \
}
-BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LARGE_BUCKET);
+BCH_FEATURE_INCOMPAT_FUNCS(obso_large_bucket, OBSO_LARGE_BUCKET);
+BCH_FEATURE_INCOMPAT_FUNCS(large_bucket, LOG_LARGE_BUCKET_SIZE);
static inline bool bch_has_unknown_compat_features(struct cache_sb *sb)
{
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1804df5c866b..d742eb5dc2e5 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -64,9 +64,25 @@ static unsigned int get_bucket_size(struct cache_sb *sb, struct cache_sb_disk *s
{
unsigned int bucket_size = le16_to_cpu(s->bucket_size);
- if (sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES &&
- bch_has_feature_large_bucket(sb))
- bucket_size |= le16_to_cpu(s->bucket_size_hi) << 16;
+ if (sb->version >= BCACHE_SB_VERSION_CDEV_WITH_FEATURES) {
+ if (bch_has_feature_large_bucket(sb)) {
+ unsigned int max, order;
+
+ max = sizeof(unsigned int) * BITS_PER_BYTE - 1;
+ order = le16_to_cpu(s->bucket_size);
+ /*
+ * bcache tool will make sure the overflow won't
+ * happen, an error message here is enough.
+ */
+ if (order > max)
+ pr_err("Bucket size (1 << %u) overflows\n",
+ order);
+ bucket_size = 1 << order;
+ } else if (bch_has_feature_obso_large_bucket(sb)) {
+ bucket_size +=
+ le16_to_cpu(s->obso_bucket_size_hi) << 16;
+ }
+ }
return bucket_size;
}
diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h
index 52e8bcb33981..cf7399f03b71 100644
--- a/include/uapi/linux/bcache.h
+++ b/include/uapi/linux/bcache.h
@@ -213,7 +213,7 @@ struct cache_sb_disk {
__le16 keys;
};
__le64 d[SB_JOURNAL_BUCKETS]; /* journal buckets */
- __le16 bucket_size_hi;
+ __le16 obso_bucket_size_hi; /* obsoleted */
};
/*
--
2.26.2
在 2021/1/2 上午1:09, Barnabás Pőcze 写道:
> Hi
>
>
> 2021. január 1., péntek 17:08 keltezéssel, Jiaxun Yang írta:
>
>> [...]
>>>> @@ -1006,6 +1018,10 @@ static int ideapad_acpi_add(struct platform_device *pdev)
>>>> if (!priv->has_hw_rfkill_switch)
>>>> write_ec_cmd(priv->adev->handle, VPCCMD_W_RF, 1);
>>>>
>>>> + /* The same for Touchpad */
>>>> + if (!priv->has_touchpad_switch)
>>>> + write_ec_cmd(priv->adev->handle, VPCCMD_W_TOUCHPAD, 1);
>>>> +
>>> Shouldn't it be the other way around: `if (priv->has_touchpad_switch)`?
>> It is to prevent accidentally disable touchpad on machines that do have EC switch,
>> so it's intentional.
>> [...]
> Sorry, but the explanation not fully clear to me. The commit message seems to
> indicate that some models "do not use EC to switch touchpad", and I take that
> means that reading from VPCCMD_R_TOUCHPAD will not reflect the actual state of the
> touchpad and writing to VPCCMD_W_TOUCHPAD will not change the state of the touchpad.
I'm just trying to prevent removing functionality on machines that
touchpad can be controlled
by EC but also equipped I2C HID touchpad. At least users will have a
functional touchpad
after that.
>
> But then why do you still write to VPCCMD_W_TOUCHPAD on devices where supposedly
> this does not have any effect (at least not the desired one)? And the part of the
> code I made my comment about only runs on machines on which the touchpad supposedly
> cannot be controlled by the EC. What am I missing?
>
> And there is the other problem: on some machines, this patch removes working
> functionality.
Yeah that's a problem. I just don't want to repeat the story of rfkill
whitelist, it ends up with
countless machine to be added.
Maybe I should specify HID of touchpad as well. Two machines that known
to be problematic
all have ELAN0634 touchpad.
Thanks.
- Jiaxun
>
>
> Regards,
> Barnabás Pőcze
The patch titled
Subject: lib/zlib: fix inflating zlib streams on s390
has been removed from the -mm tree. Its filename was
lib-zlib-fix-inflating-zlib-streams-on-s390.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Ilya Leoshkevich <iii(a)linux.ibm.com>
Subject: lib/zlib: fix inflating zlib streams on s390
Decompressing zlib streams on s390 fails with "incorrect data check"
error.
Userspace zlib checks inflate_state.flags in order to byteswap checksums
only for zlib streams, and s390 hardware inflate code, which was ported
from there, tries to match this behavior. At the same time, kernel zlib
does not use inflate_state.flags, so it contains essentially random
values. For many use cases either zlib stream is zeroed out or checksum
is not used, so this problem is masked, but at least SquashFS is still
affected.
Fix by always passing a checksum to and from the hardware as is, which
matches zlib_inflate()'s expectations.
Link: https://lkml.kernel.org/r/20201215155551.894884-1-iii@linux.ibm.com
Fixes: 126196100063 ("lib/zlib: add s390 hardware support for kernel zlib_inflate")
Signed-off-by: Ilya Leoshkevich <iii(a)linux.ibm.com>
Tested-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Acked-by: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Mikhail Zaslonko <zaslonko(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org> [5.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/zlib_dfltcc/dfltcc_inflate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/lib/zlib_dfltcc/dfltcc_inflate.c~lib-zlib-fix-inflating-zlib-streams-on-s390
+++ a/lib/zlib_dfltcc/dfltcc_inflate.c
@@ -125,7 +125,7 @@ dfltcc_inflate_action dfltcc_inflate(
param->ho = (state->write - state->whave) & ((1 << HB_BITS) - 1);
if (param->hl)
param->nt = 0; /* Honor history for the first block */
- param->cv = state->flags ? REVERSE(state->check) : state->check;
+ param->cv = state->check;
/* Inflate */
do {
@@ -138,7 +138,7 @@ dfltcc_inflate_action dfltcc_inflate(
state->bits = param->sbb;
state->whave = param->hl;
state->write = (param->ho + param->hl) & ((1 << HB_BITS) - 1);
- state->check = state->flags ? REVERSE(param->cv) : param->cv;
+ state->check = param->cv;
if (cc == DFLTCC_CC_OP2_CORRUPT && param->oesc != 0) {
/* Report an error if stream is corrupted */
state->mode = BAD;
_
Patches currently in -mm which might be from iii(a)linux.ibm.com are
The patch titled
Subject: mm: memmap defer init doesn't work as expected
has been removed from the -mm tree. Its filename was
mm-memmap-defer-init-dosnt-work-as-expected.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Baoquan He <bhe(a)redhat.com>
Subject: mm: memmap defer init doesn't work as expected
VMware observed a performance regression during memmap init on their
platform, and bisected to commit 73a6e474cb376 ("mm: memmap_init: iterate
over memblock regions rather that check each PFN") causing it.
Before the commit:
[0.033176] Normal zone: 1445888 pages used for memmap
[0.033176] Normal zone: 89391104 pages, LIFO batch:63
[0.035851] ACPI: PM-Timer IO Port: 0x448
With commit
[0.026874] Normal zone: 1445888 pages used for memmap
[0.026875] Normal zone: 89391104 pages, LIFO batch:63
[2.028450] ACPI: PM-Timer IO Port: 0x448
The root cause is the current memmap defer init doesn't work as expected.
Before, memmap_init_zone() was used to do memmap init of one whole zone,
to initialize all low zones of one numa node, but defer memmap init of the
last zone in that numa node. However, since commit 73a6e474cb376,
function memmap_init() is adapted to iterater over memblock regions inside
one zone, then call memmap_init_zone() to do memmap init for each region.
E.g, on VMware's system, the memory layout is as below, there are two
memory regions in node 2. The current code will mistakenly initialize the
whole 1st region [mem 0xab00000000-0xfcffffffff], then do memmap defer to
iniatialize only one memmory section on the 2nd region [mem
0x10000000000-0x1033fffffff]. In fact, we only expect to see that there's
only one memory section's memmap initialized. That's why more time is
costed at the time.
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x0009ffff]
[ 0.008842] ACPI: SRAT: Node 0 PXM 0 [mem 0x00100000-0xbfffffff]
[ 0.008843] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x55ffffffff]
[ 0.008844] ACPI: SRAT: Node 1 PXM 1 [mem 0x5600000000-0xaaffffffff]
[ 0.008844] ACPI: SRAT: Node 2 PXM 2 [mem 0xab00000000-0xfcffffffff]
[ 0.008845] ACPI: SRAT: Node 2 PXM 2 [mem 0x10000000000-0x1033fffffff]
Now, let's add a parameter 'zone_end_pfn' to memmap_init_zone() to pass
down the real zone end pfn so that defer_init() can use it to judge
whether defer need be taken in zone wide.
Link: https://lkml.kernel.org/r/20201223080811.16211-1-bhe@redhat.com
Link: https://lkml.kernel.org/r/20201223080811.16211-2-bhe@redhat.com
Fixes: commit 73a6e474cb376 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Rahul Gopakumar <gopakumarr(a)vmware.com>
Reviewed-by: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/ia64/mm/init.c | 4 ++--
include/linux/mm.h | 5 +++--
mm/memory_hotplug.c | 2 +-
mm/page_alloc.c | 8 +++++---
4 files changed, 11 insertions(+), 8 deletions(-)
--- a/arch/ia64/mm/init.c~mm-memmap-defer-init-dosnt-work-as-expected
+++ a/arch/ia64/mm/init.c
@@ -536,7 +536,7 @@ virtual_memmap_init(u64 start, u64 end,
if (map_start < map_end)
memmap_init_zone((unsigned long)(map_end - map_start),
- args->nid, args->zone, page_to_pfn(map_start),
+ args->nid, args->zone, page_to_pfn(map_start), page_to_pfn(map_end),
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
return 0;
}
@@ -546,7 +546,7 @@ memmap_init (unsigned long size, int nid
unsigned long start_pfn)
{
if (!vmem_map) {
- memmap_init_zone(size, nid, zone, start_pfn,
+ memmap_init_zone(size, nid, zone, start_pfn, start_pfn + size,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
} else {
struct page *start;
--- a/include/linux/mm.h~mm-memmap-defer-init-dosnt-work-as-expected
+++ a/include/linux/mm.h
@@ -2439,8 +2439,9 @@ extern int __meminit early_pfn_to_nid(un
#endif
extern void set_dma_reserve(unsigned long new_dma_reserve);
-extern void memmap_init_zone(unsigned long, int, unsigned long, unsigned long,
- enum meminit_context, struct vmem_altmap *, int migratetype);
+extern void memmap_init_zone(unsigned long, int, unsigned long,
+ unsigned long, unsigned long, enum meminit_context,
+ struct vmem_altmap *, int migratetype);
extern void setup_per_zone_wmarks(void);
extern int __meminit init_per_zone_wmark_min(void);
extern void mem_init(void);
--- a/mm/memory_hotplug.c~mm-memmap-defer-init-dosnt-work-as-expected
+++ a/mm/memory_hotplug.c
@@ -713,7 +713,7 @@ void __ref move_pfn_range_to_zone(struct
* expects the zone spans the pfn range. All the pages in the range
* are reserved so nobody should be touching them so we should be safe
*/
- memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn,
+ memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, 0,
MEMINIT_HOTPLUG, altmap, migratetype);
set_zone_contiguous(zone);
--- a/mm/page_alloc.c~mm-memmap-defer-init-dosnt-work-as-expected
+++ a/mm/page_alloc.c
@@ -423,6 +423,8 @@ defer_init(int nid, unsigned long pfn, u
if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
return false;
+ if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
+ return true;
/*
* We start only with one section of pages, more pages are added as
* needed until the rest of deferred pages are initialized.
@@ -6116,7 +6118,7 @@ overlap_memmap_init(unsigned long zone,
* zone stats (e.g., nr_isolate_pageblock) are touched.
*/
void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
- unsigned long start_pfn,
+ unsigned long start_pfn, unsigned long zone_end_pfn,
enum meminit_context context,
struct vmem_altmap *altmap, int migratetype)
{
@@ -6152,7 +6154,7 @@ void __meminit memmap_init_zone(unsigned
if (context == MEMINIT_EARLY) {
if (overlap_memmap_init(zone, &pfn))
continue;
- if (defer_init(nid, pfn, end_pfn))
+ if (defer_init(nid, pfn, zone_end_pfn))
break;
}
@@ -6266,7 +6268,7 @@ void __meminit __weak memmap_init(unsign
if (end_pfn > start_pfn) {
size = end_pfn - start_pfn;
- memmap_init_zone(size, nid, zone, start_pfn,
+ memmap_init_zone(size, nid, zone, start_pfn, range_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
}
}
_
Patches currently in -mm which might be from bhe(a)redhat.com are
The patch titled
Subject: mm/hugetlb: fix deadlock in hugetlb_cow error path
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-deadlock-in-hugetlb_cow-error-path.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: mm/hugetlb: fix deadlock in hugetlb_cow error path
syzbot reported the deadlock here [1]. The issue is in hugetlb cow error
handling when there are not enough huge pages for the faulting task which
took the original reservation. It is possible that other (child) tasks
could have consumed pages associated with the reservation. In this case,
we want the task which took the original reservation to succeed. So, we
unmap any associated pages in children so that they can be used by the
faulting task that owns the reservation.
The unmapping code needs to hold i_mmap_rwsem in write mode. However, due
to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing
synchronization") we are already holding i_mmap_rwsem in read mode when
hugetlb_cow is called. Technically, i_mmap_rwsem does not need to be held
in read mode for COW mappings as they can not share pmd's. Modifying the
fault code to not take i_mmap_rwsem in read mode for COW (and other
non-sharable) mappings is too involved for a stable fix. Instead, we
simply drop the hugetlb_fault_mutex and i_mmap_rwsem before unmapping.
This is OK as it is technically not needed. They are reacquired after
unmapping as expected by calling code. Since this is done in an uncommon
error path, the overhead of dropping and reacquiring mutexes is
acceptable.
While making changes, remove redundant BUG_ON after unmap_ref_private.
[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@google.com
Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@oracle.com
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: syzbot+5eee4145df3c15e96625(a)syzkaller.appspotmail.com
Cc: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-deadlock-in-hugetlb_cow-error-path
+++ a/mm/hugetlb.c
@@ -4105,10 +4105,30 @@ retry_avoidcopy:
* may get SIGKILLed if it later faults.
*/
if (outside_reserve) {
+ struct address_space *mapping = vma->vm_file->f_mapping;
+ pgoff_t idx;
+ u32 hash;
+
put_page(old_page);
BUG_ON(huge_pte_none(pte));
+ /*
+ * Drop hugetlb_fault_mutex and i_mmap_rwsem before
+ * unmapping. unmapping needs to hold i_mmap_rwsem
+ * in write mode. Dropping i_mmap_rwsem in read mode
+ * here is OK as COW mappings do not interact with
+ * PMD sharing.
+ *
+ * Reacquire both after unmap operation.
+ */
+ idx = vma_hugecache_offset(h, vma, haddr);
+ hash = hugetlb_fault_mutex_hash(mapping, idx);
+ mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+
unmap_ref_private(mm, vma, old_page, haddr);
- BUG_ON(huge_pte_none(pte));
+
+ i_mmap_lock_read(mapping);
+ mutex_lock(&hugetlb_fault_mutex_table[hash]);
spin_lock(ptl);
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (likely(ptep &&
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
mm-hugetlb-change-hugetlb_reserve_pages-to-type-bool.patch
hugetlbfs-remove-special-hugetlbfs_set_page_dirty.patch
Currently request cancellations are happening before PF_EXITING is set,
so it's allowed to call task_work_run(). Even though it should work as
it's not it's safer to remove PF_EXITING checks.
Cc: stable(a)vger.kernel.org # 5.5+
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
---
fs/io_uring.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c
index ca46f314640b..8d4fa0031e0a 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -2361,12 +2361,8 @@ static inline unsigned int io_put_rw_kbuf(struct io_kiocb *req)
static inline bool io_run_task_work(void)
{
- /*
- * Not safe to run on exiting task, and the task_work handling will
- * not add work to such a task.
- */
- if (unlikely(current->flags & PF_EXITING))
- return false;
+ WARN_ON_ONCE(current->flags & PF_EXITING);
+
if (current->task_works) {
__set_current_state(TASK_RUNNING);
task_work_run();
--
2.24.0
Hello !
I could need your help ... I have tested the new kernel 5.10.3 and sound doesn't work with this
version.
Seems the new Intel audio drivers are the main reason. What can be done ? Do you have any ideas ?
Intel Catpt driver support is new ... This deprecates the previous Haswell SoC audio driver code
previously providing the audio capabilities.
And I am having a Haswell CPU -> Audio device: Intel Corporation Xeon E3-1200 v3/4th Gen Core
Processor HD Audio Controller (rev 06)
Regards,
Christian
This is a note to let you know that I've just added the patch titled
crypto: asym_tpm: correct zero out potential secrets
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-linus branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will hopefully also be merged in Linus's tree for the
next -rc kernel release.
If you have any questions about this process, please let me know.
>From f93274ef0fe972c120c96b3207f8fce376231a60 Mon Sep 17 00:00:00 2001
From: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Date: Fri, 4 Dec 2020 09:01:36 +0100
Subject: crypto: asym_tpm: correct zero out potential secrets
The function derive_pub_key() should be calling memzero_explicit()
instead of memset() in case the complier decides to optimize away the
call to memset() because it "knows" no one is going to touch the memory
anymore.
Cc: stable <stable(a)vger.kernel.org>
Reported-by: Ilil Blum Shem-Tov <ilil.blum.shem-tov(a)intel.com>
Tested-by: Ilil Blum Shem-Tov <ilil.blum.shem-tov(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Link: https://lore.kernel.org/r/X8ns4AfwjKudpyfe@kroah.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
crypto/asymmetric_keys/asym_tpm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/crypto/asymmetric_keys/asym_tpm.c b/crypto/asymmetric_keys/asym_tpm.c
index 511932aa94a6..0959613560b9 100644
--- a/crypto/asymmetric_keys/asym_tpm.c
+++ b/crypto/asymmetric_keys/asym_tpm.c
@@ -354,7 +354,7 @@ static uint32_t derive_pub_key(const void *pub_key, uint32_t len, uint8_t *buf)
memcpy(cur, e, sizeof(e));
cur += sizeof(e);
/* Zero parameters to satisfy set_pub_key ABI. */
- memset(cur, 0, SETKEY_PARAMS_SIZE);
+ memzero_explicit(cur, SETKEY_PARAMS_SIZE);
return cur - buf;
}
--
2.30.0