The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 74583f1b92cb3bbba1a3741cea237545c56f506c Mon Sep 17 00:00:00 2001
From: Niklas Cassel <niklas.cassel(a)wdc.com>
Date: Tue, 1 Mar 2022 00:44:18 +0000
Subject: [PATCH] riscv: dts: k210: fix broken IRQs on hart1
Commit 67d96729a9e7 ("riscv: Update Canaan Kendryte K210 device tree")
incorrectly removed two entries from the PLIC interrupt-controller node's
interrupts-extended property.
The PLIC driver cannot know the mapping between hart contexts and hart ids,
so this information has to be provided by device tree, as specified by the
PLIC device tree binding.
The PLIC driver uses the interrupts-extended property, and initializes the
hart context registers in the exact same order as provided by the
interrupts-extended property.
In other words, if we don't specify the S-mode interrupts, the PLIC driver
will simply initialize the hart0 S-mode hart context with the hart1 M-mode
configuration. It is therefore essential to specify the S-mode IRQs even
though the system itself will only ever be running in M-mode.
Re-add the S-mode interrupts, so that we get working IRQs on hart1 again.
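To illustrate the ordering issue, here is a minimal userspace C sketch (hypothetical names, not the actual irq-sifive-plic.c code); it only models the property described above, namely that a hart context index is simply the position of an entry in interrupts-extended, with IRQ 11 being the M-mode external interrupt and IRQ 9 the S-mode one:

#include <stdio.h>

/* Hypothetical model of one interrupts-extended entry: the hart whose
 * cpu_intc the phandle points at, plus the IRQ number in that entry.
 * IRQ 11 is the M-mode external interrupt, IRQ 9 the S-mode one. */
struct plic_entry { int hart; int irq; };

static void map_contexts(const char *label, const struct plic_entry *e, int n)
{
	printf("%s\n", label);
	/* Hart context index == position in interrupts-extended. */
	for (int ctx = 0; ctx < n; ctx++)
		printf("  context %d -> hart%d, irq %d\n", ctx, e[ctx].hart, e[ctx].irq);
}

int main(void)
{
	/* Broken DT: S-mode entries removed.  Context 1, which is expected
	 * to be hart0 S-mode, receives hart1's M-mode entry instead. */
	const struct plic_entry broken[] = { {0, 11}, {1, 11} };
	/* Fixed DT: M-mode and S-mode entries for both harts. */
	const struct plic_entry fixed[]  = { {0, 11}, {0, 9}, {1, 11}, {1, 9} };

	map_contexts("broken interrupts-extended:", broken, 2);
	map_contexts("fixed interrupts-extended:", fixed, 4);
	return 0;
}

With the broken list, context 1 (which the driver treats as hart0 S-mode) is initialized with hart1's M-mode entry, which is why IRQs stop working on hart1.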
Cc: <stable(a)vger.kernel.org>
Fixes: 67d96729a9e7 ("riscv: Update Canaan Kendryte K210 device tree")
Signed-off-by: Niklas Cassel <niklas.cassel(a)wdc.com>
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/boot/dts/canaan/k210.dtsi b/arch/riscv/boot/dts/canaan/k210.dtsi
index 56f57118c633..44d338514761 100644
--- a/arch/riscv/boot/dts/canaan/k210.dtsi
+++ b/arch/riscv/boot/dts/canaan/k210.dtsi
@@ -113,7 +113,8 @@ plic0: interrupt-controller@c000000 {
compatible = "canaan,k210-plic", "sifive,plic-1.0.0";
reg = <0xC000000 0x4000000>;
interrupt-controller;
- interrupts-extended = <&cpu0_intc 11>, <&cpu1_intc 11>;
+ interrupts-extended = <&cpu0_intc 11>, <&cpu0_intc 9>,
+ <&cpu1_intc 11>, <&cpu1_intc 9>;
riscv,ndev = <65>;
};
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4e6f55120c7eccf6f9323bb681632e23cbcb3f3c Mon Sep 17 00:00:00 2001
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Date: Fri, 4 Feb 2022 16:18:18 +0200
Subject: [PATCH] drm/i915: Workaround broken BIOS DBUF configuration on
TGL/RKL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
On TGL/RKL the BIOS likes to use some kind of bogus DBUF layout
that doesn't match what the spec recommends. With a single active
pipe that is not going to be a problem, but with multiple pipes
active skl_commit_modeset_enables() goes into an infinite loop
since it can't figure out any order in which it can commit the
pipes without causing DBUF overlaps between the planes.
We'd need some kind of extra DBUF defrag stage in between to
make the transition possible. But that is clearly way too complex
a solution, so in the name of simplicity let's just sanitize the
DBUF state by simply turning off all planes when we detect a
pipe encroaching on its neighbours' DBUF slices. We only have
to disable the primary planes as all other planes should have
already been disabled (if they somehow were enabled) by
earlier sanitization steps.
And for good measure let's also sanitize in case the DBUF
allocations of the pipes already seem to overlap each other.
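For clarity, the overlap being detected is plain interval overlap of the per-pipe DBUF allocations; a minimal userspace C sketch follows (hypothetical struct and function names, the real driver uses struct skl_ddb_entry and the checks added in the patch below):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-pipe DBUF allocation expressed as a
 * half-open block range [start, end). */
struct ddb_range { int start, end; };

static bool ranges_overlap(const struct ddb_range *a, const struct ddb_range *b)
{
	/* Empty allocations never overlap anything. */
	if (a->start >= a->end || b->start >= b->end)
		return false;
	return a->start < b->end && b->start < a->end;
}

int main(void)
{
	/* A BIOS-style layout where pipe B encroaches on pipe A's range. */
	const struct ddb_range pipe_a = { 0, 512 };
	const struct ddb_range pipe_b = { 256, 1024 };

	if (ranges_overlap(&pipe_a, &pipe_b))
		puts("misconfigured DBUF: sanitize by turning the planes off");
	return 0;
}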
Cc: <stable(a)vger.kernel.org> # v5.14+
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4762
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220204141818.1900-3-ville.s…
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy(a)intel.com>
(cherry picked from commit 15512021eb3975a8c2366e3883337e252bb0eee5)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index bf7ce684dd8e..bb4a85445fc6 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -10673,6 +10673,7 @@ intel_modeset_setup_hw_state(struct drm_device *dev,
vlv_wm_sanitize(dev_priv);
} else if (DISPLAY_VER(dev_priv) >= 9) {
skl_wm_get_hw_state(dev_priv);
+ skl_wm_sanitize(dev_priv);
} else if (HAS_PCH_SPLIT(dev_priv)) {
ilk_wm_get_hw_state(dev_priv);
}
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a298846dd8cf..3edba7fd0c49 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6698,6 +6698,74 @@ void skl_wm_get_hw_state(struct drm_i915_private *dev_priv)
dbuf_state->enabled_slices = dev_priv->dbuf.enabled_slices;
}
+static bool skl_dbuf_is_misconfigured(struct drm_i915_private *i915)
+{
+ const struct intel_dbuf_state *dbuf_state =
+ to_intel_dbuf_state(i915->dbuf.obj.state);
+ struct skl_ddb_entry entries[I915_MAX_PIPES] = {};
+ struct intel_crtc *crtc;
+
+ for_each_intel_crtc(&i915->drm, crtc) {
+ const struct intel_crtc_state *crtc_state =
+ to_intel_crtc_state(crtc->base.state);
+
+ entries[crtc->pipe] = crtc_state->wm.skl.ddb;
+ }
+
+ for_each_intel_crtc(&i915->drm, crtc) {
+ const struct intel_crtc_state *crtc_state =
+ to_intel_crtc_state(crtc->base.state);
+ u8 slices;
+
+ slices = skl_compute_dbuf_slices(crtc, dbuf_state->active_pipes,
+ dbuf_state->joined_mbus);
+ if (dbuf_state->slices[crtc->pipe] & ~slices)
+ return true;
+
+ if (skl_ddb_allocation_overlaps(&crtc_state->wm.skl.ddb, entries,
+ I915_MAX_PIPES, crtc->pipe))
+ return true;
+ }
+
+ return false;
+}
+
+void skl_wm_sanitize(struct drm_i915_private *i915)
+{
+ struct intel_crtc *crtc;
+
+ /*
+ * On TGL/RKL (at least) the BIOS likes to assign the planes
+ * to the wrong DBUF slices. This will cause an infinite loop
+ * in skl_commit_modeset_enables() as it can't find a way to
+ * transition between the old bogus DBUF layout to the new
+ * proper DBUF layout without DBUF allocation overlaps between
+ * the planes (which cannot be allowed or else the hardware
+ * may hang). If we detect a bogus DBUF layout just turn off
+ * all the planes so that skl_commit_modeset_enables() can
+ * simply ignore them.
+ */
+ if (!skl_dbuf_is_misconfigured(i915))
+ return;
+
+ drm_dbg_kms(&i915->drm, "BIOS has misprogrammed the DBUF, disabling all planes\n");
+
+ for_each_intel_crtc(&i915->drm, crtc) {
+ struct intel_plane *plane = to_intel_plane(crtc->base.primary);
+ const struct intel_plane_state *plane_state =
+ to_intel_plane_state(plane->base.state);
+ struct intel_crtc_state *crtc_state =
+ to_intel_crtc_state(crtc->base.state);
+
+ if (plane_state->uapi.visible)
+ intel_plane_disable_noatomic(crtc, plane);
+
+ drm_WARN_ON(&i915->drm, crtc_state->active_planes != 0);
+
+ memset(&crtc_state->wm.skl.ddb, 0, sizeof(crtc_state->wm.skl.ddb));
+ }
+}
+
static void ilk_pipe_wm_get_hw_state(struct intel_crtc *crtc)
{
struct drm_device *dev = crtc->base.dev;
diff --git a/drivers/gpu/drm/i915/intel_pm.h b/drivers/gpu/drm/i915/intel_pm.h
index 990cdcaf85ce..d2243653a893 100644
--- a/drivers/gpu/drm/i915/intel_pm.h
+++ b/drivers/gpu/drm/i915/intel_pm.h
@@ -47,6 +47,7 @@ void skl_pipe_wm_get_hw_state(struct intel_crtc *crtc,
struct skl_pipe_wm *out);
void g4x_wm_sanitize(struct drm_i915_private *dev_priv);
void vlv_wm_sanitize(struct drm_i915_private *dev_priv);
+void skl_wm_sanitize(struct drm_i915_private *dev_priv);
bool intel_can_enable_sagv(struct drm_i915_private *dev_priv,
const struct intel_bw_state *bw_state);
void intel_sagv_pre_plane_update(struct intel_atomic_state *state);
From: Filipe Manana <fdmanana(a)suse.com>
Commit d96b34248c2f4ea8cd09286090f2f6f77102eaab upstream.
We don't allow send and balance/relocation to run in parallel in order
to prevent send from failing or silently producing a bad stream. The
reason is that while send is using an extent (especially metadata), or is
about to read a metadata extent expecting it to belong to a specific parent
node, relocation can run, the transaction used for the relocation can be
committed, and the extent can be reallocated while send is still using it,
so send ends up with content different from what was expected. This can
result in just failing to read a metadata extent due to failure of the
validation checks (parent transid, level, etc), failure to find a
backreference for a data extent, and other unexpected failures. Besides
reallocation, there's also a similar problem of an extent getting
discarded when it's unpinned after the transaction used for block group
relocation is committed.
The restriction between balance and send was added in commit 9e967495e0e0
("Btrfs: prevent send failures and crashes due to concurrent relocation"),
kernel 5.3, while the more general restriction between send and relocation
was added in commit 1cea5cf0e664 ("btrfs: ensure relocation never runs
while we have send operations running"), kernel 5.14.
Both send and relocation can be very long running operations. Relocation
because it has to do a lot of IO and expensive backreference lookups in
case there are many snapshots, and send due to read IO when operating on
very large trees. This makes it inconvenient for users and tools to deal
with scheduling both operations.
For zoned filesystems we also have automatic block group relocation, so
send can fail with -EAGAIN when users least expect it or send can end up
delaying the block group relocation for too long. In the future we might
also get automatic block group relocation for non-zoned filesystems.
This change makes it possible for send and relocation to run in parallel.
This is achieved the following way:
1) For all tree searches, send acquires a read lock on the commit root
semaphore;
2) After each tree search, and before releasing the commit root semaphore,
the leaf is cloned and placed in the search path (struct btrfs_path);
3) After releasing the commit root semaphore, the changed_cb() callback
is invoked, which operates on the leaf and writes commands to the pipe
(or file in case send/receive is not used with a pipe). It's important
here to not hold a lock on the commit root semaphore, because if we did
we could deadlock when sending and receiving to the same filesystem
using a pipe - the send task blocks on the pipe because it's full, the
receive task, which is the only consumer of the pipe, triggers a
transaction commit when attempting to create a subvolume or reserve
space for a write operation for example, but the transaction commit
blocks trying to write lock the commit root semaphore, resulting in a
deadlock;
4) Before moving to the next key, or advancing to the next change in case
of an incremental send, check if a transaction used for relocation was
committed (or is about to finish its commit). If so, release the search
path(s) and restart the search, to where we were before, so that we
don't operate on stale extent buffers. The search restarts are always
possible because both the send and parent roots are RO, and no one can
add, remove or update keys (change their offset) in RO trees - the
only exception is deduplication, but that is still not allowed to run
in parallel with send;
5) Periodically check if there is contention on the commit root semaphore,
which means there is a transaction commit trying to write lock it, and
release the semaphore and reschedule if there is contention, so as to
avoid causing any significant delays to transaction commits.
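As a rough userspace model of step 4, the sketch below (hypothetical names throughout; the real code uses fs_info->commit_root_sem, btrfs_release_path() and the new search_key_again() shown in the diff further down) keeps a private snapshot of the last relocation transaction generation and restarts the search whenever the filesystem-wide value moves past that snapshot:

#include <pthread.h>
#include <stdio.h>

static pthread_rwlock_t commit_root_sem = PTHREAD_RWLOCK_INITIALIZER;
static unsigned long long fs_last_reloc_trans;	/* bumped on "relocation commit" */

struct send_ctx { unsigned long long last_reloc_trans; };

/* Stand-ins for btrfs_release_path() and search_key_again(). */
static void release_path(void)     { puts("  release stale leaf"); }
static int  search_key_again(void) { puts("  re-search key in commit root"); return 0; }

static int check_reloc_and_restart(struct send_ctx *sctx)
{
	int ret = 0;

	pthread_rwlock_rdlock(&commit_root_sem);
	if (fs_last_reloc_trans > sctx->last_reloc_trans) {
		sctx->last_reloc_trans = fs_last_reloc_trans;
		pthread_rwlock_unlock(&commit_root_sem);
		/* A relocation transaction committed since our snapshot, so
		 * the leaf may carry pre-relocation disk_bytenr values. */
		release_path();
		ret = search_key_again();
	} else {
		pthread_rwlock_unlock(&commit_root_sem);
	}
	return ret;
}

int main(void)
{
	struct send_ctx sctx = { .last_reloc_trans = 0 };

	check_reloc_and_restart(&sctx);		/* nothing committed: keep going */

	pthread_rwlock_wrlock(&commit_root_sem);
	fs_last_reloc_trans = 42;		/* pretend a relocation committed */
	pthread_rwlock_unlock(&commit_root_sem);

	check_reloc_and_restart(&sctx);		/* path dropped, search restarted */
	return 0;
}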
This leaves some room for optimizations so that send does fewer path
releases and re-searches of the trees when relocation is running, but
for now it's kept simple as it performs quite well (on very large trees
with resulting send streams on the order of a few hundred gigabytes).
Test case btrfs/187, from fstests, stresses relocation, send and
deduplication attempting to run in parallel, but without verifying if send
succeeds and if it produces correct streams. A new test case will be added
that exercises relocation happening in parallel with send and then checks
that send succeeds and the resulting streams are correct.
A final note is that for now this still leaves the mutual exclusion
between send operations and deduplication on files belonging to a root
used by send operations. A solution for that will be slightly more complex
but it will eventually be built on top of this change.
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
fs/btrfs/block-group.c | 9 +-
fs/btrfs/ctree.c | 98 ++++++++---
fs/btrfs/ctree.h | 14 +-
fs/btrfs/disk-io.c | 4 +-
fs/btrfs/relocation.c | 13 --
fs/btrfs/send.c | 357 +++++++++++++++++++++++++++++++++++------
fs/btrfs/transaction.c | 4 +
7 files changed, 395 insertions(+), 104 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index d721c66d0b41..5edd07e0232d 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1491,7 +1491,6 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
container_of(work, struct btrfs_fs_info, reclaim_bgs_work);
struct btrfs_block_group *bg;
struct btrfs_space_info *space_info;
- LIST_HEAD(again_list);
if (!test_bit(BTRFS_FS_OPEN, &fs_info->flags))
return;
@@ -1562,18 +1561,14 @@ void btrfs_reclaim_bgs_work(struct work_struct *work)
div64_u64(zone_unusable * 100, bg->length));
trace_btrfs_reclaim_block_group(bg);
ret = btrfs_relocate_chunk(fs_info, bg->start);
- if (ret && ret != -EAGAIN)
+ if (ret)
btrfs_err(fs_info, "error relocating chunk %llu",
bg->start);
next:
+ btrfs_put_block_group(bg);
spin_lock(&fs_info->unused_bgs_lock);
- if (ret == -EAGAIN && list_empty(&bg->bg_list))
- list_add_tail(&bg->bg_list, &again_list);
- else
- btrfs_put_block_group(bg);
}
- list_splice_tail(&again_list, &fs_info->reclaim_bgs);
spin_unlock(&fs_info->unused_bgs_lock);
mutex_unlock(&fs_info->reclaim_bgs_lock);
btrfs_exclop_finish(fs_info);
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 95a6a63caf04..899f85445925 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -1566,32 +1566,13 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
struct btrfs_path *p,
int write_lock_level)
{
- struct btrfs_fs_info *fs_info = root->fs_info;
struct extent_buffer *b;
int root_lock = 0;
int level = 0;
if (p->search_commit_root) {
- /*
- * The commit roots are read only so we always do read locks,
- * and we always must hold the commit_root_sem when doing
- * searches on them, the only exception is send where we don't
- * want to block transaction commits for a long time, so
- * we need to clone the commit root in order to avoid races
- * with transaction commits that create a snapshot of one of
- * the roots used by a send operation.
- */
- if (p->need_commit_sem) {
- down_read(&fs_info->commit_root_sem);
- b = btrfs_clone_extent_buffer(root->commit_root);
- up_read(&fs_info->commit_root_sem);
- if (!b)
- return ERR_PTR(-ENOMEM);
-
- } else {
- b = root->commit_root;
- atomic_inc(&b->refs);
- }
+ b = root->commit_root;
+ atomic_inc(&b->refs);
level = btrfs_header_level(b);
/*
* Ensure that all callers have set skip_locking when
@@ -1657,6 +1638,42 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root,
return b;
}
+/*
+ * Replace the extent buffer at the lowest level of the path with a cloned
+ * version. The purpose is to be able to use it safely, after releasing the
+ * commit root semaphore, even if relocation is happening in parallel, the
+ * transaction used for relocation is committed and the extent buffer is
+ * reallocated in the next transaction.
+ *
+ * This is used in a context where the caller does not prevent transaction
+ * commits from happening, either by holding a transaction handle or holding
+ * some lock, while it's doing searches through a commit root.
+ * At the moment it's only used for send operations.
+ */
+static int finish_need_commit_sem_search(struct btrfs_path *path)
+{
+ const int i = path->lowest_level;
+ const int slot = path->slots[i];
+ struct extent_buffer *lowest = path->nodes[i];
+ struct extent_buffer *clone;
+
+ ASSERT(path->need_commit_sem);
+
+ if (!lowest)
+ return 0;
+
+ lockdep_assert_held_read(&lowest->fs_info->commit_root_sem);
+
+ clone = btrfs_clone_extent_buffer(lowest);
+ if (!clone)
+ return -ENOMEM;
+
+ btrfs_release_path(path);
+ path->nodes[i] = clone;
+ path->slots[i] = slot;
+
+ return 0;
+}
/*
* btrfs_search_slot - look for a key in a tree and perform necessary
@@ -1693,6 +1710,7 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
const struct btrfs_key *key, struct btrfs_path *p,
int ins_len, int cow)
{
+ struct btrfs_fs_info *fs_info = root->fs_info;
struct extent_buffer *b;
int slot;
int ret;
@@ -1734,6 +1752,11 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
min_write_lock_level = write_lock_level;
+ if (p->need_commit_sem) {
+ ASSERT(p->search_commit_root);
+ down_read(&fs_info->commit_root_sem);
+ }
+
again:
prev_cmp = -1;
b = btrfs_search_slot_get_root(root, p, write_lock_level);
@@ -1928,6 +1951,16 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root,
done:
if (ret < 0 && !p->skip_release_on_error)
btrfs_release_path(p);
+
+ if (p->need_commit_sem) {
+ int ret2;
+
+ ret2 = finish_need_commit_sem_search(p);
+ up_read(&fs_info->commit_root_sem);
+ if (ret2)
+ ret = ret2;
+ }
+
return ret;
}
ALLOW_ERROR_INJECTION(btrfs_search_slot, ERRNO);
@@ -4396,7 +4429,9 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
int level;
struct extent_buffer *c;
struct extent_buffer *next;
+ struct btrfs_fs_info *fs_info = root->fs_info;
struct btrfs_key key;
+ bool need_commit_sem = false;
u32 nritems;
int ret;
int i;
@@ -4413,14 +4448,20 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
path->keep_locks = 1;
- if (time_seq)
+ if (time_seq) {
ret = btrfs_search_old_slot(root, &key, path, time_seq);
- else
+ } else {
+ if (path->need_commit_sem) {
+ path->need_commit_sem = 0;
+ need_commit_sem = true;
+ down_read(&fs_info->commit_root_sem);
+ }
ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+ }
path->keep_locks = 0;
if (ret < 0)
- return ret;
+ goto done;
nritems = btrfs_header_nritems(path->nodes[0]);
/*
@@ -4543,6 +4584,15 @@ int btrfs_next_old_leaf(struct btrfs_root *root, struct btrfs_path *path,
ret = 0;
done:
unlock_up(path, 0, 1, 0, NULL);
+ if (need_commit_sem) {
+ int ret2;
+
+ path->need_commit_sem = 1;
+ ret2 = finish_need_commit_sem_search(path);
+ up_read(&fs_info->commit_root_sem);
+ if (ret2)
+ ret = ret2;
+ }
return ret;
}
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b46409801647..e89f814cc8f5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -568,7 +568,6 @@ enum {
/*
* Indicate that relocation of a chunk has started, it's set per chunk
* and is toggled between chunks.
- * Set, tested and cleared while holding fs_info::send_reloc_lock.
*/
BTRFS_FS_RELOC_RUNNING,
@@ -668,6 +667,12 @@ struct btrfs_fs_info {
u64 generation;
u64 last_trans_committed;
+ /*
+ * Generation of the last transaction used for block group relocation
+ * since the filesystem was last mounted (or 0 if none happened yet).
+ * Must be written and read while holding btrfs_fs_info::commit_root_sem.
+ */
+ u64 last_reloc_trans;
u64 avg_delayed_ref_runtime;
/*
@@ -997,13 +1002,6 @@ struct btrfs_fs_info {
struct crypto_shash *csum_shash;
- spinlock_t send_reloc_lock;
- /*
- * Number of send operations in progress.
- * Updated while holding fs_info::send_reloc_lock.
- */
- int send_in_progress;
-
/* Type of exclusive operation running, protected by super_lock */
enum btrfs_exclusive_operation exclusive_operation;
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 2180fcef56ca..d5a590b11be5 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -2859,6 +2859,7 @@ static int __cold init_tree_roots(struct btrfs_fs_info *fs_info)
/* All successful */
fs_info->generation = generation;
fs_info->last_trans_committed = generation;
+ fs_info->last_reloc_trans = 0;
/* Always begin writing backup roots after the one being used */
if (backup_index < 0) {
@@ -2992,9 +2993,6 @@ void btrfs_init_fs_info(struct btrfs_fs_info *fs_info)
spin_lock_init(&fs_info->swapfile_pins_lock);
fs_info->swapfile_pins = RB_ROOT;
- spin_lock_init(&fs_info->send_reloc_lock);
- fs_info->send_in_progress = 0;
-
fs_info->bg_reclaim_threshold = BTRFS_DEFAULT_RECLAIM_THRESH;
INIT_WORK(&fs_info->reclaim_bgs_work, btrfs_reclaim_bgs_work);
}
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index a050f9748fa7..a6661f2ad2c0 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -3854,25 +3854,14 @@ struct inode *create_reloc_inode(struct btrfs_fs_info *fs_info,
* 0 success
* -EINPROGRESS operation is already in progress, that's probably a bug
* -ECANCELED cancellation request was set before the operation started
- * -EAGAIN can not start because there are ongoing send operations
*/
static int reloc_chunk_start(struct btrfs_fs_info *fs_info)
{
- spin_lock(&fs_info->send_reloc_lock);
- if (fs_info->send_in_progress) {
- btrfs_warn_rl(fs_info,
-"cannot run relocation while send operations are in progress (%d in progress)",
- fs_info->send_in_progress);
- spin_unlock(&fs_info->send_reloc_lock);
- return -EAGAIN;
- }
if (test_and_set_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) {
/* This should not happen */
- spin_unlock(&fs_info->send_reloc_lock);
btrfs_err(fs_info, "reloc already running, cannot start");
return -EINPROGRESS;
}
- spin_unlock(&fs_info->send_reloc_lock);
if (atomic_read(&fs_info->reloc_cancel_req) > 0) {
btrfs_info(fs_info, "chunk relocation canceled on start");
@@ -3894,9 +3883,7 @@ static void reloc_chunk_end(struct btrfs_fs_info *fs_info)
/* Requested after start, clear bit first so any waiters can continue */
if (atomic_read(&fs_info->reloc_cancel_req) > 0)
btrfs_info(fs_info, "chunk relocation canceled during operation");
- spin_lock(&fs_info->send_reloc_lock);
clear_and_wake_up_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags);
- spin_unlock(&fs_info->send_reloc_lock);
atomic_set(&fs_info->reloc_cancel_req, 0);
}
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index 5612e8bf2ace..4d2c6ce29fe5 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -24,6 +24,7 @@
#include "transaction.h"
#include "compression.h"
#include "xattr.h"
+#include "print-tree.h"
/*
* Maximum number of references an extent can have in order for us to attempt to
@@ -95,6 +96,15 @@ struct send_ctx {
struct btrfs_path *right_path;
struct btrfs_key *cmp_key;
+ /*
+ * Keep track of the generation of the last transaction that was used
+ * for relocating a block group. This is periodically checked in order
+ * to detect if a relocation happened since the last check, so that we
+ * don't operate on stale extent buffers for nodes (level >= 1) or on
+ * stale disk_bytenr values of file extent items.
+ */
+ u64 last_reloc_trans;
+
/*
* infos of the currently processed inode. In case of deleted inodes,
* these are the values from the deleted inode.
@@ -1415,6 +1425,26 @@ static int find_extent_clone(struct send_ctx *sctx,
if (ret < 0)
goto out;
+ down_read(&fs_info->commit_root_sem);
+ if (fs_info->last_reloc_trans > sctx->last_reloc_trans) {
+ /*
+ * A transaction commit for a transaction in which block group
+ * relocation was done just happened.
+ * The disk_bytenr of the file extent item we processed is
+ * possibly stale, referring to the extent's location before
+ * relocation. So act as if we haven't found any clone sources
+ * and fallback to write commands, which will read the correct
+ * data from the new extent location. Otherwise we will fail
+ * below because we haven't found our own back reference or we
+ * could be getting incorrect sources in case the old extent
+ * was already reallocated after the relocation.
+ */
+ up_read(&fs_info->commit_root_sem);
+ ret = -ENOENT;
+ goto out;
+ }
+ up_read(&fs_info->commit_root_sem);
+
if (!backref_ctx.found_itself) {
/* found a bug in backref code? */
ret = -EIO;
@@ -6596,6 +6626,50 @@ static int changed_cb(struct btrfs_path *left_path,
{
int ret = 0;
+ /*
+ * We can not hold the commit root semaphore here. This is because in
+ * the case of sending and receiving to the same filesystem, using a
+ * pipe, could result in a deadlock:
+ *
+ * 1) The task running send blocks on the pipe because it's full;
+ *
+ * 2) The task running receive, which is the only consumer of the pipe,
+ * is waiting for a transaction commit (for example due to a space
+ * reservation when doing a write or triggering a transaction commit
+ * when creating a subvolume);
+ *
+ * 3) The transaction is waiting to write lock the commit root semaphore,
+ * but can not acquire it since it's being held at 1).
+ *
+ * Down this call chain we write to the pipe through kernel_write().
+ * The same type of problem can also happen when sending to a file that
+ * is stored in the same filesystem - when reserving space for a write
+ * into the file, we can trigger a transaction commit.
+ *
+ * Our caller has supplied us with clones of leaves from the send and
+ * parent roots, so we're safe here from a concurrent relocation and
+ * further reallocation of metadata extents while we are here. Below we
+ * also assert that the leaves are clones.
+ */
+ lockdep_assert_not_held(&sctx->send_root->fs_info->commit_root_sem);
+
+ /*
+ * We always have a send root, so left_path is never NULL. We will not
+ * have a leaf when we have reached the end of the send root but have
+ * not yet reached the end of the parent root.
+ */
+ if (left_path->nodes[0])
+ ASSERT(test_bit(EXTENT_BUFFER_UNMAPPED,
+ &left_path->nodes[0]->bflags));
+ /*
+ * When doing a full send we don't have a parent root, so right_path is
+ * NULL. When doing an incremental send, we may have reached the end of
+ * the parent root already, so we don't have a leaf at right_path.
+ */
+ if (right_path && right_path->nodes[0])
+ ASSERT(test_bit(EXTENT_BUFFER_UNMAPPED,
+ &right_path->nodes[0]->bflags));
+
if (result == BTRFS_COMPARE_TREE_SAME) {
if (key->type == BTRFS_INODE_REF_KEY ||
key->type == BTRFS_INODE_EXTREF_KEY) {
@@ -6642,14 +6716,46 @@ static int changed_cb(struct btrfs_path *left_path,
return ret;
}
+static int search_key_again(const struct send_ctx *sctx,
+ struct btrfs_root *root,
+ struct btrfs_path *path,
+ const struct btrfs_key *key)
+{
+ int ret;
+
+ if (!path->need_commit_sem)
+ lockdep_assert_held_read(&root->fs_info->commit_root_sem);
+
+ /*
+ * Roots used for send operations are readonly and no one can add,
+ * update or remove keys from them, so we should be able to find our
+ * key again. The only exception is deduplication, which can operate on
+ * readonly roots and add, update or remove keys to/from them - but at
+ * the moment we don't allow it to run in parallel with send.
+ */
+ ret = btrfs_search_slot(NULL, root, key, path, 0, 0);
+ ASSERT(ret <= 0);
+ if (ret > 0) {
+ btrfs_print_tree(path->nodes[path->lowest_level], false);
+ btrfs_err(root->fs_info,
+"send: key (%llu %u %llu) not found in %s root %llu, lowest_level %d, slot %d",
+ key->objectid, key->type, key->offset,
+ (root == sctx->parent_root ? "parent" : "send"),
+ root->root_key.objectid, path->lowest_level,
+ path->slots[path->lowest_level]);
+ return -EUCLEAN;
+ }
+
+ return ret;
+}
+
static int full_send_tree(struct send_ctx *sctx)
{
int ret;
struct btrfs_root *send_root = sctx->send_root;
struct btrfs_key key;
+ struct btrfs_fs_info *fs_info = send_root->fs_info;
struct btrfs_path *path;
- struct extent_buffer *eb;
- int slot;
path = alloc_path_for_send();
if (!path)
@@ -6660,6 +6766,10 @@ static int full_send_tree(struct send_ctx *sctx)
key.type = BTRFS_INODE_ITEM_KEY;
key.offset = 0;
+ down_read(&fs_info->commit_root_sem);
+ sctx->last_reloc_trans = fs_info->last_reloc_trans;
+ up_read(&fs_info->commit_root_sem);
+
ret = btrfs_search_slot_for_read(send_root, &key, path, 1, 0);
if (ret < 0)
goto out;
@@ -6667,15 +6777,35 @@ static int full_send_tree(struct send_ctx *sctx)
goto out_finish;
while (1) {
- eb = path->nodes[0];
- slot = path->slots[0];
- btrfs_item_key_to_cpu(eb, &key, slot);
+ btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
ret = changed_cb(path, NULL, &key,
BTRFS_COMPARE_TREE_NEW, sctx);
if (ret < 0)
goto out;
+ down_read(&fs_info->commit_root_sem);
+ if (fs_info->last_reloc_trans > sctx->last_reloc_trans) {
+ sctx->last_reloc_trans = fs_info->last_reloc_trans;
+ up_read(&fs_info->commit_root_sem);
+ /*
+ * A transaction used for relocating a block group was
+ * committed or is about to finish its commit. Release
+ * our path (leaf) and restart the search, so that we
+ * avoid operating on any file extent items that are
+ * stale, with a disk_bytenr that reflects a pre
+ * relocation value. This way we avoid as much as
+ * possible to fallback to regular writes when checking
+ * if we can clone file ranges.
+ */
+ btrfs_release_path(path);
+ ret = search_key_again(sctx, send_root, path, &key);
+ if (ret < 0)
+ goto out;
+ } else {
+ up_read(&fs_info->commit_root_sem);
+ }
+
ret = btrfs_next_item(send_root, path);
if (ret < 0)
goto out;
@@ -6693,6 +6823,20 @@ static int full_send_tree(struct send_ctx *sctx)
return ret;
}
+static int replace_node_with_clone(struct btrfs_path *path, int level)
+{
+ struct extent_buffer *clone;
+
+ clone = btrfs_clone_extent_buffer(path->nodes[level]);
+ if (!clone)
+ return -ENOMEM;
+
+ free_extent_buffer(path->nodes[level]);
+ path->nodes[level] = clone;
+
+ return 0;
+}
+
static int tree_move_down(struct btrfs_path *path, int *level, u64 reada_min_gen)
{
struct extent_buffer *eb;
@@ -6702,6 +6846,8 @@ static int tree_move_down(struct btrfs_path *path, int *level, u64 reada_min_gen
u64 reada_max;
u64 reada_done = 0;
+ lockdep_assert_held_read(&parent->fs_info->commit_root_sem);
+
BUG_ON(*level == 0);
eb = btrfs_read_node_slot(parent, slot);
if (IS_ERR(eb))
@@ -6725,6 +6871,10 @@ static int tree_move_down(struct btrfs_path *path, int *level, u64 reada_min_gen
path->nodes[*level - 1] = eb;
path->slots[*level - 1] = 0;
(*level)--;
+
+ if (*level == 0)
+ return replace_node_with_clone(path, 0);
+
return 0;
}
@@ -6738,8 +6888,10 @@ static int tree_move_next_or_upnext(struct btrfs_path *path,
path->slots[*level]++;
while (path->slots[*level] >= nritems) {
- if (*level == root_level)
+ if (*level == root_level) {
+ path->slots[*level] = nritems - 1;
return -1;
+ }
/* move upnext */
path->slots[*level] = 0;
@@ -6771,14 +6923,20 @@ static int tree_advance(struct btrfs_path *path,
} else {
ret = tree_move_down(path, level, reada_min_gen);
}
- if (ret >= 0) {
- if (*level == 0)
- btrfs_item_key_to_cpu(path->nodes[*level], key,
- path->slots[*level]);
- else
- btrfs_node_key_to_cpu(path->nodes[*level], key,
- path->slots[*level]);
- }
+
+ /*
+ * Even if we have reached the end of a tree, ret is -1, update the key
+ * anyway, so that in case we need to restart due to a block group
+ * relocation, we can assert that the last key of the root node still
+ * exists in the tree.
+ */
+ if (*level == 0)
+ btrfs_item_key_to_cpu(path->nodes[*level], key,
+ path->slots[*level]);
+ else
+ btrfs_node_key_to_cpu(path->nodes[*level], key,
+ path->slots[*level]);
+
return ret;
}
@@ -6807,6 +6965,97 @@ static int tree_compare_item(struct btrfs_path *left_path,
return 0;
}
+/*
+ * A transaction used for relocating a block group was committed or is about to
+ * finish its commit. Release our paths and restart the search, so that we are
+ * not using stale extent buffers:
+ *
+ * 1) For levels > 0, we are only holding references of extent buffers, without
+ * any locks on them, which does not prevent them from having been relocated
+ * and reallocated after the last time we released the commit root semaphore.
+ * The exception are the root nodes, for which we always have a clone, see
+ * the comment at btrfs_compare_trees();
+ *
+ * 2) For leaves, level 0, we are holding copies (clones) of extent buffers, so
+ * we are safe from the concurrent relocation and reallocation. However they
+ * can have file extent items with a pre relocation disk_bytenr value, so we
+ * restart the start from the current commit roots and clone the new leaves so
+ * that we get the post relocation disk_bytenr values. Not doing so, could
+ * make us clone the wrong data in case there are new extents using the old
+ * disk_bytenr that happen to be shared.
+ */
+static int restart_after_relocation(struct btrfs_path *left_path,
+ struct btrfs_path *right_path,
+ const struct btrfs_key *left_key,
+ const struct btrfs_key *right_key,
+ int left_level,
+ int right_level,
+ const struct send_ctx *sctx)
+{
+ int root_level;
+ int ret;
+
+ lockdep_assert_held_read(&sctx->send_root->fs_info->commit_root_sem);
+
+ btrfs_release_path(left_path);
+ btrfs_release_path(right_path);
+
+ /*
+ * Since keys can not be added or removed to/from our roots because they
+ * are readonly and we do not allow deduplication to run in parallel
+ * (which can add, remove or change keys), the layout of the trees should
+ * not change.
+ */
+ left_path->lowest_level = left_level;
+ ret = search_key_again(sctx, sctx->send_root, left_path, left_key);
+ if (ret < 0)
+ return ret;
+
+ right_path->lowest_level = right_level;
+ ret = search_key_again(sctx, sctx->parent_root, right_path, right_key);
+ if (ret < 0)
+ return ret;
+
+ /*
+ * If the lowest level nodes are leaves, clone them so that they can be
+ * safely used by changed_cb() while not under the protection of the
+ * commit root semaphore, even if relocation and reallocation happens in
+ * parallel.
+ */
+ if (left_level == 0) {
+ ret = replace_node_with_clone(left_path, 0);
+ if (ret < 0)
+ return ret;
+ }
+
+ if (right_level == 0) {
+ ret = replace_node_with_clone(right_path, 0);
+ if (ret < 0)
+ return ret;
+ }
+
+ /*
+ * Now clone the root nodes (unless they happen to be the leaves we have
+ * already cloned). This is to protect against concurrent snapshotting of
+ * the send and parent roots (see the comment at btrfs_compare_trees()).
+ */
+ root_level = btrfs_header_level(sctx->send_root->commit_root);
+ if (root_level > 0) {
+ ret = replace_node_with_clone(left_path, root_level);
+ if (ret < 0)
+ return ret;
+ }
+
+ root_level = btrfs_header_level(sctx->parent_root->commit_root);
+ if (root_level > 0) {
+ ret = replace_node_with_clone(right_path, root_level);
+ if (ret < 0)
+ return ret;
+ }
+
+ return 0;
+}
+
/*
* This function compares two trees and calls the provided callback for
* every changed/new/deleted item it finds.
@@ -6835,10 +7084,10 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
int right_root_level;
int left_level;
int right_level;
- int left_end_reached;
- int right_end_reached;
- int advance_left;
- int advance_right;
+ int left_end_reached = 0;
+ int right_end_reached = 0;
+ int advance_left = 0;
+ int advance_right = 0;
u64 left_blockptr;
u64 right_blockptr;
u64 left_gen;
@@ -6906,12 +7155,18 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
down_read(&fs_info->commit_root_sem);
left_level = btrfs_header_level(left_root->commit_root);
left_root_level = left_level;
+ /*
+ * We clone the root node of the send and parent roots to prevent races
+ * with snapshot creation of these roots. Snapshot creation COWs the
+ * root node of a tree, so after the transaction is committed the old
+ * extent can be reallocated while this send operation is still ongoing.
+ * So we clone them, under the commit root semaphore, to be race free.
+ */
left_path->nodes[left_level] =
btrfs_clone_extent_buffer(left_root->commit_root);
if (!left_path->nodes[left_level]) {
- up_read(&fs_info->commit_root_sem);
ret = -ENOMEM;
- goto out;
+ goto out_unlock;
}
right_level = btrfs_header_level(right_root->commit_root);
@@ -6919,9 +7174,8 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
right_path->nodes[right_level] =
btrfs_clone_extent_buffer(right_root->commit_root);
if (!right_path->nodes[right_level]) {
- up_read(&fs_info->commit_root_sem);
ret = -ENOMEM;
- goto out;
+ goto out_unlock;
}
/*
* Our right root is the parent root, while the left root is the "send"
@@ -6931,7 +7185,6 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
* will need to read them at some point.
*/
reada_min_gen = btrfs_header_generation(right_root->commit_root);
- up_read(&fs_info->commit_root_sem);
if (left_level == 0)
btrfs_item_key_to_cpu(left_path->nodes[left_level],
@@ -6946,11 +7199,26 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
btrfs_node_key_to_cpu(right_path->nodes[right_level],
&right_key, right_path->slots[right_level]);
- left_end_reached = right_end_reached = 0;
- advance_left = advance_right = 0;
+ sctx->last_reloc_trans = fs_info->last_reloc_trans;
while (1) {
- cond_resched();
+ if (need_resched() ||
+ rwsem_is_contended(&fs_info->commit_root_sem)) {
+ up_read(&fs_info->commit_root_sem);
+ cond_resched();
+ down_read(&fs_info->commit_root_sem);
+ }
+
+ if (fs_info->last_reloc_trans > sctx->last_reloc_trans) {
+ ret = restart_after_relocation(left_path, right_path,
+ &left_key, &right_key,
+ left_level, right_level,
+ sctx);
+ if (ret < 0)
+ goto out_unlock;
+ sctx->last_reloc_trans = fs_info->last_reloc_trans;
+ }
+
if (advance_left && !left_end_reached) {
ret = tree_advance(left_path, &left_level,
left_root_level,
@@ -6959,7 +7227,7 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
if (ret == -1)
left_end_reached = ADVANCE;
else if (ret < 0)
- goto out;
+ goto out_unlock;
advance_left = 0;
}
if (advance_right && !right_end_reached) {
@@ -6970,54 +7238,55 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
if (ret == -1)
right_end_reached = ADVANCE;
else if (ret < 0)
- goto out;
+ goto out_unlock;
advance_right = 0;
}
if (left_end_reached && right_end_reached) {
ret = 0;
- goto out;
+ goto out_unlock;
} else if (left_end_reached) {
if (right_level == 0) {
+ up_read(&fs_info->commit_root_sem);
ret = changed_cb(left_path, right_path,
&right_key,
BTRFS_COMPARE_TREE_DELETED,
sctx);
if (ret < 0)
goto out;
+ down_read(&fs_info->commit_root_sem);
}
advance_right = ADVANCE;
continue;
} else if (right_end_reached) {
if (left_level == 0) {
+ up_read(&fs_info->commit_root_sem);
ret = changed_cb(left_path, right_path,
&left_key,
BTRFS_COMPARE_TREE_NEW,
sctx);
if (ret < 0)
goto out;
+ down_read(&fs_info->commit_root_sem);
}
advance_left = ADVANCE;
continue;
}
if (left_level == 0 && right_level == 0) {
+ up_read(&fs_info->commit_root_sem);
cmp = btrfs_comp_cpu_keys(&left_key, &right_key);
if (cmp < 0) {
ret = changed_cb(left_path, right_path,
&left_key,
BTRFS_COMPARE_TREE_NEW,
sctx);
- if (ret < 0)
- goto out;
advance_left = ADVANCE;
} else if (cmp > 0) {
ret = changed_cb(left_path, right_path,
&right_key,
BTRFS_COMPARE_TREE_DELETED,
sctx);
- if (ret < 0)
- goto out;
advance_right = ADVANCE;
} else {
enum btrfs_compare_tree_result result;
@@ -7031,11 +7300,13 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
result = BTRFS_COMPARE_TREE_SAME;
ret = changed_cb(left_path, right_path,
&left_key, result, sctx);
- if (ret < 0)
- goto out;
advance_left = ADVANCE;
advance_right = ADVANCE;
}
+
+ if (ret < 0)
+ goto out;
+ down_read(&fs_info->commit_root_sem);
} else if (left_level == right_level) {
cmp = btrfs_comp_cpu_keys(&left_key, &right_key);
if (cmp < 0) {
@@ -7075,6 +7346,8 @@ static int btrfs_compare_trees(struct btrfs_root *left_root,
}
}
+out_unlock:
+ up_read(&fs_info->commit_root_sem);
out:
btrfs_free_path(left_path);
btrfs_free_path(right_path);
@@ -7413,21 +7686,7 @@ long btrfs_ioctl_send(struct file *mnt_file, struct btrfs_ioctl_send_args *arg)
if (ret)
goto out;
- spin_lock(&fs_info->send_reloc_lock);
- if (test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags)) {
- spin_unlock(&fs_info->send_reloc_lock);
- btrfs_warn_rl(fs_info,
- "cannot run send because a relocation operation is in progress");
- ret = -EAGAIN;
- goto out;
- }
- fs_info->send_in_progress++;
- spin_unlock(&fs_info->send_reloc_lock);
-
ret = send_subvol(sctx);
- spin_lock(&fs_info->send_reloc_lock);
- fs_info->send_in_progress--;
- spin_unlock(&fs_info->send_reloc_lock);
if (ret < 0)
goto out;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 9a6009108ea5..642cd2b55fa0 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -163,6 +163,10 @@ static noinline void switch_commit_roots(struct btrfs_trans_handle *trans)
struct btrfs_caching_control *caching_ctl, *next;
down_write(&fs_info->commit_root_sem);
+
+ if (test_bit(BTRFS_FS_RELOC_RUNNING, &fs_info->flags))
+ fs_info->last_reloc_trans = trans->transid;
+
list_for_each_entry_safe(root, tmp, &cur_trans->switch_commits,
dirty_list) {
list_del_init(&root->dirty_list);
--
2.33.1
From: Josh Triplett <josh(a)joshtriplett.org>
commit b1489186cc8391e0c1e342f9fbc3eedf6b944c61 upstream.
The in-kernel ext4 resize code doesn't support filesystems with the
sparse_super2 feature. It fails with errors like this and doesn't
finish the resize:
EXT4-fs (loop0): resizing filesystem from 16640 to 7864320 blocks
EXT4-fs warning (device loop0): verify_reserved_gdb:760: reserved GDT 2 missing grp 1 (32770)
EXT4-fs warning (device loop0): ext4_resize_fs:2111: error (-22) occurred during file system resize
EXT4-fs (loop0): resized filesystem to 2097152
To reproduce:
mkfs.ext4 -b 4096 -I 256 -J size=32 -E resize=$((256*1024*1024)) -O sparse_super2 ext4.img 65M
truncate -s 30G ext4.img
mount ext4.img /mnt
python3 -c 'import fcntl, os, struct ; fd = os.open("/mnt", os.O_RDONLY | os.O_DIRECTORY) ; fcntl.ioctl(fd, 0x40086610, struct.pack("Q", 30 * 1024 * 1024 * 1024 // 4096), False) ; os.close(fd)'
dmesg | tail
e2fsck ext4.img
The userspace resize2fs tool has a check for this case: it checks if
the filesystem has sparse_super2 set and if the kernel provides
/sys/fs/ext4/features/sparse_super2. However, the former check
requires manually reading and parsing the filesystem superblock.
Detect this case in ext4_resize_begin and error out early with a clear
error message.
Signed-off-by: Josh Triplett <josh(a)joshtriplett.org>
Link: https://lore.kernel.org/r/74b8ae78405270211943cd7393e65586c5faeed1.16230932…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
fs/ext4/resize.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index 928700d57eb6..6513079c728b 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -74,6 +74,11 @@ int ext4_resize_begin(struct super_block *sb)
return -EPERM;
}
+ if (ext4_has_feature_sparse_super2(sb)) {
+ ext4_msg(sb, KERN_ERR, "Online resizing not supported with sparse_super2");
+ return -EOPNOTSUPP;
+ }
+
if (test_and_set_bit_lock(EXT4_FLAGS_RESIZING,
&EXT4_SB(sb)->s_ext4_flags))
ret = -EBUSY;
--
2.31.0
The stable release process documented in stable-kernel-rules.rst needs
to be updated to reflect current procedure.
Patch 1 merely reorganizes the three submission option lists into
subsections of the procedure section.
Patch 2 contains the actual process documentation update.
Patches 3 and 4 update the "Tree" section by adding a link to the stable
-rc tree and correcting the link for the stable tree, respectively.
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: stable(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Bagas Sanjaya (4):
Documentation: make option lists subsection of "Procedure for
submitting patches to the -stable tree" in stable-kernel-rules.rst
Documentation: update stable review cycle documentation
Documentation: add link to stable release candidate tree
Documentation: update stable tree link
Documentation/process/stable-kernel-rules.rst | 31 ++++++++++++++-----
1 file changed, 24 insertions(+), 7 deletions(-)
base-commit: ffb217a13a2eaf6d5bd974fc83036a53ca69f1e2
--
An old man doll... just what I always wanted! - Clara
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The patch below does not apply to the 5.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 173ce1ca47c489135b2799f70f550e1319ba36d8 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 11 Mar 2022 15:58:21 +0000
Subject: [PATCH] afs: Fix potential thrashing in afs writeback
In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.
Fix this with three measures:
(1) Advance start to after the page we found.
(2) Break out of the loop and return if rescheduling is requested.
(3) Arbitrarily give up after a maximum of 5 skips.
Fixes: 31143d5d515e ("AFS: implement basic file write support")
Reported-by: Marc Dionne <marc.dionne(a)auristor.com>
Signed-off-by: David Howells <dhowells(a)redhat.com>
Tested-by: Marc Dionne <marc.dionne(a)auristor.com>
Acked-by: Marc Dionne <marc.dionne(a)auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@wa… # v1
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 5e9157d0da29..f447c902318d 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -703,7 +703,7 @@ static int afs_writepages_region(struct address_space *mapping,
struct folio *folio;
struct page *head_page;
ssize_t ret;
- int n;
+ int n, skips = 0;
_enter("%llx,%llx,", start, end);
@@ -754,8 +754,15 @@ static int afs_writepages_region(struct address_space *mapping,
#ifdef CONFIG_AFS_FSCACHE
folio_wait_fscache(folio);
#endif
+ } else {
+ start += folio_size(folio);
}
folio_put(folio);
+ if (wbc->sync_mode == WB_SYNC_NONE) {
+ if (skips >= 5 || need_resched())
+ break;
+ skips++;
+ }
continue;
}
The driver_override field of the platform device should not be initialized
from static memory (a string literal) because the core later kfree()s it,
for example when driver_override is set via sysfs.
Use the dedicated helper to set driver_override properly.
Fixes: 77d8f3068c63 ("clk: imx: scu: add two cells binding support")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
---
drivers/clk/imx/clk-scu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/imx/clk-scu.c b/drivers/clk/imx/clk-scu.c
index 083da31dc3ea..4b2268b7d0d0 100644
--- a/drivers/clk/imx/clk-scu.c
+++ b/drivers/clk/imx/clk-scu.c
@@ -683,7 +683,12 @@ struct clk_hw *imx_clk_scu_alloc_dev(const char *name,
return ERR_PTR(ret);
}
- pdev->driver_override = "imx-scu-clk";
+ ret = driver_set_override(&pdev->dev, &pdev->driver_override,
+ "imx-scu-clk", strlen("imx-scu-clk"));
+ if (ret) {
+ platform_device_put(pdev);
+ return ERR_PTR(ret);
+ }
ret = imx_clk_scu_attach_pd(&pdev->dev, rsrc_id);
if (ret)
--
2.32.0
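As background for the fix above, here is a small user-space analogue of the
failure mode. struct dev, set_override() and the strings are made up for
illustration, but the rule is the same one driver_set_override() enforces: a
field that is later freed must never point at static storage.

#include <stdlib.h>
#include <string.h>

struct dev {
        char *override;                 /* analogue of pdev->driver_override */
};

/* Mirrors the sysfs store path: free the old value, duplicate the new one. */
static int set_override(struct dev *d, const char *s)
{
        char *dup = strdup(s);

        if (!dup)
                return -1;
        free(d->override);              /* frees whatever was assigned before */
        d->override = dup;
        return 0;
}

int main(void)
{
        struct dev d = { 0 };

        /* BUG (the old code): a later free() of a string literal is undefined.
         * d.override = "imx-scu-clk";
         */

        /* FIX (what driver_set_override() does for the kernel): heap copy. */
        d.override = strdup("imx-scu-clk");
        set_override(&d, "something-else");
        free(d.override);
        return 0;
}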
Hi,
the following two patches were backported and "automatically" applied in
4.14.y, 4.19.y, 5.4.y, 5.10.y, 5.5.y and 5.16.y. But they failed
to apply cleanly to v4.9.y due to some changes in the patch context
and one function missing from the older batman-adv version.
These problems have now been fixed manually.
Kind regards,
Sven
Sven Eckelmann (2):
batman-adv: Request iflink once in batadv-on-batadv check
batman-adv: Don't expect inter-netns unique iflink indices
net/batman-adv/hard-interface.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--
2.30.2
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Sat, 5 Mar 2022 18:07:14 +0100
Subject: [PATCH] swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer) asked me to create an incremental fix after I
pointed out the mix-up and asked him for guidance.
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
Let me note that if the size used with the dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,10 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
- dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ /*
+ * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+ * to the tlb buffer, if we knew for sure the device will
+ * overwrite the entire current content. But we don't. Thus
+ * unconditional bounce may prevent leaking swiotlb content (i.e.
+ * kernel memory) to user-space.
+ */
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
@@ -697,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
- swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
- else
- BUG_ON(dir != DMA_FROM_DEVICE);
+ /*
+ * Unconditional bounce is necessary to avoid corruption on
+ * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+ * the whole length of the bounce buffer.
+ */
+ swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+ BUG_ON(!valid_dma_direction(dir));
}
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
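For readers who want the intuition behind the unconditional bounce without
digging through the swiotlb internals, here is a user-space toy model of the
rationale given above: the bounce buffer must be brought in sync with the
original buffer before the device writes, otherwise a partial device write
lets leftovers from earlier mappings leak back. None of the names below are
real swiotlb code.

#include <stdio.h>
#include <string.h>

static char orig[8];                    /* driver's buffer, starts zeroed */
static char tlb[8];                     /* bounce buffer, may hold old I/O data */

static void bounce_to_device(void)   { memcpy(tlb, orig, sizeof(tlb)); }
static void bounce_from_device(void) { memcpy(orig, tlb, sizeof(orig)); }

int main(void)
{
        /* Old behaviour for DMA_FROM_DEVICE: no initial bounce, so the tlb
         * keeps stale content; the device then writes only 4 of 8 bytes.
         */
        memset(tlb, 'X', sizeof(tlb));
        memcpy(tlb, "data", 4);
        bounce_from_device();
        printf("without bounce: %.8s\n", orig);    /* trailing stale 'X' bytes leak */

        /* With the unconditional bounce, the tail of the tlb matches the
         * (zeroed) original buffer before the device writes into it.
         */
        memset(orig, 0, sizeof(orig));
        memset(tlb, 'X', sizeof(tlb));
        bounce_to_device();
        memcpy(tlb, "data", 4);
        bounce_from_device();
        printf("with bounce:    %.4s...\n", orig); /* tail is zeros, nothing leaks */
        return 0;
}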
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Sat, 5 Mar 2022 18:07:14 +0100
Subject: [PATCH] swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer) asked me to create an incremental fix after I
pointed out the mix-up and asked him for guidance.
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
Let me note that if the size used with the dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,10 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
- dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ /*
+ * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+ * to the tlb buffer, if we knew for sure the device will
+ * overwrite the entire current content. But we don't. Thus
+ * unconditional bounce may prevent leaking swiotlb content (i.e.
+ * kernel memory) to user-space.
+ */
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
@@ -697,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
- swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
- else
- BUG_ON(dir != DMA_FROM_DEVICE);
+ /*
+ * Unconditional bounce is necessary to avoid corruption on
+ * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+ * the whole length of the bounce buffer.
+ */
+ swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+ BUG_ON(!valid_dma_direction(dir));
}
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Sat, 5 Mar 2022 18:07:14 +0100
Subject: [PATCH] swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer) asked me to create an incremental fix after I
pointed out the mix-up and asked him for guidance.
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
Let me note that if the size used with the dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,10 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
- dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ /*
+ * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+ * to the tlb buffer, if we knew for sure the device will
+ * overwrite the entire current content. But we don't. Thus
+ * unconditional bounce may prevent leaking swiotlb content (i.e.
+ * kernel memory) to user-space.
+ */
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
@@ -697,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
- swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
- else
- BUG_ON(dir != DMA_FROM_DEVICE);
+ /*
+ * Unconditional bounce is necessary to avoid corruption on
+ * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+ * the whole length of the bounce buffer.
+ */
+ swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+ BUG_ON(!valid_dma_direction(dir));
}
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Sat, 5 Mar 2022 18:07:14 +0100
Subject: [PATCH] swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer) asked me to create an incremental fix after I
pointed out the mix-up and asked him for guidance.
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
Let me note that if the size used with the dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,10 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
- dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ /*
+ * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+ * to the tlb buffer, if we knew for sure the device will
+ * overwrite the entire current content. But we don't. Thus
+ * unconditional bounce may prevent leaking swiotlb content (i.e.
+ * kernel memory) to user-space.
+ */
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
@@ -697,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
- swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
- else
- BUG_ON(dir != DMA_FROM_DEVICE);
+ /*
+ * Unconditional bounce is necessary to avoid corruption on
+ * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+ * the whole length of the bounce buffer.
+ */
+ swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+ BUG_ON(!valid_dma_direction(dir));
}
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Sat, 5 Mar 2022 18:07:14 +0100
Subject: [PATCH] swiotlb: rework "fix info leak with DMA_FROM_DEVICE"
Unfortunately, we ended up merging an old version of the patch "fix info
leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph
(the swiotlb maintainer) asked me to create an incremental fix after I
pointed out the mix-up and asked him for guidance.
So here we go.
The main differences between what we got and what was agreed are:
* swiotlb_sync_single_for_device is also required to do an extra bounce
* We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters
* The implementation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE
must take precedence over DMA_ATTR_SKIP_CPU_SYNC
Thus this patch removes DMA_ATTR_OVERWRITE, and makes
swiotlb_sync_single_for_device() bounce unconditionally (that is, also
when dir == DMA_TO_DEVICE) in order to avoid synchronising back stale
data from the swiotlb buffer.
Let me note that if the size used with the dma_sync_* API is less than the
size used with dma_[un]map_*, under certain circumstances we may still
end up with swiotlb not being transparent. In that sense, this is no
perfect fix either.
To get this bullet proof, we would have to bounce the entire
mapping/bounce buffer. For that we would have to figure out the starting
address, and the size of the mapping in
swiotlb_sync_single_for_device(). While this does seem possible, there
seems to be no firm consensus on how things are supposed to work.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 17706dc91ec9..1887d92e8e92 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,11 +130,3 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
-
-DMA_ATTR_OVERWRITE
-------------------
-
-This is a hint to the DMA-mapping subsystem that the device is expected to
-overwrite the entire mapped size, thus the caller does not require any of the
-previous buffer contents to be preserved. This allows bounce-buffering
-implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 6150d11a607e..dca2b1355bb1 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,14 +61,6 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
-/*
- * This is a hint to the DMA-mapping subsystem that the device is expected
- * to overwrite the entire mapped size, thus the caller does not require any
- * of the previous buffer contents to be preserved. This allows
- * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
- */
-#define DMA_ATTR_OVERWRITE (1UL << 10)
-
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index bfc56cb21705..6db1c475ec82 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -627,10 +627,14 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
- if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
- dir == DMA_BIDIRECTIONAL))
- swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+ /*
+ * When dir == DMA_FROM_DEVICE we could omit the copy from the orig
+ * to the tlb buffer, if we knew for sure the device will
+ * overwrite the entire current content. But we don't. Thus
+ * unconditional bounce may prevent leaking swiotlb content (i.e.
+ * kernel memory) to user-space.
+ */
+ swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
@@ -697,10 +701,13 @@ void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
{
- if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
- swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
- else
- BUG_ON(dir != DMA_FROM_DEVICE);
+ /*
+ * Unconditional bounce is necessary to avoid corruption on
+ * sync_*_for_cpu or dma_unmap_* when the device didn't overwrite
+ * the whole length of the bounce buffer.
+ */
+ swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+ BUG_ON(!valid_dma_direction(dir));
}
void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Fri, 11 Feb 2022 02:12:52 +0100
Subject: [PATCH] swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer. As if the device was to transfer into
it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guests like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is a swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..17706dc91ec9 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,11 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_OVERWRITE
+------------------
+
+This is a hint to the DMA-mapping subsystem that the device is expected to
+overwrite the entire mapped size, thus the caller does not require any of the
+previous buffer contents to be preserved. This allows bounce-buffering
+implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..6150d11a607e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * This is a hint to the DMA-mapping subsystem that the device is expected
+ * to overwrite the entire mapped size, thus the caller does not require any
+ * of the previous buffer contents to be preserved. This allows
+ * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
+ */
+#define DMA_ATTR_OVERWRITE (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..bfc56cb21705 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -628,7 +628,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
+ dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
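For context, this is roughly how a driver would have opted in to the
optimisation while DMA_ATTR_OVERWRITE existed (it is removed again by the
follow-up rework quoted earlier in this digest). A sketch only: my_dev, buf
and len are placeholders, and no in-tree driver is implied to do this.

#include <linux/dma-mapping.h>

static dma_addr_t map_rx_buffer(struct device *my_dev, void *buf, size_t len)
{
        /*
         * The device is expected to overwrite the whole buffer, so tell the
         * DMA layer the old contents need not be preserved; a bounce-buffer
         * implementation may then skip the initial copy into the bounce
         * buffer.
         */
        return dma_map_single_attrs(my_dev, buf, len, DMA_FROM_DEVICE,
                                    DMA_ATTR_OVERWRITE);
}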
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Fri, 11 Feb 2022 02:12:52 +0100
Subject: [PATCH] swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer. As if the device was to transfer into
it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guests like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is a swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..17706dc91ec9 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,11 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_OVERWRITE
+------------------
+
+This is a hint to the DMA-mapping subsystem that the device is expected to
+overwrite the entire mapped size, thus the caller does not require any of the
+previous buffer contents to be preserved. This allows bounce-buffering
+implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..6150d11a607e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * This is a hint to the DMA-mapping subsystem that the device is expected
+ * to overwrite the entire mapped size, thus the caller does not require any
+ * of the previous buffer contents to be preserved. This allows
+ * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
+ */
+#define DMA_ATTR_OVERWRITE (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..bfc56cb21705 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -628,7 +628,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
+ dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Fri, 11 Feb 2022 02:12:52 +0100
Subject: [PATCH] swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer. As if the device was to transfer into
it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guests like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is a swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..17706dc91ec9 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,11 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_OVERWRITE
+------------------
+
+This is a hint to the DMA-mapping subsystem that the device is expected to
+overwrite the entire mapped size, thus the caller does not require any of the
+previous buffer contents to be preserved. This allows bounce-buffering
+implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..6150d11a607e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * This is a hint to the DMA-mapping subsystem that the device is expected
+ * to overwrite the entire mapped size, thus the caller does not require any
+ * of the previous buffer contents to be preserved. This allows
+ * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
+ */
+#define DMA_ATTR_OVERWRITE (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..bfc56cb21705 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -628,7 +628,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
+ dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Fri, 11 Feb 2022 02:12:52 +0100
Subject: [PATCH] swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer. As if the device was to transfer into
it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guests like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is a swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..17706dc91ec9 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,11 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_OVERWRITE
+------------------
+
+This is a hint to the DMA-mapping subsystem that the device is expected to
+overwrite the entire mapped size, thus the caller does not require any of the
+previous buffer contents to be preserved. This allows bounce-buffering
+implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..6150d11a607e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * This is a hint to the DMA-mapping subsystem that the device is expected
+ * to overwrite the entire mapped size, thus the caller does not require any
+ * of the previous buffer contents to be preserved. This allows
+ * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
+ */
+#define DMA_ATTR_OVERWRITE (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..bfc56cb21705 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -628,7 +628,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
+ dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic(a)linux.ibm.com>
Date: Fri, 11 Feb 2022 02:12:52 +0100
Subject: [PATCH] swiotlb: fix info leak with DMA_FROM_DEVICE
The problem I'm addressing was discovered by the LTP test covering
cve-2018-1000204.
A short description of what happens follows:
1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV
and a corresponding dxferp. The peculiar thing about this is that TUR
is not reading from the device.
2) In sg_start_req() the invocation of blk_rq_map_user() effectively
bounces the user-space buffer. As if the device was to transfer into
it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in
sg_build_indirect()") we make sure this first bounce buffer is
allocated with GFP_ZERO.
3) For the rest of the story we keep ignoring that we have a TUR, so the
device won't touch the buffer we prepare as if we had a
DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
and the buffer allocated by SG is mapped by the function
virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
scatter-gather and not scsi generics). This mapping involves bouncing
via the swiotlb (we need swiotlb to do virtio in protected guests like
s390 Secure Execution, or AMD SEV).
4) When the SCSI TUR is done, we first copy back the content of the second
(that is swiotlb) bounce buffer (which most likely contains some
previous IO data), to the first bounce buffer, which contains all
zeros. Then we copy back the content of the first bounce buffer to
the user-space buffer.
5) The test case detects that the buffer, which it zero-initialized,
ain't all zeros and fails.
One can argue that this is a swiotlb problem, because without swiotlb
we leak all zeros, and the swiotlb should be transparent in a sense that
it does not affect the outcome (if all other participants are well
behaved).
Copying the content of the original buffer into the swiotlb buffer is
the only way I can think of to make swiotlb transparent in such
scenarios. So let's do just that if in doubt, but allow the driver
to tell us that the whole mapped buffer is going to be overwritten,
in which case we can preserve the old behavior and avoid the performance
impact of the extra bounce.
Signed-off-by: Halil Pasic <pasic(a)linux.ibm.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..17706dc91ec9 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,11 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_OVERWRITE
+------------------
+
+This is a hint to the DMA-mapping subsystem that the device is expected to
+overwrite the entire mapped size, thus the caller does not require any of the
+previous buffer contents to be preserved. This allows bounce-buffering
+implementations to optimise DMA_FROM_DEVICE transfers.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..6150d11a607e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -61,6 +61,14 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)
+/*
+ * This is a hint to the DMA-mapping subsystem that the device is expected
+ * to overwrite the entire mapped size, thus the caller does not require any
+ * of the previous buffer contents to be preserved. This allows
+ * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers.
+ */
+#define DMA_ATTR_OVERWRITE (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f1e7ea160b43..bfc56cb21705 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -628,7 +628,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
- (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
+ (!(attrs & DMA_ATTR_OVERWRITE) || dir == DMA_TO_DEVICE ||
+ dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
return tlb_addr;
}
The logic in commit 2a5f1b67ec57 "KVM: arm64: Don't access PMCR_EL0 when no
PMU is available" relies on an empty reset handler being benign. This was
not the case in earlier kernel versions, so the stable backport of this
patch is causing problems.
KVM's behaviour in this area changed over time. In particular, prior to commit
03fdfb269009 ("KVM: arm64: Don't write junk to sysregs on reset"), an empty
reset handler will trigger a warning, as the guest registers have been
poisoned.
Prior to commit 20589c8cc47d ("arm/arm64: KVM: Don't panic on failure to
properly reset system registers"), this warning was a panic().
Instead of reverting the backport, make it write 0 to the sys_reg[] array.
This keeps the reset logic happy, and the dodgy value can't be seen by
the guest as it can't request the emulation.
The original bug was accessing the PMCR_EL0 register on CPUs that don't
implement that feature. There is no known silicon that does this, but
v4.9's ACPI support is unable to find the PMU, so triggers this code:
| Kernel panic - not syncing: Didn't reset vcpu_sys_reg(24)
| CPU: 1 PID: 3055 Comm: lkvm Not tainted 4.9.302-00032-g64e078a56789 #13476
| Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jul 30 2018
| Call trace:
| [<ffff00000808b4b0>] dump_backtrace+0x0/0x1a0
| [<ffff00000808b664>] show_stack+0x14/0x20
| [<ffff0000088f0e18>] dump_stack+0x98/0xb8
| [<ffff0000088eef08>] panic+0x118/0x274
| [<ffff0000080b50e0>] access_actlr+0x0/0x20
| [<ffff0000080b2620>] kvm_reset_vcpu+0x5c/0xac
| [<ffff0000080ac688>] kvm_arch_vcpu_ioctl+0x3e4/0x490
| [<ffff0000080a382c>] kvm_vcpu_ioctl+0x5b8/0x720
| [<ffff000008201e44>] do_vfs_ioctl+0x2f4/0x884
| [<ffff00000820244c>] SyS_ioctl+0x78/0x9c
| [<ffff000008083a9c>] __sys_trace_return+0x0/0x4
Cc: <stable(a)vger.kernel.org> # < v5.3 with 2a5f1b67ec57 backported
Signed-off-by: James Morse <james.morse(a)arm.com>
---
arch/arm64/kvm/sys_regs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 10d80456f38f..8d548fdbb6b2 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -451,8 +451,10 @@ static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
u64 pmcr, val;
/* No PMU available, PMCR_EL0 may UNDEF... */
- if (!kvm_arm_support_pmu_v3())
+ if (!kvm_arm_support_pmu_v3()) {
+ vcpu_sys_reg(vcpu, PMCR_EL0) = 0;
return;
+ }
pmcr = read_sysreg(pmcr_el0);
/*
--
2.30.2
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0bf476fc3624e3a72af4ba7340d430a91c18cd67 Mon Sep 17 00:00:00 2001
From: Robert Hancock <robert.hancock(a)calian.com>
Date: Thu, 3 Mar 2022 12:10:27 -0600
Subject: [PATCH] net: macb: Fix lost RX packet wakeup race in NAPI receive
There is an oddity in the way the RSR register flags propagate to the
ISR register (and the actual interrupt output) on this hardware: it
appears that RSR register bits only result in ISR being asserted if the
interrupt was actually enabled at the time, so enabling interrupts with
RSR bits already set doesn't trigger an interrupt to be raised. There
was already a partial fix for this race in the macb_poll function where
it checked for RSR bits being set and re-triggered NAPI receive.
However, there was still a race window between checking RSR and
actually enabling interrupts, where a lost wakeup could happen. It's
necessary to check again after enabling interrupts to see if RSR was set
just prior to the interrupt being enabled, and re-trigger receive in that
case.
This issue was noticed in a point-to-point UDP request-response protocol
which periodically saw timeouts or abnormally high response times due to
received packets not being processed in a timely fashion. In many
applications, more packets arriving, including TCP retransmissions, would
cause the original packet to be processed, thus masking the issue.
Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done")
Cc: stable(a)vger.kernel.org
Co-developed-by: Scott McNutt <scott.mcnutt(a)siriusxm.com>
Signed-off-by: Scott McNutt <scott.mcnutt(a)siriusxm.com>
Signed-off-by: Robert Hancock <robert.hancock(a)calian.com>
Tested-by: Claudiu Beznea <claudiu.beznea(a)microchip.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 98498a76ae16..d13f06cf0308 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1573,7 +1573,14 @@ static int macb_poll(struct napi_struct *napi, int budget)
if (work_done < budget) {
napi_complete_done(napi, work_done);
- /* Packets received while interrupts were disabled */
+ /* RSR bits only seem to propagate to raise interrupts when
+ * interrupts are enabled at the time, so if bits are already
+ * set due to packets received while interrupts were disabled,
+ * they will not cause another interrupt to be generated when
+ * interrupts are re-enabled.
+ * Check for this case here. This has been seen to happen
+ * around 30% of the time under heavy network load.
+ */
status = macb_readl(bp, RSR);
if (status) {
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
@@ -1581,6 +1588,22 @@ static int macb_poll(struct napi_struct *napi, int budget)
napi_reschedule(napi);
} else {
queue_writel(queue, IER, bp->rx_intr_mask);
+
+ /* In rare cases, packets could have been received in
+ * the window between the check above and re-enabling
+ * interrupts. Therefore, a double-check is required
+ * to avoid losing a wakeup. This can potentially race
+ * with the interrupt handler doing the same actions
+ * if an interrupt is raised just after enabling them,
+ * but this should be harmless.
+ */
+ status = macb_readl(bp, RSR);
+ if (unlikely(status)) {
+ queue_writel(queue, IDR, bp->rx_intr_mask);
+ if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
+ queue_writel(queue, ISR, MACB_BIT(RCOMP));
+ napi_schedule(napi);
+ }
}
}
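The pattern added by the patch can be restated schematically as below. This is
a hedged sketch only: process_rx(), read_rx_status(), enable_rx_irq() and
disable_rx_irq() are hypothetical helpers standing in for the RSR/IER/IDR
accesses in macb_main.c.

#include <linux/netdevice.h>

/* Schematic NAPI poll showing the lost-wakeup avoidance: the status
 * register is re-read after interrupts are re-enabled, because a bit
 * that was already set when the interrupt was enabled never raises one. */
static int example_poll(struct napi_struct *napi, int budget)
{
	int work_done = process_rx(napi, budget);	/* hypothetical */

	if (work_done < budget) {
		napi_complete_done(napi, work_done);

		if (read_rx_status()) {			/* hypothetical */
			/* packets arrived while interrupts were off */
			napi_reschedule(napi);
		} else {
			enable_rx_irq();		/* hypothetical */
			/* double-check: a packet landing between the check
			 * above and enabling the interrupt is otherwise lost */
			if (read_rx_status()) {
				disable_rx_irq();	/* hypothetical */
				napi_schedule(napi);
			}
		}
	}
	return work_done;
}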
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0bf476fc3624e3a72af4ba7340d430a91c18cd67 Mon Sep 17 00:00:00 2001
From: Robert Hancock <robert.hancock(a)calian.com>
Date: Thu, 3 Mar 2022 12:10:27 -0600
Subject: [PATCH] net: macb: Fix lost RX packet wakeup race in NAPI receive
There is an oddity in the way the RSR register flags propagate to the
ISR register (and the actual interrupt output) on this hardware: it
appears that RSR register bits only result in ISR being asserted if the
interrupt was actually enabled at the time, so enabling interrupts with
RSR bits already set doesn't trigger an interrupt to be raised. There
was already a partial fix for this race in the macb_poll function where
it checked for RSR bits being set and re-triggered NAPI receive.
However, there was still a race window between checking RSR and
actually enabling interrupts, where a lost wakeup could happen. It's
necessary to check again after enabling interrupts to see if RSR was set
just prior to the interrupt being enabled, and re-trigger receive in that
case.
This issue was noticed in a point-to-point UDP request-response protocol
which periodically saw timeouts or abnormally high response times due to
received packets not being processed in a timely fashion. In many
applications, more packets arriving, including TCP retransmissions, would
cause the original packet to be processed, thus masking the issue.
Fixes: 02f7a34f34e3 ("net: macb: Re-enable RX interrupt only when RX is done")
Cc: stable(a)vger.kernel.org
Co-developed-by: Scott McNutt <scott.mcnutt(a)siriusxm.com>
Signed-off-by: Scott McNutt <scott.mcnutt(a)siriusxm.com>
Signed-off-by: Robert Hancock <robert.hancock(a)calian.com>
Tested-by: Claudiu Beznea <claudiu.beznea(a)microchip.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 98498a76ae16..d13f06cf0308 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1573,7 +1573,14 @@ static int macb_poll(struct napi_struct *napi, int budget)
if (work_done < budget) {
napi_complete_done(napi, work_done);
- /* Packets received while interrupts were disabled */
+ /* RSR bits only seem to propagate to raise interrupts when
+ * interrupts are enabled at the time, so if bits are already
+ * set due to packets received while interrupts were disabled,
+ * they will not cause another interrupt to be generated when
+ * interrupts are re-enabled.
+ * Check for this case here. This has been seen to happen
+ * around 30% of the time under heavy network load.
+ */
status = macb_readl(bp, RSR);
if (status) {
if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
@@ -1581,6 +1588,22 @@ static int macb_poll(struct napi_struct *napi, int budget)
napi_reschedule(napi);
} else {
queue_writel(queue, IER, bp->rx_intr_mask);
+
+ /* In rare cases, packets could have been received in
+ * the window between the check above and re-enabling
+ * interrupts. Therefore, a double-check is required
+ * to avoid losing a wakeup. This can potentially race
+ * with the interrupt handler doing the same actions
+ * if an interrupt is raised just after enabling them,
+ * but this should be harmless.
+ */
+ status = macb_readl(bp, RSR);
+ if (unlikely(status)) {
+ queue_writel(queue, IDR, bp->rx_intr_mask);
+ if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
+ queue_writel(queue, ISR, MACB_BIT(RCOMP));
+ napi_schedule(napi);
+ }
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Mon, 7 Mar 2022 16:30:44 +0100
Subject: [PATCH] fuse: fix pipe buffer lifetime for direct_io
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.
On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe
buffers.
Reported-by: Jann Horn <jannh(a)google.com>
Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cd54a529460d..592730fd6e42 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -941,7 +941,17 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
while (count) {
if (cs->write && cs->pipebufs && page) {
- return fuse_ref_page(cs, page, offset, count);
+ /*
+ * Can't control lifetime of pipe buffers, so always
+ * copy user pages.
+ */
+ if (cs->req->args->user_pages) {
+ err = fuse_copy_fill(cs);
+ if (err)
+ return err;
+ } else {
+ return fuse_ref_page(cs, page, offset, count);
+ }
} else if (!cs->len) {
if (cs->move_pages && page &&
offset == 0 && count == PAGE_SIZE) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 829094451774..0fc150c1c50b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1413,6 +1413,7 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
(PAGE_SIZE - ret) & (PAGE_SIZE - 1);
}
+ ap->args.user_pages = true;
if (write)
ap->args.in_pages = true;
else
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e59fbdefeb..eac4984cc753 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -256,6 +256,7 @@ struct fuse_args {
bool nocreds:1;
bool in_pages:1;
bool out_pages:1;
+ bool user_pages:1;
bool out_argvar:1;
bool page_zeroing:1;
bool page_replace:1;
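For context, the splice path in question is the one a FUSE daemon uses to move
requests out of /dev/fuse without copying. Below is a hedged userspace sketch
of that call; the fd handling is simplified, and a real daemon obtains its
session fd through mount rather than a bare open().

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fuse_fd = open("/dev/fuse", O_RDWR); /* real daemons get this via mount */
	int pfd[2];
	ssize_t n;

	if (fuse_fd < 0 || pipe(pfd) < 0) {
		perror("setup");
		return 1;
	}

	/* Move the next request into the pipe; with the fix above, pages that
	 * came from a direct_io write are copied into fresh pipe buffers
	 * instead of being referenced, so the requester's memory is not
	 * exposed after the request completes. */
	n = splice(fuse_fd, NULL, pfd[1], NULL, 65536, SPLICE_F_MOVE);
	if (n < 0)
		perror("splice");
	else
		printf("spliced %zd bytes of request data\n", n);

	close(pfd[0]);
	close(pfd[1]);
	close(fuse_fd);
	return 0;
}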
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Mon, 7 Mar 2022 16:30:44 +0100
Subject: [PATCH] fuse: fix pipe buffer lifetime for direct_io
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.
On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe
buffers.
Reported-by: Jann Horn <jannh(a)google.com>
Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cd54a529460d..592730fd6e42 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -941,7 +941,17 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
while (count) {
if (cs->write && cs->pipebufs && page) {
- return fuse_ref_page(cs, page, offset, count);
+ /*
+ * Can't control lifetime of pipe buffers, so always
+ * copy user pages.
+ */
+ if (cs->req->args->user_pages) {
+ err = fuse_copy_fill(cs);
+ if (err)
+ return err;
+ } else {
+ return fuse_ref_page(cs, page, offset, count);
+ }
} else if (!cs->len) {
if (cs->move_pages && page &&
offset == 0 && count == PAGE_SIZE) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 829094451774..0fc150c1c50b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1413,6 +1413,7 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
(PAGE_SIZE - ret) & (PAGE_SIZE - 1);
}
+ ap->args.user_pages = true;
if (write)
ap->args.in_pages = true;
else
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e59fbdefeb..eac4984cc753 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -256,6 +256,7 @@ struct fuse_args {
bool nocreds:1;
bool in_pages:1;
bool out_pages:1;
+ bool user_pages:1;
bool out_argvar:1;
bool page_zeroing:1;
bool page_replace:1;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0c4bcfdecb1ac0967619ee7ff44871d93c08c909 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Mon, 7 Mar 2022 16:30:44 +0100
Subject: [PATCH] fuse: fix pipe buffer lifetime for direct_io
In FOPEN_DIRECT_IO mode, fuse_file_write_iter() calls
fuse_direct_write_iter(), which normally calls fuse_direct_io(), which then
imports the write buffer with fuse_get_user_pages(), which uses
iov_iter_get_pages() to grab references to userspace pages instead of
actually copying memory.
On the filesystem device side, these pages can then either be read to
userspace (via fuse_dev_read()), or splice()d over into a pipe using
fuse_dev_splice_read() as pipe buffers with &nosteal_pipe_buf_ops.
This is wrong because after fuse_dev_do_read() unlocks the FUSE request,
the userspace filesystem can mark the request as completed, causing write()
to return. At that point, the userspace filesystem should no longer have
access to the pipe buffer.
Fix by copying pages coming from the user address space to new pipe
buffers.
Reported-by: Jann Horn <jannh(a)google.com>
Fixes: c3021629a0d8 ("fuse: support splice() reading from fuse device")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index cd54a529460d..592730fd6e42 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -941,7 +941,17 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep,
while (count) {
if (cs->write && cs->pipebufs && page) {
- return fuse_ref_page(cs, page, offset, count);
+ /*
+ * Can't control lifetime of pipe buffers, so always
+ * copy user pages.
+ */
+ if (cs->req->args->user_pages) {
+ err = fuse_copy_fill(cs);
+ if (err)
+ return err;
+ } else {
+ return fuse_ref_page(cs, page, offset, count);
+ }
} else if (!cs->len) {
if (cs->move_pages && page &&
offset == 0 && count == PAGE_SIZE) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 829094451774..0fc150c1c50b 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1413,6 +1413,7 @@ static int fuse_get_user_pages(struct fuse_args_pages *ap, struct iov_iter *ii,
(PAGE_SIZE - ret) & (PAGE_SIZE - 1);
}
+ ap->args.user_pages = true;
if (write)
ap->args.in_pages = true;
else
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e8e59fbdefeb..eac4984cc753 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -256,6 +256,7 @@ struct fuse_args {
bool nocreds:1;
bool in_pages:1;
bool out_pages:1;
+ bool user_pages:1;
bool out_argvar:1;
bool page_zeroing:1;
bool page_replace:1;
--
Hi Greg,
As reported before, the mips failure for 5.15-stable will need two backports.
I can see one of them in the queue, but it looks like you have missed:
b81e0c2372e6 ("block: drop unused includes in <linux/genhd.h>")
Can you please add it to the queue.
--
Regards
Sudip
The patch titled
Subject: mempolicy: mbind_range() set_policy() after vma_merge()
has been added to the -mm tree. Its filename is
mempolicy-mbind_range-set_policy-after-vma_merge.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mempolicy-mbind_range-set_policy-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mempolicy-mbind_range-set_policy-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: mempolicy: mbind_range() set_policy() after vma_merge()
v2.6.34 commit 9d8cebd4bcd7 ("mm: fix mbind vma merge problem") introduced
vma_merge() to mbind_range(); but unlike madvise, mlock and mprotect, it
put a "continue" to the next vma where its precedents go on to update flags
on the current vma before advancing: that left the vma with the wrong setting
in the infamous vma_merge() case 8.
v3.10 commit 1444f92c8498 ("mm: merging memory blocks resets mempolicy")
tried to fix that in vma_adjust(), without fully understanding the issue.
v3.11 commit 3964acd0dbec ("mm: mempolicy: fix mbind_range() &&
vma_adjust() interaction") reverted that, and went about the fix in the
right way, but chose to optimize out an unnecessary mpol_dup() with a
prior mpol_equal() test. But on tmpfs, that also pessimized out the vital
call to its ->set_policy(), leaving the new mbind unenforced.
The user visible effect was that the pages got allocated on the local
node (happened to be 0), after the mbind() caller had specifically
asked for them to be allocated on node 1. There was not any page
migration involved in the case reported: the pages simply got allocated
on the wrong node.
Just delete that optimization now (though it could be made conditional on
vma not having a set_policy). Also remove the "next" variable: it turned
out to be blameless, but also pointless.
Link: https://lkml.kernel.org/r/319e4db9-64ae-4bca-92f0-ade85d342ff@google.com
Fixes: 3964acd0dbec ("mm: mempolicy: fix mbind_range() && vma_adjust() interaction")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
--- a/mm/mempolicy.c~mempolicy-mbind_range-set_policy-after-vma_merge
+++ a/mm/mempolicy.c
@@ -786,7 +786,6 @@ static int vma_replace_policy(struct vm_
static int mbind_range(struct mm_struct *mm, unsigned long start,
unsigned long end, struct mempolicy *new_pol)
{
- struct vm_area_struct *next;
struct vm_area_struct *prev;
struct vm_area_struct *vma;
int err = 0;
@@ -801,8 +800,7 @@ static int mbind_range(struct mm_struct
if (start > vma->vm_start)
prev = vma;
- for (; vma && vma->vm_start < end; prev = vma, vma = next) {
- next = vma->vm_next;
+ for (; vma && vma->vm_start < end; prev = vma, vma = vma->vm_next) {
vmstart = max(start, vma->vm_start);
vmend = min(end, vma->vm_end);
@@ -817,10 +815,6 @@ static int mbind_range(struct mm_struct
anon_vma_name(vma));
if (prev) {
vma = prev;
- next = vma->vm_next;
- if (mpol_equal(vma_policy(vma), new_pol))
- continue;
- /* vma_merge() joined vma && vma->next, case 8 */
goto replace;
}
if (vma->vm_start != vmstart) {
_
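To make the user-visible effect concrete, here is a hedged userspace sketch of
the scenario described above. The node number and mapping size are
illustrative; MAP_SHARED anonymous memory is shmem-backed, which is the case
where the missing ->set_policy() call mattered. Build with -lnuma.

#include <numaif.h>
#include <string.h>
#include <sys/mman.h>
#include <stdio.h>

int main(void)
{
	size_t len = 2UL * 1024 * 1024;
	unsigned long nodemask = 1UL << 1;	/* ask for node 1, as in the report */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED || mbind(p, len, MPOL_BIND, &nodemask,
				     sizeof(nodemask) * 8, 0)) {
		perror("mmap/mbind");
		return 1;
	}

	/* Faulting the pages after mbind() should allocate them on node 1;
	 * before this fix, a merged shmem vma could keep allocating locally. */
	memset(p, 0, len);
	return 0;
}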
Patches currently in -mm which might be from hughd(a)google.com are
mm-fs-delete-pf_swapwrite.patch
mm-__isolate_lru_page_prepare-in-isolate_migratepages_block.patch
tmpfs-support-for-file-creation-time-fix.patch
shmem-mapping_set_exiting-to-help-mapped-resilience.patch
tmpfs-do-not-allocate-pages-on-read.patch
mm-_install_special_mapping-apply-vm_locked_clear_mask.patch
mempolicy-mbind_range-set_policy-after-vma_merge.patch
mm-thp-refix-__split_huge_pmd_locked-for-migration-pmd.patch
mm-thp-clearpagedoublemap-in-first-page_add_file_rmap.patch
mm-delete-__clearpagewaiters.patch
mm-filemap_unaccount_folio-large-skip-mapcount-fixup.patch
mm-thp-fix-nr_file_mapped-accounting-in-page__file_rmap.patch
mm-warn-on-deleting-redirtied-only-if-accounted.patch
mm-unmap_mapping_range_tree-with-i_mmap_rwsem-shared.patch
RAID array check/repair operations benefit a lot from merging requests.
If we only check the previous entry for a merge attempt, many merges will
be missed. As a result, a significant regression is observed for RAID
check and repair.
Fix this by checking more than just the previous entry when
plug->multiple_queues == true.
This improves the check/repair speed of a 20-HDD raid6 from 19 MB/s to
103 MB/s.
Fixes: d38a9c04c0d5 ("block: only check previous entry for plug merge attempt")
Cc: stable(a)vger.kernel.org # v5.16
Reported-by: Larkin Lowrey <llowrey(a)nuclearwinter.com>
Reported-by: Wilson Jonathan <i400sjon(a)gmail.com>
Reported-by: Roger Heflin <rogerheflin(a)gmail.com>
Signed-off-by: Song Liu <song(a)kernel.org>
---
block/blk-merge.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 4de34a332c9f..57e2075fb2f4 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -1089,12 +1089,14 @@ bool blk_attempt_plug_merge(struct request_queue *q, struct bio *bio,
if (!plug || rq_list_empty(plug->mq_list))
return false;
- /* check the previously added entry for a quick merge attempt */
- rq = rq_list_peek(&plug->mq_list);
- if (rq->q == q) {
- if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
- BIO_MERGE_OK)
- return true;
+ rq_list_for_each(&plug->mq_list, rq) {
+ if (rq->q == q) {
+ if (blk_attempt_bio_merge(q, rq, bio, nr_segs, false) ==
+ BIO_MERGE_OK)
+ return true;
+ }
+ if (!plug->multiple_queues)
+ break;
}
return false;
}
--
2.30.2
The patch titled
Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
has been added to the -mm tree. Its filename is
mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-madvise-skip-unmapped-vma-hole…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-madvise-skip-unmapped-vma-hole…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Charan Teja Kalla <quic_charante(a)quicinc.com>
Subject: mm: madvise: skip unmapped vma holes passed to process_madvise
The process_madvise() system call is expected to skip holes in the vmas
passed through the 'struct iovec' vector list. But do_madvise(), which
process_madvise() calls for each vma, returns ENOMEM in case of unmapped
holes, even though the VMA is processed.
Thus process_madvise() should treat ENOMEM as expected, consider the VMA
passed to it as processed, and continue processing the other vmas in the
vector list. Returning -ENOMEM to the user even though the VMA was
processed leaves the user unable to figure out where to start the next
madvise.
Link: https://lkml.kernel.org/r/4f091776142f2ebf7b94018146de72318474e686.16470087…
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
--- a/mm/madvise.c~mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise
+++ a/mm/madvise.c
@@ -1428,9 +1428,16 @@ SYSCALL_DEFINE5(process_madvise, int, pi
while (iov_iter_count(&iter)) {
iovec = iov_iter_iovec(&iter);
+ /*
+ * do_madvise returns ENOMEM if unmapped holes are present
+ * in the passed VMA. process_madvise() is expected to skip
+ * unmapped holes passed to it in the 'struct iovec' list
+ * and not fail because of them. Thus treat -ENOMEM return
+ * from do_madvise as valid and continue processing.
+ */
ret = do_madvise(mm, (unsigned long)iovec.iov_base,
iovec.iov_len, behavior);
- if (ret < 0)
+ if (ret < 0 && ret != -ENOMEM)
break;
iov_iter_advance(&iter, iovec.iov_len);
}
_
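A hedged userspace sketch of the call being discussed, advising two mapped
ranges of the calling process in one vector. It assumes kernel headers that
define __NR_process_madvise and __NR_pidfd_open, and MADV_COLD may
additionally require CAP_SYS_NICE on some kernels.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20
#endif

int main(void)
{
	int pidfd = (int)syscall(__NR_pidfd_open, getpid(), 0);
	void *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	void *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	struct iovec iov[2] = {
		{ .iov_base = a, .iov_len = 4096 },
		{ .iov_base = b, .iov_len = 4096 },
	};
	ssize_t ret = syscall(__NR_process_madvise, pidfd, iov, 2, MADV_COLD, 0);

	/* With the patch above, an ENOMEM from an unmapped hole inside an
	 * iovec entry no longer aborts the walk; the companion patch below
	 * makes a partial failure return the processed byte count instead
	 * of the error. */
	if (ret < 0)
		perror("process_madvise");
	else
		printf("advised %zd of %d bytes\n", ret, 2 * 4096);
	return 0;
}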
Patches currently in -mm which might be from quic_charante(a)quicinc.com are
mm-vmscan-fix-documentation-for-page_check_references.patch
mm-madvise-return-correct-bytes-advised-with-process_madvise.patch
mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise.patch
The patch titled
Subject: mm: madvise: return correct bytes advised with process_madvise
has been added to the -mm tree. Its filename is
mm-madvise-return-correct-bytes-advised-with-process_madvise.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-madvise-return-correct-bytes-a…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-madvise-return-correct-bytes-a…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Charan Teja Kalla <quic_charante(a)quicinc.com>
Subject: mm: madvise: return correct bytes advised with process_madvise
Patch series "mm: madvise: return correct bytes processed with
process_madvise", v2. With the process_madvise(), always choose to return
non zero processed bytes over an error. This can help the user to know on
which VMA, passed in the 'struct iovec' vector list, is failed to advise
thus can take the decission of retrying/skipping on that VMA.
This patch (of 2):
The process_madvise() system call returns error even after processing some
VMA's passed in the 'struct iovec' vector list which leaves the user
confused to know where to restart the advise next. It is also against
this syscall man page[1] documentation where it mentions that "return
value may be less than the total number of requested bytes, if an error
occurred after some iovec elements were already processed.".
Consider a user passed 10 VMA's in the 'struct iovec' vector list of which
9 are processed but one. Then it just returns the error caused on that
failed VMA despite the first 9 VMA's processed, leaving the user confused
about on which VMA it is failed. Returning the number of bytes processed
here can help the user to know which VMA it is failed on and thus can
retry/skip the advise on that VMA.
[1]https://man7.org/linux/man-pages/man2/process_madvise.2.html.
Link: https://lkml.kernel.org/r/cover.1647008754.git.quic_charante@quicinc.com
Link: https://lkml.kernel.org/r/125b61a0edcee5c2db8658aed9d06a43a19ccafc.16470087…
Fixes: ecb8ac8b1f14("mm/madvise: introduce process_madvise() syscall: an external memory hinting API")
Signed-off-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/madvise.c~mm-madvise-return-correct-bytes-advised-with-process_madvise
+++ a/mm/madvise.c
@@ -1435,8 +1435,7 @@ SYSCALL_DEFINE5(process_madvise, int, pi
iov_iter_advance(&iter, iovec.iov_len);
}
- if (ret == 0)
- ret = total_len - iov_iter_count(&iter);
+ ret = (total_len - iov_iter_count(&iter)) ? : ret;
release_mm:
mmput(mm);
_
Patches currently in -mm which might be from quic_charante(a)quicinc.com are
mm-vmscan-fix-documentation-for-page_check_references.patch
mm-madvise-return-correct-bytes-advised-with-process_madvise.patch
mm-madvise-skip-unmapped-vma-holes-passed-to-process_madvise.patch
There is a limited amount of SGX memory (EPC) on each system. When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code. Instead of
swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code. There is a hierarchy like this:
EPC <-> shmem <-> disk
After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed. Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running. The
memory is recovered when the enclave is destroyed and the backing
storage freed.
Sort this out by freeing the backing memory with shmem_truncate_range() as
soon as a page is faulted back to the EPC. In addition, free the memory for
PCMD pages as soon as all PCMDs in a page have been marked as unused by
zeroing their contents.
Reported-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 1728ab54b4be ("x86/sgx: Add a page reclaimer")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v6:
* Re-applied on top of tip/x86/sgx and fixed the merge conflict, i.e.
sgx_encl_get_backing() instead of sgx_encl_lookup_backing().
v5:
* Encapsulated file offset calculation for PCMD struct.
* Replaced "magic number" PAGE_SIZE with sizeof(struct sgx_secs) to make
the offset calculation more self-documentative.
v4:
* Sanitized the offset calculations.
v3:
* Resend.
v2:
* Rewrite commit message as proposed by Dave.
* Truncate PCMD pages (Dave).
---
arch/x86/kernel/cpu/sgx/encl.c | 57 ++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 001808e3901c..6fa3d0a14b93 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -12,6 +12,30 @@
#include "encls.h"
#include "sgx.h"
+/*
+ * Calculate byte offset of a PCMD struct associated with an enclave page. PCMD's
+ * follow right after the EPC data in the backing storage. In addition to the
+ * visible enclave pages, there's one extra page slot for SECS, before PCMD
+ * structs.
+ */
+static inline pgoff_t sgx_encl_get_backing_page_pcmd_offset(struct sgx_encl *encl,
+ unsigned long page_index)
+{
+ pgoff_t epc_end_off = encl->size + sizeof(struct sgx_secs);
+
+ return epc_end_off + page_index * sizeof(struct sgx_pcmd);
+}
+
+/*
+ * Free a page from the backing storage in the given page index.
+ */
+static inline void sgx_encl_truncate_backing_page(struct sgx_encl *encl, unsigned long page_index)
+{
+ struct inode *inode = file_inode(encl->backing);
+
+ shmem_truncate_range(inode, PFN_PHYS(page_index), PFN_PHYS(page_index) + PAGE_SIZE - 1);
+}
+
/*
* ELDU: Load an EPC page as unblocked. For more info, see "OS Management of EPC
* Pages" in the SDM.
@@ -22,9 +46,11 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
{
unsigned long va_offset = encl_page->desc & SGX_ENCL_PAGE_VA_OFFSET_MASK;
struct sgx_encl *encl = encl_page->encl;
+ pgoff_t page_index, page_pcmd_off;
struct sgx_pageinfo pginfo;
struct sgx_backing b;
- pgoff_t page_index;
+ bool pcmd_page_empty;
+ u8 *pcmd_page;
int ret;
if (secs_page)
@@ -32,14 +58,16 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
else
page_index = PFN_DOWN(encl->size);
+ page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
+
ret = sgx_encl_get_backing(encl, page_index, &b);
if (ret)
return ret;
pginfo.addr = encl_page->desc & PAGE_MASK;
pginfo.contents = (unsigned long)kmap_atomic(b.contents);
- pginfo.metadata = (unsigned long)kmap_atomic(b.pcmd) +
- b.pcmd_offset;
+ pcmd_page = kmap_atomic(b.pcmd);
+ pginfo.metadata = (unsigned long)pcmd_page + b.pcmd_offset;
if (secs_page)
pginfo.secs = (u64)sgx_get_epc_virt_addr(secs_page);
@@ -55,11 +83,24 @@ static int __sgx_encl_eldu(struct sgx_encl_page *encl_page,
ret = -EFAULT;
}
- kunmap_atomic((void *)(unsigned long)(pginfo.metadata - b.pcmd_offset));
+ memset(pcmd_page + b.pcmd_offset, 0, sizeof(struct sgx_pcmd));
+
+ /*
+ * The area for the PCMD in the page was zeroed above. Check if the
+ * whole page is now empty meaning that all PCMD's have been zeroed:
+ */
+ pcmd_page_empty = !memchr_inv(pcmd_page, 0, PAGE_SIZE);
+
+ kunmap_atomic(pcmd_page);
kunmap_atomic((void *)(unsigned long)pginfo.contents);
sgx_encl_put_backing(&b, false);
+ sgx_encl_truncate_backing_page(encl, page_index);
+
+ if (pcmd_page_empty)
+ sgx_encl_truncate_backing_page(encl, PFN_DOWN(page_pcmd_off));
+
return ret;
}
@@ -577,7 +618,7 @@ static struct page *sgx_encl_get_backing_page(struct sgx_encl *encl,
int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
struct sgx_backing *backing)
{
- pgoff_t pcmd_index = PFN_DOWN(encl->size) + 1 + (page_index >> 5);
+ pgoff_t page_pcmd_off = sgx_encl_get_backing_page_pcmd_offset(encl, page_index);
struct page *contents;
struct page *pcmd;
@@ -585,7 +626,7 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
if (IS_ERR(contents))
return PTR_ERR(contents);
- pcmd = sgx_encl_get_backing_page(encl, pcmd_index);
+ pcmd = sgx_encl_get_backing_page(encl, PFN_DOWN(page_pcmd_off));
if (IS_ERR(pcmd)) {
put_page(contents);
return PTR_ERR(pcmd);
@@ -594,9 +635,7 @@ int sgx_encl_get_backing(struct sgx_encl *encl, unsigned long page_index,
backing->page_index = page_index;
backing->contents = contents;
backing->pcmd = pcmd;
- backing->pcmd_offset =
- (page_index & (PAGE_SIZE / sizeof(struct sgx_pcmd) - 1)) *
- sizeof(struct sgx_pcmd);
+ backing->pcmd_offset = page_pcmd_off & (PAGE_SIZE - 1);
return 0;
}
--
2.35.1
This is the start of the stable review cycle for the 4.9.306 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 12 Mar 2022 14:07:58 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.306-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.306-rc2
Juergen Gross <jgross(a)suse.com>
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross <jgross(a)suse.com>
xen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross <jgross(a)suse.com>
xen: remove gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix build warning in proc-v7-bugs.c
WANG Chao <chao.wang(a)ucloud.cn>
x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
Masahiro Yamada <yamada.masahiro(a)socionext.com>
x86/build: Fix compiler support check for CONFIG_RETPOLINE
Nathan Chancellor <nathan(a)kernel.org>
ARM: Do not use NOCROSSREFS directive with ld.lld
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix co-processor register typo
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
Josh Poimboeuf <jpoimboe(a)redhat.com>
Documentation: Add swapgs description to the Spectre v1 documentation
Tim Chen <tim.c.chen(a)linux.intel.com>
Documentation: Add section about CPU vulnerabilities for Spectre
Zhenzhong Duan <zhenzhong.duan(a)oracle.com>
x86/retpoline: Remove minimal retpoline support
Zhenzhong Duan <zhenzhong.duan(a)oracle.com>
x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
Zhenzhong Duan <zhenzhong.duan(a)oracle.com>
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
-------------
Diffstat:
Documentation/hw-vuln/index.rst | 1 +
Documentation/hw-vuln/spectre.rst | 785 +++++++++++++++++++++++++++++++
Documentation/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 ++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 +++-
arch/arm/kernel/entry-common.S | 24 +
arch/arm/kernel/spectre.c | 71 +++
arch/arm/kernel/traps.c | 65 ++-
arch/arm/kernel/vmlinux-xip.lds.S | 45 +-
arch/arm/kernel/vmlinux.lds.S | 45 +-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 199 ++++++--
arch/x86/Kconfig | 4 -
arch/x86/Makefile | 11 +-
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 41 +-
arch/x86/kernel/cpu/bugs.c | 225 ++++++---
drivers/block/xen-blkfront.c | 67 +--
drivers/firmware/psci.c | 15 +
drivers/net/xen-netfront.c | 54 ++-
drivers/scsi/xen-scsifront.c | 3 +-
drivers/xen/gntalloc.c | 25 +-
drivers/xen/grant-table.c | 59 ++-
drivers/xen/xenbus/xenbus_client.c | 24 +-
include/linux/arm-smccc.h | 74 +++
include/linux/bpf.h | 11 +
include/linux/compiler-gcc.h | 2 +-
include/linux/module.h | 2 +-
include/xen/grant_table.h | 19 +-
kernel/sysctl.c | 8 +
scripts/mod/modpost.c | 2 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
35 files changed, 1763 insertions(+), 268 deletions(-)
This is the start of the stable review cycle for the 4.14.271 release.
There are 31 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 12 Mar 2022 14:07:58 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.271-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.271-rc2
Juergen Gross <jgross(a)suse.com>
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross <jgross(a)suse.com>
xen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross <jgross(a)suse.com>
xen/9p: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen: remove gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix build warning in proc-v7-bugs.c
Nathan Chancellor <nathan(a)kernel.org>
ARM: Do not use NOCROSSREFS directive with ld.lld
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix co-processor register typo
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/spectre.h | 32 ++++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++++++-
arch/arm/kernel/entry-common.S | 24 +++
arch/arm/kernel/spectre.c | 71 ++++++++
arch/arm/kernel/traps.c | 65 ++++++-
arch/arm/kernel/vmlinux-xip.lds.S | 45 +++--
arch/arm/kernel/vmlinux.lds.S | 45 +++--
arch/arm/mm/Kconfig | 11 ++
arch/arm/mm/proc-v7-bugs.c | 199 +++++++++++++++++++---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 214 +++++++++++++++++-------
drivers/block/xen-blkfront.c | 67 ++++----
drivers/firmware/psci.c | 15 ++
drivers/net/xen-netfront.c | 54 +++---
drivers/scsi/xen-scsifront.c | 3 +-
drivers/xen/gntalloc.c | 25 +--
drivers/xen/grant-table.c | 59 ++++---
drivers/xen/xenbus/xenbus_client.c | 24 ++-
include/linux/arm-smccc.h | 74 ++++++++
include/linux/bpf.h | 11 ++
include/xen/grant_table.h | 19 ++-
kernel/sysctl.c | 8 +
net/9p/trans_xen.c | 14 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
30 files changed, 986 insertions(+), 264 deletions(-)
This is the start of the stable review cycle for the 4.19.234 release.
There are 33 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 12 Mar 2022 14:07:58 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.234-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.234-rc2
Juergen Gross <jgross(a)suse.com>
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross <jgross(a)suse.com>
xen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross <jgross(a)suse.com>
xen/pvcalls: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen/9p: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen: remove gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix build warning in proc-v7-bugs.c
Nathan Chancellor <nathan(a)kernel.org>
ARM: Do not use NOCROSSREFS directive with ld.lld
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix co-processor register typo
Sami Tolvanen <samitolvanen(a)google.com>
kbuild: add CONFIG_LD_IS_LLD
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/spectre.h | 32 ++++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++++++-
arch/arm/kernel/entry-common.S | 24 +++
arch/arm/kernel/spectre.c | 71 ++++++++
arch/arm/kernel/traps.c | 65 ++++++-
arch/arm/kernel/vmlinux.lds.h | 43 ++++-
arch/arm/mm/Kconfig | 11 ++
arch/arm/mm/proc-v7-bugs.c | 201 +++++++++++++++++++---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 214 +++++++++++++++++-------
drivers/block/xen-blkfront.c | 63 ++++---
drivers/firmware/psci.c | 15 ++
drivers/net/xen-netfront.c | 54 +++---
drivers/scsi/xen-scsifront.c | 3 +-
drivers/xen/gntalloc.c | 25 +--
drivers/xen/grant-table.c | 71 ++++----
drivers/xen/pvcalls-front.c | 8 +-
drivers/xen/xenbus/xenbus_client.c | 24 ++-
include/linux/arm-smccc.h | 74 ++++++++
include/linux/bpf.h | 11 ++
include/xen/grant_table.h | 19 ++-
init/Kconfig | 3 +
kernel/sysctl.c | 8 +
net/9p/trans_xen.c | 14 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
31 files changed, 963 insertions(+), 261 deletions(-)
This is the start of the stable review cycle for the 5.4.184 release.
There are 33 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 12 Mar 2022 14:07:58 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.184-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.184-rc2
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE"
Juergen Gross <jgross(a)suse.com>
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross <jgross(a)suse.com>
xen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross <jgross(a)suse.com>
xen/pvcalls: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen/9p: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen: remove gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix build warning in proc-v7-bugs.c
Nathan Chancellor <nathan(a)kernel.org>
ARM: Do not use NOCROSSREFS directive with ld.lld
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix co-processor register typo
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/spectre.h | 32 ++++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++++++-
arch/arm/kernel/entry-common.S | 24 +++
arch/arm/kernel/spectre.c | 71 ++++++++
arch/arm/kernel/traps.c | 65 ++++++-
arch/arm/kernel/vmlinux.lds.h | 43 ++++-
arch/arm/mm/Kconfig | 11 ++
arch/arm/mm/proc-v7-bugs.c | 200 +++++++++++++++++++---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 216 +++++++++++++++++-------
drivers/acpi/ec.c | 10 --
drivers/acpi/sleep.c | 14 +-
drivers/block/xen-blkfront.c | 63 ++++---
drivers/firmware/psci/psci.c | 15 ++
drivers/net/xen-netfront.c | 54 +++---
drivers/scsi/xen-scsifront.c | 3 +-
drivers/xen/gntalloc.c | 25 +--
drivers/xen/grant-table.c | 71 ++++----
drivers/xen/pvcalls-front.c | 8 +-
drivers/xen/xenbus/xenbus_client.c | 24 ++-
include/linux/arm-smccc.h | 74 ++++++++
include/linux/bpf.h | 12 ++
include/xen/grant_table.h | 19 ++-
kernel/sysctl.c | 8 +
net/9p/trans_xen.c | 14 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
32 files changed, 970 insertions(+), 277 deletions(-)
Note, I'm sending all the patches again for all of the -rc2 releases as
there has been a lot of churn from what was in -rc1 to -rc2.
This is the start of the stable review cycle for the 5.16.14 release.
There are 53 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 12 Mar 2022 14:07:58 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.14-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.16.14-rc2
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE"
Juergen Gross <jgross(a)suse.com>
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross <jgross(a)suse.com>
xen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross <jgross(a)suse.com>
xen/pvcalls: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen/9p: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen: remove gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix build warning in proc-v7-bugs.c
Nathan Chancellor <nathan(a)kernel.org>
arm64: Do not include __READ_ONCE() block in assembly files
Nathan Chancellor <nathan(a)kernel.org>
ARM: Do not use NOCROSSREFS directive with ld.lld
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix co-processor register typo
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
James Morse <james.morse(a)arm.com>
arm64: Use the clearbhb instruction in mitigations
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
James Morse <james.morse(a)arm.com>
arm64: Mitigate spectre style branch history side channels
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
James Morse <james.morse(a)arm.com>
arm64: Add percpu vectors for EL1
James Morse <james.morse(a)arm.com>
arm64: entry: Add macro for reading symbol addresses from the trampoline
James Morse <james.morse(a)arm.com>
arm64: entry: Add vectors that have the bhb mitigation sequences
James Morse <james.morse(a)arm.com>
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
James Morse <james.morse(a)arm.com>
arm64: entry: Allow the trampoline text to occupy multiple pages
James Morse <james.morse(a)arm.com>
arm64: entry: Make the kpti trampoline's kpti sequence optional
James Morse <james.morse(a)arm.com>
arm64: entry: Move trampoline macros out of ifdef'd section
James Morse <james.morse(a)arm.com>
arm64: entry: Don't assume tramp_vectors is the start of the vectors
James Morse <james.morse(a)arm.com>
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
James Morse <james.morse(a)arm.com>
arm64: entry: Move the trampoline data page before the text page
James Morse <james.morse(a)arm.com>
arm64: entry: Free up another register on kpti's tramp_exit path
James Morse <james.morse(a)arm.com>
arm64: entry: Make the trampoline cleanup optional
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
James Morse <james.morse(a)arm.com>
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
James Morse <james.morse(a)arm.com>
arm64: entry.S: Add ventry overflow sanity checks
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly <joey.gouly(a)arm.com>
arm64: add ID_AA64ISAR2_EL1 sys register
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 50 +--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Documentation/arm64/cpu-feature-registers.rst | 17 ++
Documentation/arm64/elf_hwcaps.rst | 8 +
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 ++
arch/arm/include/asm/vmlinux.lds.h | 43 ++-
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++-
arch/arm/kernel/entry-common.S | 24 ++
arch/arm/kernel/spectre.c | 71 +++++
arch/arm/kernel/traps.c | 65 +++-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 208 ++++++++++---
arch/arm64/Kconfig | 9 +
arch/arm64/include/asm/assembler.h | 53 ++++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpufeature.h | 29 ++
arch/arm64/include/asm/cputype.h | 8 +
arch/arm64/include/asm/fixmap.h | 6 +-
arch/arm64/include/asm/hwcap.h | 2 +
arch/arm64/include/asm/insn.h | 1 +
arch/arm64/include/asm/kvm_host.h | 5 +
arch/arm64/include/asm/rwonce.h | 4 +-
arch/arm64/include/asm/sections.h | 5 +
arch/arm64/include/asm/spectre.h | 4 +
arch/arm64/include/asm/sysreg.h | 18 ++
arch/arm64/include/asm/vectors.h | 73 +++++
arch/arm64/include/uapi/asm/hwcap.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 5 +
arch/arm64/kernel/cpu_errata.c | 7 +
arch/arm64/kernel/cpufeature.c | 25 ++
arch/arm64/kernel/cpuinfo.c | 3 +
arch/arm64/kernel/entry.S | 214 +++++++++----
arch/arm64/kernel/image-vars.h | 4 +
arch/arm64/kernel/proton-pack.c | 391 +++++++++++++++++++++++-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/kvm/arm.c | 5 +-
arch/arm64/kvm/hyp/hyp-entry.S | 9 +
arch/arm64/kvm/hyp/nvhe/mm.c | 4 +-
arch/arm64/kvm/hyp/vhe/switch.c | 9 +-
arch/arm64/kvm/hypercalls.c | 12 +
arch/arm64/kvm/psci.c | 18 +-
arch/arm64/kvm/sys_regs.c | 2 +-
arch/arm64/mm/mmu.c | 12 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/alternative.c | 8 +-
arch/x86/kernel/cpu/bugs.c | 204 ++++++++++---
arch/x86/lib/retpoline.S | 2 +-
arch/x86/net/bpf_jit_comp.c | 2 +-
drivers/acpi/ec.c | 10 -
drivers/acpi/sleep.c | 14 +-
drivers/block/xen-blkfront.c | 63 ++--
drivers/net/xen-netfront.c | 54 ++--
drivers/scsi/xen-scsifront.c | 3 +-
drivers/xen/gntalloc.c | 25 +-
drivers/xen/grant-table.c | 71 +++--
drivers/xen/pvcalls-front.c | 8 +-
drivers/xen/xenbus/xenbus_client.c | 24 +-
include/linux/arm-smccc.h | 5 +
include/linux/bpf.h | 12 +
include/xen/grant_table.h | 19 +-
kernel/sysctl.c | 7 +
net/9p/trans_xen.c | 14 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
68 files changed, 1782 insertions(+), 358 deletions(-)
This is the start of the stable review cycle for the 5.15.28 release.
There are 58 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 12 Mar 2022 14:07:58 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.28-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.28-rc2
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "ACPI: PM: s2idle: Cancel wakeup before dispatching EC GPE"
Juergen Gross <jgross(a)suse.com>
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Juergen Gross <jgross(a)suse.com>
xen/gnttab: fix gnttab_end_foreign_access() without page specified
Juergen Gross <jgross(a)suse.com>
xen/pvcalls: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen/9p: use alloc/free_pages_exact()
Juergen Gross <jgross(a)suse.com>
xen: remove gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/gntalloc: don't use gnttab_query_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
Juergen Gross <jgross(a)suse.com>
xen/grant-table: add gnttab_try_end_foreign_access()
Juergen Gross <jgross(a)suse.com>
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix build warning in proc-v7-bugs.c
Nathan Chancellor <nathan(a)kernel.org>
arm64: Do not include __READ_ONCE() block in assembly files
Nathan Chancellor <nathan(a)kernel.org>
ARM: Do not use NOCROSSREFS directive with ld.lld
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: fix co-processor register typo
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
James Morse <james.morse(a)arm.com>
arm64: Use the clearbhb instruction in mitigations
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
James Morse <james.morse(a)arm.com>
arm64: Mitigate spectre style branch history side channels
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
James Morse <james.morse(a)arm.com>
arm64: Add percpu vectors for EL1
James Morse <james.morse(a)arm.com>
arm64: entry: Add macro for reading symbol addresses from the trampoline
James Morse <james.morse(a)arm.com>
arm64: entry: Add vectors that have the bhb mitigation sequences
James Morse <james.morse(a)arm.com>
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
James Morse <james.morse(a)arm.com>
arm64: entry: Allow the trampoline text to occupy multiple pages
James Morse <james.morse(a)arm.com>
arm64: entry: Make the kpti trampoline's kpti sequence optional
James Morse <james.morse(a)arm.com>
arm64: entry: Move trampoline macros out of ifdef'd section
James Morse <james.morse(a)arm.com>
arm64: entry: Don't assume tramp_vectors is the start of the vectors
James Morse <james.morse(a)arm.com>
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
James Morse <james.morse(a)arm.com>
arm64: entry: Move the trampoline data page before the text page
James Morse <james.morse(a)arm.com>
arm64: entry: Free up another register on kpti's tramp_exit path
James Morse <james.morse(a)arm.com>
arm64: entry: Make the trampoline cleanup optional
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
James Morse <james.morse(a)arm.com>
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
James Morse <james.morse(a)arm.com>
arm64: entry.S: Add ventry overflow sanity checks
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly <joey.gouly(a)arm.com>
arm64: add ID_AA64ISAR2_EL1 sys register
Anshuman Khandual <anshuman.khandual(a)arm.com>
arm64: Add Cortex-X2 CPU part definition
Marc Zyngier <maz(a)kernel.org>
arm64: Add HWCAP for self-synchronising virtual counter
Suzuki K Poulose <suzuki.poulose(a)arm.com>
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Huang Pei <huangpei(a)loongson.cn>
slip: fix macro redefine warning
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++-
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Documentation/arm64/cpu-feature-registers.rst | 29 +-
Documentation/arm64/elf_hwcaps.rst | 12 +
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 ++
arch/arm/include/asm/vmlinux.lds.h | 43 ++-
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++-
arch/arm/kernel/entry-common.S | 24 ++
arch/arm/kernel/spectre.c | 71 +++++
arch/arm/kernel/traps.c | 65 +++-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 208 ++++++++++---
arch/arm64/Kconfig | 9 +
arch/arm64/include/asm/assembler.h | 53 ++++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpufeature.h | 29 ++
arch/arm64/include/asm/cputype.h | 14 +
arch/arm64/include/asm/fixmap.h | 6 +-
arch/arm64/include/asm/hwcap.h | 3 +
arch/arm64/include/asm/insn.h | 1 +
arch/arm64/include/asm/kvm_host.h | 5 +
arch/arm64/include/asm/rwonce.h | 4 +-
arch/arm64/include/asm/sections.h | 5 +
arch/arm64/include/asm/spectre.h | 4 +
arch/arm64/include/asm/sysreg.h | 18 ++
arch/arm64/include/asm/vectors.h | 73 +++++
arch/arm64/include/uapi/asm/hwcap.h | 3 +
arch/arm64/include/uapi/asm/kvm.h | 5 +
arch/arm64/kernel/cpu_errata.c | 7 +
arch/arm64/kernel/cpufeature.c | 28 +-
arch/arm64/kernel/cpuinfo.c | 4 +
arch/arm64/kernel/entry.S | 214 +++++++++----
arch/arm64/kernel/image-vars.h | 4 +
arch/arm64/kernel/proton-pack.c | 391 +++++++++++++++++++++++-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/kvm/arm.c | 5 +-
arch/arm64/kvm/hyp/hyp-entry.S | 9 +
arch/arm64/kvm/hyp/nvhe/mm.c | 4 +-
arch/arm64/kvm/hyp/vhe/switch.c | 9 +-
arch/arm64/kvm/hypercalls.c | 12 +
arch/arm64/kvm/psci.c | 18 +-
arch/arm64/kvm/sys_regs.c | 2 +-
arch/arm64/mm/mmu.c | 12 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 205 +++++++++----
arch/x86/lib/retpoline.S | 2 +-
drivers/acpi/ec.c | 10 -
drivers/acpi/sleep.c | 14 +-
drivers/block/xen-blkfront.c | 63 ++--
drivers/net/slip/slip.h | 2 +
drivers/net/xen-netfront.c | 54 ++--
drivers/scsi/xen-scsifront.c | 3 +-
drivers/xen/gntalloc.c | 25 +-
drivers/xen/grant-table.c | 71 +++--
drivers/xen/pvcalls-front.c | 8 +-
drivers/xen/xenbus/xenbus_client.c | 24 +-
include/linux/arm-smccc.h | 5 +
include/linux/bpf.h | 12 +
include/xen/grant_table.h | 19 +-
kernel/sysctl.c | 7 +
net/9p/trans_xen.c | 14 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
67 files changed, 1800 insertions(+), 359 deletions(-)
I'm announcing the release of the 4.9.306 kernel.
All users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/hw-vuln/index.rst | 1
Documentation/hw-vuln/spectre.rst | 785 +++++++++++++++++++++++++++++++
Documentation/kernel-parameters.txt | 8
Makefile | 2
arch/arm/include/asm/assembler.h | 10
arch/arm/include/asm/spectre.h | 32 +
arch/arm/kernel/Makefile | 2
arch/arm/kernel/entry-armv.S | 79 ++-
arch/arm/kernel/entry-common.S | 24
arch/arm/kernel/spectre.c | 71 ++
arch/arm/kernel/traps.c | 65 ++
arch/arm/kernel/vmlinux-xip.lds.S | 45 +
arch/arm/kernel/vmlinux.lds.S | 45 +
arch/arm/mm/Kconfig | 11
arch/arm/mm/proc-v7-bugs.c | 199 ++++++-
arch/x86/Kconfig | 4
arch/x86/Makefile | 11
arch/x86/include/asm/cpufeatures.h | 2
arch/x86/include/asm/nospec-branch.h | 41 +
arch/x86/kernel/cpu/bugs.c | 225 ++++++--
drivers/block/xen-blkfront.c | 67 +-
drivers/firmware/psci.c | 15
drivers/net/xen-netfront.c | 54 +-
drivers/scsi/xen-scsifront.c | 3
drivers/xen/gntalloc.c | 25
drivers/xen/grant-table.c | 59 +-
drivers/xen/xenbus/xenbus_client.c | 24
include/linux/arm-smccc.h | 74 ++
include/linux/bpf.h | 11
include/linux/compiler-gcc.h | 2
include/linux/module.h | 2
include/xen/grant_table.h | 19
kernel/sysctl.c | 8
scripts/mod/modpost.c | 2
tools/arch/x86/include/asm/cpufeatures.h | 2
35 files changed, 1762 insertions(+), 267 deletions(-)
Borislav Petkov (1):
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
Emmanuel Gil Peyrot (1):
ARM: fix build error when BPF_SYSCALL is disabled
Greg Kroah-Hartman (1):
Linux 4.9.306
Josh Poimboeuf (4):
Documentation: Add swapgs description to the Spectre v1 documentation
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
x86/speculation: Warn about Spectre v2 LFENCE mitigation
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Juergen Gross (9):
xen/xenbus: don't let xenbus_grant_ring() remove grants in error case
xen/grant-table: add gnttab_try_end_foreign_access()
xen/blkfront: don't use gnttab_query_foreign_access() for mapped status
xen/netfront: don't use gnttab_query_foreign_access() for mapped status
xen/scsifront: don't use gnttab_query_foreign_access() for mapped status
xen/gntalloc: don't use gnttab_query_foreign_access()
xen: remove gnttab_query_foreign_access()
xen/gnttab: fix gnttab_end_foreign_access() without page specified
xen/netfront: react properly to failing gnttab_end_foreign_access_ref()
Kim Phillips (2):
x86/speculation: Use generic retpoline by default on AMD
x86/speculation: Update link to AMD speculation whitepaper
Lukas Bulwahn (1):
Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
Mark Rutland (1):
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Masahiro Yamada (1):
x86/build: Fix compiler support check for CONFIG_RETPOLINE
Nathan Chancellor (1):
ARM: Do not use NOCROSSREFS directive with ld.lld
Peter Zijlstra (3):
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
x86/speculation: Add eIBRS + Retpoline options
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra (Intel) (1):
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Russell King (Oracle) (7):
ARM: report Spectre v2 status through sysfs
ARM: early traps initialisation
ARM: use LOADADDR() to get load address of sections
ARM: Spectre-BHB workaround
ARM: include unprivileged BPF status in Spectre V2 reporting
ARM: fix co-processor register typo
ARM: fix build warning in proc-v7-bugs.c
Steven Price (1):
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Tim Chen (1):
Documentation: Add section about CPU vulnerabilities for Spectre
WANG Chao (1):
x86, modpost: Replace last remnants of RETPOLINE with CONFIG_RETPOLINE
Zhenzhong Duan (3):
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
x86/retpoline: Remove minimal retpoline support
Greeting,
FYI, we noticed the following commit (built with gcc-9):
commit: 56348560d495d2501e87db559a61de717cd3ab02 ("debugfs: do not attempt to create a new file before the filesystem is initalized")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
in testcase: boot
on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G
caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
If you fix the issue, kindly add following tag
Reported-by: kernel test robot <oliver.sang(a)intel.com>
[ 22.678378][ T1] UBI error: cannot create "ubi" debugfs directory, error -2
[ 22.679890][ T1] UBI error: cannot initialize UBI, error -2
To reproduce:
# build kernel
cd linux
cp config-5.11.0-rc5-00034-g56348560d495 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-9 CC=gcc-9 ARCH=i386 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation
Thanks,
Oliver Sang
This is the start of the stable review cycle for the 5.16.14 release.
There are 37 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.16.14-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.16.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.16.14-rc1
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
James Morse <james.morse(a)arm.com>
arm64: Use the clearbhb instruction in mitigations
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
James Morse <james.morse(a)arm.com>
arm64: Mitigate spectre style branch history side channels
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
James Morse <james.morse(a)arm.com>
arm64: Add percpu vectors for EL1
James Morse <james.morse(a)arm.com>
arm64: entry: Add macro for reading symbol addresses from the trampoline
James Morse <james.morse(a)arm.com>
arm64: entry: Add vectors that have the bhb mitigation sequences
James Morse <james.morse(a)arm.com>
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
James Morse <james.morse(a)arm.com>
arm64: entry: Allow the trampoline text to occupy multiple pages
James Morse <james.morse(a)arm.com>
arm64: entry: Make the kpti trampoline's kpti sequence optional
James Morse <james.morse(a)arm.com>
arm64: entry: Move trampoline macros out of ifdef'd section
James Morse <james.morse(a)arm.com>
arm64: entry: Don't assume tramp_vectors is the start of the vectors
James Morse <james.morse(a)arm.com>
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
James Morse <james.morse(a)arm.com>
arm64: entry: Move the trampoline data page before the text page
James Morse <james.morse(a)arm.com>
arm64: entry: Free up another register on kpti's tramp_exit path
James Morse <james.morse(a)arm.com>
arm64: entry: Make the trampoline cleanup optional
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
James Morse <james.morse(a)arm.com>
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
James Morse <james.morse(a)arm.com>
arm64: entry.S: Add ventry overflow sanity checks
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly <joey.gouly(a)arm.com>
arm64: add ID_AA64ISAR2_EL1 sys register
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 50 +--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Documentation/arm64/cpu-feature-registers.rst | 17 ++
Documentation/arm64/elf_hwcaps.rst | 8 +
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 ++
arch/arm/include/asm/vmlinux.lds.h | 35 ++-
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++-
arch/arm/kernel/entry-common.S | 24 ++
arch/arm/kernel/spectre.c | 71 +++++
arch/arm/kernel/traps.c | 65 +++-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 207 ++++++++++---
arch/arm64/Kconfig | 9 +
arch/arm64/include/asm/assembler.h | 53 ++++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpufeature.h | 29 ++
arch/arm64/include/asm/cputype.h | 8 +
arch/arm64/include/asm/fixmap.h | 6 +-
arch/arm64/include/asm/hwcap.h | 2 +
arch/arm64/include/asm/insn.h | 1 +
arch/arm64/include/asm/kvm_host.h | 5 +
arch/arm64/include/asm/sections.h | 5 +
arch/arm64/include/asm/spectre.h | 4 +
arch/arm64/include/asm/sysreg.h | 18 ++
arch/arm64/include/asm/vectors.h | 73 +++++
arch/arm64/include/uapi/asm/hwcap.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 5 +
arch/arm64/kernel/cpu_errata.c | 7 +
arch/arm64/kernel/cpufeature.c | 25 ++
arch/arm64/kernel/cpuinfo.c | 3 +
arch/arm64/kernel/entry.S | 214 +++++++++----
arch/arm64/kernel/image-vars.h | 4 +
arch/arm64/kernel/proton-pack.c | 391 +++++++++++++++++++++++-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/kvm/arm.c | 5 +-
arch/arm64/kvm/hyp/hyp-entry.S | 9 +
arch/arm64/kvm/hyp/nvhe/mm.c | 4 +-
arch/arm64/kvm/hyp/vhe/switch.c | 9 +-
arch/arm64/kvm/hypercalls.c | 12 +
arch/arm64/kvm/psci.c | 18 +-
arch/arm64/kvm/sys_regs.c | 2 +-
arch/arm64/mm/mmu.c | 12 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/alternative.c | 8 +-
arch/x86/kernel/cpu/bugs.c | 204 ++++++++++---
arch/x86/lib/retpoline.S | 2 +-
arch/x86/net/bpf_jit_comp.c | 2 +-
include/linux/arm-smccc.h | 5 +
include/linux/bpf.h | 12 +
kernel/sysctl.c | 7 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
56 files changed, 1606 insertions(+), 216 deletions(-)
The patch titled
Subject: ocfs2: fix crash when initialize filecheck kobj fails
has been added to the -mm tree. Its filename is
ocfs2-fix-crash-when-initialize-filecheck-kobj-fails.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/ocfs2-fix-crash-when-initialize-f…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-fix-crash-when-initialize-f…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Subject: ocfs2: fix crash when initialize filecheck kobj fails
Once s_root is set, generic_shutdown_super() will be called if fill_super()
fails. That means we will call ocfs2_dismount_volume() twice in such a
case, which can lead to a kernel crash. Fix this issue by initializing
filecheck kobj before setting s_root.
Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com
Fixes: 5f483c4abb50 ("ocfs2: add kobject for online file check")
Signed-off-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/super.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
--- a/fs/ocfs2/super.c~ocfs2-fix-crash-when-initialize-filecheck-kobj-fails
+++ a/fs/ocfs2/super.c
@@ -1105,17 +1105,6 @@ static int ocfs2_fill_super(struct super
goto read_super_error;
}
- root = d_make_root(inode);
- if (!root) {
- status = -ENOMEM;
- mlog_errno(status);
- goto read_super_error;
- }
-
- sb->s_root = root;
-
- ocfs2_complete_mount_recovery(osb);
-
osb->osb_dev_kset = kset_create_and_add(sb->s_id, NULL,
&ocfs2_kset->kobj);
if (!osb->osb_dev_kset) {
@@ -1133,6 +1122,17 @@ static int ocfs2_fill_super(struct super
goto read_super_error;
}
+ root = d_make_root(inode);
+ if (!root) {
+ status = -ENOMEM;
+ mlog_errno(status);
+ goto read_super_error;
+ }
+
+ sb->s_root = root;
+
+ ocfs2_complete_mount_recovery(osb);
+
if (ocfs2_mount_local(osb))
snprintf(nodestr, sizeof(nodestr), "local");
else
_
Patches currently in -mm which might be from joseph.qi(a)linux.alibaba.com are
ocfs2-fix-crash-when-initialize-filecheck-kobj-fails.patch
ocfs2-cleanup-some-return-variables.patch
The patch titled
Subject: mm: only re-generate demotion targets when a numa node changes its N_CPU state
has been added to the -mm tree. Its filename is
mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-only-re-generate-demotion-targ…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-only-re-generate-demotion-targ…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: mm: only re-generate demotion targets when a numa node changes its N_CPU state
Abhishek reported that after patch [1], hotplug operations are taking
~double the expected time. [2]
The reason behind this is that the CPU callbacks that migrate_on_reclaim_init()
sets up always call set_migration_target_nodes() whenever a CPU is brought
up or down. But we only care about numa nodes going from having CPUs to
becoming cpuless, and vice versa, as that influences the demotion_target
order.
We do already have two CPU callbacks (vmstat_cpu_online() and
vmstat_cpu_dead()) that check exactly that, so get rid of the CPU
callbacks in migrate_on_reclaim_init() and only call
set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a
numa node changes its N_CPU state.
[1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.…
[2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux…
Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de
Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events")
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reported-by: Abhishek Goel <huntbag(a)linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Abhishek Goel <huntbag(a)linux.vnet.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/migrate.h | 8 +++++++
mm/migrate.c | 41 ++++----------------------------------
mm/vmstat.c | 13 +++++++++++-
3 files changed, 25 insertions(+), 37 deletions(-)
--- a/include/linux/migrate.h~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state
+++ a/include/linux/migrate.h
@@ -48,7 +48,15 @@ int folio_migrate_mapping(struct address
struct folio *newfolio, struct folio *folio, int extra_count);
extern bool numa_demotion_enabled;
+extern void migrate_on_reclaim_init(void);
+#ifdef CONFIG_HOTPLUG_CPU
+extern void set_migration_target_nodes(void);
#else
+static inline void set_migration_target_nodes(void) {}
+#endif
+#else
+
+static inline void set_migration_target_nodes(void) {}
static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t new,
--- a/mm/migrate.c~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state
+++ a/mm/migrate.c
@@ -3211,7 +3211,7 @@ again:
/*
* For callers that do not hold get_online_mems() already.
*/
-static void set_migration_target_nodes(void)
+void set_migration_target_nodes(void)
{
get_online_mems();
__set_migration_target_nodes();
@@ -3275,51 +3275,20 @@ static int __meminit migrate_on_reclaim_
return notifier_from_errno(0);
}
-/*
- * React to hotplug events that might affect the migration targets
- * like events that online or offline NUMA nodes.
- *
- * The ordering is also currently dependent on which nodes have
- * CPUs. That means we need CPU on/offline notification too.
- */
-static int migration_online_cpu(unsigned int cpu)
-{
- set_migration_target_nodes();
- return 0;
-}
-
-static int migration_offline_cpu(unsigned int cpu)
+void __init migrate_on_reclaim_init(void)
{
- set_migration_target_nodes();
- return 0;
-}
-
-static int __init migrate_on_reclaim_init(void)
-{
- int ret;
-
node_demotion = kmalloc_array(nr_node_ids,
sizeof(struct demotion_nodes),
GFP_KERNEL);
WARN_ON(!node_demotion);
- ret = cpuhp_setup_state_nocalls(CPUHP_MM_DEMOTION_DEAD, "mm/demotion:offline",
- NULL, migration_offline_cpu);
/*
- * In the unlikely case that this fails, the automatic
- * migration targets may become suboptimal for nodes
- * where N_CPU changes. With such a small impact in a
- * rare case, do not bother trying to do anything special.
+ * At this point, all numa nodes with memory/CPus have their state
+ * properly set, so we can build the demotion order now.
*/
- WARN_ON(ret < 0);
- ret = cpuhp_setup_state(CPUHP_AP_MM_DEMOTION_ONLINE, "mm/demotion:online",
- migration_online_cpu, NULL);
- WARN_ON(ret < 0);
-
+ set_migration_target_nodes();
hotplug_memory_notifier(migrate_on_reclaim_callback, 100);
- return 0;
}
-late_initcall(migrate_on_reclaim_init);
#endif /* CONFIG_HOTPLUG_CPU */
bool numa_demotion_enabled = false;
--- a/mm/vmstat.c~mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state
+++ a/mm/vmstat.c
@@ -28,6 +28,7 @@
#include <linux/mm_inline.h>
#include <linux/page_ext.h>
#include <linux/page_owner.h>
+#include <linux/migrate.h>
#include "internal.h"
@@ -2049,7 +2050,12 @@ static void __init init_cpu_node_state(v
static int vmstat_cpu_online(unsigned int cpu)
{
refresh_zone_stat_thresholds();
- node_set_state(cpu_to_node(cpu), N_CPU);
+
+ if (!node_state(cpu_to_node(cpu), N_CPU)) {
+ node_set_state(cpu_to_node(cpu), N_CPU);
+ set_migration_target_nodes();
+ }
+
return 0;
}
@@ -2072,6 +2078,8 @@ static int vmstat_cpu_dead(unsigned int
return 0;
node_clear_state(node, N_CPU);
+ set_migration_target_nodes();
+
return 0;
}
@@ -2103,6 +2111,9 @@ void __init init_mm_internals(void)
start_shepherd_timer();
#endif
+#if defined(CONFIG_MIGRATION) && defined(CONFIG_HOTPLUG_CPU)
+ migrate_on_reclaim_init();
+#endif
#ifdef CONFIG_PROC_FS
proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op);
proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op);
_
Patches currently in -mm which might be from osalvador(a)suse.de are
arch-x86-mm-numa-do-not-initialize-nodes-twice.patch
arch-x86-mm-numa-do-not-initialize-nodes-twice-v2.patch
mm-only-re-generate-demotion-targets-when-a-numa-node-changes-its-n_cpu-state.patch
This is the start of the stable review cycle for the 4.19.234 release.
There are 18 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.234-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.234-rc1
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 ++
arch/arm/include/asm/spectre.h | 32 ++++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++++++-
arch/arm/kernel/entry-common.S | 24 +++
arch/arm/kernel/spectre.c | 71 ++++++++
arch/arm/kernel/traps.c | 65 ++++++-
arch/arm/kernel/vmlinux.lds.h | 35 +++-
arch/arm/mm/Kconfig | 11 ++
arch/arm/mm/proc-v7-bugs.c | 200 +++++++++++++++++++---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 214 +++++++++++++++++-------
drivers/firmware/psci.c | 15 ++
include/linux/arm-smccc.h | 74 ++++++++
include/linux/bpf.h | 11 ++
kernel/sysctl.c | 8 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
21 files changed, 796 insertions(+), 135 deletions(-)
For some reason, the Microsoft Surface Go 3 uses the standard ACPI
interface for battery information, but does not use the standard PNP0C0A
HID. Instead it uses MSHW0146 as identifier. Add that ID to the driver
as this seems to work well.
Additionally, the power state is not updated immediately after the AC
has been (un-)plugged, so add the respective quirk for that.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Maximilian Luz <luzmaximilian(a)gmail.com>
---
drivers/acpi/battery.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/acpi/battery.c b/drivers/acpi/battery.c
index ea31ae01458b..dc208f5f5a1f 100644
--- a/drivers/acpi/battery.c
+++ b/drivers/acpi/battery.c
@@ -59,6 +59,10 @@ MODULE_PARM_DESC(cache_time, "cache time in milliseconds");
static const struct acpi_device_id battery_device_ids[] = {
{"PNP0C0A", 0},
+
+ /* Microsoft Surface Go 3 */
+ {"MSHW0146", 0},
+
{"", 0},
};
@@ -1148,6 +1152,14 @@ static const struct dmi_system_id bat_dmi_table[] __initconst = {
DMI_MATCH(DMI_PRODUCT_VERSION, "ThinkPad"),
},
},
+ {
+ /* Microsoft Surface Go 3 */
+ .callback = battery_notification_delay_quirk,
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Surface Go 3"),
+ },
+ },
{},
};
--
2.35.1
Fix the descriptions of the return values of helper
bpf_current_task_under_cgroup().
Fixes: c6b5fb8690fa ("bpf: add documentation for eBPF helpers (42-50)")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hengqi Chen <hengqi.chen(a)gmail.com>
---
include/uapi/linux/bpf.h | 4 ++--
tools/include/uapi/linux/bpf.h | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bc23020b638d..374db485f063 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2302,8 +2302,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
- * * 0, if current task belongs to the cgroup2.
- * * 1, if current task does not belong to the cgroup2.
+ * * 1, if current task belongs to the cgroup2.
+ * * 0, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index bc23020b638d..374db485f063 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2302,8 +2302,8 @@ union bpf_attr {
* Return
* The return value depends on the result of the test, and can be:
*
- * * 0, if current task belongs to the cgroup2.
- * * 1, if current task does not belong to the cgroup2.
+ * * 1, if current task belongs to the cgroup2.
+ * * 0, if current task does not belong to the cgroup2.
* * A negative error code, if an error occurred.
*
* long bpf_skb_change_tail(struct sk_buff *skb, u32 len, u64 flags)
--
2.30.2
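For reference, a minimal sketch of a tracing program using this helper with
the corrected return convention (1 = the current task belongs to the cgroup2,
0 = it does not, negative = error). The map name, probe target and message
below are illustrative assumptions, not taken from the patch:

/* SPDX-License-Identifier: GPL-2.0 */
/* Hypothetical example: only demonstrates the documented return values. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_CGROUP_ARRAY);
	__uint(max_entries, 1);
	__uint(key_size, sizeof(__u32));
	__uint(value_size, sizeof(__u32));
} cgroup_map SEC(".maps");

SEC("kprobe/do_sys_openat2")
int trace_openat2(void *ctx)
{
	/* Test current against the cgroup2 stored at index 0 of cgroup_map. */
	long ret = bpf_current_task_under_cgroup(&cgroup_map, 0);

	if (ret < 0)
		return 0;	/* error, e.g. no cgroup at that index */
	if (ret == 1)		/* current task belongs to the cgroup2 */
		bpf_printk("openat2 from a task inside the cgroup2");
	return 0;
}

char LICENSE[] SEC("license") = "GPL";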
upstream commit 277c8cb3e8ac ("MIPS: fix local_{add,sub}_return on MIPS64")
was backported to v5.15.27 as
commit f98371d2ac83 ("MIPS: fix local_{add,sub}_return on MIPS64")
but breaks MIPS build:
In file included from ./arch/mips/include/asm/local.h:8:0,
from ./include/linux/genhd.h:20,
from ./include/linux/blkdev.h:8,
from ./include/linux/blk-cgroup.h:23,
from ./include/linux/writeback.h:14,
from ./include/linux/memcontrol.h:22,
from ./include/net/sock.h:53,
from ./include/linux/tcp.h:19,
from drivers/net/slip/slip.c:91:
./arch/mips/include/asm/asm.h:68:0: warning: "END" redefined
#define END(function) \
In file included from drivers/net/slip/slip.c:88:0:
drivers/net/slip/slip.h:44:0: note: this is the location of the previous definition
#define END 0300 /* indicates end of frame */
Analysis reveals that with the backported MIPS fix there is a new
#include <asm/asm.h> introduced by ./arch/mips/include/asm/local.h,
which already defines an END macro.
But why does v5.16.x compile fine where
commit a0ecfd10d669c ("MIPS: fix local_{add,sub}_return on MIPS64")
is also present since v5.16.3?
Deeper analysis shows that there is another patch, introduced
in v5.16-rc1, which removed one #include in the above chain, so
END is no longer defined via asm/asm.h:
commit 348332e000697 ("mm: don't include <linux/blk-cgroup.h> in <linux/writeback.h>")
Hence, the MIPS fix should only be applied to branches where
the mm fix is already present. Or the mm fix should be backported
as well (if it has no side-effects).
Note: the MIPS fix was apparently not (yet?) applied to v5.10.y or earlier,
even though the Fixes: 7232311ef14c ("local_t: mips extension") tag
would apply there as well.
BR and thanks,
Nikolaus Schaller
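As a side note, the clash is easy to reproduce in isolation. A minimal sketch
follows; the macro bodies are simplified stand-ins for the real definitions in
slip.h and asm/asm.h, not the actual kernel code:

/* Hypothetical illustration of the END redefinition described above. */
#define END 0300			/* slip.h: SLIP end-of-frame byte */
#define END(function)	.end function	/* asm/asm.h: assembler epilogue */
/* gcc: warning: "END" redefined */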
From: Nicolas Saenz Julienne <nsaenzju(a)redhat.com>
At the moment, running osnoise on a nohz_full CPU, or with uncontested FIFO
priority, on a PREEMPT_RCU kernel might have the side effect of
extending grace periods too much. This will entice RCU to force a
context switch on the wayward CPU to end the grace period, all while
introducing unwarranted noise into the tracer. This behaviour is
unavoidable as overly extending grace periods might exhaust the system's
memory.
This exact problem is what extended quiescent states (EQS) were
created for; rcu_momentary_dyntick_idle() emulates them by
performing a zero-duration EQS. So let's make use of it.
In the common case rcu_momentary_dyntick_idle() is fairly inexpensive:
atomically incrementing a local per-CPU counter and doing a store. So it
shouldn't affect osnoise's measurements (which have a 1us granularity),
so we'll call it unconditionally.
The uncommon cases involve calling rcu_momentary_dyntick_idle() after
the osnoise process has:
- Received an expedited quiescent state IPI with preemption disabled or
during an RCU critical section (activates the rdp->cpu_no_qs.b.exp
code-path).
- Been preempted within an RCU critical section and had the
subsequent outermost rcu_read_unlock() called with interrupts
disabled (the t->rcu_read_unlock_special.b.blocked code-path).
Neither of those is possible at the moment, and they are unlikely to be in
the future given osnoise's loop design. On top of this, the noise
generated by the situations described above is unavoidable, and if not
exposed by rcu_momentary_dyntick_idle() it will eventually be seen in
subsequent rcu_read_unlock() calls or schedule operations.
Link: https://lkml.kernel.org/r/20220307180740.577607-1-nsaenzju@redhat.com
Cc: stable(a)vger.kernel.org
Fixes: bce29ac9ce0b ("trace: Add osnoise tracer")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju(a)redhat.com>
Acked-by: Paul E. McKenney <paulmck(a)kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_osnoise.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index 2aa3efdca755..5e3c62a08fc0 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1386,6 +1386,26 @@ static int run_osnoise(void)
osnoise_stop_tracing();
}
+ /*
+ * In some cases, notably when running on a nohz_full CPU with
+ * a stopped tick PREEMPT_RCU has no way to account for QSs.
+ * This will eventually cause unwarranted noise as PREEMPT_RCU
+ * will force preemption as the means of ending the current
+ * grace period. We avoid this problem by calling
+ * rcu_momentary_dyntick_idle(), which performs a zero duration
+ * EQS allowing PREEMPT_RCU to end the current grace period.
+ * This call shouldn't be wrapped inside an RCU critical
+ * section.
+ *
+ * Note that in non PREEMPT_RCU kernels QSs are handled through
+ * cond_resched()
+ */
+ if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
+ local_irq_disable();
+ rcu_momentary_dyntick_idle();
+ local_irq_enable();
+ }
+
/*
* For the non-preemptive kernel config: let threads runs, if
* they so wish.
--
2.34.1
From: Daniel Bristot de Oliveira <bristot(a)kernel.org>
Nicolas reported that using:
# trace-cmd record -e all -M 10 -p osnoise --poll
Resulted in the following kernel warning:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1217 at kernel/tracepoint.c:404 tracepoint_probe_unregister+0x280/0x370
[...]
CPU: 0 PID: 1217 Comm: trace-cmd Not tainted 5.17.0-rc6-next-20220307-nico+ #19
RIP: 0010:tracepoint_probe_unregister+0x280/0x370
[...]
CR2: 00007ff919b29497 CR3: 0000000109da4005 CR4: 0000000000170ef0
Call Trace:
<TASK>
osnoise_workload_stop+0x36/0x90
tracing_set_tracer+0x108/0x260
tracing_set_trace_write+0x94/0xd0
? __check_object_size.part.0+0x10a/0x150
? selinux_file_permission+0x104/0x150
vfs_write+0xb5/0x290
ksys_write+0x5f/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7ff919a18127
[...]
---[ end trace 0000000000000000 ]---
The warning complains about an attempt to unregister an
unregistered tracepoint.
This happens with trace-cmd because it first stops tracing and
then switches the tracer to nop, which is equivalent to:
# cd /sys/kernel/tracing/
# echo osnoise > current_tracer
# echo 0 > tracing_on
# echo nop > current_tracer
The osnoise tracer stops the workload when no trace instance
is actually collecting data. This can be caused either by
disabling tracing or by disabling the tracer itself.
To avoid unregistering events twice, use the existing
trace_osnoise_callback_enabled variable to check if the events
(and the workload) are actually active before trying to
deactivate them.
Link: https://lore.kernel.org/all/c898d1911f7f9303b7e14726e7cc9678fbfb4a0e.camel@…
Link: https://lkml.kernel.org/r/938765e17d5a781c2df429a98f0b2e7cc317b022.16468239…
Cc: stable(a)vger.kernel.org
Cc: Marcelo Tosatti <mtosatti(a)redhat.com>
Fixes: 2fac8d6486d5 ("tracing/osnoise: Allow multiple instances of the same tracer")
Reported-by: Nicolas Saenz Julienne <nsaenzju(a)redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_osnoise.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index cfddb30e65ab..2aa3efdca755 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -2200,6 +2200,17 @@ static void osnoise_workload_stop(void)
if (osnoise_has_registered_instances())
return;
+ /*
+ * If callbacks were already disabled in a previous stop
+ * call, there is no need to disable then again.
+ *
+ * For instance, this happens when tracing is stopped via:
+ * echo 0 > tracing_on
+ * echo nop > current_tracer.
+ */
+ if (!trace_osnoise_callback_enabled)
+ return;
+
trace_osnoise_callback_enabled = false;
/*
* Make sure that ftrace_nmi_enter/exit() see
--
2.34.1
stable-rc queue/4.19 x86 and i386 gcc-11 builds failed due to the
following warnings and errors.
metadata:
git_describe: v4.19.233-22-g83c76d59eabe
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc-queues
git_sha: 83c76d59eabe7545b86485336a9aeb0f652666be
git_short_log: 83c76d59eabe ("ARM: fix build warning in proc-v7-bugs.c")
target_arch: x86_64
toolchain: gcc-11
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/2/build ARCH=x86_64
CROSS_COMPILE=x86_64-linux-gnu- 'CC=sccache x86_64-linux-gnu-gcc'
'HOSTCC=sccache gcc'
arch/x86/entry/entry_64.S: Assembler messages:
arch/x86/entry/entry_64.S:1738: Warning: no instruction mnemonic
suffix given and no register operands; using default for `sysret'
arch/x86/kernel/cpu/bugs.c: In function 'spectre_v2_select_mitigation':
arch/x86/kernel/cpu/bugs.c:973:41: error: implicit declaration of
function 'unprivileged_ebpf_enabled'
[-Werror=implicit-function-declaration]
973 | if (mode == SPECTRE_V2_EIBRS && unprivileged_ebpf_enabled())
| ^~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
drivers/crypto/ccp/sp-platform.c:37:34: warning: array 'sp_of_match'
assumed to have one element
37 | static const struct of_device_id sp_of_match[];
| ^~~~~~~~~~~
drivers/gpu/drm/i915/intel_dp.c: In function 'intel_dp_check_mst_status':
drivers/gpu/drm/i915/intel_dp.c:4129:30: warning:
'drm_dp_channel_eq_ok' reading 6 bytes from a region of size 4
[-Wstringop-overread]
4129 | !drm_dp_channel_eq_ok(&esi[10],
intel_dp->lane_count)) {
|
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/i915/intel_dp.c:4129:30: note: referencing argument 1
of type 'const u8 *' {aka 'const unsigned char *'}
In file included from drivers/gpu/drm/i915/intel_dp.c:39:
include/drm/drm_dp_helper.h:954:6: note: in a call to function
'drm_dp_channel_eq_ok'
954 | bool drm_dp_channel_eq_ok(const u8 link_status[DP_LINK_STATUS_SIZE],
| ^~~~~~~~~~~~~~~~~~~~
make[1]: Target '_all' not remade because of errors.
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Build links: [1] & [2]
--
Linaro LKFT
https://lkft.linaro.org
[1] https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc-queues/-/jobs…
[2] https://builds.tuxbuild.com/26C6OfFPTfEBORLviGXIOuwmzR2/
On Thu, Mar 10, 2022 at 01:26:16PM +0100, Rafael J. Wysocki wrote:
> Hi Greg & Sasha,
>
> Commit 4287509b4d21e34dc492 that went into 5.16.y as a backport of
> mainline commit dc0075ba7f38 ("ACPI: PM: s2idle: Cancel wakeup before
> dispatching EC GPE") is causing trouble in 5.16.y, but 5.17-rc7
> including the original commit is fine.
>
> This is most likely due to some other changes that commit dc0075ba7f38
> turns out to depend on which have not been backported, but because it
> is not an essential fix (and it was backported, because it carried a
> Fixes tag and not because it was marked for backporting), IMV it is
> better to revert it from 5.16.y than to try to pull all of the
> dependencies in (and risk missing any of them), so please do that.
>
> Please see this thread:
>
> https://lore.kernel.org/linux-pm/31b9d1cd-6a67-218b-4ada-12f72e6f00dc@redha…
Odd that this is only showing up in 5.16 as this commit also is in 5.4
and 5.10 and 5.15. Should I drop it from there as well?
thanks,
greg k-h
I messed up the stable list address, sorry.
On Thu, Mar 10, 2022 at 1:26 PM Rafael J. Wysocki <rafael(a)kernel.org> wrote:
>
> Hi Greg & Sasha,
>
> Commit 4287509b4d21e34dc492 that went into 5.16.y as a backport of
> mainline commit dc0075ba7f38 ("ACPI: PM: s2idle: Cancel wakeup before
> dispatching EC GPE") is causing trouble in 5.16.y, but 5.17-rc7
> including the original commit is fine.
>
> This is most likely due to some other changes that commit dc0075ba7f38
> turns out to depend on which have not been backported, but because it
> is not an essential fix (and it was backported, because it carried a
> Fixes tag and not because it was marked for backporting), IMV it is
> better to revert it from 5.16.y than to try to pull all of the
> dependencies in (and risk missing any of them), so please do that.
>
> Please see this thread:
>
> https://lore.kernel.org/linux-pm/31b9d1cd-6a67-218b-4ada-12f72e6f00dc@redha…
>
> for reference.
>
> Cheers!
This is the start of the stable review cycle for the 4.9.306 release.
There are 24 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.306-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.306-rc1
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Mark Rutland <mark.rutland(a)arm.com>
arm/arm64: smccc/psci: add arm_smccc_1_1_get_conduit()
Steven Price <steven.price(a)arm.com>
arm/arm64: Provide a wrapper for SMCCC 1.1 calls
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
Borislav Petkov <bp(a)suse.de>
x86/speculation: Merge one test in spectre_v2_user_select_mitigation()
Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
Documentation: refer to config RANDOMIZE_BASE for kernel address-space randomization
Josh Poimboeuf <jpoimboe(a)redhat.com>
Documentation: Add swapgs description to the Spectre v1 documentation
Tim Chen <tim.c.chen(a)linux.intel.com>
Documentation: Add section about CPU vulnerabilities for Spectre
Zhenzhong Duan <zhenzhong.duan(a)oracle.com>
x86/retpoline: Remove minimal retpoline support
Zhenzhong Duan <zhenzhong.duan(a)oracle.com>
x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support
Zhenzhong Duan <zhenzhong.duan(a)oracle.com>
x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant
-------------
Diffstat:
Documentation/hw-vuln/index.rst | 1 +
Documentation/hw-vuln/spectre.rst | 785 +++++++++++++++++++++++++++++++
Documentation/kernel-parameters.txt | 8 +-
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 ++
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 +++-
arch/arm/kernel/entry-common.S | 24 +
arch/arm/kernel/spectre.c | 71 +++
arch/arm/kernel/traps.c | 65 ++-
arch/arm/kernel/vmlinux-xip.lds.S | 37 +-
arch/arm/kernel/vmlinux.lds.S | 37 +-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 198 ++++++--
arch/x86/Kconfig | 4 -
arch/x86/Makefile | 5 +-
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 41 +-
arch/x86/kernel/cpu/bugs.c | 223 ++++++---
drivers/firmware/psci.c | 15 +
include/linux/arm-smccc.h | 74 +++
include/linux/bpf.h | 11 +
kernel/sysctl.c | 8 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
25 files changed, 1596 insertions(+), 153 deletions(-)
This is the start of the stable review cycle for the 5.10.105 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.105-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.105-rc1
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
James Morse <james.morse(a)arm.com>
arm64: Use the clearbhb instruction in mitigations
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
James Morse <james.morse(a)arm.com>
arm64: Mitigate spectre style branch history side channels
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
James Morse <james.morse(a)arm.com>
arm64: Add percpu vectors for EL1
James Morse <james.morse(a)arm.com>
arm64: entry: Add macro for reading symbol addresses from the trampoline
James Morse <james.morse(a)arm.com>
arm64: entry: Add vectors that have the bhb mitigation sequences
James Morse <james.morse(a)arm.com>
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
James Morse <james.morse(a)arm.com>
arm64: entry: Allow the trampoline text to occupy multiple pages
James Morse <james.morse(a)arm.com>
arm64: entry: Make the kpti trampoline's kpti sequence optional
James Morse <james.morse(a)arm.com>
arm64: entry: Move trampoline macros out of ifdef'd section
James Morse <james.morse(a)arm.com>
arm64: entry: Don't assume tramp_vectors is the start of the vectors
James Morse <james.morse(a)arm.com>
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
James Morse <james.morse(a)arm.com>
arm64: entry: Move the trampoline data page before the text page
James Morse <james.morse(a)arm.com>
arm64: entry: Free up another register on kpti's tramp_exit path
James Morse <james.morse(a)arm.com>
arm64: entry: Make the trampoline cleanup optional
James Morse <james.morse(a)arm.com>
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
James Morse <james.morse(a)arm.com>
arm64: entry.S: Add ventry overflow sanity checks
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly <joey.gouly(a)arm.com>
arm64: add ID_AA64ISAR2_EL1 sys register
Marc Zyngier <maz(a)kernel.org>
arm64: Add HWCAP for self-synchronising virtual counter
Anshuman Khandual <anshuman.khandual(a)arm.com>
arm64: Add Cortex-A510 CPU part definition
Anshuman Khandual <anshuman.khandual(a)arm.com>
arm64: Add Cortex-X2 CPU part definition
Suzuki K Poulose <suzuki.poulose(a)arm.com>
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
Hector Martin <marcan(a)marcan.st>
arm64: cputype: Add CPU implementor & types for the Apple M1 cores
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++--
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Documentation/arm64/cpu-feature-registers.rst | 29 +-
Documentation/arm64/elf_hwcaps.rst | 12 +
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 +++
arch/arm/include/asm/vmlinux.lds.h | 35 ++-
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 +++++-
arch/arm/kernel/entry-common.S | 24 ++
arch/arm/kernel/spectre.c | 71 +++++
arch/arm/kernel/traps.c | 65 ++++-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 207 +++++++++++---
arch/arm64/Kconfig | 9 +
arch/arm64/include/asm/assembler.h | 33 +++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpucaps.h | 3 +-
arch/arm64/include/asm/cpufeature.h | 28 ++
arch/arm64/include/asm/cputype.h | 22 ++
arch/arm64/include/asm/fixmap.h | 6 +-
arch/arm64/include/asm/hwcap.h | 3 +
arch/arm64/include/asm/insn.h | 1 +
arch/arm64/include/asm/kvm_asm.h | 8 +
arch/arm64/include/asm/kvm_mmu.h | 3 +-
arch/arm64/include/asm/mmu.h | 6 +
arch/arm64/include/asm/sections.h | 5 +
arch/arm64/include/asm/spectre.h | 4 +
arch/arm64/include/asm/sysreg.h | 18 ++
arch/arm64/include/asm/vectors.h | 73 +++++
arch/arm64/include/uapi/asm/hwcap.h | 3 +
arch/arm64/include/uapi/asm/kvm.h | 5 +
arch/arm64/kernel/cpu_errata.c | 7 +
arch/arm64/kernel/cpufeature.c | 28 +-
arch/arm64/kernel/cpuinfo.c | 4 +
arch/arm64/kernel/entry.S | 213 ++++++++++----
arch/arm64/kernel/proton-pack.c | 359 +++++++++++++++++++++++-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/kvm/arm.c | 3 +-
arch/arm64/kvm/hyp/hyp-entry.S | 4 +
arch/arm64/kvm/hyp/smccc_wa.S | 75 +++++
arch/arm64/kvm/hyp/vhe/switch.c | 9 +-
arch/arm64/kvm/hypercalls.c | 12 +
arch/arm64/kvm/psci.c | 18 +-
arch/arm64/kvm/sys_regs.c | 2 +-
arch/arm64/mm/mmu.c | 12 +-
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 205 ++++++++++----
include/linux/arm-smccc.h | 5 +
include/linux/bpf.h | 12 +
kernel/sysctl.c | 7 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
54 files changed, 1651 insertions(+), 214 deletions(-)
[PROBLEM]
Patch "btrfs: don't let new writes to trigger autodefrag on the same
inode" now makes autodefrag really only to scan one inode once
per autodefrag run.
That patch works mostly fine, as the trace events show their intervals
are almost 30s:
(only showing the root 257 ino 329891 start 0)
486.810041: defrag_file_start: root=257 ino=329891 start=0 len=754728960
506.407089: defrag_file_start: root=257 ino=329891 start=0 len=754728960
536.463320: defrag_file_start: root=257 ino=329891 start=0 len=754728960
539.721309: defrag_file_start: root=257 ino=329891 start=0 len=754728960
569.741417: defrag_file_start: root=257 ino=329891 start=0 len=754728960
594.832649: defrag_file_start: root=257 ino=329891 start=0 len=754728960
624.258214: defrag_file_start: root=257 ino=329891 start=0 len=754728960
654.856264: defrag_file_start: root=257 ino=329891 start=0 len=754728960
684.943029: defrag_file_start: root=257 ino=329891 start=0 len=754728960
715.288662: defrag_file_start: root=257 ino=329891 start=0 len=754728960
But there are some outliers, like 536s->539s, which is only 3s, not the
default 30s commit interval.
[CAUSE]
There are several call sites which can wake up the transaction kthread
early. The transaction kthread itself can skip the transaction if its
timer hasn't reached the commit interval, but the cleaner kthread is
always woken up.
This means each time the transaction kthread gets woken up, we also
trigger autodefrag, even if the transaction kthread chooses to skip its
work.
This is not good behavior for files under heavy write load, as we waste
extra IO/CPU while the defragged extents can easily get fragmented
again.
[FIX]
In btrfs_run_defrag_inodes(), check if the current time is larger
than last run + commit_interval.
If not, skip this run and wait for the next opportunity.
This patch, along with patch "btrfs: don't let new writes to trigger
autodefrag on the same inode", is mostly for backporting to v5.16.
This is just to reduce the unnecessary IO/CPU caused by autodefrag; the
final solution would be allowing users to change the autodefrag scan
interval and target extent size.
Cc: stable(a)vger.kernel.org # 5.16+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/ctree.h | 1 +
fs/btrfs/file.c | 11 +++++++++++
2 files changed, 12 insertions(+)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index a8a3de10cead..44116a47307e 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -899,6 +899,7 @@ struct btrfs_fs_info {
/* auto defrag inodes go here */
spinlock_t defrag_inodes_lock;
+ u64 defrag_last_run_ksec;
struct rb_root defrag_inodes;
atomic_t defrag_running;
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index abba1871e86e..a852754f5601 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -312,9 +312,20 @@ int btrfs_run_defrag_inodes(struct btrfs_fs_info *fs_info)
{
struct inode_defrag *defrag;
struct rb_root defrag_inodes;
+ u64 ksec = ktime_get_seconds();
u64 first_ino = 0;
u64 root_objectid = 0;
+ /*
+ * If cleaner get woken up early, skip this run to avoid frequent
+ * re-dirty, which is not really useful for heavy writes.
+ *
+ * TODO: Make autodefrag to happen in its own thread.
+ */
+ if (ksec - fs_info->defrag_last_run_ksec < fs_info->commit_interval)
+ return 0;
+ fs_info->defrag_last_run_ksec = ksec;
+
atomic_inc(&fs_info->defrag_running);
spin_lock(&fs_info->defrag_inodes_lock);
/*
--
2.35.1
This is the start of the stable review cycle for the 5.15.28 release.
There are 43 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 11 Mar 2022 15:58:48 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.28-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.28-rc1
Christoph Hellwig <hch(a)lst.de>
block: drop unused includes in <linux/genhd.h>
Huang Pei <huangpei(a)loongson.cn>
slip: fix macro redefine warning
Emmanuel Gil Peyrot <linkmauve(a)linkmauve.fr>
ARM: fix build error when BPF_SYSCALL is disabled
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Include unprivileged eBPF status in Spectre v2 mitigation reporting
James Morse <james.morse(a)arm.com>
arm64: Use the clearbhb instruction in mitigations
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated
James Morse <james.morse(a)arm.com>
arm64: Mitigate spectre style branch history side channels
James Morse <james.morse(a)arm.com>
arm64: proton-pack: Report Spectre-BHB vulnerabilities as part of Spectre-v2
James Morse <james.morse(a)arm.com>
arm64: Add percpu vectors for EL1
James Morse <james.morse(a)arm.com>
arm64: entry: Add macro for reading symbol addresses from the trampoline
James Morse <james.morse(a)arm.com>
arm64: entry: Add vectors that have the bhb mitigation sequences
James Morse <james.morse(a)arm.com>
arm64: entry: Add non-kpti __bp_harden_el1_vectors for mitigations
James Morse <james.morse(a)arm.com>
arm64: entry: Allow the trampoline text to occupy multiple pages
James Morse <james.morse(a)arm.com>
arm64: entry: Make the kpti trampoline's kpti sequence optional
James Morse <james.morse(a)arm.com>
arm64: entry: Move trampoline macros out of ifdef'd section
James Morse <james.morse(a)arm.com>
arm64: entry: Don't assume tramp_vectors is the start of the vectors
James Morse <james.morse(a)arm.com>
arm64: entry: Allow tramp_alias to access symbols after the 4K boundary
James Morse <james.morse(a)arm.com>
arm64: entry: Move the trampoline data page before the text page
James Morse <james.morse(a)arm.com>
arm64: entry: Free up another register on kpti's tramp_exit path
James Morse <james.morse(a)arm.com>
arm64: entry: Make the trampoline cleanup optional
James Morse <james.morse(a)arm.com>
KVM: arm64: Allow indirect vectors to be used without SPECTRE_V3A
James Morse <james.morse(a)arm.com>
arm64: spectre: Rename spectre_v4_patch_fw_mitigation_conduit
James Morse <james.morse(a)arm.com>
arm64: entry.S: Add ventry overflow sanity checks
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_RPRES
Joey Gouly <joey.gouly(a)arm.com>
arm64: cpufeature: add HWCAP for FEAT_AFP
Joey Gouly <joey.gouly(a)arm.com>
arm64: add ID_AA64ISAR2_EL1 sys register
Anshuman Khandual <anshuman.khandual(a)arm.com>
arm64: Add Cortex-X2 CPU part definition
Marc Zyngier <maz(a)kernel.org>
arm64: Add HWCAP for self-synchronising virtual counter
Suzuki K Poulose <suzuki.poulose(a)arm.com>
arm64: Add Neoverse-N2, Cortex-A710 CPU part definition
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: include unprivileged BPF status in Spectre V2 reporting
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: Spectre-BHB workaround
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: use LOADADDR() to get load address of sections
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: early traps initialisation
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
ARM: report Spectre v2 status through sysfs
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about eIBRS + LFENCE + Unprivileged eBPF + SMT
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Warn about Spectre v2 LFENCE mitigation
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Update link to AMD speculation whitepaper
Kim Phillips <kim.phillips(a)amd.com>
x86/speculation: Use generic retpoline by default on AMD
Josh Poimboeuf <jpoimboe(a)redhat.com>
x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting
Peter Zijlstra <peterz(a)infradead.org>
Documentation/hw-vuln: Update spectre doc
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Add eIBRS + Retpoline options
Peter Zijlstra (Intel) <peterz(a)infradead.org>
x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE
Peter Zijlstra <peterz(a)infradead.org>
x86,bugs: Unconditionally allow spectre_v2=retpoline,amd
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 48 ++-
Documentation/admin-guide/kernel-parameters.txt | 8 +-
Documentation/arm64/cpu-feature-registers.rst | 29 +-
Documentation/arm64/elf_hwcaps.rst | 12 +
Makefile | 4 +-
arch/arm/include/asm/assembler.h | 10 +
arch/arm/include/asm/spectre.h | 32 ++
arch/arm/include/asm/vmlinux.lds.h | 35 ++-
arch/arm/kernel/Makefile | 2 +
arch/arm/kernel/entry-armv.S | 79 ++++-
arch/arm/kernel/entry-common.S | 24 ++
arch/arm/kernel/spectre.c | 71 +++++
arch/arm/kernel/traps.c | 65 +++-
arch/arm/mm/Kconfig | 11 +
arch/arm/mm/proc-v7-bugs.c | 207 ++++++++++---
arch/arm64/Kconfig | 9 +
arch/arm64/include/asm/assembler.h | 53 ++++
arch/arm64/include/asm/cpu.h | 1 +
arch/arm64/include/asm/cpufeature.h | 29 ++
arch/arm64/include/asm/cputype.h | 14 +
arch/arm64/include/asm/fixmap.h | 6 +-
arch/arm64/include/asm/hwcap.h | 3 +
arch/arm64/include/asm/insn.h | 1 +
arch/arm64/include/asm/kvm_host.h | 5 +
arch/arm64/include/asm/sections.h | 5 +
arch/arm64/include/asm/spectre.h | 4 +
arch/arm64/include/asm/sysreg.h | 18 ++
arch/arm64/include/asm/vectors.h | 73 +++++
arch/arm64/include/uapi/asm/hwcap.h | 3 +
arch/arm64/include/uapi/asm/kvm.h | 5 +
arch/arm64/kernel/cpu_errata.c | 7 +
arch/arm64/kernel/cpufeature.c | 28 +-
arch/arm64/kernel/cpuinfo.c | 4 +
arch/arm64/kernel/entry.S | 214 +++++++++----
arch/arm64/kernel/image-vars.h | 4 +
arch/arm64/kernel/proton-pack.c | 391 +++++++++++++++++++++++-
arch/arm64/kernel/vmlinux.lds.S | 2 +-
arch/arm64/kvm/arm.c | 5 +-
arch/arm64/kvm/hyp/hyp-entry.S | 9 +
arch/arm64/kvm/hyp/nvhe/mm.c | 4 +-
arch/arm64/kvm/hyp/vhe/switch.c | 9 +-
arch/arm64/kvm/hypercalls.c | 12 +
arch/arm64/kvm/psci.c | 18 +-
arch/arm64/kvm/sys_regs.c | 2 +-
arch/arm64/mm/mmu.c | 12 +-
arch/arm64/tools/cpucaps | 1 +
arch/um/drivers/ubd_kern.c | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/nospec-branch.h | 16 +-
arch/x86/kernel/cpu/bugs.c | 205 +++++++++----
arch/x86/lib/retpoline.S | 2 +-
block/genhd.c | 1 +
block/holder.c | 1 +
block/partitions/core.c | 1 +
drivers/block/amiflop.c | 1 +
drivers/block/ataflop.c | 1 +
drivers/block/floppy.c | 1 +
drivers/block/swim.c | 1 +
drivers/block/xen-blkfront.c | 1 +
drivers/md/md.c | 1 +
drivers/net/slip/slip.h | 2 +
drivers/s390/block/dasd_genhd.c | 1 +
drivers/scsi/sd.c | 1 +
drivers/scsi/sg.c | 1 +
drivers/scsi/sr.c | 1 +
drivers/scsi/st.c | 1 +
include/linux/arm-smccc.h | 5 +
include/linux/bpf.h | 12 +
include/linux/genhd.h | 14 +-
include/linux/part_stat.h | 1 +
kernel/sysctl.c | 7 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
72 files changed, 1642 insertions(+), 229 deletions(-)
On 09.03.22 17:46, Reinhold Mannsberger wrote:
> Dear Thorsten!
>
> Thank you for your quick response!
>
> Now that you gave me the advice to check dmesg I found out that the
> messages are the same with kernel version 5.16.10 and 5.16.12. But - as
> I described - with kernel version 5.16.10 I had to press the power
> button to resume from suspend. So I my conclusion that my laptop does
> not go to suspend ist apparently wrong.
Good. :-D
> In any case you find excerpts
> from dmesg with both kernel versions attached.
>
> Now there is one thing I really would like to understand. Concluding
> from the time stamps in dmesg it seems that my laptop goes to suspend
> only for a moment right before I re-open the lid. Of course I did not
> close my laptop lid only for 3 seconds - as it could be concluded from
> the time stamps for "PM: suspend entry (s2idle)" and "PM: suspend exit"
> - but for a longer period of time. Can you please enlighten me about that?
I have no idea, I'm just tracking regressions and sadly don't know much
about this. To me it looks a bit like s2idle is not working properly,
but I might be totally wrong with that. Maybe google might tell you; or
some measurements where you check how quickly the battery drains in
sleep. Or you could ask the PM developers -- but as this is not a
regression, neither the regressions list nor the stable list cares, so I
guess you should do it in a separate mail.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.
> Am Mittwoch, dem 09.03.2022 um 07:51 +0100 schrieb Thorsten Leemhuis:
>> Hi!
>>
>> On 08.03.22 19:21, Reinhold Mannsberger wrote:
>>>
>>> I am using Linux Mint Xfce 20.3 with kernel version 5.16. I had to use
>>> kernel 5.16 because with the standard kernel version of Linux Mint 20.3
>>> (which is 5.13) my laptop did not correctly resume when I closed the
>>> lid.
>>>
>>> With kernel 5.16 my laptop perfectly went to suspend when I closed
>>> the lid and it perfectly resumed when I opened the lid again. This
>>> means: I had to press the power button once
>>
>> That sounds odd to me, as most modern Laptops wake up automatically when
>> you open the lid. It's unlikely, but maybe that just started to
>> work now?
>>
>>> when I reopened the lid -
>>> and then the laptop resumed (to the login screen). This was true until
>>> kernel version 5.16.10. With kernel version > 5.16.10 my laptop does
>>> not go into suspend anymore. This means: When I open the lid I am back
>>> at the login screen immediately (I don't have to press the power button
>>> anymore).
>>
>> You want to check dmesg if the system really didn't go to sleep; it will
>> likely also provide a hint of what went wrong. Just upload the output
>> (generated after a fresh start and where you suspend and resume once the
>> system booted) somewhere and send us a link or sent it as an attachment
>> in a reply. If that doesn't provide any hints of what might be wrong,
>> you might need to find the change that introduced the problem using a
>> bisection.
>>
>> HTH, Ciao, Thorsten
>>
>>> System information for my laptop:
>>> ----------------------------------------------------------------------
>>> System: Kernel: 5.16.10-051610-generic x86_64 bits: 64 compiler: N/A
>>> Desktop: Xfce 4.16.0
>>> tk: Gtk 3.24.20 wm: xfwm4 dm: LightDM Distro: Linux Mint
>>> 20.3 Una
>>> base: Ubuntu 20.04 focal
>>> Machine: Type: Laptop System: HP product: HP ProBook 455 G8 Notebook
>>> PC v: N/A serial: <filter>
>>> Chassis: type: 10 serial: <filter>
>>> Mobo: HP model: 8864 v: KBC Version 41.1E.00 serial:
>>> <filter> UEFI: HP
>>> v: T78 Ver. 01.07.00 date: 10/08/2021
>>> Battery: ID-1: BAT0 charge: 43.8 Wh condition: 44.5/45.0 Wh (99%)
>>> volts: 13.0/11.4
>>> model: Hewlett-Packard Primary serial: <filter> status:
>>> Unknown
>>> CPU: Topology: 8-Core model: AMD Ryzen 7 5800U with Radeon
>>> Graphics bits: 64 type: MT MCP
>>> arch: Zen 3 L2 cache: 4096 KiB
>>> flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a
>>> ssse3 svm bogomips: 60685
>>> Speed: 3497 MHz min/max: 1600/1900 MHz Core speeds (MHz): 1:
>>> 3474 2: 3464 3: 3473
>>> 4: 3471 5: 4362 6: 4332 7: 3478 8: 3455 9: 3459 10: 3452 11:
>>> 3462 12: 3468 13: 3468
>>> 14: 3468 15: 3467 16: 3472
>>> Graphics: Device-1: AMD vendor: Hewlett-Packard driver: amdgpu v:
>>> kernel bus ID: 05:00.0
>>> chip ID: 1002:1638
>>> Display: x11 server: X.Org 1.20.13 driver: amdgpu,ati
>>> unloaded: fbdev,modesetting,vesa
>>> resolution: 1920x1080~60Hz
>>> OpenGL: renderer: AMD RENOIR (DRM 3.44.0 5.16.10-051610-
>>> generic LLVM 12.0.0)
>>> v: 4.6 Mesa 21.2.6 direct render: Yes
>>> ----------------------------------------------------------------------
>>>
>>>
>>> Best regards,
>>>
>>> Reinhold Mannsberger
>>>
>>>
>>>
Hi,
Two fixes for x86 arch.
## Changelog
v4:
- Address comment from Greg: the sha1 in the Fixes tag only needs to be 12 chars.
- Add the author of the fixed commit to the CC list.
v3:
- Fold in changes from Alviro; the previous version was still
leaking @bank[n].
v2:
- Fix wrong copy/paste.
## Short Summary
Patch 1 fixes the wrong asm constraint in the delay_loop() function.
Fortunately, the constraint violation fixed by patch 1 doesn't
yield any bug due to the nature of the System V ABI. Should we backport
this?
Patch 2 fixes a memory leak in the mce/amd code.
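For context, here is a minimal, hypothetical sketch of the kind of constraint
problem patch 1 addresses (the patch itself is not included in this cover
letter; the function name and asm body below are illustrative only, not the
real delay_loop()). When an asm statement modifies a register that the
constraints declare as a pure input, the compiler may assume the value is
preserved; declaring the operand as read-write ("+") describes the
modification correctly:
/*
 * Illustrative sketch only -- not the actual arch/x86/lib/delay.c code.
 * The asm body decrements the counter held in the "a" register, so an
 * input-only constraint would misinform the compiler; "+a" marks the
 * operand as read-write.
 */
static void example_delay_loop(unsigned long loops)
{
	asm volatile(
		"1:	dec	%0	\n"
		"	jnz	1b	\n"
		: "+a" (loops)	/* read-write: the asm changes 'loops' */
		:		/* no pure inputs */
		: "cc");	/* dec/jnz clobber the flags */
}
As the summary above notes, a mis-declared constraint like this happens to
cause no visible bug here because of how the System V ABI treats the
register, but it is still worth describing correctly.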
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Tony Luck <tony.luck(a)intel.com>
Signed-off-by: Alviro Iskandar Setiawan <alviro.iskandar(a)gnuweeb.org>
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
Ammar Faizi (2):
x86/delay: Fix the wrong asm constraint in `delay_loop()`
x86/mce/amd: Fix memory leak when `threshold_create_bank()` fails
arch/x86/kernel/cpu/mce/amd.c | 16 ++++++++++------
arch/x86/lib/delay.c | 4 ++--
2 files changed, 12 insertions(+), 8 deletions(-)
base-commit: 7e57714cd0ad2d5bb90e50b5096a0e671dec1ef3
--
2.32.0
ld.lld does not support the NOCROSSREFS directive at the moment, which
breaks the build after commit b9baf5c8c5c3 ("ARM: Spectre-BHB
workaround"):
ld.lld: error: ./arch/arm/kernel/vmlinux.lds:34: AT expected, but got NOCROSSREFS
Support for this directive will eventually be implemented, at which
point a version check can be added. To avoid breaking the build in the
meantime, just define NOCROSSREFS to nothing when using ld.lld, with a
link to the issue for tracking.
Cc: stable(a)vger.kernel.org
Fixes: b9baf5c8c5c3 ("ARM: Spectre-BHB workaround")
Link: https://github.com/ClangBuiltLinux/linux/issues/1609
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
Since b9baf5c8c5c3 has been backported to stable, I have marked this for
stable as well, using a Fixes tag to indicate that this should go back to
all releases that have b9baf5c8c5c3, not to assign any blame to
b9baf5c8c5c3, as this is clearly an ld.lld deficiency.
It would be nice if this could be applied directly to unblock our CI if
there are no objections.
arch/arm/include/asm/vmlinux.lds.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/arch/arm/include/asm/vmlinux.lds.h b/arch/arm/include/asm/vmlinux.lds.h
index 0ef21bfae9f6..fad45c884e98 100644
--- a/arch/arm/include/asm/vmlinux.lds.h
+++ b/arch/arm/include/asm/vmlinux.lds.h
@@ -26,6 +26,14 @@
#define ARM_MMU_DISCARD(x) x
#endif
+/*
+ * ld.lld does not support NOCROSSREFS:
+ * https://github.com/ClangBuiltLinux/linux/issues/1609
+ */
+#ifdef CONFIG_LD_IS_LLD
+#define NOCROSSREFS
+#endif
+
/* Set start/end symbol names to the LMA for the section */
#define ARM_LMA(sym, section) \
sym##_start = LOADADDR(section); \
base-commit: e7e19defa57580d679bf0d03f8a34933008a7930
--
2.35.1
When building arm64 defconfig + CONFIG_LTO_CLANG_{FULL,THIN}=y after
commit 558c303c9734 ("arm64: Mitigate spectre style branch history side
channels"), the following error occurs:
<instantiation>:4:2: error: invalid fixup for movz/movk instruction
mov w0, #ARM_SMCCC_ARCH_WORKAROUND_3
^
Marc figured out that moving "#include <linux/init.h>" in
include/linux/arm-smccc.h into a !__ASSEMBLY__ block resolves it. The
full include chain with CONFIG_LTO=y from include/linux/arm-smccc.h:
include/linux/init.h
include/linux/compiler.h
arch/arm64/include/asm/rwonce.h
arch/arm64/include/asm/alternative-macros.h
arch/arm64/include/asm/assembler.h
The asm/alternative-macros.h include in asm/rwonce.h only happens when
CONFIG_LTO is set, which ultimately causes asm/assembler.h to be
included before the definition of ARM_SMCCC_ARCH_WORKAROUND_3. As a
result, the preprocessor does not expand ARM_SMCCC_ARCH_WORKAROUND_3 in
__mitigate_spectre_bhb_fw, which results in the error above.
Avoid this problem by just avoiding the CONFIG_LTO=y __READ_ONCE() block
in asm/rwonce.h with assembly files, as nothing in that block is useful
to assembly files, which allows ARM_SMCCC_ARCH_WORKAROUND_3 to be
properly expanded with CONFIG_LTO=y builds.
Cc: stable(a)vger.kernel.org
Fixes: e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y")
Link: https://lore.kernel.org/r/20220309155716.3988480-1-maz@kernel.org/
Reported-by: Marc Zyngier <maz(a)kernel.org>
Acked-by: James Morse <james.morse(a)arm.com>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
This is based on current mainline; if it should be based on a specific
arm64 branch, please let me know.
As 558c303c9734 is going to stable, I marked this for stable as well to
avoid breaking Android. I used e35123d83ee3 for the fixes tag to make it
clear to the stable team this should only go where that commit is
present. If a different fixes tag should be used, please feel free to
substitute.
arch/arm64/include/asm/rwonce.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h
index 1bce62fa908a..56f7b1d4d54b 100644
--- a/arch/arm64/include/asm/rwonce.h
+++ b/arch/arm64/include/asm/rwonce.h
@@ -5,7 +5,7 @@
#ifndef __ASM_RWONCE_H
#define __ASM_RWONCE_H
-#ifdef CONFIG_LTO
+#if defined(CONFIG_LTO) && !defined(__ASSEMBLY__)
#include <linux/compiler_types.h>
#include <asm/alternative-macros.h>
@@ -66,7 +66,7 @@
})
#endif /* !BUILD_VDSO */
-#endif /* CONFIG_LTO */
+#endif /* CONFIG_LTO && !__ASSEMBLY__ */
#include <asm-generic/rwonce.h>
base-commit: 330f4c53d3c2d8b11d86ec03a964b86dc81452f5
--
2.35.1