Disable the TDP MMU by default in v5.15 kernels to "fix" several severe
performance bugs that have since been found and fixed in the TDP MMU, but
whose fixes are unsuitable for backporting to v5.15.
The problematic bugs are fixed by upstream commit edbdb43fc96b ("KVM:
x86: Preserve TDP MMU roots until they are explicitly invalidated") and
commit 01b31714bd90 ("KVM: x86: Do not unload MMU roots when only toggling
CR0.WP with TDP enabled"). Both commits fix scenarios where KVM will
rebuild all TDP MMU page tables in paths that are frequently hit by
certain guest workloads. While not exactly common, the guest workloads
are far from rare. The fallout of rebuilding TDP MMU page tables can be
so severe in some cases that it induces soft lockups in the guest.
Commit edbdb43fc96b would require _significant_ effort and churn to
backport due to it depending on a major rework that was done in v5.18.
Commit 01b31714bd90 has far fewer direct conflicts, but has several subtle
_known_ dependencies, and it's unclear whether or not there are more
unknown dependencies that have been missed.
Lastly, disabling the TDP MMU in v5.15 kernels also fixes a lurking train
wreck started by upstream commit a955cad84cda ("KVM: x86/mmu: Retry page
fault if root is invalidated by memslot update"). That commit was tagged
for stable to fix a memory leak, but didn't cherry-pick cleanly and was
never backported to v5.15. Which is extremely fortunate, as it introduced
not one but two bugs, one of which was fixed by upstream commit
18c841e1f411 ("KVM: x86: Retry page fault if MMU reload is pending and
root has no sp"), while the other was unknowingly fixed by upstream
commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
kvm_faultin_pfn()") in v6.3 (a one-off fix will be made for v6.1 kernels,
which did receive a backport for a955cad84cda). Disabling the TDP MMU
by default reduces the probability of breaking v5.15 kernels by
backporting only a subset of the fixes.
As far as what is lost by disabling the TDP MMU, the main selling point of
the TDP MMU is its ability to service page fault VM-Exits in parallel,
i.e. the main beneficiaries of the TDP MMU are deployments of large VMs
(hundreds of vCPUs), and in particular deployments that live-migrate such
VMs and thus need to fault-in huge amounts of memory on many vCPUs after
restarting the VM after migration.
Smaller VMs can see performance improvements, but nowhere near enough to make
up for the TDP MMU (in v5.15) absolutely cratering performance for some
workloads. And practically speaking, anyone that is deploying and
migrating VMs with hundreds of vCPUs is likely rolling their own kernel,
not using a stock v5.15 series kernel.
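For the presumably rare v5.15 users who do want the TDP MMU, the knob below
stays available as a 0644 module parameter; a minimal opt-back-in sketch
(path/commands are the usual ones for a kvm module parameter, and the value
is sampled per VM at creation, so it only affects newly created VMs):
  # re-enable at runtime for subsequently created VMs
  echo Y > /sys/module/kvm/parameters/tdp_mmu
  # or when loading the module / on the kernel command line
  modprobe kvm tdp_mmu=Y      # or boot with kvm.tdp_mmu=Y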
This reverts commit 71ba3f3189c78f756a659568fb473600fd78f207.
Link: https://lore.kernel.org/all/ZDmEGM+CgYpvDLh6@google.com
Link: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameserver…
Cc: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Cc: Mathias Krause <minipli(a)grsecurity.net>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6c2bb60ccd88..7a64fb238044 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -10,7 +10,7 @@
#include <asm/cmpxchg.h>
#include <trace/events/kvm.h>
-static bool __read_mostly tdp_mmu_enabled = true;
+static bool __read_mostly tdp_mmu_enabled = false;
module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
/* Initializes the TDP MMU for the VM, if enabled. */
base-commit: f6f7927ac664ba23447f8dd3c3dfe2f4ee39272f
--
2.42.0.rc2.253.gd59a3bf2b4-goog
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5310760af1d4fbea1452bfc77db5f9a680f7ae47
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082114-remix-cable-0852@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5310760af1d4 ("ipvs: fix racy memcpy in proc_do_sync_threshold")
1b90af292e71 ("ipvs: Improve robustness to the ipvs sysctl")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5310760af1d4fbea1452bfc77db5f9a680f7ae47 Mon Sep 17 00:00:00 2001
From: Sishuai Gong <sishuai.system(a)gmail.com>
Date: Thu, 10 Aug 2023 15:12:42 -0400
Subject: [PATCH] ipvs: fix racy memcpy in proc_do_sync_threshold
When two threads run proc_do_sync_threshold() in parallel,
data races could happen between the two memcpy():
Thread-1                           Thread-2
memcpy(val, valp, sizeof(val));
                                   memcpy(valp, val, sizeof(val));
This race might mess up the (struct ctl_table *) table->data,
so we add a mutex lock to serialize them.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.c…
Signed-off-by: Sishuai Gong <sishuai.system(a)gmail.com>
Acked-by: Simon Horman <horms(a)kernel.org>
Acked-by: Julian Anastasov <ja(a)ssi.bg>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 62606fb44d02..4bb0d90eca1c 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1876,6 +1876,7 @@ static int
proc_do_sync_threshold(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ struct netns_ipvs *ipvs = table->extra2;
int *valp = table->data;
int val[2];
int rc;
@@ -1885,6 +1886,7 @@ proc_do_sync_threshold(struct ctl_table *table, int write,
.mode = table->mode,
};
+ mutex_lock(&ipvs->sync_mutex);
memcpy(val, valp, sizeof(val));
rc = proc_dointvec(&tmp, write, buffer, lenp, ppos);
if (write) {
@@ -1894,6 +1896,7 @@ proc_do_sync_threshold(struct ctl_table *table, int write,
else
memcpy(valp, val, sizeof(val));
}
+ mutex_unlock(&ipvs->sync_mutex);
return rc;
}
@@ -4321,6 +4324,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
ipvs->sysctl_sync_threshold[0] = DEFAULT_SYNC_THRESHOLD;
ipvs->sysctl_sync_threshold[1] = DEFAULT_SYNC_PERIOD;
tbl[idx].data = &ipvs->sysctl_sync_threshold;
+ tbl[idx].extra2 = ipvs;
tbl[idx++].maxlen = sizeof(ipvs->sysctl_sync_threshold);
ipvs->sysctl_sync_refresh_period = DEFAULT_SYNC_REFRESH_PERIOD;
tbl[idx++].data = &ipvs->sysctl_sync_refresh_period;
This is a port of commit 379eb01c21795edb4c ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of the FP/SIMD registers in the core dump file come from the
thread.fpu. However, the kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, the kernel will not have a chance to save the latest
values of the FP/SIMD registers, so their values in the core dump file
may be incorrect. To solve this problem, force fpr_get()/simd_get() to
save the FP/SIMD registers into the thread.fpu if the target task equals
the current task.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Rename get_fpu_regs() to save_fpu_regs().
arch/loongarch/include/asm/fpu.h | 22 ++++++++++++++++++----
arch/loongarch/kernel/ptrace.c | 4 ++++
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index b541f6248837..08a45e9fd15c 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -173,16 +173,30 @@ static inline void restore_fp(struct task_struct *tsk)
_restore_fp(&tsk->thread.fpu);
}
-static inline union fpureg *get_fpu_regs(struct task_struct *tsk)
+static inline void save_fpu_regs(struct task_struct *tsk)
{
+ unsigned int euen;
+
if (tsk == current) {
preempt_disable();
- if (is_fpu_owner())
+
+ euen = csr_read32(LOONGARCH_CSR_EUEN);
+
+#ifdef CONFIG_CPU_HAS_LASX
+ if (euen & CSR_EUEN_LASXEN)
+ _save_lasx(&current->thread.fpu);
+ else
+#endif
+#ifdef CONFIG_CPU_HAS_LSX
+ if (euen & CSR_EUEN_LSXEN)
+ _save_lsx(&current->thread.fpu);
+ else
+#endif
+ if (euen & CSR_EUEN_FPEN)
_save_fp(&current->thread.fpu);
+
preempt_enable();
}
-
- return tsk->thread.fpu.fpr;
}
static inline int is_simd_owner(void)
diff --git a/arch/loongarch/kernel/ptrace.c b/arch/loongarch/kernel/ptrace.c
index a0767c3a0f0a..9a75dc43eb29 100644
--- a/arch/loongarch/kernel/ptrace.c
+++ b/arch/loongarch/kernel/ptrace.c
@@ -147,6 +147,8 @@ static int fpr_get(struct task_struct *target,
{
int r;
+ save_fpu_regs(target);
+
if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
r = gfpr_get(target, &to);
else
@@ -278,6 +280,8 @@ static int simd_get(struct task_struct *target,
{
const unsigned int wr_size = NUM_FPU_REGS * regset->size;
+ save_fpu_regs(target);
+
if (!tsk_used_math(target)) {
/* The task hasn't used FP or LSX, fill with 0xff */
copy_pad_fprs(target, regset, &to, 0);
--
2.39.3
I'm announcing the release of the 6.1.48 kernel.
All users of the 6.1 kernel series must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/admin-guide/hw-vuln/srso.rst | 4
Makefile | 2
arch/x86/include/asm/entry-common.h | 1
arch/x86/include/asm/nospec-branch.h | 28 +++---
arch/x86/kernel/cpu/amd.c | 1
arch/x86/kernel/cpu/bugs.c | 28 ++++--
arch/x86/kernel/static_call.c | 13 ++
arch/x86/kernel/traps.c | 2
arch/x86/kernel/vmlinux.lds.S | 20 ++--
arch/x86/kvm/svm/svm.c | 2
arch/x86/lib/retpoline.S | 135 ++++++++++++++++++++---------
tools/objtool/arch/x86/decode.c | 2
tools/objtool/check.c | 21 ++--
13 files changed, 178 insertions(+), 81 deletions(-)
Borislav Petkov (AMD) (4):
x86/srso: Explain the untraining sequences a bit more
x86/CPU/AMD: Fix the DIV(0) initial fix attempt
x86/srso: Disable the mitigation on unaffected configurations
x86/srso: Correct the mitigation status when SMT is disabled
Greg Kroah-Hartman (1):
Linux 6.1.48
Peter Zijlstra (9):
x86/cpu: Fix __x86_return_thunk symbol type
x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
x86/alternative: Make custom return thunk unconditional
x86/cpu: Clean up SRSO return thunk mess
x86/cpu: Rename original retbleed methods
x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
x86/cpu: Cleanup the untrain mess
x86/static_call: Fix __static_call_fixup()
objtool/x86: Fixup frame-pointer vs rethunk
Petr Pavlu (1):
x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
Sean Christopherson (1):
x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
This is the start of the stable review cycle for the 6.1.48 release.
There are 15 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 26 Aug 2023 14:14:28 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.48-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.48-rc1
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/srso: Correct the mitigation status when SMT is disabled
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fixup frame-pointer vs rethunk
Petr Pavlu <petr.pavlu(a)suse.com>
x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/srso: Disable the mitigation on unaffected configurations
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/CPU/AMD: Fix the DIV(0) initial fix attempt
Sean Christopherson <seanjc(a)google.com>
x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
Peter Zijlstra <peterz(a)infradead.org>
x86/static_call: Fix __static_call_fixup()
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/srso: Explain the untraining sequences a bit more
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Cleanup the untrain mess
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Rename original retbleed methods
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Clean up SRSO return thunk mess
Peter Zijlstra <peterz(a)infradead.org>
x86/alternative: Make custom return thunk unconditional
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Fix __x86_return_thunk symbol type
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/srso.rst | 4 +-
Makefile | 4 +-
arch/x86/include/asm/entry-common.h | 1 +
arch/x86/include/asm/nospec-branch.h | 28 +++---
arch/x86/kernel/cpu/amd.c | 1 +
arch/x86/kernel/cpu/bugs.c | 28 +++++-
arch/x86/kernel/static_call.c | 13 +++
arch/x86/kernel/traps.c | 2 -
arch/x86/kernel/vmlinux.lds.S | 20 ++--
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/lib/retpoline.S | 141 ++++++++++++++++++++---------
tools/objtool/arch/x86/decode.c | 2 +-
tools/objtool/check.c | 21 +++--
13 files changed, 182 insertions(+), 85 deletions(-)
From: Helge Deller <deller(a)gmx.de>
Older PA-RISC machines have LEDs which show disk and LAN activity.
The computation is done in software and takes quite some time, e.g. on a
J6500 it may consume up to 60% of one CPU's time if the machine is loaded
via network traffic.
Since most people don't care about the LEDs, start with LEDs disabled and
just show a CPU heartbeat LED. The disk and LAN LEDs can be turned on
manually via /proc/pdc/led.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: <stable(a)vger.kernel.org>
---
drivers/parisc/led.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
index 8bdc5e043831..765f19608f60 100644
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -56,8 +56,8 @@
static int led_type __read_mostly = -1;
static unsigned char lastleds; /* LED state from most recent update */
static unsigned int led_heartbeat __read_mostly = 1;
-static unsigned int led_diskio __read_mostly = 1;
-static unsigned int led_lanrxtx __read_mostly = 1;
+static unsigned int led_diskio __read_mostly;
+static unsigned int led_lanrxtx __read_mostly;
static char lcd_text[32] __read_mostly;
static char lcd_text_default[32] __read_mostly;
static int lcd_no_led_support __read_mostly = 0; /* KittyHawk doesn't support LED on its LCD */
@@ -589,6 +589,9 @@ int __init register_led_driver(int model, unsigned long cmd_reg, unsigned long d
return 1;
}
+ pr_info("LED: Enable disk and LAN activity LEDs "
+ "via /proc/pdc/led\n");
+
/* mark the LCD/LED driver now as initialized and
* register to the reboot notifier chain */
initialized++;
--
2.41.0
As of now, bpf counters (bperf) don't support event groups. But the
default perf stat includes topdown metrics if supported (on recent Intel
machines), which require groups. That makes perf stat exit with an error:
$ sudo perf stat --bpf-counter true
bpf managed perf events do not yet support groups.
Actually the test explicitly uses only the cycles event, but it misses
passing that option when it checks the availability of the command.
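For reference, the affected shell test can be run on its own; a sketch,
assuming the test description still contains "bpf-counters":
  $ cd tools/perf && make          # build perf if needed
  $ sudo ./perf test -v bpf-counters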
Fixes: 2c0cb9f56020d ("perf test: Add a shell test for 'perf stat --bpf-counters' new option")
Cc: stable(a)vger.kernel.org
Cc: Song Liu <song(a)kernel.org>
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
tools/perf/tests/shell/stat_bpf_counters.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
index 513cd1e58e0e..a87bb2814b4c 100755
--- a/tools/perf/tests/shell/stat_bpf_counters.sh
+++ b/tools/perf/tests/shell/stat_bpf_counters.sh
@@ -22,10 +22,10 @@ compare_number()
}
# skip if --bpf-counters is not supported
-if ! perf stat --bpf-counters true > /dev/null 2>&1; then
+if ! perf stat -e cycles --bpf-counters true > /dev/null 2>&1; then
if [ "$1" = "-v" ]; then
echo "Skipping: --bpf-counters not supported"
- perf --no-pager stat --bpf-counters true || true
+ perf --no-pager stat -e cycles --bpf-counters true || true
fi
exit 2
fi
--
2.42.0.rc1.204.g551eb34607-goog
These two are backports for 5.15.y. Conflict resolution is done in
both patches.
I have tested LTP-nfs fchown02 and chown02 on 5.15.y with the patches
below applied. The tests passed.
I would like to have a review as I am not familiar with this code.
Thanks to Vegard for helping me with this.
Regards,
Harshit
Christian Brauner (2):
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
fs/attr.c | 1 +
fs/internal.h | 2 --
fs/nfs/inode.c | 4 +---
fs/nfsd/vfs.c | 4 +++-
include/linux/fs.h | 2 ++
5 files changed, 7 insertions(+), 6 deletions(-)
--
2.34.1
The PERF_RECORD_ATTR is used in pipe mode to describe an event with its
attribute and IDs. The ID table comes after the attr, and its size is
calculated from the total record size and the attr size:
n_ids = (total_record_size - end_of_the_attr_field) / sizeof(u64)
This is fine for most use cases, but sometimes the pipe output is saved
in a file and processed later, and that becomes a problem if the attr
size changes between record and report.
$ perf record -o- > perf-pipe.data # old version
$ perf report -i- < perf-pipe.data # new version
For example, if the attr size is 128 and it has 4 IDs, then it would
save them in 168 bytes like below:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
128 byte: perf event attr { .size = 128, ... },
32 byte: event IDs [] = { 1234, 1235, 1236, 1237 },
But when reporting later, the tool thinks the attr size is 136, so it only
reads the last 3 entries as IDs:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
136 byte: perf event attr { .size = 136, ... },
24 byte: event IDs [] = { 1235, 1236, 1237 }, // 1234 is missing
So it should use the recorded version of the attr. The attr already has
its own size field, so honor that size when reading the data.
Fixes: 2c46dbb517a10 ("perf: Convert perf header attrs into attr events")
Cc: stable(a)vger.kernel.org
Cc: Tom Zanussi <zanussi(a)kernel.org>
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
Keep this version before the libperf change so that it can go through
the stable versions.
tools/perf/util/header.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 52fbf526fe74..f89321cbfdee 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -4381,7 +4381,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct evlist **pevlist)
{
- u32 i, ids, n_ids;
+ u32 i, n_ids;
+ u64 *ids;
struct evsel *evsel;
struct evlist *evlist = *pevlist;
@@ -4397,9 +4398,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
evlist__add(evlist, evsel);
- ids = event->header.size;
- ids -= (void *)&event->attr.id - (void *)event;
- n_ids = ids / sizeof(u64);
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids = n_ids / sizeof(u64);
/*
* We don't have the cpu and thread maps on the header, so
* for allocating the perf_sample_id table we fake 1 cpu and
@@ -4408,8 +4408,9 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
if (perf_evsel__alloc_id(&evsel->core, 1, n_ids))
return -ENOMEM;
+ ids = (void *)&event->attr.attr + event->attr.attr.size;
for (i = 0; i < n_ids; i++) {
- perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, event->attr.id[i]);
+ perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, ids[i]);
}
return 0;
--
2.42.0.rc1.204.g551eb34607-goog
This patch fixes an issue when concurrent fcntl() syscalls are
executing on two different gfs2 filesystems. Each gfs2 filesystem
creates a DLM lockspace; it seems that VFS only allows one fcntl()
syscall at a time on a per-filesystem basis. However, if there are two
filesystems and we execute fcntl() syscalls on both, our lookup mechanism
on the global plock op list does not work anymore.
It can be reproduced with two mounted gfs2 filesystems using DLM
locking. Then call stress-ng --fcntl 32 on each mount point. The kernel
log will show several:
WARNING: CPU: 4 PID: 943 at fs/dlm/plock.c:574 dev_write+0x15c/0x590
because we have a sanity check that verifies it was really the intended
original plock op when dev_write() does a lookup. This patch just adds an
additional check for fsid to find the right plock op, which is an
indicator that the recv_list should be on a per-lockspace basis and not
globally defined. After this patch the sanity check never warned again
that the wrong plock op was being looked up.
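For completeness, a sketch of the stress-ng reproduction described above
(the two gfs2 mount points are assumptions about the test setup):
  # two gfs2 filesystems using DLM locking, mounted at assumed paths
  (cd /mnt/gfs2-a && stress-ng --fcntl 32) &
  (cd /mnt/gfs2-b && stress-ng --fcntl 32) &
  # without the fix, warnings from fs/dlm/plock.c show up here
  dmesg -w | grep plock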
Cc: stable(a)vger.kernel.org
Reported-by: Barry Marson <bmarson(a)redhat.com>
Fixes: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace")
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
fs/dlm/plock.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 00e1d802a81c..e6b4c1a21446 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -556,7 +556,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
op = plock_lookup_waiter(&info);
} else {
list_for_each_entry(iter, &recv_list, list) {
- if (!iter->info.wait) {
+ if (!iter->info.wait &&
+ iter->info.fsid == info.fsid) {
op = iter;
break;
}
@@ -568,8 +569,7 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
if (info.wait)
WARN_ON(op->info.optype != DLM_PLOCK_OP_LOCK);
else
- WARN_ON(op->info.fsid != info.fsid ||
- op->info.number != info.number ||
+ WARN_ON(op->info.number != info.number ||
op->info.owner != info.owner ||
op->info.optype != info.optype);
--
2.31.1
From: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
Extract the internal code inside a helper function, fix the
initialization of the parameters used in the helper function
(`hidpp->answer_available` was not reset and `*response` wasn't either),
and use a `do {...} while();` loop.
Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Cc: stable(a)vger.kernel.org
Reviewed-by: Bastien Nocera <hadess(a)hadess.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
as requested by https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_V…
This is a rewrite of that particular piece of code.
---
Changes in v2:
- added __must_hold() for KASAN
- Reworked the comment describing the functions and their return values
- Link to v1: https://lore.kernel.org/r/20230621-logitech-fixes-v1-1-32e70933c0b0@redhat.…
---
drivers/hid/hid-logitech-hidpp.c | 115 +++++++++++++++++++++++++--------------
1 file changed, 75 insertions(+), 40 deletions(-)
diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
index 129b01be488d..09ba2086c95c 100644
--- a/drivers/hid/hid-logitech-hidpp.c
+++ b/drivers/hid/hid-logitech-hidpp.c
@@ -275,21 +275,22 @@ static int __hidpp_send_report(struct hid_device *hdev,
}
/*
- * hidpp_send_message_sync() returns 0 in case of success, and something else
- * in case of a failure.
- * - If ' something else' is positive, that means that an error has been raised
- * by the protocol itself.
- * - If ' something else' is negative, that means that we had a classic error
- * (-ENOMEM, -EPIPE, etc...)
+ * Effectively send the message to the device, waiting for its answer.
+ *
+ * Must be called with hidpp->send_mutex locked
+ *
+ * Same return protocol than hidpp_send_message_sync():
+ * - success on 0
+ * - negative error means transport error
+ * - positive value means protocol error
*/
-static int hidpp_send_message_sync(struct hidpp_device *hidpp,
+static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
struct hidpp_report *message,
struct hidpp_report *response)
{
- int ret = -1;
- int max_retries = 3;
+ int ret;
- mutex_lock(&hidpp->send_mutex);
+ __must_hold(&hidpp->send_mutex);
hidpp->send_receive_buf = response;
hidpp->answer_available = false;
@@ -300,47 +301,74 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
*/
*response = *message;
- for (; max_retries != 0 && ret; max_retries--) {
- ret = __hidpp_send_report(hidpp->hid_dev, message);
+ ret = __hidpp_send_report(hidpp->hid_dev, message);
+ if (ret) {
+ dbg_hid("__hidpp_send_report returned err: %d\n", ret);
+ memset(response, 0, sizeof(struct hidpp_report));
+ return ret;
+ }
- if (ret) {
- dbg_hid("__hidpp_send_report returned err: %d\n", ret);
- memset(response, 0, sizeof(struct hidpp_report));
- break;
- }
+ if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
+ 5*HZ)) {
+ dbg_hid("%s:timeout waiting for response\n", __func__);
+ memset(response, 0, sizeof(struct hidpp_report));
+ return -ETIMEDOUT;
+ }
- if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
- 5*HZ)) {
- dbg_hid("%s:timeout waiting for response\n", __func__);
- memset(response, 0, sizeof(struct hidpp_report));
- ret = -ETIMEDOUT;
- break;
- }
+ if (response->report_id == REPORT_ID_HIDPP_SHORT &&
+ response->rap.sub_id == HIDPP_ERROR) {
+ ret = response->rap.params[1];
+ dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
+ return ret;
+ }
- if (response->report_id == REPORT_ID_HIDPP_SHORT &&
- response->rap.sub_id == HIDPP_ERROR) {
- ret = response->rap.params[1];
- dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
+ if ((response->report_id == REPORT_ID_HIDPP_LONG ||
+ response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
+ response->fap.feature_index == HIDPP20_ERROR) {
+ ret = response->fap.params[1];
+ dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+/*
+ * hidpp_send_message_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ *
+ * See __do_hidpp_send_message_sync() for a detailed explanation of the returned
+ * value.
+ */
+static int hidpp_send_message_sync(struct hidpp_device *hidpp,
+ struct hidpp_report *message,
+ struct hidpp_report *response)
+{
+ int ret;
+ int max_retries = 3;
+
+ mutex_lock(&hidpp->send_mutex);
+
+ do {
+ ret = __do_hidpp_send_message_sync(hidpp, message, response);
+ if (ret != HIDPP20_ERROR_BUSY)
break;
- }
- if ((response->report_id == REPORT_ID_HIDPP_LONG ||
- response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
- response->fap.feature_index == HIDPP20_ERROR) {
- ret = response->fap.params[1];
- if (ret != HIDPP20_ERROR_BUSY) {
- dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
- break;
- }
- dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
- }
- }
+ dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
+ } while (--max_retries);
mutex_unlock(&hidpp->send_mutex);
return ret;
}
+/*
+ * hidpp_send_fap_command_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ *
+ * See __do_hidpp_send_message_sync() for a detailed explanation of the returned
+ * value.
+ */
static int hidpp_send_fap_command_sync(struct hidpp_device *hidpp,
u8 feat_index, u8 funcindex_clientid, u8 *params, int param_count,
struct hidpp_report *response)
@@ -373,6 +401,13 @@ static int hidpp_send_fap_command_sync(struct hidpp_device *hidpp,
return ret;
}
+/*
+ * hidpp_send_rap_command_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ *
+ * See __do_hidpp_send_message_sync() for a detailed explanation of the returned
+ * value.
+ */
static int hidpp_send_rap_command_sync(struct hidpp_device *hidpp_dev,
u8 report_id, u8 sub_id, u8 reg_address, u8 *params, int param_count,
struct hidpp_report *response)
---
base-commit: 87854366176403438d01f368b09de3ec2234e0f5
change-id: 20230621-logitech-fixes-a4c0e66ea2ad
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
The goal is to support a bpf_redirect() from an ethernet device (ingress)
to a ppp device (egress).
The l2 header is added automatically by the ppp driver, thus the ethernet
header should be removed.
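For illustration, a minimal usage sketch of the redirect this enables (not
part of the patch; the device names eth0/ppp0 are assumptions, and act_mirred
is shown because it consults dev_is_mac_header_xmit() just like
__bpf_redirect()):
  # attach an ingress hook to the ethernet device
  tc qdisc add dev eth0 clsact
  # redirect ingress traffic out of the ppp device; with ARPHRD_PPP now
  # treated as an l3 device, the ethernet header is stripped first
  tc filter add dev eth0 ingress matchall action mirred egress redirect dev ppp0
  # a tc BPF program returning bpf_redirect(<ppp0 ifindex>, 0) behaves the same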
CC: stable(a)vger.kernel.org
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Tested-by: Siwar Zitouni <siwar.zitouni(a)6wind.com>
---
v2 -> v3:
- add a comment in the code
- rework the commit log
v1 -> v2:
- I forgot the 'Tested-by' tag in the v1 :/
include/linux/if_arp.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h
index 1ed52441972f..10a1e81434cb 100644
--- a/include/linux/if_arp.h
+++ b/include/linux/if_arp.h
@@ -53,6 +53,10 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev)
case ARPHRD_NONE:
case ARPHRD_RAWIP:
case ARPHRD_PIMREG:
+ /* PPP adds its l2 header automatically in ppp_start_xmit().
+ * This makes it look like an l3 device to __bpf_redirect() and tcf_mirred_init().
+ */
+ case ARPHRD_PPP:
return false;
default:
return true;
--
2.39.2
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but definition of the IDE register addresses
was modeled after the Falcon case, both in use of the memory
resources and in including register shift and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, all of device base address, byte access offset
and register shift is included in the platform specific
ISA access translation (in asm/io_mm.h).
As a consequence, such address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource
for Q40 (the IO address translation will then add the correct
ISA window base address and byte access offset), with register
shift 1. Use MMIO base address and register shift 2 as before
for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers
for Q40 except the data transfer register. Encode the MMIO
offset there (pata_falcon_data_xfer() directly uses raw IO
with no address translation).
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
Changes from v4:
Geert Uytterhoeven:
- use %px for ap->ioaddr.data_addr
Changes from v3:
Sergey Shtylyov:
- change use of reg_scale to reg_shift
Geert Uytterhoeven:
- factor out ata_port_desc() from platform specific code
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..616064b02de6 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_shift = 0;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift);
+ ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift);
+ ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift);
+ ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift);
+ ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift);
+ ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift);
+ ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift);
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
+
+ ata_port_desc(ap, "cmd %px ctl %px data %px",
+ base, ctl_base, ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
While unloading the module, the ACM gadget tries to dequeue a USB
request that was never queued, which causes the kernel to crash.
The patch adds an extra condition to check whether the USB request is
actually being processed by the CDNSP driver.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
drivers/usb/cdns3/cdnsp-gadget.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index fff9ec9c391f..3a30c2af0c00 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -1125,6 +1125,9 @@ static int cdnsp_gadget_ep_dequeue(struct usb_ep *ep,
unsigned long flags;
int ret;
+ if (request->status != -EINPROGRESS)
+ return 0;
+
if (!pep->endpoint.desc) {
dev_err(pdev->dev,
"%s: can't dequeue to disabled endpoint\n",
--
2.37.2
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but definition of the IDE register addresses
was modeled after the Falcon case, both in use of the memory
resources and in including register shift and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, all of device base address, byte access offset
and register shift is included in the platform specific
ISA access translation (in asm/io_mm.h).
As a consequence, such address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource
for Q40 (the IO address translation will then add the correct
ISA window base address and byte access offset), with register
shift 1. Use MMIO base address and register shift 2 as before
for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers
for Q40 except the data transfer register. Encode the MMIO
offset there (pata_falcon_data_xfer() directly uses raw IO
with no address translation).
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
---
Changes from v3:
Sergey Shtylyov:
- change use of reg_scale to reg_shift
Geert Uytterhoeven:
- factor out ata_port_desc() from platform specific code
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..3841ea200bcb 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_shift = 0;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift);
+ ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift);
+ ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift);
+ ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift);
+ ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift);
+ ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift);
+ ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift);
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
+
+ ata_port_desc(ap, "cmd %px ctl %px data %pa",
+ base, ctl_base, &ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
The quilt patch titled
Subject: shmem: fix smaps BUG sleeping while atomic
has been removed from the -mm tree. Its filename was
shmem-fix-smaps-bug-sleeping-while-atomic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: shmem: fix smaps BUG sleeping while atomic
Date: Tue, 22 Aug 2023 22:14:47 -0700 (PDT)
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/shmem.c~shmem-fix-smaps-bug-sleeping-while-atomic
+++ a/mm/shmem.c
@@ -806,14 +806,16 @@ unsigned long shmem_partial_swap_usage(s
XA_STATE(xas, &mapping->i_pages, start);
struct page *page;
unsigned long swapped = 0;
+ unsigned long max = end - 1;
rcu_read_lock();
- xas_for_each(&xas, page, end - 1) {
+ xas_for_each(&xas, page, max) {
if (xas_retry(&xas, page))
continue;
if (xa_is_value(page))
swapped++;
-
+ if (xas.xa_index == max)
+ break;
if (need_resched()) {
xas_pause(&xas);
cond_resched_rcu();
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-khugepaged-fix-collapse_pte_mapped_thp-versus-uffd.patch
The quilt patch titled
Subject: maple_tree: disable mas_wr_append() when other readers are possible
has been removed from the -mm tree. Its filename was
maple_tree-disable-mas_wr_append-when-other-readers-are-possible.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: disable mas_wr_append() when other readers are possible
Date: Fri, 18 Aug 2023 20:43:55 -0400
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of mas_next_slot(), the following was artificially
created by separating the writer and reader code:

Writer:                                 reader:
mas_wr_append
    set end pivot
    updates end metadata
    Detects write to last slot
    last slot write is to start of slot
    store current contents in slot
    overwrite old end pivot
                                        mas_next_slot():
                                        read end metadata
                                        read old end pivot
                                        return with incorrect range
    store new value

Alternatively:

Writer:                                 reader:
mas_wr_append
    set end pivot
    updates end metadata
    Detects write to last slot
    last slot write is to end of slot
    store value
                                        mas_next_slot():
                                        read end metadata
                                        read old end pivot
                                        read new end pivot
                                        return with incorrect range
    set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/lib/maple_tree.c~maple_tree-disable-mas_wr_append-when-other-readers-are-possible
+++ a/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_e
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-clean-up-mas_wr_append.patch
The quilt patch titled
Subject: madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
has been removed from the -mm tree. Its filename was
madvise-madvise_free_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yin Fengwei <fengwei.yin(a)intel.com>
Subject: madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
Date: Tue, 8 Aug 2023 10:09:17 +0800
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced page_mapcount() with folio_mapcount() to check whether
the folio is shared by another mapping.
That is not correct for large folios: folio_mapcount() returns the total
mapcount of a large folio, which is not suitable for detecting whether the
folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of
sharers. That means it's not 100% correct, but it should be OK for the
madvise case here.
The user-visible effect is that the THP is skipped when the user calls
madvise; the correct behavior would be to split and then process the THP.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long-term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/madvise.c~madvise-madvise_free_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check
+++ a/mm/madvise.c
@@ -680,7 +680,7 @@ static int madvise_free_pte_range(pmd_t
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (!folio_trylock(folio))
break;
_
Patches currently in -mm which might be from fengwei.yin(a)intel.com are
filemap-add-filemap_map_folio_range.patch
rmap-add-folio_add_file_rmap_range.patch
mm-convert-do_set_pte-to-set_pte_range.patch
filemap-batch-pte-mappings.patch
The quilt patch titled
Subject: madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
has been removed from the -mm tree. Its filename was
madvise-madvise_cold_or_pageout_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yin Fengwei <fengwei.yin(a)intel.com>
Subject: madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
Date: Tue, 8 Aug 2023 10:09:15 +0800
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But that is
not correct, as folio_mapcount() returns the total mapcount of a large folio.
Use folio_estimated_sharers() here, as the estimated number is enough.
This patchset fixes the following case: a user space application calls
madvise() with MADV_FREE, MADV_COLD or MADV_PAGEOUT for a specific address
range that has THPs mapped to it. Without the patchset, the THPs are
skipped. With the patchset, the THPs will be split and handled accordingly.
David reported that the cow self test skips some cases because MADV_PAGEOUT
skips THPs:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel…
and I confirmed this patchset makes it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced page_mapcount() with folio_mapcount() to check
whether the folio is shared by another mapping.
That is not correct for large folios: folio_mapcount() returns the total
mapcount of a large folio, which is not suitable for detecting whether the
folio is shared.
Use folio_estimated_sharers(), which returns an estimated number of
sharers. That means it's not 100% correct, but it should be OK for the
madvise case here.
The user-visible effect is that the THP is skipped when the user calls
madvise; the correct behavior would be to split and then process the THP.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long-term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/madvise.c~madvise-madvise_cold_or_pageout_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check
+++ a/mm/madvise.c
@@ -384,7 +384,7 @@ static int madvise_cold_or_pageout_pte_r
folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -458,7 +458,7 @@ regular_folio:
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
_
Patches currently in -mm which might be from fengwei.yin(a)intel.com are
filemap-add-filemap_map_folio_range.patch
rmap-add-folio_add_file_rmap_range.patch
mm-convert-do_set_pte-to-set_pte_range.patch
filemap-batch-pte-mappings.patch
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0,
the following error occurs:
Assembler messages:
Error: cannot find default versions of the ISA extension `zicsr'
Error: cannot find default versions of the ISA extension `zifencei'
The above error originates from this binutils commit[0]; the issue was
resolved in GCC-12.1.0[1] and the fix was backported to GCC-11.3.0[2].
So fix this by changing the GCC version check in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0.
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1… [0]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ca2bbb88f999f4d3cc40e89bc1aba… [1]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e… [2]
Fixes: ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils")
Reported-by: Conor Dooley <conor.dooley(a)microchip.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng(a)iscas.ac.cn>
---
Then below are my test results after this fix:
gcc binutils
10.5.0 2.35 ok
10.5.0 2.36 ok
10.5.0 2.37 ok
10.5.0 2.38 ok
11.1.0 2.35 ok
11.1.0 2.36 ok
11.1.0 2.37 ok
11.1.0 2.38 ok
11.2.0 2.35 ok
11.2.0 2.36 ok
11.2.0 2.37 ok
11.2.0 2.38 ok
11.3.0 2.35 ok
11.3.0 2.36 ok
11.3.0 2.37 ok
11.3.0 2.38 ok
11.4.0 2.35 ok
11.4.0 2.36 ok
11.4.0 2.37 ok
11.4.0 2.38 ok
12.1.0 2.35 ok
12.1.0 2.36 ok
12.1.0 2.37 ok
12.1.0 2.38 ok
12.2.0 2.35 ok
12.2.0 2.36 ok
12.2.0 2.37 ok
12.2.0 2.38 ok
arch/riscv/Kconfig | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 10e7a7ad175a..bea7b73e895d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -580,15 +580,15 @@ config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
and Zifencei are supported in binutils from version 2.36 onwards.
To make life easier, and avoid forcing toolchains that default to a
newer ISA spec to version 2.2, relax the check to binutils >= 2.36.
- For clang < 17 or GCC < 11.1.0, for which this is not possible, this is
- dealt with in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
+ For clang < 17 or GCC < 11.3.0, for which this is not possible or need
+ special treatment, this is dealt with in TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
def_bool y
depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
# https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694…
- # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7…
- depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100)
+ # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e…
+ depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110300)
help
Certain versions of clang and GCC do not support zicsr and zifencei via
-march. This option causes an older ISA spec compatible with these older
--
2.34.1
On Thu, 24 Aug 2023 11:05:13 PDT (-0700), Conor Dooley wrote:
> On Fri, Aug 25, 2023 at 01:46:59AM +0800, Mingzheng Xing wrote:
>
>> Just a question, I see that the previous patch has been merged
>> into 6.5-rc7, and now a new fix patch should be sent out based
>> on that, right?
>
> yes
Ideally ASAP, it's very late in the cycle. I have something for
tomorrow morning, but this will need time to test...
The patch titled
Subject: memcontrol: ensure memcg acquired by id is properly set up
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
memcontrol-ensure-memcg-acquired-by-id-is-properly-set-up.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: memcontrol: ensure memcg acquired by id is properly set up
Date: Wed, 23 Aug 2023 15:54:30 -0700
In the eviction recency check, we attempt to retrieve the memcg to which
the folio belonged when it was evicted, by the memcg id stored in the
shadow entry. However, there is a chance that the retrieved memcg is not
the original memcg that has been killed, but a new one which happens to
have the same id.
This is a somewhat unfortunate, but acceptable and rare inaccuracy in the
heuristics. However, if we retrieve this new memcg between its allocation
and when it is properly attached to the memcg hierarchy, we could run into
the following NULL pointer exception during the memcg hierarchy traversal
done in mem_cgroup_get_nr_swap_pages():
[ 155757.793456] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 155757.807568] #PF: supervisor read access in kernel mode
[ 155757.818024] #PF: error_code(0x0000) - not-present page
[ 155757.828482] PGD 401f77067 P4D 401f77067 PUD 401f76067 PMD 0
[ 155757.839985] Oops: 0000 [#1] SMP
[ 155757.887870] RIP: 0010:mem_cgroup_get_nr_swap_pages+0x3d/0xb0
[ 155757.899377] Code: 29 19 4a 02 48 39 f9 74 63 48 8b 97 c0 00 00 00 48 8b b7 58 02 00 00 48 2b b7 c0 01 00 00 48 39 f0 48 0f 4d c6 48 39 d1 74 42 <48> 8b b2 c0 00 00 00 48 8b ba 58 02 00 00 48 2b ba c0 01 00 00 48
[ 155757.937125] RSP: 0018:ffffc9002ecdfbc8 EFLAGS: 00010286
[ 155757.947755] RAX: 00000000003a3b1c RBX: 000007ffffffffff RCX: ffff888280183000
[ 155757.962202] RDX: 0000000000000000 RSI: 0007ffffffffffff RDI: ffff888bbc2d1000
[ 155757.976648] RBP: 0000000000000001 R08: 000000000000000b R09: ffff888ad9cedba0
[ 155757.991094] R10: ffffea0039c07900 R11: 0000000000000010 R12: ffff888b23a7b000
[ 155758.005540] R13: 0000000000000000 R14: ffff888bbc2d1000 R15: 000007ffffc71354
[ 155758.019991] FS: 00007f6234c68640(0000) GS:ffff88903f9c0000(0000) knlGS:0000000000000000
[ 155758.036356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 155758.048023] CR2: 00000000000000c0 CR3: 0000000a83eb8004 CR4: 00000000007706e0
[ 155758.062473] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 155758.076924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 155758.091376] PKRU: 55555554
[ 155758.096957] Call Trace:
[ 155758.102016] <TASK>
[ 155758.106502] ? __die+0x78/0xc0
[ 155758.112793] ? page_fault_oops+0x286/0x380
[ 155758.121175] ? exc_page_fault+0x5d/0x110
[ 155758.129209] ? asm_exc_page_fault+0x22/0x30
[ 155758.137763] ? mem_cgroup_get_nr_swap_pages+0x3d/0xb0
[ 155758.148060] workingset_test_recent+0xda/0x1b0
[ 155758.157133] workingset_refault+0xca/0x1e0
[ 155758.165508] filemap_add_folio+0x4d/0x70
[ 155758.173538] page_cache_ra_unbounded+0xed/0x190
[ 155758.182919] page_cache_sync_ra+0xd6/0x1e0
[ 155758.191738] filemap_read+0x68d/0xdf0
[ 155758.199495] ? mlx5e_napi_poll+0x123/0x940
[ 155758.207981] ? __napi_schedule+0x55/0x90
[ 155758.216095] __x64_sys_pread64+0x1d6/0x2c0
[ 155758.224601] do_syscall_64+0x3d/0x80
[ 155758.232058] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 155758.242473] RIP: 0033:0x7f62c29153b5
[ 155758.249938] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 b4 e6 f7 ff 41 89 c0 4c 8b 55 e0 48 8b 55 e8 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 e7 e6 f7 ff 48 8b
[ 155758.288005] RSP: 002b:00007f6234c5ffd0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 155758.303474] RAX: ffffffffffffffda RBX: 00007f628c4e70c0 RCX: 00007f62c29153b5
[ 155758.318075] RDX: 000000000003c041 RSI: 00007f61d2986000 RDI: 0000000000000076
[ 155758.332678] RBP: 00007f6234c5fff0 R08: 0000000000000000 R09: 0000000064d5230c
[ 155758.347452] R10: 000000000027d450 R11: 0000000000000293 R12: 000000000003c041
[ 155758.362044] R13: 00007f61d2986000 R14: 00007f629e11b060 R15: 000000000027d450
[ 155758.376661] </TASK>
This patch fixes the issue by moving the memcg's id publication from the
alloc stage to online stage, ensuring that any memcg acquired via id must
be connected to the memcg tree.
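As an illustration of the ordering this establishes, here is a small
standalone C model; the names (memcg_online, nr_swap_pages, id_table) are
simplified stand-ins, not the kernel implementation:
#include <stdio.h>
#include <stddef.h>

struct memcg {
	struct memcg *parent;   /* stands in for the css/hierarchy linkage */
	long swap_pages;        /* stands in for per-group swap accounting */
};

#define MAX_ID 16
static struct memcg *id_table[MAX_ID];  /* models the ID -> group lookup table */

static struct memcg *memcg_from_id(int id) { return id_table[id]; }

/* The fix: publish the ID only at the end of onlining, after ->parent is set. */
static void memcg_online(struct memcg *g, struct memcg *parent, int id)
{
	g->parent = parent;
	id_table[id] = g;
}

/* Models the hierarchy walk done by mem_cgroup_get_nr_swap_pages(). */
static long nr_swap_pages(struct memcg *g)
{
	long n = 0;
	for (; g; g = g->parent)
		n += g->swap_pages;  /* in the kernel, a half-set-up group crashes here */
	return n;
}

int main(void)
{
	struct memcg root = { .parent = NULL, .swap_pages = 100 };
	struct memcg child = { .swap_pages = 10 };

	memcg_online(&child, &root, 1);
	printf("swap pages along ancestry: %ld\n", nr_swap_pages(memcg_from_id(1)));
	return 0;
}
The only point of the sketch is the ordering inside memcg_online(): the ID
table never hands out a group whose hierarchy linkage is still unset.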
Link: https://lkml.kernel.org/r/20230823225430.166925-1-nphamcs@gmail.com
Fixes: f78dfc7b77d5 ("workingset: fix confusion around eviction vs refault container")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Co-developed-by: Nhat Pham <nphamcs(a)gmail.com>
Signed-off-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c~memcontrol-ensure-memcg-acquired-by-id-is-properly-set-up
+++ a/mm/memcontrol.c
@@ -5339,7 +5339,6 @@ static struct mem_cgroup *mem_cgroup_all
INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
memcg->deferred_split_queue.split_queue_len = 0;
#endif
- idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
lru_gen_init_memcg(memcg);
return memcg;
fail:
@@ -5411,14 +5410,27 @@ static int mem_cgroup_css_online(struct
if (alloc_shrinker_info(memcg))
goto offline_kmem;
- /* Online state pins memcg ID, memcg ID pins CSS */
- refcount_set(&memcg->id.ref, 1);
- css_get(css);
-
if (unlikely(mem_cgroup_is_root(memcg)))
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
FLUSH_TIME);
lru_gen_online_memcg(memcg);
+
+ /* Online state pins memcg ID, memcg ID pins CSS */
+ refcount_set(&memcg->id.ref, 1);
+ css_get(css);
+
+ /*
+ * Ensure mem_cgroup_from_id() works once we're fully online.
+ *
+ * We could do this earlier and require callers to filter with
+ * css_tryget_online(). But right now there are no users that
+ * need earlier access, and the workingset code relies on the
+ * cgroup tree linkage (mem_cgroup_get_nr_swap_pages()). So
+ * publish it here at the end of onlining. This matches the
+ * regular ID destruction during offlining.
+ */
+ idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+
return 0;
offline_kmem:
memcg_offline_kmem(memcg);
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
memcontrol-ensure-memcg-acquired-by-id-is-properly-set-up.patch
mm-page_alloc-remove-stale-cma-guard-code.patch
On 8/25/23 01:33, Conor Dooley wrote:
> On Fri, Aug 25, 2023 at 01:30:36AM +0800, Mingzheng Xing wrote:
>> On 8/24/23 19:32, Mingzheng Xing wrote:
>>> On 8/23/23 21:31, Conor Dooley wrote:
>>>> On Wed, Aug 23, 2023 at 12:51:13PM +0100, Conor Dooley wrote:
>>>>> On Thu, Aug 17, 2023 at 03:20:24PM +0000,
>>>>> patchwork-bot+linux-riscv(a)kernel.org wrote:
>>>>>> Hello:
>>>>>>
>>>>>> This patch was applied to riscv/linux.git (fixes)
>>>>>> by Palmer Dabbelt <palmer(a)rivosinc.com>:
>>>>>>
>>>>>> On Thu, 10 Aug 2023 00:56:48 +0800 you wrote:
>>>>>>> Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default
>>>>>>> ISA spec to the newer
>>>>>>> 20191213 version which moves some instructions from the
>>>>>>> I extension to the
>>>>>>> Zicsr and Zifencei extensions. So if one of the binutils
>>>>>>> and GCC exceeds
>>>>>>> that version, we should explicitly specifying Zicsr and
>>>>>>> Zifencei via -march
>>>>>>> to cope with the new changes. but this only occurs when
>>>>>>> binutils >= 2.36
>>>>>>> and GCC >= 11.1.0. It's a different story when binutils < 2.36.
>>>>>>>
>>>>>>> [...]
>>>>>> Here is the summary with links:
>>>>>> - [v5] riscv: Handle zicsr/zifencei issue between gcc and binutils
>>>>>> https://git.kernel.org/riscv/c/ca09f772ccca
>>>>> *sigh* so this breaks the build for gcc-11 & binutils 2.37 w/
>>>>> Assembler messages:
>>>>> Error: cannot find default versions of the ISA extension `zicsr'
>>>>> Error: cannot find default versions of the ISA extension `zifencei'
>>>>>
>>>>> I'll have a poke later.
>>>> So uh, are we sure that this should not be:
>>>> - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) ||
>>>> (CC_IS_GCC && GCC_VERSION < 110100)
>>>> + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) ||
>>>> (CC_IS_GCC && GCC_VERSION <= 110100)
>>>>
>>>> My gcc-11.1 + binutils 2.37 toolchain built from riscv-gnu-toolchain
>>>> doesn't have the default versions & the above diff fixes the build.
>>> I reproduced the error, the combination of gcc-11.1 and
>>> binutils 2.37 does cause errors. What a surprise, since binutils
>>> 2.36 and 2.38 are fine.
>>>
>>> I used git bisect to locate this commit[1] for binutils.
>>> I'll test this diff in more detail later. Thanks!
>>>
>>> [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1…
>>>
>> Hi, Conor.
>> The above error does originate from link[1] mentioned above, which was
>> resolved in gcc-12.1.0[2], and gcc-11.3.0 made the backport[3].
>> So gcc-11.2.0 combined with binutils 2.37 produces the same error.
>> I think we should do the following diff to fix it:
>> - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC &&
>> GCC_VERSION < 110100)
>> + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC &&
>> GCC_VERSION < 110300)
> That sounds good to me. Can you make that a real patch please?
Okey.
Just a question, I see that the previous patch has been merged
into 6.5-rc7, and now a new fix patch should be sent out based
on that, right?
Best Regards,
Mingzheng.
>
> Thanks for working on it :)
On 8/23/23 21:31, Conor Dooley wrote:
> On Wed, Aug 23, 2023 at 12:51:13PM +0100, Conor Dooley wrote:
>> On Thu, Aug 17, 2023 at 03:20:24PM +0000, patchwork-bot+linux-riscv(a)kernel.org wrote:
>>> Hello:
>>>
>>> This patch was applied to riscv/linux.git (fixes)
>>> by Palmer Dabbelt <palmer(a)rivosinc.com>:
>>>
>>> On Thu, 10 Aug 2023 00:56:48 +0800 you wrote:
>>>> Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer
>>>> 20191213 version which moves some instructions from the I extension to the
>>>> Zicsr and Zifencei extensions. So if one of the binutils and GCC exceeds
>>>> that version, we should explicitly specifying Zicsr and Zifencei via -march
>>>> to cope with the new changes. but this only occurs when binutils >= 2.36
>>>> and GCC >= 11.1.0. It's a different story when binutils < 2.36.
>>>>
>>>> [...]
>>> Here is the summary with links:
>>> - [v5] riscv: Handle zicsr/zifencei issue between gcc and binutils
>>> https://git.kernel.org/riscv/c/ca09f772ccca
>> *sigh* so this breaks the build for gcc-11 & binutils 2.37 w/
>> Assembler messages:
>> Error: cannot find default versions of the ISA extension `zicsr'
>> Error: cannot find default versions of the ISA extension `zifencei'
>>
>> I'll have a poke later.
> So uh, are we sure that this should not be:
> - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100)
> + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION <= 110100)
>
> My gcc-11.1 + binutils 2.37 toolchain built from riscv-gnu-toolchain
> doesn't have the default versions & the above diff fixes the build.
I reproduced the error, the combination of gcc-11.1 and
binutils 2.37 does cause errors. What a surprise, since binutils
2.36 and 2.38 are fine.
I used git bisect to locate this commit[1] for binutils.
I'll test this diff in more detail later. Thanks!
[1]
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1…
Best Regards,
Mingzheng.
>
> Thanks,
> Conor.
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 1f69383b203e28cf8a4ca9570e572da1699f76cd
Gitweb: https://git.kernel.org/tip/1f69383b203e28cf8a4ca9570e572da1699f76cd
Author: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
AuthorDate: Fri, 18 Aug 2023 10:03:05 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 24 Aug 2023 11:01:45 +02:00
x86/fpu: Invalidate FPU state correctly on exec()
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current task's FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn't preclude that the task has since been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case, the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0's FPU context points to the task's FPU
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating the task's fpu->last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
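To make the difference between the two invalidation flavours concrete, here
is a small standalone C model; the names are simplified stand-ins and this is
not the kernel code:
#include <stdbool.h>
#include <stdio.h>

struct fpu { int last_cpu; };

static struct fpu *owner_ctx[2];	/* models the per-CPU fpu_fpregs_owner_ctx */

/* The two-condition check described above. */
static bool fpregs_state_valid(struct fpu *fpu, int cpu)
{
	return owner_ctx[cpu] == fpu && fpu->last_cpu == cpu;
}

int main(void)
{
	struct fpu task_fpu = { .last_cpu = 0 };

	owner_ctx[0] = &task_fpu;		/* registers were restored on CPU0 */

	/* exec() runs on CPU1: the old-style reset only clears CPU1's owner pointer */
	owner_ctx[1] = NULL;
	printf("after fpregs_deactivate-style reset, valid on CPU0? %d\n",
	       fpregs_state_valid(&task_fpu, 0));	/* still 1: stale state looks valid */

	/* __fpu_invalidate_fpregs_state()-style reset wipes last_cpu instead */
	task_fpu.last_cpu = -1;
	printf("after last_cpu wipe, valid on CPU0? %d\n",
	       fpregs_state_valid(&task_fpu, 0));	/* now 0 */
	return 0;
}
Clearing only one CPU's owner pointer (the reset done on the CPU running the
exec) leaves the check on the other CPU satisfied; wiping fpu->last_cpu
invalidates the state everywhere.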
Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang(a)intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Lijun Pan <lijun.pan(a)intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta(a)intel.com>
Acked-by: Lijun Pan <lijun.pan(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
---
arch/x86/kernel/fpu/context.h | 3 +--
arch/x86/kernel/fpu/core.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index af5cbdd..f6d856b 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,7 @@
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * Invalidate a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1015af1..98e507c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -679,7 +679,7 @@ static void fpu_reset_fpregs(void)
struct fpu *fpu = &current->thread.fpu;
fpregs_lock();
- fpu__drop(fpu);
+ __fpu_invalidate_fpregs_state(fpu);
/*
* This does not change the actual hardware registers. It just
* resets the memory image and sets TIF_NEED_FPU_LOAD so a
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
On s390 systems (aka mainframes), classic channel devices for networking
and permanent storage are currently even more common than PCI devices.
Hence a fully functional s390 kernel can be built with CONFIG_PCI=n, in
which case the relevant iomem mapping functions [including ioremap(),
devm_ioremap(), etc.] are not available.
Here, let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that they
won't be built, avoiding the compile errors below when PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index f5f422f9b8507..b6221b4432fd3 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -211,6 +211,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -280,6 +281,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
iommu_device_register() always returns 0 in kernels 4.11-5.12, so remove
the redundant comparison with 0.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b16c0170b53c ("iommu/mediatek: Make use of iommu_device_register interface")
Signed-off-by: Alexandra Diupina <adiupina(a)astralinux.ru>
---
drivers/iommu/mtk_iommu.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 051815c9d2bb..208c47218b75 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -748,9 +748,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
iommu_device_set_ops(&data->iommu, &mtk_iommu_ops);
iommu_device_set_fwnode(&data->iommu, &pdev->dev.of_node->fwnode);
- ret = iommu_device_register(&data->iommu);
- if (ret)
- return ret;
+ iommu_device_register(&data->iommu);
spin_lock_init(&data->tlb_lock);
list_add_tail(&data->list, &m4ulist);
--
2.30.2
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 2c66ca3949dc701da7f4c9407f2140ae425683a5
Gitweb: https://git.kernel.org/tip/2c66ca3949dc701da7f4c9407f2140ae425683a5
Author: Feng Tang <feng.tang(a)intel.com>
AuthorDate: Wed, 23 Aug 2023 14:57:47 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 24 Aug 2023 11:01:45 +02:00
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
identify_boot_cpu
identify_cpu
generic_identify
get_cpu_cap --> setup cpu capability
...
fpu__init_cpu
fpu__init_cpu_xstate
cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized, the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many modules such as 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.
Cure this by setting the X86_FEATURE_OSXSAVE feature bit right after
OSXSAVE is enabled.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
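For illustration, here is a tiny standalone C model of that ordering
(simplified stand-in names, not the kernel code): the CPUID-visible OSXSAVE
bit mirrors CR4.OSXSAVE, so a capability scan that runs before FPU init
records the feature as absent unless it is forced afterwards.
#include <stdbool.h>
#include <stdio.h>

static bool cr4_osxsave;	/* models CR4.OSXSAVE */
static bool feature_osxsave;	/* models X86_FEATURE_OSXSAVE in the capability bitmap */

/* CPUID reports OSXSAVE only once the OS has set CR4.OSXSAVE. */
static bool cpuid_reports_osxsave(void) { return cr4_osxsave; }

int main(void)
{
	/* capability scan now runs before FPU init */
	feature_osxsave = cpuid_reports_osxsave();
	printf("after capability scan: %d\n", feature_osxsave);	/* 0 */

	/* XSAVE gets enabled later during xstate init... */
	cr4_osxsave = true;
	/* ...so the fix force-sets the feature bit at that point */
	feature_osxsave = true;
	printf("after xstate init:     %d\n", feature_osxsave);	/* 1 */
	return 0;
}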
Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Signed-off-by: Feng Tang <feng.tang(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
---
arch/x86/kernel/fpu/xstate.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497..1afbc48 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -882,6 +882,13 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ /*
+ * CPU capabilities initialization runs before FPU init. So
+ * X86_FEATURE_OSXSAVE is not set. Now that XSAVE is completely
+ * functional, set the feature bit so depending code works.
+ */
+ setup_force_cpu_cap(X86_FEATURE_OSXSAVE);
+
print_xstate_offset_size();
pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
fpu_kernel_cfg.max_features,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c164c7bc9775be7bcc68754bb3431fce5823822e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082019-brink-buddhist-4d1b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
c164c7bc9775 ("blk-cgroup: hold queue_lock when removing blkg->q_node")
a06377c5d01e ("Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"")
9a9c261e6b55 ("Revert "blk-cgroup: pass a gendisk to blkg_lookup"")
1231039db31c ("Revert "blk-cgroup: move the cgroup information to struct gendisk"")
dcb522014351 ("Revert "blk-cgroup: simplify blkg freeing from initialization failure paths"")
3f13ab7c80fd ("blk-cgroup: move the cgroup information to struct gendisk")
479664cee14d ("blk-cgroup: pass a gendisk to blkg_lookup")
ba91c849fa50 ("blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos")
3963d84df797 ("blk-rq-qos: constify rq_qos_ops")
ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful")
b494f9c566ba ("blk-rq-qos: move rq_qos_add and rq_qos_del out of line")
f05837ed73d0 ("blk-cgroup: store a gendisk to throttle in struct task_struct")
84d7d462b16d ("blk-cgroup: pin the gendisk in struct blkcg_gq")
180b04d450a7 ("blk-cgroup: remove the !bdi->dev check in blkg_dev_name")
27b642b07a4a ("blk-cgroup: simplify blkg freeing from initialization failure paths")
0b6f93bdf07e ("blk-cgroup: improve error unwinding in blkg_alloc")
f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
dfd6200a0954 ("blk-cgroup: support to track if policy is online")
c7241babf085 ("blk-cgroup: dropping parent refcount after pd_free_fn() is done")
e3ff8887e7db ("blk-cgroup: fix missing pd_online_fn() while activating policy")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c164c7bc9775be7bcc68754bb3431fce5823822e Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei(a)redhat.com>
Date: Thu, 17 Aug 2023 22:17:51 +0800
Subject: [PATCH] blk-cgroup: hold queue_lock when removing blkg->q_node
When blkg is removed from q->blkg_list from blkg_free_workfn(), queue_lock
has to be held, otherwise, all kinds of bugs(list corruption, hard lockup,
..) can be triggered from blkg_destroy_all().
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Cc: Yu Kuai <yukuai3(a)huawei.com>
Cc: xiaoli feng <xifeng(a)redhat.com>
Cc: Chunyu Hu <chuhu(a)redhat.com>
Cc: Mike Snitzer <snitzer(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/20230817141751.1128970-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index fc49be622e05..9faafcd10e17 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -136,7 +136,9 @@ static void blkg_free_workfn(struct work_struct *work)
blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
if (blkg->parent)
blkg_put(blkg->parent);
+ spin_lock_irq(&q->queue_lock);
list_del_init(&blkg->q_node);
+ spin_unlock_irq(&q->queue_lock);
mutex_unlock(&q->blkcg_mutex);
blk_put_queue(q);
fbcon requires that we implement &drm_framebuffer_funcs.dirty.
Otherwise, the framebuffer might take a while to flush (which would
manifest as noticeable lag). However, we can't enable this callback for
non-fbcon cases since it may cause too many atomic commits to be made at
once. So, implement amdgpu_dirtyfb() and only enable it for fbcon
framebuffers (we can use the "struct drm_file file" parameter in the
callback to check for this since it is only NULL when called by fbcon,
at least in the mainline kernel) on devices that support atomic KMS.
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.1+
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
--
v3: check if file is NULL instead of doing a strcmp() and make note of
it in the commit message
---
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26 ++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index d20dd3f852fc..363e6a2cad8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -38,6 +38,8 @@
#include <linux/pci.h>
#include <linux/pm_runtime.h>
#include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
+#include <drm/drm_drv.h>
#include <drm/drm_edid.h>
#include <drm/drm_fb_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
@@ -532,11 +534,29 @@ bool amdgpu_display_ddc_probe(struct amdgpu_connector *amdgpu_connector,
return true;
}
+static int amdgpu_dirtyfb(struct drm_framebuffer *fb, struct drm_file *file,
+ unsigned int flags, unsigned int color,
+ struct drm_clip_rect *clips, unsigned int num_clips)
+{
+
+ if (file)
+ return -ENOSYS;
+
+ return drm_atomic_helper_dirtyfb(fb, file, flags, color, clips,
+ num_clips);
+}
+
static const struct drm_framebuffer_funcs amdgpu_fb_funcs = {
.destroy = drm_gem_fb_destroy,
.create_handle = drm_gem_fb_create_handle,
};
+static const struct drm_framebuffer_funcs amdgpu_fb_funcs_atomic = {
+ .destroy = drm_gem_fb_destroy,
+ .create_handle = drm_gem_fb_create_handle,
+ .dirty = amdgpu_dirtyfb
+};
+
uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,
uint64_t bo_flags)
{
@@ -1139,7 +1159,11 @@ static int amdgpu_display_gem_fb_verify_and_init(struct drm_device *dev,
if (ret)
goto err;
- ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
+ if (drm_drv_uses_atomic_modeset(dev))
+ ret = drm_framebuffer_init(dev, &rfb->base,
+ &amdgpu_fb_funcs_atomic);
+ else
+ ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
if (ret)
goto err;
--
2.41.0
The patch titled
Subject: shmem: fix smaps BUG sleeping while atomic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
shmem-fix-smaps-bug-sleeping-while-atomic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: shmem: fix smaps BUG sleeping while atomic
Date: Tue, 22 Aug 2023 22:14:47 -0700 (PDT)
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
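The shape of the fix, as a standalone C sketch (a simplified model of the
walk with made-up names, not the xarray code): terminate on the last index
before reaching the voluntary-resched point, so a single-slot caller holding
a spinlock never gets to the sleep.
#include <stdbool.h>
#include <stdio.h>

static unsigned long count_swap(const bool *is_swap, unsigned long start,
				unsigned long end)
{
	unsigned long max = end - 1, swapped = 0, idx;

	for (idx = start; idx <= max; idx++) {
		if (is_swap[idx])
			swapped++;
		if (idx == max)
			break;		/* done: skip the resched check entirely */
		/* here the real code may pause the walk and cond_resched_rcu() */
	}
	return swapped;
}

int main(void)
{
	bool slots[8] = { [2] = true, [3] = true };

	printf("%lu\n", count_swap(slots, 2, 3));	/* single-slot caller: 1 */
	printf("%lu\n", count_swap(slots, 0, 8));	/* range caller: 2 */
	return 0;
}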
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/shmem.c~shmem-fix-smaps-bug-sleeping-while-atomic
+++ a/mm/shmem.c
@@ -806,14 +806,16 @@ unsigned long shmem_partial_swap_usage(s
XA_STATE(xas, &mapping->i_pages, start);
struct page *page;
unsigned long swapped = 0;
+ unsigned long max = end - 1;
rcu_read_lock();
- xas_for_each(&xas, page, end - 1) {
+ xas_for_each(&xas, page, max) {
if (xas_retry(&xas, page))
continue;
if (xa_is_value(page))
swapped++;
-
+ if (xas.xa_index == max)
+ break;
if (need_resched()) {
xas_pause(&xas);
cond_resched_rcu();
_
Patches currently in -mm which might be from hughd(a)google.com are
shmem-fix-smaps-bug-sleeping-while-atomic.patch
From: Sanjay R Mehta <sanju.mehta(a)amd.com>
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
Co-developed-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
(cherry picked from commit 583893a66d731f5da010a3fa38a0460e05f0149b)
USB4v2 introduced support for uni-directional TMU mode as part of
d49b4f043d63 ("thunderbolt: Add support for enhanced uni-directional TMU mode")
This is not a stable candidate commit, so adjust the code for backport.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/thunderbolt/tmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/thunderbolt/tmu.c b/drivers/thunderbolt/tmu.c
index 626aca3124b1..d9544600b386 100644
--- a/drivers/thunderbolt/tmu.c
+++ b/drivers/thunderbolt/tmu.c
@@ -415,7 +415,8 @@ int tb_switch_tmu_disable(struct tb_switch *sw)
* uni-directional mode and we don't want to change it's TMU
* mode.
*/
- tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ return ret;
tb_port_tmu_time_sync_disable(up);
ret = tb_port_tmu_time_sync_disable(down);
--
2.34.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
On s390 systems (aka mainframes), classic channel devices for networking
and permanent storage are currently even more common than PCI devices.
Hence a fully functional s390 kernel can be built with CONFIG_PCI=n, in
which case the relevant iomem mapping functions [including ioremap(),
devm_ioremap(), etc.] are not available.
Here, let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that they
won't be built, avoiding the compile errors below when PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 95344ae49e532..e1beddcc8c84a 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -202,6 +202,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -271,6 +272,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
The IOMMU list has moved and emails to the old list bounce. Bring stable
in alignment with mainline.
Joerg Roedel (1):
MAINTAINERS: Remove iommu(a)lists.linux-foundation.org
Xiang Chen (1):
MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
MAINTAINERS | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
--
2.25.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit e7dd44f4f3166db45248414f5df8f615392de47a ]
On s390 systems (aka mainframes), classic channel devices for networking
and permanent storage are currently even more common than PCI devices.
Hence a fully functional s390 kernel can be built with CONFIG_PCI=n, in
which case the relevant iomem mapping functions [including ioremap(),
devm_ioremap(), etc.] are not available.
Here, let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built, avoiding the compile error below when PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Michael Turquette <mturquette(a)baylibre.com>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: linux-clk(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/clk/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 016814e15536a..52dfbae4f361c 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -444,6 +444,7 @@ config COMMON_CLK_BD718XX
config COMMON_CLK_FIXED_MMIO
bool "Clock driver for Memory Mapped Fixed values"
depends on COMMON_CLK && OF
+ depends on HAS_IOMEM
help
Support for Memory Mapped IO Fixed clocks
--
2.40.1
From: Peter Wang <peter.wang(a)mediatek.com>
If the clock is scaled up and clock scaling is then suspended, UFS will
stay in the high performance/power mode even though no read/write requests
are ongoing. This is logically wrong and raises a power concern.
Fixes: 401f1e4490ee ("scsi: ufs: don't suspend clock scaling during clock gating")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 129446775796..e3672e55efae 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -1458,7 +1458,7 @@ static int ufshcd_devfreq_target(struct device *dev,
ktime_to_us(ktime_sub(ktime_get(), start)), ret);
out:
- if (sched_clk_scaling_suspend_work)
+ if (sched_clk_scaling_suspend_work && !scale_up)
queue_work(hba->clk_scaling.workq,
&hba->clk_scaling.suspend_work);
--
2.18.0
From: Gabe Teeger <gabe.teeger(a)amd.com>
[Why]
We wait for mpc idle while in a locked state, leading to potential
deadlock.
[What]
Move the wait_for_idle call outside of the HW lock. This and a call to
wait_drr_doublebuffer_pending_clear are moved into a new static helper
function called wait_for_outstanding_hw_updates, to make the interface
clearer.
Cc: stable(a)vger.kernel.org
Fixes: 8f0d304d21b3 ("drm/amd/display: Do not commit pipe when updating DRR")
Reviewed-by: Jun Lei <jun.lei(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/Makefile | 1 +
drivers/gpu/drm/amd/display/dc/core/dc.c | 58 +++++++++++++------
.../drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 ----
3 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/Makefile b/drivers/gpu/drm/amd/display/dc/Makefile
index 69ffd4424dc7..1b8c2aef4633 100644
--- a/drivers/gpu/drm/amd/display/dc/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/Makefile
@@ -78,3 +78,4 @@ DC_EDID += dc_edid_parser.o
AMD_DISPLAY_DMUB = $(addprefix $(AMDDALPATH)/dc/,$(DC_DMUB))
AMD_DISPLAY_EDID = $(addprefix $(AMDDALPATH)/dc/,$(DC_EDID))
AMD_DISPLAY_FILES += $(AMD_DISPLAY_DMUB) $(AMD_DISPLAY_EDID)
+
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index c6f6dc972c2a..5aab67868cb6 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -3501,6 +3501,45 @@ static void commit_planes_for_stream_fast(struct dc *dc,
top_pipe_to_program->stream->update_flags.raw = 0;
}
+static void wait_for_outstanding_hw_updates(struct dc *dc, const struct dc_state *dc_context)
+{
+/*
+ * This function calls HWSS to wait for any potentially double buffered
+ * operations to complete. It should be invoked as a pre-amble prior
+ * to full update programming before asserting any HW locks.
+ */
+ int pipe_idx;
+ int opp_inst;
+ int opp_count = dc->res_pool->pipe_count;
+ struct hubp *hubp;
+ int mpcc_inst;
+ const struct pipe_ctx *pipe_ctx;
+
+ for (pipe_idx = 0; pipe_idx < dc->res_pool->pipe_count; pipe_idx++) {
+ pipe_ctx = &dc_context->res_ctx.pipe_ctx[pipe_idx];
+
+ if (!pipe_ctx->stream)
+ continue;
+
+ if (pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear)
+ pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear(pipe_ctx->stream_res.tg);
+
+ hubp = pipe_ctx->plane_res.hubp;
+ if (!hubp)
+ continue;
+
+ mpcc_inst = hubp->inst;
+ // MPCC inst is equal to pipe index in practice
+ for (opp_inst = 0; opp_inst < opp_count; opp_inst++) {
+ if (dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst]) {
+ dc->res_pool->mpc->funcs->wait_for_idle(dc->res_pool->mpc, mpcc_inst);
+ dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst] = false;
+ break;
+ }
+ }
+ }
+}
+
static void commit_planes_for_stream(struct dc *dc,
struct dc_surface_update *srf_updates,
int surface_count,
@@ -3519,24 +3558,9 @@ static void commit_planes_for_stream(struct dc *dc,
// dc->current_state anymore, so we have to cache it before we apply
// the new SubVP context
subvp_prev_use = false;
-
-
dc_z10_restore(dc);
-
- if (update_type == UPDATE_TYPE_FULL) {
- /* wait for all double-buffer activity to clear on all pipes */
- int pipe_idx;
-
- for (pipe_idx = 0; pipe_idx < dc->res_pool->pipe_count; pipe_idx++) {
- struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[pipe_idx];
-
- if (!pipe_ctx->stream)
- continue;
-
- if (pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear)
- pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear(pipe_ctx->stream_res.tg);
- }
- }
+ if (update_type == UPDATE_TYPE_FULL)
+ wait_for_outstanding_hw_updates(dc, context);
if (update_type == UPDATE_TYPE_FULL) {
dc_allow_idle_optimizations(dc, false);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index dd4c7a7faf28..971fa8bf6d1f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1563,17 +1563,6 @@ static void dcn20_update_dchubp_dpp(
|| plane_state->update_flags.bits.global_alpha_change
|| plane_state->update_flags.bits.per_pixel_alpha_change) {
// MPCC inst is equal to pipe index in practice
- int mpcc_inst = hubp->inst;
- int opp_inst;
- int opp_count = dc->res_pool->pipe_count;
-
- for (opp_inst = 0; opp_inst < opp_count; opp_inst++) {
- if (dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst]) {
- dc->res_pool->mpc->funcs->wait_for_idle(dc->res_pool->mpc, mpcc_inst);
- dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst] = false;
- break;
- }
- }
hws->funcs.update_mpcc(dc, pipe_ctx);
}
--
2.41.0
From: Wenjing Liu <wenjing.liu(a)amd.com>
ODM power optimization is only supported with a single stream. When ODM
power optimization is enabled, we might not have enough free pipes to
enable another stream. So when we are committing more than one stream, we
should first switch off ODM power optimization to make room for the new
stream and then allocate pipe resources for it.
Cc: stable(a)vger.kernel.org
Fixes: 4fbcb04a2ff5 ("drm/amd/display: add ODM case when looking for first split pipe")
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 025e0fdf486d..c6f6dc972c2a 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -2073,12 +2073,12 @@ enum dc_status dc_commit_streams(struct dc *dc,
}
}
- /* Check for case where we are going from odm 2:1 to max
- * pipe scenario. For these cases, we will call
- * commit_minimal_transition_state() to exit out of odm 2:1
- * first before processing new streams
+ /* ODM Combine 2:1 power optimization is only applied for single stream
+ * scenario, it uses extra pipes than needed to reduce power consumption
+ * We need to switch off this feature to make room for new streams.
*/
- if (stream_count == dc->res_pool->pipe_count) {
+ if (stream_count > dc->current_state->stream_count &&
+ dc->current_state->stream_count == 1) {
for (i = 0; i < dc->res_pool->pipe_count; i++) {
pipe = &dc->current_state->res_ctx.pipe_ctx[i];
if (pipe->next_odm_pipe)
--
2.41.0
From: Wenjing Liu <wenjing.liu(a)amd.com>
When dynamically adding new ODM slices, we didn't update the blank
state. If the pipe used by the new ODM slice was previously blanked, we
would continue outputting blank pixel data on that slice, causing the
right half of the screen to show a blank image.
The previous fix was a temporary hack that directly updated the current
state when committing a new state. This could potentially cause HW and SW
state synchronization issues and is not permitted by the dc commit
design.
Cc: stable(a)vger.kernel.org
Fixes: 7fbf451e7639 ("drm/amd/display: Reinit DPG when exiting dynamic ODM")
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
.../drm/amd/display/dc/dcn20/dcn20_hwseq.c | 36 +++++--------------
1 file changed, 9 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index d3caba52d2fc..f3db16cd10db 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1106,29 +1106,6 @@ void dcn20_blank_pixel_data(
v_active,
offset);
- if (!blank && dc->debug.enable_single_display_2to1_odm_policy) {
- /* when exiting dynamic ODM need to reinit DPG state for unused pipes */
- struct pipe_ctx *old_odm_pipe = dc->current_state->res_ctx.pipe_ctx[pipe_ctx->pipe_idx].next_odm_pipe;
-
- odm_pipe = pipe_ctx->next_odm_pipe;
-
- while (old_odm_pipe) {
- if (!odm_pipe || old_odm_pipe->pipe_idx != odm_pipe->pipe_idx)
- dc->hwss.set_disp_pattern_generator(dc,
- old_odm_pipe,
- CONTROLLER_DP_TEST_PATTERN_VIDEOMODE,
- CONTROLLER_DP_COLOR_SPACE_UDEFINED,
- COLOR_DEPTH_888,
- NULL,
- 0,
- 0,
- 0);
- old_odm_pipe = old_odm_pipe->next_odm_pipe;
- if (odm_pipe)
- odm_pipe = odm_pipe->next_odm_pipe;
- }
- }
-
if (!blank)
if (stream_res->abm) {
dc->hwss.set_pipe(pipe_ctx);
@@ -1732,11 +1709,16 @@ static void dcn20_program_pipe(
struct dc_state *context)
{
struct dce_hwseq *hws = dc->hwseq;
- /* Only need to unblank on top pipe */
- if ((pipe_ctx->update_flags.bits.enable || pipe_ctx->stream->update_flags.bits.abm_level)
- && !pipe_ctx->top_pipe && !pipe_ctx->prev_odm_pipe)
- hws->funcs.blank_pixel_data(dc, pipe_ctx, !pipe_ctx->plane_state->visible);
+ /* Only need to unblank on top pipe */
+ if (resource_is_pipe_type(pipe_ctx, OTG_MASTER)) {
+ if (pipe_ctx->update_flags.bits.enable ||
+ pipe_ctx->update_flags.bits.odm ||
+ pipe_ctx->stream->update_flags.bits.abm_level)
+ hws->funcs.blank_pixel_data(dc, pipe_ctx,
+ !pipe_ctx->plane_state ||
+ !pipe_ctx->plane_state->visible);
+ }
/* Only update TG on top pipe */
if (pipe_ctx->update_flags.bits.global_sync && !pipe_ctx->top_pipe
--
2.41.0
From: Fudong Wang <fudong.wang(a)amd.com>
A benchmark stress test (12-40 machines x 48 hours) found that DCN315 has
cases where DC writes to an indirect register to set the SMU clock msg id,
but when the same indirect register is read back, the returned msg id
doesn't match what was just written. So, to fix this, retry the write
until the register's value matches the requested value.
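As a standalone illustration, the write-and-verify retry described above has
roughly this shape (stand-in register accessors and names, not the DC code):
#include <stdint.h>
#include <stdio.h>

#define WRITE_RETRY_COUNT 5

static uint32_t fake_reg;	/* stands in for the indirect SMU message register */

static void reg_write(uint32_t val) { fake_reg = val; }
static uint32_t reg_read(void) { return fake_reg; }

static int write_msg_id(uint32_t msg_id)
{
	for (int i = 0; i < WRITE_RETRY_COUNT; i++) {
		reg_write(msg_id);
		if (reg_read() == msg_id)
			return 0;	/* the write took effect */
		/* the real code also delays briefly and logs the failed attempt */
	}
	return -1;			/* give up after the retry budget */
}

int main(void)
{
	printf("write_msg_id: %d\n", write_msg_id(0x42));
	return 0;
}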
Cc: stable(a)vger.kernel.org # 6.1+
Fixes: f94903996140 ("drm/amd/display: Add DCN315 CLK_MGR")
Reviewed-by: Charlene Liu <charlene.liu(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Fudong Wang <fudong.wang(a)amd.com>
---
.../display/dc/clk_mgr/dcn315/dcn315_smu.c | 20 +++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c
index 3e0da873cf4c..1042cf1a3ab0 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c
@@ -32,6 +32,7 @@
#define MAX_INSTANCE 6
#define MAX_SEGMENT 6
+#define SMU_REGISTER_WRITE_RETRY_COUNT 5
struct IP_BASE_INSTANCE {
unsigned int segment[MAX_SEGMENT];
@@ -132,6 +133,8 @@ static int dcn315_smu_send_msg_with_param(
unsigned int msg_id, unsigned int param)
{
uint32_t result;
+ uint32_t i = 0;
+ uint32_t read_back_data;
result = dcn315_smu_wait_for_response(clk_mgr, 10, 200000);
@@ -148,10 +151,19 @@ static int dcn315_smu_send_msg_with_param(
/* Set the parameter register for the SMU message, unit is Mhz */
REG_WRITE(MP1_SMN_C2PMSG_37, param);
- /* Trigger the message transaction by writing the message ID */
- generic_write_indirect_reg(CTX,
- REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA),
- mmMP1_C2PMSG_3, msg_id);
+ for (i = 0; i < SMU_REGISTER_WRITE_RETRY_COUNT; i++) {
+ /* Trigger the message transaction by writing the message ID */
+ generic_write_indirect_reg(CTX,
+ REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA),
+ mmMP1_C2PMSG_3, msg_id);
+ read_back_data = generic_read_indirect_reg(CTX,
+ REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA),
+ mmMP1_C2PMSG_3);
+ if (read_back_data == msg_id)
+ break;
+ udelay(2);
+ smu_print("SMU msg id write fail %x times. \n", i + 1);
+ }
result = dcn315_smu_wait_for_response(clk_mgr, 10, 200000);
--
2.41.0
It was noticed that APs stopped accepting clients after a while. With QCA's
ath11k fork, it even printed some additional information:
attach ack fail -28
when new clients tried to connect. hostapd then usually showed a
message like "deauthenticated due to inactivity (timer DEAUTH/REMOVE)".
While debugging this, it was noticed that it happened when a peer was no
longer known by ath11k but an NL80211_CMD_PROBE_CLIENT-triggered TX had
just "finished" for it. In that case, ath11k simply threw the skb away and
left stale information in various data structures in mac80211.
And after Felix pointed out ieee80211_free_txskb(), it is also clear that
the dev_kfree_skb_any() calls in these functions should likewise be calls to
ieee80211_free_txskb() - but for these, I have nothing to trigger this
error case. Still, a patch is provided as part of this patch series.
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Changes in v2:
- Simply switch to ieee80211_free_txskb() as recommended by Felix Fietkau
+ ieee80211_free_txskb calls ieee80211_report_used_skb
+ ieee80211_report_used_skb calls ieee80211_report_ack_skb (when
ack_frame_id is set and !IEEE80211_TX_INTFL_MLME_CONN_TX)
+ ieee80211_report_ack_skb will remove skb from ack_status_frames
- Add second patch which handles similar situations in the previously
patched functions
- Link to v1: https://lore.kernel.org/r/20230801-ath11k-ack_status_leak-v1-1-539cb72c55bc…
---
Sven Eckelmann (2):
ath11k: Don't drop tx_status when peer cannot be found
ath11k: Cleanup mac80211 references on failure during tx_complete
drivers/net/wireless/ath/ath11k/dp_tx.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
---
base-commit: 1d7dd5aa35474e553b8671b58579e0749b560779
change-id: 20230801-ath11k-ack_status_leak-70a7a30e5d9f
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
PPP interfaces don't have a MAC header. This patch fixes
bpf_redirect() to a PPP interface.
CC: stable(a)vger.kernel.org
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Tested-by: Siwar Zitouni <siwar.zitouni(a)6wind.com>
---
v1 -> v2:
- I forgot the 'Tested-by' tag in the v1 :/
include/linux/if_arp.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h
index 1ed52441972f..8efbe29a6f0c 100644
--- a/include/linux/if_arp.h
+++ b/include/linux/if_arp.h
@@ -53,6 +53,7 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev)
case ARPHRD_NONE:
case ARPHRD_RAWIP:
case ARPHRD_PIMREG:
+ case ARPHRD_PPP:
return false;
default:
return true;
--
2.39.2
Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer
20191213 version, which moves some instructions from the I extension to the
Zicsr and Zifencei extensions. So if either binutils or GCC exceeds that
version, we should explicitly specify Zicsr and Zifencei via -march to cope
with the new changes, but this is only needed when binutils >= 2.36 and
GCC >= 11.1.0. It's a different story when binutils < 2.36.
binutils-2.36 supports the Zifencei extension[2] and splits Zifencei and
Zicsr out of I[3]. GCC-11.1.0 is particular[4] because it added support for
the Zicsr and Zifencei extensions in -march. binutils-2.35 does not support
the Zifencei extension and does not need Zicsr and Zifencei specified when
working with GCC >= 12.1.0.
To make our lives easier, let's relax the check to binutils >= 2.36 in
CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. For the other two cases,
where clang < 17 or GCC < 11.1.0, we will deal with them in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
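For illustration, here is a standalone C sketch of my reading of the
resulting Kconfig decision; riscv_march_handling is a made-up name, and the
GCC cutoff is kept as a parameter because this v5 uses 11.1.0 while the
follow-up fix earlier in this digest raises it to 11.3.0:
#include <stdio.h>

/* Versions encoded like the kernel's GCC_VERSION/AS_VERSION, e.g. 110100, 23700. */
static const char *riscv_march_handling(int gcc_ver, int as_ver, int old_spec_gcc_cutoff)
{
	int needs_explicit = as_ver >= 23600;	/* TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI */
	int needs_old_spec = needs_explicit &&
			     gcc_ver < old_spec_gcc_cutoff;	/* TOOLCHAIN_NEEDS_OLD_ISA_SPEC */

	if (needs_old_spec)
		return "use the older ISA spec handling";
	if (needs_explicit)
		return "append _zicsr_zifencei to -march";
	return "plain -march, no workaround";
}

int main(void)
{
	/* gcc 11.1 + binutils 2.37 with the 11.1.0 cutoff of this v5... */
	printf("%s\n", riscv_march_handling(110100, 23700, 110100));
	/* ...versus the same pair once the cutoff is raised to 11.3.0 */
	printf("%s\n", riscv_march_handling(110100, 23700, 110300));
	return 0;
}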
For more information, please refer to:
commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38")
commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae871… [0]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51… [1]
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=5a1b31e1e1cee6e9f… [2]
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=729a53530e86972d1… [3]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7… [4]
Link: https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org
Link: https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org
Reviewed-by: Conor Dooley <conor.dooley(a)microchip.com>
Acked-by: Guo Ren <guoren(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng(a)iscas.ac.cn>
---
Changelog and test results:
v4 -> v5:
- Add Reviewed-by and Acked-by to commit message, no other code
changes.
v3 -> v4:
- Update the Kconfig help text and commit message.
Link: https://lore.kernel.org/all/20230731150511.38140-1-xingmingzheng@iscas.ac.cn
v2 -> v3:
- Relax the check to binutils >= 2.36.
- Update the Kconfig help text and commit message.
Link: https://lore.kernel.org/all/20230731095936.23397-1-xingmingzheng@iscas.ac.cn
v1 -> v2:
- Update the Kconfig help text and commit message.
- Add considerations for low version gcc case.
Link: https://lore.kernel.org/all/20230726174524.340952-1-xingmingzheng@iscas.ac.…
v1:
Link: https://lore.kernel.org/all/20230725170405.251011-1-xingmingzheng@iscas.ac.…
Here are my test results:
* Compiling the kernel for the master branch with a combination of
multiple versions of gcc and binutils.
gcc binutils patched no patch
11.4.0 2.35 ok ok
11.4.0 2.36 ok ok
11.4.0 2.38 ok ok
12.2.0 2.35 ok error[1]
12.2.0 2.36 ok error[2]
12.2.0 2.38 ok ok
10.5.0 2.35 ok ok
10.5.0 2.36 ok ok
10.5.0 2.38 ok error[3]
11.1.0 2.35 ok ok
11.1.0 2.36 ok ok
11.1.0 2.38 ok ok
11.2.0 2.35 ok ok
11.2.0 2.36 ok ok
11.2.0 2.38 ok ok
[1]
Assembler messages:
Fatal error: -march=rv32imafd_zicsr_zifencei: Invalid or unknown z ISA extension: 'zifencei'
make[2]: *** [arch/riscv/kernel/compat_vdso/Makefile:47: arch/riscv/kernel/compat_vdso/rt_sigreturn.o] Error 1
[2]
./arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
make[2]: *** [scripts/Makefile.build:243: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
[3]
cc1: error: '-march=rv64imac_zicsr_zifencei': unsupported ISA subset 'z'
cc1: error: ABI requires '-march=rv64'
make[2]: *** [scripts/Makefile.build:243: scripts/mod/empty.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1: error: '-march=rv64imac_zicsr_zifencei': unsupported ISA subset 'z'
cc1: error: ABI requires '-march=rv64'
arch/riscv/Kconfig | 32 +++++++++++++++-----------
arch/riscv/kernel/compat_vdso/Makefile | 8 ++++++-
2 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f52dd125ac5e..ce3a6667cfdb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -570,24 +570,30 @@ config TOOLCHAIN_HAS_ZIHINTPAUSE
config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
def_bool y
# https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae871…
- depends on AS_IS_GNU && AS_VERSION >= 23800
- help
- Newer binutils versions default to ISA spec version 20191213 which
- moves some instructions from the I extension to the Zicsr and Zifencei
- extensions.
+ # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51…
+ depends on AS_IS_GNU && AS_VERSION >= 23600
+ help
+ Binutils-2.38 and GCC-12.1.0 bumped the default ISA spec to the newer
+ 20191213 version, which moves some instructions from the I extension to
+ the Zicsr and Zifencei extensions. This requires explicitly specifying
+ Zicsr and Zifencei when binutils >= 2.38 or GCC >= 12.1.0. Zicsr
+ and Zifencei are supported in binutils from version 2.36 onwards.
+ To make life easier, and avoid forcing toolchains that default to a
+ newer ISA spec to version 2.2, relax the check to binutils >= 2.36.
+ For clang < 17 or GCC < 11.1.0, for which this is not possible, this is
+ dealt with in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
def_bool y
depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
# https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694…
- depends on CC_IS_CLANG && CLANG_VERSION < 170000
- help
- Certain versions of clang do not support zicsr and zifencei via -march
- but newer versions of binutils require it for the reasons noted in the
- help text of CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. This
- option causes an older ISA spec compatible with these older versions
- of clang to be passed to GAS, which has the same result as passing zicsr
- and zifencei to -march.
+ # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7…
+ depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100)
+ help
+ Certain versions of clang and GCC do not support zicsr and zifencei via
+ -march. This option causes an older ISA spec compatible with these older
+ versions of clang and GCC to be passed to GAS, which has the same result
+ as passing zicsr and zifencei to -march.
config FPU
bool "FPU support"
diff --git a/arch/riscv/kernel/compat_vdso/Makefile b/arch/riscv/kernel/compat_vdso/Makefile
index 189345773e7e..b86e5e2c3aea 100644
--- a/arch/riscv/kernel/compat_vdso/Makefile
+++ b/arch/riscv/kernel/compat_vdso/Makefile
@@ -11,7 +11,13 @@ compat_vdso-syms += flush_icache
COMPAT_CC := $(CC)
COMPAT_LD := $(LD)
-COMPAT_CC_FLAGS := -march=rv32g -mabi=ilp32
+# binutils 2.35 does not support the zifencei extension, but in the ISA
+# spec 20191213, G stands for IMAFD_ZICSR_ZIFENCEI.
+ifdef CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
+ COMPAT_CC_FLAGS := -march=rv32g -mabi=ilp32
+else
+ COMPAT_CC_FLAGS := -march=rv32imafd -mabi=ilp32
+endif
COMPAT_LD_FLAGS := -melf32lriscv
# Disable attributes, as they're useless and break the build.
--
2.34.1
When sending Discover Identity messages to a Port Partner that uses Power
Delivery v2 and SVDM v1, we currently send PD v2 messages with SVDM v2.0,
expecting the port partner to respond with its highest supported SVDM
version as stated in Section 6.4.4.2.3 in the Power Delivery v3
specification. However, sending SVDM v2 to some Power Delivery v2 port
partners results in a NAK whereas sending SVDM v1 does not.
NAK messages can be handled by the initiator (PD v3 section 6.4.4.2.5.1),
and one solution could be to resend Discover Identity on a lower SVDM
version if possible. But, Section 6.4.4.3 of PD v2 states that "A NAK
response Should be taken as an indication not to retry that particular
Command."
Instead, we can set the SVDM version to the maximum one supported by the
negotiated PD revision. When operating in PD v2, this obeys Section
6.4.4.2.3, which states the SVDM field "Shall be set to zero to indicate
Version 1.0." In PD v3, the SVDM field "Shall be set to 01b to indicate
Version 2.0."
Fixes: c34e85fa69b9 ("usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work")
Cc: stable(a)vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera(a)google.com>
---
Changes since v1:
* Fixed styling errors.
---
drivers/usb/typec/tcpm/tcpm.c | 35 +++++++++++++++++++++++++++++++----
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 829d75ebab42..5024354a0fe0 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -3928,6 +3928,29 @@ static enum typec_cc_status tcpm_pwr_opmode_to_rp(enum typec_pwr_opmode opmode)
}
}
+static void tcpm_set_initial_svdm_version(struct tcpm_port *port)
+{
+ switch (port->negotiated_rev) {
+ case PD_REV30:
+ break;
+ /*
+ * 6.4.4.2.3 Structured VDM Version
+ * 2.0 states "At this time, there is only one version (1.0) defined.
+ * This field Shall be set to zero to indicate Version 1.0."
+ * 3.0 states "This field Shall be set to 01b to indicate Version 2.0."
+ * To ensure that we follow the Power Delivery revision we are currently
+ * operating on, downgrade the SVDM version to the highest one supported
+ * by the Power Delivery revision.
+ */
+ case PD_REV20:
+ typec_partner_set_svdm_version(port->partner, SVDM_VER_1_0);
+ break;
+ default:
+ typec_partner_set_svdm_version(port->partner, SVDM_VER_1_0);
+ break;
+ }
+}
+
static void run_state_machine(struct tcpm_port *port)
{
int ret;
@@ -4165,10 +4188,12 @@ static void run_state_machine(struct tcpm_port *port)
* For now, this driver only supports SOP for DISCOVER_IDENTITY, thus using
* port->explicit_contract to decide whether to send the command.
*/
- if (port->explicit_contract)
+ if (port->explicit_contract) {
+ tcpm_set_initial_svdm_version(port);
mod_send_discover_delayed_work(port, 0);
- else
+ } else {
port->send_discover = false;
+ }
/*
* 6.3.5
@@ -4455,10 +4480,12 @@ static void run_state_machine(struct tcpm_port *port)
* For now, this driver only supports SOP for DISCOVER_IDENTITY, thus using
* port->explicit_contract.
*/
- if (port->explicit_contract)
+ if (port->explicit_contract) {
+ tcpm_set_initial_svdm_version(port);
mod_send_discover_delayed_work(port, 0);
- else
+ } else {
port->send_discover = false;
+ }
power_supply_changed(port->psy);
break;
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0.585.gd2178a4bd4-goog
This is an automatically generated email to let you know that the following patch was queued:
Subject: media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Author: Wei Chen <harperchen1110(a)gmail.com>
Date: Thu Aug 10 08:23:33 2023 +0000
The variable *nplanes is provided by userspace via a system call argument.
The possible value of q_data->fmt->num_planes is 1-3, while the value
of *nplanes can be 1-8, so the array access by index i can go out of
bounds.
Fix this bug by checking *nplanes against the array size.
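To make the out-of-bounds access concrete, here is a minimal sketch (the
array sizes are assumptions based on the vb2 core and this driver, not
something stated in the patch) of the unchecked loop:

	/*
	 * sizes[] is provided by the vb2 core and has VIDEO_MAX_PLANES (8)
	 * entries, but q_data->sizeimage[] only has room for the format's
	 * plane count (at most 3 here). With *nplanes coming from userspace,
	 * i can exceed num_planes and index past the end of sizeimage[].
	 */
	if (*nplanes) {
		for (i = 0; i < *nplanes; i++)               /* i may reach 7 */
			if (sizes[i] < q_data->sizeimage[i]) /* OOB for i >= num_planes */
				return -EINVAL;
	}

The added check in the diff below bounds *nplanes by q_data->fmt->num_planes
before this loop runs.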
Fixes: 4e855a6efa54 ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver")
Signed-off-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Chen-Yu Tsai <wenst(a)chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c | 2 ++
1 file changed, 2 insertions(+)
---
diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c
index 9ff439a50f53..315e97a2450e 100644
--- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c
+++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c
@@ -821,6 +821,8 @@ static int vb2ops_venc_queue_setup(struct vb2_queue *vq,
return -EINVAL;
if (*nplanes) {
+ if (*nplanes != q_data->fmt->num_planes)
+ return -EINVAL;
for (i = 0; i < *nplanes; i++)
if (sizes[i] < q_data->sizeimage[i])
return -EINVAL;
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl-lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, the rtnl lock is not yet taken.
dev_set_mtu(), however, requires that the caller holds the rtnl lock because
it uses netdevice notifiers, so this code path then fails the check for the
lock:
RTNL: assertion failed at net/core/dev.c (1953)
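For context, a minimal sketch of the constraint and of the resulting locking
pattern (illustrative only; both functions below are hypothetical, not the
actual net/core/dev.c or batman-adv code):

#include <linux/netdevice.h>
#include <linux/rtnetlink.h>

/* The MTU update path fires netdevice notifiers, and that path asserts that
 * the rtnl lock is held, roughly like this: */
static int example_notify_mtu_change(struct net_device *dev)
{
	ASSERT_RTNL();	/* prints "RTNL: assertion failed ..." when not held */
	/* ... call_netdevice_notifiers(NETDEV_CHANGEMTU, dev) ... */
	return 0;
}

/* Callers that are not already under rtnl, like the batadv genl handler,
 * therefore have to wrap any call that may end up in dev_set_mtu(): */
static void example_genl_fragmentation_update(struct net_device *soft_iface)
{
	rtnl_lock();
	/* e.g. batadv_update_min_mtu(soft_iface); */
	rtnl_unlock();
}

This is exactly the pattern applied by the three-line change below.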
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282(a)syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
This problem was just identified by syzbot [1]. I hope it is ok to directly
send this patch to netdev instead of creating a single-patch PR from
the batadv/net branch. If you still prefer a PR then we can also prepare
it.
[1] https://lore.kernel.org/r/0000000000009bbb4b0603717cde@google.com
---
net/batman-adv/netlink.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index ad5714f737be..6efbc9275aec 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info)
attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr));
+
+ rtnl_lock();
batadv_update_min_mtu(bat_priv->soft_iface);
+ rtnl_unlock();
}
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
---
base-commit: 421d467dc2d483175bad4fb76a31b9e5a3d744cf
change-id: 20230821-batadv-missing-mtu-rtnl-lock-bc4cee67731d
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and just returns the
command code back, which makes the TPM probe fail.
Since Microsoft Pluton is the only known combination of an AMD CPU with an
fTPM from another vendor, disable hwrng in every other case on AMD. To make
sysadmins aware of this, also print an info message to the klog.
Cc: stable(a)vger.kernel.org
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2:
* CONFIG_X86
* Removed "Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>"
* Removed "Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>"
---
drivers/char/tpm/tpm_crb.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 65ff4d2fbe8d..28448bfd4062 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -463,28 +463,6 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
return (cancel & CRB_CANCEL_INVOKE) == CRB_CANCEL_INVOKE;
}
-static int crb_check_flags(struct tpm_chip *chip)
-{
- u32 val;
- int ret;
-
- ret = crb_request_locality(chip, 0);
- if (ret)
- return ret;
-
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
- if (ret)
- goto release;
-
- if (val == 0x414D4400U /* AMD */)
- chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
-
-release:
- crb_relinquish_locality(chip, 0);
-
- return ret;
-}
-
static const struct tpm_class_ops tpm_crb = {
.flags = TPM_OPS_AUTO_STARTUP,
.status = crb_status,
@@ -827,9 +805,12 @@ static int crb_acpi_add(struct acpi_device *device)
if (rc)
goto out;
- rc = crb_check_flags(chip);
- if (rc)
- goto out;
+ /* A quirk for https://www.amd.com/en/support/kb/faq/pa-410 */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ priv->sm != ACPI_TPM2_COMMAND_BUFFER_WITH_PLUTON) {
+ dev_info(dev, "Disabling hwrng\n");
+ chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
+ }
rc = tpm_chip_register(chip);
--
2.39.2
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and just returns the
command code back, which makes the TPM probe fail.
As this only matters for AMD fTPMs, and it works there, check the CPU vendor
first and skip the TPM query entirely if it's not AMD.
Cc: stable(a)vger.kernel.org
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Reported-by: Patrick Steinhardt <ps(a)pks.im>
Reported-by: Ronan Pigott <ronan(a)rjp.ie>
Reported-by: Raymond Jay Golo <rjgolo(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
v1->v2:
* Check x86 vendor for AMD
---
drivers/char/tpm/tpm_crb.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 9eb1a18590123..7faf670201ccc 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -465,8 +465,12 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
static int crb_check_flags(struct tpm_chip *chip)
{
+ int ret = 0;
+#ifdef CONFIG_X86
u32 val;
- int ret;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return ret;
ret = crb_request_locality(chip, 0);
if (ret)
@@ -481,6 +485,7 @@ static int crb_check_flags(struct tpm_chip *chip)
release:
crb_relinquish_locality(chip, 0);
+#endif
return ret;
}
--
2.34.1
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and just returns the
command code back, which makes the TPM probe fail.
Since Microsoft Pluton is the only known combination of an AMD CPU with an
fTPM from another vendor, disable hwrng in every other case on AMD. To make
sysadmins aware of this, also print an info message to the klog.
Cc: stable(a)vger.kernel.org # v6.5+
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
drivers/char/tpm/tpm_crb.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 65ff4d2fbe8d..28448bfd4062 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -463,28 +463,6 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
return (cancel & CRB_CANCEL_INVOKE) == CRB_CANCEL_INVOKE;
}
-static int crb_check_flags(struct tpm_chip *chip)
-{
- u32 val;
- int ret;
-
- ret = crb_request_locality(chip, 0);
- if (ret)
- return ret;
-
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
- if (ret)
- goto release;
-
- if (val == 0x414D4400U /* AMD */)
- chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
-
-release:
- crb_relinquish_locality(chip, 0);
-
- return ret;
-}
-
static const struct tpm_class_ops tpm_crb = {
.flags = TPM_OPS_AUTO_STARTUP,
.status = crb_status,
@@ -827,9 +805,12 @@ static int crb_acpi_add(struct acpi_device *device)
if (rc)
goto out;
- rc = crb_check_flags(chip);
- if (rc)
- goto out;
+ /* A quirk for https://www.amd.com/en/support/kb/faq/pa-410 */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ priv->sm != ACPI_TPM2_COMMAND_BUFFER_WITH_PLUTON) {
+ dev_info(dev, "Disabling hwrng\n");
+ chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
+ }
rc = tpm_chip_register(chip);
--
2.39.2
As made mention of in commit 099303e9a9bd ("drm/amd/display: eDP
intermittent black screen during PnP"), we need to turn off the
display's backlight before powering off an eDP display. Not doing so
will result in undefined behaviour according to the eDP spec. So, set
DCN301's edp_backlight_control() function pointer to
dce110_edp_backlight_control().
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2765
Fixes: 9c75891feef0 ("drm/amd/display: rework recent update PHY state commit")
Suggested-by: Swapnil Patel <swapnil.patel(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
index 257df8660b4c..61205cdbe2d5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
@@ -75,6 +75,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = {
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
+ .edp_backlight_control = dce110_edp_backlight_control,
.edp_power_control = dce110_edp_power_control,
.edp_wait_for_hpd_ready = dce110_edp_wait_for_hpd_ready,
.set_cursor_position = dcn10_set_cursor_position,
--
2.41.0
check_clock doesn't account for vfe_lite, which means that vfe_lite will
never get validated by this routine. Add the clock name to the expected set
to remedy this.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
drivers/media/platform/qcom/camss/camss-vfe.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe.c b/drivers/media/platform/qcom/camss/camss-vfe.c
index 938f373bcd1fd..b021f81cef123 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe.c
@@ -535,7 +535,8 @@ static int vfe_check_clock_rates(struct vfe_device *vfe)
struct camss_clock *clock = &vfe->clock[i];
if (!strcmp(clock->name, "vfe0") ||
- !strcmp(clock->name, "vfe1")) {
+ !strcmp(clock->name, "vfe1") ||
+ !strcmp(clock->name, "vfe_lite")) {
u64 min_rate = 0;
unsigned long rate;
--
2.41.0
There are two problems with the current vfe_disable_output() routine.
Firstly we rightly use a spinlock to protect output->gen2.active_num
everywhere except for in the IDLE timeout path of vfe_disable_output().
Even if that somehow is not racy "in practice", it is so by happenstance,
not by design.
Secondly we do not get consistent behaviour from this routine. On
sc8280xp 50% of the time I get "VFE idle timeout - resetting". In this
case the subsequent capture will succeed. The other 50% of the time, we
don't hit the idle timeout, never do the VFE reset and subsequent
captures stall indefinitely.
Rewrite the vfe_disable_output() routine to:
- Quiesce the write masters with vfe_wm_stop()
- Set active_num = 0, remembering to hold the spinlock while doing so
- Reset the VFE
Testing on sc8280xp and sdm845 shows this to be a valid fix.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
.../media/platform/qcom/camss/camss-vfe-170.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe-170.c b/drivers/media/platform/qcom/camss/camss-vfe-170.c
index 02494c89da91c..ae9137633c301 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe-170.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe-170.c
@@ -500,28 +500,15 @@ static int vfe_disable_output(struct vfe_line *line)
struct vfe_output *output = &line->output;
unsigned long flags;
unsigned int i;
- bool done;
- int timeout = 0;
-
- do {
- spin_lock_irqsave(&vfe->output_lock, flags);
- done = !output->gen2.active_num;
- spin_unlock_irqrestore(&vfe->output_lock, flags);
- usleep_range(10000, 20000);
-
- if (timeout++ == 100) {
- dev_err(vfe->camss->dev, "VFE idle timeout - resetting\n");
- vfe_reset(vfe);
- output->gen2.active_num = 0;
- return 0;
- }
- } while (!done);
spin_lock_irqsave(&vfe->output_lock, flags);
for (i = 0; i < output->wm_num; i++)
vfe_wm_stop(vfe, output->wm_idx[i]);
+ output->gen2.active_num = 0;
spin_unlock_irqrestore(&vfe->output_lock, flags);
+ vfe_reset(vfe);
+
return 0;
}
--
2.41.0
We need to make sure camss_configure_pd() happens before
camss_register_entities() as the vfe_get() path relies on the pointer
provided by camss_configure_pd().
Fix the ordering sequence in probe to ensure the pointers vfe_get() demands
are present by the time camss_register_entities() runs.
In order to facilitate backporting to stable kernels I've moved the
camss_configure_pd() call fairly early in the probe() function, so that,
irrespective of the existence of the old error-handling jump labels, this
patch should still apply from -next circa Aug 2023 back to v5.13 inclusive.
Fixes: 2f6f8af67203 ("media: camss: Refactor VFE power domain toggling")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
drivers/media/platform/qcom/camss/camss.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c
index f11dc59135a5a..75991d849b571 100644
--- a/drivers/media/platform/qcom/camss/camss.c
+++ b/drivers/media/platform/qcom/camss/camss.c
@@ -1619,6 +1619,12 @@ static int camss_probe(struct platform_device *pdev)
if (ret < 0)
goto err_cleanup;
+ ret = camss_configure_pd(camss);
+ if (ret < 0) {
+ dev_err(dev, "Failed to configure power domains: %d\n", ret);
+ goto err_cleanup;
+ }
+
ret = camss_init_subdevices(camss);
if (ret < 0)
goto err_cleanup;
@@ -1678,12 +1684,6 @@ static int camss_probe(struct platform_device *pdev)
}
}
- ret = camss_configure_pd(camss);
- if (ret < 0) {
- dev_err(dev, "Failed to configure power domains: %d\n", ret);
- return ret;
- }
-
pm_runtime_enable(dev);
return 0;
--
2.41.0
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that they
won't be built, avoiding the below compile error when PCI is unset:
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
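For illustration, a minimal, hypothetical probe sketch (not code from this
patch) of the call that creates the dependency: these drivers map their
registers with devm_platform_ioremap_resource(), which is only provided when
HAS_IOMEM is set, hence the modpost errors above.

#include <linux/err.h>
#include <linux/io.h>
#include <linux/platform_device.h>

static int example_dma_probe(struct platform_device *pdev)
{
	void __iomem *regs;

	/* undefined at modpost time when HAS_IOMEM is not set */
	regs = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(regs))
		return PTR_ERR(regs);

	/* ... program the DMA engine through 'regs' ... */
	return 0;
}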
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 08013345d1f24..7e1bd79fbee8f 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -208,6 +208,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -277,6 +278,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
From: Marco Felsch <m.felsch(a)pengutronix.de>
According to the "USB Type-C Port Controller Interface Specification v2.0",
the TCPC sets the fault status register bit-7
(AllRegistersResetToDefault) once the registers have been reset to
their default values.
This triggers an alert(-irq) on PTN5110 devices even though we mask the
fault-irq, which may cause a kernel hang. Fix this generically by writing
a one to the corresponding bit-7.
Cc: stable(a)vger.kernel.org
Fixes: 74e656d6b055 ("staging: typec: Type-C Port Controller Interface driver (tcpci)")
Reported-by: Angus Ainslie (Purism) <angus(a)akkea.ca>
Closes: https://lore.kernel.org/all/20190508002749.14816-2-angus@akkea.ca/
Reported-by: Christian Bach <christian.bach(a)scs.ch>
Closes: https://lore.kernel.org/regressions/ZR0P278MB07737E5F1D48632897D51AC3EB329@…
Signed-off-by: Marco Felsch <m.felsch(a)pengutronix.de>
Signed-off-by: Fabio Estevam <festevam(a)denx.de>
---
Changes since v2:
- Submitted it as a standalone patch.
- Explain that it may cause a kernel hang.
- Fixed typos in the commit log. (Guenter)
- Check the tcpci_write16() return value. (Guenter)
- Write to TCPC_FAULT_STATUS unconditionally. (Guenter)
- Added Fixes, Reported-by and Closes tags.
- CCed stable
drivers/usb/typec/tcpm/tcpci.c | 4 ++++
include/linux/usb/tcpci.h | 1 +
2 files changed, 5 insertions(+)
diff --git a/drivers/usb/typec/tcpm/tcpci.c b/drivers/usb/typec/tcpm/tcpci.c
index fc708c289a73..0ee3e6e29bb1 100644
--- a/drivers/usb/typec/tcpm/tcpci.c
+++ b/drivers/usb/typec/tcpm/tcpci.c
@@ -602,6 +602,10 @@ static int tcpci_init(struct tcpc_dev *tcpc)
if (time_after(jiffies, timeout))
return -ETIMEDOUT;
+ ret = tcpci_write16(tcpci, TCPC_FAULT_STATUS, TCPC_FAULT_STATUS_ALL_REG_RST_TO_DEFAULT);
+ if (ret < 0)
+ return ret;
+
/* Handle vendor init */
if (tcpci->data->init) {
ret = tcpci->data->init(tcpci, tcpci->data);
diff --git a/include/linux/usb/tcpci.h b/include/linux/usb/tcpci.h
index 85e95a3251d3..83376473ac76 100644
--- a/include/linux/usb/tcpci.h
+++ b/include/linux/usb/tcpci.h
@@ -103,6 +103,7 @@
#define TCPC_POWER_STATUS_SINKING_VBUS BIT(0)
#define TCPC_FAULT_STATUS 0x1f
+#define TCPC_FAULT_STATUS_ALL_REG_RST_TO_DEFAULT BIT(7)
#define TCPC_ALERT_EXTENDED 0x21
--
2.34.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that they
won't be built, avoiding the below compile error when PCI is unset:
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 5ea37d133f241..6abb80b09db3b 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -209,6 +209,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -254,6 +255,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that they
won't be built, avoiding the below compile error when PCI is unset:
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index e5f31af65aabf..00e1ffa4fcf1c 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -212,6 +212,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -258,6 +259,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that they
won't be built, avoiding the below compile error when PCI is unset:
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 1322461f1f3c5..66aad9dbd58c5 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -208,6 +208,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -268,6 +269,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
From: Eric Biggers <ebiggers(a)google.com>
If an fsverity builtin signature is given for a file but the
".fs-verity" keyring is empty, there's no real reason to run the PKCS#7
parser. Skip this to avoid the PKCS#7 attack surface when builtin
signature support is configured into the kernel but is not being used.
This is a hardening improvement, not a fix per se, but I've added
Fixes and Cc stable to get it out to more users.
Fixes: 432434c9f8e1 ("fs-verity: support builtin file signatures")
Cc: stable(a)vger.kernel.org
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
v3: improve the error message slightly
v2: check keyring and return early before allocating formatted digest
fs/verity/signature.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/fs/verity/signature.c b/fs/verity/signature.c
index b95acae64eac6..90c07573dd77b 100644
--- a/fs/verity/signature.c
+++ b/fs/verity/signature.c
@@ -55,20 +55,36 @@ int fsverity_verify_signature(const struct fsverity_info *vi,
if (sig_size == 0) {
if (fsverity_require_signatures) {
fsverity_err(inode,
"require_signatures=1, rejecting unsigned file!");
return -EPERM;
}
return 0;
}
+ if (fsverity_keyring->keys.nr_leaves_on_tree == 0) {
+ /*
+ * The ".fs-verity" keyring is empty, due to builtin signatures
+ * being supported by the kernel but not actually being used.
+ * In this case, verify_pkcs7_signature() would always return an
+ * error, usually ENOKEY. It could also be EBADMSG if the
+ * PKCS#7 is malformed, but that isn't very important to
+ * distinguish. So, just skip to ENOKEY to avoid the attack
+ * surface of the PKCS#7 parser, which would otherwise be
+ * reachable by any task able to execute FS_IOC_ENABLE_VERITY.
+ */
+ fsverity_err(inode,
+ "fs-verity keyring is empty, rejecting signed file!");
+ return -ENOKEY;
+ }
+
d = kzalloc(sizeof(*d) + hash_alg->digest_size, GFP_KERNEL);
if (!d)
return -ENOMEM;
memcpy(d->magic, "FSVerity", 8);
d->digest_algorithm = cpu_to_le16(hash_alg - fsverity_hash_algs);
d->digest_size = cpu_to_le16(hash_alg->digest_size);
memcpy(d->digest, vi->file_digest, hash_alg->digest_size);
err = verify_pkcs7_signature(d, sizeof(*d) + hash_alg->digest_size,
signature, sig_size, fsverity_keyring,
base-commit: 456ae5fe9b448f44ebe98b391a3bae9c75df465e
--
2.41.0
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit e7dd44f4f3166db45248414f5df8f615392de47a ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built, avoiding the below compile error when PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Michael Turquette <mturquette(a)baylibre.com>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: linux-clk(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/clk/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index cc871ae3a1792..5b34dbc830ee4 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -302,6 +302,7 @@ config COMMON_CLK_BD718XX
config COMMON_CLK_FIXED_MMIO
bool "Clock driver for Memory Mapped Fixed values"
depends on COMMON_CLK && OF
+ depends on HAS_IOMEM
help
Support for Memory Mapped IO Fixed clocks
--
2.40.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit e7dd44f4f3166db45248414f5df8f615392de47a ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built, avoiding the below compile error when PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Michael Turquette <mturquette(a)baylibre.com>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: linux-clk(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/clk/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 4ae49eae45869..df739665f2063 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -356,6 +356,7 @@ config COMMON_CLK_BD718XX
config COMMON_CLK_FIXED_MMIO
bool "Clock driver for Memory Mapped Fixed values"
depends on COMMON_CLK && OF
+ depends on HAS_IOMEM
help
Support for Memory Mapped IO Fixed clocks
--
2.40.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit e7dd44f4f3166db45248414f5df8f615392de47a ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built, avoiding the below compile error when PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Michael Turquette <mturquette(a)baylibre.com>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: linux-clk(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/clk/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 100e474ff3dc5..d12465c227514 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -380,6 +380,7 @@ config COMMON_CLK_BD718XX
config COMMON_CLK_FIXED_MMIO
bool "Clock driver for Memory Mapped Fixed values"
depends on COMMON_CLK && OF
+ depends on HAS_IOMEM
help
Support for Memory Mapped IO Fixed clocks
--
2.40.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit e7dd44f4f3166db45248414f5df8f615392de47a ]
s390 systems (aka mainframes) have classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence a fully functional s390 kernel can be built
with CONFIG_PCI=n, in which case the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built, avoiding the below compile error when PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Michael Turquette <mturquette(a)baylibre.com>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: linux-clk(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/clk/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 5da82f2bdd211..a5dcc7293a836 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -427,6 +427,7 @@ config COMMON_CLK_BD718XX
config COMMON_CLK_FIXED_MMIO
bool "Clock driver for Memory Mapped Fixed values"
depends on COMMON_CLK && OF
+ depends on HAS_IOMEM
help
Support for Memory Mapped IO Fixed clocks
--
2.40.1
Add a helper to reschedule drm_mode_config::output_poll_work after
polling has been enabled for a connector (a reschedule being needed
because polling was previously disabled for all connectors and hence
output_poll_work was not running).
This is needed by the next patch fixing HPD polling on i915.
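As a usage sketch (hedged; this is not taken from the follow-up i915 patch,
and the function name is made up), a driver that flips a connector from
unpolled to polled is expected to do roughly:

#include <drm/drm_connector.h>
#include <drm/drm_probe_helper.h>

static void example_enable_connector_polling(struct drm_connector *connector)
{
	/* enable HPD polling for this connector ... */
	connector->polled = DRM_CONNECTOR_POLL_CONNECT |
			    DRM_CONNECTOR_POLL_DISCONNECT;

	/* ... and kick output_poll_work, which may not be running yet */
	drm_kms_helper_poll_reschedule(connector->dev);
}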
CC: stable(a)vger.kernel.org # 6.4+
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Cc: dri-devel(a)lists.freedesktop.org
Reviewed-by: Jouni Högander <jouni.hogander(a)intel.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/drm_probe_helper.c | 68 ++++++++++++++++++++----------
include/drm/drm_probe_helper.h | 1 +
2 files changed, 47 insertions(+), 22 deletions(-)
diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
index 2fb9bf901a2cc..3f479483d7d80 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -262,6 +262,26 @@ static bool drm_kms_helper_enable_hpd(struct drm_device *dev)
}
#define DRM_OUTPUT_POLL_PERIOD (10*HZ)
+static void reschedule_output_poll_work(struct drm_device *dev)
+{
+ unsigned long delay = DRM_OUTPUT_POLL_PERIOD;
+
+ if (dev->mode_config.delayed_event)
+ /*
+ * FIXME:
+ *
+ * Use short (1s) delay to handle the initial delayed event.
+ * This delay should not be needed, but Optimus/nouveau will
+ * fail in a mysterious way if the delayed event is handled as
+ * soon as possible like it is done in
+ * drm_helper_probe_single_connector_modes() in case the poll
+ * was enabled before.
+ */
+ delay = HZ;
+
+ schedule_delayed_work(&dev->mode_config.output_poll_work, delay);
+}
+
/**
* drm_kms_helper_poll_enable - re-enable output polling.
* @dev: drm_device
@@ -279,37 +299,41 @@ static bool drm_kms_helper_enable_hpd(struct drm_device *dev)
*/
void drm_kms_helper_poll_enable(struct drm_device *dev)
{
- bool poll = false;
- unsigned long delay = DRM_OUTPUT_POLL_PERIOD;
-
if (!dev->mode_config.poll_enabled || !drm_kms_helper_poll ||
dev->mode_config.poll_running)
return;
- poll = drm_kms_helper_enable_hpd(dev);
-
- if (dev->mode_config.delayed_event) {
- /*
- * FIXME:
- *
- * Use short (1s) delay to handle the initial delayed event.
- * This delay should not be needed, but Optimus/nouveau will
- * fail in a mysterious way if the delayed event is handled as
- * soon as possible like it is done in
- * drm_helper_probe_single_connector_modes() in case the poll
- * was enabled before.
- */
- poll = true;
- delay = HZ;
- }
-
- if (poll)
- schedule_delayed_work(&dev->mode_config.output_poll_work, delay);
+ if (drm_kms_helper_enable_hpd(dev) ||
+ dev->mode_config.delayed_event)
+ reschedule_output_poll_work(dev);
dev->mode_config.poll_running = true;
}
EXPORT_SYMBOL(drm_kms_helper_poll_enable);
+/**
+ * drm_kms_helper_poll_reschedule - reschedule the output polling work
+ * @dev: drm_device
+ *
+ * This function reschedules the output polling work, after polling for a
+ * connector has been enabled.
+ *
+ * Drivers must call this helper after enabling polling for a connector by
+ * setting %DRM_CONNECTOR_POLL_CONNECT / %DRM_CONNECTOR_POLL_DISCONNECT flags
+ * in drm_connector::polled. Note that disabling polling by clearing these
+ * flags for a connector will stop the output polling work automatically if
+ * the polling is disabled for all other connectors as well.
+ *
+ * The function can be called only after polling has been enabled by calling
+ * drm_kms_helper_poll_init() / drm_kms_helper_poll_enable().
+ */
+void drm_kms_helper_poll_reschedule(struct drm_device *dev)
+{
+ if (dev->mode_config.poll_running)
+ reschedule_output_poll_work(dev);
+}
+EXPORT_SYMBOL(drm_kms_helper_poll_reschedule);
+
static enum drm_connector_status
drm_helper_probe_detect_ctx(struct drm_connector *connector, bool force)
{
diff --git a/include/drm/drm_probe_helper.h b/include/drm/drm_probe_helper.h
index 4977e0ab72dbb..fad3c4003b2b5 100644
--- a/include/drm/drm_probe_helper.h
+++ b/include/drm/drm_probe_helper.h
@@ -25,6 +25,7 @@ void drm_kms_helper_connector_hotplug_event(struct drm_connector *connector);
void drm_kms_helper_poll_disable(struct drm_device *dev);
void drm_kms_helper_poll_enable(struct drm_device *dev);
+void drm_kms_helper_poll_reschedule(struct drm_device *dev);
bool drm_kms_helper_is_poll_worker(void);
enum drm_mode_status drm_crtc_helper_mode_valid_fixed(struct drm_crtc *crtc,
--
2.37.2
From: Dominique Martinet <asmadeus(a)codewreck.org>
[ Upstream commit 4a73edab69d3a6623f03817fe950a2d9585f80e4 ]
Similarly to the previous patch: offs can be used in handle_rerror()
without being initialized on small payloads; in this case handle_rerror()
will not use it because of the size check, but it doesn't hurt to make
sure it is zero to please scan-build.
This fixes the following warning:
net/9p/trans_virtio.c:539:3: warning: 3rd function call argument is an uninitialized value [core.CallAndMessage]
handle_rerror(req, in_hdr_len, offs, in_pages);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reviewed-by: Simon Horman <simon.horman(a)corigine.com>
Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/9p/trans_virtio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/9p/trans_virtio.c b/net/9p/trans_virtio.c
index c6a46e8e9eda5..25f5caa57289b 100644
--- a/net/9p/trans_virtio.c
+++ b/net/9p/trans_virtio.c
@@ -401,7 +401,7 @@ p9_virtio_zc_request(struct p9_client *client, struct p9_req_t *req,
struct page **in_pages = NULL, **out_pages = NULL;
struct virtio_chan *chan = client->trans;
struct scatterlist *sgs[4];
- size_t offs;
+ size_t offs = 0;
int need_drop = 0;
p9_debug(P9_DEBUG_TRANS, "virtio request\n");
--
2.40.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x cee572756aa2cb46e959e9797ad4b730b78a050b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082156-capped-subtext-c4e2@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
cee572756aa2 ("arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4")
06c5b5690a57 ("arm64: dts: rockchip: sort nodes/properties on rk3399-rock-4")
69448624b770 ("arm64: dts: rockchip: fix regulator name on rk3399-rock-4")
8240e87f16d1 ("arm64: dts: rockchip: fix audio-supply for Rock Pi 4")
697dd494cb1c ("arm64: dts: rockchip: add SPDIF node for ROCK Pi 4")
65bd2b8bdb3b ("arm64: dts: rockchip: add ES8316 codec for ROCK Pi 4")
328c6112787b ("arm64: dts: rockchip: fix supplies on rk3399-rock-pi-4")
b5edb0467370 ("arm64: dts: rockchip: Mark rock-pi-4 as rock-pi-4a dts")
2bc65fef4fe4 ("arm64: dts: rockchip: rename label and nodename pinctrl subnodes that end with gpio")
7a87adbc4afe ("arm64: dts: rockchip: enable DC charger detection pullup on Pinebook Pro")
40df91a894e9 ("arm64: dts: rockchip: fix inverted headphone detection on Pinebook Pro")
5a65505a6988 ("arm64: dts: rockchip: Add initial support for Pinebook Pro")
c2753d15d2b3 ("arm64: dts: rockchip: split rk3399-rockpro64 for v2 and v2.1 boards")
cfd66c682e8b ("arm64: dts: rockchip: Add regulators for PCIe for Radxa Rock Pi 4 board")
023115cdea26 ("arm64: dts: rockchip: add thermal infrastructure to px30")
526ba2e2cf61 ("arm64: dts: rockchip: Enable PCIe for Radxa Rock Pi 4 board")
eb275167d186 ("Merge tag 'armsoc-dt' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cee572756aa2cb46e959e9797ad4b730b78a050b Mon Sep 17 00:00:00 2001
From: Christopher Obbard <chris.obbard(a)collabora.com>
Date: Wed, 5 Jul 2023 15:42:54 +0100
Subject: [PATCH] arm64: dts: rockchip: Disable HS400 for eMMC on ROCK Pi 4
There is some instability with some eMMC modules on ROCK Pi 4 SBCs running
in HS400 mode. This results in block errors after a while
or after a "heavy" operation utilising the eMMC (e.g. resizing a
filesystem). An example of these errors is as follows:
[ 289.171014] mmc1: running CQE recovery
[ 290.048972] mmc1: running CQE recovery
[ 290.054834] mmc1: running CQE recovery
[ 290.060817] mmc1: running CQE recovery
[ 290.061337] blk_update_request: I/O error, dev mmcblk1, sector 1411072 op 0x1:(WRITE) flags 0x800 phys_seg 36 prio class 0
[ 290.061370] EXT4-fs warning (device mmcblk1p1): ext4_end_bio:348: I/O error 10 writing to inode 29547 starting block 176466)
[ 290.061484] Buffer I/O error on device mmcblk1p1, logical block 172288
[ 290.061531] Buffer I/O error on device mmcblk1p1, logical block 172289
[ 290.061551] Buffer I/O error on device mmcblk1p1, logical block 172290
[ 290.061574] Buffer I/O error on device mmcblk1p1, logical block 172291
[ 290.061592] Buffer I/O error on device mmcblk1p1, logical block 172292
[ 290.061615] Buffer I/O error on device mmcblk1p1, logical block 172293
[ 290.061632] Buffer I/O error on device mmcblk1p1, logical block 172294
[ 290.061654] Buffer I/O error on device mmcblk1p1, logical block 172295
[ 290.061673] Buffer I/O error on device mmcblk1p1, logical block 172296
[ 290.061695] Buffer I/O error on device mmcblk1p1, logical block 172297
Disabling the Command Queue seems to stop the CQE recovery from running,
but doesn't seem to improve the I/O errors. Until this can be investigated
further, disable HS400 mode on the ROCK Pi 4 SBCs to at least stop I/O
errors from occurring.
While we are here, set the eMMC maximum clock frequency to 150 MHz to
follow the ROCK 4C+.
Fixes: 1b5715c602fd ("arm64: dts: rockchip: add ROCK Pi 4 DTS support")
Signed-off-by: Christopher Obbard <chris.obbard(a)collabora.com>
Tested-By: Folker Schwesinger <dev(a)folker-schwesinger.de>
Link: https://lore.kernel.org/r/20230705144255.115299-2-chris.obbard@collabora.com
Signed-off-by: Heiko Stuebner <heiko(a)sntech.de>
diff --git a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
index 907071d4fe80..95efee311ece 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi
@@ -645,9 +645,9 @@ &saradc {
};
&sdhci {
+ max-frequency = <150000000>;
bus-width = <8>;
- mmc-hs400-1_8v;
- mmc-hs400-enhanced-strobe;
+ mmc-hs200-1_8v;
non-removable;
status = "okay";
};
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but the definition of the IDE register addresses
was modeled after the Falcon case, both in its use of the memory
resources and in including the register scale and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, the device base address, byte access offset
and register scaling are all included in the platform-specific
ISA access translation (in asm/mm_io.h).
As a consequence, that address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource,
and use standard register offsets from that base in order
to calculate register addresses (the IO address translation
will then apply the correct ISA window base and scaling).
Encode PIO_OFFSET into IO port addresses for all registers
except the data transfer register. Encode the MMIO offset
there (pata_falcon_data_xfer() directly uses raw IO with
no address translation).
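As a worked illustration of the resulting addressing scheme (values taken
from the diff below; the macro itself is only for illustration):

/* Generic form used below: reg_addr = base + io_offset + reg * reg_scale */
#define EXAMPLE_ATA_REG(base, io_offset, reg, reg_scale) \
	((base) + (io_offset) + (reg) * (reg_scale))

/* Falcon: memory resource, no address translation applied,
 *   error/feature register: EXAMPLE_ATA_REG(base, 1, 1, 4) == base + 5 (as before)
 * Q40: IO resource, translated by asm/mm_io.h,
 *   error/feature register: EXAMPLE_ATA_REG(base, 0x10000, 1, 1) == base + 0x10001,
 *   where 0x10000 corresponds to PIO_OFFSET in the diff's io_offset
 */

This way the ISA window base, byte offset and register scaling are applied
exactly once by the access translation instead of twice.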
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
---
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 55 ++++++++++++++++++++++++---------------
1 file changed, 34 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..346259e3bbc8 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_scale = 4;
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,39 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_scale = 1;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+
+ ata_port_desc(ap, "cmd %pa ctl %pa",
+ &base_res->start,
+ &ctl_res->start);
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+
+ ata_port_desc(ap, "cmd %pa ctl %pa",
+ &base_mem_res->start,
+ &ctl_mem_res->start);
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + 1 * reg_scale;
+ ap->ioaddr.feature_addr = base + io_offset + 1 * reg_scale;
+ ap->ioaddr.nsect_addr = base + io_offset + 2 * reg_scale;
+ ap->ioaddr.lbal_addr = base + io_offset + 3 * reg_scale;
+ ap->ioaddr.lbam_addr = base + io_offset + 4 * reg_scale;
+ ap->ioaddr.lbah_addr = base + io_offset + 5 * reg_scale;
+ ap->ioaddr.device_addr = base + io_offset + 6 * reg_scale;
+ ap->ioaddr.status_addr = base + io_offset + 7 * reg_scale;
+ ap->ioaddr.command_addr = base + io_offset + 7 * reg_scale;
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
This is a backport of the series that fixes the way deadline bandwidth
restoration is done, which was causing a noticeable delay on the resume path.
It also converts the cpuset lock back into a mutex, which matters to some
users on Android as well; I lack the details, but AFAIU the read/write
semaphore was slower under high contention.
Compile tested against some randconfig for different archs and tested against
android14-6.1 GKI kernel.
My testing is limited to the resume path and general phone usage to make sure
nothing falls apart. It would be good to have some deadline-specific testing
done too.
Based on v6.1.46
Original series:
https://lore.kernel.org/lkml/20230508075854.17215-1-juri.lelli@redhat.com/
Thanks!
--
Qais Yousef
Dietmar Eggemann (2):
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails
Juri Lelli (4):
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
sched/cpuset: Bring back cpuset_mutex
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
cgroup/cpuset: Iterate only if DEADLINE tasks are present
include/linux/cpuset.h | 12 +-
include/linux/sched.h | 4 +-
kernel/cgroup/cgroup.c | 4 +
kernel/cgroup/cpuset.c | 246 ++++++++++++++++++++++++++--------------
kernel/sched/core.c | 41 +++----
kernel/sched/deadline.c | 67 ++++++++---
kernel/sched/sched.h | 2 +-
7 files changed, 246 insertions(+), 130 deletions(-)
--
2.34.1
The quilt patch titled
Subject: nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
has been removed from the -mm tree. Its filename was
nilfs2-fix-warning-in-mark_buffer_dirty-due-to-discarded-buffer-reuse.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse
Date: Fri, 18 Aug 2023 22:18:04 +0900
A syzbot stress test using a corrupted disk image reported that
mark_buffer_dirty() called from __nilfs_mark_inode_dirty() or
nilfs_palloc_commit_alloc_entry() may output a kernel warning, and can
panic if the kernel is booted with panic_on_warn.
This is because nilfs2 keeps buffer pointers in local structures for some
metadata and reuses them, but such buffers may be forcibly discarded by
nilfs_clear_dirty_page() in some critical situations.
This issue is reported to appear after commit 28a65b49eb53 ("nilfs2: do
not write dirty data after degenerating to read-only"), but the issue has
potentially existed before.
Fix this issue by checking the uptodate flag when attempting to reuse an
internally held buffer, and reloading the metadata instead of reusing the
buffer if the flag was lost.
Link: https://lkml.kernel.org/r/20230818131804.7758-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+cdfcae656bac88ba0e2d(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000003da75f05fdeffd12@google.com
Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 3.10+
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/alloc.c | 3 ++-
fs/nilfs2/inode.c | 7 +++++--
2 files changed, 7 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/alloc.c~nilfs2-fix-warning-in-mark_buffer_dirty-due-to-discarded-buffer-reuse
+++ a/fs/nilfs2/alloc.c
@@ -205,7 +205,8 @@ static int nilfs_palloc_get_block(struct
int ret;
spin_lock(lock);
- if (prev->bh && blkoff == prev->blkoff) {
+ if (prev->bh && blkoff == prev->blkoff &&
+ likely(buffer_uptodate(prev->bh))) {
get_bh(prev->bh);
*bhp = prev->bh;
spin_unlock(lock);
--- a/fs/nilfs2/inode.c~nilfs2-fix-warning-in-mark_buffer_dirty-due-to-discarded-buffer-reuse
+++ a/fs/nilfs2/inode.c
@@ -1025,7 +1025,7 @@ int nilfs_load_inode_block(struct inode
int err;
spin_lock(&nilfs->ns_inode_lock);
- if (ii->i_bh == NULL) {
+ if (ii->i_bh == NULL || unlikely(!buffer_uptodate(ii->i_bh))) {
spin_unlock(&nilfs->ns_inode_lock);
err = nilfs_ifile_get_inode_block(ii->i_root->ifile,
inode->i_ino, pbh);
@@ -1034,7 +1034,10 @@ int nilfs_load_inode_block(struct inode
spin_lock(&nilfs->ns_inode_lock);
if (ii->i_bh == NULL)
ii->i_bh = *pbh;
- else {
+ else if (unlikely(!buffer_uptodate(ii->i_bh))) {
+ __brelse(ii->i_bh);
+ ii->i_bh = *pbh;
+ } else {
brelse(*pbh);
*pbh = ii->i_bh;
}
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
has been removed from the -mm tree. Its filename was
memfd-replace-ratcheting-feature-from-vmmemfd_noexec-with-hierarchy.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Aleksa Sarai <cyphar(a)cyphar.com>
Subject: memfd: replace ratcheting feature from vm.memfd_noexec with hierarchy
Date: Mon, 14 Aug 2023 18:41:00 +1000
This sysctl has the very unusual behaviour of not allowing any user (even
CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you were
to set this sysctl to a more restrictive option in the host pidns you
would need to reboot your machine in order to reset it.
The justification given in [1] is that this is a security feature and thus
it should not be possible to disable. Aside from the fact that we have
plenty of security-related sysctls that can be disabled after being
enabled (fs.protected_symlinks for instance), the protection provided by
the sysctl is to stop users from being able to create a binary and then
execute it. A user with CAP_SYS_ADMIN can trivially do this without
memfd_create(2):
% cat mount-memfd.c
#define _GNU_SOURCE /* for asprintf() */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/mount.h>
#define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
int main(void)
{
int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
assert(fsfd >= 0);
assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2));
int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
assert(dfd >= 0);
int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0755); /* 0782 is not valid octal; exec bit needed for execve() below */
assert(execfd >= 0);
assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
assert(!close(execfd));
char *execpath = NULL;
char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
assert(execfd >= 0);
assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
assert(!execve(execpath, argv, envp));
}
% ./mount-memfd
this file was executed from this totally private tmpfs: /proc/self/fd/5
%
Given that it is possible for CAP_SYS_ADMIN users to create executable
binaries without memfd_create(2) and without touching the host filesystem
(not to mention the many other things a CAP_SYS_ADMIN process would be
able to do that would be equivalent or worse), it seems strange to cause a
fair amount of headache to admins when there doesn't appear to be an
actual security benefit to blocking this. There appear to be concerns
about confused-deputy-esque attacks[2] but a confused deputy that can
write to arbitrary sysctls is a bigger security issue than executable
memfds.
/* New API */
The primary requirement from the original author appears to be more based
on the need to be able to restrict an entire system in a hierarchical
manner[3], such that child namespaces cannot re-enable executable memfds.
So, implement that behaviour explicitly -- the vm.memfd_noexec scope is
evaluated up the pidns tree to &init_pid_ns and you have the most
restrictive value applied to you. The new lower limit you can set
vm.memfd_noexec to is whatever limit applies to your parent.
Note that a pidns will inherit a copy of the parent pidns's effective
vm.memfd_noexec setting at unshare() time. This matches the existing
behaviour, and it also ensures that a pidns will never have its
vm.memfd_noexec setting *lowered* behind its back (but it will be raised
if the parent raises theirs).
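As a rough illustration of the resulting semantics, here is a hedged
userspace sketch (not from the patch; it assumes it runs in a child pidns
whose parent enforces a non-zero value, and needs CAP_SYS_ADMIN over that
pidns to attempt the write): reading returns the effective value over all
ancestors, and writing anything below the parent's limit is refused by
proc_dointvec_minmax() with EINVAL:
#include <errno.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
	FILE *f = fopen("/proc/sys/vm/memfd_noexec", "r+");
	int effective;
	if (!f) {
		perror("fopen");
		return 1;
	}
	/* Reads back the max() over this pidns and all of its ancestors. */
	if (fscanf(f, "%d", &effective) == 1)
		printf("effective vm.memfd_noexec: %d\n", effective);
	/* Trying to relax below the parent's effective value now fails. */
	rewind(f);
	errno = 0;
	if (fprintf(f, "0\n") < 0 || fflush(f) != 0)
		printf("lowering rejected: %s\n", strerror(errno));
	fclose(f);
	return 0;
}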
/* Backwards Compatibility */
As the previous version of the sysctl didn't allow you to lower the
setting at all, there are no backwards compatibility issues with this
aspect of the change.
However, it should be noted that the setting is now completely
hierarchical. Previously, a cloned pidns would just copy the current
pidns setting, meaning that if the parent's vm.memfd_noexec was changed it
wouldn't propagate to existing pid namespaces. Now, the restriction
applies recursively. This is a uAPI change, however:
* The sysctl is very new, having been merged in 6.3.
* Several aspects of the sysctl were broken up until this patchset and
the other patchset by Jeff Xu last month.
And thus it seems incredibly unlikely that any real users would run into
this issue. In the worst case, if this causes userspace issues we could
make it so that modifying the setting follows the hierarchical rules but
the restriction checking uses the cached copy.
[1]: https://lore.kernel.org/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5TKHJw…
[2]: https://lore.kernel.org/CALmYWFs_dNCzw_pW1yRAo4bGCPEtykroEQaowNULp7svwMLjOg…
[3]: https://lore.kernel.org/CALmYWFuahdUF7cT4cm7_TGLqPanuHXJ-hVSfZt7vpTnc18DPrw…
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-4-7ff9e3e1…
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Dominique Martinet <asmadeus(a)codewreck.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Daniel Verkamp <dverkamp(a)chromium.org>
Cc: Jeff Xu <jeffxu(a)google.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pid_namespace.h | 23 ++++++++++++++++++++++-
kernel/pid.c | 3 +++
kernel/pid_namespace.c | 6 +++---
kernel/pid_sysctl.h | 30 +++++++++++++-----------------
mm/memfd.c | 3 ++-
5 files changed, 43 insertions(+), 22 deletions(-)
--- a/include/linux/pid_namespace.h~memfd-replace-ratcheting-feature-from-vmmemfd_noexec-with-hierarchy
+++ a/include/linux/pid_namespace.h
@@ -39,7 +39,6 @@ struct pid_namespace {
int reboot; /* group exit code if this pidns was rebooted */
struct ns_common ns;
#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
- /* sysctl for vm.memfd_noexec */
int memfd_noexec_scope;
#endif
} __randomize_layout;
@@ -56,6 +55,23 @@ static inline struct pid_namespace *get_
return ns;
}
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
+static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns)
+{
+ int scope = MEMFD_NOEXEC_SCOPE_EXEC;
+
+ for (; ns; ns = ns->parent)
+ scope = max(scope, READ_ONCE(ns->memfd_noexec_scope));
+
+ return scope;
+}
+#else
+static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns)
+{
+ return 0;
+}
+#endif
+
extern struct pid_namespace *copy_pid_ns(unsigned long flags,
struct user_namespace *user_ns, struct pid_namespace *ns);
extern void zap_pid_ns_processes(struct pid_namespace *pid_ns);
@@ -70,6 +86,11 @@ static inline struct pid_namespace *get_
return ns;
}
+static inline int pidns_memfd_noexec_scope(struct pid_namespace *ns)
+{
+ return 0;
+}
+
static inline struct pid_namespace *copy_pid_ns(unsigned long flags,
struct user_namespace *user_ns, struct pid_namespace *ns)
{
--- a/kernel/pid.c~memfd-replace-ratcheting-feature-from-vmmemfd_noexec-with-hierarchy
+++ a/kernel/pid.c
@@ -83,6 +83,9 @@ struct pid_namespace init_pid_ns = {
#ifdef CONFIG_PID_NS
.ns.ops = &pidns_operations,
#endif
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
+ .memfd_noexec_scope = MEMFD_NOEXEC_SCOPE_EXEC,
+#endif
};
EXPORT_SYMBOL_GPL(init_pid_ns);
--- a/kernel/pid_namespace.c~memfd-replace-ratcheting-feature-from-vmmemfd_noexec-with-hierarchy
+++ a/kernel/pid_namespace.c
@@ -110,9 +110,9 @@ static struct pid_namespace *create_pid_
ns->user_ns = get_user_ns(user_ns);
ns->ucounts = ucounts;
ns->pid_allocated = PIDNS_ADDING;
-
- initialize_memfd_noexec_scope(ns);
-
+#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
+ ns->memfd_noexec_scope = pidns_memfd_noexec_scope(parent_pid_ns);
+#endif
return ns;
out_free_idr:
--- a/kernel/pid_sysctl.h~memfd-replace-ratcheting-feature-from-vmmemfd_noexec-with-hierarchy
+++ a/kernel/pid_sysctl.h
@@ -5,33 +5,30 @@
#include <linux/pid_namespace.h>
#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
-static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns)
-{
- ns->memfd_noexec_scope =
- task_active_pid_ns(current)->memfd_noexec_scope;
-}
-
static int pid_mfd_noexec_dointvec_minmax(struct ctl_table *table,
int write, void *buf, size_t *lenp, loff_t *ppos)
{
struct pid_namespace *ns = task_active_pid_ns(current);
struct ctl_table table_copy;
+ int err, scope, parent_scope;
if (write && !ns_capable(ns->user_ns, CAP_SYS_ADMIN))
return -EPERM;
table_copy = *table;
- if (ns != &init_pid_ns)
- table_copy.data = &ns->memfd_noexec_scope;
-
- /*
- * set minimum to current value, the effect is only bigger
- * value is accepted.
- */
- if (*(int *)table_copy.data > *(int *)table_copy.extra1)
- table_copy.extra1 = table_copy.data;
- return proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos);
+ /* You cannot set a lower enforcement value than your parent. */
+ parent_scope = pidns_memfd_noexec_scope(ns->parent);
+ /* Equivalent to pidns_memfd_noexec_scope(ns). */
+ scope = max(READ_ONCE(ns->memfd_noexec_scope), parent_scope);
+
+ table_copy.data = &scope;
+ table_copy.extra1 = &parent_scope;
+
+ err = proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos);
+ if (!err && write)
+ WRITE_ONCE(ns->memfd_noexec_scope, scope);
+ return err;
}
static struct ctl_table pid_ns_ctl_table_vm[] = {
@@ -51,7 +48,6 @@ static inline void register_pid_ns_sysct
register_sysctl("vm", pid_ns_ctl_table_vm);
}
#else
-static inline void initialize_memfd_noexec_scope(struct pid_namespace *ns) {}
static inline void register_pid_ns_sysctl_table_vm(void) {}
#endif
--- a/mm/memfd.c~memfd-replace-ratcheting-feature-from-vmmemfd_noexec-with-hierarchy
+++ a/mm/memfd.c
@@ -271,7 +271,8 @@ long memfd_fcntl(struct file *file, unsi
static int check_sysctl_memfd_noexec(unsigned int *flags)
{
#ifdef CONFIG_SYSCTL
- int sysctl = task_active_pid_ns(current)->memfd_noexec_scope;
+ struct pid_namespace *ns = task_active_pid_ns(current);
+ int sysctl = pidns_memfd_noexec_scope(ns);
if (!(*flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
if (sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL)
_
Patches currently in -mm which might be from cyphar(a)cyphar.com are
The quilt patch titled
Subject: memfd: improve userspace warnings for missing exec-related flags
has been removed from the -mm tree. Its filename was
memfd-improve-userspace-warnings-for-missing-exec-related-flags.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Aleksa Sarai <cyphar(a)cyphar.com>
Subject: memfd: improve userspace warnings for missing exec-related flags
Date: Mon, 14 Aug 2023 18:40:59 +1000
In order to incentivise userspace to switch to passing MFD_EXEC and
MFD_NOEXEC_SEAL, we need to provide a warning on each attempt to call
memfd_create() without the new flags. pr_warn_once() is not useful
because on most systems the one warning is burned up during the boot
process (on my system, systemd does this within the first second of boot)
and thus userspace will in practice never see the warnings to push them to
switch to the new flags.
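For reference, a minimal userspace sketch of what the message nudges
callers toward (the MFD_NOEXEC_SEAL fallback value is an assumption taken
from the uapi header and is not part of this patch):
#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#ifndef MFD_NOEXEC_SEAL		/* older libc headers may not define it */
#define MFD_NOEXEC_SEAL 0x0008U
#endif
int main(void)
{
	/* Data-only memfd: state the non-executable intent explicitly and
	 * seal it, so the kernel has nothing to log about this caller. */
	int fd = memfd_create("data-buf", MFD_CLOEXEC | MFD_NOEXEC_SEAL);
	if (fd < 0) {
		perror("memfd_create");
		return 1;
	}
	printf("created exec-sealed memfd %d\n", fd);
	return 0;
}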
The original patchset[1] used pr_warn_ratelimited(), however there were
concerns about the degree of spam in the kernel log[2,3]. The resulting
inability to detect every case was flagged as an issue at the time[4].
While we could come up with an alternative rate-limiting scheme such as
only outputting the message if vm.memfd_noexec has been modified, or only
outputting the message once for a given task, these alternatives have
downsides that don't make sense given how low-stakes a single kernel
warning message is. Switching to pr_info_ratelimited() instead should be
fine -- it's possible some monitoring tool will be unhappy with a stream
of warning-level messages but there's already plenty of info-level message
spam in dmesg.
[1]: https://lore.kernel.org/20221215001205.51969-4-jeffxu@google.com/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
[3]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[4]: https://lore.kernel.org/f185bb42-b29c-977e-312e-3349eea15383@linuxfoundatio…
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-3-7ff9e3e1…
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Daniel Verkamp <dverkamp(a)chromium.org>
Cc: Dominique Martinet <asmadeus(a)codewreck.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memfd.c~memfd-improve-userspace-warnings-for-missing-exec-related-flags
+++ a/mm/memfd.c
@@ -315,7 +315,7 @@ SYSCALL_DEFINE2(memfd_create,
return -EINVAL;
if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
- pr_warn_once(
+ pr_info_ratelimited(
"%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
current->comm, task_pid_nr(current));
}
_
Patches currently in -mm which might be from cyphar(a)cyphar.com are
The quilt patch titled
Subject: memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2
has been removed from the -mm tree. Its filename was
memfd-do-not-eacces-old-memfd_create-users-with-vmmemfd_noexec=2.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Aleksa Sarai <cyphar(a)cyphar.com>
Subject: memfd: do not -EACCES old memfd_create() users with vm.memfd_noexec=2
Date: Mon, 14 Aug 2023 18:40:58 +1000
Given the difficulty of auditing all of userspace to figure out whether
every memfd_create() user has switched to passing MFD_EXEC and
MFD_NOEXEC_SEAL flags, it seems far less disruptive to make it possible
for older programs that don't make use of executable memfds to run under
vm.memfd_noexec=2. Otherwise, a small dependency change can result in
spurious errors. For programs that don't use executable memfds, passing
MFD_NOEXEC_SEAL is functionally a no-op, so implying it for such callers
instead of rejecting them is harmless.
In addition, every failure under vm.memfd_noexec=2 needs to print to the
kernel log so that userspace can figure out where the error came from.
The concerns about pr_warn_ratelimited() spam that caused the switch to
pr_warn_once()[1,2] do not apply to the vm.memfd_noexec=2 case.
This is a user-visible API change, but as it allows programs to do
something that would be blocked before, and the sysctl itself was broken
and recently released, it seems unlikely this will cause any issues.
[1]: https://lore.kernel.org/Y5yS8wCnuYGLHMj4@x1n/
[2]: https://lore.kernel.org/202212161233.85C9783FB@keescook/
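A hedged sketch of the user-visible difference (not a test from this
patch; flag and seal names are from the uapi headers): a legacy caller
that passes neither MFD_EXEC nor MFD_NOEXEC_SEAL now succeeds under
vm.memfd_noexec=2 and gets a non-executable, exec-sealed fd instead of
-EACCES:
#define _GNU_SOURCE
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
int main(void)
{
	/* Legacy caller: no MFD_EXEC, no MFD_NOEXEC_SEAL. */
	int fd = memfd_create("legacy", MFD_CLOEXEC | MFD_ALLOW_SEALING);
	if (fd < 0) {
		/* Pre-fix behaviour under vm.memfd_noexec=2: -EACCES. */
		perror("memfd_create");
		return 1;
	}
	/* Post-fix: MFD_NOEXEC_SEAL is implied, so F_SEAL_EXEC is set. */
	printf("seals: %#x\n", (unsigned int)fcntl(fd, F_GET_SEALS));
	return 0;
}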
Link: https://lkml.kernel.org/r/20230814-memfd-vm-noexec-uapi-fixes-v2-2-7ff9e3e1…
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Dominique Martinet <asmadeus(a)codewreck.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Daniel Verkamp <dverkamp(a)chromium.org>
Cc: Jeff Xu <jeffxu(a)google.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/pid_namespace.h | 16 ++--------
mm/memfd.c | 30 ++++++-------------
tools/testing/selftests/memfd/memfd_test.c | 22 ++++++++++---
3 files changed, 32 insertions(+), 36 deletions(-)
--- a/include/linux/pid_namespace.h~memfd-do-not-eacces-old-memfd_create-users-with-vmmemfd_noexec=2
+++ a/include/linux/pid_namespace.h
@@ -17,18 +17,10 @@
struct fs_pin;
#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
-/*
- * sysctl for vm.memfd_noexec
- * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
- * acts like MFD_EXEC was set.
- * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
- * acts like MFD_NOEXEC_SEAL was set.
- * 2: memfd_create() without MFD_NOEXEC_SEAL will be
- * rejected.
- */
-#define MEMFD_NOEXEC_SCOPE_EXEC 0
-#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1
-#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2
+/* modes for vm.memfd_noexec sysctl */
+#define MEMFD_NOEXEC_SCOPE_EXEC 0 /* MFD_EXEC implied if unset */
+#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1 /* MFD_NOEXEC_SEAL implied if unset */
+#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2 /* same as 1, except MFD_EXEC rejected */
#endif
struct pid_namespace {
--- a/mm/memfd.c~memfd-do-not-eacces-old-memfd_create-users-with-vmmemfd_noexec=2
+++ a/mm/memfd.c
@@ -271,30 +271,22 @@ long memfd_fcntl(struct file *file, unsi
static int check_sysctl_memfd_noexec(unsigned int *flags)
{
#ifdef CONFIG_SYSCTL
- char comm[TASK_COMM_LEN];
- int sysctl = MEMFD_NOEXEC_SCOPE_EXEC;
- struct pid_namespace *ns;
-
- ns = task_active_pid_ns(current);
- if (ns)
- sysctl = ns->memfd_noexec_scope;
+ int sysctl = task_active_pid_ns(current)->memfd_noexec_scope;
if (!(*flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
- if (sysctl == MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL)
+ if (sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL)
*flags |= MFD_NOEXEC_SEAL;
else
*flags |= MFD_EXEC;
}
- if (*flags & MFD_EXEC && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) {
- pr_warn_once(
- "memfd_create(): MFD_NOEXEC_SEAL is enforced, pid=%d '%s'\n",
- task_pid_nr(current), get_task_comm(comm, current));
-
+ if (!(*flags & MFD_NOEXEC_SEAL) && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) {
+ pr_err_ratelimited(
+ "%s[%d]: memfd_create() requires MFD_NOEXEC_SEAL with vm.memfd_noexec=%d\n",
+ current->comm, task_pid_nr(current), sysctl);
return -EACCES;
}
#endif
-
return 0;
}
@@ -302,7 +294,6 @@ SYSCALL_DEFINE2(memfd_create,
const char __user *, uname,
unsigned int, flags)
{
- char comm[TASK_COMM_LEN];
unsigned int *file_seals;
struct file *file;
int fd, error;
@@ -325,12 +316,13 @@ SYSCALL_DEFINE2(memfd_create,
if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
pr_warn_once(
- "memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=%d '%s'\n",
- task_pid_nr(current), get_task_comm(comm, current));
+ "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
+ current->comm, task_pid_nr(current));
}
- if (check_sysctl_memfd_noexec(&flags) < 0)
- return -EACCES;
+ error = check_sysctl_memfd_noexec(&flags);
+ if (error < 0)
+ return error;
/* length includes terminating zero */
len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
--- a/tools/testing/selftests/memfd/memfd_test.c~memfd-do-not-eacces-old-memfd_create-users-with-vmmemfd_noexec=2
+++ a/tools/testing/selftests/memfd/memfd_test.c
@@ -1145,11 +1145,23 @@ static void test_sysctl_child(void)
printf("%s sysctl 2\n", memfd_str);
sysctl_assert_write("2");
- mfd_fail_new("kern_memfd_sysctl_2",
- MFD_CLOEXEC | MFD_ALLOW_SEALING);
- mfd_fail_new("kern_memfd_sysctl_2_MFD_EXEC",
- MFD_CLOEXEC | MFD_EXEC);
- fd = mfd_assert_new("", 0, MFD_NOEXEC_SEAL);
+ mfd_fail_new("kern_memfd_sysctl_2_exec",
+ MFD_EXEC | MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_2_dfl",
+ mfd_def_size,
+ MFD_CLOEXEC | MFD_ALLOW_SEALING);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_fail_chmod(fd, 0777);
+ close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_seal",
+ mfd_def_size,
+ MFD_NOEXEC_SEAL | MFD_CLOEXEC | MFD_ALLOW_SEALING);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_fail_chmod(fd, 0777);
close(fd);
sysctl_fail_write("0");
_
Patches currently in -mm which might be from cyphar(a)cyphar.com are
The quilt patch titled
Subject: Multi-gen LRU: avoid race in inc_min_seq()
has been removed from the -mm tree. Its filename was
mm-unstable-multi-gen-lru-avoid-race-in-inc_min_seq.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kalesh Singh <kaleshsingh(a)google.com>
Subject: Multi-gen LRU: avoid race in inc_min_seq()
Date: Tue, 1 Aug 2023 19:56:03 -0700
inc_max_seq() will try to inc_min_seq() if nr_gens == MAX_NR_GENS. This
is because the generations are reused (the last oldest now empty
generation will become the next youngest generation).
inc_min_seq() is retried until successful, dropping the lru_lock
and yielding the CPU on each failure, and retaking the lock before
trying again:
while (!inc_min_seq(lruvec, type, can_swap)) {
spin_unlock_irq(&lruvec->lru_lock);
cond_resched();
spin_lock_irq(&lruvec->lru_lock);
}
However, the initial condition that required incrementing the min_seq
(nr_gens == MAX_NR_GENS) is not retested. This can change by another
call to inc_max_seq() from run_aging() with force_scan=true from the
debugfs interface.
Since the eviction stalls when the nr_gens == MIN_NR_GENS, avoid
unnecessarily incrementing the min_seq by rechecking the number of
generations before each attempt.
This issue was uncovered in previous discussion on the list by Yu Zhao
and Aneesh Kumar [1].
[1] https://lore.kernel.org/linux-mm/CAOUHufbO7CaVm=xjEb1avDhHVvnC8pJmGyKcFf2iY…
Link: https://lkml.kernel.org/r/20230802025606.346758-2-kaleshsingh@google.com
Fixes: d6c3af7d8a2b ("mm: multi-gen LRU: debugfs interface")
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com> [mediatek]
Tested-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Aneesh Kumar K V <aneesh.kumar(a)linux.ibm.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Jan Alexander Steffens (heftig) <heftig(a)archlinux.org>
Cc: Lecopzer Chen <lecopzer.chen(a)mediatek.com>
Cc: Matthias Brugger <matthias.bgg(a)gmail.com>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Steven Barrett <steven(a)liquorix.net>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
--- a/mm/vmscan.c~mm-unstable-multi-gen-lru-avoid-race-in-inc_min_seq
+++ a/mm/vmscan.c
@@ -4439,7 +4439,7 @@ static void inc_max_seq(struct lruvec *l
int prev, next;
int type, zone;
struct lru_gen_folio *lrugen = &lruvec->lrugen;
-
+restart:
spin_lock_irq(&lruvec->lru_lock);
VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4450,11 +4450,12 @@ static void inc_max_seq(struct lruvec *l
VM_WARN_ON_ONCE(!force_scan && (type == LRU_GEN_FILE || can_swap));
- while (!inc_min_seq(lruvec, type, can_swap)) {
- spin_unlock_irq(&lruvec->lru_lock);
- cond_resched();
- spin_lock_irq(&lruvec->lru_lock);
- }
+ if (inc_min_seq(lruvec, type, can_swap))
+ continue;
+
+ spin_unlock_irq(&lruvec->lru_lock);
+ cond_resched();
+ goto restart;
}
/*
_
Patches currently in -mm which might be from kaleshsingh(a)google.com are
The quilt patch titled
Subject: Multi-gen LRU: fix per-zone reclaim
has been removed from the -mm tree. Its filename was
mm-unstable-multi-gen-lru-fix-per-zone-reclaim.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Kalesh Singh <kaleshsingh(a)google.com>
Subject: Multi-gen LRU: fix per-zone reclaim
Date: Tue, 1 Aug 2023 19:56:02 -0700
MGLRU has a LRU list for each zone for each type (anon/file) in each
generation:
long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
The min_seq (oldest generation) can progress independently for each
type but the max_seq (youngest generation) is shared for both anon and
file. This is to maintain a common frame of reference.
In order for eviction to advance the min_seq of a type, all the per-zone
lists in the oldest generation of that type must be empty.
The eviction logic only considers pages from eligible zones for
eviction or promotion.
scan_folios() {
...
for (zone = sc->reclaim_idx; zone >= 0; zone--) {
...
sort_folio(); // Promote
...
isolate_folio(); // Evict
}
...
}
Consider a system that has the movable zone configured and the default 4
generations. The current state of the system is as shown below
(only illustrating one type for simplicity):
Type: ANON
Zone DMA32 Normal Movable Device
Gen 0 0 0 4GB 0
Gen 1 0 1GB 1MB 0
Gen 2 1MB 4GB 1MB 0
Gen 3 1MB 1MB 1MB 0
Now consider there is a GFP_KERNEL allocation request (eligible zone
index <= Normal), evict_folios() will return without doing any work
since there are no pages to scan in the eligible zones of the oldest
generation. Reclaim won't make progress until triggered from a ZONE_MOVABLE
allocation request, which may not happen soon if there is a lot of free
memory in the movable zone. This can lead to OOM kills, although there
is 1GB of pages in the Normal zone of Gen 1 that we have not yet tried to
reclaim.
This issue is not seen in the conventional active/inactive LRU since
there are no per-zone lists.
If there are no (not enough) folios to scan in the eligible zones, move
folios from ineligible zone (zone_index > reclaim_index) to the next
generation. This allows for the progression of min_seq and reclaiming
from the next generation (Gen 1).
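To make the new visit order concrete, here is a standalone sketch (not
kernel code; the zone layout and reclaim_idx value are assumed) of the
order produced by the reworked loop in scan_folios(): eligible zones are
walked first, from the highest eligible zone down to 0, and ineligible
zones come last so their folios can be moved to the next generation:
#include <stdio.h>
#define MAX_NR_ZONES 4	/* assumed layout: 0=DMA32, 1=Normal, 2=Movable, 3=Device */
int main(void)
{
	int reclaim_idx = 1;	/* assumed: a GFP_KERNEL-style request capped at Normal */
	for (int i = MAX_NR_ZONES; i > 0; i--) {
		int zone = (reclaim_idx + i) % MAX_NR_ZONES;
		printf("zone %d%s\n", zone,
		       zone > reclaim_idx ? " (ineligible: folios bumped to next gen)" : "");
	}
	return 0;
}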
Qualcomm, Mediatek and raspberrypi [1] discovered this issue independently.
[1] https://github.com/raspberrypi/linux/issues/5395
Link: https://lkml.kernel.org/r/20230802025606.346758-1-kaleshsingh@google.com
Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com>
Reported-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Reported-by: Lecopzer Chen <lecopzer.chen(a)mediatek.com>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com> [mediatek]
Tested-by: Charan Teja Kalla <quic_charante(a)quicinc.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Jan Alexander Steffens (heftig) <heftig(a)archlinux.org>
Cc: Matthias Brugger <matthias.bgg(a)gmail.com>
Cc: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: Steven Barrett <steven(a)liquorix.net>
Cc: Suleiman Souhlal <suleiman(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Aneesh Kumar K V <aneesh.kumar(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
--- a/mm/vmscan.c~mm-unstable-multi-gen-lru-fix-per-zone-reclaim
+++ a/mm/vmscan.c
@@ -4889,7 +4889,8 @@ static int lru_gen_memcg_seg(struct lruv
* the eviction
******************************************************************************/
-static bool sort_folio(struct lruvec *lruvec, struct folio *folio, int tier_idx)
+static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
+ int tier_idx)
{
bool success;
int gen = folio_lru_gen(folio);
@@ -4939,6 +4940,13 @@ static bool sort_folio(struct lruvec *lr
return true;
}
+ /* ineligible */
+ if (zone > sc->reclaim_idx) {
+ gen = folio_inc_gen(lruvec, folio, false);
+ list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+ return true;
+ }
+
/* waiting for writeback */
if (folio_test_locked(folio) || folio_test_writeback(folio) ||
(type == LRU_GEN_FILE && folio_test_dirty(folio))) {
@@ -4987,7 +4995,8 @@ static bool isolate_folio(struct lruvec
static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
int type, int tier, struct list_head *list)
{
- int gen, zone;
+ int i;
+ int gen;
enum vm_event_item item;
int sorted = 0;
int scanned = 0;
@@ -5003,9 +5012,10 @@ static int scan_folios(struct lruvec *lr
gen = lru_gen_from_seq(lrugen->min_seq[type]);
- for (zone = sc->reclaim_idx; zone >= 0; zone--) {
+ for (i = MAX_NR_ZONES; i > 0; i--) {
LIST_HEAD(moved);
int skipped = 0;
+ int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
struct list_head *head = &lrugen->folios[gen][type][zone];
while (!list_empty(head)) {
@@ -5019,7 +5029,7 @@ static int scan_folios(struct lruvec *lr
scanned += delta;
- if (sort_folio(lruvec, folio, tier))
+ if (sort_folio(lruvec, folio, sc, tier))
sorted += delta;
else if (isolate_folio(lruvec, folio, sc)) {
list_add(&folio->lru, list);
_
Patches currently in -mm which might be from kaleshsingh(a)google.com are
The quilt patch titled
Subject: mm: multi-gen LRU: don't spin during memcg release
has been removed from the -mm tree. Its filename was
mm-multi-gen-lru-dont-spin-during-memcg-release.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "T.J. Mercier" <tjmercier(a)google.com>
Subject: mm: multi-gen LRU: don't spin during memcg release
Date: Mon, 14 Aug 2023 15:16:36 +0000
When a memcg is in the process of being released mem_cgroup_tryget will
fail because its reference count has already reached 0. This can happen
during reclaim if the memcg has already been offlined, and we reclaim all
remaining pages attributed to the offlined memcg. shrink_many attempts to
skip the empty memcg in this case, and continue reclaiming from the
remaining memcgs in the old generation. If there is only one memcg
remaining, or if all remaining memcgs are in the process of being released
then shrink_many will spin until all memcgs have finished being released.
The release occurs through a workqueue, so it can take a while before
kswapd is able to make any further progress.
This fix results in reductions in kswapd activity and direct reclaim in
a test where 28 apps (working set size > total memory) are repeatedly
launched in a random sequence:
A B delta ratio(%)
allocstall_movable 5962 3539 -2423 -40.64
allocstall_normal 2661 2417 -244 -9.17
kswapd_high_wmark_hit_quickly 53152 7594 -45558 -85.71
pageoutrun 57365 11750 -45615 -79.52
Link: https://lkml.kernel.org/r/20230814151636.1639123-1-tjmercier@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: T.J. Mercier <tjmercier(a)google.com>
Acked-by: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
--- a/mm/vmscan.c~mm-multi-gen-lru-dont-spin-during-memcg-release
+++ a/mm/vmscan.c
@@ -4854,16 +4854,17 @@ void lru_gen_release_memcg(struct mem_cg
spin_lock_irq(&pgdat->memcg_lru.lock);
- VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+ if (hlist_nulls_unhashed(&lruvec->lrugen.list))
+ goto unlock;
gen = lruvec->lrugen.gen;
- hlist_nulls_del_rcu(&lruvec->lrugen.list);
+ hlist_nulls_del_init_rcu(&lruvec->lrugen.list);
pgdat->memcg_lru.nr_memcgs[gen]--;
if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
+unlock:
spin_unlock_irq(&pgdat->memcg_lru.lock);
}
}
@@ -5435,8 +5436,10 @@ restart:
rcu_read_lock();
hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
- if (op)
+ if (op) {
lru_gen_rotate_memcg(lruvec, op);
+ op = 0;
+ }
mem_cgroup_put(memcg);
@@ -5444,7 +5447,7 @@ restart:
memcg = lruvec_memcg(lruvec);
if (!mem_cgroup_tryget(memcg)) {
- op = 0;
+ lru_gen_release_memcg(memcg);
memcg = NULL;
continue;
}
_
Patches currently in -mm which might be from tjmercier(a)google.com are
The quilt patch titled
Subject: mm: memory-failure: fix unexpected return value in soft_offline_page()
has been removed from the -mm tree. Its filename was
mm-memory-failure-fix-unexpected-return-value-in-soft_offline_page.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm: memory-failure: fix unexpected return value in soft_offline_page()
Date: Tue, 27 Jun 2023 19:28:08 +0800
When page_handle_poison() fails to handle the hugepage or free page in
retry path, soft_offline_page() will return 0 while -EBUSY is expected in
this case.
Consequently the user will think soft_offline_page() succeeded while it in
fact failed, and so will not try again later in this case.
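For context, a hedged userspace sketch of the retry pattern that breaks
when a failure is reported as success (not from the patch;
MADV_SOFT_OFFLINE needs CAP_SYS_ADMIN and CONFIG_MEMORY_FAILURE, and the
fallback constant is taken from the uapi header):
#include <sys/mman.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#ifndef MADV_SOFT_OFFLINE
#define MADV_SOFT_OFFLINE 101	/* from asm-generic/mman-common.h */
#endif
int main(void)
{
	long pagesz = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;
	*(volatile char *)p = 1;	/* populate the page */
	for (int attempt = 0; attempt < 3; attempt++) {
		if (madvise(p, pagesz, MADV_SOFT_OFFLINE) == 0) {
			/* With the bug, a failed retry path could land here
			 * too, so the caller would never try again. */
			puts("soft offline reported success");
			return 0;
		}
		fprintf(stderr, "attempt %d: %s, retrying\n",
			attempt, strerror(errno));
	}
	return 1;
}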
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-fix-unexpected-return-value-in-soft_offline_page
+++ a/mm/memory-failure.c
@@ -2741,10 +2741,13 @@ retry:
if (ret > 0) {
ret = soft_offline_in_use_page(page);
} else if (ret == 0) {
- if (!page_handle_poison(page, true, false) && try_again) {
- try_again = false;
- flags &= ~MF_COUNT_INCREASED;
- goto retry;
+ if (!page_handle_poison(page, true, false)) {
+ if (try_again) {
+ try_again = false;
+ flags &= ~MF_COUNT_INCREASED;
+ goto retry;
+ }
+ ret = -EBUSY;
}
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-potential-page-refcnt-leak-in-memory_failure.patch
mm-memcg-fix-obsolete-function-name-in-mem_cgroup_protection.patch
mm-memory-failure-add-pageoffline-check.patch
mm-page_alloc-avoid-unneeded-alike_pages-calculation.patch
mm-memcg-update-obsolete-comment-above-parent_mem_cgroup.patch
mm-page_alloc-remove-unneeded-variable-base.patch
mm-memcg-fix-wrong-function-name-above-obj_cgroup_charge_zswap.patch
mm-memory-failure-use-helper-macro-llist_for_each_entry_safe.patch
mm-mm_init-use-helper-macro-bits_per_long-and-bits_per_byte.patch
The quilt patch titled
Subject: radix tree: remove unused variable
has been removed from the -mm tree. Its filename was
radix-tree-remove-unused-variable.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: radix tree: remove unused variable
Date: Fri, 11 Aug 2023 15:10:13 +0200
Recent versions of clang warn about an unused variable, though older
versions saw the 'slot++' as a use and did not warn:
radix-tree.c:1136:50: error: parameter 'slot' set but not used [-Werror,-Wunused-but-set-parameter]
It's clearly not needed any more, so just remove it.
Link: https://lkml.kernel.org/r/20230811131023.2226509-1-arnd@kernel.org
Fixes: 3a08cd52c37c7 ("radix tree: Remove multiorder support")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Rong Tao <rongtao(a)cestc.cn>
Cc: Tom Rix <trix(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/radix-tree.c | 1 -
1 file changed, 1 deletion(-)
--- a/lib/radix-tree.c~radix-tree-remove-unused-variable
+++ a/lib/radix-tree.c
@@ -1136,7 +1136,6 @@ static void set_iter_tags(struct radix_t
void __rcu **radix_tree_iter_resume(void __rcu **slot,
struct radix_tree_iter *iter)
{
- slot++;
iter->index = __radix_tree_iter_add(iter, 1);
iter->next_index = iter->index;
iter->tags = 0;
_
Patches currently in -mm which might be from arnd(a)arndb.de are
iomem-remove-__weak-ioremap_cache-helper.patch
The quilt patch titled
Subject: mm: add a call to flush_cache_vmap() in vmap_pfn()
has been removed from the -mm tree. Its filename was
mm-add-a-call-to-flush_cache_vmap-in-vmap_pfn.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Subject: mm: add a call to flush_cache_vmap() in vmap_pfn()
Date: Wed, 9 Aug 2023 18:46:33 +0200
flush_cache_vmap() must be called after new vmalloc mappings are installed
in the page table in order to allow architectures to make sure the new
mapping is visible.
It could lead to a panic since on some architectures (like powerpc),
the page table walker could see the wrong pte value and trigger a
spurious page fault that can not be resolved (see commit f1cb8f9beba8
("powerpc/64s/radix: avoid ptesync after set_pte and
ptep_set_access_flags")).
But actually the patch is aiming at riscv: the riscv specification
allows the caching of invalid entries in the TLB, and since we recently
removed the vmalloc page fault handling, we now need to emit a tlb
shootdown whenever a new vmalloc mapping is emitted
(https://lore.kernel.org/linux-riscv/20230725132246.817726-1-alexghiti@rivos…).
That's a temporary solution, there are ways to avoid that :)
Link: https://lkml.kernel.org/r/20230809164633.1556126-1-alexghiti@rivosinc.com
Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Reported-by: Dylan Jhong <dylan(a)andestech.com>
Closes: https://lore.kernel.org/linux-riscv/ZMytNY2J8iyjbPPy@atctrx.andestech.com/
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Acked-by: Palmer Dabbelt <palmer(a)rivosinc.com>
Reviewed-by: Dylan Jhong <dylan(a)andestech.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/vmalloc.c~mm-add-a-call-to-flush_cache_vmap-in-vmap_pfn
+++ a/mm/vmalloc.c
@@ -2979,6 +2979,10 @@ void *vmap_pfn(unsigned long *pfns, unsi
free_vm_area(area);
return NULL;
}
+
+ flush_cache_vmap((unsigned long)area->addr,
+ (unsigned long)area->addr + count * PAGE_SIZE);
+
return area->addr;
}
EXPORT_SYMBOL_GPL(vmap_pfn);
_
Patches currently in -mm which might be from alexghiti(a)rivosinc.com are
The quilt patch titled
Subject: selftests/mm: FOLL_LONGTERM need to be updated to 0x100
has been removed from the -mm tree. Its filename was
selftests-mm-foll_longterm-need-to-be-updated-to-0x100.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ayush Jain <ayush.jain3(a)amd.com>
Subject: selftests/mm: FOLL_LONGTERM need to be updated to 0x100
Date: Tue, 8 Aug 2023 07:43:47 -0500
After commit 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to
internal.h") the FOLL_LONGTERM flag value was updated from 0x10000 to
0x100 in include/linux/mm_types.h.
As hmm.hmm_device_private.hmm_gup_test uses FOLL_LONGTERM, update it
here as well.
Before this change, the test goes into an infinite assert loop in
hmm.hmm_device_private.hmm_gup_test:
==========================================================
RUN hmm.hmm_device_private.hmm_gup_test ...
hmm-tests.c:1962:hmm_gup_test:Expected HMM_DMIRROR_PROT_WRITE..
..(2) == m[2] (34)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
hmm-tests.c:157:hmm_gup_test:Expected ret (-1) == 0 (0)
...
==========================================================
Call Trace:
<TASK>
? sched_clock+0xd/0x20
? __lock_acquire.constprop.0+0x120/0x6c0
? ktime_get+0x2c/0xd0
? sched_clock+0xd/0x20
? local_clock+0x12/0xd0
? lock_release+0x26e/0x3b0
pin_user_pages_fast+0x4c/0x70
gup_test_ioctl+0x4ff/0xbb0
? gup_test_ioctl+0x68c/0xbb0
__x64_sys_ioctl+0x99/0xd0
do_syscall_64+0x60/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? syscall_exit_to_user_mode+0x2a/0x50
? do_syscall_64+0x6d/0x90
? irqentry_exit_to_user_mode+0xd/0x20
? irqentry_exit+0x3f/0x50
? exc_page_fault+0x96/0x200
entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f6aaa31aaff
After this change, the test passes successfully.
Link: https://lkml.kernel.org/r/20230808124347.79163-1-ayush.jain3@amd.com
Fixes: 2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h")
Signed-off-by: Ayush Jain <ayush.jain3(a)amd.com>
Reviewed-by: Raghavendra K T <raghavendra.kt(a)amd.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hmm-tests.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/hmm-tests.c~selftests-mm-foll_longterm-need-to-be-updated-to-0x100
+++ a/tools/testing/selftests/mm/hmm-tests.c
@@ -57,9 +57,14 @@ enum {
#define ALIGN(x, a) (((x) + (a - 1)) & (~((a) - 1)))
/* Just the flags we need, copied from mm.h: */
+
+#ifndef FOLL_WRITE
#define FOLL_WRITE 0x01 /* check pte is writable */
-#define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite */
+#endif
+#ifndef FOLL_LONGTERM
+#define FOLL_LONGTERM 0x100 /* mapping lifetime is indefinite */
+#endif
FIXTURE(hmm)
{
int fd;
_
Patches currently in -mm which might be from ayush.jain3(a)amd.com are
selftests-mm-add-ksm_merge_time-tests.patch
The quilt patch titled
Subject: nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
has been removed from the -mm tree. Its filename was
nilfs2-fix-general-protection-fault-in-nilfs_lookup_dirty_data_buffers.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix general protection fault in nilfs_lookup_dirty_data_buffers()
Date: Sat, 5 Aug 2023 22:20:38 +0900
A syzbot stress test reported that create_empty_buffers() called from
nilfs_lookup_dirty_data_buffers() can cause a general protection fault.
Analysis using its reproducer revealed that the back reference "mapping"
from a page/folio has been changed to NULL after dirty page/folio gang
lookup in nilfs_lookup_dirty_data_buffers().
Fix this issue by excluding pages/folios from being collected if, after
acquiring a lock on each page/folio, its back reference "mapping" differs
from the pointer to the address space struct that held the page/folio.
Link: https://lkml.kernel.org/r/20230805132038.6435-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+0ad741797f4565e7e2d2(a)syzkaller.appspotmail.com
Closes: https://lkml.kernel.org/r/0000000000002930a705fc32b231@google.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 5 +++++
1 file changed, 5 insertions(+)
--- a/fs/nilfs2/segment.c~nilfs2-fix-general-protection-fault-in-nilfs_lookup_dirty_data_buffers
+++ a/fs/nilfs2/segment.c
@@ -725,6 +725,11 @@ static size_t nilfs_lookup_dirty_data_bu
struct folio *folio = fbatch.folios[i];
folio_lock(folio);
+ if (unlikely(folio->mapping != mapping)) {
+ /* Exclude folios removed from the address space */
+ folio_unlock(folio);
+ continue;
+ }
head = folio_buffers(folio);
if (!head) {
create_empty_buffers(&folio->page, i_blocksize(inode), 0);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-warning-in-mark_buffer_dirty-due-to-discarded-buffer-reuse.patch
The quilt patch titled
Subject: mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
has been removed from the -mm tree. Its filename was
mm-gup-handle-cont-pte-hugetlb-pages-correctly-in-gup_must_unshare-via-gup-fast.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: handle cont-PTE hugetlb pages correctly in gup_must_unshare() via GUP-fast
Date: Sat, 5 Aug 2023 12:12:56 +0200
In contrast to most other GUP code, GUP-fast common page table walking
code like gup_pte_range() also handles hugetlb pages. But in contrast to
other hugetlb page table walking code, it does not look at the hugetlb PTE
abstraction whereby we have only a single logical hugetlb PTE per hugetlb
page, even when using multiple cont-PTEs underneath -- which is for
example what huge_ptep_get() abstracts.
So when we have a hugetlb page that is mapped via cont-PTEs, GUP-fast
might stumble over a PTE that does not map the head page of a hugetlb page
-- not the first "head" PTE of such a cont mapping.
Logically, the whole hugetlb page is mapped (entire_mapcount == 1), but we
might end up calling gup_must_unshare() with a tail page of a hugetlb
page.
We only maintain a single PageAnonExclusive flag per hugetlb page (as
hugetlb pages cannot get partially COW-shared), stored for the head page.
That flag is clear for all tail pages.
So when gup_must_unshare() ends up calling PageAnonExclusive() with a tail
page of a hugetlb page:
1) With CONFIG_DEBUG_VM_PGFLAGS
Stumbles over the:
VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page);
For example, when executing the COW selftests with 64k hugetlb pages on
arm64:
[ 61.082187] page:00000000829819ff refcount:3 mapcount:1 mapping:0000000000000000 index:0x1 pfn:0x11ee11
[ 61.082842] head:0000000080f79bf7 order:4 entire_mapcount:1 nr_pages_mapped:0 pincount:2
[ 61.083384] anon flags: 0x17ffff80003000e(referenced|uptodate|dirty|head|mappedtodisk|node=0|zone=2|lastcpupid=0xfffff)
[ 61.084101] page_type: 0xffffffff()
[ 61.084332] raw: 017ffff800000000 fffffc00037b8401 0000000000000402 0000000200000000
[ 61.084840] raw: 0000000000000010 0000000000000000 00000000ffffffff 0000000000000000
[ 61.085359] head: 017ffff80003000e ffffd9e95b09b788 ffffd9e95b09b788 ffff0007ff63cf71
[ 61.085885] head: 0000000000000000 0000000000000002 00000003ffffffff 0000000000000000
[ 61.086415] page dumped because: VM_BUG_ON_PAGE(PageHuge(page) && !PageHead(page))
[ 61.086914] ------------[ cut here ]------------
[ 61.087220] kernel BUG at include/linux/page-flags.h:990!
[ 61.087591] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 61.087999] Modules linked in: ...
[ 61.089404] CPU: 0 PID: 4612 Comm: cow Kdump: loaded Not tainted 6.5.0-rc4+ #3
[ 61.089917] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
[ 61.090409] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 61.090897] pc : gup_must_unshare.part.0+0x64/0x98
[ 61.091242] lr : gup_must_unshare.part.0+0x64/0x98
[ 61.091592] sp : ffff8000825eb940
[ 61.091826] x29: ffff8000825eb940 x28: 0000000000000000 x27: fffffc00037b8440
[ 61.092329] x26: 0400000000000001 x25: 0000000000080101 x24: 0000000000080000
[ 61.092835] x23: 0000000000080100 x22: ffff0000cffb9588 x21: ffff0000c8ec6b58
[ 61.093341] x20: 0000ffffad6b1000 x19: fffffc00037b8440 x18: ffffffffffffffff
[ 61.093850] x17: 2864616548656761 x16: 5021202626202965 x15: 6761702865677548
[ 61.094358] x14: 6567615028454741 x13: 2929656761702864 x12: 6165486567615021
[ 61.094858] x11: 00000000ffff7fff x10: 00000000ffff7fff x9 : ffffd9e958b7a1c0
[ 61.095359] x8 : 00000000000bffe8 x7 : c0000000ffff7fff x6 : 00000000002bffa8
[ 61.095873] x5 : ffff0008bb19e708 x4 : 0000000000000000 x3 : 0000000000000000
[ 61.096380] x2 : 0000000000000000 x1 : ffff0000cf6636c0 x0 : 0000000000000046
[ 61.096894] Call trace:
[ 61.097080] gup_must_unshare.part.0+0x64/0x98
[ 61.097392] gup_pte_range+0x3a8/0x3f0
[ 61.097662] gup_pgd_range+0x1ec/0x280
[ 61.097942] lockless_pages_from_mm+0x64/0x1a0
[ 61.098258] internal_get_user_pages_fast+0xe4/0x1d0
[ 61.098612] pin_user_pages_fast+0x58/0x78
[ 61.098917] pin_longterm_test_start+0xf4/0x2b8
[ 61.099243] gup_test_ioctl+0x170/0x3b0
[ 61.099528] __arm64_sys_ioctl+0xa8/0xf0
[ 61.099822] invoke_syscall.constprop.0+0x7c/0xd0
[ 61.100160] el0_svc_common.constprop.0+0xe8/0x100
[ 61.100500] do_el0_svc+0x38/0xa0
[ 61.100736] el0_svc+0x3c/0x198
[ 61.100971] el0t_64_sync_handler+0x134/0x150
[ 61.101280] el0t_64_sync+0x17c/0x180
[ 61.101543] Code: aa1303e0 f00074c1 912b0021 97fffeb2 (d4210000)
2) Without CONFIG_DEBUG_VM_PGFLAGS
Always detects "not exclusive" for passed tail pages and refuses to PIN
the tail pages R/O, as gup_must_unshare() == true. GUP-fast will fallback
to ordinary GUP. As ordinary GUP properly considers the logical hugetlb
PTE abstraction in hugetlb_follow_page_mask(), pinning the page will
succeed when looking at the PageAnonExclusive on the head page only.
So the only real effect of this is that with cont-PTE hugetlb pages, we'll
always fallback from GUP-fast to ordinary GUP when not working on the head
page, which ends up checking the head page and do the right thing.
Consequently, the cow selftests pass with cont-PTE hugetlb pages as well
without CONFIG_DEBUG_VM_PGFLAGS.
Note that this only applies to anon hugetlb pages that are mapped using
cont-PTEs: for example 64k hugetlb pages on a 4k arm64 kernel.
... and only when R/O-pinning (FOLL_PIN) such pages that are mapped into
the page table R/O using GUP-fast.
On production kernels (and even most debug kernels, which don't set
CONFIG_DEBUG_VM_PGFLAGS), this patch should theoretically not need to be
backported. But of course, it does not hurt.
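As a rough, hedged illustration of the triggering pattern (the real trigger
is the cow/gup_test selftest visible in the stack trace above), a kernel-side
sketch of an R/O long-term pin over a cont-PTE hugetlb region could look as
follows; the helper name, address and page count are assumptions for
illustration only:

#include <linux/kernel.h>
#include <linux/mm.h>

/* Illustrative only: addr points into a 64k cont-PTE hugetlb mapping. */
static int pin_cont_pte_hugetlb_ro(unsigned long addr)
{
	struct page *pages[16];	/* one 64k hugetlb page = 16 base pages */
	int npinned;

	/* Read-only pin: no FOLL_WRITE, so gup_must_unshare() gets consulted. */
	npinned = pin_user_pages_fast(addr, ARRAY_SIZE(pages),
				      FOLL_LONGTERM, pages);
	if (npinned < 0)
		return npinned;

	unpin_user_pages(pages, npinned);
	return 0;
}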
Link: https://lkml.kernel.org/r/20230805101256.87306-1-david@redhat.com
Fixes: a7f226604170 ("mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Tested-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 10 ++++++++++
1 file changed, 10 insertions(+)
--- a/mm/internal.h~mm-gup-handle-cont-pte-hugetlb-pages-correctly-in-gup_must_unshare-via-gup-fast
+++ a/mm/internal.h
@@ -1005,6 +1005,16 @@ static inline bool gup_must_unshare(stru
smp_rmb();
/*
+ * During GUP-fast we might not get called on the head page for a
+ * hugetlb page that is mapped using cont-PTE, because GUP-fast does
+ * not work with the abstracted hugetlb PTEs that always point at the
+ * head page. For hugetlb, PageAnonExclusive only applies on the head
+ * page (as it cannot be partially COW-shared), so lookup the head page.
+ */
+ if (unlikely(!PageHead(page) && PageHuge(page)))
+ page = compound_head(page);
+
+ /*
* Note that PageKsm() pages cannot be exclusive, and consequently,
* cannot get pinned.
*/
_
Patches currently in -mm which might be from david(a)redhat.com are
kvm-explicitly-set-foll_honor_numa_fault-in-hva_to_pfn_slow.patch
mm-gup-dont-implicitly-set-foll_honor_numa_fault.patch
pgtable-improve-pte_protnone-comment.patch
selftest-mm-ksm_functional_tests-test-in-mmap_and_merge_range-if-anything-got-merged.patch
selftest-mm-ksm_functional_tests-add-prot_none-test.patch
selftest-mm-ksm_functional_tests-add-prot_none-test-fix.patch
mm-swap-stop-using-page-private-on-tail-pages-for-thp_swap.patch
mm-swap-inline-folio_set_swap_entry-and-folio_swap_entry.patch
mm-huge_memory-work-on-folio-swap-instead-of-page-private-when-splitting-folio.patch
The quilt patch titled
Subject: mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
has been removed from the -mm tree. Its filename was
mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Date: Thu, 3 Aug 2023 16:32:02 +0200
Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fall back to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP by default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documentation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
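As a hedged illustration (not part of this patch) of the behavioural split
described above, the sketch below contrasts an external GUP call, which now
always honors NUMA hinting faults, with an in-kernel follow_page() user,
which does not; the wrapper function and its arguments are assumptions for
illustration only:

#include <linux/err.h>
#include <linux/mm.h>

/* Illustrative only: assumes the caller holds mmap_lock for follow_page(). */
static void numa_hint_behaviour_example(struct vm_area_struct *vma,
					unsigned long addr)
{
	struct page *page;
	int ret;

	/*
	 * External GUP goes through is_valid_gup_args(), which ORs in
	 * FOLL_HONOR_NUMA_FAULT: a PROT_NONE (NUMA-hinted) PTE triggers a
	 * hinting fault instead of being followed.
	 */
	ret = get_user_pages_fast(addr, 1, 0, &page);
	if (ret == 1)
		put_page(page);

	/*
	 * In-kernel follow_page() users (KSM, smaps, ...) never get
	 * FOLL_HONOR_NUMA_FAULT set, so they still see pages that are
	 * mapped PROT_NONE purely for NUMA hinting purposes.
	 */
	page = follow_page(vma, addr, FOLL_GET);
	if (!IS_ERR_OR_NULL(page))
		put_page(page);
}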
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs…
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: liubo <liubo254(a)huawei.com>
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu <peterx(a)redhat.com>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 21 +++++++++++++++------
include/linux/mm_types.h | 9 +++++++++
mm/gup.c | 30 ++++++++++++++++++++++++------
mm/huge_memory.c | 3 +--
4 files changed, 49 insertions(+), 14 deletions(-)
--- a/include/linux/mm.h~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/include/linux/mm.h
@@ -3421,15 +3421,24 @@ static inline int vm_fault_to_errno(vm_f
* Indicates whether GUP can follow a PROT_NONE mapped page, or whether
* a (NUMA hinting) fault is required.
*/
-static inline bool gup_can_follow_protnone(unsigned int flags)
+static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
+ unsigned int flags)
{
/*
- * FOLL_FORCE has to be able to make progress even if the VMA is
- * inaccessible. Further, FOLL_FORCE access usually does not represent
- * application behaviour and we should avoid triggering NUMA hinting
- * faults.
+ * If callers don't want to honor NUMA hinting faults, no need to
+ * determine if we would actually have to trigger a NUMA hinting fault.
*/
- return flags & FOLL_FORCE;
+ if (!(flags & FOLL_HONOR_NUMA_FAULT))
+ return true;
+
+ /*
+ * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs.
+ *
+ * Requiring a fault here even for inaccessible VMAs would mean that
+ * FOLL_FORCE cannot make any progress, because handle_mm_fault()
+ * refuses to process NUMA hinting faults in inaccessible VMAs.
+ */
+ return !vma_is_accessible(vma);
}
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
--- a/include/linux/mm_types.h~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/include/linux/mm_types.h
@@ -1286,6 +1286,15 @@ enum {
FOLL_PCI_P2PDMA = 1 << 10,
/* allow interrupts from generic signals */
FOLL_INTERRUPTIBLE = 1 << 11,
+ /*
+ * Always honor (trigger) NUMA hinting faults.
+ *
+ * FOLL_WRITE implicitly honors NUMA hinting faults because a
+ * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE
+ * apply). get_user_pages_fast_only() always implicitly honors NUMA
+ * hinting faults.
+ */
+ FOLL_HONOR_NUMA_FAULT = 1 << 12,
/* See also internal only FOLL flags in mm/internal.h */
};
--- a/mm/gup.c~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/mm/gup.c
@@ -597,7 +597,7 @@ static struct page *follow_page_pte(stru
pte = ptep_get(ptep);
if (!pte_present(pte))
goto no_page;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags))
goto no_page;
page = vm_normal_page(vma, address, pte);
@@ -714,7 +714,7 @@ static struct page *follow_pmd_mask(stru
if (likely(!pmd_trans_huge(pmdval)))
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
- if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
return no_page_table(vma, flags);
ptl = pmd_lock(mm, pmd);
@@ -851,6 +851,10 @@ struct page *follow_page(struct vm_area_
if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
return NULL;
+ /*
+ * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect
+ * to fail on PROT_NONE-mapped pages.
+ */
page = follow_page_mask(vma, address, foll_flags, &ctx);
if (ctx.pgmap)
put_dev_pagemap(ctx.pgmap);
@@ -2227,6 +2231,13 @@ static bool is_valid_gup_args(struct pag
gup_flags |= FOLL_UNLOCKABLE;
}
+ /*
+ * For now, always trigger NUMA hinting faults. Some GUP users like
+ * KVM require the hint to be as the calling context of GUP is
+ * functionally similar to a memory reference from task context.
+ */
+ gup_flags |= FOLL_HONOR_NUMA_FAULT;
+
/* FOLL_GET and FOLL_PIN are mutually exclusive. */
if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
(FOLL_PIN | FOLL_GET)))
@@ -2551,7 +2562,14 @@ static int gup_pte_range(pmd_t pmd, pmd_
struct page *page;
struct folio *folio;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ /*
+ * Always fallback to ordinary GUP on PROT_NONE-mapped pages:
+ * pte_access_permitted() better should reject these pages
+ * either way: otherwise, GUP-fast might succeed in
+ * cases where ordinary GUP would fail due to VMA access
+ * permissions.
+ */
+ if (pte_protnone(pte))
goto pte_unmap;
if (!pte_access_permitted(pte, flags & FOLL_WRITE))
@@ -2970,8 +2988,8 @@ static int gup_pmd_range(pud_t *pudp, pu
if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
pmd_devmap(pmd))) {
- if (pmd_protnone(pmd) &&
- !gup_can_follow_protnone(flags))
+ /* See gup_pte_range() */
+ if (pmd_protnone(pmd))
return 0;
if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
@@ -3151,7 +3169,7 @@ static int internal_get_user_pages_fast(
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET |
FOLL_FAST_ONLY | FOLL_NOFAULT |
- FOLL_PCI_P2PDMA)))
+ FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
return -EINVAL;
if (gup_flags & FOLL_PIN)
--- a/mm/huge_memory.c~mm-gup-reintroduce-foll_numa-as-foll_honor_numa_fault
+++ a/mm/huge_memory.c
@@ -1467,8 +1467,7 @@ struct page *follow_trans_huge_pmd(struc
if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
return ERR_PTR(-EFAULT);
- /* Full NUMA hinting faults to serialise migration in fault paths */
- if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
return NULL;
if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
_
Patches currently in -mm which might be from david(a)redhat.com are
kvm-explicitly-set-foll_honor_numa_fault-in-hva_to_pfn_slow.patch
mm-gup-dont-implicitly-set-foll_honor_numa_fault.patch
pgtable-improve-pte_protnone-comment.patch
selftest-mm-ksm_functional_tests-test-in-mmap_and_merge_range-if-anything-got-merged.patch
selftest-mm-ksm_functional_tests-add-prot_none-test.patch
selftest-mm-ksm_functional_tests-add-prot_none-test-fix.patch
mm-swap-stop-using-page-private-on-tail-pages-for-thp_swap.patch
mm-swap-inline-folio_set_swap_entry-and-folio_swap_entry.patch
mm-huge_memory-work-on-folio-swap-instead-of-page-private-when-splitting-folio.patch
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 53223f2ed1ef5c90dad814daaaefea4e68a933c8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082114-colonial-manicure-d34e@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
53223f2ed1ef ("xfrm: fix slab-use-after-free in decode_session6")
ee9a113ab634 ("xfrm: interface: rename xfrm_interface.c to xfrm_interface_core.c")
f042365dbffe ("xfrm interface: fix packet tx through bpf_redirect()")
bc56b3340459 ("xfrm: Remove xfrmi interface ID from flowi")
f203b76d7809 ("xfrm: Add virtual xfrm interfaces")
7e6526404ade ("xfrm: Add a new lookup key to match xfrm interfaces.")
9b42c1f179a6 ("xfrm: Extend the output_mark to support input direction and masking.")
e719135881f0 ("xfrm: fix XFRMA_OUTPUT_MARK policy entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 53223f2ed1ef5c90dad814daaaefea4e68a933c8 Mon Sep 17 00:00:00 2001
From: Zhengchao Shao <shaozhengchao(a)huawei.com>
Date: Mon, 10 Jul 2023 17:40:51 +0800
Subject: [PATCH] xfrm: fix slab-use-after-free in decode_session6
When the xfrm device is set to the qdisc of the sfb type, the cb field
of the sent skb may be modified during enqueuing. Then,
slab-use-after-free may occur when the xfrm device sends IPv6 packets.
The stack information is as follows:
BUG: KASAN: slab-use-after-free in decode_session6+0x103f/0x1890
Read of size 1 at addr ffff8881111458ef by task swapper/3/0
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.4.0-next-20230707 #409
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-1.fc33 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xd9/0x150
print_address_description.constprop.0+0x2c/0x3c0
kasan_report+0x11d/0x130
decode_session6+0x103f/0x1890
__xfrm_decode_session+0x54/0xb0
xfrmi_xmit+0x173/0x1ca0
dev_hard_start_xmit+0x187/0x700
sch_direct_xmit+0x1a3/0xc30
__qdisc_run+0x510/0x17a0
__dev_queue_xmit+0x2215/0x3b10
neigh_connected_output+0x3c2/0x550
ip6_finish_output2+0x55a/0x1550
ip6_finish_output+0x6b9/0x1270
ip6_output+0x1f1/0x540
ndisc_send_skb+0xa63/0x1890
ndisc_send_rs+0x132/0x6f0
addrconf_rs_timer+0x3f1/0x870
call_timer_fn+0x1a0/0x580
expire_timers+0x29b/0x4b0
run_timer_softirq+0x326/0x910
__do_softirq+0x1d4/0x905
irq_exit_rcu+0xb7/0x120
sysvec_apic_timer_interrupt+0x97/0xc0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:intel_idle_hlt+0x23/0x30
Code: 1f 84 00 00 00 00 00 f3 0f 1e fa 41 54 41 89 d4 0f 1f 44 00 00 66 90 0f 1f 44 00 00 0f 00 2d c4 9f ab 00 0f 1f 44 00 00 fb f4 <fa> 44 89 e0 41 5c c3 66 0f 1f 44 00 00 f3 0f 1e fa 41 54 41 89 d4
RSP: 0018:ffffc90000197d78 EFLAGS: 00000246
RAX: 00000000000a83c3 RBX: ffffe8ffffd09c50 RCX: ffffffff8a22d8e5
RDX: 0000000000000001 RSI: ffffffff8d3f8080 RDI: ffffe8ffffd09c50
RBP: ffffffff8d3f8080 R08: 0000000000000001 R09: ffffed1026ba6d9d
R10: ffff888135d36ceb R11: 0000000000000001 R12: 0000000000000001
R13: ffffffff8d3f8100 R14: 0000000000000001 R15: 0000000000000000
cpuidle_enter_state+0xd3/0x6f0
cpuidle_enter+0x4e/0xa0
do_idle+0x2fe/0x3c0
cpu_startup_entry+0x18/0x20
start_secondary+0x200/0x290
secondary_startup_64_no_verify+0x167/0x16b
</TASK>
Allocated by task 939:
kasan_save_stack+0x22/0x40
kasan_set_track+0x25/0x30
__kasan_slab_alloc+0x7f/0x90
kmem_cache_alloc_node+0x1cd/0x410
kmalloc_reserve+0x165/0x270
__alloc_skb+0x129/0x330
inet6_ifa_notify+0x118/0x230
__ipv6_ifa_notify+0x177/0xbe0
addrconf_dad_completed+0x133/0xe00
addrconf_dad_work+0x764/0x1390
process_one_work+0xa32/0x16f0
worker_thread+0x67d/0x10c0
kthread+0x344/0x440
ret_from_fork+0x1f/0x30
The buggy address belongs to the object at ffff888111145800
which belongs to the cache skbuff_small_head of size 640
The buggy address is located 239 bytes inside of
freed 640-byte region [ffff888111145800, ffff888111145a80)
As commit f855691975bb ("xfrm6: Fix the nexthdr offset in
_decode_session6.") showed, xfrm_decode_session was originally intended
only for the receive path. IP6CB(skb)->nhoff is not set during
transmission. Therefore, set the cb field in the skb to 0 before
sending packets.
Fixes: f855691975bb ("xfrm6: Fix the nexthdr offset in _decode_session6.")
Signed-off-by: Zhengchao Shao <shaozhengchao(a)huawei.com>
Signed-off-by: Steffen Klassert <steffen.klassert(a)secunet.com>
diff --git a/net/xfrm/xfrm_interface_core.c b/net/xfrm/xfrm_interface_core.c
index a3319965470a..b86474084690 100644
--- a/net/xfrm/xfrm_interface_core.c
+++ b/net/xfrm/xfrm_interface_core.c
@@ -537,8 +537,8 @@ static netdev_tx_t xfrmi_xmit(struct sk_buff *skb, struct net_device *dev)
switch (skb->protocol) {
case htons(ETH_P_IPV6):
- xfrm_decode_session(skb, &fl, AF_INET6);
memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
+ xfrm_decode_session(skb, &fl, AF_INET6);
if (!dst) {
fl.u.ip6.flowi6_oif = dev->ifindex;
fl.u.ip6.flowi6_flags |= FLOWI_FLAG_ANYSRC;
@@ -552,8 +552,8 @@ static netdev_tx_t xfrmi_xmit(struct sk_buff *skb, struct net_device *dev)
}
break;
case htons(ETH_P_IP):
- xfrm_decode_session(skb, &fl, AF_INET);
memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
+ xfrm_decode_session(skb, &fl, AF_INET);
if (!dst) {
struct rtable *rt;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1b254b791d7b7dea6e8adc887fbbd51746d8bb27
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082146-oxidation-equate-185a@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1b254b791d7b ("drm/nouveau/disp: fix use-after-free in error handling of nouveau_connector_create")
773eb04d14a1 ("drm/nouveau/disp: expose conn event class")
ffd2664114c8 ("drm/nouveau/disp: expose head event class")
1d4dce284164 ("drm/nouveau/disp: switch vblank semaphore release to nvkm_event_ntfy")
f43e47c090dc ("drm/nouveau/nvkm: add a replacement for nvkm_notify")
361863ceab1e ("drm/nouveau/disp: move head scanoutpos method")
a2b7eadfef59 ("drm/nouveau/disp: add head class")
8c7d980da9ba ("drm/nouveau/disp: move DP MST payload config method")
8bb30c882334 ("drm/nouveau/disp: add method to trigger DP link retrain")
016dacb60e6d ("drm/nouveau/kms: pass event mask to hpd handler")
d62f8e982cb8 ("drm/nouveau/kms: switch hpd_lock from mutex to spinlock")
a62b74939063 ("drm/nouveau/disp: add method to control DPAUX pad power")
813443721331 ("drm/nouveau/disp: move DP link config into acquire")
a9f5d7721923 ("drm/nouveau/disp: move HDA ELD method")
f530bc60a30b ("drm/nouveau/disp: move HDMI config into acquire + infoframe methods")
9793083f1dd9 ("drm/nouveau/disp: move LVDS protocol information into acquire")
ea6143a86c67 ("drm/nouveau/disp: move and extend the role of outp acquire/release methods")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1b254b791d7b7dea6e8adc887fbbd51746d8bb27 Mon Sep 17 00:00:00 2001
From: Karol Herbst <kherbst(a)redhat.com>
Date: Mon, 14 Aug 2023 16:49:32 +0200
Subject: [PATCH] drm/nouveau/disp: fix use-after-free in error handling of
nouveau_connector_create
We can't simply free the connector after calling drm_connector_init on it.
We need to clean up the drm side first.
It might not fix all regressions from commit 2b5d1c29f6c4
("drm/nouveau/disp: PIOR DP uses GPIO for HPD, not PMGR AUX interrupts"),
but at least it fixes a memory corruption in error handling related to
that commit.
Link: https://lore.kernel.org/lkml/20230806213107.GFZNARG6moWpFuSJ9W@fat_crate.lo…
Fixes: 95983aea8003 ("drm/nouveau/disp: add connector class")
Signed-off-by: Karol Herbst <kherbst(a)redhat.com>
Reviewed-by: Lyude Paul <lyude(a)redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230814144933.3956959-1-kher…
diff --git a/drivers/gpu/drm/nouveau/nouveau_connector.c b/drivers/gpu/drm/nouveau/nouveau_connector.c
index a2e0033e8a26..622f6eb9a8bf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_connector.c
+++ b/drivers/gpu/drm/nouveau/nouveau_connector.c
@@ -1408,8 +1408,7 @@ nouveau_connector_create(struct drm_device *dev,
ret = nvif_conn_ctor(&disp->disp, nv_connector->base.name, nv_connector->index,
&nv_connector->conn);
if (ret) {
- kfree(nv_connector);
- return ERR_PTR(ret);
+ goto drm_conn_err;
}
ret = nvif_conn_event_ctor(&nv_connector->conn, "kmsHotplug",
@@ -1426,8 +1425,7 @@ nouveau_connector_create(struct drm_device *dev,
if (ret) {
nvif_event_dtor(&nv_connector->hpd);
nvif_conn_dtor(&nv_connector->conn);
- kfree(nv_connector);
- return ERR_PTR(ret);
+ goto drm_conn_err;
}
}
}
@@ -1475,4 +1473,9 @@ nouveau_connector_create(struct drm_device *dev,
drm_connector_register(connector);
return connector;
+
+drm_conn_err:
+ drm_connector_cleanup(connector);
+ kfree(nv_connector);
+ return ERR_PTR(ret);
}
The patch titled
Subject: maple_tree: disable mas_wr_append() when other readers are possible
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-disable-mas_wr_append-when-other-readers-are-possible.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: disable mas_wr_append() when other readers are possible
Date: Fri, 18 Aug 2023 20:43:55 -0400
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of mas_next_slot(), the following interleaving was
artificially created by separating the writer and reader code:

Writer:                                 reader:
mas_wr_append
  set end pivot
  updates end metadata
  Detects write to last slot
  last slot write is to start of slot
    store current contents in slot
    overwrite old end pivot
                                        mas_next_slot():
                                          read end metadata
                                          read old end pivot
                                          return with incorrect range
    store new value

Alternatively:

Writer:                                 reader:
mas_wr_append
  set end pivot
  updates end metadata
  Detects write to last slot
  last slot write is to end of slot
    store value
                                        mas_next_slot():
                                          read end metadata
                                          read old end pivot
                                          read new end pivot
                                          return with incorrect range
    set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/lib/maple_tree.c~maple_tree-disable-mas_wr_append-when-other-readers-are-possible
+++ a/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_e
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-disable-mas_wr_append-when-other-readers-are-possible.patch
maple_tree-add-hex-output-to-maple_arange64-dump.patch
maple_tree-reorder-replacement-of-nodes-to-avoid-live-lock.patch
maple_tree-introduce-mas_put_in_tree.patch
maple_tree-introduce-mas_tree_parent-definition.patch
maple_tree-change-mas_adopt_children-parent-usage.patch
maple_tree-replace-data-before-marking-dead-in-split-and-spanning-store.patch
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 2eb9625a3a32251ecea470cd576659a3a03b4e59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082105-decade-shout-4b7c@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
2eb9625a3a32 ("qede: fix firmware halt over suspend and resume")
731815e720ae ("qede: Add support for handling the pcie errors.")
ccc67ef50b90 ("qede: Error recovery process")
f04e48dbfaf7 ("qede: Update link status only when interface is ready.")
149d3775f108 ("qede: Simplify the usage of qede-flags.")
d25b859ccd61 ("qede: Add support for populating ethernet TLVs.")
91dfd02b2300 ("qede: Fix ref-cnt usage count")
3f2176dd7fe9 ("qede: fix spelling mistake: "registeration" -> "registration"")
bd0b2e7fe611 ("net: xdp: make the stack take care of the tear down")
012bb8a8b5a2 ("nfp: bpf: drop support for cls_bpf with legacy actions")
43b45245e5a6 ("nfp: bpf: fall back to core NIC app if BPF not selected")
2c4197a041df ("nfp: reorganize the app table")
f449657f8353 ("nfp: bpf: reject TC offload if XDP loaded")
3248f77fa3ee ("drivers/net: netronome: Convert timers to use timer_setup()")
ee9133a845fe ("nfp: bpf: add stack write support")
70c78fc138b6 ("nfp: bpf: refactor nfp_bpf_check_ptr()")
90d97315b3e7 ("nfp: bpf: Convert ndo_setup_tc offloads to block callbacks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2eb9625a3a32251ecea470cd576659a3a03b4e59 Mon Sep 17 00:00:00 2001
From: Manish Chopra <manishc(a)marvell.com>
Date: Wed, 16 Aug 2023 20:37:11 +0530
Subject: [PATCH] qede: fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through the
PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so system-wide
suspend/resume leads to a bad MFW (management firmware) state, which
causes various follow-up errors in the driver when communicating with
the device/firmware afterwards.
To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspend/standby mode.
Without this fix, the device/firmware does not recover unless the
system is power cycled.
Fixes: 2950219d87b0 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 4b004a728190..99df00c30b8c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -176,6 +176,15 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
}
#endif
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+ dev_info(dev, "Device does not support suspend operation\n");
+
+ return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
static const struct pci_error_handlers qede_err_handler = {
.error_detected = qede_io_error_detected,
};
@@ -190,6 +199,7 @@ static struct pci_driver qede_pci_driver = {
.sriov_configure = qede_sriov_configure,
#endif
.err_handler = &qede_err_handler,
+ .driver.pm = &qede_pm_ops,
};
static struct qed_eth_cb_ops qede_ll_ops = {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2eb9625a3a32251ecea470cd576659a3a03b4e59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082104-recovery-duffel-eafa@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2eb9625a3a32 ("qede: fix firmware halt over suspend and resume")
731815e720ae ("qede: Add support for handling the pcie errors.")
ccc67ef50b90 ("qede: Error recovery process")
f04e48dbfaf7 ("qede: Update link status only when interface is ready.")
149d3775f108 ("qede: Simplify the usage of qede-flags.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2eb9625a3a32251ecea470cd576659a3a03b4e59 Mon Sep 17 00:00:00 2001
From: Manish Chopra <manishc(a)marvell.com>
Date: Wed, 16 Aug 2023 20:37:11 +0530
Subject: [PATCH] qede: fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through the
PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so system-wide
suspend/resume leads to a bad MFW (management firmware) state, which
causes various follow-up errors in the driver when communicating with
the device/firmware afterwards.
To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspend/standby mode.
Without this fix, the device/firmware does not recover unless the
system is power cycled.
Fixes: 2950219d87b0 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 4b004a728190..99df00c30b8c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -176,6 +176,15 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
}
#endif
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+ dev_info(dev, "Device does not support suspend operation\n");
+
+ return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
static const struct pci_error_handlers qede_err_handler = {
.error_detected = qede_io_error_detected,
};
@@ -190,6 +199,7 @@ static struct pci_driver qede_pci_driver = {
.sriov_configure = qede_sriov_configure,
#endif
.err_handler = &qede_err_handler,
+ .driver.pm = &qede_pm_ops,
};
static struct qed_eth_cb_ops qede_ll_ops = {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2eb9625a3a32251ecea470cd576659a3a03b4e59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082103-abiding-overprice-903a@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
2eb9625a3a32 ("qede: fix firmware halt over suspend and resume")
731815e720ae ("qede: Add support for handling the pcie errors.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2eb9625a3a32251ecea470cd576659a3a03b4e59 Mon Sep 17 00:00:00 2001
From: Manish Chopra <manishc(a)marvell.com>
Date: Wed, 16 Aug 2023 20:37:11 +0530
Subject: [PATCH] qede: fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through the
PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so system-wide
suspend/resume leads to a bad MFW (management firmware) state, which
causes various follow-up errors in the driver when communicating with
the device/firmware afterwards.
To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspend/standby mode.
Without this fix, the device/firmware does not recover unless the
system is power cycled.
Fixes: 2950219d87b0 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 4b004a728190..99df00c30b8c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -176,6 +176,15 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
}
#endif
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+ dev_info(dev, "Device does not support suspend operation\n");
+
+ return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
static const struct pci_error_handlers qede_err_handler = {
.error_detected = qede_io_error_detected,
};
@@ -190,6 +199,7 @@ static struct pci_driver qede_pci_driver = {
.sriov_configure = qede_sriov_configure,
#endif
.err_handler = &qede_err_handler,
+ .driver.pm = &qede_pm_ops,
};
static struct qed_eth_cb_ops qede_ll_ops = {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2eb9625a3a32251ecea470cd576659a3a03b4e59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082101-till-squealing-7f7d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2eb9625a3a32 ("qede: fix firmware halt over suspend and resume")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2eb9625a3a32251ecea470cd576659a3a03b4e59 Mon Sep 17 00:00:00 2001
From: Manish Chopra <manishc(a)marvell.com>
Date: Wed, 16 Aug 2023 20:37:11 +0530
Subject: [PATCH] qede: fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through the
PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so system-wide
suspend/resume leads to a bad MFW (management firmware) state, which
causes various follow-up errors in the driver when communicating with
the device/firmware afterwards.
To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspend/standby mode.
Without this fix, the device/firmware does not recover unless the
system is power cycled.
Fixes: 2950219d87b0 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 4b004a728190..99df00c30b8c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -176,6 +176,15 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
}
#endif
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+ dev_info(dev, "Device does not support suspend operation\n");
+
+ return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
static const struct pci_error_handlers qede_err_handler = {
.error_detected = qede_io_error_detected,
};
@@ -190,6 +199,7 @@ static struct pci_driver qede_pci_driver = {
.sriov_configure = qede_sriov_configure,
#endif
.err_handler = &qede_err_handler,
+ .driver.pm = &qede_pm_ops,
};
static struct qed_eth_cb_ops qede_ll_ops = {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2eb9625a3a32251ecea470cd576659a3a03b4e59
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082100-conduit-wildfowl-9f5b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2eb9625a3a32 ("qede: fix firmware halt over suspend and resume")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2eb9625a3a32251ecea470cd576659a3a03b4e59 Mon Sep 17 00:00:00 2001
From: Manish Chopra <manishc(a)marvell.com>
Date: Wed, 16 Aug 2023 20:37:11 +0530
Subject: [PATCH] qede: fix firmware halt over suspend and resume
While performing certain power-off sequences, PCI drivers are
called to suspend and resume their underlying devices through the
PCI PM (power management) interface. However, this NIC hardware
does not support PCI PM suspend/resume operations, so system-wide
suspend/resume leads to a bad MFW (management firmware) state, which
causes various follow-up errors in the driver when communicating with
the device/firmware afterwards.
To fix this, the driver implements a PCI PM suspend handler that
explicitly reports the operation as unsupported to the PCI subsystem,
thus preventing the system from entering suspend/standby mode.
Without this fix, the device/firmware does not recover unless the
system is power cycled.
Fixes: 2950219d87b0 ("qede: Add basic network device support")
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Reviewed-by: John Meneghini <jmeneghi(a)redhat.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230816150711.59035-1-manishc@marvell.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 4b004a728190..99df00c30b8c 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -176,6 +176,15 @@ static int qede_sriov_configure(struct pci_dev *pdev, int num_vfs_param)
}
#endif
+static int __maybe_unused qede_suspend(struct device *dev)
+{
+ dev_info(dev, "Device does not support suspend operation\n");
+
+ return -EOPNOTSUPP;
+}
+
+static DEFINE_SIMPLE_DEV_PM_OPS(qede_pm_ops, qede_suspend, NULL);
+
static const struct pci_error_handlers qede_err_handler = {
.error_detected = qede_io_error_detected,
};
@@ -190,6 +199,7 @@ static struct pci_driver qede_pci_driver = {
.sriov_configure = qede_sriov_configure,
#endif
.err_handler = &qede_err_handler,
+ .driver.pm = &qede_pm_ops,
};
static struct qed_eth_cb_ops qede_ll_ops = {
From: Namjae Jeon <linkinjeon(a)kernel.org>
[ Upstream commit d42334578eba1390859012ebb91e1e556d51db49 ]
exfat_extract_uni_name copies characters from a given file name entry into
the 'uniname' variable. This variable is actually defined on the stack of
the exfat_readdir() function. According to the definition of
the 'exfat_uni_name' type, the file name should be limited to 255 characters
(+ null terminator space), but the exfat_get_uniname_from_ext_entry()
function can write more characters because there is no check whether the
filename entries exceed the maximum filename length. This patch adds a check
to stop copying filename characters once the maximum filename length is
exceeded.
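For illustration only, a minimal sketch of the bounded-copy pattern the fix
enforces is shown below; the helper name is hypothetical and the constants
mirror EXFAT_FILE_NAME_LEN (15) and MAX_NAME_LENGTH (255), but this is not
the exfat code itself:

#include <linux/string.h>

#define NAME_CHARS_PER_ENTRY	15	/* mirrors EXFAT_FILE_NAME_LEN */
#define NAME_MAX_CHARS		255	/* mirrors MAX_NAME_LENGTH */

/* Copy name-extension entries, refusing to run past the 255-char buffer. */
static int copy_name_entries(unsigned short *uniname, int nr_entries,
			     const unsigned short entries[][NAME_CHARS_PER_ENTRY])
{
	int i, copied = 0;

	for (i = 0; i < nr_entries; i++) {
		int room = NAME_MAX_CHARS - copied;
		int len = room < NAME_CHARS_PER_ENTRY ? room : NAME_CHARS_PER_ENTRY;

		if (len <= 0)
			break;		/* buffer full: stop instead of overflowing */
		memcpy(uniname + copied, entries[i], len * sizeof(*uniname));
		copied += len;
	}
	return copied;
}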
Cc: stable(a)vger.kernel.org
Cc: Yuezhang Mo <Yuezhang.Mo(a)sony.com>
Reported-by: Maxim Suhanov <dfirblog(a)gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Harshit: backport to 5.15.y]
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
---
The conflict resolved patch for 6.1.y applies cleanly to 5.15.y as
well.
Note: This fix is already present in 5.10.y but missing in 5.15.y
---
fs/exfat/dir.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 8475a8653c3a..f6dd4fc8eaf4 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -34,6 +34,7 @@ static void exfat_get_uniname_from_ext_entry(struct super_block *sb,
{
int i;
struct exfat_entry_set_cache *es;
+ unsigned int uni_len = 0, len;
es = exfat_get_dentry_set(sb, p_dir, entry, ES_ALL_ENTRIES);
if (!es)
@@ -52,7 +53,10 @@ static void exfat_get_uniname_from_ext_entry(struct super_block *sb,
if (exfat_get_entry_type(ep) != TYPE_EXTEND)
break;
- exfat_extract_uni_name(ep, uniname);
+ len = exfat_extract_uni_name(ep, uniname);
+ uni_len += len;
+ if (len != EXFAT_FILE_NAME_LEN || uni_len >= MAX_NAME_LENGTH)
+ break;
uniname += EXFAT_FILE_NAME_LEN;
}
@@ -1032,7 +1036,8 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
if (entry_type == TYPE_EXTEND) {
unsigned short entry_uniname[16], unichar;
- if (step != DIRENT_STEP_NAME) {
+ if (step != DIRENT_STEP_NAME ||
+ name_len >= MAX_NAME_LENGTH) {
step = DIRENT_STEP_FILE;
continue;
}
--
2.34.1
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x c0b067588a4836b762cfc6a4c83f122ca1dbb93a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023081251-conceal-stool-53f1@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c0b067588a4836b762cfc6a4c83f122ca1dbb93a Mon Sep 17 00:00:00 2001
From: Arnaldo Carvalho de Melo <acme(a)kernel.org>
Date: Tue, 1 Aug 2023 18:42:47 -0300
Subject: [PATCH] Revert "perf report: Append inlines to non-DWARF callchains"
This reverts commit 46d21ec067490ab9cdcc89b9de5aae28786a8b8e.
The tests were made with a specific workload; further tests on a
recently updated Fedora 38 system with a system-wide perf.data file
show 'perf report' taking excessive time resolving inlines in vmlinux,
so let's revert this until a full investigation and improvement of the
addr2line support code is made.
Reported-by: Jesper Dangaard Brouer <hawk(a)kernel.org>
Acked-by: Artem Savkov <asavkov(a)redhat.com>
Tested-by: Jesper Dangaard Brouer <hawk(a)kernel.org>
Cc: Andrii Nakryiko <andrii.nakryiko(a)gmail.com>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Milian Wolff <milian.wolff(a)kdab.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/ZMl8VyhdwhClTM5g@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 4e62843d51b7..f4cb41ee23cd 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -45,7 +45,6 @@
static void __machine__remove_thread(struct machine *machine, struct thread_rb_node *nd,
struct thread *th, bool lock);
-static int append_inlines(struct callchain_cursor *cursor, struct map_symbol *ms, u64 ip);
static struct dso *machine__kernel_dso(struct machine *machine)
{
@@ -2385,10 +2384,6 @@ static int add_callchain_ip(struct thread *thread,
ms.maps = maps__get(al.maps);
ms.map = map__get(al.map);
ms.sym = al.sym;
-
- if (!branch && append_inlines(cursor, &ms, ip) == 0)
- goto out;
-
srcline = callchain_srcline(&ms, al.addr);
err = callchain_cursor_append(cursor, ip, &ms,
branch, flags, nr_loop_iter,
From: Tim Huang <Tim.Huang(a)amd.com>
For SMU v13.0.4/11, the driver does not need to stop RLC for S0i3;
the firmware will handle that properly.
Signed-off-by: Tim Huang <Tim.Huang(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 730d44e1fa306a20746ad4a85da550662aed9daa)
Cc: stable(a)vger.kernel.org # 6.1.x
---
drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index ea03e8d9a3f6..818379276a58 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1573,9 +1573,9 @@ static int smu_disable_dpms(struct smu_context *smu)
/*
* For SMU 13.0.4/11, PMFW will handle the features disablement properly
- * for gpu reset case. Driver involvement is unnecessary.
+ * for gpu reset and S0i3 cases. Driver involvement is unnecessary.
*/
- if (amdgpu_in_reset(adev)) {
+ if (amdgpu_in_reset(adev) || adev->in_s0ix) {
switch (adev->ip_versions[MP1_HWIP][0]) {
case IP_VERSION(13, 0, 4):
case IP_VERSION(13, 0, 11):
--
2.41.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5d0a8d2fba50e9c07cde4aad7fba28c008b07a5b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082121-chewing-regroup-4f67@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state")
f90b529bcbe5 ("arm64/sme: Implement ZT0 ptrace support")
ce514000da4f ("arm64/sme: Rename za_state to sme_state")
1192b93ba352 ("arm64/fp: Use a struct to pass data to fpsimd_bind_state_to_cpu()")
deeb8f9a80fd ("arm64/fpsimd: Have KVM explicitly say which FP registers to save")
baa8515281b3 ("arm64/fpsimd: Track the saved FPSIMD state type separately to TIF_SVE")
93ae6b01bafe ("KVM: arm64: Discard any SVE state when entering KVM guests")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d0a8d2fba50e9c07cde4aad7fba28c008b07a5b Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Thu, 10 Aug 2023 12:28:19 +0100
Subject: [PATCH] arm64/ptrace: Ensure that SME is set up for target when
writing SSVE state
When we use NT_ARM_SSVE to either enable streaming mode or change the
vector length for a process we do not currently do anything to ensure that
there is storage allocated for the SME specific register state. If the
task had not previously used SME or we changed the vector length then
the task will not have had TIF_SME set or backing storage for ZA/ZT
allocated, resulting in inconsistent register sizes when saving state
and spurious traps which flush the newly set register state.
We should set TIF_SME to disable traps and ensure that storage is
allocated for ZA and ZT if it is not already allocated. This requires
modifying sme_alloc() to make the flush of any existing register state
optional so we don't disturb existing state for ZA and ZT.
Fixes: e12310a0d30f ("arm64/sme: Implement ptrace support for streaming mode SVE registers")
Reported-by: David Spickett <David.Spickett(a)arm.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # 5.19.x
Link: https://lore.kernel.org/r/20230810-arm64-fix-ptrace-race-v1-1-a5361fad2bd6@…
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
index 67f2fb781f59..8df46f186c64 100644
--- a/arch/arm64/include/asm/fpsimd.h
+++ b/arch/arm64/include/asm/fpsimd.h
@@ -356,7 +356,7 @@ static inline int sme_max_virtualisable_vl(void)
return vec_max_virtualisable_vl(ARM64_VEC_SME);
}
-extern void sme_alloc(struct task_struct *task);
+extern void sme_alloc(struct task_struct *task, bool flush);
extern unsigned int sme_get_vl(void);
extern int sme_set_current_vl(unsigned long arg);
extern int sme_get_current_vl(void);
@@ -388,7 +388,7 @@ static inline void sme_smstart_sm(void) { }
static inline void sme_smstop_sm(void) { }
static inline void sme_smstop(void) { }
-static inline void sme_alloc(struct task_struct *task) { }
+static inline void sme_alloc(struct task_struct *task, bool flush) { }
static inline void sme_setup(void) { }
static inline unsigned int sme_get_vl(void) { return 0; }
static inline int sme_max_vl(void) { return 0; }
diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 75c37b1c55aa..087c05aa960e 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -1285,9 +1285,9 @@ void fpsimd_release_task(struct task_struct *dead_task)
* the interest of testability and predictability, the architecture
* guarantees that when ZA is enabled it will be zeroed.
*/
-void sme_alloc(struct task_struct *task)
+void sme_alloc(struct task_struct *task, bool flush)
{
- if (task->thread.sme_state) {
+ if (task->thread.sme_state && flush) {
memset(task->thread.sme_state, 0, sme_state_size(task));
return;
}
@@ -1515,7 +1515,7 @@ void do_sme_acc(unsigned long esr, struct pt_regs *regs)
}
sve_alloc(current, false);
- sme_alloc(current);
+ sme_alloc(current, true);
if (!current->thread.sve_state || !current->thread.sme_state) {
force_sig(SIGKILL);
return;
diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
index 5b9b4305248b..a31af7a1abe3 100644
--- a/arch/arm64/kernel/ptrace.c
+++ b/arch/arm64/kernel/ptrace.c
@@ -881,6 +881,13 @@ static int sve_set_common(struct task_struct *target,
break;
case ARM64_VEC_SME:
target->thread.svcr |= SVCR_SM_MASK;
+
+ /*
+ * Disable traps and ensure there is SME storage but
+ * preserve any currently set values in ZA/ZT.
+ */
+ sme_alloc(target, false);
+ set_tsk_thread_flag(target, TIF_SME);
break;
default:
WARN_ON_ONCE(1);
@@ -1100,7 +1107,7 @@ static int za_set(struct task_struct *target,
}
/* Allocate/reinit ZA storage */
- sme_alloc(target);
+ sme_alloc(target, true);
if (!target->thread.sme_state) {
ret = -ENOMEM;
goto out;
@@ -1171,7 +1178,7 @@ static int zt_set(struct task_struct *target,
return -EINVAL;
if (!thread_za_enabled(&target->thread)) {
- sme_alloc(target);
+ sme_alloc(target, true);
if (!target->thread.sme_state)
return -ENOMEM;
}
diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c
index e304f7ebec2a..c7ebe744c64e 100644
--- a/arch/arm64/kernel/signal.c
+++ b/arch/arm64/kernel/signal.c
@@ -475,7 +475,7 @@ static int restore_za_context(struct user_ctxs *user)
fpsimd_flush_task_state(current);
/* From now, fpsimd_thread_switch() won't touch thread.sve_state */
- sme_alloc(current);
+ sme_alloc(current, true);
if (!current->thread.sme_state) {
current->thread.svcr &= ~SVCR_ZA_MASK;
clear_thread_flag(TIF_SME);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 864bcaa38ee44ec6c0e43f79c2d2997b977e26b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082127-unvalued-sanitary-c79f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM")
d893832d0e1e ("x86/srso: Add IBPB on VMEXIT")
233d6f68b98d ("x86/srso: Add IBPB")
fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
941d77c77339 ("Merge tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 864bcaa38ee44ec6c0e43f79c2d2997b977e26b2 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 14 Aug 2023 13:44:35 +0200
Subject: [PATCH] x86/cpu/kvm: Provide UNTRAIN_RET_VM
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.
This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.
Redefine the meaning of the synthetic IBPB flags to:
- ENTRY_IBPB -- issue IBPB on entry (was: entry + VMEXIT)
- IBPB_ON_VMEXIT -- issue IBPB on VMEXIT
And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.
The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.
Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.
All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 5285c8e93dff..c55cc243592e 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -299,6 +299,17 @@
#endif
.endm
+.macro UNTRAIN_RET_VM
+#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
+ defined(CONFIG_CALL_DEPTH_TRACKING) || defined(CONFIG_CPU_SRSO)
+ VALIDATE_UNRET_END
+ ALTERNATIVE_3 "", \
+ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT, \
+ __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
+#endif
+.endm
+
.macro UNTRAIN_RET_FROM_CALL
#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
defined(CONFIG_CALL_DEPTH_TRACKING)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6f3e19527286..9026e3fe9f6c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1054,6 +1054,7 @@ static void __init retbleed_select_mitigation(void)
case RETBLEED_MITIGATION_IBPB:
setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+ setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
mitigate_smt = true;
break;
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 265452fc9ebe..ef2ebabb059c 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -222,10 +222,7 @@ SYM_FUNC_START(__svm_vcpu_run)
* because interrupt handlers won't sanitize 'ret' if the return is
* from the kernel.
*/
- UNTRAIN_RET
-
- /* SRSO */
- ALTERNATIVE "", "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT
+ UNTRAIN_RET_VM
/*
* Clear all general purpose registers except RSP and RAX to prevent
@@ -362,7 +359,7 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
* because interrupt handlers won't sanitize RET if the return is
* from the kernel.
*/
- UNTRAIN_RET
+ UNTRAIN_RET_VM
/* "Pop" @spec_ctrl_intercepted. */
pop %_ASM_BX
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 864bcaa38ee44ec6c0e43f79c2d2997b977e26b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082126-yanking-circling-2487@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM")
d893832d0e1e ("x86/srso: Add IBPB on VMEXIT")
233d6f68b98d ("x86/srso: Add IBPB")
fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
941d77c77339 ("Merge tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 864bcaa38ee44ec6c0e43f79c2d2997b977e26b2 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 14 Aug 2023 13:44:35 +0200
Subject: [PATCH] x86/cpu/kvm: Provide UNTRAIN_RET_VM
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.
This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.
Redefine the meaning of the synthetic IBPB flags to:
- ENTRY_IBPB -- issue IBPB on entry (was: entry + VMEXIT)
- IBPB_ON_VMEXIT -- issue IBPB on VMEXIT
And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.
The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.
Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.
All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 5285c8e93dff..c55cc243592e 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -299,6 +299,17 @@
#endif
.endm
+.macro UNTRAIN_RET_VM
+#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
+ defined(CONFIG_CALL_DEPTH_TRACKING) || defined(CONFIG_CPU_SRSO)
+ VALIDATE_UNRET_END
+ ALTERNATIVE_3 "", \
+ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT, \
+ __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
+#endif
+.endm
+
.macro UNTRAIN_RET_FROM_CALL
#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
defined(CONFIG_CALL_DEPTH_TRACKING)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6f3e19527286..9026e3fe9f6c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1054,6 +1054,7 @@ static void __init retbleed_select_mitigation(void)
case RETBLEED_MITIGATION_IBPB:
setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+ setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
mitigate_smt = true;
break;
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 265452fc9ebe..ef2ebabb059c 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -222,10 +222,7 @@ SYM_FUNC_START(__svm_vcpu_run)
* because interrupt handlers won't sanitize 'ret' if the return is
* from the kernel.
*/
- UNTRAIN_RET
-
- /* SRSO */
- ALTERNATIVE "", "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT
+ UNTRAIN_RET_VM
/*
* Clear all general purpose registers except RSP and RAX to prevent
@@ -362,7 +359,7 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
* because interrupt handlers won't sanitize RET if the return is
* from the kernel.
*/
- UNTRAIN_RET
+ UNTRAIN_RET_VM
/* "Pop" @spec_ctrl_intercepted. */
pop %_ASM_BX
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 864bcaa38ee44ec6c0e43f79c2d2997b977e26b2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082125-untainted-putdown-eba5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
864bcaa38ee4 ("x86/cpu/kvm: Provide UNTRAIN_RET_VM")
d893832d0e1e ("x86/srso: Add IBPB on VMEXIT")
233d6f68b98d ("x86/srso: Add IBPB")
fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
941d77c77339 ("Merge tag 'x86_cpu_for_v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 864bcaa38ee44ec6c0e43f79c2d2997b977e26b2 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 14 Aug 2023 13:44:35 +0200
Subject: [PATCH] x86/cpu/kvm: Provide UNTRAIN_RET_VM
Similar to how it doesn't make sense to have UNTRAIN_RET have two
untrain calls, it also doesn't make sense for VMEXIT to have an extra
IBPB call.
This cures VMEXIT doing potentially unret+IBPB or double IBPB.
Also, the (SEV) VMEXIT case seems to have been overlooked.
Redefine the meaning of the synthetic IBPB flags to:
- ENTRY_IBPB -- issue IBPB on entry (was: entry + VMEXIT)
- IBPB_ON_VMEXIT -- issue IBPB on VMEXIT
And have 'retbleed=ibpb' set *BOTH* feature flags to ensure it retains
the previous behaviour and issues IBPB on entry+VMEXIT.
The new 'srso=ibpb_vmexit' option only sets IBPB_ON_VMEXIT.
Create UNTRAIN_RET_VM specifically for the VMEXIT case, and have that
check IBPB_ON_VMEXIT.
All this avoids having the VMEXIT case having to check both ENTRY_IBPB
and IBPB_ON_VMEXIT and simplifies the alternatives.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/r/20230814121149.109557833@infradead.org
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 5285c8e93dff..c55cc243592e 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -299,6 +299,17 @@
#endif
.endm
+.macro UNTRAIN_RET_VM
+#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
+ defined(CONFIG_CALL_DEPTH_TRACKING) || defined(CONFIG_CPU_SRSO)
+ VALIDATE_UNRET_END
+ ALTERNATIVE_3 "", \
+ CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT, \
+ __stringify(RESET_CALL_DEPTH), X86_FEATURE_CALL_DEPTH
+#endif
+.endm
+
.macro UNTRAIN_RET_FROM_CALL
#if defined(CONFIG_CPU_UNRET_ENTRY) || defined(CONFIG_CPU_IBPB_ENTRY) || \
defined(CONFIG_CALL_DEPTH_TRACKING)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 6f3e19527286..9026e3fe9f6c 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1054,6 +1054,7 @@ static void __init retbleed_select_mitigation(void)
case RETBLEED_MITIGATION_IBPB:
setup_force_cpu_cap(X86_FEATURE_ENTRY_IBPB);
+ setup_force_cpu_cap(X86_FEATURE_IBPB_ON_VMEXIT);
mitigate_smt = true;
break;
diff --git a/arch/x86/kvm/svm/vmenter.S b/arch/x86/kvm/svm/vmenter.S
index 265452fc9ebe..ef2ebabb059c 100644
--- a/arch/x86/kvm/svm/vmenter.S
+++ b/arch/x86/kvm/svm/vmenter.S
@@ -222,10 +222,7 @@ SYM_FUNC_START(__svm_vcpu_run)
* because interrupt handlers won't sanitize 'ret' if the return is
* from the kernel.
*/
- UNTRAIN_RET
-
- /* SRSO */
- ALTERNATIVE "", "call entry_ibpb", X86_FEATURE_IBPB_ON_VMEXIT
+ UNTRAIN_RET_VM
/*
* Clear all general purpose registers except RSP and RAX to prevent
@@ -362,7 +359,7 @@ SYM_FUNC_START(__svm_sev_es_vcpu_run)
* because interrupt handlers won't sanitize RET if the return is
* from the kernel.
*/
- UNTRAIN_RET
+ UNTRAIN_RET_VM
/* "Pop" @spec_ctrl_intercepted. */
pop %_ASM_BX
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082142-halves-kinship-fcc3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
6f612579be9d ("Merge tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 14 Aug 2023 13:44:29 +0200
Subject: [PATCH] objtool/x86: Fix SRSO mess
Objtool --rethunk does two things:
- it collects all (tail) calls of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
the RET macro also emits the same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate these two separate things.
However, alas, along comes SRSO and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previously).
Clear things up by adding a second category for the embedded instruction
thing.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 2d51fa8da9e8..cba8a7be040e 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -824,8 +824,11 @@ bool arch_is_retpoline(struct symbol *sym)
bool arch_is_rethunk(struct symbol *sym)
{
- return !strcmp(sym->name, "__x86_return_thunk") ||
- !strcmp(sym->name, "srso_untrain_ret") ||
- !strcmp(sym->name, "srso_safe_ret") ||
- !strcmp(sym->name, "__ret");
+ return !strcmp(sym->name, "__x86_return_thunk");
+}
+
+bool arch_is_embedded_insn(struct symbol *sym)
+{
+ return !strcmp(sym->name, "__ret") ||
+ !strcmp(sym->name, "srso_safe_ret");
}
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e2ee10ce7703..191656ee9fbc 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -455,7 +455,7 @@ static int decode_instructions(struct objtool_file *file)
return -1;
}
- if (func->return_thunk || func->alias != func)
+ if (func->embedded_insn || func->alias != func)
continue;
if (!find_insn(file, sec, func->offset)) {
@@ -1288,16 +1288,33 @@ static int add_ignore_alternatives(struct objtool_file *file)
return 0;
}
+/*
+ * Symbols that replace INSN_CALL_DYNAMIC, every (tail) call to such a symbol
+ * will be added to the .retpoline_sites section.
+ */
__weak bool arch_is_retpoline(struct symbol *sym)
{
return false;
}
+/*
+ * Symbols that replace INSN_RETURN, every (tail) call to such a symbol
+ * will be added to the .return_sites section.
+ */
__weak bool arch_is_rethunk(struct symbol *sym)
{
return false;
}
+/*
+ * Symbols that are embedded inside other instructions, because sometimes crazy
+ * code exists. These are mostly ignored for validation purposes.
+ */
+__weak bool arch_is_embedded_insn(struct symbol *sym)
+{
+ return false;
+}
+
static struct reloc *insn_reloc(struct objtool_file *file, struct instruction *insn)
{
struct reloc *reloc;
@@ -1583,7 +1600,7 @@ static int add_jump_destinations(struct objtool_file *file)
* middle of another instruction. Objtool only
* knows about the outer instruction.
*/
- if (sym && sym->return_thunk) {
+ if (sym && sym->embedded_insn) {
add_return_call(file, insn, false);
continue;
}
@@ -2502,6 +2519,9 @@ static int classify_symbols(struct objtool_file *file)
if (arch_is_rethunk(func))
func->return_thunk = true;
+ if (arch_is_embedded_insn(func))
+ func->embedded_insn = true;
+
if (arch_ftrace_match(func->name))
func->fentry = true;
diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h
index 2b6d2ce4f9a5..0b303eba660e 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -90,6 +90,7 @@ int arch_decode_hint_reg(u8 sp_reg, int *base);
bool arch_is_retpoline(struct symbol *sym);
bool arch_is_rethunk(struct symbol *sym);
+bool arch_is_embedded_insn(struct symbol *sym);
int arch_rewrite_retpolines(struct objtool_file *file);
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index c532d70864dc..9f71e988eca4 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -66,6 +66,7 @@ struct symbol {
u8 fentry : 1;
u8 profiling_func : 1;
u8 warned : 1;
+ u8 embedded_insn : 1;
struct list_head pv_target;
struct reloc *relocs;
};
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082140-regress-nemeses-b6bf@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
6f612579be9d ("Merge tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 14 Aug 2023 13:44:29 +0200
Subject: [PATCH] objtool/x86: Fix SRSO mess
Objtool --rethunk does two things:
- it collects all (tail) calls of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
the RET macro also emits the same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate these two separate things.
However, alas, along comes SRSO and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previously).
Clear things up by adding a second category for the embedded instruction
thing.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 2d51fa8da9e8..cba8a7be040e 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -824,8 +824,11 @@ bool arch_is_retpoline(struct symbol *sym)
bool arch_is_rethunk(struct symbol *sym)
{
- return !strcmp(sym->name, "__x86_return_thunk") ||
- !strcmp(sym->name, "srso_untrain_ret") ||
- !strcmp(sym->name, "srso_safe_ret") ||
- !strcmp(sym->name, "__ret");
+ return !strcmp(sym->name, "__x86_return_thunk");
+}
+
+bool arch_is_embedded_insn(struct symbol *sym)
+{
+ return !strcmp(sym->name, "__ret") ||
+ !strcmp(sym->name, "srso_safe_ret");
}
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e2ee10ce7703..191656ee9fbc 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -455,7 +455,7 @@ static int decode_instructions(struct objtool_file *file)
return -1;
}
- if (func->return_thunk || func->alias != func)
+ if (func->embedded_insn || func->alias != func)
continue;
if (!find_insn(file, sec, func->offset)) {
@@ -1288,16 +1288,33 @@ static int add_ignore_alternatives(struct objtool_file *file)
return 0;
}
+/*
+ * Symbols that replace INSN_CALL_DYNAMIC, every (tail) call to such a symbol
+ * will be added to the .retpoline_sites section.
+ */
__weak bool arch_is_retpoline(struct symbol *sym)
{
return false;
}
+/*
+ * Symbols that replace INSN_RETURN, every (tail) call to such a symbol
+ * will be added to the .return_sites section.
+ */
__weak bool arch_is_rethunk(struct symbol *sym)
{
return false;
}
+/*
+ * Symbols that are embedded inside other instructions, because sometimes crazy
+ * code exists. These are mostly ignored for validation purposes.
+ */
+__weak bool arch_is_embedded_insn(struct symbol *sym)
+{
+ return false;
+}
+
static struct reloc *insn_reloc(struct objtool_file *file, struct instruction *insn)
{
struct reloc *reloc;
@@ -1583,7 +1600,7 @@ static int add_jump_destinations(struct objtool_file *file)
* middle of another instruction. Objtool only
* knows about the outer instruction.
*/
- if (sym && sym->return_thunk) {
+ if (sym && sym->embedded_insn) {
add_return_call(file, insn, false);
continue;
}
@@ -2502,6 +2519,9 @@ static int classify_symbols(struct objtool_file *file)
if (arch_is_rethunk(func))
func->return_thunk = true;
+ if (arch_is_embedded_insn(func))
+ func->embedded_insn = true;
+
if (arch_ftrace_match(func->name))
func->fentry = true;
diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h
index 2b6d2ce4f9a5..0b303eba660e 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -90,6 +90,7 @@ int arch_decode_hint_reg(u8 sp_reg, int *base);
bool arch_is_retpoline(struct symbol *sym);
bool arch_is_rethunk(struct symbol *sym);
+bool arch_is_embedded_insn(struct symbol *sym);
int arch_rewrite_retpolines(struct objtool_file *file);
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index c532d70864dc..9f71e988eca4 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -66,6 +66,7 @@ struct symbol {
u8 fentry : 1;
u8 profiling_func : 1;
u8 warned : 1;
+ u8 embedded_insn : 1;
struct list_head pv_target;
struct reloc *relocs;
};
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082139-thong-stainless-d304@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
6f612579be9d ("Merge tag 'objtool-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4ae68b26c3ab5a82aa271e6e9fc9b1a06e1d6b40 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon, 14 Aug 2023 13:44:29 +0200
Subject: [PATCH] objtool/x86: Fix SRSO mess
Objtool --rethunk does two things:
- it collects all (tail) calls of __x86_return_thunk and places them
into .return_sites. These are typically compiler generated, but
the RET macro also emits the same.
- it fudges the validation of the __x86_return_thunk symbol; because
this symbol is inside another instruction, it can't actually find
the instruction pointed to by the symbol offset and gets upset.
Because these two things pertained to the same symbol, there was no
pressing need to separate these two separate things.
However, alas, along comes SRSO and more crazy things to deal with
appeared.
The SRSO patch itself added the following symbol names to identify as
rethunk:
'srso_untrain_ret', 'srso_safe_ret' and '__ret'
Where '__ret' is the old retbleed return thunk, 'srso_safe_ret' is a
new similarly embedded return thunk, and 'srso_untrain_ret' is
completely unrelated to anything the above does (and was only included
because of that INT3 vs UD2 issue fixed previously).
Clear things up by adding a second category for the embedded instruction
thing.
Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Link: https://lore.kernel.org/r/20230814121148.704502245@infradead.org
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 2d51fa8da9e8..cba8a7be040e 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -824,8 +824,11 @@ bool arch_is_retpoline(struct symbol *sym)
bool arch_is_rethunk(struct symbol *sym)
{
- return !strcmp(sym->name, "__x86_return_thunk") ||
- !strcmp(sym->name, "srso_untrain_ret") ||
- !strcmp(sym->name, "srso_safe_ret") ||
- !strcmp(sym->name, "__ret");
+ return !strcmp(sym->name, "__x86_return_thunk");
+}
+
+bool arch_is_embedded_insn(struct symbol *sym)
+{
+ return !strcmp(sym->name, "__ret") ||
+ !strcmp(sym->name, "srso_safe_ret");
}
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e2ee10ce7703..191656ee9fbc 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -455,7 +455,7 @@ static int decode_instructions(struct objtool_file *file)
return -1;
}
- if (func->return_thunk || func->alias != func)
+ if (func->embedded_insn || func->alias != func)
continue;
if (!find_insn(file, sec, func->offset)) {
@@ -1288,16 +1288,33 @@ static int add_ignore_alternatives(struct objtool_file *file)
return 0;
}
+/*
+ * Symbols that replace INSN_CALL_DYNAMIC, every (tail) call to such a symbol
+ * will be added to the .retpoline_sites section.
+ */
__weak bool arch_is_retpoline(struct symbol *sym)
{
return false;
}
+/*
+ * Symbols that replace INSN_RETURN, every (tail) call to such a symbol
+ * will be added to the .return_sites section.
+ */
__weak bool arch_is_rethunk(struct symbol *sym)
{
return false;
}
+/*
+ * Symbols that are embedded inside other instructions, because sometimes crazy
+ * code exists. These are mostly ignored for validation purposes.
+ */
+__weak bool arch_is_embedded_insn(struct symbol *sym)
+{
+ return false;
+}
+
static struct reloc *insn_reloc(struct objtool_file *file, struct instruction *insn)
{
struct reloc *reloc;
@@ -1583,7 +1600,7 @@ static int add_jump_destinations(struct objtool_file *file)
* middle of another instruction. Objtool only
* knows about the outer instruction.
*/
- if (sym && sym->return_thunk) {
+ if (sym && sym->embedded_insn) {
add_return_call(file, insn, false);
continue;
}
@@ -2502,6 +2519,9 @@ static int classify_symbols(struct objtool_file *file)
if (arch_is_rethunk(func))
func->return_thunk = true;
+ if (arch_is_embedded_insn(func))
+ func->embedded_insn = true;
+
if (arch_ftrace_match(func->name))
func->fentry = true;
diff --git a/tools/objtool/include/objtool/arch.h b/tools/objtool/include/objtool/arch.h
index 2b6d2ce4f9a5..0b303eba660e 100644
--- a/tools/objtool/include/objtool/arch.h
+++ b/tools/objtool/include/objtool/arch.h
@@ -90,6 +90,7 @@ int arch_decode_hint_reg(u8 sp_reg, int *base);
bool arch_is_retpoline(struct symbol *sym);
bool arch_is_rethunk(struct symbol *sym);
+bool arch_is_embedded_insn(struct symbol *sym);
int arch_rewrite_retpolines(struct objtool_file *file);
diff --git a/tools/objtool/include/objtool/elf.h b/tools/objtool/include/objtool/elf.h
index c532d70864dc..9f71e988eca4 100644
--- a/tools/objtool/include/objtool/elf.h
+++ b/tools/objtool/include/objtool/elf.h
@@ -66,6 +66,7 @@ struct symbol {
u8 fentry : 1;
u8 profiling_func : 1;
u8 warned : 1;
+ u8 embedded_insn : 1;
struct list_head pv_target;
struct reloc *relocs;
};
[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
ret = kstrtou8(buf, 10, &val);
if (ret)
return ret;
mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
static ssize_t config_num_requests_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int rc;
mutex_lock(&test_fw_mutex);
if (test_fw_config->reqs) {
pr_err("Must call release_all_firmware prior to changing config\n");
rc = -EINVAL;
mutex_unlock(&test_fw_mutex);
goto out;
}
mutex_unlock(&test_fw_mutex);
// NOTE: HERE is the race!!! Function can be preempted!
// test_fw_config->reqs can change between the release of
// the lock above and the acquire of the lock in the
// test_dev_config_update_u8()
rc = test_dev_config_update_u8(buf, count,
&test_fw_config->num_requests);
out:
return rc;
}
static ssize_t config_read_fw_idx_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
return test_dev_config_update_u8(buf, count,
&test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both a locked
and an unlocked context, from config_num_requests_store() and
config_read_fw_idx_store(), which can both be called asynchronously as
they are the driver's methods, while test_dev_config_update_u8() and its
siblings modify the data their u8 *cfg (or similar) argument points to.
To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.
Having two locks wouldn't assure a race-proof mutual exclusion.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
mutex_lock(&test_fw_mutex);
ret = __test_dev_config_update_u8(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4, 4.19, 4.14
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg…
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
[ This is the patch to fix the race condition in locking for the 5.4, ]
[ 4.19 and 4.14 stable branches. Not all the fixes from the upstream ]
[ commit apply, but those which do are verbatim equal to those in the ]
[ upstream commit. ]
---
v4:
the same patch, verbatim, as for the 5.4 stable tree, which patchwork failed to apply.
lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index b5e779bcfb34..be3baea88b61 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -284,16 +284,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
- bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (strtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -323,7 +333,7 @@ static ssize_t test_dev_config_show_int(char *buf, int cfg)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
int ret;
long new;
@@ -335,14 +345,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (new > U8_MAX)
return -EINVAL;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
{
u8 val;
@@ -375,10 +394,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.34.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5310760af1d4fbea1452bfc77db5f9a680f7ae47
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082116-unfilled-sprinkled-a76f@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5310760af1d4 ("ipvs: fix racy memcpy in proc_do_sync_threshold")
1b90af292e71 ("ipvs: Improve robustness to the ipvs sysctl")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5310760af1d4fbea1452bfc77db5f9a680f7ae47 Mon Sep 17 00:00:00 2001
From: Sishuai Gong <sishuai.system(a)gmail.com>
Date: Thu, 10 Aug 2023 15:12:42 -0400
Subject: [PATCH] ipvs: fix racy memcpy in proc_do_sync_threshold
When two threads run proc_do_sync_threshold() in parallel,
data races could happen between the two memcpy():
Thread-1 Thread-2
memcpy(val, valp, sizeof(val));
memcpy(valp, val, sizeof(val));
This race might mess up the (struct ctl_table *) table->data,
so we add a mutex lock to serialize them.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.c…
Signed-off-by: Sishuai Gong <sishuai.system(a)gmail.com>
Acked-by: Simon Horman <horms(a)kernel.org>
Acked-by: Julian Anastasov <ja(a)ssi.bg>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 62606fb44d02..4bb0d90eca1c 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1876,6 +1876,7 @@ static int
proc_do_sync_threshold(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ struct netns_ipvs *ipvs = table->extra2;
int *valp = table->data;
int val[2];
int rc;
@@ -1885,6 +1886,7 @@ proc_do_sync_threshold(struct ctl_table *table, int write,
.mode = table->mode,
};
+ mutex_lock(&ipvs->sync_mutex);
memcpy(val, valp, sizeof(val));
rc = proc_dointvec(&tmp, write, buffer, lenp, ppos);
if (write) {
@@ -1894,6 +1896,7 @@ proc_do_sync_threshold(struct ctl_table *table, int write,
else
memcpy(valp, val, sizeof(val));
}
+ mutex_unlock(&ipvs->sync_mutex);
return rc;
}
@@ -4321,6 +4324,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
ipvs->sysctl_sync_threshold[0] = DEFAULT_SYNC_THRESHOLD;
ipvs->sysctl_sync_threshold[1] = DEFAULT_SYNC_PERIOD;
tbl[idx].data = &ipvs->sysctl_sync_threshold;
+ tbl[idx].extra2 = ipvs;
tbl[idx++].maxlen = sizeof(ipvs->sysctl_sync_threshold);
ipvs->sysctl_sync_refresh_period = DEFAULT_SYNC_REFRESH_PERIOD;
tbl[idx++].data = &ipvs->sysctl_sync_refresh_period;
From: Vladislav Efanov <VEfanov(a)ispras.ru>
commit 1e0d4adf17e7ef03281d7b16555e7c1508c8ed2d upstream
Bits, which are related to Bitmap Descriptor logical blocks,
are not reset when buffer headers are allocated for them. As the
result, these logical blocks can be treated as free and
be used for other blocks.This can cause usage of one buffer header
for several types of data. UDF issues WARNING in this situation:
WARNING: CPU: 0 PID: 2703 at fs/udf/inode.c:2014
__udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014
RIP: 0010:__udf_add_aext+0x685/0x7d0 fs/udf/inode.c:2014
Call Trace:
udf_setup_indirect_aext+0x573/0x880 fs/udf/inode.c:1980
udf_add_aext+0x208/0x2e0 fs/udf/inode.c:2067
udf_insert_aext fs/udf/inode.c:2233 [inline]
udf_update_extents fs/udf/inode.c:1181 [inline]
inode_getblk+0x1981/0x3b70 fs/udf/inode.c:885
Found by Linux Verification Center (linuxtesting.org) with syzkaller.
[JK: Somewhat cleaned up the boundary checks]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Vladislav Efanov <VEfanov(a)ispras.ru>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
Syzkaller reports this problem in the 5.10 stable release. The problem has
been fixed by the following patch which can be cleanly applied to the
5.10 branch.
fs/udf/balloc.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/fs/udf/balloc.c b/fs/udf/balloc.c
index 8e597db4d971..ef50fd263315 100644
--- a/fs/udf/balloc.c
+++ b/fs/udf/balloc.c
@@ -36,18 +36,41 @@ static int read_block_bitmap(struct super_block *sb,
unsigned long bitmap_nr)
{
struct buffer_head *bh = NULL;
- int retval = 0;
+ int i;
+ int max_bits, off, count;
struct kernel_lb_addr loc;
loc.logicalBlockNum = bitmap->s_extPosition;
loc.partitionReferenceNum = UDF_SB(sb)->s_partition;
bh = udf_tread(sb, udf_get_lb_pblock(sb, &loc, block));
+ bitmap->s_block_bitmap[bitmap_nr] = bh;
if (!bh)
- retval = -EIO;
+ return -EIO;
- bitmap->s_block_bitmap[bitmap_nr] = bh;
- return retval;
+ /* Check consistency of Space Bitmap buffer. */
+ max_bits = sb->s_blocksize * 8;
+ if (!bitmap_nr) {
+ off = sizeof(struct spaceBitmapDesc) << 3;
+ count = min(max_bits - off, bitmap->s_nr_groups);
+ } else {
+ /*
+ * Rough check if bitmap number is too big to have any bitmap
+ * blocks reserved.
+ */
+ if (bitmap_nr >
+ (bitmap->s_nr_groups >> (sb->s_blocksize_bits + 3)) + 2)
+ return 0;
+ off = 0;
+ count = bitmap->s_nr_groups - bitmap_nr * max_bits +
+ (sizeof(struct spaceBitmapDesc) << 3);
+ count = min(count, max_bits);
+ }
+
+ for (i = 0; i < count; i++)
+ if (udf_test_bit(i + off, bh->b_data))
+ return -EFSCORRUPTED;
+ return 0;
}
static int __load_block_bitmap(struct super_block *sb,
--
2.34.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x adb9743d6a08778b78d62d16b4230346d3508986
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023081201-exhale-bonelike-1800@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From adb9743d6a08778b78d62d16b4230346d3508986 Mon Sep 17 00:00:00 2001
From: Qi Zheng <zhengqi.arch(a)bytedance.com>
Date: Sun, 25 Jun 2023 15:49:37 +0000
Subject: [PATCH] binder: fix memory leak in binder_init()
In binder_init(), the cleanup for binder_alloc_shrinker_init() is not
performed in the error path, which will cause memory leaks. So this commit
introduces binder_alloc_shrinker_exit() and calls it in the error path to
fix that.
Signed-off-by: Qi Zheng <zhengqi.arch(a)bytedance.com>
Acked-by: Carlos Llamas <cmllamas(a)google.com>
Fixes: f2517eb76f1f ("android: binder: Add global lru shrinker to binder")
Cc: stable <stable(a)kernel.org>
Link: https://lore.kernel.org/r/20230625154937.64316-1-qi.zheng@linux.dev
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index 486c8271cab7..d720f93d8b19 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -6617,6 +6617,7 @@ static int __init binder_init(void)
err_alloc_device_names_failed:
debugfs_remove_recursive(binder_debugfs_dir_entry_root);
+ binder_alloc_shrinker_exit();
return ret;
}
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 662a2a2e2e84..e3db8297095a 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -1087,6 +1087,12 @@ int binder_alloc_shrinker_init(void)
return ret;
}
+void binder_alloc_shrinker_exit(void)
+{
+ unregister_shrinker(&binder_shrinker);
+ list_lru_destroy(&binder_alloc_lru);
+}
+
/**
* check_buffer() - verify that buffer/offset is safe to access
* @alloc: binder_alloc for this proc
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 138d1d5af9ce..dc1e2b01dd64 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -129,6 +129,7 @@ extern struct binder_buffer *binder_alloc_new_buf(struct binder_alloc *alloc,
int pid);
extern void binder_alloc_init(struct binder_alloc *alloc);
extern int binder_alloc_shrinker_init(void);
+extern void binder_alloc_shrinker_exit(void);
extern void binder_alloc_vma_close(struct binder_alloc *alloc);
extern struct binder_buffer *
binder_alloc_prepare_to_free(struct binder_alloc *alloc,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 32c877191e022b55fe3a374f3d7e9fb5741c514d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023081202-unloader-t-shirt-eb23@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
d6ef19e25df2 ("mm/hugetlb: convert update_and_free_page() to folios")
cfd5082b5147 ("mm/hugetlb: convert remove_hugetlb_page() to folios")
1a7cdab59b22 ("mm/hugetlb: convert dissolve_free_huge_page() to folios")
911565b82853 ("mm/hugetlb: convert destroy_compound_gigantic_page() to folios")
cb67f4282bf9 ("mm,thp,rmap: simplify compound page mapcount handling")
dad6a5eb5556 ("mm,hugetlb: use folio fields in second tail page")
0356c4b96f68 ("mm/hugetlb: convert free_huge_page to folios")
de656ed376c4 ("mm/hugetlb_cgroup: convert set_hugetlb_cgroup*() to folios")
f074732d599e ("mm/hugetlb_cgroup: convert hugetlb_cgroup_from_page() to folios")
a098c977722c ("mm/hugetlb_cgroup: convert __set_hugetlb_cgroup() to folios")
4781593d5dba ("mm/hugetlb: unify clearing of RestoreReserve for private pages")
149562f75094 ("mm/hugetlb: add hugetlb_folio_subpool() helpers")
d340625f4849 ("mm: add private field of first tail to struct page and struct folio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 32c877191e022b55fe3a374f3d7e9fb5741c514d Mon Sep 17 00:00:00 2001
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Date: Tue, 11 Jul 2023 15:09:41 -0700
Subject: [PATCH] hugetlb: do not clear hugetlb dtor until allocating vmemmap
Patch series "Fix hugetlb free path race with memory errors".
In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
HWPOISON hugepages" the race window was discovered.
https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
Freeing a hugetlb page back to low level memory allocators is performed
in two steps.
1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
2) Outside lock, allocate vmemmap if necessary and call low level free
Between these two steps, the hugetlb page will appear as a normal
compound page. However, vmemmap for tail pages could be missing.
If a memory error occurs at this time, we could try to update page
flags of non-existent page structs.
A much more detailed description is in the first patch.
The first patch addresses the race window. However, it adds a
hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
free operation. This could lead to slowdowns if one is freeing a large
number of hugetlb pages.
The second patch optimizes the update_and_free_pages_bulk routine to only
take the lock once in bulk operations.
The second patch is technically not a bug fix, but includes a Fixes tag
and Cc stable to avoid a performance regression. It can be combined with
the first, but was done separately to make reviewing easier.
This patch (of 2):
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Link: https://lkml.kernel.org/r/20230711220942.43706-1-mike.kravetz@oracle.com
Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Tested-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jiaqi Yan <jiaqiyan(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64a3239b6407..6da626bfb52e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1579,9 +1579,37 @@ static inline void destroy_compound_gigantic_folio(struct folio *folio,
unsigned int order) { }
#endif
+static inline void __clear_hugetlb_destructor(struct hstate *h,
+ struct folio *folio)
+{
+ lockdep_assert_held(&hugetlb_lock);
+
+ /*
+ * Very subtle
+ *
+ * For non-gigantic pages set the destructor to the normal compound
+ * page dtor. This is needed in case someone takes an additional
+ * temporary ref to the page, and freeing is delayed until they drop
+ * their reference.
+ *
+ * For gigantic pages set the destructor to the null dtor. This
+ * destructor will never be called. Before freeing the gigantic
+ * page destroy_compound_gigantic_folio will turn the folio into a
+ * simple group of pages. After this the destructor does not
+ * apply.
+ *
+ */
+ if (hstate_is_gigantic(h))
+ folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
+ else
+ folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
+}
+
/*
- * Remove hugetlb folio from lists, and update dtor so that the folio appears
- * as just a compound page.
+ * Remove hugetlb folio from lists.
+ * If vmemmap exists for the folio, update dtor so that the folio appears
+ * as just a compound page. Otherwise, wait until after allocating vmemmap
+ * to update dtor.
*
* A reference is held on the folio, except in the case of demote.
*
@@ -1612,31 +1640,19 @@ static void __remove_hugetlb_folio(struct hstate *h, struct folio *folio,
}
/*
- * Very subtle
- *
- * For non-gigantic pages set the destructor to the normal compound
- * page dtor. This is needed in case someone takes an additional
- * temporary ref to the page, and freeing is delayed until they drop
- * their reference.
- *
- * For gigantic pages set the destructor to the null dtor. This
- * destructor will never be called. Before freeing the gigantic
- * page destroy_compound_gigantic_folio will turn the folio into a
- * simple group of pages. After this the destructor does not
- * apply.
- *
- * This handles the case where more than one ref is held when and
- * after update_and_free_hugetlb_folio is called.
- *
- * In the case of demote we do not ref count the page as it will soon
- * be turned into a page of smaller size.
+ * We can only clear the hugetlb destructor after allocating vmemmap
+ * pages. Otherwise, someone (memory error handling) may try to write
+ * to tail struct pages.
+ */
+ if (!folio_test_hugetlb_vmemmap_optimized(folio))
+ __clear_hugetlb_destructor(h, folio);
+
+ /*
+ * In the case of demote we do not ref count the page as it will soon
+ * be turned into a page of smaller size.
*/
if (!demote)
folio_ref_unfreeze(folio, 1);
- if (hstate_is_gigantic(h))
- folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
- else
- folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
h->nr_huge_pages--;
h->nr_huge_pages_node[nid]--;
@@ -1705,6 +1721,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
{
int i;
struct page *subpage;
+ bool clear_dtor = folio_test_hugetlb_vmemmap_optimized(folio);
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;
@@ -1735,6 +1752,16 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
if (unlikely(folio_test_hwpoison(folio)))
folio_clear_hugetlb_hwpoison(folio);
+ /*
+ * If vmemmap pages were allocated above, then we need to clear the
+ * hugetlb destructor under the hugetlb lock.
+ */
+ if (clear_dtor) {
+ spin_lock_irq(&hugetlb_lock);
+ __clear_hugetlb_destructor(h, folio);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
for (i = 0; i < pages_per_huge_page(h); i++) {
subpage = folio_page(folio, i);
subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
Hi,
In addition to the recent hang fix patches, this other patch is needed
to help with a case that an ASSERT() catches.
74fa4c81aadf ("drm/amd/display: Implement workaround for writing to
OTG_PIXEL_RATE_DIV register")
Can you please take this to stable 6.1.y too?
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2466
Thanks,
Dear stable team,
I'm asking that
commit 3f61631d47f1 ("take care to handle NULL ->proc_lseek()")
gets backported to the stable and LTS kernels down to 5.10.
Background:
We are in the process of upgrading our kernels. One target kernel
is based on 5.15 LTS.
Here we found that, if proc file drivers do not implement proc_lseek,
user space crashes easily, because various library routines internally
perform lseek(2). The crash happens in proc_reg_llseek, where it
wants to jump to a NULL pointer.
We could, arguably, fix these drivers to use ".proc_lseek = no_llseek".
But this doesn't seem like a worthwhile path forward, considering that
latest Linux kernels (including 6.1 LTS) allow proc_lseek == NULL again
and *remove* no_llseek. Essentially, on HEAD, it's best practice to leave
proc_lseek == NULL.
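For illustration, here is a minimal sketch of the kind of driver we have in
mind (all names here are hypothetical, not one of our actual drivers): it
leaves .proc_lseek NULL, which is fine on current kernels, but on 5.15
without the backport any lseek(2) from userspace reaches the NULL pointer
in proc_reg_llseek().

#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

static int demo_show(struct seq_file *m, void *v)
{
	seq_puts(m, "demo\n");
	return 0;
}

static int demo_open(struct inode *inode, struct file *file)
{
	return single_open(file, demo_show, NULL);
}

static const struct proc_ops demo_proc_ops = {
	.proc_open	= demo_open,
	.proc_read	= seq_read,
	.proc_release	= single_release,
	/*
	 * .proc_lseek deliberately left NULL: accepted upstream today,
	 * but on 5.15 without commit 3f61631d47f1 an lseek(2) from
	 * userspace jumps to NULL in proc_reg_llseek(). The per-driver
	 * workaround would be ".proc_lseek = no_llseek" (or seq_lseek,
	 * since this example uses seq_file).
	 */
};

static int __init demo_init(void)
{
	/* Create /proc/demo; fail module load if that does not work. */
	return proc_create("demo", 0444, NULL, &demo_proc_ops) ? 0 : -ENOMEM;
}
module_init(demo_init);

MODULE_LICENSE("GPL");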
Therefore, I ask that the above procfs fix gets backported so that our
drivers can work across all kernel versions, including latest 6.x.
I checked that this commit applies and works as expected on a board that
runs Linux 5.15, and the observed crash goes away.
Furthermore, I investigated that the fix applies to older LTS kernels, down
to 5.10. The lseek(2) path uses vfs_llseek() which checks for FMODE_LSEEK. This
has been like that forever since the initial git import. However, 5.4 LTS and
older kernels do not have "struct proc_ops".
Thank you in advance.
Best regards,
Thomas Martitz
Decoupling misfit from overutilized helps handle tasks that are misfit only
because of uclamp_min without triggering the overutilized state, which is
bad from an energy point of view as it prematurely disables energy-aware
scheduling.
The series also makes the search for a better CPU under bad thermal
conditions more comprehensive, which is a useful improvement when the
system is thermally constrained.
Backporting to 5.10.y and 5.15.y is hard as find_energy_efficient_cpu() is
different there, but the series applies cleanly to 6.1.y.
Compile tested against various randconfigs for different archs.
Boot tested on android14-6.1 GKI kernel.
Based on v6.1.46
Original series
https://lore.kernel.org/lkml/20230201143628.270912-1-vincent.guittot@linaro…
Thanks!
--
Qais Yousef
Vincent Guittot (2):
sched/fair: unlink misfit task from cpu overutilized
sched/fair: Remove capacity inversion detection
kernel/sched/fair.c | 189 ++++++++++++++++++++-----------------------
kernel/sched/sched.h | 19 -----
2 files changed, 87 insertions(+), 121 deletions(-)
--
2.34.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f1740b1ab2703b2a057da7cf33b03297e0381aa0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082108-wham-accent-52a6@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f1740b1ab270 ("drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix")
3083b1007d4b ("drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f1740b1ab2703b2a057da7cf33b03297e0381aa0 Mon Sep 17 00:00:00 2001
From: Tim Huang <Tim.Huang(a)amd.com>
Date: Mon, 14 Aug 2023 15:13:04 +0800
Subject: [PATCH] drm/amdgpu: skip fence GFX interrupts disable/enable for S0ix
GFX v11.0.1 reported a fence fallback timer expired issue on the
SDMA and GFX rings after S0ix resume. This is caused by the EOP
interrupts being disabled during S0ix suspend but failing to be
re-enabled on resume because GFX is in GFXOFF.
[ 203.349571] [drm] Fence fallback timer expired on ring sdma0
[ 203.349572] [drm] Fence fallback timer expired on ring gfx_0.0.0
[ 203.861635] [drm] Fence fallback timer expired on ring gfx_0.0.0
For S0ix, GFX is in the GFXOFF state, so avoid touching the GFX registers
to configure the fence driver interrupts for rings that belong to GFX.
The interrupt configuration will be restored on GFXOFF exit.
Signed-off-by: Tim Huang <Tim.Huang(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index c694b41f6461..7537f5aa76f0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -551,6 +551,41 @@ int amdgpu_fence_driver_sw_init(struct amdgpu_device *adev)
return 0;
}
+/**
+ * amdgpu_fence_need_ring_interrupt_restore - helper function to check whether
+ * fence driver interrupts need to be restored.
+ *
+ * @ring: ring that to be checked
+ *
+ * Interrupts for rings that belong to GFX IP don't need to be restored
+ * when the target power state is s0ix.
+ *
+ * Return true if need to restore interrupts, false otherwise.
+ */
+static bool amdgpu_fence_need_ring_interrupt_restore(struct amdgpu_ring *ring)
+{
+ struct amdgpu_device *adev = ring->adev;
+ bool is_gfx_power_domain = false;
+
+ switch (ring->funcs->type) {
+ case AMDGPU_RING_TYPE_SDMA:
+ /* SDMA 5.x+ is part of GFX power domain so it's covered by GFXOFF */
+ if (adev->ip_versions[SDMA0_HWIP][0] >= IP_VERSION(5, 0, 0))
+ is_gfx_power_domain = true;
+ break;
+ case AMDGPU_RING_TYPE_GFX:
+ case AMDGPU_RING_TYPE_COMPUTE:
+ case AMDGPU_RING_TYPE_KIQ:
+ case AMDGPU_RING_TYPE_MES:
+ is_gfx_power_domain = true;
+ break;
+ default:
+ break;
+ }
+
+ return !(adev->in_s0ix && is_gfx_power_domain);
+}
+
/**
* amdgpu_fence_driver_hw_fini - tear down the fence driver
* for all possible rings.
@@ -579,7 +614,8 @@ void amdgpu_fence_driver_hw_fini(struct amdgpu_device *adev)
amdgpu_fence_driver_force_completion(ring);
if (!drm_dev_is_unplugged(adev_to_drm(adev)) &&
- ring->fence_drv.irq_src)
+ ring->fence_drv.irq_src &&
+ amdgpu_fence_need_ring_interrupt_restore(ring))
amdgpu_irq_put(adev, ring->fence_drv.irq_src,
ring->fence_drv.irq_type);
@@ -655,7 +691,8 @@ void amdgpu_fence_driver_hw_init(struct amdgpu_device *adev)
continue;
/* enable the interrupt */
- if (ring->fence_drv.irq_src)
+ if (ring->fence_drv.irq_src &&
+ amdgpu_fence_need_ring_interrupt_restore(ring))
amdgpu_irq_get(adev, ring->fence_drv.irq_src,
ring->fence_drv.irq_type);
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b6360a5ec31d160d58c1a64387b323b556cedca8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082143-underage-slain-d7a5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
b6360a5ec31d ("drm/amd/pm: disallow the fan setting if there is no fan on smu 13.0.0")
61319b8e3b58 ("drm/amd/pm: disable the SMU13 OD feature support temporarily")
8f4f5f0b901a ("drm/amd/pm: fulfill SMU13 OD settings init and restore")
f6c0cd55fed8 ("drm/amd/pm: Enable ecc_info table support for smu v13_0_10")
1794f6a9535b ("drm/amd/pm: enable GPO dynamic control support for SMU13.0.0")
48aa62f07467 ("drm/amd/pm: Enable bad memory page/channel recording support for smu v13_0_0")
8ae5a38c8cb3 ("drm/amd/pm: enable runpm support over BACO for SMU13.0.0")
60cfad329ab8 ("drm/amd/pm: enable mode1 reset on smu_v13_0_10")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b6360a5ec31d160d58c1a64387b323b556cedca8 Mon Sep 17 00:00:00 2001
From: Kenneth Feng <kenneth.feng(a)amd.com>
Date: Wed, 9 Aug 2023 18:06:05 +0800
Subject: [PATCH] drm/amd/pm: disallow the fan setting if there is no fan on
smu 13.0.0
V2: depend on pm.no_fan to check
Signed-off-by: Kenneth Feng <kenneth.feng(a)amd.com>
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index fddcd834bcec..0fb6be11a0cc 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -331,6 +331,7 @@ static int smu_v13_0_0_check_powerplay_table(struct smu_context *smu)
struct smu_13_0_0_powerplay_table *powerplay_table =
table_context->power_play_table;
struct smu_baco_context *smu_baco = &smu->smu_baco;
+ PPTable_t *pptable = smu->smu_table.driver_pptable;
#if 0
PPTable_t *pptable = smu->smu_table.driver_pptable;
const OverDriveLimits_t * const overdrive_upperlimits =
@@ -371,6 +372,9 @@ static int smu_v13_0_0_check_powerplay_table(struct smu_context *smu)
table_context->thermal_controller_type =
powerplay_table->thermal_controller_type;
+ smu->adev->pm.no_fan =
+ !(pptable->SkuTable.FeaturesToRun[0] & (1 << FEATURE_FAN_CONTROL_BIT));
+
return 0;
}