commit 8e2442a5f86e1f77b86401fce274a7f622740bc4 upstream.
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably a similar situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol drivers/usb/gadget/legacy/Kconfig, then touch
around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before. But, this
has been broken since long long time before, and still it is.
We should take a closer look to fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
---
scripts/kconfig/confdata.c | 7 +++----
scripts/kconfig/expr.h | 1 +
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 91d0a5c014ac..fd99ae90a618 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -834,11 +834,12 @@ int conf_write(const char *name)
"#\n"
"# %s\n"
"#\n", str);
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1024,8 +1025,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_split_config())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 7c329e179007..43a87f8ea738 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
--
2.17.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two know places where synchronous SIGKILL are generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two know places where synchronous SIGKILL are generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 82e10af2248d2d09c99834613f1b47d5002dc379 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 16 May 2019 10:55:21 -0500
Subject: [PATCH] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
I don't think this is userspace visible but SIGKILL does not have
any si_codes that use the fault member of the siginfo union. Correct
this the simple way and call force_sig instead of force_sig_fault when
the signal is SIGKILL.
The two know places where synchronous SIGKILL are generated are
do_bad_area and fpsimd_save. The call paths to force_sig_fault are:
do_bad_area
arm64_force_sig_fault
force_sig_fault
force_signal_inject
arm64_notify_die
arm64_force_sig_fault
force_sig_fault
Which means correcting this in arm64_force_sig_fault is enough
to ensure the arm64 code is not misusing the generic code, which
could lead to maintenance problems later.
Cc: stable(a)vger.kernel.org
Cc: Dave Martin <Dave.Martin(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
Fixes: af40ff687bc9 ("arm64: signal: Ensure si_code is valid for all fault signals")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index ade32046f3fe..e45d5b440fb1 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -256,7 +256,10 @@ void arm64_force_sig_fault(int signo, int code, void __user *addr,
const char *str)
{
arm64_show_signal(signo, str);
- force_sig_fault(signo, code, addr, current);
+ if (signo == SIGKILL)
+ force_sig(SIGKILL, current);
+ else
+ force_sig_fault(signo, code, addr, current);
}
void arm64_force_sig_mceerr(int code, void __user *addr, short lsb,
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_used based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_used based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_used based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7a0cf094944e2540758b7f957eb6846d5126f535 Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Wed, 15 May 2019 22:54:56 -0500
Subject: [PATCH] signal: Correct namespace fixups of si_pid and si_uid
The function send_signal was split from __send_signal so that it would
be possible to bypass the namespace logic based upon current[1]. As it
turns out the si_pid and the si_uid fixup are both inappropriate in
the case of kill_pid_usb_asyncio so move that logic into send_signal.
It is difficult to arrange but possible for a signal with an si_code
of SI_TIMER or SI_SIGIO to be sent across namespace boundaries. In
which case tests for when it is ok to change si_pid and si_uid based
on SI_FROMUSER are incorrect. Replace the use of SI_FROMUSER with a
new test has_si_pid_and_used based on siginfo_layout.
Now that the uid fixup is no longer present after expanding
SEND_SIG_NOINFO properly calculate the si_uid that the target
task needs to read.
[1] 7978b567d315 ("signals: add from_ancestor_ns parameter to send_signal()")
Cc: stable(a)vger.kernel.org
Fixes: 6588c1e3ff01 ("signals: SI_USER: Masquerade si_pid when crossing pid ns boundary")
Fixes: 6b550f949594 ("user namespace: make signal.c respect user namespaces")
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/kernel/signal.c b/kernel/signal.c
index 18040d6bd63a..39a3eca5ce22 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1056,27 +1056,6 @@ static inline bool legacy_queue(struct sigpending *signals, int sig)
return (sig < SIGRTMIN) && sigismember(&signals->signal, sig);
}
-#ifdef CONFIG_USER_NS
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- if (current_user_ns() == task_cred_xxx(t, user_ns))
- return;
-
- if (SI_FROMKERNEL(info))
- return;
-
- rcu_read_lock();
- info->si_uid = from_kuid_munged(task_cred_xxx(t, user_ns),
- make_kuid(current_user_ns(), info->si_uid));
- rcu_read_unlock();
-}
-#else
-static inline void userns_fixup_signal_uid(struct kernel_siginfo *info, struct task_struct *t)
-{
- return;
-}
-#endif
-
static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type, int from_ancestor_ns)
{
@@ -1134,7 +1113,11 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
q->info.si_code = SI_USER;
q->info.si_pid = task_tgid_nr_ns(current,
task_active_pid_ns(t));
- q->info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
+ rcu_read_lock();
+ q->info.si_uid =
+ from_kuid_munged(task_cred_xxx(t, user_ns),
+ current_uid());
+ rcu_read_unlock();
break;
case (unsigned long) SEND_SIG_PRIV:
clear_siginfo(&q->info);
@@ -1146,13 +1129,8 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
break;
default:
copy_siginfo(&q->info, info);
- if (from_ancestor_ns)
- q->info.si_pid = 0;
break;
}
-
- userns_fixup_signal_uid(&q->info, t);
-
} else if (!is_si_special(info)) {
if (sig >= SIGRTMIN && info->si_code != SI_USER) {
/*
@@ -1196,6 +1174,28 @@ static int __send_signal(int sig, struct kernel_siginfo *info, struct task_struc
return ret;
}
+static inline bool has_si_pid_and_uid(struct kernel_siginfo *info)
+{
+ bool ret = false;
+ switch (siginfo_layout(info->si_signo, info->si_code)) {
+ case SIL_KILL:
+ case SIL_CHLD:
+ case SIL_RT:
+ ret = true;
+ break;
+ case SIL_TIMER:
+ case SIL_POLL:
+ case SIL_FAULT:
+ case SIL_FAULT_MCEERR:
+ case SIL_FAULT_BNDERR:
+ case SIL_FAULT_PKUERR:
+ case SIL_SYS:
+ ret = false;
+ break;
+ }
+ return ret;
+}
+
static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct *t,
enum pid_type type)
{
@@ -1205,7 +1205,20 @@ static int send_signal(int sig, struct kernel_siginfo *info, struct task_struct
from_ancestor_ns = si_fromuser(info) &&
!task_pid_nr_ns(current, task_active_pid_ns(t));
#endif
+ if (!is_si_special(info) && has_si_pid_and_uid(info)) {
+ struct user_namespace *t_user_ns;
+ rcu_read_lock();
+ t_user_ns = task_cred_xxx(t, user_ns);
+ if (current_user_ns() != t_user_ns) {
+ kuid_t uid = make_kuid(current_user_ns(), info->si_uid);
+ info->si_uid = from_kuid_munged(t_user_ns, uid);
+ }
+ rcu_read_unlock();
+
+ if (!task_pid_nr_ns(current, task_active_pid_ns(t)))
+ info->si_pid = 0;
+ }
return __send_signal(sig, info, t, type, from_ancestor_ns);
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of it's values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue can not be fixed in copy_siginfo_to_user32 as the usb
usage of the the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Have kill_pid_usb_asyncio
take instead of a siginfo_t which is difficult and error prone, 3
arguments, a signal number, an errno value, and an address enconded as
a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value at the in si_pid (instead of si_addr). That is the
code now verifies that si_pid and si_addr always occur at the same
location. Further the code veries that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of it's values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue can not be fixed in copy_siginfo_to_user32 as the usb
usage of the the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Have kill_pid_usb_asyncio
take instead of a siginfo_t which is difficult and error prone, 3
arguments, a signal number, an errno value, and an address enconded as
a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value at the in si_pid (instead of si_addr). That is the
code now verifies that si_pid and si_addr always occur at the same
location. Further the code veries that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of it's values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue can not be fixed in copy_siginfo_to_user32 as the usb
usage of the the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Have kill_pid_usb_asyncio
take instead of a siginfo_t which is difficult and error prone, 3
arguments, a signal number, an errno value, and an address enconded as
a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value at the in si_pid (instead of si_addr). That is the
code now verifies that si_pid and si_addr always occur at the same
location. Further the code veries that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 70f1b0d34bdf03065fe869e93cc17cad1ea20c4a Mon Sep 17 00:00:00 2001
From: "Eric W. Biederman" <ebiederm(a)xmission.com>
Date: Thu, 7 Feb 2019 19:44:12 -0600
Subject: [PATCH] signal/usb: Replace kill_pid_info_as_cred with
kill_pid_usb_asyncio
The usb support for asyncio encoded one of it's values in the wrong
field. It should have used si_value but instead used si_addr which is
not present in the _rt union member of struct siginfo.
The practical result of this is that on a 64bit big endian kernel
when delivering a signal to a 32bit process the si_addr field
is set to NULL, instead of the expected pointer value.
This issue can not be fixed in copy_siginfo_to_user32 as the usb
usage of the the _sigfault (aka si_addr) member of the siginfo
union when SI_ASYNCIO is set is incompatible with the POSIX and
glibc usage of the _rt member of the siginfo union.
Therefore replace kill_pid_info_as_cred with kill_pid_usb_asyncio a
dedicated function for this one specific case. There are no other
users of kill_pid_info_as_cred so this specialization should have no
impact on the amount of code in the kernel. Have kill_pid_usb_asyncio
take instead of a siginfo_t which is difficult and error prone, 3
arguments, a signal number, an errno value, and an address enconded as
a sigval_t. The encoding of the address as a sigval_t allows the
code that reads the userspace request for a signal to handle this
compat issue along with all of the other compat issues.
Add BUILD_BUG_ONs in kernel/signal.c to ensure that we can now place
the pointer value at the in si_pid (instead of si_addr). That is the
code now verifies that si_pid and si_addr always occur at the same
location. Further the code veries that for native structures a value
placed in si_pid and spilling into si_uid will appear in userspace in
si_addr (on a byte by byte copy of siginfo or a field by field copy of
siginfo). The code also verifies that for a 64bit kernel and a 32bit
userspace the 32bit pointer will fit in si_pid.
I have used the usbsig.c program below written by Alan Stern and
slightly tweaked by me to run on a big endian machine to verify the
issue exists (on sparc64) and to confirm the patch below fixes the issue.
/* usbsig.c -- test USB async signal delivery */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <endian.h>
#include <linux/usb/ch9.h>
#include <linux/usbdevice_fs.h>
static struct usbdevfs_urb urb;
static struct usbdevfs_disconnectsignal ds;
static volatile sig_atomic_t done = 0;
void urb_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p urb: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &urb);
printf("%s\n", (info->si_addr == &urb) ? "Good" : "Bad");
}
void ds_handler(int sig, siginfo_t *info , void *ucontext)
{
printf("Got signal %d, signo %d errno %d code %d addr: %p ds: %p\n",
sig, info->si_signo, info->si_errno, info->si_code,
info->si_addr, &ds);
printf("%s\n", (info->si_addr == &ds) ? "Good" : "Bad");
done = 1;
}
int main(int argc, char **argv)
{
char *devfilename;
int fd;
int rc;
struct sigaction act;
struct usb_ctrlrequest *req;
void *ptr;
char buf[80];
if (argc != 2) {
fprintf(stderr, "Usage: usbsig device-file-name\n");
return 1;
}
devfilename = argv[1];
fd = open(devfilename, O_RDWR);
if (fd == -1) {
perror("Error opening device file");
return 1;
}
act.sa_sigaction = urb_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR1, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
act.sa_sigaction = ds_handler;
sigemptyset(&act.sa_mask);
act.sa_flags = SA_SIGINFO;
rc = sigaction(SIGUSR2, &act, NULL);
if (rc == -1) {
perror("Error in sigaction");
return 1;
}
memset(&urb, 0, sizeof(urb));
urb.type = USBDEVFS_URB_TYPE_CONTROL;
urb.endpoint = USB_DIR_IN | 0;
urb.buffer = buf;
urb.buffer_length = sizeof(buf);
urb.signr = SIGUSR1;
req = (struct usb_ctrlrequest *) buf;
req->bRequestType = USB_DIR_IN | USB_TYPE_STANDARD | USB_RECIP_DEVICE;
req->bRequest = USB_REQ_GET_DESCRIPTOR;
req->wValue = htole16(USB_DT_DEVICE << 8);
req->wIndex = htole16(0);
req->wLength = htole16(sizeof(buf) - sizeof(*req));
rc = ioctl(fd, USBDEVFS_SUBMITURB, &urb);
if (rc == -1) {
perror("Error in SUBMITURB ioctl");
return 1;
}
rc = ioctl(fd, USBDEVFS_REAPURB, &ptr);
if (rc == -1) {
perror("Error in REAPURB ioctl");
return 1;
}
memset(&ds, 0, sizeof(ds));
ds.signr = SIGUSR2;
ds.context = &ds;
rc = ioctl(fd, USBDEVFS_DISCSIGNAL, &ds);
if (rc == -1) {
perror("Error in DISCSIGNAL ioctl");
return 1;
}
printf("Waiting for usb disconnect\n");
while (!done) {
sleep(1);
}
close(fd);
return 0;
}
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: linux-usb(a)vger.kernel.org
Cc: Alan Stern <stern(a)rowland.harvard.edu>
Cc: Oliver Neukum <oneukum(a)suse.com>
Fixes: v2.3.39
Cc: stable(a)vger.kernel.org
Acked-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c
index fa783531ee88..a02448105527 100644
--- a/drivers/usb/core/devio.c
+++ b/drivers/usb/core/devio.c
@@ -63,7 +63,7 @@ struct usb_dev_state {
unsigned int discsignr;
struct pid *disc_pid;
const struct cred *cred;
- void __user *disccontext;
+ sigval_t disccontext;
unsigned long ifclaimed;
u32 disabled_bulk_eps;
bool privileges_dropped;
@@ -90,6 +90,7 @@ struct async {
unsigned int ifnum;
void __user *userbuffer;
void __user *userurb;
+ sigval_t userurb_sigval;
struct urb *urb;
struct usb_memory *usbm;
unsigned int mem_usage;
@@ -582,22 +583,19 @@ static void async_completed(struct urb *urb)
{
struct async *as = urb->context;
struct usb_dev_state *ps = as->ps;
- struct kernel_siginfo sinfo;
struct pid *pid = NULL;
const struct cred *cred = NULL;
unsigned long flags;
- int signr;
+ sigval_t addr;
+ int signr, errno;
spin_lock_irqsave(&ps->lock, flags);
list_move_tail(&as->asynclist, &ps->async_completed);
as->status = urb->status;
signr = as->signr;
if (signr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = as->signr;
- sinfo.si_errno = as->status;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = as->userurb;
+ errno = as->status;
+ addr = as->userurb_sigval;
pid = get_pid(as->pid);
cred = get_cred(as->cred);
}
@@ -615,7 +613,7 @@ static void async_completed(struct urb *urb)
spin_unlock_irqrestore(&ps->lock, flags);
if (signr) {
- kill_pid_info_as_cred(sinfo.si_signo, &sinfo, pid, cred);
+ kill_pid_usb_asyncio(signr, errno, addr, pid, cred);
put_pid(pid);
put_cred(cred);
}
@@ -1427,7 +1425,7 @@ find_memory_area(struct usb_dev_state *ps, const struct usbdevfs_urb *uurb)
static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb,
struct usbdevfs_iso_packet_desc __user *iso_frame_desc,
- void __user *arg)
+ void __user *arg, sigval_t userurb_sigval)
{
struct usbdevfs_iso_packet_desc *isopkt = NULL;
struct usb_host_endpoint *ep;
@@ -1727,6 +1725,7 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
isopkt = NULL;
as->ps = ps;
as->userurb = arg;
+ as->userurb_sigval = userurb_sigval;
if (as->usbm) {
unsigned long uurb_start = (unsigned long)uurb->buffer;
@@ -1801,13 +1800,17 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb
static int proc_submiturb(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (copy_from_user(&uurb, arg, sizeof(uurb)))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_ptr = arg;
+
return proc_do_submiturb(ps, &uurb,
(((struct usbdevfs_urb __user *)arg)->iso_frame_desc),
- arg);
+ arg, userurb_sigval);
}
static int proc_unlinkurb(struct usb_dev_state *ps, void __user *arg)
@@ -1977,7 +1980,7 @@ static int proc_disconnectsignal_compat(struct usb_dev_state *ps, void __user *a
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = compat_ptr(ds.context);
+ ps->disccontext.sival_int = ds.context;
return 0;
}
@@ -2005,13 +2008,17 @@ static int get_urb32(struct usbdevfs_urb *kurb,
static int proc_submiturb_compat(struct usb_dev_state *ps, void __user *arg)
{
struct usbdevfs_urb uurb;
+ sigval_t userurb_sigval;
if (get_urb32(&uurb, (struct usbdevfs_urb32 __user *)arg))
return -EFAULT;
+ memset(&userurb_sigval, 0, sizeof(userurb_sigval));
+ userurb_sigval.sival_int = ptr_to_compat(arg);
+
return proc_do_submiturb(ps, &uurb,
((struct usbdevfs_urb32 __user *)arg)->iso_frame_desc,
- arg);
+ arg, userurb_sigval);
}
static int processcompl_compat(struct async *as, void __user * __user *arg)
@@ -2092,7 +2099,7 @@ static int proc_disconnectsignal(struct usb_dev_state *ps, void __user *arg)
if (copy_from_user(&ds, arg, sizeof(ds)))
return -EFAULT;
ps->discsignr = ds.signr;
- ps->disccontext = ds.context;
+ ps->disccontext.sival_ptr = ds.context;
return 0;
}
@@ -2614,22 +2621,15 @@ const struct file_operations usbdev_file_operations = {
static void usbdev_remove(struct usb_device *udev)
{
struct usb_dev_state *ps;
- struct kernel_siginfo sinfo;
while (!list_empty(&udev->filelist)) {
ps = list_entry(udev->filelist.next, struct usb_dev_state, list);
destroy_all_async(ps);
wake_up_all(&ps->wait);
list_del_init(&ps->list);
- if (ps->discsignr) {
- clear_siginfo(&sinfo);
- sinfo.si_signo = ps->discsignr;
- sinfo.si_errno = EPIPE;
- sinfo.si_code = SI_ASYNCIO;
- sinfo.si_addr = ps->disccontext;
- kill_pid_info_as_cred(ps->discsignr, &sinfo,
- ps->disc_pid, ps->cred);
- }
+ if (ps->discsignr)
+ kill_pid_usb_asyncio(ps->discsignr, EPIPE, ps->disccontext,
+ ps->disc_pid, ps->cred);
}
}
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index 38a0f0785323..c68ca81db0a1 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -329,7 +329,7 @@ extern void force_sigsegv(int sig, struct task_struct *p);
extern int force_sig_info(int, struct kernel_siginfo *, struct task_struct *);
extern int __kill_pgrp_info(int sig, struct kernel_siginfo *info, struct pid *pgrp);
extern int kill_pid_info(int sig, struct kernel_siginfo *info, struct pid *pid);
-extern int kill_pid_info_as_cred(int, struct kernel_siginfo *, struct pid *,
+extern int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr, struct pid *,
const struct cred *);
extern int kill_pgrp(struct pid *pid, int sig, int priv);
extern int kill_pid(struct pid *pid, int sig, int priv);
diff --git a/kernel/signal.c b/kernel/signal.c
index a1eb44dc9ff5..18040d6bd63a 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1439,13 +1439,44 @@ static inline bool kill_as_cred_perm(const struct cred *cred,
uid_eq(cred->uid, pcred->uid);
}
-/* like kill_pid_info(), but doesn't use uid/euid of "current" */
-int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
- const struct cred *cred)
+/*
+ * The usb asyncio usage of siginfo is wrong. The glibc support
+ * for asyncio which uses SI_ASYNCIO assumes the layout is SIL_RT.
+ * AKA after the generic fields:
+ * kernel_pid_t si_pid;
+ * kernel_uid32_t si_uid;
+ * sigval_t si_value;
+ *
+ * Unfortunately when usb generates SI_ASYNCIO it assumes the layout
+ * after the generic fields is:
+ * void __user *si_addr;
+ *
+ * This is a practical problem when there is a 64bit big endian kernel
+ * and a 32bit userspace. As the 32bit address will encoded in the low
+ * 32bits of the pointer. Those low 32bits will be stored at higher
+ * address than appear in a 32 bit pointer. So userspace will not
+ * see the address it was expecting for it's completions.
+ *
+ * There is nothing in the encoding that can allow
+ * copy_siginfo_to_user32 to detect this confusion of formats, so
+ * handle this by requiring the caller of kill_pid_usb_asyncio to
+ * notice when this situration takes place and to store the 32bit
+ * pointer in sival_int, instead of sival_addr of the sigval_t addr
+ * parameter.
+ */
+int kill_pid_usb_asyncio(int sig, int errno, sigval_t addr,
+ struct pid *pid, const struct cred *cred)
{
- int ret = -EINVAL;
+ struct kernel_siginfo info;
struct task_struct *p;
unsigned long flags;
+ int ret = -EINVAL;
+
+ clear_siginfo(&info);
+ info.si_signo = sig;
+ info.si_errno = errno;
+ info.si_code = SI_ASYNCIO;
+ *((sigval_t *)&info.si_pid) = addr;
if (!valid_signal(sig))
return ret;
@@ -1456,17 +1487,17 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
ret = -ESRCH;
goto out_unlock;
}
- if (si_fromuser(info) && !kill_as_cred_perm(cred, p)) {
+ if (!kill_as_cred_perm(cred, p)) {
ret = -EPERM;
goto out_unlock;
}
- ret = security_task_kill(p, info, sig, cred);
+ ret = security_task_kill(p, &info, sig, cred);
if (ret)
goto out_unlock;
if (sig) {
if (lock_task_sighand(p, &flags)) {
- ret = __send_signal(sig, info, p, PIDTYPE_TGID, 0);
+ ret = __send_signal(sig, &info, p, PIDTYPE_TGID, 0);
unlock_task_sighand(p, &flags);
} else
ret = -ESRCH;
@@ -1475,7 +1506,7 @@ int kill_pid_info_as_cred(int sig, struct kernel_siginfo *info, struct pid *pid,
rcu_read_unlock();
return ret;
}
-EXPORT_SYMBOL_GPL(kill_pid_info_as_cred);
+EXPORT_SYMBOL_GPL(kill_pid_usb_asyncio);
/*
* kill_something_info() interprets pid in interesting ways just like kill(2).
@@ -4474,6 +4505,28 @@ static inline void siginfo_buildtime_checks(void)
CHECK_OFFSET(si_syscall);
CHECK_OFFSET(si_arch);
#undef CHECK_OFFSET
+
+ /* usb asyncio */
+ BUILD_BUG_ON(offsetof(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_addr));
+ if (sizeof(int) == sizeof(void __user *)) {
+ BUILD_BUG_ON(sizeof_field(struct siginfo, si_pid) !=
+ sizeof(void __user *));
+ } else {
+ BUILD_BUG_ON((sizeof_field(struct siginfo, si_pid) +
+ sizeof_field(struct siginfo, si_uid)) !=
+ sizeof(void __user *));
+ BUILD_BUG_ON(offsetofend(struct siginfo, si_pid) !=
+ offsetof(struct siginfo, si_uid));
+ }
+#ifdef CONFIG_COMPAT
+ BUILD_BUG_ON(offsetof(struct compat_siginfo, si_pid) !=
+ offsetof(struct compat_siginfo, si_addr));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof(compat_uptr_t));
+ BUILD_BUG_ON(sizeof_field(struct compat_siginfo, si_pid) !=
+ sizeof_field(struct siginfo, si_pid));
+#endif
}
void __init signals_init(void)
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd82d4bd21880b7c4d5f5756be435095d6ae07b5 Mon Sep 17 00:00:00 2001
From: Julien Thierry <julien.thierry(a)arm.com>
Date: Tue, 11 Jun 2019 10:38:10 +0100
Subject: [PATCH] arm64: Fix incorrect irqflag restore for priority masking
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired location due to the
state of PSR.I upon flag saving [1].
In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_IGNORE_PMR) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence of the status of interrupt masking via PMR.
GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value> as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
requires PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.
[1] https://www.spinics.net/lists/arm-kernel/msg716956.html
Fixes: 4a503217ce37 ("arm64: irqflags: Use ICC_PMR_EL1 for interrupt masking")
Cc: <stable(a)vger.kernel.org> # 5.1.x-
Reported-by: Zenghui Yu <yuzenghui(a)huawei.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Wei Li <liwei391(a)huawei.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Christoffer Dall <christoffer.dall(a)arm.com>
Cc: James Morse <james.morse(a)arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose(a)arm.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Marc Zyngier <marc.zyngier(a)arm.com>
Signed-off-by: Julien Thierry <julien.thierry(a)arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
diff --git a/arch/arm64/include/asm/arch_gicv3.h b/arch/arm64/include/asm/arch_gicv3.h
index 14b41ddc68ba..9e991b628706 100644
--- a/arch/arm64/include/asm/arch_gicv3.h
+++ b/arch/arm64/include/asm/arch_gicv3.h
@@ -163,7 +163,9 @@ static inline bool gic_prio_masking_enabled(void)
static inline void gic_pmr_mask_irqs(void)
{
- BUILD_BUG_ON(GICD_INT_DEF_PRI <= GIC_PRIO_IRQOFF);
+ BUILD_BUG_ON(GICD_INT_DEF_PRI < (GIC_PRIO_IRQOFF |
+ GIC_PRIO_PSR_I_SET));
+ BUILD_BUG_ON(GICD_INT_DEF_PRI >= GIC_PRIO_IRQON);
gic_write_pmr(GIC_PRIO_IRQOFF);
}
diff --git a/arch/arm64/include/asm/daifflags.h b/arch/arm64/include/asm/daifflags.h
index db452aa9e651..f93204f319da 100644
--- a/arch/arm64/include/asm/daifflags.h
+++ b/arch/arm64/include/asm/daifflags.h
@@ -18,6 +18,7 @@
#include <linux/irqflags.h>
+#include <asm/arch_gicv3.h>
#include <asm/cpufeature.h>
#define DAIF_PROCCTX 0
@@ -32,6 +33,11 @@ static inline void local_daif_mask(void)
:
:
: "memory");
+
+ /* Don't really care for a dsb here, we don't intend to enable IRQs */
+ if (system_uses_irq_prio_masking())
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+
trace_hardirqs_off();
}
@@ -43,7 +49,7 @@ static inline unsigned long local_daif_save(void)
if (system_uses_irq_prio_masking()) {
/* If IRQs are masked with PMR, reflect it in the flags */
- if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
+ if (read_sysreg_s(SYS_ICC_PMR_EL1) != GIC_PRIO_IRQON)
flags |= PSR_I_BIT;
}
@@ -59,36 +65,44 @@ static inline void local_daif_restore(unsigned long flags)
if (!irq_disabled) {
trace_hardirqs_on();
- if (system_uses_irq_prio_masking())
- arch_local_irq_enable();
- } else if (!(flags & PSR_A_BIT)) {
- /*
- * If interrupts are disabled but we can take
- * asynchronous errors, we can take NMIs
- */
if (system_uses_irq_prio_masking()) {
- flags &= ~PSR_I_BIT;
+ gic_write_pmr(GIC_PRIO_IRQON);
+ dsb(sy);
+ }
+ } else if (system_uses_irq_prio_masking()) {
+ u64 pmr;
+
+ if (!(flags & PSR_A_BIT)) {
/*
- * There has been concern that the write to daif
- * might be reordered before this write to PMR.
- * From the ARM ARM DDI 0487D.a, section D1.7.1
- * "Accessing PSTATE fields":
- * Writes to the PSTATE fields have side-effects on
- * various aspects of the PE operation. All of these
- * side-effects are guaranteed:
- * - Not to be visible to earlier instructions in
- * the execution stream.
- * - To be visible to later instructions in the
- * execution stream
- *
- * Also, writes to PMR are self-synchronizing, so no
- * interrupts with a lower priority than PMR is signaled
- * to the PE after the write.
- *
- * So we don't need additional synchronization here.
+ * If interrupts are disabled but we can take
+ * asynchronous errors, we can take NMIs
*/
- arch_local_irq_disable();
+ flags &= ~PSR_I_BIT;
+ pmr = GIC_PRIO_IRQOFF;
+ } else {
+ pmr = GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET;
}
+
+ /*
+ * There has been concern that the write to daif
+ * might be reordered before this write to PMR.
+ * From the ARM ARM DDI 0487D.a, section D1.7.1
+ * "Accessing PSTATE fields":
+ * Writes to the PSTATE fields have side-effects on
+ * various aspects of the PE operation. All of these
+ * side-effects are guaranteed:
+ * - Not to be visible to earlier instructions in
+ * the execution stream.
+ * - To be visible to later instructions in the
+ * execution stream
+ *
+ * Also, writes to PMR are self-synchronizing, so no
+ * interrupts with a lower priority than PMR is signaled
+ * to the PE after the write.
+ *
+ * So we don't need additional synchronization here.
+ */
+ gic_write_pmr(pmr);
}
write_sysreg(flags, daif);
diff --git a/arch/arm64/include/asm/irqflags.h b/arch/arm64/include/asm/irqflags.h
index fbe1aba6ffb3..a1372722f12e 100644
--- a/arch/arm64/include/asm/irqflags.h
+++ b/arch/arm64/include/asm/irqflags.h
@@ -67,43 +67,46 @@ static inline void arch_local_irq_disable(void)
*/
static inline unsigned long arch_local_save_flags(void)
{
- unsigned long daif_bits;
unsigned long flags;
- daif_bits = read_sysreg(daif);
-
- /*
- * The asm is logically equivalent to:
- *
- * if (system_uses_irq_prio_masking())
- * flags = (daif_bits & PSR_I_BIT) ?
- * GIC_PRIO_IRQOFF :
- * read_sysreg_s(SYS_ICC_PMR_EL1);
- * else
- * flags = daif_bits;
- */
asm volatile(ALTERNATIVE(
- "mov %0, %1\n"
- "nop\n"
- "nop",
- __mrs_s("%0", SYS_ICC_PMR_EL1)
- "ands %1, %1, " __stringify(PSR_I_BIT) "\n"
- "csel %0, %0, %2, eq",
- ARM64_HAS_IRQ_PRIO_MASKING)
- : "=&r" (flags), "+r" (daif_bits)
- : "r" ((unsigned long) GIC_PRIO_IRQOFF)
- : "cc", "memory");
+ "mrs %0, daif",
+ __mrs_s("%0", SYS_ICC_PMR_EL1),
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (flags)
+ :
+ : "memory");
return flags;
}
+static inline int arch_irqs_disabled_flags(unsigned long flags)
+{
+ int res;
+
+ asm volatile(ALTERNATIVE(
+ "and %w0, %w1, #" __stringify(PSR_I_BIT),
+ "eor %w0, %w1, #" __stringify(GIC_PRIO_IRQON),
+ ARM64_HAS_IRQ_PRIO_MASKING)
+ : "=&r" (res)
+ : "r" ((int) flags)
+ : "memory");
+
+ return res;
+}
+
static inline unsigned long arch_local_irq_save(void)
{
unsigned long flags;
flags = arch_local_save_flags();
- arch_local_irq_disable();
+ /*
+ * There are too many states with IRQs disabled, just keep the current
+ * state if interrupts are already disabled/masked.
+ */
+ if (!arch_irqs_disabled_flags(flags))
+ arch_local_irq_disable();
return flags;
}
@@ -124,21 +127,5 @@ static inline void arch_local_irq_restore(unsigned long flags)
: "memory");
}
-static inline int arch_irqs_disabled_flags(unsigned long flags)
-{
- int res;
-
- asm volatile(ALTERNATIVE(
- "and %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
- "nop",
- "cmp %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
- "cset %w0, ls",
- ARM64_HAS_IRQ_PRIO_MASKING)
- : "=&r" (res)
- : "r" ((int) flags)
- : "cc", "memory");
-
- return res;
-}
#endif
#endif
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 4bcd9c1291d5..33410635b015 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -608,11 +608,12 @@ static inline void kvm_arm_vhe_guest_enter(void)
* will not signal the CPU of interrupts of lower priority, and the
* only way to get out will be via guest exceptions.
* Naturally, we want to avoid this.
+ *
+ * local_daif_mask() already sets GIC_PRIO_PSR_I_SET, we just need a
+ * dsb to ensure the redistributor is forwards EL2 IRQs to the CPU.
*/
- if (system_uses_irq_prio_masking()) {
- gic_write_pmr(GIC_PRIO_IRQON);
+ if (system_uses_irq_prio_masking())
dsb(sy);
- }
}
static inline void kvm_arm_vhe_guest_exit(void)
diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
index b2de32939ada..da2242248466 100644
--- a/arch/arm64/include/asm/ptrace.h
+++ b/arch/arm64/include/asm/ptrace.h
@@ -35,9 +35,15 @@
* means masking more IRQs (or at least that the same IRQs remain masked).
*
* To mask interrupts, we clear the most significant bit of PMR.
+ *
+ * Some code sections either automatically switch back to PSR.I or explicitly
+ * require to not use priority masking. If bit GIC_PRIO_PSR_I_SET is included
+ * in the the priority mask, it indicates that PSR.I should be set and
+ * interrupt disabling temporarily does not rely on IRQ priorities.
*/
-#define GIC_PRIO_IRQON 0xf0
-#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+#define GIC_PRIO_IRQON 0xc0
+#define GIC_PRIO_IRQOFF (GIC_PRIO_IRQON & ~0x80)
+#define GIC_PRIO_PSR_I_SET (1 << 4)
/* Additional SPSR bits not exposed in the UABI */
#define PSR_IL_BIT (1 << 20)
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index 6d5966346710..165da78815c5 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -258,6 +258,7 @@ alternative_else_nop_endif
/*
* Registers that may be useful after this macro is invoked:
*
+ * x20 - ICC_PMR_EL1
* x21 - aborted SP
* x22 - aborted PC
* x23 - aborted PSTATE
@@ -449,6 +450,24 @@ alternative_endif
.endm
#endif
+ .macro gic_prio_kentry_setup, tmp:req
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ mov \tmp, #(GIC_PRIO_PSR_I_SET | GIC_PRIO_IRQON)
+ msr_s SYS_ICC_PMR_EL1, \tmp
+ alternative_else_nop_endif
+#endif
+ .endm
+
+ .macro gic_prio_irq_setup, pmr:req, tmp:req
+#ifdef CONFIG_ARM64_PSEUDO_NMI
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+ orr \tmp, \pmr, #GIC_PRIO_PSR_I_SET
+ msr_s SYS_ICC_PMR_EL1, \tmp
+ alternative_else_nop_endif
+#endif
+ .endm
+
.text
/*
@@ -627,6 +646,7 @@ el1_dbg:
cmp x24, #ESR_ELx_EC_BRK64 // if BRK64
cinc x24, x24, eq // set bit '0'
tbz x24, #0, el1_inv // EL1 only
+ gic_prio_kentry_setup tmp=x3
mrs x0, far_el1
mov x2, sp // struct pt_regs
bl do_debug_exception
@@ -644,12 +664,10 @@ ENDPROC(el1_sync)
.align 6
el1_irq:
kernel_entry 1
+ gic_prio_irq_setup pmr=x20, tmp=x1
enable_da_f
#ifdef CONFIG_ARM64_PSEUDO_NMI
-alternative_if ARM64_HAS_IRQ_PRIO_MASKING
- ldr x20, [sp, #S_PMR_SAVE]
-alternative_else_nop_endif
test_irqs_unmasked res=x0, pmr=x20
cbz x0, 1f
bl asm_nmi_enter
@@ -679,8 +697,9 @@ alternative_else_nop_endif
#ifdef CONFIG_ARM64_PSEUDO_NMI
/*
- * if IRQs were disabled when we received the interrupt, we have an NMI
- * and we are not re-enabling interrupt upon eret. Skip tracing.
+ * When using IRQ priority masking, we can get spurious interrupts while
+ * PMR is set to GIC_PRIO_IRQOFF. An NMI might also have occurred in a
+ * section with interrupts disabled. Skip tracing in those cases.
*/
test_irqs_unmasked res=x0, pmr=x20
cbz x0, 1f
@@ -809,6 +828,7 @@ el0_ia:
* Instruction abort handling
*/
mrs x26, far_el1
+ gic_prio_kentry_setup tmp=x0
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -854,6 +874,7 @@ el0_sp_pc:
* Stack or PC alignment exception handling
*/
mrs x26, far_el1
+ gic_prio_kentry_setup tmp=x0
enable_da_f
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
@@ -888,6 +909,7 @@ el0_dbg:
* Debug exception handling
*/
tbnz x24, #0, el0_inv // EL0 only
+ gic_prio_kentry_setup tmp=x3
mrs x0, far_el1
mov x1, x25
mov x2, sp
@@ -909,7 +931,9 @@ ENDPROC(el0_sync)
el0_irq:
kernel_entry 0
el0_irq_naked:
+ gic_prio_irq_setup pmr=x20, tmp=x0
enable_da_f
+
#ifdef CONFIG_TRACE_IRQFLAGS
bl trace_hardirqs_off
#endif
@@ -931,6 +955,7 @@ ENDPROC(el0_irq)
el1_error:
kernel_entry 1
mrs x1, esr_el1
+ gic_prio_kentry_setup tmp=x2
enable_dbg
mov x0, sp
bl do_serror
@@ -941,6 +966,7 @@ el0_error:
kernel_entry 0
el0_error_naked:
mrs x1, esr_el1
+ gic_prio_kentry_setup tmp=x2
enable_dbg
mov x0, sp
bl do_serror
@@ -965,6 +991,7 @@ work_pending:
*/
ret_to_user:
disable_daif
+ gic_prio_kentry_setup tmp=x3
ldr x1, [tsk, #TSK_TI_FLAGS]
and x2, x1, #_TIF_WORK_MASK
cbnz x2, work_pending
@@ -981,6 +1008,7 @@ ENDPROC(ret_to_user)
*/
.align 6
el0_svc:
+ gic_prio_kentry_setup tmp=x1
mov x0, sp
bl el0_svc_handler
b ret_to_user
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 3767fb21a5b8..58efc3727778 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -94,7 +94,7 @@ static void __cpu_do_idle_irqprio(void)
* be raised.
*/
pmr = gic_read_pmr();
- gic_write_pmr(GIC_PRIO_IRQON);
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
__cpu_do_idle();
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index bb4b3f07761a..4deaee3c2a33 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -192,11 +192,13 @@ static void init_gic_priority_masking(void)
WARN_ON(!(cpuflags & PSR_I_BIT));
- gic_write_pmr(GIC_PRIO_IRQOFF);
-
/* We can only unmask PSR.I if we can take aborts */
- if (!(cpuflags & PSR_A_BIT))
+ if (!(cpuflags & PSR_A_BIT)) {
+ gic_write_pmr(GIC_PRIO_IRQOFF);
write_sysreg(cpuflags & ~PSR_I_BIT, daif);
+ } else {
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+ }
}
/*
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8799e0c267d4..b89fcf0173b7 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -615,7 +615,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
* Naturally, we want to avoid this.
*/
if (system_uses_irq_prio_masking()) {
- gic_write_pmr(GIC_PRIO_IRQON);
+ gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
dsb(sy);
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 88dddc11a8d6b09201b4db9d255b3394d9bc9e57 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Fri, 19 Jul 2019 18:41:10 +0200
Subject: [PATCH] KVM: nVMX: do not use dangling shadow VMCS after guest reset
If a KVM guest is reset while running a nested guest, free_nested will
disable the shadow VMCS execution control in the vmcs01. However,
on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync
the VMCS12 to the shadow VMCS which has since been freed.
This causes a vmptrld of a NULL pointer on my machime, but Jan reports
the host to hang altogether. Let's see how much this trivial patch fixes.
Reported-by: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Liran Alon <liran.alon(a)oracle.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 4f23e34f628b..0f1378789bd0 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -194,6 +194,7 @@ static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx)
{
secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS);
vmcs_write64(VMCS_LINK_POINTER, -1ull);
+ vmx->nested.need_vmcs12_to_shadow_sync = false;
}
static inline void nested_release_evmcs(struct kvm_vcpu *vcpu)
@@ -1341,6 +1342,9 @@ static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx)
unsigned long val;
int i;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
preempt_disable();
vmcs_load(shadow_vmcs);
@@ -1373,6 +1377,9 @@ static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx)
unsigned long val;
int i, q;
+ if (WARN_ON(!shadow_vmcs))
+ return;
+
vmcs_load(shadow_vmcs);
for (q = 0; q < ARRAY_SIZE(fields); q++) {
@@ -4436,7 +4443,6 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu)
/* copy to memory all shadowed fields in case
they were modified */
copy_shadow_to_vmcs12(vmx);
- vmx->nested.need_vmcs12_to_shadow_sync = false;
vmx_disable_shadow_vmcs(vmx);
}
vmx->nested.posted_intr_nv = -1;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when guest read/write IA32_XSS, but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when guest read/write IA32_XSS, but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when guest read/write IA32_XSS, but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4d763b168e9c5c366b05812c7bba7662e5ea3669 Mon Sep 17 00:00:00 2001
From: Wanpeng Li <wanpengli(a)tencent.com>
Date: Thu, 20 Jun 2019 17:00:02 +0800
Subject: [PATCH] KVM: VMX: check CPUID before allowing read/write of IA32_XSS
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Raise #GP when guest read/write IA32_XSS, but the CPUID bits
say that it shouldn't exist.
Fixes: 203000993de5 (kvm: vmx: add MSR logic for XSAVES)
Reported-by: Xiaoyao Li <xiaoyao.li(a)linux.intel.com>
Reported-by: Tao Xu <tao3.xu(a)intel.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b939a688ae83..a35459ce7e29 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1732,7 +1732,10 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
&msr_info->data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
msr_info->data = vcpu->arch.ia32_xss;
break;
@@ -1962,7 +1965,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
return 1;
return vmx_set_vmx_msr(vcpu, msr_index, data);
case MSR_IA32_XSS:
- if (!vmx_xsaves_supported())
+ if (!vmx_xsaves_supported() ||
+ (!msr_info->host_initiated &&
+ !(guest_cpuid_has(vcpu, X86_FEATURE_XSAVE) &&
+ guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))))
return 1;
/*
* The only supported bit as of Skylake is bit 8, but
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From beb8d93b3e423043e079ef3dda19dad7b28467a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Fri, 19 Apr 2019 22:50:55 -0700
Subject: [PATCH] KVM: VMX: Fix handling of #MC that occurs during VM-Entry
A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertantly blocked KVM's handling of machine checks
that occur during VM-Entry.
Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognoized. As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.
If a machine-check event occurs during a VM entry, one of the following occurs:
- The machine-check event is handled as if it occurred before the VM entry:
...
- The machine-check event is handled after VM entry completes:
...
- A VM-entry failure occurs as described in Section 26.7. The basic
exit reason is 41, for "VM-entry failure due to machine-check event".
Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
Fixes: b060ca3b2e9e7 ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d903f8909d1..1b3ca0582a0c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,28 +6107,21 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
{
- u32 exit_intr_info = 0;
- u16 basic_exit_reason = (u16)vmx->exit_reason;
-
- if (!(basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
- || basic_exit_reason == EXIT_REASON_EXCEPTION_NMI))
+ if (vmx->exit_reason != EXIT_REASON_EXCEPTION_NMI)
return;
- if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- vmx->exit_intr_info = exit_intr_info;
+ vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* if exit due to PF check for async PF */
- if (is_page_fault(exit_intr_info))
+ if (is_page_fault(vmx->exit_intr_info))
vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
/* Handle machine checks before interrupts are enabled */
- if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
- is_machine_check(exit_intr_info))
+ if (is_machine_check(vmx->exit_intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- if (is_nmi(exit_intr_info)) {
+ if (is_nmi(vmx->exit_intr_info)) {
kvm_before_interrupt(&vmx->vcpu);
asm("int $2");
kvm_after_interrupt(&vmx->vcpu);
@@ -6535,6 +6528,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+ if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
+ kvm_machine_check();
+
if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
return;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From beb8d93b3e423043e079ef3dda19dad7b28467a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Fri, 19 Apr 2019 22:50:55 -0700
Subject: [PATCH] KVM: VMX: Fix handling of #MC that occurs during VM-Entry
A previous fix to prevent KVM from consuming stale VMCS state after a
failed VM-Entry inadvertantly blocked KVM's handling of machine checks
that occur during VM-Entry.
Per Intel's SDM, a #MC during VM-Entry is handled in one of three ways,
depending on when the #MC is recognoized. As it pertains to this bug
fix, the third case explicitly states EXIT_REASON_MCE_DURING_VMENTRY
is handled like any other VM-Exit during VM-Entry, i.e. sets bit 31 to
indicate the VM-Entry failed.
If a machine-check event occurs during a VM entry, one of the following occurs:
- The machine-check event is handled as if it occurred before the VM entry:
...
- The machine-check event is handled after VM entry completes:
...
- A VM-entry failure occurs as described in Section 26.7. The basic
exit reason is 41, for "VM-entry failure due to machine-check event".
Explicitly handle EXIT_REASON_MCE_DURING_VMENTRY as a one-off case in
vmx_vcpu_run() instead of binning it into vmx_complete_atomic_exit().
Doing so allows vmx_vcpu_run() to handle VMX_EXIT_REASONS_FAILED_VMENTRY
in a sane fashion and also simplifies vmx_complete_atomic_exit() since
VMCS.VM_EXIT_INTR_INFO is guaranteed to be fresh.
Fixes: b060ca3b2e9e7 ("kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 5d903f8909d1..1b3ca0582a0c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6107,28 +6107,21 @@ static void vmx_apicv_post_state_restore(struct kvm_vcpu *vcpu)
static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
{
- u32 exit_intr_info = 0;
- u16 basic_exit_reason = (u16)vmx->exit_reason;
-
- if (!(basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY
- || basic_exit_reason == EXIT_REASON_EXCEPTION_NMI))
+ if (vmx->exit_reason != EXIT_REASON_EXCEPTION_NMI)
return;
- if (!(vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
- exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
- vmx->exit_intr_info = exit_intr_info;
+ vmx->exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
/* if exit due to PF check for async PF */
- if (is_page_fault(exit_intr_info))
+ if (is_page_fault(vmx->exit_intr_info))
vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
/* Handle machine checks before interrupts are enabled */
- if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
- is_machine_check(exit_intr_info))
+ if (is_machine_check(vmx->exit_intr_info))
kvm_machine_check();
/* We need to handle NMIs before interrupts are enabled */
- if (is_nmi(exit_intr_info)) {
+ if (is_nmi(vmx->exit_intr_info)) {
kvm_before_interrupt(&vmx->vcpu);
asm("int $2");
kvm_after_interrupt(&vmx->vcpu);
@@ -6535,6 +6528,9 @@ static void vmx_vcpu_run(struct kvm_vcpu *vcpu)
vmx->idt_vectoring_info = 0;
vmx->exit_reason = vmx->fail ? 0xdead : vmcs_read32(VM_EXIT_REASON);
+ if ((u16)vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
+ kvm_machine_check();
+
if (vmx->fail || (vmx->exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY))
return;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 3b013a2972d5bc344d6eaa8f24fdfe268211e45f Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:28 -0700
Subject: [PATCH] KVM: nVMX: Always sync GUEST_BNDCFGS when it comes from
vmcs01
If L1 does not set VM_ENTRY_LOAD_BNDCFGS, then L1's BNDCFGS value must
be propagated to vmcs02 since KVM always runs with VM_ENTRY_LOAD_BNDCFGS
when MPX is supported. Because the value effectively comes from vmcs01,
vmcs02 must be updated even if vmcs12 is clean.
Fixes: 62cf9bd8118c4 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS")
Cc: stable(a)vger.kernel.org
Cc: Liran Alon <liran.alon(a)oracle.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6e82bbca2fe1..c4c0a45245b2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2228,13 +2228,9 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
set_cr4_guest_host_mask(vmx);
- if (kvm_mpx_supported()) {
- if (vmx->nested.nested_run_pending &&
- (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
- vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
- else
- vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
- }
+ if (kvm_mpx_supported() && vmx->nested.nested_run_pending &&
+ (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS))
+ vmcs_write64(GUEST_BNDCFGS, vmcs12->guest_bndcfgs);
}
/*
@@ -2266,6 +2262,9 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
kvm_set_dr(vcpu, 7, vcpu->arch.dr7);
vmcs_write64(GUEST_IA32_DEBUGCTL, vmx->nested.vmcs01_debugctl);
}
+ if (kvm_mpx_supported() && (!vmx->nested.nested_run_pending ||
+ !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_BNDCFGS)))
+ vmcs_write64(GUEST_BNDCFGS, vmx->nested.vmcs01_guest_bndcfgs);
vmx_set_rflags(vcpu, vmcs12->guest_rflags);
/* EXCEPTION_BITMAP and CR0_GUEST_HOST_MASK should basically be the
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d28f4290b53a157191ed9991ad05dffe9e8c0c89 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <sean.j.christopherson(a)intel.com>
Date: Tue, 7 May 2019 09:06:27 -0700
Subject: [PATCH] KVM: VMX: Always signal #GP on WRMSR to MSR_IA32_CR_PAT with
bad value
The behavior of WRMSR is in no way dependent on whether or not KVM
consumes the value.
Fixes: 4566654bb9be9 ("KVM: vmx: Inject #GP on invalid PAT CR")
Cc: stable(a)vger.kernel.org
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson(a)intel.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1a87a91e98dc..091610684d28 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1894,9 +1894,10 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
MSR_TYPE_W);
break;
case MSR_IA32_CR_PAT:
+ if (!kvm_pat_valid(data))
+ return 1;
+
if (vmcs_config.vmentry_ctrl & VM_ENTRY_LOAD_IA32_PAT) {
- if (!kvm_pat_valid(data))
- return 1;
vmcs_write64(GUEST_IA32_PAT, data);
vcpu->arch.pat = data;
break;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
It assigned to wrong model. So, The headphone Mic can't work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
It assigned to wrong model. So, The headphone Mic can't work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fbc571290d9f7bfe089c50f4ac4028dd98ebfe98 Mon Sep 17 00:00:00 2001
From: Kailang Yang <kailang(a)realtek.com>
Date: Mon, 15 Jul 2019 10:41:50 +0800
Subject: [PATCH] ALSA: hda/realtek - Fixed Headphone Mic can't record on Dell
platform
It assigned to wrong model. So, The headphone Mic can't work.
Fixes: 3f640970a414 ("ALSA: hda - Fix headset mic detection problem for several Dell laptops")
Signed-off-by: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index f24a757f8239..1c84c12b39b3 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -7657,9 +7657,12 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60130},
{0x17, 0x90170110},
{0x21, 0x03211020}),
- SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x14, 0x90170110},
{0x21, 0x04211020}),
+ SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
+ {0x14, 0x90170110},
+ {0x21, 0x04211030}),
SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE,
ALC295_STANDARD_PINS,
{0x17, 0x21014020},
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8e2442a5f86e1f77b86401fce274a7f622740bc4 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Date: Fri, 12 Jul 2019 15:07:09 +0900
Subject: [PATCH] kconfig: fix missing choice values in auto.conf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably a similar situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol drivers/usb/gadget/legacy/Kconfig, then touch
around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before. But, this
has been broken since long long time before, and still it is.
We should take a closer look to fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 501fdcc5e999..1134892599da 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -895,7 +895,8 @@ int conf_write(const char *name)
"# %s\n"
"#\n", str);
need_newline = false;
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
@@ -903,7 +904,7 @@ int conf_write(const char *name)
fprintf(out, "\n");
need_newline = false;
}
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1063,8 +1064,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_touch_deps())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 8dde65bc3165..017843c9a4f4 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8e2442a5f86e1f77b86401fce274a7f622740bc4 Mon Sep 17 00:00:00 2001
From: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Date: Fri, 12 Jul 2019 15:07:09 +0900
Subject: [PATCH] kconfig: fix missing choice values in auto.conf
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Since commit 00c864f8903d ("kconfig: allow all config targets to write
auto.conf if missing"), Kconfig creates include/config/auto.conf in the
defconfig stage when it is missing.
Joonas Kylmälä reported incorrect auto.conf generation under some
circumstances.
To reproduce it, apply the following diff:
| --- a/arch/arm/configs/imx_v6_v7_defconfig
| +++ b/arch/arm/configs/imx_v6_v7_defconfig
| @@ -345,14 +345,7 @@ CONFIG_USB_CONFIGFS_F_MIDI=y
| CONFIG_USB_CONFIGFS_F_HID=y
| CONFIG_USB_CONFIGFS_F_UVC=y
| CONFIG_USB_CONFIGFS_F_PRINTER=y
| -CONFIG_USB_ZERO=m
| -CONFIG_USB_AUDIO=m
| -CONFIG_USB_ETH=m
| -CONFIG_USB_G_NCM=m
| -CONFIG_USB_GADGETFS=m
| -CONFIG_USB_FUNCTIONFS=m
| -CONFIG_USB_MASS_STORAGE=m
| -CONFIG_USB_G_SERIAL=m
| +CONFIG_USB_FUNCTIONFS=y
| CONFIG_MMC=y
| CONFIG_MMC_SDHCI=y
| CONFIG_MMC_SDHCI_PLTFM=y
And then, run:
$ make ARCH=arm mrproper imx_v6_v7_defconfig
You will see CONFIG_USB_FUNCTIONFS=y is correctly contained in the
.config, but not in the auto.conf.
Please note drivers/usb/gadget/legacy/Kconfig is included from a choice
block in drivers/usb/gadget/Kconfig. So USB_FUNCTIONFS is a choice value.
This is probably a similar situation described in commit beaaddb62540
("kconfig: tests: test defconfig when two choices interact").
When sym_calc_choice() is called, the choice symbol forgets the
SYMBOL_DEF_USER unless all of its choice values are explicitly set by
the user.
The choice symbol is given just one chance to recall it because
set_all_choice_values() is called if SYMBOL_NEED_SET_CHOICE_VALUES
is set.
When sym_calc_choice() is called again, the choice symbol forgets it
forever, since SYMBOL_NEED_SET_CHOICE_VALUES is a one-time aid.
Hence, we cannot call sym_clear_all_valid() again and again.
It is crazy to repeat set and unset of internal flags. However, we
cannot simply get rid of "sym->flags &= flags | ~SYMBOL_DEF_USER;"
Doing so would re-introduce the problem solved by commit 5d09598d488f
("kconfig: fix new choices being skipped upon config update").
To work around the issue, conf_write_autoconf() stopped calling
sym_clear_all_valid().
conf_write() must be changed accordingly. Currently, it clears
SYMBOL_WRITE after the symbol is written into the .config file. This
is needed to prevent it from writing the same symbol multiple times in
case the symbol is declared in two or more locations. I added the new
flag SYMBOL_WRITTEN, to track the symbols that have been written.
Anyway, this is a cheesy workaround in order to suppress the issue
as far as defconfig is concerned.
Handling of choices is totally broken. sym_clear_all_valid() is called
every time a user touches a symbol from the GUI interface. To reproduce
it, just add a new symbol drivers/usb/gadget/legacy/Kconfig, then touch
around unrelated symbols from menuconfig. USB_FUNCTIONFS will disappear
from the .config file.
I added the Fixes tag since it is more fatal than before. But, this
has been broken since long long time before, and still it is.
We should take a closer look to fix this correctly somehow.
Fixes: 00c864f8903d ("kconfig: allow all config targets to write auto.conf if missing")
Cc: linux-stable <stable(a)vger.kernel.org> # 4.19+
Reported-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
Signed-off-by: Masahiro Yamada <yamada.masahiro(a)socionext.com>
Tested-by: Joonas Kylmälä <joonas.kylmala(a)iki.fi>
diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c
index 501fdcc5e999..1134892599da 100644
--- a/scripts/kconfig/confdata.c
+++ b/scripts/kconfig/confdata.c
@@ -895,7 +895,8 @@ int conf_write(const char *name)
"# %s\n"
"#\n", str);
need_newline = false;
- } else if (!(sym->flags & SYMBOL_CHOICE)) {
+ } else if (!(sym->flags & SYMBOL_CHOICE) &&
+ !(sym->flags & SYMBOL_WRITTEN)) {
sym_calc_value(sym);
if (!(sym->flags & SYMBOL_WRITE))
goto next;
@@ -903,7 +904,7 @@ int conf_write(const char *name)
fprintf(out, "\n");
need_newline = false;
}
- sym->flags &= ~SYMBOL_WRITE;
+ sym->flags |= SYMBOL_WRITTEN;
conf_write_symbol(out, sym, &kconfig_printer_cb, NULL);
}
@@ -1063,8 +1064,6 @@ int conf_write_autoconf(int overwrite)
if (!overwrite && is_present(autoconf_name))
return 0;
- sym_clear_all_valid();
-
conf_write_dep("include/config/auto.conf.cmd");
if (conf_touch_deps())
diff --git a/scripts/kconfig/expr.h b/scripts/kconfig/expr.h
index 8dde65bc3165..017843c9a4f4 100644
--- a/scripts/kconfig/expr.h
+++ b/scripts/kconfig/expr.h
@@ -141,6 +141,7 @@ struct symbol {
#define SYMBOL_OPTIONAL 0x0100 /* choice is optional - values can be 'n' */
#define SYMBOL_WRITE 0x0200 /* write symbol to file (KCONFIG_CONFIG) */
#define SYMBOL_CHANGED 0x0400 /* ? */
+#define SYMBOL_WRITTEN 0x0800 /* track info to avoid double-write to .config */
#define SYMBOL_NO_WRITE 0x1000 /* Symbol for internal use only; it will not be written */
#define SYMBOL_CHECKED 0x2000 /* used during dependency checking */
#define SYMBOL_WARNED 0x8000 /* warning has been issued */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d9771f5ec46c282d518b453c793635dbdc3a2a94 Mon Sep 17 00:00:00 2001
From: Xiao Ni <xni(a)redhat.com>
Date: Fri, 14 Jun 2019 15:41:05 -0700
Subject: [PATCH] raid5-cache: Need to do start() part job after adding journal
device
commit d5d885fd514f ("md: introduce new personality funciton start()")
splits the init job to two parts. The first part run() does the jobs that
do not require the md threads. The second part start() does the jobs that
require the md threads.
Now it just does run() in adding new journal device. It needs to do the
second part start() too.
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Cc: stable(a)vger.kernel.org #v4.9+
Reported-by: Michal Soltys <soltys(a)ziu.info>
Signed-off-by: Xiao Ni <xni(a)redhat.com>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b83bce2beb66..da94cbaa1a9e 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7672,7 +7672,7 @@ static int raid5_remove_disk(struct mddev *mddev, struct md_rdev *rdev)
static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
{
struct r5conf *conf = mddev->private;
- int err = -EEXIST;
+ int ret, err = -EEXIST;
int disk;
struct disk_info *p;
int first = 0;
@@ -7687,7 +7687,14 @@ static int raid5_add_disk(struct mddev *mddev, struct md_rdev *rdev)
* The array is in readonly mode if journal is missing, so no
* write requests running. We should be safe
*/
- log_init(conf, rdev, false);
+ ret = log_init(conf, rdev, false);
+ if (ret)
+ return ret;
+
+ ret = r5l_start(conf->log);
+ if (ret)
+ return ret;
+
return 0;
}
if (mddev->recovery_disabled == conf->recovery_disabled)
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2c928c93173f220955030e8440517b87ec7df92 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Fri, 21 Jun 2019 12:33:56 +0100
Subject: [PATCH] ASoC: core: Adapt for debugfs API change
Back in ff9fb72bc07705c (debugfs: return error values, not NULL) the
debugfs APIs were changed to return error pointers rather than NULL
pointers on error, breaking the error checking in ASoC. Update the
code to use IS_ERR() and log the codes that are returned as part of
the error messages.
Fixes: ff9fb72bc07705c (debugfs: return error values, not NULL)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 9138fcb15cd3..6aeba0d66ec5 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -158,9 +158,10 @@ static void soc_init_component_debugfs(struct snd_soc_component *component)
component->card->debugfs_card_root);
}
- if (!component->debugfs_root) {
+ if (IS_ERR(component->debugfs_root)) {
dev_warn(component->dev,
- "ASoC: Failed to create component debugfs directory\n");
+ "ASoC: Failed to create component debugfs directory: %ld\n",
+ PTR_ERR(component->debugfs_root));
return;
}
@@ -212,18 +213,21 @@ static void soc_init_card_debugfs(struct snd_soc_card *card)
card->debugfs_card_root = debugfs_create_dir(card->name,
snd_soc_debugfs_root);
- if (!card->debugfs_card_root) {
+ if (IS_ERR(card->debugfs_card_root)) {
dev_warn(card->dev,
- "ASoC: Failed to create card debugfs directory\n");
+ "ASoC: Failed to create card debugfs directory: %ld\n",
+ PTR_ERR(card->debugfs_card_root));
+ card->debugfs_card_root = NULL;
return;
}
card->debugfs_pop_time = debugfs_create_u32("dapm_pop_time", 0644,
card->debugfs_card_root,
&card->pop_time);
- if (!card->debugfs_pop_time)
+ if (IS_ERR(card->debugfs_pop_time))
dev_warn(card->dev,
- "ASoC: Failed to create pop time debugfs file\n");
+ "ASoC: Failed to create pop time debugfs file: %ld\n",
+ PTR_ERR(card->debugfs_pop_time));
}
static void soc_cleanup_card_debugfs(struct snd_soc_card *card)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c2c928c93173f220955030e8440517b87ec7df92 Mon Sep 17 00:00:00 2001
From: Mark Brown <broonie(a)kernel.org>
Date: Fri, 21 Jun 2019 12:33:56 +0100
Subject: [PATCH] ASoC: core: Adapt for debugfs API change
Back in ff9fb72bc07705c (debugfs: return error values, not NULL) the
debugfs APIs were changed to return error pointers rather than NULL
pointers on error, breaking the error checking in ASoC. Update the
code to use IS_ERR() and log the codes that are returned as part of
the error messages.
Fixes: ff9fb72bc07705c (debugfs: return error values, not NULL)
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 9138fcb15cd3..6aeba0d66ec5 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -158,9 +158,10 @@ static void soc_init_component_debugfs(struct snd_soc_component *component)
component->card->debugfs_card_root);
}
- if (!component->debugfs_root) {
+ if (IS_ERR(component->debugfs_root)) {
dev_warn(component->dev,
- "ASoC: Failed to create component debugfs directory\n");
+ "ASoC: Failed to create component debugfs directory: %ld\n",
+ PTR_ERR(component->debugfs_root));
return;
}
@@ -212,18 +213,21 @@ static void soc_init_card_debugfs(struct snd_soc_card *card)
card->debugfs_card_root = debugfs_create_dir(card->name,
snd_soc_debugfs_root);
- if (!card->debugfs_card_root) {
+ if (IS_ERR(card->debugfs_card_root)) {
dev_warn(card->dev,
- "ASoC: Failed to create card debugfs directory\n");
+ "ASoC: Failed to create card debugfs directory: %ld\n",
+ PTR_ERR(card->debugfs_card_root));
+ card->debugfs_card_root = NULL;
return;
}
card->debugfs_pop_time = debugfs_create_u32("dapm_pop_time", 0644,
card->debugfs_card_root,
&card->pop_time);
- if (!card->debugfs_pop_time)
+ if (IS_ERR(card->debugfs_pop_time))
dev_warn(card->dev,
- "ASoC: Failed to create pop time debugfs file\n");
+ "ASoC: Failed to create pop time debugfs file: %ld\n",
+ PTR_ERR(card->debugfs_pop_time));
}
static void soc_cleanup_card_debugfs(struct snd_soc_card *card)
Hi,
[This is an automated email]
This commit has been processed because it contains a "Fixes:" tag,
fixing commit: d03360aaf5cc pNFS: Ensure we return the error if someone kills a waiting layoutget.
The bot has tested the following trees: v5.2.1, v5.1.18, v4.19.59.
v5.2.1: Build OK!
v5.1.18: Build OK!
v4.19.59: Failed to apply! Possible dependencies:
400417b05f3e ("pNFS: Fix a typo in pnfs_update_layout")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks,
Sasha
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and journal code
btree_flush_write(), which results very rare and strange deadlock or
panic and are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with oldest journal pin is selected, then it is flushed to cache
device, the select-and-flush is a two steps operation. Between these two
steps, there are something may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for other btree node.
- The slected btree node was selected, flushed and released by mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, firstly
b->write_lock is held by mutex_lock(). If the race happens and the
memory of selected btree node is allocated to other btree node, if that
btree node's write_lock is held already, a deadlock very probably
happens here. A worse case is the memory of the selected btree node is
released, then all references to this btree node (e.g. b->write_lock)
will trigger NULL pointer deference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such race is not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0
device assembled by 3 NVMe SSDs as backing device, this race can be
observed around every 10,000 times btree_flush_write() gets called. Both
deadlock and kernel panic all happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after btree nodes
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree node
without BTREE_NODE_journal_flush flag, such race is avoided.
Once corner case should be noticed, that is btree_node_free(). It might
be called in some error handling code path. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At line 2151 and 2155, the btree node n2 and n1 are released without
mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such error handling path,
and the selected btree node has BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry again. In this case this btree node won't
be skipped, just retry until the BTREE_NODE_journal_flush bit cleared,
and free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and journal code
btree_flush_write(), which results very rare and strange deadlock or
panic and are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with oldest journal pin is selected, then it is flushed to cache
device, the select-and-flush is a two steps operation. Between these two
steps, there are something may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for other btree node.
- The slected btree node was selected, flushed and released by mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, firstly
b->write_lock is held by mutex_lock(). If the race happens and the
memory of selected btree node is allocated to other btree node, if that
btree node's write_lock is held already, a deadlock very probably
happens here. A worse case is the memory of the selected btree node is
released, then all references to this btree node (e.g. b->write_lock)
will trigger NULL pointer deference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such race is not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0
device assembled by 3 NVMe SSDs as backing device, this race can be
observed around every 10,000 times btree_flush_write() gets called. Both
deadlock and kernel panic all happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after btree nodes
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree node
without BTREE_NODE_journal_flush flag, such race is avoided.
Once corner case should be noticed, that is btree_node_free(). It might
be called in some error handling code path. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At line 2151 and 2155, the btree node n2 and n1 are released without
mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such error handling path,
and the selected btree node has BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry again. In this case this btree node won't
be skipped, just retry until the BTREE_NODE_journal_flush bit cleared,
and free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and journal code
btree_flush_write(), which results very rare and strange deadlock or
panic and are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with oldest journal pin is selected, then it is flushed to cache
device, the select-and-flush is a two steps operation. Between these two
steps, there are something may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for other btree node.
- The slected btree node was selected, flushed and released by mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, firstly
b->write_lock is held by mutex_lock(). If the race happens and the
memory of selected btree node is allocated to other btree node, if that
btree node's write_lock is held already, a deadlock very probably
happens here. A worse case is the memory of the selected btree node is
released, then all references to this btree node (e.g. b->write_lock)
will trigger NULL pointer deference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such race is not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0
device assembled by 3 NVMe SSDs as backing device, this race can be
observed around every 10,000 times btree_flush_write() gets called. Both
deadlock and kernel panic all happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after btree nodes
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree node
without BTREE_NODE_journal_flush flag, such race is avoided.
Once corner case should be noticed, that is btree_node_free(). It might
be called in some error handling code path. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At line 2151 and 2155, the btree node n2 and n1 are released without
mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such error handling path,
and the selected btree node has BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry again. In this case this btree node won't
be skipped, just retry until the BTREE_NODE_journal_flush bit cleared,
and free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and journal code
btree_flush_write(), which results very rare and strange deadlock or
panic and are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with oldest journal pin is selected, then it is flushed to cache
device, the select-and-flush is a two steps operation. Between these two
steps, there are something may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for other btree node.
- The slected btree node was selected, flushed and released by mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, firstly
b->write_lock is held by mutex_lock(). If the race happens and the
memory of selected btree node is allocated to other btree node, if that
btree node's write_lock is held already, a deadlock very probably
happens here. A worse case is the memory of the selected btree node is
released, then all references to this btree node (e.g. b->write_lock)
will trigger NULL pointer deference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such race is not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0
device assembled by 3 NVMe SSDs as backing device, this race can be
observed around every 10,000 times btree_flush_write() gets called. Both
deadlock and kernel panic all happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after btree nodes
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree node
without BTREE_NODE_journal_flush flag, such race is avoided.
Once corner case should be noticed, that is btree_node_free(). It might
be called in some error handling code path. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At line 2151 and 2155, the btree node n2 and n1 are released without
mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such error handling path,
and the selected btree node has BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry again. In this case this btree node won't
be skipped, just retry until the BTREE_NODE_journal_flush bit cleared,
and free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 5.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and journal code
btree_flush_write(), which results very rare and strange deadlock or
panic and are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with oldest journal pin is selected, then it is flushed to cache
device, the select-and-flush is a two steps operation. Between these two
steps, there are something may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for other btree node.
- The slected btree node was selected, flushed and released by mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, firstly
b->write_lock is held by mutex_lock(). If the race happens and the
memory of selected btree node is allocated to other btree node, if that
btree node's write_lock is held already, a deadlock very probably
happens here. A worse case is the memory of the selected btree node is
released, then all references to this btree node (e.g. b->write_lock)
will trigger NULL pointer deference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such race is not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0
device assembled by 3 NVMe SSDs as backing device, this race can be
observed around every 10,000 times btree_flush_write() gets called. Both
deadlock and kernel panic all happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after btree nodes
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree node
without BTREE_NODE_journal_flush flag, such race is avoided.
Once corner case should be noticed, that is btree_node_free(). It might
be called in some error handling code path. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At line 2151 and 2155, the btree node n2 and n1 are released without
mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such error handling path,
and the selected btree node has BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry again. In this case this btree node won't
be skipped, just retry until the BTREE_NODE_journal_flush bit cleared,
and free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 5.2-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 50a260e859964002dab162513a10f91ae9d3bcd3 Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:58 +0800
Subject: [PATCH] bcache: fix race in btree_flush_write()
There is a race between mca_reap(), btree_node_free() and journal code
btree_flush_write(), which results very rare and strange deadlock or
panic and are very hard to reproduce.
Let me explain how the race happens. In btree_flush_write() one btree
node with oldest journal pin is selected, then it is flushed to cache
device, the select-and-flush is a two steps operation. Between these two
steps, there are something may happen inside the race window,
- The selected btree node was reaped by mca_reap() and allocated to
other requesters for other btree node.
- The slected btree node was selected, flushed and released by mca
shrink callback bch_mca_scan().
When btree_flush_write() tries to flush the selected btree node, firstly
b->write_lock is held by mutex_lock(). If the race happens and the
memory of selected btree node is allocated to other btree node, if that
btree node's write_lock is held already, a deadlock very probably
happens here. A worse case is the memory of the selected btree node is
released, then all references to this btree node (e.g. b->write_lock)
will trigger NULL pointer deference panic.
This race was introduced in commit cafe56359144 ("bcache: A block layer
cache"), and enlarged by commit c4dc2497d50d ("bcache: fix high CPU
occupancy during journal"), which selected 128 btree nodes and flushed
them one-by-one in a quite long time period.
Such race is not easy to reproduce before. On a Lenovo SR650 server with
48 Xeon cores, and configure 1 NVMe SSD as cache device, a MD raid0
device assembled by 3 NVMe SSDs as backing device, this race can be
observed around every 10,000 times btree_flush_write() gets called. Both
deadlock and kernel panic all happened as aftermath of the race.
The idea of the fix is to add a btree flag BTREE_NODE_journal_flush. It
is set when selecting btree nodes, and cleared after btree nodes
flushed. Then when mca_reap() selects a btree node with this bit set,
this btree node will be skipped. Since mca_reap() only reaps btree node
without BTREE_NODE_journal_flush flag, such race is avoided.
Once corner case should be noticed, that is btree_node_free(). It might
be called in some error handling code path. For example the following
code piece from btree_split(),
2149 err_free2:
2150 bkey_put(b->c, &n2->key);
2151 btree_node_free(n2);
2152 rw_unlock(true, n2);
2153 err_free1:
2154 bkey_put(b->c, &n1->key);
2155 btree_node_free(n1);
2156 rw_unlock(true, n1);
At line 2151 and 2155, the btree node n2 and n1 are released without
mac_reap(), so BTREE_NODE_journal_flush also needs to be checked here.
If btree_node_free() is called directly in such error handling path,
and the selected btree node has BTREE_NODE_journal_flush bit set, just
delay for 1 us and retry again. In this case this btree node won't
be skipped, just retry until the BTREE_NODE_journal_flush bit cleared,
and free the btree node memory.
Fixes: cafe56359144 ("bcache: A block layer cache")
Signed-off-by: Coly Li <colyli(a)suse.de>
Reported-and-tested-by: kbuild test robot <lkp(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 846306c3a887..ba434d9ac720 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -35,7 +35,7 @@
#include <linux/rcupdate.h>
#include <linux/sched/clock.h>
#include <linux/rculist.h>
-
+#include <linux/delay.h>
#include <trace/events/bcache.h>
/*
@@ -659,12 +659,25 @@ static int mca_reap(struct btree *b, unsigned int min_order, bool flush)
up(&b->io_mutex);
}
+retry:
/*
* BTREE_NODE_dirty might be cleared in btree_flush_btree() by
* __bch_btree_node_write(). To avoid an extra flush, acquire
* b->write_lock before checking BTREE_NODE_dirty bit.
*/
mutex_lock(&b->write_lock);
+ /*
+ * If this btree node is selected in btree_flush_write() by journal
+ * code, delay and retry until the node is flushed by journal code
+ * and BTREE_NODE_journal_flush bit cleared by btree_flush_write().
+ */
+ if (btree_node_journal_flush(b)) {
+ pr_debug("bnode %p is flushing by journal, retry", b);
+ mutex_unlock(&b->write_lock);
+ udelay(1);
+ goto retry;
+ }
+
if (btree_node_dirty(b))
__bch_btree_node_write(b, &cl);
mutex_unlock(&b->write_lock);
@@ -1081,7 +1094,20 @@ static void btree_node_free(struct btree *b)
BUG_ON(b == b->c->root);
+retry:
mutex_lock(&b->write_lock);
+ /*
+ * If the btree node is selected and flushing in btree_flush_write(),
+ * delay and retry until the BTREE_NODE_journal_flush bit cleared,
+ * then it is safe to free the btree node here. Otherwise this btree
+ * node will be in race condition.
+ */
+ if (btree_node_journal_flush(b)) {
+ mutex_unlock(&b->write_lock);
+ pr_debug("bnode %p journal_flush set, retry", b);
+ udelay(1);
+ goto retry;
+ }
if (btree_node_dirty(b)) {
btree_complete_write(b, btree_current_write(b));
diff --git a/drivers/md/bcache/btree.h b/drivers/md/bcache/btree.h
index d1c72ef64edf..76cfd121a486 100644
--- a/drivers/md/bcache/btree.h
+++ b/drivers/md/bcache/btree.h
@@ -158,11 +158,13 @@ enum btree_flags {
BTREE_NODE_io_error,
BTREE_NODE_dirty,
BTREE_NODE_write_idx,
+ BTREE_NODE_journal_flush,
};
BTREE_FLAG(io_error);
BTREE_FLAG(dirty);
BTREE_FLAG(write_idx);
+BTREE_FLAG(journal_flush);
static inline struct btree_write *btree_current_write(struct btree *b)
{
diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index 1218e3cada3c..a1e3e1fcea6e 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -430,6 +430,7 @@ static void btree_flush_write(struct cache_set *c)
retry:
best = NULL;
+ mutex_lock(&c->bucket_lock);
for_each_cached_btree(b, c, i)
if (btree_current_write(b)->journal) {
if (!best)
@@ -442,15 +443,21 @@ static void btree_flush_write(struct cache_set *c)
}
b = best;
+ if (b)
+ set_btree_node_journal_flush(b);
+ mutex_unlock(&c->bucket_lock);
+
if (b) {
mutex_lock(&b->write_lock);
if (!btree_current_write(b)->journal) {
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
/* We raced */
goto retry;
}
__bch_btree_node_write(b, NULL);
+ clear_bit(BTREE_NODE_journal_flush, &b->flags);
mutex_unlock(&b->write_lock);
}
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
error pointer to dc->writeback_thread, then a memory leak is avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
error pointer to dc->writeback_thread, then a memory leak is avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f54d801dda14942dbefa00541d10603015b7859c Mon Sep 17 00:00:00 2001
From: Coly Li <colyli(a)suse.de>
Date: Fri, 28 Jun 2019 19:59:44 +0800
Subject: [PATCH] bcache: destroy dc->writeback_write_wq if failed to create
dc->writeback_thread
Commit 9baf30972b55 ("bcache: fix for gc and write-back race") added a
new work queue dc->writeback_write_wq, but forgot to destroy it in the
error condition when creating dc->writeback_thread failed.
This patch destroys dc->writeback_write_wq if kthread_create() returns
error pointer to dc->writeback_thread, then a memory leak is avoided.
Fixes: 9baf30972b55 ("bcache: fix for gc and write-back race")
Signed-off-by: Coly Li <colyli(a)suse.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 262f7ef20992..21081febcb59 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -833,6 +833,7 @@ int bch_cached_dev_writeback_start(struct cached_dev *dc)
"bcache_writeback");
if (IS_ERR(dc->writeback_thread)) {
cached_dev_put(dc);
+ destroy_workqueue(dc->writeback_write_wq);
return PTR_ERR(dc->writeback_thread);
}
dc->writeback_running = true;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ed527b13d800dd515a9e6c582f0a73eca65b2e1b Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Date: Fri, 31 May 2019 10:13:06 +0200
Subject: [PATCH] crypto: caam - limit output IV to CBC to work around CTR mode
DMA issue
The CAAM driver currently violates an undocumented and slightly
controversial requirement imposed by the crypto stack that a buffer
referred to by the request structure via its virtual address may not
be modified while any scatterlists passed via the same request
structure are mapped for inbound DMA.
This may result in errors like
alg: aead: decryption failed on test 1 for gcm_base(ctr-aes-caam,ghash-generic): ret=74
alg: aead: Failed to load transform for gcm(aes): -2
on non-cache coherent systems, due to the fact that the GCM driver
passes an IV buffer by virtual address which shares a cacheline with
the auth_tag buffer passed via a scatterlist, resulting in corruption
of the auth_tag when the IV is updated while the DMA mapping is live.
Since the IV that is returned to the caller is only valid for CBC mode,
and given that the in-kernel users of CBC (such as CTS) don't trigger the
same issue as the GCM driver, let's just disable the output IV generation
for all modes except CBC for the time being.
Fixes: 854b06f76879 ("crypto: caam - properly set IV after {en,de}crypt")
Cc: Horia Geanta <horia.geanta(a)nxp.com>
Cc: Iuliana Prodan <iuliana.prodan(a)nxp.com>
Reported-by: Sascha Hauer <s.hauer(a)pengutronix.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Reviewed-by: Horia Geanta <horia.geanta(a)nxp.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index 1efa6f5b62cf..4b03c967009b 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -977,6 +977,7 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
struct skcipher_request *req = context;
struct skcipher_edesc *edesc;
struct crypto_skcipher *skcipher = crypto_skcipher_reqtfm(req);
+ struct caam_ctx *ctx = crypto_skcipher_ctx(skcipher);
int ivsize = crypto_skcipher_ivsize(skcipher);
dev_dbg(jrdev, "%s %d: err 0x%x\n", __func__, __LINE__, err);
@@ -990,9 +991,9 @@ static void skcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block. This is used e.g. by the CTS mode.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->dst, req->cryptlen -
ivsize, ivsize, 0);
@@ -1836,9 +1837,9 @@ static int skcipher_decrypt(struct skcipher_request *req)
/*
* The crypto API expects us to set the IV (req->iv) to the last
- * ciphertext block.
+ * ciphertext block when running in CBC mode.
*/
- if (ivsize)
+ if ((ctx->cdata.algtype & OP_ALG_AAI_MASK) == OP_ALG_AAI_CBC)
scatterwalk_map_and_copy(req->iv, req->src, req->cryptlen -
ivsize, ivsize, 0);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in an use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in an use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in an use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 106d45f350c7cac876844dc685845cba4ffdb70b Mon Sep 17 00:00:00 2001
From: Benjamin Block <bblock(a)linux.ibm.com>
Date: Tue, 2 Jul 2019 23:02:01 +0200
Subject: [PATCH] scsi: zfcp: fix request object use-after-free in send path
causing wrong traces
When tracing instances where we open and close WKA ports, we also pass the
request-ID of the respective FSF command.
But after successfully sending the FSF command we must not use the
request-object anymore, as this might result in an use-after-free (see
"zfcp: fix request object use-after-free in send path causing seqno
errors" ).
To fix this add a new variable that caches the request-ID before sending
the request. This won't change during the hand-off to the FCP channel,
and so it's safe to trace this cached request-ID later, instead of using
the request object.
Signed-off-by: Benjamin Block <bblock(a)linux.ibm.com>
Fixes: d27a7cb91960 ("zfcp: trace on request for open and close of WKA port")
Cc: <stable(a)vger.kernel.org> #2.6.38+
Reviewed-by: Steffen Maier <maier(a)linux.ibm.com>
Reviewed-by: Jens Remus <jremus(a)linux.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index c5b2615b49ef..296bbc3c4606 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1627,6 +1627,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1649,6 +1650,8 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
hton24(req->qtcb->bottom.support.d_id, wka_port->d_id);
req->data = wka_port;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1657,7 +1660,7 @@ int zfcp_fsf_open_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fsowp_1", wka_port, req_id);
return retval;
}
@@ -1683,6 +1686,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
{
struct zfcp_qdio *qdio = wka_port->adapter->qdio;
struct zfcp_fsf_req *req;
+ unsigned long req_id = 0;
int retval = -EIO;
spin_lock_irq(&qdio->req_q_lock);
@@ -1705,6 +1709,8 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
req->data = wka_port;
req->qtcb->header.port_handle = wka_port->handle;
+ req_id = req->req_id;
+
zfcp_fsf_start_timer(req, ZFCP_FSF_REQUEST_TIMEOUT);
retval = zfcp_fsf_req_send(req);
if (retval)
@@ -1713,7 +1719,7 @@ int zfcp_fsf_close_wka_port(struct zfcp_fc_wka_port *wka_port)
out:
spin_unlock_irq(&qdio->req_q_lock);
if (!retval)
- zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req->req_id);
+ zfcp_dbf_rec_run_wka("fscwp_1", wka_port, req_id);
return retval;
}
This reverts commit 240c35a3783ab9b3a0afaba0dde7291295680a6b
("kvm: x86: Use task structs fpu field for user", 2018-11-06).
The commit is broken and causes QEMU's FPU state to be destroyed
when KVM_RUN is preempted.
Fixes: 240c35a3783a ("kvm: x86: Use task structs fpu field for user")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/include/asm/kvm_host.h | 7 ++++---
arch/x86/kvm/x86.c | 4 ++--
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0cc5b611a113..b2f1ffb937af 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -607,15 +607,16 @@ struct kvm_vcpu_arch {
/*
* QEMU userspace and the guest each have their own FPU state.
- * In vcpu_run, we switch between the user, maintained in the
- * task_struct struct, and guest FPU contexts. While running a VCPU,
- * the VCPU thread will have the guest FPU context.
+ * In vcpu_run, we switch between the user and guest FPU contexts.
+ * While running a VCPU, the VCPU thread will have the guest FPU
+ * context.
*
* Note that while the PKRU state lives inside the fpu registers,
* it is switched out separately at VMENTER and VMEXIT time. The
* "guest_fpu" state here contains the guest FPU context, with the
* host PRKU bits.
*/
+ struct fpu user_fpu;
struct fpu *guest_fpu;
u64 xcr0;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 58305cf81182..cf2afdf8facf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8270,7 +8270,7 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu)
{
fpregs_lock();
- copy_fpregs_to_fpstate(¤t->thread.fpu);
+ copy_fpregs_to_fpstate(&vcpu->arch.user_fpu);
/* PKRU is separately restored in kvm_x86_ops->run. */
__copy_kernel_to_fpregs(&vcpu->arch.guest_fpu->state,
~XFEATURE_MASK_PKRU);
@@ -8287,7 +8287,7 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
fpregs_lock();
copy_fpregs_to_fpstate(vcpu->arch.guest_fpu);
- copy_kernel_to_fpregs(¤t->thread.fpu.state);
+ copy_kernel_to_fpregs(&vcpu->arch.user_fpu.state);
fpregs_mark_activate();
fpregs_unlock();
--
1.8.3.1
The patch titled
Subject: mm/migrate.c: initialize pud_entry in migrate_vma()
has been added to the -mm tree. Its filename is
mm-migrate-initialize-pud_entry-in-migrate_vma.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-migrate-initialize-pud_entry-in…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-migrate-initialize-pud_entry-in…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Ralph Campbell <rcampbell(a)nvidia.com>
Subject: mm/migrate.c: initialize pud_entry in migrate_vma()
When CONFIG_MIGRATE_VMA_HELPER is enabled, migrate_vma() calls
migrate_vma_collect() which initializes a struct mm_walk but didn't
initialize mm_walk.pud_entry. (Found by code inspection) Use a C
structure initialization to make sure it is set to NULL.
Link: http://lkml.kernel.org/r/20190719233225.12243-1-rcampbell@nvidia.com
Fixes: 8763cb45ab967 ("mm/migrate: new memory migration helper for use with
device memory")
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
--- a/mm/migrate.c~mm-migrate-initialize-pud_entry-in-migrate_vma
+++ a/mm/migrate.c
@@ -2340,16 +2340,13 @@ next:
static void migrate_vma_collect(struct migrate_vma *migrate)
{
struct mmu_notifier_range range;
- struct mm_walk mm_walk;
-
- mm_walk.pmd_entry = migrate_vma_collect_pmd;
- mm_walk.pte_entry = NULL;
- mm_walk.pte_hole = migrate_vma_collect_hole;
- mm_walk.hugetlb_entry = NULL;
- mm_walk.test_walk = NULL;
- mm_walk.vma = migrate->vma;
- mm_walk.mm = migrate->vma->vm_mm;
- mm_walk.private = migrate;
+ struct mm_walk mm_walk = {
+ .pmd_entry = migrate_vma_collect_pmd,
+ .pte_hole = migrate_vma_collect_hole,
+ .vma = migrate->vma,
+ .mm = migrate->vma->vm_mm,
+ .private = migrate,
+ };
mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, NULL, mm_walk.mm,
migrate->start,
_
Patches currently in -mm which might be from rcampbell(a)nvidia.com are
mm-document-zone-device-struct-page-field-usage.patch
mm-hmm-fix-zone_device-anon-page-mapping-reuse.patch
mm-hmm-fix-bad-subpage-pointer-in-try_to_unmap_one.patch
mm-migrate-initialize-pud_entry-in-migrate_vma.patch
Currently handling of MADV_WILLNEED hint calls directly into readahead
code. Handle it by calling vfs_fadvise() instead so that filesystem can
use its ->fadvise() callback to acquire necessary locks or otherwise
prepare for the request.
Suggested-by: Amir Goldstein <amir73il(a)gmail.com>
CC: stable(a)vger.kernel.org # Needed by "xfs: Fix stale data exposure
when readahead races with hole punch"
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
mm/madvise.c | 22 ++++++++++++++++------
1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 628022e674a7..ae56d0ef337d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -14,6 +14,7 @@
#include <linux/userfaultfd_k.h>
#include <linux/hugetlb.h>
#include <linux/falloc.h>
+#include <linux/fadvise.h>
#include <linux/sched.h>
#include <linux/ksm.h>
#include <linux/fs.h>
@@ -275,6 +276,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
struct file *file = vma->vm_file;
+ loff_t offset;
*prev = vma;
#ifdef CONFIG_SWAP
@@ -298,12 +300,20 @@ static long madvise_willneed(struct vm_area_struct *vma,
return 0;
}
- start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
- if (end > vma->vm_end)
- end = vma->vm_end;
- end = ((end - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-
- force_page_cache_readahead(file->f_mapping, file, start, end - start);
+ /*
+ * Filesystem's fadvise may need to take various locks. We need to
+ * explicitly grab a reference because the vma (and hence the
+ * vma's reference to the file) can go away as soon as we drop
+ * mmap_sem.
+ */
+ *prev = NULL; /* tell sys_madvise we drop mmap_sem */
+ get_file(file);
+ up_read(¤t->mm->mmap_sem);
+ offset = (loff_t)(start - vma->vm_start)
+ + ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ vfs_fadvise(file, offset, end - start, POSIX_FADV_WILLNEED);
+ fput(file);
+ down_read(¤t->mm->mmap_sem);
return 0;
}
--
2.16.4
The patch titled
Subject: mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
has been added to the -mm tree. Its filename is
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-compaction-clear-total_migratef…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-compaction-clear-total_migratef…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: mm/compaction.c: clear total_{migrate,free}_scanned before scanning a new zone
total_{migrate,free}_scanned will be added to COMPACTMIGRATE_SCANNED and
COMPACTFREE_SCANNED in compact_zone(). We should clear them before
scanning a new zone. In the proc triggered compaction, we forgot clearing
them.
Link: http://lkml.kernel.org/r/1563789275-9639-1-git-send-email-laoar.shao@gmail.…
Fixes: 7f354a548d1c ("mm, compaction: add vmstats for kcompactd work")
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Yafang Shao <shaoyafang(a)didiglobal.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/compaction.c~mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone
+++ a/mm/compaction.c
@@ -2408,8 +2408,6 @@ static void compact_node(int nid)
struct zone *zone;
struct compact_control cc = {
.order = -1,
- .total_migrate_scanned = 0,
- .total_free_scanned = 0,
.mode = MIGRATE_SYNC,
.ignore_skip_hint = true,
.whole_zone = true,
@@ -2425,6 +2423,8 @@ static void compact_node(int nid)
cc.nr_freepages = 0;
cc.nr_migratepages = 0;
+ cc.total_migrate_scanned = 0;
+ cc.total_free_scanned = 0;
cc.zone = zone;
INIT_LIST_HEAD(&cc.freepages);
INIT_LIST_HEAD(&cc.migratepages);
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
mm-vmscan-expose-cgroup_ino-for-memcg-reclaim-tracepoints.patch
mm-compaction-clear-total_migratefree_scanned-before-scanning-a-new-zone.patch
The patch titled
Subject: ubsan: build ubsan.c more conservatively
has been added to the -mm tree. Its filename is
ubsan-build-ubsanc-more-conservatively.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ubsan-build-ubsanc-more-conservati…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ubsan-build-ubsanc-more-conservati…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Arnd Bergmann <arnd(a)arndb.de>
Subject: ubsan: build ubsan.c more conservatively
objtool points out several conditions that it does not like, depending on
the combination with other configuration options and compiler variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file as
we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is mixed
with the gcc specific -fno-conserve-stack option. According to Andrey
Ryabinin, that option is not even needed, dropping it here fixes the
stackprotector issue.
Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Reviewed-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel(a)linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/lib/Makefile~ubsan-build-ubsanc-more-conservatively
+++ a/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
_
Patches currently in -mm which might be from arnd(a)arndb.de are
kasan-remove-clang-version-check-for-kasan_stack.patch
ubsan-build-ubsanc-more-conservatively.patch
mm-sparse-fix-memory-leak-of-sparsemap_buf-in-aliged-memory-fix.patch
This patch series includes the following:
1. Adding compiler options to not use XMM registers in the purgatory code.
2. Reuse the implementation of memcpy and memset instead of relying on
__builtin_memcpy and __builtin_memset as it causes infinite recursion
in clang.
Nick Desaulniers (1):
x86/purgatory: do not use __builtin_memcpy and __builtin_memset.
Vaibhav Rustagi (1):
x86/purgatory: add -mno-sse, -mno-mmx, -mno-sse2 to Makefile
arch/x86/purgatory/Makefile | 4 ++++
arch/x86/purgatory/purgatory.c | 6 ++++++
arch/x86/purgatory/string.c | 23 -----------------------
3 files changed, 10 insertions(+), 23 deletions(-)
delete mode 100644 arch/x86/purgatory/string.c
--
2.22.0.510.g264f2c817a-goog
The patch titled
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
has been removed from the -mm tree. Its filename was
libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: libnvdimm/pfn: fix fsdax-mode namespace info-block zero-fields
At namespace creation time there is the potential for the "expected to be
zero" fields of a 'pfn' info-block to be filled with indeterminate data.
While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like, 'flags' and the
'padding' it potentially means that future implementations can not rely on
those fields being zero.
In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly initialized
to be guaranteed zero. Bump the minor version to indicate it is safe to
assume the 'padding' and 'flags' are zero. Otherwise, this corruption is
expected to benign since all other critical fields are explicitly
initialized.
Note The cc: stable is about spreading this new policy to as many kernels
as possible not fixing an issue in those kernels. It is not until the
change titled "libnvdimm/pfn: Stop padding pmem namespaces to section
alignment" where this improper initialization becomes a problem. So if
someone decides to backport "libnvdimm/pfn: Stop padding pmem namespaces
to section alignment" (which is not tagged for stable), make sure this
pre-requisite is flagged.
Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwil…
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Tested-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com> [ppc64]
Cc: <stable(a)vger.kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jane Chu <jane.chu(a)oracle.com>
Cc: Jeff Moyer <jmoyer(a)redhat.com>
Cc: Jérôme Glisse <jglisse(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/nvdimm/dax_devs.c | 2 +-
drivers/nvdimm/pfn.h | 1 +
drivers/nvdimm/pfn_devs.c | 18 +++++++++++++++---
3 files changed, 17 insertions(+), 4 deletions(-)
--- a/drivers/nvdimm/dax_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/dax_devs.c
@@ -118,7 +118,7 @@ int nd_dax_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!dax_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, DAX_SIG);
dev_dbg(dev, "dax: %s\n", rc == 0 ? dev_name(dax_dev) : "<none>");
--- a/drivers/nvdimm/pfn_devs.c~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn_devs.c
@@ -412,6 +412,15 @@ static int nd_pfn_clear_memmap_errors(st
return 0;
}
+/**
+ * nd_pfn_validate - read and validate info-block
+ * @nd_pfn: fsdax namespace runtime state / properties
+ * @sig: 'devdax' or 'fsdax' signature
+ *
+ * Upon return the info-block buffer contents (->pfn_sb) are
+ * indeterminate when validation fails, and a coherent info-block
+ * otherwise.
+ */
int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig)
{
u64 checksum, offset;
@@ -557,7 +566,7 @@ int nd_pfn_probe(struct device *dev, str
nvdimm_bus_unlock(&ndns->dev);
if (!pfn_dev)
return -ENOMEM;
- pfn_sb = devm_kzalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(dev, sizeof(*pfn_sb), GFP_KERNEL);
nd_pfn = to_nd_pfn(pfn_dev);
nd_pfn->pfn_sb = pfn_sb;
rc = nd_pfn_validate(nd_pfn, PFN_SIG);
@@ -693,7 +702,7 @@ static int nd_pfn_init(struct nd_pfn *nd
u64 checksum;
int rc;
- pfn_sb = devm_kzalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
+ pfn_sb = devm_kmalloc(&nd_pfn->dev, sizeof(*pfn_sb), GFP_KERNEL);
if (!pfn_sb)
return -ENOMEM;
@@ -702,11 +711,14 @@ static int nd_pfn_init(struct nd_pfn *nd
sig = DAX_SIG;
else
sig = PFN_SIG;
+
rc = nd_pfn_validate(nd_pfn, sig);
if (rc != -ENODEV)
return rc;
/* no info block, do init */;
+ memset(pfn_sb, 0, sizeof(*pfn_sb));
+
nd_region = to_nd_region(nd_pfn->dev.parent);
if (nd_region->ro) {
dev_info(&nd_pfn->dev,
@@ -759,7 +771,7 @@ static int nd_pfn_init(struct nd_pfn *nd
memcpy(pfn_sb->uuid, nd_pfn->uuid, 16);
memcpy(pfn_sb->parent_uuid, nd_dev_to_uuid(&ndns->dev), 16);
pfn_sb->version_major = cpu_to_le16(1);
- pfn_sb->version_minor = cpu_to_le16(2);
+ pfn_sb->version_minor = cpu_to_le16(3);
pfn_sb->start_pad = cpu_to_le32(start_pad);
pfn_sb->end_trunc = cpu_to_le32(end_trunc);
pfn_sb->align = cpu_to_le32(nd_pfn->align);
--- a/drivers/nvdimm/pfn.h~libnvdimm-pfn-fix-fsdax-mode-namespace-info-block-zero-fields
+++ a/drivers/nvdimm/pfn.h
@@ -28,6 +28,7 @@ struct nd_pfn_sb {
__le32 end_trunc;
/* minor-version-2 record the base alignment of the mapping */
__le32 align;
+ /* minor-version-3 guarantee the padding and flags are zero */
u8 padding[4000];
__le64 checksum;
};
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
The patch titled
Subject: resource: fix locking in find_next_iomem_res()
has been removed from the -mm tree. Its filename was
resource-fix-locking-in-find_next_iomem_res.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Nadav Amit <namit(a)vmware.com>
Subject: resource: fix locking in find_next_iomem_res()
Since resources can be removed, locking should ensure that the resource is
not removed while accessing it. However, find_next_iomem_res() does not
hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[akpm(a)linux-foundation.org: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/20190613045903.4922-2-namit@vmware.com
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <namit(a)vmware.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Dan Williams <dan.j.williams(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas(a)google.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/resource.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
--- a/kernel/resource.c~resource-fix-locking-in-find_next_iomem_res
+++ a/kernel/resource.c
@@ -326,7 +326,7 @@ EXPORT_SYMBOL(release_resource);
*
* If a resource is found, returns 0 and @*res is overwritten with the part
* of the resource that's within [@start..@end]; if none is found, returns
- * -1 or -EINVAL for other invalid parameters.
+ * -ENODEV. Returns -EINVAL for invalid parameters.
*
* This function walks the whole tree and not just first level children
* unless @first_lvl is true.
@@ -365,16 +365,16 @@ static int find_next_iomem_res(resource_
break;
}
- read_unlock(&resource_lock);
- if (!p)
- return -1;
+ if (p) {
+ /* copy data */
+ res->start = max(start, p->start);
+ res->end = min(end, p->end);
+ res->flags = p->flags;
+ res->desc = p->desc;
+ }
- /* copy data */
- res->start = max(start, p->start);
- res->end = min(end, p->end);
- res->flags = p->flags;
- res->desc = p->desc;
- return 0;
+ read_unlock(&resource_lock);
+ return p ? 0 : -ENODEV;
}
static int __walk_iomem_res_desc(resource_size_t start, resource_size_t end,
_
Patches currently in -mm which might be from namit(a)vmware.com are
Please apply commit a1078e821b605813 ("xen: let
alloc_xenballooned_pages() fail if not enough memory free")
to the stable kernel tree.
It is mitigating a security bug related to Xen (XSA-300).
Juergen
From: Brian Foster <bfoster(a)redhat.com>
commit 6958d11f77d45db80f7e22a21a74d4d5f44dc667 upstream.
We've had rather rare reports of bmap btree block corruption where
the bmap root block has a level count of zero. The root cause of the
corruption is so far unknown. We do have verifier checks to detect
this form of on-disk corruption, but this doesn't cover a memory
corruption variant of the problem. The latter is a reasonable
possibility because the root block is part of the inode fork and can
reside in-core for some time before inode extents are read.
If this occurs, it leads to a system crash such as the following:
BUG: unable to handle kernel paging request at ffffffff00000221
PF error: [normal kernel read fault]
...
RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs]
...
Call Trace:
xfs_iread_extents+0x379/0x540 [xfs]
xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs]
? xfs_attr_get+0xd1/0x120 [xfs]
? iomap_write_begin.constprop.40+0x2d0/0x2d0
xfs_file_iomap_begin+0x4c4/0x6d0 [xfs]
? __vfs_getxattr+0x53/0x70
? iomap_write_begin.constprop.40+0x2d0/0x2d0
iomap_apply+0x63/0x130
? iomap_write_begin.constprop.40+0x2d0/0x2d0
iomap_file_buffered_write+0x62/0x90
? iomap_write_begin.constprop.40+0x2d0/0x2d0
xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs]
__vfs_write+0x150/0x1b0
vfs_write+0xba/0x1c0
ksys_pwrite64+0x64/0xa0
do_syscall_64+0x5a/0x1d0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
The crash occurs because xfs_iread_extents() attempts to release an
uninitialized buffer pointer as the level == 0 value prevented the
buffer from ever being allocated or read. Change the level > 0
assert to an explicit error check in xfs_iread_extents() to avoid
crashing the kernel in the event of localized, in-core inode
corruption.
Signed-off-by: Brian Foster <bfoster(a)redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong(a)oracle.com>
[mcgrof: fixes kz#204223 ]
Signed-off-by: Luis Chamberlain <mcgrof(a)kernel.org>
---
fs/xfs/libxfs/xfs_bmap.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 3a496ffe6551..ab2465bc413a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1178,7 +1178,10 @@ xfs_iread_extents(
* Root level must use BMAP_BROOT_PTR_ADDR macro to get ptr out.
*/
level = be16_to_cpu(block->bb_level);
- ASSERT(level > 0);
+ if (unlikely(level == 0)) {
+ XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+ return -EFSCORRUPTED;
+ }
pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes);
bno = be64_to_cpu(*pp);
--
2.18.0
When a pin is active-low, logical trigger edge should be inverted to match
the same interrupt opportunity.
For example, a button pushed triggers falling edge in ACTIVE_HIGH case; in
ACTIVE_LOW case, the button pushed triggers rising edge. For user space the
IRQ requesting doesn't need to do any modification except to configuring
GPIOHANDLE_REQUEST_ACTIVE_LOW.
For example, we want to catch the event when the button is pushed. The
button on the original board drives level to be low when it is pushed, and
drives level to be high when it is released.
In user space we can do:
req.handleflags = GPIOHANDLE_REQUEST_INPUT;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
while (1) {
read(fd, &dat, sizeof(dat));
if (dat.id == GPIOEVENT_EVENT_FALLING_EDGE)
printf("button pushed\n");
}
Run the same logic on another board which the polarity of the button is
inverted; it drives level to be high when pushed, and level to be low when
released. For this inversion we add flag GPIOHANDLE_REQUEST_ACTIVE_LOW:
req.handleflags = GPIOHANDLE_REQUEST_INPUT |
GPIOHANDLE_REQUEST_ACTIVE_LOW;
req.eventflags = GPIOEVENT_REQUEST_FALLING_EDGE;
At the result, there are no any events caught when the button is pushed.
By the way, button releasing will emit a "falling" event. The timing of
"falling" catching is not expected.
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Wu <michael.wu(a)vatics.com>
---
Changes from v1:
- Correct undeclared 'IRQ_TRIGGER_RISING'
- Add an example to descibe the issue
---
drivers/gpio/gpiolib.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index e013d417a936..9c9597f929d7 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -956,9 +956,11 @@ static int lineevent_create(struct gpio_device *gdev, void __user *ip)
}
if (eflags & GPIOEVENT_REQUEST_RISING_EDGE)
- irqflags |= IRQF_TRIGGER_RISING;
+ irqflags |= test_bit(FLAG_ACTIVE_LOW, &desc->flags) ?
+ IRQF_TRIGGER_FALLING : IRQF_TRIGGER_RISING;
if (eflags & GPIOEVENT_REQUEST_FALLING_EDGE)
- irqflags |= IRQF_TRIGGER_FALLING;
+ irqflags |= test_bit(FLAG_ACTIVE_LOW, &desc->flags) ?
+ IRQF_TRIGGER_RISING : IRQF_TRIGGER_FALLING;
irqflags |= IRQF_ONESHOT;
INIT_KFIFO(le->events);
--
2.17.1
objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file
as we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc specific -fno-conserve-stack option. According
to Andrey Ryabinin, that option is not even needed, dropping it here
fixes the stackprotector issue.
Fixes: d08965a27e84 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
v2:
- drop -fno-conserve-stack
- fix the Fixes: line
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Makefile b/lib/Makefile
index 095601ce371d..29c02a924973 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
--
2.20.0
objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:
stack protector:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled
stackleak plugin:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled
kasan:
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled
The stackleak and kasan options just need to be disabled for this file
as we do for other files already. For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc specific -fno-conserve-stack option, so we need to
test them separately.
Fixes: 42440c1f9911 ("lib/ubsan: add type mismatch handler for new GCC/Clang")
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
lib/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/lib/Makefile b/lib/Makefile
index 095601ce371d..320e3b632dd3 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -279,7 +279,8 @@ obj-$(CONFIG_UCS2_STRING) += ucs2_string.o
obj-$(CONFIG_UBSAN) += ubsan.o
UBSAN_SANITIZE_ubsan.o := n
-CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack -fno-stack-protector)
+KASAN_SANITIZE_ubsan.o := n
+CFLAGS_ubsan.o := $(call cc-option, -fno-conserve-stack) $(call cc-option, -fno-stack-protector) $(DISABLE_STACKLEAK_PLUGIN)
obj-$(CONFIG_SBITMAP) += sbitmap.o
--
2.20.0
Shakeel Butt reported premature oom on kernel with
"cgroup_disable=memory" since mem_cgroup_is_root() returns false even
though memcg is actually NULL. The drop_caches is also broken.
It is because commit aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab()
calls in shrink_node()") removed the !memcg check before
!mem_cgroup_is_root(). And, surprisingly root memcg is allocated even
though memory cgroup is disabled by kernel boot parameter.
Add mem_cgroup_disabled() check to make reclaimer work as expected.
Fixes: aeed1d325d42 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Reported-by: Shakeel Butt <shakeelb(a)google.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: stable(a)vger.kernel.org 4.19+
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
---
mm/vmscan.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f8e3dcd..c10dc02 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -684,7 +684,14 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid,
unsigned long ret, freed = 0;
struct shrinker *shrinker;
- if (!mem_cgroup_is_root(memcg))
+ /*
+ * The root memcg might be allocated even though memcg is disabled
+ * via "cgroup_disable=memory" boot parameter. This could make
+ * mem_cgroup_is_root() return false, then just run memcg slab
+ * shrink, but skip global shrink. This may result in premature
+ * oom.
+ */
+ if (!mem_cgroup_disabled() && !mem_cgroup_is_root(memcg))
return shrink_slab_memcg(gfp_mask, nid, memcg, priority);
if (!down_read_trylock(&shrinker_rwsem))
--
1.8.3.1
When a ZONE_DEVICE private page is freed, the page->mapping field can be
set. If this page is reused as an anonymous page, the previous value can
prevent the page from being inserted into the CPU's anon rmap table.
For example, when migrating a pte_none() page to device memory:
migrate_vma(ops, vma, start, end, src, dst, private)
migrate_vma_collect()
src[] = MIGRATE_PFN_MIGRATE
migrate_vma_prepare()
/* no page to lock or isolate so OK */
migrate_vma_unmap()
/* no page to unmap so OK */
ops->alloc_and_copy()
/* driver allocates ZONE_DEVICE page for dst[] */
migrate_vma_pages()
migrate_vma_insert_page()
page_add_new_anon_rmap()
__page_set_anon_rmap()
/* This check sees the page's stale mapping field */
if (PageAnon(page))
return
/* page->mapping is not updated */
The result is that the migration appears to succeed but a subsequent CPU
fault will be unable to migrate the page back to system memory or worse.
Clear the page->mapping field when freeing the ZONE_DEVICE page so stale
pointer data doesn't affect future page use.
Fixes: b7a523109fb5c9d2d6dd ("mm: don't clear ->mapping in hmm_devmem_free")
Cc: stable(a)vger.kernel.org
Signed-off-by: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)mellanox.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Jan Kara <jack(a)suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
---
kernel/memremap.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/kernel/memremap.c b/kernel/memremap.c
index bea6f887adad..98d04466dcde 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -408,6 +408,30 @@ void __put_devmap_managed_page(struct page *page)
mem_cgroup_uncharge(page);
+ /*
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
+ */
+ if (is_device_private_page(page))
+ page->mapping = NULL;
+
page->pgmap->ops->page_free(page);
} else if (!count)
__put_page(page);
--
2.20.1
From: Wanpeng Li <wanpengli(a)tencent.com>
The idea before commit 240c35a37 was that we have the following FPU states:
userspace (QEMU) guest
---------------------------------------------------------------------------
processor vcpu->arch.guest_fpu
>>> KVM_RUN: kvm_load_guest_fpu
vcpu->arch.user_fpu processor
>>> preempt out
vcpu->arch.user_fpu current->thread.fpu
>>> preempt in
vcpu->arch.user_fpu processor
>>> back to userspace
>>> kvm_put_guest_fpu
processor vcpu->arch.guest_fpu
---------------------------------------------------------------------------
With the new lazy model we want to get the state back to the processor
when schedule in from current->thread.fpu.
Reported-by: Thomas Lambertz <mail(a)thomaslambertz.de>
Reported-by: anthony <antdev66(a)gmail.com>
Tested-by: anthony <antdev66(a)gmail.com>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Radim Krčmář <rkrcmar(a)redhat.com>
Cc: Thomas Lambertz <mail(a)thomaslambertz.de>
Cc: anthony <antdev66(a)gmail.com>
Cc: stable(a)vger.kernel.org
Fixes: 5f409e20b (x86/fpu: Defer FPU state load until return to userspace)
Signed-off-by: Wanpeng Li <wanpengli(a)tencent.com>
---
arch/x86/kvm/x86.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf2afdf..bdcd250 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3306,6 +3306,10 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
kvm_x86_ops->vcpu_load(vcpu, cpu);
+ fpregs_assert_state_consistent();
+ if (test_thread_flag(TIF_NEED_FPU_LOAD))
+ switch_fpu_return();
+
/* Apply any externally detected TSC adjustments (due to suspend) */
if (unlikely(vcpu->arch.tsc_offset_adjustment)) {
adjust_tsc_offset_host(vcpu, vcpu->arch.tsc_offset_adjustment);
@@ -7990,9 +7994,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
trace_kvm_entry(vcpu->vcpu_id);
guest_enter_irqoff();
- fpregs_assert_state_consistent();
- if (test_thread_flag(TIF_NEED_FPU_LOAD))
- switch_fpu_return();
+ WARN_ON_ONCE(test_thread_flag(TIF_NEED_FPU_LOAD));
if (unlikely(vcpu->arch.switch_db_regs)) {
set_debugreg(0, 7);
--
2.7.4
From: "Gautham R. Shenoy" <ego(a)linux.vnet.ibm.com>
xive_find_target_in_mask() has the following for(;;) loop which has a
bug when @first == cpumask_first(@mask) and condition 1 fails to hold
for every CPU in @mask. In this case we loop forever in the for-loop.
first = cpu;
for (;;) {
if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
return cpu;
cpu = cpumask_next(cpu, mask);
if (cpu == first) // condition 2
break;
if (cpu >= nr_cpu_ids) // condition 3
cpu = cpumask_first(mask);
}
This is because, when @first == cpumask_first(@mask), we never hit the
condition 2 (cpu == first) since prior to this check, we would have
executed "cpu = cpumask_next(cpu, mask)" which will set the value of
@cpu to a value greater than @first or to nr_cpus_ids. When this is
coupled with the fact that condition 1 is not met, we will never exit
this loop.
This was discovered by the hard-lockup detector while running LTP test
concurrently with SMT switch tests.
watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
watchdog: CPU 68 Hard LOCKUP
watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
NIP: c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
REGS: c000201fff3c7d80 TRAP: 0100 Not tainted (4.18.0-100.el8.ppc64le)
MSR: 9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE> CR: 24028424 XER: 00000000
CFAR: c0000000006f558c IRQMASK: 1
GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
NIP [c0000000006f5578] find_next_bit+0x38/0x90
LR [c000000000cba9ec] cpumask_next+0x2c/0x50
Call Trace:
[c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
[c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
[c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
[c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
[c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
[c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
[c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
[c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
[c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
[c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
[c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
[c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
[c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
[c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
[c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
[c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
[c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
Instruction dump:
78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
To fix this, move the check for condition 2 after the check for
condition 3, so that we are able to break out of the loop soon after
iterating through all the CPUs in the @mask in the problem case. Use
do..while() to achieve this.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE
interrupt controller")
Cc: <stable(a)vger.kernel.org> # 4.12+
Reported-by: Indira P. Joga <indira.priya(a)in.ibm.com>
Signed-off-by: Gautham R. Shenoy <ego(a)linux.vnet.ibm.com>
---
arch/powerpc/sysdev/xive/common.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 082c7e1..1cdb395 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -479,7 +479,7 @@ static int xive_find_target_in_mask(const struct cpumask *mask,
* Now go through the entire mask until we find a valid
* target.
*/
- for (;;) {
+ do {
/*
* We re-check online as the fallback case passes us
* an untested affinity mask
@@ -487,12 +487,11 @@ static int xive_find_target_in_mask(const struct cpumask *mask,
if (cpu_online(cpu) && xive_try_pick_target(cpu))
return cpu;
cpu = cpumask_next(cpu, mask);
- if (cpu == first)
- break;
/* Wrap around */
if (cpu >= nr_cpu_ids)
cpu = cpumask_first(mask);
- }
+ } while (cpu != first);
+
return -1;
}
--
1.9.4
From: Yingying Tang <yintang(a)codeaurora.org>
[ Upstream commit 9e7251fa38978b85108c44743e1436d48e8d0d76 ]
tx_stats will be freed and set to NULL before debugfs_sta node is
removed in station disconnetion process. So if read the debugfs_sta
node there may be NULL pointer error. Add check for tx_stats before
use it to resove this issue.
Signed-off-by: Yingying Tang <yintang(a)codeaurora.org>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/net/wireless/ath/ath10k/debugfs_sta.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/debugfs_sta.c b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
index c704ae371c4d..42931a669b02 100644
--- a/drivers/net/wireless/ath/ath10k/debugfs_sta.c
+++ b/drivers/net/wireless/ath/ath10k/debugfs_sta.c
@@ -663,6 +663,13 @@ static ssize_t ath10k_dbg_sta_dump_tx_stats(struct file *file,
mutex_lock(&ar->conf_mutex);
+ if (!arsta->tx_stats) {
+ ath10k_warn(ar, "failed to get tx stats");
+ mutex_unlock(&ar->conf_mutex);
+ kfree(buf);
+ return 0;
+ }
+
spin_lock_bh(&ar->data_lock);
for (k = 0; k < ATH10K_STATS_TYPE_MAX; k++) {
for (j = 0; j < ATH10K_COUNTER_TYPE_MAX; j++) {
--
2.20.1
Hmm. I just realized when I saw Sasha's autoselect patches flying by
that the floppy ioctl fixes didn't get marked for stable, but they
probably should be.
There's four commits:
da99466ac243 floppy: fix out-of-bounds read in copy_buffer
9b04609b7840 floppy: fix invalid pointer dereference in drive_name
5635f897ed83 floppy: fix out-of-bounds read in next_valid_format
f3554aeb9912 floppy: fix div-by-zero in setup_format_params
that look like stable material - even if I sincerely hope that the
floppy driver isn't critical for anybody.
I leave it to the stable people to decide if they care. I don't think
the hardware matters any more, but I could imagine that people still
use it for some virtual images and have a floppy device inside a VM
for that reason.
Linus
Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen
but advertise a landscape resolution and pitch, resulting in a messed
up display if we try to show anything on the efifb (because of the wrong
pitch).
This commit fixes this by adding a new DMI match table for devices which
need to have their width and height swapped.
At first I tried to use the existing table for overriding some of the
efifb parameters, but some of the affected devices have variants with
different LCD resolutions which will not work with hardcoded override
values.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1730783
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
arch/x86/kernel/sysfb_efi.c | 45 +++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/arch/x86/kernel/sysfb_efi.c b/arch/x86/kernel/sysfb_efi.c
index 8eb67a670b10..80d5b6720a87 100644
--- a/arch/x86/kernel/sysfb_efi.c
+++ b/arch/x86/kernel/sysfb_efi.c
@@ -230,9 +230,54 @@ static const struct dmi_system_id efifb_dmi_system_table[] __initconst = {
{},
};
+/*
+ * Some devices have a portrait LCD but advertise a landscape resolution (and
+ * pitch). We simply swap width and height for these devices so that we can
+ * correctly deal with some of them coming with multiple resolutions.
+ */
+static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = {
+ {
+ /*
+ * Lenovo MIIX310-10ICR, only some batches have the troublesome
+ * 800x1280 portrait screen. Luckily the portrait version has
+ * its own BIOS version, so we match on that.
+ */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "MIIX 310-10ICR"),
+ DMI_EXACT_MATCH(DMI_BIOS_VERSION, "1HCN44WW"),
+ },
+ },
+ {
+ /* Lenovo MIIX 320-10ICR with 800x1280 portrait screen */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_VERSION,
+ "Lenovo MIIX 320-10ICR"),
+ },
+ },
+ {
+ /* Lenovo D330 with 800x1280 or 1200x1920 portrait screen */
+ .matches = {
+ DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_EXACT_MATCH(DMI_PRODUCT_VERSION,
+ "Lenovo ideapad D330-10IGM"),
+ },
+ },
+ {},
+};
+
__init void sysfb_apply_efi_quirks(void)
{
if (screen_info.orig_video_isVGA != VIDEO_TYPE_EFI ||
!(screen_info.capabilities & VIDEO_CAPABILITY_SKIP_QUIRKS))
dmi_check_system(efifb_dmi_system_table);
+
+ if (screen_info.orig_video_isVGA == VIDEO_TYPE_EFI &&
+ dmi_check_system(efifb_dmi_swap_width_height)) {
+ u16 temp = screen_info.lfb_width;
+ screen_info.lfb_width = screen_info.lfb_height;
+ screen_info.lfb_height = temp;
+ screen_info.lfb_linelength = 4 * screen_info.lfb_width;
+ }
}
--
2.21.0
In Resize BAR control register, bits[8:12] represents size of BAR.
As per PCIe specification, below is encoded values in register bits
to actual BAR size table:
Bits BAR size
0 1 MB
1 2 MB
2 4 MB
3 8 MB
--
For 1 MB BAR size, BAR size bits should be set to 0 but incorrectly
these bits are set to "1f".
Latest megaraid_sas and mpt3sas adapters which support Resizable BAR
with 1 MB BAR size fails to initialize during system resume from S3 sleep.
Fix: Correctly set BAR size bits to "0" for 1MB BAR size.
CC: stable(a)vger.kernel.org # v4.16+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=203939
Fixes: d3252ace0bc652a1a244455556b6a549f969bf99 ("PCI: Restore resized BAR state on resume")
Signed-off-by: Sumit Saxena <sumit.saxena(a)broadcom.com>
---
drivers/pci/pci.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8abc843..b651f32 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1417,12 +1417,13 @@ static void pci_restore_rebar_state(struct pci_dev *pdev)
for (i = 0; i < nbars; i++, pos += 8) {
struct resource *res;
- int bar_idx, size;
+ int bar_idx, size, order;
pci_read_config_dword(pdev, pos + PCI_REBAR_CTRL, &ctrl);
bar_idx = ctrl & PCI_REBAR_CTRL_BAR_IDX;
res = pdev->resource + bar_idx;
- size = order_base_2((resource_size(res) >> 20) | 1) - 1;
+ order = order_base_2((resource_size(res) >> 20) | 1);
+ size = order ? order - 1 : 0;
ctrl &= ~PCI_REBAR_CTRL_BAR_SIZE;
ctrl |= size << PCI_REBAR_CTRL_BAR_SHIFT;
pci_write_config_dword(pdev, pos + PCI_REBAR_CTRL, ctrl);
--
1.8.3.1