The conversion of RPMB to a character device added support for
multiple RPMB partitions (commit 97548575bef3 ("mmc: block: Convert RPMB
to a character device")).
One of the changes in this commit was transforming the variable
target_part defined in __mmc_blk_ioctl_cmd into a bitmask.
This inadvertently regressed the validation check done in
mmc_blk_part_switch_pre() and mmc_blk_part_switch_post().
This commit fixes that regression.
Fixes: 97548575bef3 ("mmc: block: Convert RPMB to a character device")
Signed-off-by: Jorge Ramirez-Ortiz <jorge(a)foundries.io>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: <stable(a)vger.kernel.org> # v4.14+
---
v2:
fixes parenthesis around condition
v3:
adds stable to commit header
v4:
fixes the stable version to v4.14
adds Reviewed-by
drivers/mmc/core/block.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 152dfe593c43..13093d26bf81 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -851,9 +851,10 @@ static const struct block_device_operations mmc_bdops = {
static int mmc_blk_part_switch_pre(struct mmc_card *card,
unsigned int part_type)
{
+ const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;
int ret = 0;
- if (part_type == EXT_CSD_PART_CONFIG_ACC_RPMB) {
+ if ((part_type & mask) == mask) {
if (card->ext_csd.cmdq_en) {
ret = mmc_cmdq_disable(card);
if (ret)
@@ -868,9 +869,10 @@ static int mmc_blk_part_switch_pre(struct mmc_card *card,
static int mmc_blk_part_switch_post(struct mmc_card *card,
unsigned int part_type)
{
+ const unsigned int mask = EXT_CSD_PART_CONFIG_ACC_RPMB;
int ret = 0;
- if (part_type == EXT_CSD_PART_CONFIG_ACC_RPMB) {
+ if ((part_type & mask) == mask) {
mmc_retune_unpause(card->host);
if (card->reenable_cmdq && !card->ext_csd.cmdq_en)
ret = mmc_cmdq_enable(card);
@@ -3143,4 +3145,3 @@ module_exit(mmc_blk_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Multimedia Card (MMC) block device driver");
-
--
2.34.1
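To illustrate the regression the mmc patch above addresses, here is a minimal user-space sketch, not the driver code itself. It assumes that ACC_RPMB stands in for EXT_CSD_PART_CONFIG_ACC_RPMB with a value of 0x3, and that EXTRA_BIT is a purely hypothetical additional bit carried in part_type once target_part became a bitmask. The helper names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for EXT_CSD_PART_CONFIG_ACC_RPMB; the value is an assumption. */
#define ACC_RPMB  0x3u
/* Hypothetical extra bit that a bitmask-style part_type might carry. */
#define EXTRA_BIT 0x8u

/* Check as written before the fix: exact equality. */
static bool is_rpmb_equality(unsigned int part_type)
{
	return part_type == ACC_RPMB;
}

/* Check as written by the fix: all RPMB access bits must be set. */
static bool is_rpmb_masked(unsigned int part_type)
{
	const unsigned int mask = ACC_RPMB;

	return (part_type & mask) == mask;
}

/* Small self-check showing where the two variants diverge. */
static void rpmb_check_demo(void)
{
	/* Both variants agree when no other bits are set. */
	assert(is_rpmb_equality(ACC_RPMB));
	assert(is_rpmb_masked(ACC_RPMB));

	/* With another bit set, only the masked check still matches. */
	assert(!is_rpmb_equality(ACC_RPMB | EXTRA_BIT));
	assert(is_rpmb_masked(ACC_RPMB | EXTRA_BIT));
}
```

Under these assumptions, the equality test silently skips the RPMB-specific pre and post handling as soon as any other bit is present, which is the behaviour the patch restores.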
Hi,
On 2023-12-01 08:31:48 +0000, Zhang, Rui wrote:
> As a quick fix, I'm not going to fix the "potential issue" described
> above because we have not seen a real problem caused by this yet.
>
> Can you please try the below patch to confirm if the problem is gone on
> your system?
> This patch falls back to the previous way as sent at
> https://lore.kernel.org/lkml/87pm4bp54z.ffs@tglx/T/
I've just spent a couple hours bisecting why upgrading to 6.7-rc4 left me with
just a single CPU core on my dual socket workstation.
before:
[ 0.000000] Linux version 6.6.0-andres-00003-g31255e072b2e ...
...
[ 0.022960] ACPI: Using ACPI (MADT) for SMP configuration information
...
[ 0.022968] smpboot: Allowing 40 CPUs, 0 hotplug CPUs
...
[ 0.345921] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
...
[ 0.347229] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 #8 #9
[ 0.349082] .... node #1, CPUs: #10 #11 #12 #13 #14 #15 #16 #17 #18 #19
[ 0.003190] smpboot: CPU 10 Converting physical 0 to logical die 1
[ 0.361053] .... node #0, CPUs: #20 #21 #22 #23 #24 #25 #26 #27 #28 #29
[ 0.363990] .... node #1, CPUs: #30 #31 #32 #33 #34 #35 #36 #37 #38 #39
...
[ 0.370886] smp: Brought up 2 nodes, 40 CPUs
[ 0.370891] smpboot: Max logical packages: 2
[ 0.370896] smpboot: Total of 40 processors activated (200000.00 BogoMIPS)
[ 0.403905] node 0 deferred pages initialised in 32ms
[ 0.408865] node 1 deferred pages initialised in 37ms
after:
[ 0.000000] Linux version 6.6.0-andres-00004-gec9aedb2aa1a ...
...
[ 0.022935] ACPI: Using ACPI (MADT) for SMP configuration information
...
[ 0.022942] smpboot: Allowing 1 CPUs, 0 hotplug CPUs
...
[ 0.356424] smpboot: CPU0: Intel(R) Xeon(R) Gold 5215 CPU @ 2.50GHz (family: 0x6, model: 0x55, stepping: 0x7)
...
[ 0.357098] smp: Bringing up secondary CPUs ...
[ 0.357107] smp: Brought up 2 nodes, 1 CPU
[ 0.357108] smpboot: Max logical packages: 1
[ 0.357110] smpboot: Total of 1 processors activated (5000.00 BogoMIPS)
[ 0.726283] node 0 deferred pages initialised in 368ms
[ 0.774704] node 1 deferred pages initialised in 418ms
There does seem to be something off with the ACPI data: when booting without
the patch, I do see messages like:
[ 0.715228] APIC: NR_CPUS/possible_cpus limit of 40 reached. Processor 40/0x7f00 ignored.
[ 0.715231] ACPI: Unable to map lapic to logical cpu number
But other than that, the system has worked for a couple years.
It's obviously not good to regress from 2x10/20 cores/threads to a single
core. I guess it's at least somewhat funny to imagine a 2 socket system with
a single core...
It seems particularly worrying that this patch has apparently been selected
for -stable:
https://lore.kernel.org/all/20231122153212.852040-2-sashal@kernel.org/
Even if it didn't have these unintended consequences, it seems like a commit
like this hardly is -stable material?
I've attached .config, dmesg of a boot with gec9aedb2aa1a and one with
gec9aedb2aa1a^.
Greetings,
Andres Freund
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Since 64 bit cmpxchg() is very expensive on 32bit architectures, the
timestamp used by the ring buffer does some interesting tricks to be able
to still have an atomic 64 bit number. It originally just used 60 bits and
broke it up into two 32 bit words where the extra 2 bits were used for
synchronization. But this was not enough for all use cases, and all 64
bits were required.
The 32bit version of the ring buffer timestamp was then broken up into 3
32bit words using the same counter trick. But one update was not done. The
check to see if the read operation was done without interruption only
checked the first two words and not the last one (like it had before this
update). Fix it by making sure all three updates happen without
interruption by comparing the initial counter with the last updated
counter.
Link: https://lore.kernel.org/linux-trace-kernel/20231206100050.3100b7bb@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: f03f2abce4f39 ("ring-buffer: Have 32 bit time stamps use all 64 bits")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index a6da2d765c78..8d2a4f00eca9 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -644,8 +644,8 @@ static inline bool __rb_time_read(rb_time_t *t, u64 *ret, unsigned long *cnt)
*cnt = rb_time_cnt(top);
- /* If top and bottom counts don't match, this interrupted a write */
- if (*cnt != rb_time_cnt(bottom))
+ /* If top and msb counts don't match, this interrupted a write */
+ if (*cnt != rb_time_cnt(msb))
return false;
/* The shift to msb will lose its cnt bits */
--
2.42.0
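The counter trick described in the ring-buffer timestamp patch above can be sketched in user space as follows. This is a simplified, single-threaded model and not the kernel implementation: the struct, the helper names and the exact bit layout (30 value bits plus a 2-bit counter per 32-bit word) are assumptions made for illustration. An interrupted writer is simulated by updating only the first words, in the order top, bottom, msb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VAL_BITS  30                      /* value bits per word (assumed) */
#define VAL_MASK  ((1u << VAL_BITS) - 1)
#define MSB_SHIFT 60                      /* bits 60..63 live in the third word */

struct split_time {
	uint32_t top;     /* bits 30..59 of the value, plus 2-bit counter */
	uint32_t bottom;  /* bits 0..29 of the value, plus 2-bit counter */
	uint32_t msb;     /* bits 60..63 of the value, plus 2-bit counter */
};

static uint32_t word_cnt(uint32_t w)
{
	return (w >> VAL_BITS) & 3u;
}

static uint32_t pack(uint32_t val, uint32_t cnt)
{
	return (val & VAL_MASK) | ((cnt & 3u) << VAL_BITS);
}

/* Write the first `words` words only, to model an interrupted update. */
static void split_time_write(struct split_time *t, uint64_t val,
			     uint32_t cnt, int words)
{
	if (words > 0)
		t->top = pack((uint32_t)(val >> VAL_BITS), cnt);
	if (words > 1)
		t->bottom = pack((uint32_t)val, cnt);
	if (words > 2)
		t->msb = pack((uint32_t)(val >> MSB_SHIFT), cnt);
}

/*
 * check_last == false compares top with bottom (the incomplete check);
 * check_last == true compares top with msb, the last word written.
 */
static bool split_time_read(const struct split_time *t, bool check_last,
			    uint64_t *ret)
{
	uint32_t cnt = word_cnt(t->top);
	uint32_t other = check_last ? t->msb : t->bottom;

	if (cnt != word_cnt(other))
		return false;

	*ret = ((uint64_t)(t->top & VAL_MASK) << VAL_BITS) |
	       (uint64_t)(t->bottom & VAL_MASK) |
	       ((uint64_t)(t->msb & VAL_MASK) << MSB_SHIFT);
	return true;
}
```

In this model, a write that stops after the second word leaves top and bottom with the new counter and msb with the old one. Comparing only top and bottom accepts the mixed value, while comparing the first and last written counters rejects it.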
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
There's a race where if an event is discarded from the ring buffer and an
interrupt were to happen at that time and insert an event, the time stamp
is still used from the discarded event as an offset. This can screw up the
timings.
If the event is going to be discarded, set the "before_stamp" to zero.
When a new event comes in, it compares the "before_stamp" with the
"write_stamp" and if they are not equal, it will insert an absolute
timestamp. This will prevent the timings from getting out of sync due to
the discarded event.
Link: https://lore.kernel.org/linux-trace-kernel/20231206100244.5130f9b3@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 6f6be606e763f ("ring-buffer: Force before_stamp and write_stamp to be different on discard")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 19 ++++++++-----------
1 file changed, 8 insertions(+), 11 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 43cc47d7faaf..a6da2d765c78 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -3030,22 +3030,19 @@ rb_try_to_discard(struct ring_buffer_per_cpu *cpu_buffer,
local_read(&bpage->write) & ~RB_WRITE_MASK;
unsigned long event_length = rb_event_length(event);
+ /*
+ * Force the before_stamp to be different than the write_stamp
+ * to make sure that the next event adds an absolute
+ * value and does not rely on the saved write stamp, which
+ * is now going to be bogus.
+ */
+ rb_time_set(&cpu_buffer->before_stamp, 0);
+
/* Something came in, can't discard */
if (!rb_time_cmpxchg(&cpu_buffer->write_stamp,
write_stamp, write_stamp - delta))
return false;
- /*
- * It's possible that the event time delta is zero
- * (has the same time stamp as the previous event)
- * in which case write_stamp and before_stamp could
- * be the same. In such a case, force before_stamp
- * to be different than write_stamp. It doesn't
- * matter what it is, as long as its different.
- */
- if (!delta)
- rb_time_set(&cpu_buffer->before_stamp, 0);
-
/*
* If an event were to come in now, it would see that the
* write_stamp and the before_stamp are different, and assume
--
2.42.0
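The before_stamp and write_stamp rule described in the discard patch above can be modelled with a small single-threaded sketch. It is not the kernel code: the struct and function names are hypothetical, and the atomic rb_time_cmpxchg() is replaced by a plain compare-and-assign, so it only shows the ordering and the resulting decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two per-CPU stamps. */
struct stamps {
	uint64_t before_stamp;
	uint64_t write_stamp;
};

/* A new event uses an absolute timestamp when the stamps differ. */
static bool needs_absolute_timestamp(const struct stamps *s)
{
	return s->before_stamp != s->write_stamp;
}

/*
 * Model of the discard path after the patch: before_stamp is zeroed
 * first and unconditionally, then write_stamp is rolled back only if
 * nothing else has updated it in the meantime.
 */
static bool try_discard(struct stamps *s, uint64_t expected_write,
			uint64_t delta)
{
	s->before_stamp = 0;

	/* Something came in, can't discard. */
	if (s->write_stamp != expected_write)
		return false;

	s->write_stamp = expected_write - delta;
	return true;
}
```

In this model, whether the discard succeeds or is beaten by another writer, the stamps differ afterwards for any non-zero write_stamp, so the next event falls back to an absolute timestamp instead of a delta against the discarded event.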
From: Petr Pavlu <petr.pavlu(a)suse.com>
Function trace_buffered_event_disable() is responsible for freeing pages
backing buffered events and this process can run concurrently with
trace_event_buffer_lock_reserve().
The following race is currently possible:
* Function trace_buffered_event_disable() is called on CPU 0. It
increments trace_buffered_event_cnt on each CPU and waits via
synchronize_rcu() for each user of trace_buffered_event to complete.
* After synchronize_rcu() is finished, function
trace_buffered_event_disable() has exclusive access to
trace_buffered_event. All counters trace_buffered_event_cnt are at 1
and all pointers trace_buffered_event are still valid.
* At this point, on a different CPU 1, the execution reaches
trace_event_buffer_lock_reserve(). The function calls
preempt_disable_notrace() and only now enters an RCU read-side
critical section. The function proceeds and reads a still valid
pointer from trace_buffered_event[CPU1] into the local variable
"entry". However, it doesn't yet read trace_buffered_event_cnt[CPU1]
which happens later.
* Function trace_buffered_event_disable() continues. It frees
trace_buffered_event[CPU1] and decrements
trace_buffered_event_cnt[CPU1] back to 0.
* Function trace_event_buffer_lock_reserve() continues. It reads and
increments trace_buffered_event_cnt[CPU1] from 0 to 1. This makes it
believe that it can use the "entry" that it already obtained but the
pointer is now invalid and any access results in a use-after-free.
Fix the problem by making a second synchronize_rcu() call after all
trace_buffered_event values are set to NULL. This waits on all potential
users in trace_event_buffer_lock_reserve() that still read a previous
pointer from trace_buffered_event.
Link: https://lore.kernel.org/all/20231127151248.7232-2-petr.pavlu@suse.com/
Link: https://lkml.kernel.org/r/20231205161736.19663-4-petr.pavlu@suse.com
Cc: stable(a)vger.kernel.org
Fixes: 0fc1b09ff1ff ("tracing: Use temp buffer when filtering events")
Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index ef72354f61ce..fbcd3bafb93e 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -2791,13 +2791,17 @@ void trace_buffered_event_disable(void)
free_page((unsigned long)per_cpu(trace_buffered_event, cpu));
per_cpu(trace_buffered_event, cpu) = NULL;
}
+
/*
- * Make sure trace_buffered_event is NULL before clearing
- * trace_buffered_event_cnt.
+ * Wait for all CPUs that potentially started checking if they can use
+ * their event buffer only after the previous synchronize_rcu() call and
+ * they still read a valid pointer from trace_buffered_event. It must be
+ * ensured they don't see cleared trace_buffered_event_cnt else they
+ * could wrongly decide to use the pointed-to buffer which is now freed.
*/
- smp_wmb();
+ synchronize_rcu();
- /* Do the work on each cpu */
+ /* For each CPU, relinquish the buffer */
on_each_cpu_mask(tracing_buffer_mask, enable_trace_buffered_event, NULL,
true);
}
--
2.42.0