The following problem exists since x2avic was enabled in the KVM:
svm_set_x2apic_msr_interception is called to enable the interception of
the x2apic msrs.
In particular it is called at the moment the guest resets its apic.
Assuming that the guest's apic was in x2apic mode, the reset will bring
it back to the xapic mode.
The svm_set_x2apic_msr_interception however has an erroneous check for
'!apic_x2apic_mode()' which prevents it from doing anything in this case.
As a result of this, all x2apic msrs are left unintercepted, and that
exposes the bare metal x2apic (if enabled) to the guest.
Oops.
Remove the erroneous '!apic_x2apic_mode()' check to fix that.
This fixes CVE-2023-5090
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
---
arch/x86/kvm/svm/svm.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9507df93f410a63..acdd0b89e4715a3 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -913,8 +913,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;
- if (!x2avic_enabled ||
- !apic_x2apic_mode(svm->vcpu.arch.apic))
+ if (!x2avic_enabled)
return;
for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
--
2.26.3
From: Catalin Marinas <catalin.marinas(a)arm.com>
Since the actual slab freeing is deferred when calling kvfree_rcu(), so
is the kmemleak_free() callback informing kmemleak of the object
deletion. From the perspective of the kvfree_rcu() caller, the object is
freed and it may remove any references to it. Since kmemleak does not
scan RCU internal data storing the pointer, it will report such objects
as leaks during the grace period.
Tell kmemleak to ignore such objects on the kvfree_call_rcu() path. Note
that the tiny RCU implementation does not have such issue since the
objects can be tracked from the rcu_ctrlblk structure.
Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com>
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://lore.kernel.org/all/F903A825-F05F-4B77-A2B5-7356282FBA2C@apple.com/
Cc: <stable(a)vger.kernel.org>
Tested-by: Christoph Paasch <cpaasch(a)apple.com>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
kernel/rcu/tree.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index cb1caefa8bd0..24423877962c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -31,6 +31,7 @@
#include <linux/bitops.h>
#include <linux/export.h>
#include <linux/completion.h>
+#include <linux/kmemleak.h>
#include <linux/moduleparam.h>
#include <linux/panic.h>
#include <linux/panic_notifier.h>
@@ -3388,6 +3389,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
success = true;
}
+ /*
+ * The kvfree_rcu() caller considers the pointer freed at this point
+ * and likely removes any references to it. Since the actual slab
+ * freeing (and kmemleak_free()) is deferred, tell kmemleak to ignore
+ * this object (no scanning or false positives reporting).
+ */
+ kmemleak_ignore(ptr);
+
// Set timer to drain after KFREE_DRAIN_JIFFIES.
if (rcu_scheduler_active == RCU_SCHEDULER_RUNNING)
schedule_delayed_monitor_work(krcp);
--
2.42.0.582.g8ccd20d70d-goog
Hi Thinh,
I can confirm your patch fixed the issue on RK3399 when I was running
on Linux 6.1.54.
I'm not on the ML for this so I'm sorry if this email causes any issue
as I'm not sure how to reply to a thread from a ML I am not on.
Best,
Da
Like the Lenovo 82TL, 82V2, 82QF and 82UG, the 82YM (Yoga 7 14ARP8)
requires an entry in the quirk list to enable the internal microphone.
The latter two received similar fixes in commit 1263cc0f414d
("ASoC: amd: yc: Fix non-functional mic on Lenovo 82QF and 82UG").
Fixes: c008323fe361 ("ASoC: amd: yc: Fix a non-functional mic on Lenovo 82SJ")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sven Frotscher <sven.frotscher(a)gmail.com>
---
v3->v4 changes:
* re-add blank line between commit message title and body
---
v2->v3 changes:
* add message title of referenced commit to commit message
* make whitespace consistent with surrounding code
* use a patch-friendly e-mail client
---
v1->v2 changes:
* add Fixes and Cc tags to commit message
* remove redundant LKML link from commit message
* fix mangled diff
---
sound/soc/amd/yc/acp6x-mach.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/sound/soc/amd/yc/acp6x-mach.c b/sound/soc/amd/yc/acp6x-mach.c
index 94e9eb8e73f2..15a864dcd7bd 100644
--- a/sound/soc/amd/yc/acp6x-mach.c
+++ b/sound/soc/amd/yc/acp6x-mach.c
@@ -241,6 +241,13 @@ static const struct dmi_system_id yc_acp_quirk_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "82V2"),
}
},
+ {
+ .driver_data = &acp6x_card,
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "82YM"),
+ }
+ },
{
.driver_data = &acp6x_card,
.matches = {
--
2.42.0
Upstream commit 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with
a soft reset on Renesas SoCs") fixes an issue with Renesas own SJA1000
CAN controller reception: the Rx buffer is only 5 messages long, so when
the bus loaded (eg. a message every 50us), overrun may easily
happen. Upon an overrun situation, due to a possible internal crosstalk
situation, the controller enters a frozen state which only can be
unlocked with a soft reset (experimentally). The solution was to offload
a call to sja1000_start() in a threaded handler. This needs to happen in
process context as this operation requires to sleep. sja1000_start()
basically enters "reset mode", performs a proper software reset and
returns back into "normal mode".
Since this fix was introduced, we no longer observe any stalls in
reception. However it was sporadically observed that the transmit path
would now freeze. Further investigation blamed the fix mentioned above,
and especially the reset operation. Reproducing the reset in a loop
helped identifying what could possibly go wrong. The sja1000 is a single
Tx queue device, which leverages the netdev helpers to process one Tx
message at a time. The logic is: the queue is stopped, the message sent
to the transceiver, once properly transmitted the controller sets a
status bit which triggers an interrupt, in the interrupt handler the
transmission status is checked and the queue woken up. Unfortunately, if
an overrun happens, we might perform the soft reset precisely between
the transmission of the buffer to the transceiver and the advent of the
transmission status bit. We would then stop the transmission operation
without re-enabling the queue, leading to all further transmissions to
be ignored.
The reset interrupt can only happen while the device is "open", and
after a reset we anyway want to resume normal operations, no matter if a
packet to transmit got dropped in the process, so we shall wake up the
queue. Restarting the device and waking-up the queue is exactly what
sja1000_set_mode(CAN_MODE_START) does. In order to be consistent about
the queue state, we must acquire a lock both in the reset handler and in
the transmit path to ensure serialization of both operations. As the
reset handler might still be called after the transmission of a frame to
the transceiver but before it actually gets transmitted, we must ensure
we don't leak the skb, so we free it (the behavior is consistent, no
matter if there was an skb on the stack or not).
Fixes: 717c6ec241b5 ("can: sja1000: Prevent overrun stalls with a soft reset on Renesas SoCs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Changes in v2:
* As Marc sugested, use netif_tx_{,un}lock() instead of our own
spin_lock.
drivers/net/can/sja1000/sja1000.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/sja1000/sja1000.c b/drivers/net/can/sja1000/sja1000.c
index ae47fc72aa96..91e3fb3eed20 100644
--- a/drivers/net/can/sja1000/sja1000.c
+++ b/drivers/net/can/sja1000/sja1000.c
@@ -297,6 +297,7 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
if (can_dropped_invalid_skb(dev, skb))
return NETDEV_TX_OK;
+ netif_tx_lock(dev);
netif_stop_queue(dev);
fi = dlc = cf->can_dlc;
@@ -335,6 +336,8 @@ static netdev_tx_t sja1000_start_xmit(struct sk_buff *skb,
sja1000_write_cmdreg(priv, cmd_reg_val);
+ netif_tx_unlock(dev);
+
return NETDEV_TX_OK;
}
@@ -396,7 +399,13 @@ static irqreturn_t sja1000_reset_interrupt(int irq, void *dev_id)
struct net_device *dev = (struct net_device *)dev_id;
netdev_dbg(dev, "performing a soft reset upon overrun\n");
- sja1000_start(dev);
+
+ netif_tx_lock(dev);
+
+ can_free_echo_skb(dev, 0);
+ sja1000_set_mode(dev, CAN_MODE_START);
+
+ netif_tx_unlock(dev);
return IRQ_HANDLED;
}
--
2.34.1
Fix four issues with resctrl selftests.
The signal handling fix became necessary after the mount/umount fixes
and the uninitialized member bug was discovered during the review.
The other two came up when I ran resctrl selftests across the server
fleet in our lab to validate the upcoming CAT test rewrite (the rewrite
is not part of this series).
These are developed and should apply cleanly at least on top the
benchmark cleanup series (might apply cleanly also w/o the benchmark
series, I didn't test).
v3:
- Add fix to uninitialized sa_flags
- Handle ksft_exit_fail_msg() in per test functions
- Make signal handler register fails to also exit
- Improve changelogs
v2:
- Include patch to move _GNU_SOURCE to Makefile to allow normal #include
placement
- Rework the signal register/unregister into patch to use helpers
- Fixed incorrect function parameter description
- Use return !!res to avoid confusing implicit boolean conversion
- Improve MBA/MBM success bound patch's changelog
- Tweak Cc: stable dependencies (make it a chain).
Ilpo Järvinen (7):
selftests/resctrl: Fix uninitialized .sa_flags
selftests/resctrl: Extend signal handler coverage to unmount on
receiving signal
selftests/resctrl: Remove duplicate feature check from CMT test
selftests/resctrl: Move _GNU_SOURCE define into Makefile
selftests/resctrl: Refactor feature check to use resource and feature
name
selftests/resctrl: Fix feature checks
selftests/resctrl: Reduce failures due to outliers in MBA/MBM tests
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cat_test.c | 8 --
tools/testing/selftests/resctrl/cmt_test.c | 3 -
tools/testing/selftests/resctrl/mba_test.c | 2 +-
tools/testing/selftests/resctrl/mbm_test.c | 2 +-
tools/testing/selftests/resctrl/resctrl.h | 7 +-
.../testing/selftests/resctrl/resctrl_tests.c | 82 ++++++++++++-------
tools/testing/selftests/resctrl/resctrl_val.c | 24 +++---
tools/testing/selftests/resctrl/resctrlfs.c | 69 ++++++----------
9 files changed, 96 insertions(+), 103 deletions(-)
--
2.30.2
During the error path, the vma iterator may not be correctly positioned
or set to the correct range. Undo the vma_prev() call by resetting to
the passed in address. Re-walking to the same range will fix the range
to the area previously passed in.
Users would notice increased cycles as vma_merge() would be called an
extra time with vma == prev, and thus would fail to merge and return.
Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
Cc: stable(a)vger.kernel.org
Cc: Jann Horn <jannh(a)google.com>
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
mm/mmap.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index b56a7f0c9f85..acb7dea49e23 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -975,7 +975,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
/* Error in anon_vma clone. */
if (err)
- return NULL;
+ goto anon_vma_fail;
if (vma_start < vma->vm_start || vma_end > vma->vm_end)
vma_expanded = true;
@@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
}
if (vma_iter_prealloc(vmi, vma))
- return NULL;
+ goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
@@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_complete(&vp, vmi, mm);
khugepaged_enter_vma(res, vm_flags);
return res;
+
+prealloc_fail:
+anon_vma_fail:
+ vma_iter_set(vmi, addr);
+ vma_iter_load(vmi);
+ return NULL;
}
/*
--
2.40.1
commit 0bdf399342c5 ("net: Avoid address overwrite in kernel_connect")
ensured that kernel_connect() will not overwrite the address parameter
in cases where BPF connect hooks perform an address rewrite. This change
replaces direct calls to sock->ops->connect() in net with kernel_connect()
to make these call safe.
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Cc: stable(a)vger.kernel.org
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: Jordan Rife <jrife(a)google.com>
---
v4->v5: Remove non-net changes.
v3->v4: Remove precondition check for addrlen.
v2->v3: Add "Fixes" tag. Check for positivity in addrlen sanity check.
v1->v2: Split up original patch into patch series. Insulate calls with
kernel_connect() instead of pushing address copy deeper into
sock->ops->connect().
net/netfilter/ipvs/ip_vs_sync.c | 4 ++--
net/rds/tcp_connect.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c
index da5af28ff57b5..6e4ed1e11a3b7 100644
--- a/net/netfilter/ipvs/ip_vs_sync.c
+++ b/net/netfilter/ipvs/ip_vs_sync.c
@@ -1505,8 +1505,8 @@ static int make_send_sock(struct netns_ipvs *ipvs, int id,
}
get_mcast_sockaddr(&mcast_addr, &salen, &ipvs->mcfg, id);
- result = sock->ops->connect(sock, (struct sockaddr *) &mcast_addr,
- salen, 0);
+ result = kernel_connect(sock, (struct sockaddr *)&mcast_addr,
+ salen, 0);
if (result < 0) {
pr_err("Error connecting to the multicast addr\n");
goto error;
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index f0c477c5d1db4..d788c6d28986f 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -173,7 +173,7 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp)
* own the socket
*/
rds_tcp_set_callbacks(sock, cp);
- ret = sock->ops->connect(sock, addr, addrlen, O_NONBLOCK);
+ ret = kernel_connect(sock, addr, addrlen, O_NONBLOCK);
rdsdebug("connect to address %pI6c returned %d\n", &conn->c_faddr, ret);
if (ret == -EINPROGRESS)
--
2.42.0.515.g380fc7ccd1-goog
Hello guys,
During the last few months, I felt a performance regression when using
read() and write() on my high-speed Nvme SSD (about 7GB/s).
To get more precise information about it I quickly developed benchmark
tool basically running read() or write() in a loop to simulate a
sequential file read or write. The tool also measures the real time
consumed by the loop. Finally, the tool can call open() with or without
O_DIRECT.
I ran the tests on EXT4 and Exfat with following settings (buffer
values have been set for best result):
- Write settings: buffer 400mb * 100
- Read settings: buffer 200mb
- Drop caches before non-direct read/write test
With this hardware:
- CPU AMD Ryzen 7600X
- RAM DDR5 5200 32GB
- SSD Kingston Fury Renegade 4TB with 4K LBA
Here are some results I got with last upstream kernels (default
config):
+------------------+----------+------------------+------------------+--
----------------+------------------+------------------+
| ~42GB | O_DIRECT | Linux 6.2.0 | Linux 6.3.0 |
Linux 6.4.0 | Linux 6.5.0 | Linux 6.5.5 |
+------------------+----------+------------------+------------------+--
----------------+------------------+------------------+
| Ext4 (sector 4k) | | | |
| | |
| Read | no | 7.2s (5800MB/s) | 7.1s (5890MB/s) |
8.3s (5050MB/s) | 13.2s (3180MB/s) | 13.2s (3180MB/s) |
| Write | no | 12.0s (3500MB/s) | 12.6s (3340MB/s) |
12.2s (3440MB/s) | 28.9s (1450MB/s) | 28.9s (1450MB/s) |
| Read | yes | 6.0s (7000MB/s) | 6.0s (7020MB/s) |
5.9s (7170MB/s) | 5.9s (7100MB/s) | 5.9s (7100MB/s) |
| Write | yes | 6.7s (6220MB/s) | 6.7s (6290MB/s) |
6.9s (6080MB/s) | 6.9s (6080MB/s) | 6.9s (6970MB/s) |
| Exfat (sector ?) | | | |
| | |
| Read | no | 7.3s (5770MB/s) | 7.2s (5830MB/s) |
9s (4620MB/s) | 13.3s (3150MB/s) | 13.2s (3180MB/s) |
| Write | no | 8.3s (5040MB/s) | 8.9s (4750MB/s) |
8.3s (5040MB/s) | 18.3s (2290MB/s) | 18.5s (2260MB/s) |
| Read | yes | 6.2s (6760MB/s) | 6.1s (6870MB/s) |
6.0s (6980MB/s) | 6.5s (6440MB/s) | 6.6s (6320MB/s) |
| Write | yes | 16.1s (2610MB/s) | 16.0s (2620MB/s) |
18.7s (2240MB/s) | 34.1s (1230MB/s) | 34.5s (1220MB/s) |
+------------------+----------+------------------+------------------+--
----------------+------------------+------------------+
Please note that I rounded some values to clarify readiness. Small
variations can be considered as margin error.
Ext4 results: cached reads/writes time have increased of almost 100%
from 6.2.0 to 6.5.0 with a first increase with 6.4.0. Direct access
times have stayed similar though.
Exfat results: performance decrease too with and without direct access
this time.
I realize there are thousands of commits between, plus the issue can
come from multiple kernel parts such as the page cache, the file system
implementation (especially for Exfat), the IO engine, a driver, etc.
The results also showed that there is not only a specific version
impacted. Anyway, at the end the performance have highly decreased.
If you want to verify my benchmark tool source code, please ask.
PS: sending again as only text body is accepted
Regards
Florent DELAHAYE