KVM creates a debugfs directory for each VM in order to store statistics
about the virtual machine. The directory name is built from the process
pid and a VM fd. While generally unique, it is possible to keep a
file descriptor alive in a way that causes duplicate directories, which
manifests as these messages:
[ 471.846235] debugfs: Directory '20245-4' with parent 'kvm' already present!
Even though this should not happen in practice, it is more or less
expected in the case of KVM for testcases that call KVM_CREATE_VM and
close the resulting file descriptor repeatedly and in parallel.
When this happens, debugfs_create_dir() returns an error but
kvm_create_vm_debugfs() goes on to allocate stat data structs which are
later leaked. The slow memory leak was spotted by syzkaller, where it
caused OOM reports.
Since the issue only affects debugfs, do a lookup before calling
debugfs_create_dir, so that the message is downgraded and rate-limited.
While at it, ensure kvm->debugfs_dentry is NULL rather than an error
if it is not created. This fixes kvm_destroy_vm_debugfs, which was not
checking IS_ERR_OR_NULL correctly.
Cc: stable(a)vger.kernel.org
Fixes: 536a6f88c49d ("KVM: Create debugfs dir and stat files for each VM")
Reported-by: Alexey Kardashevskiy <aik(a)ozlabs.ru>
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
virt/kvm/kvm_main.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d20fba0fc290..b50dbe269f4b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -892,6 +892,8 @@ static void kvm_destroy_vm_debugfs(struct kvm *kvm)
static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
{
+ static DEFINE_MUTEX(kvm_debugfs_lock);
+ struct dentry *dent;
char dir_name[ITOA_MAX_LEN * 2];
struct kvm_stat_data *stat_data;
const struct _kvm_stats_desc *pdesc;
@@ -903,8 +905,20 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
return 0;
snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), fd);
- kvm->debugfs_dentry = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+ mutex_lock(&kvm_debugfs_lock);
+ dent = debugfs_lookup(dir_name, kvm_debugfs_dir);
+ if (dent) {
+ pr_warn_ratelimited("KVM: debugfs: duplicate directory %s\n", dir_name);
+ dput(dent);
+ mutex_unlock(&kvm_debugfs_lock);
+ return 0;
+ }
+ dent = debugfs_create_dir(dir_name, kvm_debugfs_dir);
+ mutex_unlock(&kvm_debugfs_lock);
+ if (IS_ERR(dent))
+ return 0;
+ kvm->debugfs_dentry = dent;
kvm->debugfs_stat_data = kcalloc(kvm_debugfs_num_entries,
sizeof(*kvm->debugfs_stat_data),
GFP_KERNEL_ACCOUNT);
@@ -5201,7 +5215,7 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm)
}
add_uevent_var(env, "PID=%d", kvm->userspace_pid);
- if (!IS_ERR_OR_NULL(kvm->debugfs_dentry)) {
+ if (kvm->debugfs_dentry) {
char *tmp, *p = kmalloc(PATH_MAX, GFP_KERNEL_ACCOUNT);
if (p) {
--
2.27.0
This is the start of the stable review cycle for the 4.19.201 release.
There are 30 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 04 Aug 2021 13:43:24 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.201-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.201-rc1
Lukasz Cieplicki <lukaszx.cieplicki(a)intel.com>
i40e: Add additional info to PHY type error
Arnaldo Carvalho de Melo <acme(a)redhat.com>
Revert "perf map: Fix dso->nsinfo refcounting"
Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
powerpc/pseries: Fix regression while building external modules
Dan Carpenter <dan.carpenter(a)oracle.com>
can: hi311x: fix a signedness bug in hi3110_cmd()
Wang Hai <wanghai38(a)huawei.com>
sis900: Fix missing pci_disable_device() in probe and remove
Wang Hai <wanghai38(a)huawei.com>
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: fix return value check in __sctp_rcv_asconf_lookup
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Fix flow table chaining
Pavel Skripkin <paskripkin(a)gmail.com>
net: llc: fix skb_over_panic
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
mlx4: Fix missing error code in mlx4_load_one()
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix sleeping in tipc accept routine
Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
i40e: Fix log TC creation failure when max num of queues is exceeded
Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com>
i40e: Fix logic of disabling queues
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_nat: allow to specify layer 4 protocol NAT only
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: adjust stop timestamp to real expiry value
Nguyen Dinh Phi <phind.uet(a)gmail.com>
cfg80211: Fix possible memory leak in function cfg80211_bss_update
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: nfcsim: fix use after free during module unload
Paul Jakma <paul(a)jakma.org>
NIU: fix incorrect error return, missed in previous revert
Pavel Skripkin <paskripkin(a)gmail.com>
can: esd_usb2: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: ems_usb: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: usb_8dev: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: mcba_usb_start(): add missing urb->transfer_dma initialization
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: issue zeroout to EOF blocks
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix zero out valid data
Juergen Gross <jgross(a)suse.com>
x86/kvm: fix vcpu-id indexed array sizes
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
btrfs: fix rw device counting in __btrfs_free_extra_devids
Jan Kiszka <jan.kiszka(a)siemens.com>
x86/asm: Ensure asm/proto.h can be included stand-alone
Eric Dumazet <edumazet(a)google.com>
gro: ensure frag0 meets IP header alignment
Eric Dumazet <edumazet(a)google.com>
virtio_net: Do not pull payload in skb->head
-------------
Diffstat:
Makefile | 4 +-
arch/powerpc/platforms/pseries/setup.c | 2 +-
arch/x86/include/asm/proto.h | 2 +
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/ioapic.h | 4 +-
drivers/net/can/spi/hi311x.c | 2 +-
drivers/net/can/usb/ems_usb.c | 14 ++-
drivers/net/can/usb/esd_usb2.c | 16 +++-
drivers/net/can/usb/mcba_usb.c | 2 +
drivers/net/can/usb/usb_8dev.c | 15 +++-
drivers/net/ethernet/dec/tulip/winbond-840.c | 7 +-
drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 60 ++++++++-----
drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 ++-
drivers/net/ethernet/sis/sis900.c | 7 +-
drivers/net/ethernet/sun/niu.c | 3 +-
drivers/net/virtio_net.c | 10 ++-
drivers/nfc/nfcsim.c | 3 +-
fs/btrfs/volumes.c | 1 +
fs/ocfs2/file.c | 103 +++++++++++++---------
include/linux/skbuff.h | 9 ++
include/linux/virtio_net.h | 14 +--
include/net/llc_pdu.h | 31 +++++--
net/can/raw.c | 20 ++++-
net/core/dev.c | 3 +-
net/llc/af_llc.c | 10 ++-
net/llc/llc_s_ac.c | 2 +-
net/netfilter/nf_conntrack_core.c | 7 +-
net/netfilter/nft_nat.c | 4 +-
net/sctp/input.c | 2 +-
net/tipc/socket.c | 9 +-
net/wireless/scan.c | 6 +-
tools/perf/util/map.c | 2 -
34 files changed, 260 insertions(+), 129 deletions(-)
From: Axel Lin <axel.lin(a)ingics.com>
[ Upstream commit 6549c46af8551b346bcc0b9043f93848319acd5c ]
For linear regulators, the n_voltages should be (max - min) / step + 1.
Buck voltage from 1v to 3V, per step 100mV, and vout mask is 0x1f.
If value is from 20 to 31, the voltage will all be fixed to 3V.
And LDO also, just vout range is different from 1.2v to 3v, step is the
same. If value is from 18 to 31, the voltage will also be fixed to 3v.
Signed-off-by: Axel Lin <axel.lin(a)ingics.com>
Reviewed-by: ChiYuan Huang <cy_huang(a)richtek.com>
Link: https://lore.kernel.org/r/20210627080418.1718127-1-axel.lin@ingics.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/linux/mfd/rt5033-private.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/mfd/rt5033-private.h b/include/linux/mfd/rt5033-private.h
index 1b63fc2f42d1..52d53d134f72 100644
--- a/include/linux/mfd/rt5033-private.h
+++ b/include/linux/mfd/rt5033-private.h
@@ -203,13 +203,13 @@ enum rt5033_reg {
#define RT5033_REGULATOR_BUCK_VOLTAGE_MIN 1000000U
#define RT5033_REGULATOR_BUCK_VOLTAGE_MAX 3000000U
#define RT5033_REGULATOR_BUCK_VOLTAGE_STEP 100000U
-#define RT5033_REGULATOR_BUCK_VOLTAGE_STEP_NUM 32
+#define RT5033_REGULATOR_BUCK_VOLTAGE_STEP_NUM 21
/* RT5033 regulator LDO output voltage uV */
#define RT5033_REGULATOR_LDO_VOLTAGE_MIN 1200000U
#define RT5033_REGULATOR_LDO_VOLTAGE_MAX 3000000U
#define RT5033_REGULATOR_LDO_VOLTAGE_STEP 100000U
-#define RT5033_REGULATOR_LDO_VOLTAGE_STEP_NUM 32
+#define RT5033_REGULATOR_LDO_VOLTAGE_STEP_NUM 19
/* RT5033 regulator SAFE LDO output voltage uV */
#define RT5033_REGULATOR_SAFE_LDO_VOLTAGE 4900000U
--
2.30.2
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 476d98018f32e68e7c5d4e8456940cf2b6d66f10 Mon Sep 17 00:00:00 2001
From: John Fastabend <john.fastabend(a)gmail.com>
Date: Tue, 27 Jul 2021 09:04:59 -0700
Subject: [PATCH] bpf, sockmap: On cleanup we additionally need to remove
cached skb
Its possible if a socket is closed and the receive thread is under memory
pressure it may have cached a skb. We need to ensure these skbs are
free'd along with the normal ingress_skb queue.
Before 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") tear
down and backlog processing both had sock_lock for the common case of
socket close or unhash. So it was not possible to have both running in
parrallel so all we would need is the kfree in those kernels.
But, latest kernels include the commit 799aa7f98d5e and this requires a
bit more work. Without the ingress_lock guarding reading/writing the
state->skb case its possible the tear down could run before the state
update causing it to leak memory or worse when the backlog reads the state
it could potentially run interleaved with the tear down and we might end up
free'ing the state->skb from tear down side but already have the reference
from backlog side. To resolve such races we wrap accesses in ingress_lock
on both sides serializing tear down and backlog case. In both cases this
only happens after an EAGAIN error case so having an extra lock in place
is likely fine. The normal path will skip the locks.
Note, we check state->skb before grabbing lock. This works because
we can only enqueue with the mutex we hold already. Avoiding a race
on adding state->skb after the check. And if tear down path is running
that is also fine if the tear down path then removes state->skb we
will simply set skb=NULL and the subsequent goto is skipped. This
slight complication avoids locking in normal case.
With this fix we no longer see this warning splat from tcp side on
socket close when we hit the above case with redirect to ingress self.
[224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
[224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
[224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G I 5.14.0-rc1alu+ #181
[224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
[224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
[224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
[224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
[224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
[224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
[224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
[224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
[224913.935974] FS: 00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
[224913.935981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
[224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[224913.936007] Call Trace:
[224913.936016] inet_csk_destroy_sock+0xba/0x1f0
[224913.936033] __tcp_close+0x620/0x790
[224913.936047] tcp_close+0x20/0x80
[224913.936056] inet_release+0x8f/0xf0
[224913.936070] __sock_release+0x72/0x120
[224913.936083] sock_close+0x14/0x20
Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down")
Signed-off-by: John Fastabend <john.fastabend(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Acked-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.c…
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 28115ef742e8..036cdb33a94a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -590,23 +590,42 @@ static void sock_drop(struct sock *sk, struct sk_buff *skb)
kfree_skb(skb);
}
+static void sk_psock_skb_state(struct sk_psock *psock,
+ struct sk_psock_work_state *state,
+ struct sk_buff *skb,
+ int len, int off)
+{
+ spin_lock_bh(&psock->ingress_lock);
+ if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
+ state->skb = skb;
+ state->len = len;
+ state->off = off;
+ } else {
+ sock_drop(psock->sk, skb);
+ }
+ spin_unlock_bh(&psock->ingress_lock);
+}
+
static void sk_psock_backlog(struct work_struct *work)
{
struct sk_psock *psock = container_of(work, struct sk_psock, work);
struct sk_psock_work_state *state = &psock->work_state;
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
bool ingress;
u32 len, off;
int ret;
mutex_lock(&psock->work_mutex);
- if (state->skb) {
+ if (unlikely(state->skb)) {
+ spin_lock_bh(&psock->ingress_lock);
skb = state->skb;
len = state->len;
off = state->off;
state->skb = NULL;
- goto start;
+ spin_unlock_bh(&psock->ingress_lock);
}
+ if (skb)
+ goto start;
while ((skb = skb_dequeue(&psock->ingress_skb))) {
len = skb->len;
@@ -621,9 +640,8 @@ static void sk_psock_backlog(struct work_struct *work)
len, ingress);
if (ret <= 0) {
if (ret == -EAGAIN) {
- state->skb = skb;
- state->len = len;
- state->off = off;
+ sk_psock_skb_state(psock, state, skb,
+ len, off);
goto end;
}
/* Hard errors break pipe and stop xmit. */
@@ -722,6 +740,11 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
skb_bpf_redirect_clear(skb);
sock_drop(psock->sk, skb);
}
+ kfree_skb(psock->work_state.skb);
+ /* We null the skb here to ensure that calls to sk_psock_backlog
+ * do not pick up the free'd skb.
+ */
+ psock->work_state.skb = NULL;
__sk_psock_purge_ingress_msg(psock);
}
From: Bob Peterson <rpeterso(a)redhat.com>
Before this patch, functions save_callbacks and restore_callbacks
called function lock_sock and release_sock to prevent other processes
from messing with the struct sock while the callbacks were saved and
restored. However, function add_sock calls write_lock_bh prior to
calling it save_callbacks, which disables preempts. So the call to
lock_sock would try to schedule when we can't schedule.
Cc: stable(a)vger.kernel.org # 4.9.x
Fixes: b81171cb6869 ("DLM: Save and restore socket callbacks properly")
Signed-off-by: Bob Peterson <rpeterso(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
fs/dlm/lowcomms.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 0d8aaf9c61be..9a9fc5ce1166 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -519,24 +519,20 @@ static void lowcomms_error_report(struct sock *sk)
/* Note: sk_callback_lock must be locked before calling this function. */
static void save_callbacks(struct connection *con, struct sock *sk)
{
- lock_sock(sk);
con->orig_data_ready = sk->sk_data_ready;
con->orig_state_change = sk->sk_state_change;
con->orig_write_space = sk->sk_write_space;
con->orig_error_report = sk->sk_error_report;
- release_sock(sk);
}
static void restore_callbacks(struct connection *con, struct sock *sk)
{
write_lock_bh(&sk->sk_callback_lock);
- lock_sock(sk);
sk->sk_user_data = NULL;
sk->sk_data_ready = con->orig_data_ready;
sk->sk_state_change = con->orig_state_change;
sk->sk_write_space = con->orig_write_space;
sk->sk_error_report = con->orig_error_report;
- release_sock(sk);
write_unlock_bh(&sk->sk_callback_lock);
}
--
2.27.0
There is a race in ceph_put_snap_realm. The change to the nref and the
spinlock acquisition are not done atomically, so you could decrement nref,
and before you take the spinlock, the nref is incremented again. At that
point, you end up putting it on the empty list when it shouldn't be
there. Eventually __cleanup_empty_realms runs and frees it when it's
still in-use.
Fix this by protecting the 1->0 transition with atomic_dec_and_lock, and
just drop the spinlock if we can get the rwsem.
Because these objects can also undergo a 0->1 refcount transition, we
must protect that change as well with the spinlock. Increment locklessly
unless the value is at 0, in which case we take the spinlock, increment
and then take it off the empty list if it did the 0->1 transition.
With these changes, I'm removing the dout() messages from these
functions, as well as in __put_snap_realm. They've always been racy, and
it's better to not print values that may be misleading.
Cc: stable(a)vger.kernel.org
Cc: Sage Weil <sage(a)redhat.com>
Reported-by: Mark Nelson <mnelson(a)redhat.com>
URL: https://tracker.ceph.com/issues/46419
Signed-off-by: Jeff Layton <jlayton(a)kernel.org>
---
fs/ceph/snap.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)
v2: No functional changes, but I cleaned up the comments a bit and
added another in __put_snap_realm.
diff --git a/fs/ceph/snap.c b/fs/ceph/snap.c
index 9dbc92cfda38..158c11e96fb7 100644
--- a/fs/ceph/snap.c
+++ b/fs/ceph/snap.c
@@ -67,19 +67,19 @@ void ceph_get_snap_realm(struct ceph_mds_client *mdsc,
{
lockdep_assert_held(&mdsc->snap_rwsem);
- dout("get_realm %p %d -> %d\n", realm,
- atomic_read(&realm->nref), atomic_read(&realm->nref)+1);
/*
- * since we _only_ increment realm refs or empty the empty
- * list with snap_rwsem held, adjusting the empty list here is
- * safe. we do need to protect against concurrent empty list
- * additions, however.
+ * The 0->1 and 1->0 transitions must take the snap_empty_lock
+ * atomically with the refcount change. Go ahead and bump the
+ * nref here, unless it's 0, in which case we take the spinlock
+ * and then do the increment and remove it from the list.
*/
- if (atomic_inc_return(&realm->nref) == 1) {
- spin_lock(&mdsc->snap_empty_lock);
+ if (atomic_add_unless(&realm->nref, 1, 0))
+ return;
+
+ spin_lock(&mdsc->snap_empty_lock);
+ if (atomic_inc_return(&realm->nref) == 1)
list_del_init(&realm->empty_item);
- spin_unlock(&mdsc->snap_empty_lock);
- }
+ spin_unlock(&mdsc->snap_empty_lock);
}
static void __insert_snap_realm(struct rb_root *root,
@@ -208,28 +208,28 @@ static void __put_snap_realm(struct ceph_mds_client *mdsc,
{
lockdep_assert_held_write(&mdsc->snap_rwsem);
- dout("__put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
- atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
+ /*
+ * We do not require the snap_empty_lock here, as any caller that
+ * increments the value must hold the snap_rwsem.
+ */
if (atomic_dec_and_test(&realm->nref))
__destroy_snap_realm(mdsc, realm);
}
/*
- * caller needn't hold any locks
+ * See comments in ceph_get_snap_realm. Caller needn't hold any locks.
*/
void ceph_put_snap_realm(struct ceph_mds_client *mdsc,
struct ceph_snap_realm *realm)
{
- dout("put_snap_realm %llx %p %d -> %d\n", realm->ino, realm,
- atomic_read(&realm->nref), atomic_read(&realm->nref)-1);
- if (!atomic_dec_and_test(&realm->nref))
+ if (!atomic_dec_and_lock(&realm->nref, &mdsc->snap_empty_lock))
return;
if (down_write_trylock(&mdsc->snap_rwsem)) {
+ spin_unlock(&mdsc->snap_empty_lock);
__destroy_snap_realm(mdsc, realm);
up_write(&mdsc->snap_rwsem);
} else {
- spin_lock(&mdsc->snap_empty_lock);
list_add(&realm->empty_item, &mdsc->snap_empty);
spin_unlock(&mdsc->snap_empty_lock);
}
--
2.31.1
From: Sudeep Holla <sudeep.holla(a)arm.com>
[ Upstream commit 5e469dac326555d2038d199a6329458cc82a34e5 ]
The bus probe callback calls the driver callback without further
checking. Better be safe than sorry and refuse registration of a driver
without a probe function to prevent a NULL pointer exception.
Cc: <stable(a)vger.kernel.org> # v4.19+
Link: https://lore.kernel.org/r/20210624095059.4010157-2-sudeep.holla@arm.com
Fixes: 933c504424a2 ("firmware: arm_scmi: add scmi protocol bus to enumerate protocol devices")
Reported-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Tested-by: Cristian Marussi <cristian.marussi(a)arm.com>
Reviewed-by: Cristian Marussi <cristian.marussi(a)arm.com>
Acked-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Signed-off-by: Sudeep Holla <sudeep.holla(a)arm.com>
---
Upstream commit 5e469dac326555d2038d199a6329458cc82a34e5 has been already
applied to stable/linux-5.13.y, this is a backport with conflicts resolved
for v4.19, v5.4 and v5.10. (SCMI stack did not even exist before 4.19)
---
drivers/firmware/arm_scmi/bus.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/firmware/arm_scmi/bus.c b/drivers/firmware/arm_scmi/bus.c
index 7a30952b463d..66d445b14e51 100644
--- a/drivers/firmware/arm_scmi/bus.c
+++ b/drivers/firmware/arm_scmi/bus.c
@@ -100,6 +100,9 @@ int scmi_driver_register(struct scmi_driver *driver, struct module *owner,
{
int retval;
+ if (!driver->probe)
+ return -EINVAL;
+
driver->driver.bus = &scmi_bus_type;
driver->driver.name = driver->name;
driver->driver.owner = owner;
--
2.17.1
From: Petko Manolov <petkan(a)nucleusys.com>
v3:
Pavel Skripkin again: make sure -ETIMEDOUT is returned by __mii_op() on timeout
condition;
v2:
Special thanks to Pavel Skripkin for the review and who caught a few bugs.
setup_pegasus_II() would not print an erroneous message on the success path.
v1:
Add error checking for get_registers() and derivatives. If the usb transfer
fail then just don't use the buffer where the legal data should have been
returned.
Remove DRIVER_VERSION per Greg KH request.
Petko Manolov (2):
Check the return value of get_geristers() and friends;
Remove the changelog and DRIVER_VERSION.
drivers/net/usb/pegasus.c | 138 +++++++++++++++++++++-----------------
1 file changed, 77 insertions(+), 61 deletions(-)
--
2.30.2
At least some PL2303GT have a bcdDevice of 0x305 instead of 0x100 as the
datasheet claims. Add it to the list of known release numbers for the
HXN (G) type.
Fixes: 894758d0571d ("USB: serial: pl2303: tighten type HXN (G) detection")
Reported-by: Vasily Khoruzhick <anarsoul(a)gmail.com>
Tested-by: Vasily Khoruzhick <anarsoul(a)gmail.com>
Cc: stable(a)vger.kernel.org # 5.13
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/pl2303.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/serial/pl2303.c b/drivers/usb/serial/pl2303.c
index 17601e32083e..930b3d50a330 100644
--- a/drivers/usb/serial/pl2303.c
+++ b/drivers/usb/serial/pl2303.c
@@ -432,6 +432,7 @@ static int pl2303_detect_type(struct usb_serial *serial)
case 0x200:
switch (bcdDevice) {
case 0x100:
+ case 0x305:
/*
* Assume it's an HXN-type if the device doesn't
* support the old read request value.
--
2.31.1
The physical address may exceed 32 bits on 32-bit systems with
more than 32 bits of physcial address,use PFN_PHYS() in devmem_is_allowed(),
or the physical address may overflow and be truncated.
We found this bug when mapping a high addresses through devmem tool,
when CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem
is used to map a high address that is not in the iomem address range,
an unexpected error indicating no permission is returned.
This bug was initially introduced from v2.6.37, and the function was moved
to lib when v5.11.
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Cc: stable(a)vger.kernel.org # v2.6.37
Signed-off-by: Liang Wang <wangliang101(a)huawei.com>
---
v3: update changelog suggested by Luis Chamberlain <mcgrof(a)kernel.org>
lib/devmem_is_allowed.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/devmem_is_allowed.c b/lib/devmem_is_allowed.c
index c0d67c541849..60be9e24bd57 100644
--- a/lib/devmem_is_allowed.c
+++ b/lib/devmem_is_allowed.c
@@ -19,7 +19,7 @@
*/
int devmem_is_allowed(unsigned long pfn)
{
- if (iomem_is_exclusive(pfn << PAGE_SHIFT))
+ if (iomem_is_exclusive(PFN_PHYS(pfn)))
return 0;
if (!page_is_ram(pfn))
return 1;
--
2.32.0
On 2021/7/30 14:49, Liang Wang wrote:
> The physical address may exceed 32 bits on ARM(when ARM_LPAE enabled),
> use PFN_PHYS() in devmem_is_allowed(), or the physical address may
> overflow and be truncated.
>
> This bug was initially introduced from v2.6.37, and the function was moved
> to lib when v5.10.
>
> Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
> Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
> Cc: stable(a)vger.kernel.org # v2.6.37
> Signed-off-by: Liang Wang <wangliang101(a)huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
> ---
> v2: update subject and changelog
> lib/devmem_is_allowed.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/devmem_is_allowed.c b/lib/devmem_is_allowed.c
> index c0d67c541849..60be9e24bd57 100644
> --- a/lib/devmem_is_allowed.c
> +++ b/lib/devmem_is_allowed.c
> @@ -19,7 +19,7 @@
> */
> int devmem_is_allowed(unsigned long pfn)
> {
> - if (iomem_is_exclusive(pfn << PAGE_SHIFT))
> + if (iomem_is_exclusive(PFN_PHYS(pfn)))
> return 0;
> if (!page_is_ram(pfn))
> return 1;
This is the start of the stable review cycle for the 4.14.242 release.
There are 38 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 04 Aug 2021 13:43:24 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.242-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.242-rc1
Arnaldo Carvalho de Melo <acme(a)redhat.com>
Revert "perf map: Fix dso->nsinfo refcounting"
Dan Carpenter <dan.carpenter(a)oracle.com>
can: hi311x: fix a signedness bug in hi3110_cmd()
Wang Hai <wanghai38(a)huawei.com>
sis900: Fix missing pci_disable_device() in probe and remove
Wang Hai <wanghai38(a)huawei.com>
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: fix return value check in __sctp_rcv_asconf_lookup
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Fix flow table chaining
Pavel Skripkin <paskripkin(a)gmail.com>
net: llc: fix skb_over_panic
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
mlx4: Fix missing error code in mlx4_load_one()
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix sleeping in tipc accept routine
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_nat: allow to specify layer 4 protocol NAT only
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: adjust stop timestamp to real expiry value
Nguyen Dinh Phi <phind.uet(a)gmail.com>
cfg80211: Fix possible memory leak in function cfg80211_bss_update
Jan Kiszka <jan.kiszka(a)siemens.com>
x86/asm: Ensure asm/proto.h can be included stand-alone
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: nfcsim: fix use after free during module unload
Paul Jakma <paul(a)jakma.org>
NIU: fix incorrect error return, missed in previous revert
Pavel Skripkin <paskripkin(a)gmail.com>
can: esd_usb2: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: ems_usb: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: usb_8dev: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: mcba_usb_start(): add missing urb->transfer_dma initialization
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: issue zeroout to EOF blocks
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix zero out valid data
Juergen Gross <jgross(a)suse.com>
x86/kvm: fix vcpu-id indexed array sizes
Eric Dumazet <edumazet(a)google.com>
gro: ensure frag0 meets IP header alignment
Eric Dumazet <edumazet(a)google.com>
virtio_net: Do not pull payload in skb->head
Sudeep Holla <sudeep.holla(a)arm.com>
ARM: dts: versatile: Fix up interrupt controller node names
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add lock nesting notation to hfs_find_init
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: fix high memory mapping in hfs_bnode_read
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add missing clean-up in hfs_fill_super
Xin Long <lucien.xin(a)gmail.com>
sctp: move 198 addresses from unusable to private scope
Eric Dumazet <edumazet(a)google.com>
net: annotate data race around sk_ll_usec
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/garp: fix memleak in garp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/mrp: fix memleak in mrp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
workqueue: fix UAF in pwq_unbound_release_workfn()
Miklos Szeredi <mszeredi(a)redhat.com>
af_unix: fix garbage collect vs MSG_PEEK
Jens Axboe <axboe(a)kernel.dk>
net: split out functions related to registering inflight socket files
Maxim Levitsky <mlevitsk(a)redhat.com>
KVM: x86: determine if an exception has an error code only when injecting it.
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
selftest: fix build error in tools/testing/selftests/vm/userfaultfd.c
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/versatile-ab.dts | 5 +-
arch/arm/boot/dts/versatile-pb.dts | 2 +-
arch/x86/include/asm/proto.h | 2 +
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/ioapic.h | 4 +-
arch/x86/kvm/x86.c | 13 +-
drivers/net/can/spi/hi311x.c | 2 +-
drivers/net/can/usb/ems_usb.c | 14 +-
drivers/net/can/usb/esd_usb2.c | 16 ++-
drivers/net/can/usb/mcba_usb.c | 2 +
drivers/net/can/usb/usb_8dev.c | 15 ++-
drivers/net/ethernet/dec/tulip/winbond-840.c | 7 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +-
drivers/net/ethernet/sis/sis900.c | 7 +-
drivers/net/ethernet/sun/niu.c | 3 +-
drivers/net/virtio_net.c | 10 +-
drivers/nfc/nfcsim.c | 3 +-
fs/hfs/bfind.c | 14 +-
fs/hfs/bnode.c | 25 +++-
fs/hfs/btree.h | 7 +
fs/hfs/super.c | 10 +-
fs/ocfs2/file.c | 103 +++++++++------
include/linux/skbuff.h | 9 ++
include/linux/virtio_net.h | 14 +-
include/net/af_unix.h | 1 +
include/net/busy_poll.h | 2 +-
include/net/llc_pdu.h | 31 +++--
include/net/sctp/constants.h | 4 +-
kernel/workqueue.c | 20 ++-
net/802/garp.c | 14 ++
net/802/mrp.c | 14 ++
net/Makefile | 2 +-
net/can/raw.c | 20 ++-
net/core/dev.c | 3 +-
net/core/sock.c | 2 +-
net/llc/af_llc.c | 10 +-
net/llc/llc_s_ac.c | 2 +-
net/netfilter/nf_conntrack_core.c | 7 +-
net/netfilter/nft_nat.c | 4 +-
net/sctp/input.c | 2 +-
net/sctp/protocol.c | 3 +-
net/tipc/socket.c | 9 +-
net/unix/Kconfig | 5 +
net/unix/Makefile | 2 +
net/unix/af_unix.c | 102 +++++++--------
net/unix/garbage.c | 68 +---------
net/unix/scm.c | 149 ++++++++++++++++++++++
net/unix/scm.h | 10 ++
net/wireless/scan.c | 6 +-
tools/perf/util/map.c | 2 -
tools/testing/selftests/vm/userfaultfd.c | 2 +-
53 files changed, 540 insertions(+), 260 deletions(-)
The patch titled
Subject: lib: use PFN_PHYS() in devmem_is_allowed()
has been added to the -mm tree. Its filename is
lib-use-pfn_phys-in-devmem_is_allowed.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/lib-use-pfn_phys-in-devmem_is_all…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/lib-use-pfn_phys-in-devmem_is_all…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Liang Wang <wangliang101(a)huawei.com>
Subject: lib: use PFN_PHYS() in devmem_is_allowed()
The physical address may exceed 32 bits on 32-bit systems with more than
32 bits of physcial address,use PFN_PHYS() in devmem_is_allowed(), or the
physical address may overflow and be truncated.
We found this bug when mapping a high addresses through devmem tool, when
CONFIG_STRICT_DEVMEM is enabled on the ARM with ARM_LPAE and devmem is
used to map a high address that is not in the iomem address range, an
unexpected error indicating no permission is returned.
This bug was initially introduced from v2.6.37, and the function was moved
to lib when v5.11.
Link: https://lkml.kernel.org/r/20210731025057.78825-1-wangliang101@huawei.com
Fixes: 087aaffcdf9c ("ARM: implement CONFIG_STRICT_DEVMEM by disabling access to RAM via /dev/mem")
Fixes: 527701eda5f1 ("lib: Add a generic version of devmem_is_allowed()")
Signed-off-by: Liang Wang <wangliang101(a)huawei.com>
Cc: Palmer Dabbelt <palmerdabbelt(a)google.com>
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Liang Wang <wangliang101(a)huawei.com>
Cc: Xiaoming Ni <nixiaoming(a)huawei.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: <stable(a)vger.kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/devmem_is_allowed.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/devmem_is_allowed.c~lib-use-pfn_phys-in-devmem_is_allowed
+++ a/lib/devmem_is_allowed.c
@@ -19,7 +19,7 @@
*/
int devmem_is_allowed(unsigned long pfn)
{
- if (iomem_is_exclusive(pfn << PAGE_SHIFT))
+ if (iomem_is_exclusive(PFN_PHYS(pfn)))
return 0;
if (!page_is_ram(pfn))
return 1;
_
Patches currently in -mm which might be from wangliang101(a)huawei.com are
lib-use-pfn_phys-in-devmem_is_allowed.patch
This is the start of the stable review cycle for the 5.4.138 release.
There are 40 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 04 Aug 2021 13:43:24 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.138-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.138-rc1
Oleksij Rempel <linux(a)rempel-privat.de>
can: j1939: j1939_session_deactivate(): clarify lifetime of session object
Lukasz Cieplicki <lukaszx.cieplicki(a)intel.com>
i40e: Add additional info to PHY type error
Arnaldo Carvalho de Melo <acme(a)redhat.com>
Revert "perf map: Fix dso->nsinfo refcounting"
Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
powerpc/pseries: Fix regression while building external modules
Shmuel Hazan <sh(a)tkos.co.il>
PCI: mvebu: Setup BAR0 in order to fix MSI
Dan Carpenter <dan.carpenter(a)oracle.com>
can: hi311x: fix a signedness bug in hi3110_cmd()
Wang Hai <wanghai38(a)huawei.com>
sis900: Fix missing pci_disable_device() in probe and remove
Wang Hai <wanghai38(a)huawei.com>
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: fix return value check in __sctp_rcv_asconf_lookup
Dima Chumak <dchumak(a)nvidia.com>
net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Fix flow table chaining
Pavel Skripkin <paskripkin(a)gmail.com>
net: llc: fix skb_over_panic
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
mlx4: Fix missing error code in mlx4_load_one()
Gilad Naaman <gnaaman(a)drivenets.com>
net: Set true network header for ECN decapsulation
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix sleeping in tipc accept routine
Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
i40e: Fix log TC creation failure when max num of queues is exceeded
Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
i40e: Fix queue-to-TC mapping on Tx
Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com>
i40e: Fix firmware LLDP agent related warning
Arkadiusz Kubalewski <arkadiusz.kubalewski(a)intel.com>
i40e: Fix logic of disabling queues
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_nat: allow to specify layer 4 protocol NAT only
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: adjust stop timestamp to real expiry value
Nguyen Dinh Phi <phind.uet(a)gmail.com>
cfg80211: Fix possible memory leak in function cfg80211_bss_update
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: nfcsim: fix use after free during module unload
Paul Jakma <paul(a)jakma.org>
NIU: fix incorrect error return, missed in previous revert
Jason Gerecke <killertofu(a)gmail.com>
HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
Pavel Skripkin <paskripkin(a)gmail.com>
can: esd_usb2: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: ems_usb: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: usb_8dev: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: mcba_usb_start(): add missing urb->transfer_dma initialization
Ziyang Xuan <william.xuanziyang(a)huawei.com>
can: raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
Zhang Changzhong <zhangchangzhong(a)huawei.com>
can: j1939: j1939_xtp_rx_dat_one(): fix rxtimer value between consecutive TP.DT to 750ms
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: issue zeroout to EOF blocks
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix zero out valid data
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: add missing compat KVM_CLEAR_DIRTY_LOG
Juergen Gross <jgross(a)suse.com>
x86/kvm: fix vcpu-id indexed array sizes
Hui Wang <hui.wang(a)canonical.com>
Revert "ACPI: resources: Add checks for ACPI IRQ override"
Goldwyn Rodrigues <rgoldwyn(a)suse.de>
btrfs: mark compressed range uptodate only if all bio succeed
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
btrfs: fix rw device counting in __btrfs_free_extra_devids
Jan Kiszka <jan.kiszka(a)siemens.com>
x86/asm: Ensure asm/proto.h can be included stand-alone
Cong Wang <xiyou.wangcong(a)gmail.com>
net_sched: check error pointer in tcf_dump_walker()
-------------
Diffstat:
Makefile | 4 +-
arch/powerpc/platforms/pseries/setup.c | 2 +-
arch/x86/include/asm/proto.h | 2 +
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/ioapic.h | 4 +-
drivers/acpi/resource.c | 9 +-
drivers/hid/wacom_wac.c | 2 +-
drivers/net/can/spi/hi311x.c | 2 +-
drivers/net/can/usb/ems_usb.c | 14 ++-
drivers/net/can/usb/esd_usb2.c | 16 +++-
drivers/net/can/usb/mcba_usb.c | 2 +
drivers/net/can/usb/usb_8dev.c | 15 +++-
drivers/net/ethernet/dec/tulip/winbond-840.c | 7 +-
drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 6 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 61 ++++++++-----
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 50 +++++++++++
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 2 +
drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 33 ++++++-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 ++-
drivers/net/ethernet/sis/sis900.c | 7 +-
drivers/net/ethernet/sun/niu.c | 3 +-
drivers/nfc/nfcsim.c | 3 +-
drivers/pci/controller/pci-mvebu.c | 16 +++-
fs/btrfs/compression.c | 2 +-
fs/btrfs/volumes.c | 1 +
fs/ocfs2/file.c | 103 +++++++++++++---------
include/net/llc_pdu.h | 31 +++++--
net/can/j1939/transport.c | 11 ++-
net/can/raw.c | 20 ++++-
net/ipv4/ip_tunnel.c | 2 +-
net/llc/af_llc.c | 10 ++-
net/llc/llc_s_ac.c | 2 +-
net/netfilter/nf_conntrack_core.c | 7 +-
net/netfilter/nft_nat.c | 4 +-
net/sched/act_api.c | 2 +
net/sctp/input.c | 2 +-
net/tipc/socket.c | 9 +-
net/wireless/scan.c | 6 +-
tools/perf/util/map.c | 2 -
virt/kvm/kvm_main.c | 28 ++++++
41 files changed, 375 insertions(+), 140 deletions(-)
This is the start of the stable review cycle for the 4.9.278 release.
There are 32 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 04 Aug 2021 13:43:24 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.278-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.278-rc1
Wang Hai <wanghai38(a)huawei.com>
sis900: Fix missing pci_disable_device() in probe and remove
Wang Hai <wanghai38(a)huawei.com>
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: Fix flow table chaining
Pavel Skripkin <paskripkin(a)gmail.com>
net: llc: fix skb_over_panic
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
mlx4: Fix missing error code in mlx4_load_one()
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix sleeping in tipc accept routine
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_nat: allow to specify layer 4 protocol NAT only
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: adjust stop timestamp to real expiry value
Nguyen Dinh Phi <phind.uet(a)gmail.com>
cfg80211: Fix possible memory leak in function cfg80211_bss_update
Jan Kiszka <jan.kiszka(a)siemens.com>
x86/asm: Ensure asm/proto.h can be included stand-alone
Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
nfc: nfcsim: fix use after free during module unload
Paul Jakma <paul(a)jakma.org>
NIU: fix incorrect error return, missed in previous revert
Pavel Skripkin <paskripkin(a)gmail.com>
can: esd_usb2: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: ems_usb: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: usb_8dev: fix memory leak
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: issue zeroout to EOF blocks
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix zero out valid data
Juergen Gross <jgross(a)suse.com>
x86/kvm: fix vcpu-id indexed array sizes
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: ensure the signal page contains defined contents
Matthew Wilcox <mawilcox(a)microsoft.com>
lib/string.c: add multibyte memset functions
Sudeep Holla <sudeep.holla(a)arm.com>
ARM: dts: versatile: Fix up interrupt controller node names
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add lock nesting notation to hfs_find_init
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: fix high memory mapping in hfs_bnode_read
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add missing clean-up in hfs_fill_super
Xin Long <lucien.xin(a)gmail.com>
sctp: move 198 addresses from unusable to private scope
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/garp: fix memleak in garp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/mrp: fix memleak in mrp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
workqueue: fix UAF in pwq_unbound_release_workfn()
Miklos Szeredi <mszeredi(a)redhat.com>
af_unix: fix garbage collect vs MSG_PEEK
Jens Axboe <axboe(a)kernel.dk>
net: split out functions related to registering inflight socket files
Nathan Chancellor <nathan(a)kernel.org>
tipc: Fix backport of b77413446408fdd256599daf00d5be72b5f3e7c6
Nathan Chancellor <nathan(a)kernel.org>
iommu/amd: Fix backport of 140456f994195b568ecd7fc2287a34eadffef3ca
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/versatile-ab.dts | 5 +-
arch/arm/boot/dts/versatile-pb.dts | 2 +-
arch/arm/kernel/signal.c | 14 +-
arch/x86/include/asm/proto.h | 2 +
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/ioapic.h | 4 +-
drivers/iommu/amd_iommu.c | 2 +-
drivers/net/can/usb/ems_usb.c | 14 +-
drivers/net/can/usb/esd_usb2.c | 16 ++-
drivers/net/can/usb/usb_8dev.c | 15 +-
drivers/net/ethernet/dec/tulip/winbond-840.c | 7 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 10 +-
drivers/net/ethernet/sis/sis900.c | 7 +-
drivers/net/ethernet/sun/niu.c | 3 +-
drivers/nfc/nfcsim.c | 3 +-
fs/hfs/bfind.c | 14 +-
fs/hfs/bnode.c | 25 +++-
fs/hfs/btree.h | 7 +
fs/hfs/super.c | 10 +-
fs/ocfs2/file.c | 103 ++++++++------
include/linux/string.h | 30 ++++
include/net/af_unix.h | 1 +
include/net/llc_pdu.h | 31 +++--
include/net/sctp/constants.h | 4 +-
kernel/workqueue.c | 20 ++-
lib/string.c | 66 +++++++++
net/802/garp.c | 14 ++
net/802/mrp.c | 14 ++
net/Makefile | 2 +-
net/llc/af_llc.c | 10 +-
net/llc/llc_s_ac.c | 2 +-
net/netfilter/nf_conntrack_core.c | 7 +-
net/netfilter/nft_nat.c | 4 +-
net/sctp/protocol.c | 3 +-
net/tipc/link.c | 2 +-
net/tipc/socket.c | 9 +-
net/unix/Kconfig | 5 +
net/unix/Makefile | 2 +
net/unix/af_unix.c | 115 ++++++----------
net/unix/garbage.c | 68 +--------
net/unix/scm.c | 161 ++++++++++++++++++++++
net/unix/scm.h | 10 ++
net/wireless/scan.c | 6 +-
45 files changed, 597 insertions(+), 259 deletions(-)
This is the start of the stable review cycle for the 4.4.278 release.
There are 26 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 04 Aug 2021 13:43:24 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.278-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.278-rc1
Wang Hai <wanghai38(a)huawei.com>
sis900: Fix missing pci_disable_device() in probe and remove
Wang Hai <wanghai38(a)huawei.com>
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
Pavel Skripkin <paskripkin(a)gmail.com>
net: llc: fix skb_over_panic
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
mlx4: Fix missing error code in mlx4_load_one()
Hoang Le <hoang.h.le(a)dektech.com.au>
tipc: fix sleeping in tipc accept routine
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_nat: allow to specify layer 4 protocol NAT only
Nguyen Dinh Phi <phind.uet(a)gmail.com>
cfg80211: Fix possible memory leak in function cfg80211_bss_update
Jan Kiszka <jan.kiszka(a)siemens.com>
x86/asm: Ensure asm/proto.h can be included stand-alone
Paul Jakma <paul(a)jakma.org>
NIU: fix incorrect error return, missed in previous revert
Pavel Skripkin <paskripkin(a)gmail.com>
can: esd_usb2: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: ems_usb: fix memory leak
Pavel Skripkin <paskripkin(a)gmail.com>
can: usb_8dev: fix memory leak
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: issue zeroout to EOF blocks
Junxiao Bi <junxiao.bi(a)oracle.com>
ocfs2: fix zero out valid data
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: ensure the signal page contains defined contents
Matthew Wilcox <mawilcox(a)microsoft.com>
lib/string.c: add multibyte memset functions
Sudeep Holla <sudeep.holla(a)arm.com>
ARM: dts: versatile: Fix up interrupt controller node names
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add lock nesting notation to hfs_find_init
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: fix high memory mapping in hfs_bnode_read
Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
hfs: add missing clean-up in hfs_fill_super
Xin Long <lucien.xin(a)gmail.com>
sctp: move 198 addresses from unusable to private scope
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/garp: fix memleak in garp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
net/802/mrp: fix memleak in mrp_request_join()
Yang Yingliang <yangyingliang(a)huawei.com>
workqueue: fix UAF in pwq_unbound_release_workfn()
Miklos Szeredi <mszeredi(a)redhat.com>
af_unix: fix garbage collect vs MSG_PEEK
Jens Axboe <axboe(a)kernel.dk>
net: split out functions related to registering inflight socket files
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/versatile-ab.dts | 5 +-
arch/arm/boot/dts/versatile-pb.dts | 2 +-
arch/arm/kernel/signal.c | 14 ++-
arch/x86/include/asm/proto.h | 2 +
drivers/net/can/usb/ems_usb.c | 14 ++-
drivers/net/can/usb/esd_usb2.c | 16 ++-
drivers/net/can/usb/usb_8dev.c | 15 ++-
drivers/net/ethernet/dec/tulip/winbond-840.c | 7 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 1 +
drivers/net/ethernet/sis/sis900.c | 7 +-
drivers/net/ethernet/sun/niu.c | 3 +-
fs/hfs/bfind.c | 14 ++-
fs/hfs/bnode.c | 25 ++++-
fs/hfs/btree.h | 7 ++
fs/hfs/super.c | 10 +-
fs/ocfs2/file.c | 103 ++++++++++-------
include/linux/string.h | 30 +++++
include/net/af_unix.h | 1 +
include/net/llc_pdu.h | 31 ++++--
include/net/sctp/constants.h | 4 +-
kernel/workqueue.c | 20 ++--
lib/string.c | 66 +++++++++++
net/802/garp.c | 14 +++
net/802/mrp.c | 14 +++
net/Makefile | 2 +-
net/llc/af_llc.c | 10 +-
net/llc/llc_s_ac.c | 2 +-
net/netfilter/nft_nat.c | 4 +-
net/sctp/protocol.c | 3 +-
net/tipc/socket.c | 9 +-
net/unix/Kconfig | 5 +
net/unix/Makefile | 2 +
net/unix/af_unix.c | 115 ++++++++-----------
net/unix/garbage.c | 68 +----------
net/unix/scm.c | 161 +++++++++++++++++++++++++++
net/unix/scm.h | 10 ++
net/wireless/scan.c | 6 +-
38 files changed, 579 insertions(+), 247 deletions(-)
From: Kan Liang <kan.liang(a)linux.intel.com>
A warning as below may be occasionally triggered in an ADL machine when
these conditions occur,
- Two perf record commands run one by one. Both record a PEBS event.
- Both runs on small cores.
- They have different adaptive PEBS configuration (PEBS_DATA_CFG).
[ 673.663291] WARNING: CPU: 4 PID: 9874 at
arch/x86/events/intel/ds.c:1743
setup_pebs_adaptive_sample_data+0x55e/0x5b0
[ 673.663348] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0
[ 673.663357] Call Trace:
[ 673.663357] <NMI>
[ 673.663357] intel_pmu_drain_pebs_icl+0x48b/0x810
[ 673.663360] perf_event_nmi_handler+0x41/0x80
[ 673.663368] </NMI>
[ 673.663370] __perf_event_task_sched_in+0x2c2/0x3a0
Different from the big core, the small core requires the ACK right
before re-enabling counters in the NMI handler, otherwise a stale PEBS
record may be dumped into the later NMI handler, which trigger the
warning.
Add a new mid_ack flag to track the case. Add all PMI handler bits in
the struct x86_hybrid_pmu to track the bits for different types of PMUs.
Apply mid ACK for the small cores on an Alder Lake machine.
The existing hybrid() macro has a compile error when taking address of a
bit-field variable. Add a new macro hybrid_bit() to get the bit-field
value of a given PMU.
Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Ammy Yi <ammy.yi(a)intel.com>
Tested-by: Ammy Yi <ammy.yi(a)intel.com>
Reviewed-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
The V1 patch set can be found at
https://lore.kernel.org/lkml/1625774073-153697-1-git-send-email-kan.liang@l…
Changes since v1:
- Introduce mid ACK. The early ACK in V1 may trigger other issue based
on the latest test result.
- Add comments regarding early, mid and late ACK.
arch/x86/events/intel/core.c | 23 +++++++++++++++--------
arch/x86/events/perf_event.h | 15 +++++++++++++++
2 files changed, 30 insertions(+), 8 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index d76be3b..511d1f9 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2904,24 +2904,28 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
*/
static int intel_pmu_handle_irq(struct pt_regs *regs)
{
- struct cpu_hw_events *cpuc;
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ bool late_ack = hybrid_bit(cpuc->pmu, late_ack);
+ bool mid_ack = hybrid_bit(cpuc->pmu, mid_ack);
int loops;
u64 status;
int handled;
int pmu_enabled;
- cpuc = this_cpu_ptr(&cpu_hw_events);
-
/*
* Save the PMU state.
* It needs to be restored when leaving the handler.
*/
pmu_enabled = cpuc->enabled;
/*
- * No known reason to not always do late ACK,
- * but just in case do it opt-in.
+ * In general, the early ACK is only applied for old platforms.
+ * For the big core starts from Haswell, the late ACK should be
+ * applied.
+ * For the small core after Tremont, we have to do the ACK right
+ * before re-enabling counters, which is in the middle of the
+ * NMI handler.
*/
- if (!x86_pmu.late_ack)
+ if (!late_ack && !mid_ack)
apic_write(APIC_LVTPC, APIC_DM_NMI);
intel_bts_disable_local();
cpuc->enabled = 0;
@@ -2958,6 +2962,8 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
goto again;
done:
+ if (mid_ack)
+ apic_write(APIC_LVTPC, APIC_DM_NMI);
/* Only restore PMU state when it's active. See x86_pmu_disable(). */
cpuc->enabled = pmu_enabled;
if (pmu_enabled)
@@ -2969,7 +2975,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
* have been reset. This avoids spurious NMIs on
* Haswell CPUs.
*/
- if (x86_pmu.late_ack)
+ if (late_ack)
apic_write(APIC_LVTPC, APIC_DM_NMI);
return handled;
}
@@ -6123,7 +6129,6 @@ __init int intel_pmu_init(void)
static_branch_enable(&perf_is_hybrid);
x86_pmu.num_hybrid_pmus = X86_HYBRID_NUM_PMUS;
- x86_pmu.late_ack = true;
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.pebs_block = true;
@@ -6161,6 +6166,7 @@ __init int intel_pmu_init(void)
pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX];
pmu->name = "cpu_core";
pmu->cpu_type = hybrid_big;
+ pmu->late_ack = true;
if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) {
pmu->num_counters = x86_pmu.num_counters + 2;
pmu->num_counters_fixed = x86_pmu.num_counters_fixed + 1;
@@ -6186,6 +6192,7 @@ __init int intel_pmu_init(void)
pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_ATOM_IDX];
pmu->name = "cpu_atom";
pmu->cpu_type = hybrid_small;
+ pmu->mid_ack = true;
pmu->num_counters = x86_pmu.num_counters;
pmu->num_counters_fixed = x86_pmu.num_counters_fixed;
pmu->max_pebs_events = x86_pmu.max_pebs_events;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index ad87cb3..eec7ce8 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -655,6 +655,10 @@ struct x86_hybrid_pmu {
struct event_constraint *event_constraints;
struct event_constraint *pebs_constraints;
struct extra_reg *extra_regs;
+
+ unsigned int late_ack :1,
+ mid_ack :1,
+ enabled_ack :1;
};
static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
@@ -685,6 +689,16 @@ extern struct static_key_false perf_is_hybrid;
__Fp; \
}))
+#define hybrid_bit(_pmu, _field) \
+({ \
+ bool __Fp = x86_pmu._field; \
+ \
+ if (is_hybrid() && (_pmu)) \
+ __Fp = hybrid_pmu(_pmu)->_field; \
+ \
+ __Fp; \
+})
+
enum hybrid_pmu_type {
hybrid_big = 0x40,
hybrid_small = 0x20,
@@ -754,6 +768,7 @@ struct x86_pmu {
/* PMI handler bits */
unsigned int late_ack :1,
+ mid_ack :1,
enabled_ack :1;
/*
* sysfs attrs
--
2.7.4
Hi all,
Here is a the v2 improving EAS estimation precision. The v1 and discussion
can be found at:
https://lore.kernel.org/lkml/20210625152603.25960-1-lukasz.luba@arm.com/
changes:
v2:
- dropped first two patches
- implemented better resolution for 64-bit only machines according to
Peter's comment and scale_load() example
- power value multiplied by 1000 not 10000
- added 'Fixes' tag, so it could be picked into stable trees easily
- extended patch descirpion with example corenr case scenario
Regards,
Lukasz
Lukasz Luba (1):
PM: EM: Increase energy calculation precision
include/linux/energy_model.h | 16 ++++++++++++++++
kernel/power/energy_model.c | 3 ++-
2 files changed, 18 insertions(+), 1 deletion(-)
--
2.17.1
From: Rander Wang <rander.wang(a)intel.com>
commit 33c8516841ea4fa12fdb8961711bf95095c607ee upstream
On TGL platform with max98373 codec the trigger start sequence is
fe first, then codec component and sdw link is the last. Recently
a delay was introduced in max98373 codec driver and this resulted
to the start of sdw stream transmission was delayed and the data
transmitted by fw can't be consumed by sdw controller, so xrun happened.
Adding delay in trigger function is a bad idea. This patch enable spk
pin in prepare function and disable it in hw_free to avoid xrun issue
caused by delay in trigger.
Fixes: 3a27875e91fb ("ASoC: max98373: Added 30ms turn on/off time delay")
BugLink: https://github.com/thesofproject/sof/issues/4066
Reviewed-by: Bard Liao <bard.liao(a)intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Signed-off-by: Rander Wang <rander.wang(a)intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Link: https://lore.kernel.org/r/20210625205042.65181-2-pierre-louis.bossart@linux…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
backport to stable/linux-5.13.y and stable/linux-5.12.y since upstream
commit does not apply directly due to a rename in 9c5046e4b3e7 which
creates a conflict.
There is no need to apply this patch to earlier versions since the
commit 3a27875e91fb is only present in V5.12+
sound/soc/intel/boards/sof_sdw_max98373.c | 81 +++++++++++++++--------
1 file changed, 53 insertions(+), 28 deletions(-)
diff --git a/sound/soc/intel/boards/sof_sdw_max98373.c b/sound/soc/intel/boards/sof_sdw_max98373.c
index cfdf970c5800..25daef910aee 100644
--- a/sound/soc/intel/boards/sof_sdw_max98373.c
+++ b/sound/soc/intel/boards/sof_sdw_max98373.c
@@ -55,43 +55,68 @@ static int spk_init(struct snd_soc_pcm_runtime *rtd)
return ret;
}
-static int max98373_sdw_trigger(struct snd_pcm_substream *substream, int cmd)
+static int mx8373_enable_spk_pin(struct snd_pcm_substream *substream, bool enable)
{
+ struct snd_soc_pcm_runtime *rtd = asoc_substream_to_rtd(substream);
+ struct snd_soc_dai *codec_dai;
+ struct snd_soc_dai *cpu_dai;
int ret;
+ int j;
- switch (cmd) {
- case SNDRV_PCM_TRIGGER_START:
- case SNDRV_PCM_TRIGGER_RESUME:
- case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
- /* enable max98373 first */
- ret = max98373_trigger(substream, cmd);
- if (ret < 0)
- break;
-
- ret = sdw_trigger(substream, cmd);
- break;
- case SNDRV_PCM_TRIGGER_STOP:
- case SNDRV_PCM_TRIGGER_SUSPEND:
- case SNDRV_PCM_TRIGGER_PAUSE_PUSH:
- ret = sdw_trigger(substream, cmd);
- if (ret < 0)
- break;
-
- ret = max98373_trigger(substream, cmd);
- break;
- default:
- ret = -EINVAL;
- break;
+ /* set spk pin by playback only */
+ if (substream->stream == SNDRV_PCM_STREAM_CAPTURE)
+ return 0;
+
+ cpu_dai = asoc_rtd_to_cpu(rtd, 0);
+ for_each_rtd_codec_dais(rtd, j, codec_dai) {
+ struct snd_soc_dapm_context *dapm =
+ snd_soc_component_get_dapm(cpu_dai->component);
+ char pin_name[16];
+
+ snprintf(pin_name, ARRAY_SIZE(pin_name), "%s Spk",
+ codec_dai->component->name_prefix);
+
+ if (enable)
+ ret = snd_soc_dapm_enable_pin(dapm, pin_name);
+ else
+ ret = snd_soc_dapm_disable_pin(dapm, pin_name);
+
+ if (!ret)
+ snd_soc_dapm_sync(dapm);
}
- return ret;
+ return 0;
+}
+
+static int mx8373_sdw_prepare(struct snd_pcm_substream *substream)
+{
+ int ret = 0;
+
+ /* according to soc_pcm_prepare dai link prepare is called first */
+ ret = sdw_prepare(substream);
+ if (ret < 0)
+ return ret;
+
+ return mx8373_enable_spk_pin(substream, true);
+}
+
+static int mx8373_sdw_hw_free(struct snd_pcm_substream *substream)
+{
+ int ret = 0;
+
+ /* according to soc_pcm_hw_free dai link free is called first */
+ ret = sdw_hw_free(substream);
+ if (ret < 0)
+ return ret;
+
+ return mx8373_enable_spk_pin(substream, false);
}
static const struct snd_soc_ops max_98373_sdw_ops = {
.startup = sdw_startup,
- .prepare = sdw_prepare,
- .trigger = max98373_sdw_trigger,
- .hw_free = sdw_hw_free,
+ .prepare = mx8373_sdw_prepare,
+ .trigger = sdw_trigger,
+ .hw_free = mx8373_sdw_hw_free,
.shutdown = sdw_shutdown,
};
--
2.25.1
On 8/2/2021 7:43 AM, ci_notify(a)linaro.org wrote:
> Successfully identified regression in *linux* in CI configuration tcwg_kernel/llvm-release-arm-stable-allyesconfig. So far, this commit has regressed CI configurations:
> - tcwg_kernel/llvm-release-arm-stable-allyesconfig
>
> Culprit:
> <cut>
> commit 341db343768bc44f3512facc464021730d64071c
> Author: Linus Walleij <linus.walleij(a)linaro.org>
> Date: Sun May 23 00:50:39 2021 +0200
>
> power: supply: ab8500: Move to componentized binding
>
> [ Upstream commit 1c1f13a006ed0d71bb5664c8b7e3e77a28da3beb ]
>
> The driver has problems with the different components of
> the charging code racing with each other to probe().
>
> This results in all four subdrivers populating battery
> information to ascertain that it is populated for their
> own needs for example.
>
> Fix this by using component probing and thus expressing
> to the kernel that these are dependent components.
> The probes can happen in any order and will only acquire
> resources such as state container, regulators and
> interrupts and initialize the data structures, but no
> execution happens until the .bind() callback is called.
>
> The charging driver is the main component and binds
> first, then bind in order the three subcomponents:
> ab8500-fg, ab8500-btemp and ab8500-chargalg.
>
> Do some housekeeping while we are moving the code around.
> Like use devm_* for IRQs so as to cut down on some
> boilerplate.
>
> Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
> Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
> </cut>
>
> Results regressed to (for first_bad == 341db343768bc44f3512facc464021730d64071c)
> # reset_artifacts:
> -10
> # build_abe binutils:
> -9
> # build_llvm:
> -5
> # build_abe qemu:
> -2
> # linux_n_obj:
> 19634
> # First few build errors in logs:
> # 00:03:07 drivers/power/supply/ab8500_fg.c:3061:32: error: use of undeclared identifier 'np'
> # 00:03:08 make[3]: *** [drivers/power/supply/ab8500_fg.o] Error 1
> # 00:03:10 make[2]: *** [drivers/power/supply] Error 2
> # 00:03:10 make[1]: *** [drivers/power] Error 2
> # 00:04:05 make: *** [drivers] Error 2
Greg and Sasha,
Please cherry pick upstream commit 7e2bb83c617f ("power: supply: ab8500:
Call battery population once") to resolve this build error on 5.13.
Cheers,
Nathan
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 240246f6b913b0c23733cfd2def1d283f8cc9bbe Mon Sep 17 00:00:00 2001
From: Goldwyn Rodrigues <rgoldwyn(a)suse.de>
Date: Fri, 9 Jul 2021 11:29:22 -0500
Subject: [PATCH] btrfs: mark compressed range uptodate only if all bio succeed
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 9a023ae0f98b..30d82cdf128c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -352,7 +352,7 @@ static void end_compressed_bio_write(struct bio *bio)
btrfs_record_physical_zoned(inode, cb->start, bio);
btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
cb->start, cb->start + cb->len - 1,
- bio->bi_status == BLK_STS_OK);
+ !cb->errors);
end_compressed_writeback(inode, cb);
/* note, our inode could be gone now */
The commit <1b6b26ae7053>("pipe: fix and clarify pipe write wakeup
logic") changed pipe write logic to wakeup readers only if the pipe
was empty at the time of write. However, there are libraries that relied
upon the older behavior for notification scheme similar to what's
described in [1]
One such library 'realm-core'[2] is used by numerous Android applications.
The library uses a similar notification mechanism as GNU Make but it
never drains the pipe until it is full. When Android moved to v5.10
kernel, all applications using this library stopped working.
The C program at the end of this email mimics the library code.
The program works with 5.4 kernel. It fails with v5.10 and I am fairly
certain it will fail wiht v5.5 as well. The single patch in this series
restores the old behavior. With the patch, the test and all affected
Android applications start working with v5.10
After reading through epoll(7), I think the pipe should be drained after
each epoll_wait() comes back. Also, that a non-empty pipe is
considered to be "ready" for readers. The problem is that prior
to the commit above, any new data written to non-empty pipes
would wakeup threads waiting in epoll(EPOLLIN|EPILLET) and thats
how this library worked.
I do think the program below is using EPOLLET wrong. However, it
used to work before and now it doesn't. So, I thought it is
worth asking if this counts as userspace break.
There was also a symmetrical change made to pipe_read in commit
<f467a6a66419> ("pipe: fix and clarify pipe read wakeup logic")
that I am not sure needs changing.
The library has since been fixed[3] but it will be a while
before all applications incorporate the updated library.
- ssp
1. https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0…
2. https://github.com/realm/realm-core
3. https://github.com/realm/realm-core/issues/4666
====
#include <stdio.h>
#include <error.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <sys/epoll.h>
#include <sys/stat.h>
#include <sys/types.h>
#define FIFO_NAME "epoll-test-fifo"
pthread_t tid;
int max_delay_ms = 20;
int fifo_fd;
unsigned char written;
unsigned char received;
int epoll_fd;
void *wait_on_fifo(void *unused)
{
while (1) {
struct epoll_event ev;
int ret;
unsigned char c;
ret = epoll_wait(epoll_fd, &ev, 1, 5000);
if (ret == -1) {
/* If interrupted syscall, continue .. */
if (errno == EINTR)
continue;
/* epoll_wait failed, bail.. */
error(99, errno, "epoll_wait failed \n");
}
/* timeout */
if (ret == 0)
break;
if (ev.data.fd == fifo_fd) {
/* Assume this is notification where the thread is catching up.
* pipe is emptied by the writer when it detects it is full.
*/
received = written;
}
}
return NULL;
}
int write_fifo(int fd, unsigned char c)
{
while (1) {
int actual;
char buf[1024];
ssize_t ret = write(fd, &c, 1);
if (ret == 1)
break;
/*
* If the pipe's buffer is full, we need to read some of the old data in
* it to make space. We dont read in the code waiting for
* notifications so that we can notify multiple waiters with a single
* write.
*/
if (ret != 0) {
if (errno != EAGAIN)
return -EIO;
}
actual = read(fd, buf, 1024);
if (actual == 0)
return -errno;
}
return 0;
}
int create_and_setup_fifo()
{
int ret;
char fifo_path[4096];
struct epoll_event ev;
char *tmpdir = getenv("TMPDIR");
if (tmpdir == NULL)
tmpdir = ".";
ret = sprintf(fifo_path, "%s/%s", tmpdir, FIFO_NAME);
if (access(fifo_path, F_OK) == 0)
unlink(fifo_path);
ret = mkfifo(fifo_path, 0600);
if (ret < 0)
error(1, errno, "Failed to create fifo");
fifo_fd = open(fifo_path, O_RDWR | O_NONBLOCK);
if (fifo_fd < 0)
error(2, errno, "Failed to open Fifo");
ev.events = EPOLLIN | EPOLLET;
ev.data.fd = fifo_fd;
ret = epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fifo_fd, &ev);
if (ret < 0)
error(4, errno, "Failed to add fifo to epoll instance");
return 0;
}
int main(int argc, char *argv[])
{
int ret, random;
unsigned char c = 1;
epoll_fd = epoll_create(1);
if (epoll_fd == -1)
error(3, errno, "Failed to create epoll instance");
ret = create_and_setup_fifo();
if (ret != 0)
error(45, EINVAL, "Failed to setup fifo");
ret = pthread_create(&tid, NULL, wait_on_fifo, NULL);
if (ret != 0)
error(2, errno, "Failed to create a thread");
srand(time(NULL));
/* Write 256 bytes to fifo one byte at a time with random delays upto 20ms */
do {
written = c;
ret = write_fifo(fifo_fd, c);
if (ret != 0)
error(55, errno, "Failed to notify fifo, write #%u", (unsigned int)c);
c++;
random = rand();
usleep((random % max_delay_ms) * 1000);
} while (written <= c); /* stop after c = 255 */
pthread_join(tid, NULL);
printf("Test: %s", written == received ? "PASS\n" : "FAIL");
if (written != received)
printf(": Written (%d) Received (%d)\n", written, received);
close(fifo_fd);
close(epoll_fd);
return 0;
}
====
Sandeep Patil (1):
fs: pipe: wakeup readers everytime new data written is to pipe
fs/pipe.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
--
2.32.0.554.ge1b32706d8-goog
The HYP rodata section is currently lumped together with the BSS,
which isn't exactly what is expected (it gets registered with
kmemleak, for example).
Move it away so that it is actually marked RO. As an added
benefit, it isn't registered with kmemleak anymore.
Fixes: 380e18ade4a5 ("KVM: arm64: Introduce a BSS section for use at Hyp")
Suggested-by: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org #5.13
---
arch/arm64/kernel/vmlinux.lds.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 709d2c433c5e..f6b1a88245db 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -181,6 +181,8 @@ SECTIONS
/* everything from this point to __init_begin will be marked RO NX */
RO_DATA(PAGE_SIZE)
+ HYPERVISOR_DATA_SECTIONS
+
idmap_pg_dir = .;
. += IDMAP_DIR_SIZE;
idmap_pg_end = .;
@@ -260,8 +262,6 @@ SECTIONS
_sdata = .;
RW_DATA(L1_CACHE_BYTES, PAGE_SIZE, THREAD_ALIGN)
- HYPERVISOR_DATA_SECTIONS
-
/*
* Data written with the MMU off but read with the MMU on requires
* cache lines to be invalidated, discarding up to a Cache Writeback
--
2.30.2
Booting a KVM host in protected mode with kmemleak quickly results
in a pretty bad crash, as kmemleak doesn't know that the HYP sections
have been taken away. This is specially true for the BSS section,
which is part of the kernel BSS section and registered at boot time
by kmemleak itself.
Unregister the HYP part of the BSS before making that section
HYP-private. The rest of the HYP-specific data is obtained via
the page allocator or lives in other sections, none of which is
subjected to kmemleak.
Fixes: 90134ac9cabb ("KVM: arm64: Protect the .hyp sections from the host")
Reviewed-by: Quentin Perret <qperret(a)google.com>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: stable(a)vger.kernel.org # 5.13
---
arch/arm64/kvm/arm.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..52242f32c4be 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -15,6 +15,7 @@
#include <linux/fs.h>
#include <linux/mman.h>
#include <linux/sched.h>
+#include <linux/kmemleak.h>
#include <linux/kvm.h>
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
@@ -1982,6 +1983,12 @@ static int finalize_hyp_mode(void)
if (ret)
return ret;
+ /*
+ * Exclude HYP BSS from kmemleak so that it doesn't get peeked
+ * at, which would end badly once the section is inaccessible.
+ * None of other sections should ever be introspected.
+ */
+ kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
ret = pkvm_mark_hyp_section(__hyp_bss);
if (ret)
return ret;
--
2.30.2
A recent change in LLVM causes module_{c,d}tor sections to appear when
CONFIG_K{A,C}SAN are enabled, which results in orphan section warnings
because these are not handled anywhere:
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_ctor) is being placed in '.text.asan.module_ctor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.asan.module_dtor) is being placed in '.text.asan.module_dtor'
ld.lld: warning: arch/x86/pci/built-in.a(legacy.o):(.text.tsan.module_ctor) is being placed in '.text.tsan.module_ctor'
Place them in the TEXT_TEXT section so that these technologies continue
to work with the newer compiler versions. All of the KASAN and KCSAN
KUnit tests continue to pass after this change.
Cc: stable(a)vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1432
Link: https://github.com/llvm/llvm-project/commit/7b789562244ee941b7bf2cefeb3fc08…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
include/asm-generic/vmlinux.lds.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 17325416e2de..3b79b1e76556 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -586,6 +586,7 @@
NOINSTR_TEXT \
*(.text..refcount) \
*(.ref.text) \
+ *(.text.asan .text.asan.*) \
TEXT_CFI_JT \
MEM_KEEP(init.text*) \
MEM_KEEP(exit.text*) \
base-commit: 4669e13cd67f8532be12815ed3d37e775a9bdc16
--
2.32.0.264.g75ae10bc75
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6be50f5d83adc9541de3d5be26e968182b5ac150 Mon Sep 17 00:00:00 2001
From: Stylon Wang <stylon.wang(a)amd.com>
Date: Wed, 21 Jul 2021 12:25:24 +0800
Subject: [PATCH] drm/amd/display: Fix ASSR regression on embedded panels
[Why]
Regression found in some embedded panels traces back to the earliest
upstreamed ASSR patch. The changed code flow are causing problems
with some panels.
[How]
- Change ASSR enabling code while preserving original code flow
as much as possible
- Simplify the code on guarding with internal display flag
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=213779
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1620
Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Stylon Wang <stylon.wang(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
index 12066f5a53fc..9fb8c46dc606 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
@@ -1820,8 +1820,7 @@ bool perform_link_training_with_retries(
*/
panel_mode = DP_PANEL_MODE_DEFAULT;
}
- } else
- panel_mode = DP_PANEL_MODE_DEFAULT;
+ }
}
#endif
@@ -4650,7 +4649,10 @@ enum dp_panel_mode dp_get_panel_mode(struct dc_link *link)
}
}
- if (link->dpcd_caps.panel_mode_edp) {
+ if (link->dpcd_caps.panel_mode_edp &&
+ (link->connector_signal == SIGNAL_TYPE_EDP ||
+ (link->connector_signal == SIGNAL_TYPE_DISPLAY_PORT &&
+ link->is_internal_display))) {
return DP_PANEL_MODE_EDP;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 240246f6b913b0c23733cfd2def1d283f8cc9bbe Mon Sep 17 00:00:00 2001
From: Goldwyn Rodrigues <rgoldwyn(a)suse.de>
Date: Fri, 9 Jul 2021 11:29:22 -0500
Subject: [PATCH] btrfs: mark compressed range uptodate only if all bio succeed
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 9a023ae0f98b..30d82cdf128c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -352,7 +352,7 @@ static void end_compressed_bio_write(struct bio *bio)
btrfs_record_physical_zoned(inode, cb->start, bio);
btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
cb->start, cb->start + cb->len - 1,
- bio->bi_status == BLK_STS_OK);
+ !cb->errors);
end_compressed_writeback(inode, cb);
/* note, our inode could be gone now */
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 240246f6b913b0c23733cfd2def1d283f8cc9bbe Mon Sep 17 00:00:00 2001
From: Goldwyn Rodrigues <rgoldwyn(a)suse.de>
Date: Fri, 9 Jul 2021 11:29:22 -0500
Subject: [PATCH] btrfs: mark compressed range uptodate only if all bio succeed
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 9a023ae0f98b..30d82cdf128c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -352,7 +352,7 @@ static void end_compressed_bio_write(struct bio *bio)
btrfs_record_physical_zoned(inode, cb->start, bio);
btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
cb->start, cb->start + cb->len - 1,
- bio->bi_status == BLK_STS_OK);
+ !cb->errors);
end_compressed_writeback(inode, cb);
/* note, our inode could be gone now */
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 240246f6b913b0c23733cfd2def1d283f8cc9bbe Mon Sep 17 00:00:00 2001
From: Goldwyn Rodrigues <rgoldwyn(a)suse.de>
Date: Fri, 9 Jul 2021 11:29:22 -0500
Subject: [PATCH] btrfs: mark compressed range uptodate only if all bio succeed
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Goldwyn Rodrigues <rgoldwyn(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 9a023ae0f98b..30d82cdf128c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -352,7 +352,7 @@ static void end_compressed_bio_write(struct bio *bio)
btrfs_record_physical_zoned(inode, cb->start, bio);
btrfs_writepage_endio_finish_ordered(BTRFS_I(inode), NULL,
cb->start, cb->start + cb->len - 1,
- bio->bi_status == BLK_STS_OK);
+ !cb->errors);
end_compressed_writeback(inode, cb);
/* note, our inode could be gone now */
Booting a KVM host in protected mode with kmemleak quickly results
in a pretty bad crash, as kmemleak doesn't know that the HYP sections
have been taken away.
Make the unregistration from kmemleak part of marking the sections
as HYP-private. The rest of the HYP-specific data is obtained via
the page allocator, which is not subjected to kmemleak.
Fixes: 90134ac9cabb ("KVM: arm64: Protect the .hyp sections from the host")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: Quentin Perret <qperret(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: stable(a)vger.kernel.org # 5.13
---
arch/arm64/kvm/arm.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index e9a2b8f27792..23f12e602878 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -15,6 +15,7 @@
#include <linux/fs.h>
#include <linux/mman.h>
#include <linux/sched.h>
+#include <linux/kmemleak.h>
#include <linux/kvm.h>
#include <linux/kvm_irqfd.h>
#include <linux/irqbypass.h>
@@ -1960,8 +1961,12 @@ static inline int pkvm_mark_hyp(phys_addr_t start, phys_addr_t end)
}
#define pkvm_mark_hyp_section(__section) \
+({ \
+ u64 sz = __section##_end - __section##_start; \
+ kmemleak_free_part(__section##_start, sz); \
pkvm_mark_hyp(__pa_symbol(__section##_start), \
- __pa_symbol(__section##_end))
+ __pa_symbol(__section##_end)); \
+})
static int finalize_hyp_mode(void)
{
--
2.30.2
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 476d98018f32e68e7c5d4e8456940cf2b6d66f10 Mon Sep 17 00:00:00 2001
From: John Fastabend <john.fastabend(a)gmail.com>
Date: Tue, 27 Jul 2021 09:04:59 -0700
Subject: [PATCH] bpf, sockmap: On cleanup we additionally need to remove
cached skb
Its possible if a socket is closed and the receive thread is under memory
pressure it may have cached a skb. We need to ensure these skbs are
free'd along with the normal ingress_skb queue.
Before 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") tear
down and backlog processing both had sock_lock for the common case of
socket close or unhash. So it was not possible to have both running in
parrallel so all we would need is the kfree in those kernels.
But, latest kernels include the commit 799aa7f98d5e and this requires a
bit more work. Without the ingress_lock guarding reading/writing the
state->skb case its possible the tear down could run before the state
update causing it to leak memory or worse when the backlog reads the state
it could potentially run interleaved with the tear down and we might end up
free'ing the state->skb from tear down side but already have the reference
from backlog side. To resolve such races we wrap accesses in ingress_lock
on both sides serializing tear down and backlog case. In both cases this
only happens after an EAGAIN error case so having an extra lock in place
is likely fine. The normal path will skip the locks.
Note, we check state->skb before grabbing lock. This works because
we can only enqueue with the mutex we hold already. Avoiding a race
on adding state->skb after the check. And if tear down path is running
that is also fine if the tear down path then removes state->skb we
will simply set skb=NULL and the subsequent goto is skipped. This
slight complication avoids locking in normal case.
With this fix we no longer see this warning splat from tcp side on
socket close when we hit the above case with redirect to ingress self.
[224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
[224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
[224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G I 5.14.0-rc1alu+ #181
[224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
[224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
[224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
[224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
[224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
[224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
[224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
[224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
[224913.935974] FS: 00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
[224913.935981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
[224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[224913.936007] Call Trace:
[224913.936016] inet_csk_destroy_sock+0xba/0x1f0
[224913.936033] __tcp_close+0x620/0x790
[224913.936047] tcp_close+0x20/0x80
[224913.936056] inet_release+0x8f/0xf0
[224913.936070] __sock_release+0x72/0x120
[224913.936083] sock_close+0x14/0x20
Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down")
Signed-off-by: John Fastabend <john.fastabend(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Acked-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.c…
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 28115ef742e8..036cdb33a94a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -590,23 +590,42 @@ static void sock_drop(struct sock *sk, struct sk_buff *skb)
kfree_skb(skb);
}
+static void sk_psock_skb_state(struct sk_psock *psock,
+ struct sk_psock_work_state *state,
+ struct sk_buff *skb,
+ int len, int off)
+{
+ spin_lock_bh(&psock->ingress_lock);
+ if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
+ state->skb = skb;
+ state->len = len;
+ state->off = off;
+ } else {
+ sock_drop(psock->sk, skb);
+ }
+ spin_unlock_bh(&psock->ingress_lock);
+}
+
static void sk_psock_backlog(struct work_struct *work)
{
struct sk_psock *psock = container_of(work, struct sk_psock, work);
struct sk_psock_work_state *state = &psock->work_state;
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
bool ingress;
u32 len, off;
int ret;
mutex_lock(&psock->work_mutex);
- if (state->skb) {
+ if (unlikely(state->skb)) {
+ spin_lock_bh(&psock->ingress_lock);
skb = state->skb;
len = state->len;
off = state->off;
state->skb = NULL;
- goto start;
+ spin_unlock_bh(&psock->ingress_lock);
}
+ if (skb)
+ goto start;
while ((skb = skb_dequeue(&psock->ingress_skb))) {
len = skb->len;
@@ -621,9 +640,8 @@ static void sk_psock_backlog(struct work_struct *work)
len, ingress);
if (ret <= 0) {
if (ret == -EAGAIN) {
- state->skb = skb;
- state->len = len;
- state->off = off;
+ sk_psock_skb_state(psock, state, skb,
+ len, off);
goto end;
}
/* Hard errors break pipe and stop xmit. */
@@ -722,6 +740,11 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
skb_bpf_redirect_clear(skb);
sock_drop(psock->sk, skb);
}
+ kfree_skb(psock->work_state.skb);
+ /* We null the skb here to ensure that calls to sk_psock_backlog
+ * do not pick up the free'd skb.
+ */
+ psock->work_state.skb = NULL;
__sk_psock_purge_ingress_msg(psock);
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 476d98018f32e68e7c5d4e8456940cf2b6d66f10 Mon Sep 17 00:00:00 2001
From: John Fastabend <john.fastabend(a)gmail.com>
Date: Tue, 27 Jul 2021 09:04:59 -0700
Subject: [PATCH] bpf, sockmap: On cleanup we additionally need to remove
cached skb
Its possible if a socket is closed and the receive thread is under memory
pressure it may have cached a skb. We need to ensure these skbs are
free'd along with the normal ingress_skb queue.
Before 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") tear
down and backlog processing both had sock_lock for the common case of
socket close or unhash. So it was not possible to have both running in
parrallel so all we would need is the kfree in those kernels.
But, latest kernels include the commit 799aa7f98d5e and this requires a
bit more work. Without the ingress_lock guarding reading/writing the
state->skb case its possible the tear down could run before the state
update causing it to leak memory or worse when the backlog reads the state
it could potentially run interleaved with the tear down and we might end up
free'ing the state->skb from tear down side but already have the reference
from backlog side. To resolve such races we wrap accesses in ingress_lock
on both sides serializing tear down and backlog case. In both cases this
only happens after an EAGAIN error case so having an extra lock in place
is likely fine. The normal path will skip the locks.
Note, we check state->skb before grabbing lock. This works because
we can only enqueue with the mutex we hold already. Avoiding a race
on adding state->skb after the check. And if tear down path is running
that is also fine if the tear down path then removes state->skb we
will simply set skb=NULL and the subsequent goto is skipped. This
slight complication avoids locking in normal case.
With this fix we no longer see this warning splat from tcp side on
socket close when we hit the above case with redirect to ingress self.
[224913.935822] WARNING: CPU: 3 PID: 32100 at net/core/stream.c:208 sk_stream_kill_queues+0x212/0x220
[224913.935841] Modules linked in: fuse overlay bpf_preload x86_pkg_temp_thermal intel_uncore wmi_bmof squashfs sch_fq_codel efivarfs ip_tables x_tables uas xhci_pci ixgbe mdio xfrm_algo xhci_hcd wmi
[224913.935897] CPU: 3 PID: 32100 Comm: fgs-bench Tainted: G I 5.14.0-rc1alu+ #181
[224913.935908] Hardware name: Dell Inc. Precision 5820 Tower/002KVM, BIOS 1.9.2 01/24/2019
[224913.935914] RIP: 0010:sk_stream_kill_queues+0x212/0x220
[224913.935923] Code: 8b 83 20 02 00 00 85 c0 75 20 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 df e8 2b 11 fe ff eb c3 0f 0b e9 7c ff ff ff 0f 0b eb ce <0f> 0b 5b 5d 41 5c 41 5d 41 5e 41 5f c3 90 0f 1f 44 00 00 41 57 41
[224913.935932] RSP: 0018:ffff88816271fd38 EFLAGS: 00010206
[224913.935941] RAX: 0000000000000ae8 RBX: ffff88815acd5240 RCX: dffffc0000000000
[224913.935948] RDX: 0000000000000003 RSI: 0000000000000ae8 RDI: ffff88815acd5460
[224913.935954] RBP: ffff88815acd5460 R08: ffffffff955c0ae8 R09: fffffbfff2e6f543
[224913.935961] R10: ffffffff9737aa17 R11: fffffbfff2e6f542 R12: ffff88815acd5390
[224913.935967] R13: ffff88815acd5480 R14: ffffffff98d0c080 R15: ffffffff96267500
[224913.935974] FS: 00007f86e6bd1700(0000) GS:ffff888451cc0000(0000) knlGS:0000000000000000
[224913.935981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[224913.935988] CR2: 000000c0008eb000 CR3: 00000001020e0005 CR4: 00000000003706e0
[224913.935994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[224913.936000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[224913.936007] Call Trace:
[224913.936016] inet_csk_destroy_sock+0xba/0x1f0
[224913.936033] __tcp_close+0x620/0x790
[224913.936047] tcp_close+0x20/0x80
[224913.936056] inet_release+0x8f/0xf0
[224913.936070] __sock_release+0x72/0x120
[224913.936083] sock_close+0x14/0x20
Fixes: a136678c0bdbb ("bpf: sk_msg, zap ingress queue on psock down")
Signed-off-by: John Fastabend <john.fastabend(a)gmail.com>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Acked-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Martin KaFai Lau <kafai(a)fb.com>
Link: https://lore.kernel.org/bpf/20210727160500.1713554-3-john.fastabend@gmail.c…
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 28115ef742e8..036cdb33a94a 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -590,23 +590,42 @@ static void sock_drop(struct sock *sk, struct sk_buff *skb)
kfree_skb(skb);
}
+static void sk_psock_skb_state(struct sk_psock *psock,
+ struct sk_psock_work_state *state,
+ struct sk_buff *skb,
+ int len, int off)
+{
+ spin_lock_bh(&psock->ingress_lock);
+ if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
+ state->skb = skb;
+ state->len = len;
+ state->off = off;
+ } else {
+ sock_drop(psock->sk, skb);
+ }
+ spin_unlock_bh(&psock->ingress_lock);
+}
+
static void sk_psock_backlog(struct work_struct *work)
{
struct sk_psock *psock = container_of(work, struct sk_psock, work);
struct sk_psock_work_state *state = &psock->work_state;
- struct sk_buff *skb;
+ struct sk_buff *skb = NULL;
bool ingress;
u32 len, off;
int ret;
mutex_lock(&psock->work_mutex);
- if (state->skb) {
+ if (unlikely(state->skb)) {
+ spin_lock_bh(&psock->ingress_lock);
skb = state->skb;
len = state->len;
off = state->off;
state->skb = NULL;
- goto start;
+ spin_unlock_bh(&psock->ingress_lock);
}
+ if (skb)
+ goto start;
while ((skb = skb_dequeue(&psock->ingress_skb))) {
len = skb->len;
@@ -621,9 +640,8 @@ static void sk_psock_backlog(struct work_struct *work)
len, ingress);
if (ret <= 0) {
if (ret == -EAGAIN) {
- state->skb = skb;
- state->len = len;
- state->off = off;
+ sk_psock_skb_state(psock, state, skb,
+ len, off);
goto end;
}
/* Hard errors break pipe and stop xmit. */
@@ -722,6 +740,11 @@ static void __sk_psock_zap_ingress(struct sk_psock *psock)
skb_bpf_redirect_clear(skb);
sock_drop(psock->sk, skb);
}
+ kfree_skb(psock->work_state.skb);
+ /* We null the skb here to ensure that calls to sk_psock_backlog
+ * do not pick up the free'd skb.
+ */
+ psock->work_state.skb = NULL;
__sk_psock_purge_ingress_msg(psock);
}
The performance reporting driver added cpu hotplug
feature but it didn't add pmu migration call in cpu
offline function.
This can create an issue incase the current designated
cpu being used to collect fme pmu data got offline,
as based on current code we are not migrating fme pmu to
new target cpu. Because of that perf will still try to
fetch data from that offline cpu and hence we will not
get counter data.
Patch fixed this issue by adding pmu_migrate_context call
in fme_perf_offline_cpu function.
Fixes: 724142f8c42a ("fpga: dfl: fme: add performance reporting support")
Tested-by: Xu Yilun <yilun.xu(a)intel.com>
Acked-by: Wu Hao <hao.wu(a)intel.com>
Signed-off-by: Kajol Jain <kjain(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org
---
drivers/fpga/dfl-fme-perf.c | 2 ++
1 file changed, 2 insertions(+)
---
Changelog:
v2 -> v3:
- Added Acked-by tag
- Removed comment as suggested by Wu Hao
- Link to patch v2: https://lkml.org/lkml/2021/7/9/143
v1 -> v2:
- Add stable(a)vger.kernel.org in cc list
- Link to patch v1: https://lkml.org/lkml/2021/6/28/275
RFC -> PATCH v1
- Remove RFC tag
- Did nits changes on subject and commit message as suggested by Xu Yilun
- Added Tested-by tag
- Link to rfc patch: https://lkml.org/lkml/2021/6/28/112
---
diff --git a/drivers/fpga/dfl-fme-perf.c b/drivers/fpga/dfl-fme-perf.c
index 4299145ef347..587c82be12f7 100644
--- a/drivers/fpga/dfl-fme-perf.c
+++ b/drivers/fpga/dfl-fme-perf.c
@@ -953,6 +953,8 @@ static int fme_perf_offline_cpu(unsigned int cpu, struct hlist_node *node)
return 0;
priv->cpu = target;
+ perf_pmu_migrate_context(&priv->pmu, cpu, target);
+
return 0;
}
--
2.31.1
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b946dbcfa4df80ec81b442964e07ad37000cc059 Mon Sep 17 00:00:00 2001
From: Ronnie Sahlberg <lsahlber(a)redhat.com>
Date: Wed, 28 Jul 2021 16:38:29 +1000
Subject: [PATCH] cifs: add missing parsing of backupuid
We lost parsing of backupuid in the switch to new mount API.
Add it back.
Signed-off-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Reviewed-by: Shyam Prasad N <sprasad(a)microsoft.com>
Cc: <stable(a)vger.kernel.org> # v5.11+
Reported-by: Xiaoli Feng <xifeng(a)redhat.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/cifs/fs_context.c b/fs/cifs/fs_context.c
index 9a59d7ff9a11..eed59bc1d913 100644
--- a/fs/cifs/fs_context.c
+++ b/fs/cifs/fs_context.c
@@ -925,6 +925,13 @@ static int smb3_fs_context_parse_param(struct fs_context *fc,
ctx->cred_uid = uid;
ctx->cruid_specified = true;
break;
+ case Opt_backupuid:
+ uid = make_kuid(current_user_ns(), result.uint_32);
+ if (!uid_valid(uid))
+ goto cifs_parse_mount_err;
+ ctx->backupuid = uid;
+ ctx->backupuid_specified = true;
+ break;
case Opt_backupgid:
gid = make_kgid(current_user_ns(), result.uint_32);
if (!gid_valid(gid))
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 110aa25c3ce417a44e35990cf8ed22383277933a Mon Sep 17 00:00:00 2001
From: Jens Axboe <axboe(a)kernel.dk>
Date: Mon, 26 Jul 2021 10:42:56 -0600
Subject: [PATCH] io_uring: fix race in unified task_work running
We use a bit to manage if we need to add the shared task_work, but
a list + lock for the pending work. Before aborting a current run
of the task_work we check if the list is empty, but we do so without
grabbing the lock that protects it. This can lead to races where
we think we have nothing left to run, where in practice we could be
racing with a task adding new work to the list. If we do hit that
race condition, we could be left with work items that need processing,
but the shared task_work is not active.
Ensure that we grab the lock before checking if the list is empty,
so we know if it's safe to exit the run or not.
Link: https://lore.kernel.org/io-uring/c6bd5987-e9ae-cd02-49d0-1b3ac1ef65b1@tnonl…
Cc: stable(a)vger.kernel.org # 5.11+
Reported-by: Forza <forza(a)tnonline.net>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index c4d2b320cdd4..a4331deb0427 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1959,9 +1959,13 @@ static void tctx_task_work(struct callback_head *cb)
node = next;
}
if (wq_list_empty(&tctx->task_list)) {
+ spin_lock_irq(&tctx->task_lock);
clear_bit(0, &tctx->task_state);
- if (wq_list_empty(&tctx->task_list))
+ if (wq_list_empty(&tctx->task_list)) {
+ spin_unlock_irq(&tctx->task_lock);
break;
+ }
+ spin_unlock_irq(&tctx->task_lock);
/* another tctx_task_work() is enqueued, yield */
if (test_and_set_bit(0, &tctx->task_state))
break;
Hi -stable kernel team,
Commit 216f8e95aacc8e ("PCI: mvebu: Setup BAR0 in order to fix MSI")
fixes ath10k_pci devices on Armada 385 systems, and probably also others
using similar PCI hosts. Can you consider applying it to v5.4.x? The
commit should apply cleanly to current v5.4.
Technically this bug is not a regression. MSI only worked with vendor
provided U-Boot. So I'm not sure it is eligible for -stable. But still,
this bug bites users of latest OpenWrt release that is based on kernel
v5.4. Oli (on Cc) verified that this commit fixes the issue on OpenWrt
with kernel v5.4.
I put PCI/mvebu maintainers to Cc so they can provide their input.
Thanks,
baruch
--
~. .~ Tk Open Systems
=}------------------------------------------------ooO--U--Ooo------------{=
- baruch(a)tkos.co.il - tel: +972.52.368.4656, http://www.tkos.co.il -
From: Linus Torvalds <torvalds(a)linux-foundation.org>
commit 3a34b13a88caeb2800ab44a4918f230041b37dd9 upstream.
Since commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.
In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already. Doing
extraneous wakeups will only cause potential thundering herd problems.
However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" be to "any new write will
trigger it". Even if there was no edge in sight.
Quoting Sandeep Patil:
"The commit 1b6b26ae7053 ('pipe: fix and clarify pipe write wakeup
logic') changed pipe write logic to wakeup readers only if the pipe
was empty at the time of write. However, there are libraries that
relied upon the older behavior for notification scheme similar to
what's described in [1]
One such library 'realm-core'[2] is used by numerous Android
applications. The library uses a similar notification mechanism as GNU
Make but it never drains the pipe until it is full. When Android moved
to v5.10 kernel, all applications using this library stopped working.
The library has since been fixed[3] but it will be a while before all
applications incorporate the updated library"
Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong. Also note the original report [4] by
Michal Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.
So add the extraneous wakeup, to approximate the old behavior.
[ I say "approximate", because the exact old behavior was to do a wakeup
not for each write(), but for each pipe buffer chunk that was filled
in. The behavior introduced by this change is not that - this is just
"every write will cause a wakeup, whether necessary or not", which
seems to be sufficient for the broken library use. ]
It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.
See commit f467a6a66419 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".
Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0… [1]
Link: https://github.com/realm/realm-core [2]
Link: https://github.com/realm/realm-core/issues/4666 [3]
Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwX… [4]
Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/
Reported-by: Sandeep Patil <sspatil(a)android.com>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
fs/pipe.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c
index bfd946a9ad01..9ef4231cce61 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -429,20 +429,20 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from)
#endif
/*
- * Only wake up if the pipe started out empty, since
- * otherwise there should be no readers waiting.
+ * Epoll nonsensically wants a wakeup whether the pipe
+ * was already empty or not.
*
* If it wasn't empty we try to merge new data into
* the last buffer.
*
* That naturally merges small writes, but it also
- * page-aligs the rest of the writes for large writes
+ * page-aligns the rest of the writes for large writes
* spanning multiple pages.
*/
head = pipe->head;
- was_empty = pipe_empty(head, pipe->tail);
+ was_empty = true;
chars = total_len & (PAGE_SIZE-1);
- if (chars && !was_empty) {
+ if (chars && !pipe_empty(head, pipe->tail)) {
unsigned int mask = pipe->ring_size - 1;
struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
int offset = buf->offset + buf->len;
--
2.30.2
The patch titled
Subject: mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
has been removed from the -mm tree. Its filename was
mm-memcg-fix-null-pointer-dereference-in-memcg_slab_free_hook.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Wang Hai <wanghai38(a)huawei.com>
Subject: mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
the following dump occurs.
BUG: kernel NULL pointer dereference, address: 0000000000000020
[...]
Oops: 0000 [#1] SMP
[...]
Workqueue: events kfree_rcu_work
RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
[...]
Call Trace:
kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
kfree_bulk include/linux/slab.h:413 [inline]
kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
process_one_work+0x207/0x530 kernel/workqueue.c:2276
worker_thread+0x320/0x610 kernel/workqueue.c:2422
kthread+0x13d/0x160 kernel/kthread.c:313
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
When kmalloc_node() a large memory, page is allocated, not slab, so when
freeing memory via kfree_rcu(), this large memory should not be used by
memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for
slab.
Using page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
Signed-off-by: Wang Hai <wanghai38(a)huawei.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reviewed-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slab.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/slab.h~mm-memcg-fix-null-pointer-dereference-in-memcg_slab_free_hook
+++ a/mm/slab.h
@@ -346,7 +346,7 @@ static inline void memcg_slab_free_hook(
continue;
page = virt_to_head_page(p[i]);
- objcgs = page_objcgs(page);
+ objcgs = page_objcgs_check(page);
if (!objcgs)
continue;
_
Patches currently in -mm which might be from wanghai38(a)huawei.com are
The patch titled
Subject: mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-blocking-rstat-function-called-from-atomic-cgroup1-thresholding-code.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
Dan Carpenter reports:
The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
29, 2021, leads to the following static checker warning:
kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
warn: sleeping in atomic context
mm/memcontrol.c
3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
3573 {
3574 unsigned long val;
3575
3576 if (mem_cgroup_is_root(memcg)) {
3577 cgroup_rstat_flush(memcg->css.cgroup);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is from static analysis and potentially a false positive. The
problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
can sleep.
3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
3579 memcg_page_state(memcg, NR_ANON_MAPPED);
3580 if (swap)
3581 val += memcg_page_state(memcg, MEMCG_SWAP);
3582 } else {
3583 if (!swap)
3584 val = page_counter_read(&memcg->memory);
3585 else
3586 val = page_counter_read(&memcg->memsw);
3587 }
3588 return val;
3589 }
__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts have
irqs disabled as well. If the lock breaking occurs inside the flush
function, it will result in a sleep from an atomic context.
Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Acked-by: Chris Down <chris(a)chrisdown.name>
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-blocking-rstat-function-called-from-atomic-cgroup1-thresholding-code
+++ a/mm/memcontrol.c
@@ -3574,7 +3574,8 @@ static unsigned long mem_cgroup_usage(st
unsigned long val;
if (mem_cgroup_is_root(memcg)) {
- cgroup_rstat_flush(memcg->css.cgroup);
+ /* mem_cgroup_threshold() calls here from irqsafe context */
+ cgroup_rstat_flush_irqsafe(memcg->css.cgroup);
val = memcg_page_state(memcg, NR_FILE_PAGES) +
memcg_page_state(memcg, NR_ANON_MAPPED);
if (swap)
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
mm-remove-irqsave-restore-locking-from-contexts-with-irqs-enabled.patch
fs-drop_caches-fix-skipping-over-shadow-cache-inodes.patch
fs-inode-count-invalidated-shadow-pages-in-pginodesteal.patch
vfs-keep-inodes-with-page-cache-off-the-inode-shrinker-lru.patch
The patch titled
Subject: ocfs2: issue zeroout to EOF blocks
has been removed from the -mm tree. Its filename was
ocfs2-issue-zeroout-to-eof-blocks.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: issue zeroout to EOF blocks
For punch holes in EOF blocks, fallocate used buffer write to zero the EOF
blocks in last cluster. But since ->writepage will ignore EOF pages,
those zeros will not be flushed.
This "looks" ok as commit 6bba4471f0cc ("ocfs2: fix data corruption by
fallocate") will zero the EOF blocks when extend the file size, but it
isn't. The problem happened on those EOF pages, before writeback, those
pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
set, when writeback run by write_cache_pages(), DIRTY flag on the page was
cleared, but DIRTY flag on the buffer_head not.
When next write happened to those EOF pages, since buffer_head already had
DIRTY flag set, it would not mark page DIRTY again. That made writeback
ignore them forever. That will cause data corruption. Even directio
write can't work because it will fail when trying to drop pages caches
before direct io, as it found the buffer_head for those pages still had
DIRTY flag set, then it will fall back to buffer io mode.
To make a summary of the issue, as writeback ingores EOF pages, once any
EOF page is generated, any write to it will only go to the page cache, it
will never be flushed to disk even file size extends and that page is not
EOF page any more. The fix is to avoid zero EOF blocks with buffer write.
The following code snippet from qemu-img could trigger the corruption.
656 open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
...
660 fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
660 fallocate(11, 0, 2275868672, 327680) = 0
658 pwrite64(11, "
Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 99 +++++++++++++++++++++++++++-------------------
1 file changed, 60 insertions(+), 39 deletions(-)
--- a/fs/ocfs2/file.c~ocfs2-issue-zeroout-to-eof-blocks
+++ a/fs/ocfs2/file.c
@@ -1529,6 +1529,45 @@ static void ocfs2_truncate_cluster_pages
}
}
+/*
+ * zero out partial blocks of one cluster.
+ *
+ * start: file offset where zero starts, will be made upper block aligned.
+ * len: it will be trimmed to the end of current cluster if "start + len"
+ * is bigger than it.
+ */
+static int ocfs2_zeroout_partial_cluster(struct inode *inode,
+ u64 start, u64 len)
+{
+ int ret;
+ u64 start_block, end_block, nr_blocks;
+ u64 p_block, offset;
+ u32 cluster, p_cluster, nr_clusters;
+ struct super_block *sb = inode->i_sb;
+ u64 end = ocfs2_align_bytes_to_clusters(sb, start);
+
+ if (start + len < end)
+ end = start + len;
+
+ start_block = ocfs2_blocks_for_bytes(sb, start);
+ end_block = ocfs2_blocks_for_bytes(sb, end);
+ nr_blocks = end_block - start_block;
+ if (!nr_blocks)
+ return 0;
+
+ cluster = ocfs2_bytes_to_clusters(sb, start);
+ ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
+ &nr_clusters, NULL);
+ if (ret)
+ return ret;
+ if (!p_cluster)
+ return 0;
+
+ offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
+ p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
+ return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
+}
+
static int ocfs2_zero_partial_clusters(struct inode *inode,
u64 start, u64 len)
{
@@ -1538,6 +1577,7 @@ static int ocfs2_zero_partial_clusters(s
struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
unsigned int csize = osb->s_clustersize;
handle_t *handle;
+ loff_t isize = i_size_read(inode);
/*
* The "start" and "end" values are NOT necessarily part of
@@ -1558,6 +1598,26 @@ static int ocfs2_zero_partial_clusters(s
if ((start & (csize - 1)) == 0 && (end & (csize - 1)) == 0)
goto out;
+ /* No page cache for EOF blocks, issue zero out to disk. */
+ if (end > isize) {
+ /*
+ * zeroout eof blocks in last cluster starting from
+ * "isize" even "start" > "isize" because it is
+ * complicated to zeroout just at "start" as "start"
+ * may be not aligned with block size, buffer write
+ * would be required to do that, but out of eof buffer
+ * write is not supported.
+ */
+ ret = ocfs2_zeroout_partial_cluster(inode, isize,
+ end - isize);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+ if (start >= isize)
+ goto out;
+ end = isize;
+ }
handle = ocfs2_start_trans(osb, OCFS2_INODE_UPDATE_CREDITS);
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);
@@ -1856,45 +1916,6 @@ out:
}
/*
- * zero out partial blocks of one cluster.
- *
- * start: file offset where zero starts, will be made upper block aligned.
- * len: it will be trimmed to the end of current cluster if "start + len"
- * is bigger than it.
- */
-static int ocfs2_zeroout_partial_cluster(struct inode *inode,
- u64 start, u64 len)
-{
- int ret;
- u64 start_block, end_block, nr_blocks;
- u64 p_block, offset;
- u32 cluster, p_cluster, nr_clusters;
- struct super_block *sb = inode->i_sb;
- u64 end = ocfs2_align_bytes_to_clusters(sb, start);
-
- if (start + len < end)
- end = start + len;
-
- start_block = ocfs2_blocks_for_bytes(sb, start);
- end_block = ocfs2_blocks_for_bytes(sb, end);
- nr_blocks = end_block - start_block;
- if (!nr_blocks)
- return 0;
-
- cluster = ocfs2_bytes_to_clusters(sb, start);
- ret = ocfs2_get_clusters(inode, cluster, &p_cluster,
- &nr_clusters, NULL);
- if (ret)
- return ret;
- if (!p_cluster)
- return 0;
-
- offset = start_block - ocfs2_clusters_to_blocks(sb, cluster);
- p_block = ocfs2_clusters_to_blocks(sb, p_cluster) + offset;
- return sb_issue_zeroout(sb, p_block, nr_blocks, GFP_NOFS);
-}
-
-/*
* Parts of this function taken from xfs_change_file_space()
*/
static int __ocfs2_change_file_space(struct file *file, struct inode *inode,
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are
The patch titled
Subject: ocfs2: fix zero out valid data
has been removed from the -mm tree. Its filename was
ocfs2-fix-zero-out-valid-data.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Junxiao Bi <junxiao.bi(a)oracle.com>
Subject: ocfs2: fix zero out valid data
If append-dio feature is enabled, direct-io write and fallocate could run
in parallel to extend file size, fallocate used "orig_isize" to record
i_size before taking "ip_alloc_sem", when ocfs2_zeroout_partial_cluster()
zeroout EOF blocks, i_size maybe already extended by
ocfs2_dio_end_io_write(), that will cause valid data zeroed out.
Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com
Fixes: 6bba4471f0cc ("ocfs2: fix data corruption by fallocate")
Signed-off-by: Junxiao Bi <junxiao.bi(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ocfs2/file.c~ocfs2-fix-zero-out-valid-data
+++ a/fs/ocfs2/file.c
@@ -1935,7 +1935,6 @@ static int __ocfs2_change_file_space(str
goto out_inode_unlock;
}
- orig_isize = i_size_read(inode);
switch (sr->l_whence) {
case 0: /*SEEK_SET*/
break;
@@ -1943,7 +1942,7 @@ static int __ocfs2_change_file_space(str
sr->l_start += f_pos;
break;
case 2: /*SEEK_END*/
- sr->l_start += orig_isize;
+ sr->l_start += i_size_read(inode);
break;
default:
ret = -EINVAL;
@@ -1998,6 +1997,7 @@ static int __ocfs2_change_file_space(str
ret = -EINVAL;
}
+ orig_isize = i_size_read(inode);
/* zeroout eof blocks in the cluster. */
if (!ret && change_size && orig_isize < size) {
ret = ocfs2_zeroout_partial_cluster(inode, orig_isize,
_
Patches currently in -mm which might be from junxiao.bi(a)oracle.com are