Hi,
in 5.10.94 these two xfrm changes cause userspace programs like Cilium to
suddenly fail (https://github.com/cilium/cilium/pull/18789):
- xfrm: interface with if_id 0 should return error
8dce43919566f06e865f7e8949f5c10d8c2493f5
- xfrm: state and policy should fail if XFRMA_IF_ID 0
68ac0f3810e76a853b5f7b90601a05c3048b8b54
I see that these changes are a reaction to
- xfrm: fix disable_xfrm sysctl when used on xfrm interfaces
9f8550e4bd9d
but even if the "wrong" usage caused weird behavior I still wonder if it
was the right decision to do the changes as part of a bugfix update for an
LTS kernel.
What do you think about reverting the changes at least for 5.10?
Regards,
Kai
This is the start of the stable review cycle for the 4.14.269 release.
There are 31 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 02 Mar 2022 17:20:16 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.269-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.269-rc1
Linus Torvalds <torvalds(a)linux-foundation.org>
fget: clarify and improve __fget_files() implementation
Miaohe Lin <linmiaohe(a)huawei.com>
memblock: use kfree() to release kmalloced memblock regions
Karol Herbst <kherbst(a)redhat.com>
Revert "drm/nouveau/pmu/gm200-: avoid touching PMU outside of DEVINIT/PREOS/ACR"
daniel.starke(a)siemens.com <daniel.starke(a)siemens.com>
tty: n_gsm: fix proper link termination after failed open
daniel.starke(a)siemens.com <daniel.starke(a)siemens.com>
tty: n_gsm: fix encoding of control signal octet bit DV
Hongyu Xie <xiehongyu1(a)kylinos.cn>
xhci: Prevent futile URB re-submissions due to incorrect return value.
Puma Hsu <pumahsu(a)google.com>
xhci: re-initialize the HC during resume if HCE was set
Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
usb: dwc3: gadget: Let the interrupt handler disable bottom halves.
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit LE910R1 compositions
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add support for DW5829e
Steven Rostedt (Google) <rostedt(a)goodmis.org>
tracefs: Set the group ownership in apply_options() not parse_options()
Szymon Heidrich <szymon.heidrich(a)gmail.com>
USB: gadget: validate endpoint index for xilinx udc
Daehwan Jung <dh10.jung(a)samsung.com>
usb: gadget: rndis: add spinlock for rndis response list
Dmytro Bagrii <dimich.dmb(a)gmail.com>
Revert "USB: serial: ch341: add new Product ID for CH341A"
Sergey Shtylyov <s.shtylyov(a)omp.ru>
ata: pata_hpt37x: disable primary channel on HPT371
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
iio: adc: men_z188_adc: Fix a resource leak in an error handling path
Bart Van Assche <bvanassche(a)acm.org>
RDMA/ib_srp: Fix a deadlock
ChenXiaoSong <chenxiaosong2(a)huawei.com>
configfs: fix a race in configfs_{,un}register_subsystem()
Gal Pressman <gal(a)nvidia.com>
net/mlx5e: Fix wrong return value on ioctl EEPROM query failure
Maxime Ripard <maxime(a)cerno.tech>
drm/edid: Always set RGB444
Paul Blakey <paulb(a)nvidia.com>
openvswitch: Fix setting ipv6 fields causing hw csum failure
Tao Liu <thomas.liu(a)ucloud.cn>
gso: do not skip outer ip header in case of ipip and net_failover
Eric Dumazet <edumazet(a)google.com>
net: __pskb_pull_tail() & pskb_carve_frag_list() drop_monitor friends
Xin Long <lucien.xin(a)gmail.com>
ping: remove pr_err from ping_lookup
Robert Hancock <robert.hancock(a)calian.com>
serial: 8250: of: Fix mapped region size when using reg-offset property
Oliver Neukum <oneukum(a)suse.com>
USB: zaurus: support another broken Zaurus
Oliver Neukum <oneukum(a)suse.com>
sr9700: sanity check for packet length
Helge Deller <deller(a)gmx.de>
parisc/unaligned: Fix ldw() and stw() unalignment handlers
Helge Deller <deller(a)gmx.de>
parisc/unaligned: Fix fldd and fstd unaligned handlers on 32-bit kernel
Stefano Garzarella <sgarzare(a)redhat.com>
vhost/vsock: don't check owner in vhost_vsock_stop() while releasing
Zhang Qiao <zhangqiao22(a)huawei.com>
cgroup/cpuset: Fix a race between cpuset_attach() and cpu hotplug
-------------
Diffstat:
Makefile | 4 +-
arch/parisc/kernel/unaligned.c | 14 ++---
drivers/ata/pata_hpt37x.c | 14 +++++
drivers/gpu/drm/drm_edid.c | 2 +-
drivers/gpu/drm/nouveau/nvkm/subdev/pmu/base.c | 37 +++++------
drivers/iio/adc/men_z188_adc.c | 9 ++-
drivers/infiniband/ulp/srp/ib_srp.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/en_ethtool.c | 2 +-
drivers/net/usb/cdc_ether.c | 12 ++++
drivers/net/usb/sr9700.c | 2 +-
drivers/net/usb/zaurus.c | 12 ++++
drivers/tty/n_gsm.c | 4 +-
drivers/tty/serial/8250/8250_of.c | 11 +++-
drivers/usb/dwc3/gadget.c | 2 +
drivers/usb/gadget/function/rndis.c | 8 +++
drivers/usb/gadget/function/rndis.h | 1 +
drivers/usb/gadget/udc/udc-xilinx.c | 6 ++
drivers/usb/host/xhci.c | 28 ++++++---
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 12 ++++
drivers/vhost/vsock.c | 21 ++++---
fs/configfs/dir.c | 14 +++++
fs/file.c | 73 +++++++++++++++++-----
fs/tracefs/inode.c | 5 +-
include/net/checksum.h | 5 ++
kernel/cgroup/cpuset.c | 2 +
mm/memblock.c | 10 ++-
net/core/skbuff.c | 4 +-
net/ipv4/af_inet.c | 5 +-
net/ipv4/ping.c | 1 -
net/ipv6/ip6_offload.c | 2 +
net/openvswitch/actions.c | 46 +++++++++++---
32 files changed, 287 insertions(+), 88 deletions(-)
Daniel Dao has reported [1] a regression on workloads that may trigger
a lot of refaults (anon and file). The underlying issue is that flushing
rstat is expensive. Although rstat flushes are batched with (nr_cpus *
MEMCG_BATCH) stat updates, it seems there are workloads which
genuinely do more stat updates than the batch value within a short amount
of time. Since the rstat flush can happen in performance-critical
codepaths like page faults, such workloads can suffer greatly.

The easiest fix for now is for performance-critical codepaths to trigger
the rstat flush asynchronously. This patch converts the refault codepath
to use async rstat flush. In addition, this patch preemptively
converts mem_cgroup_wb_stats and shrink_node to also use the async
rstat flush, as they may also suffer similar performance regressions.
Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndg… [1]
Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault")
Reported-by: Daniel Dao <dqminh(a)cloudflare.com>
Signed-off-by: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
include/linux/memcontrol.h | 1 +
mm/memcontrol.c | 10 +++++++++-
mm/vmscan.c | 2 +-
mm/workingset.c | 2 +-
4 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ef4b445392a9..bfdd48be60ff 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -998,6 +998,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec,
}
void mem_cgroup_flush_stats(void);
+void mem_cgroup_flush_stats_async(void);
void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c695608c521c..4338e8d779b2 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -690,6 +690,14 @@ void mem_cgroup_flush_stats(void)
__mem_cgroup_flush_stats();
}
+void mem_cgroup_flush_stats_async(void)
+{
+ if (atomic_read(&stats_flush_threshold) > num_online_cpus()) {
+ atomic_set(&stats_flush_threshold, 0);
+ mod_delayed_work(system_unbound_wq, &stats_flush_dwork, 0);
+ }
+}
+
static void flush_memcg_stats_dwork(struct work_struct *w)
{
__mem_cgroup_flush_stats();
@@ -4522,7 +4530,7 @@ void mem_cgroup_wb_stats(struct bdi_writeback *wb, unsigned long *pfilepages,
struct mem_cgroup *memcg = mem_cgroup_from_css(wb->memcg_css);
struct mem_cgroup *parent;
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_async();
*pdirty = memcg_page_state(memcg, NR_FILE_DIRTY);
*pwriteback = memcg_page_state(memcg, NR_WRITEBACK);
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c6f77e3e6d59..b6c6b165c1ef 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3188,7 +3188,7 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
* Flush the memory cgroup stats, so that we read accurate per-memcg
* lruvec stats for heuristics.
*/
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_async();
memset(&sc->nr, 0, sizeof(sc->nr));
diff --git a/mm/workingset.c b/mm/workingset.c
index b717eae4e0dd..a4f2b1aa5bcc 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -355,7 +355,7 @@ void workingset_refault(struct folio *folio, void *shadow)
mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
- mem_cgroup_flush_stats();
+ mem_cgroup_flush_stats_async();
/*
* Compare the distance to the existing workingset size. We
* don't activate pages that couldn't stay resident even if
--
2.35.1.574.g5d30c73bfb-goog
From: Maxim Levitsky <mlevitsk(a)redhat.com>
[ Upstream commit 755c2bf878607dbddb1423df9abf16b82205896f ]
kvm_apic_update_apicv is called while AVIC is still active, so the CPU
can set IRR bits after the call without causing irr_pending to be set
to true.
Also, the logic in avic_kick_target_vcpu doesn't expect a race with this
function, so to keep it simple, just leave irr_pending set to true and
let the next interrupt injection to the guest clear it.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20220207155447.840194-9-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kvm/lapic.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 91c2dc9f198df..5f935e7a09566 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2306,7 +2306,12 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
apic->irr_pending = true;
apic->isr_count = 1;
} else {
- apic->irr_pending = (apic_search_irr(apic) != -1);
+ /*
+ * Don't clear irr_pending, searching the IRR can race with
+ * updates from the CPU as APICv is still active from hardware's
+ * perspective. The flag will be cleared as appropriate when
+ * KVM injects the interrupt.
+ */
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
}
--
2.34.1
From: Maxim Levitsky <mlevitsk(a)redhat.com>
[ Upstream commit 755c2bf878607dbddb1423df9abf16b82205896f ]
kvm_apic_update_apicv is called while AVIC is still active, so the CPU
can set IRR bits after the call without causing irr_pending to be set
to true.
Also, the logic in avic_kick_target_vcpu doesn't expect a race with this
function, so to keep it simple, just leave irr_pending set to true and
let the next interrupt injection to the guest clear it.
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Message-Id: <20220207155447.840194-9-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/x86/kvm/lapic.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e8e383fbe8868..bfac6d0933c39 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2306,7 +2306,12 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
apic->irr_pending = true;
apic->isr_count = 1;
} else {
- apic->irr_pending = (apic_search_irr(apic) != -1);
+ /*
+ * Don't clear irr_pending, searching the IRR can race with
+ * updates from the CPU as APICv is still active from hardware's
+ * perspective. The flag will be cleared as appropriate when
+ * KVM injects the interrupt.
+ */
apic->isr_count = count_vectors(apic->regs + APIC_ISR);
}
}
--
2.34.1
In https://lore.kernel.org/lkml/20211209150910.GA23668@axis.com/
Vincent's patch commented on, and worked around, a bug in toggling
static branches when a 2nd PRINTK-ish flag was added. The bug
results in a premature static_branch_disable when the 1st of 2 flags
is disabled.
The cited commit computed newflags, but then, in the JUMP_LABEL block,
did not use that result and tested just one of its terms. Using
newflags instead makes the code work properly.
This is Vincent's test-case, reduced. It needs the 2nd flag to work
properly, but it's explanatory here.
pt_test() {
echo 5 > /sys/module/dynamic_debug/verbose
site="module tcp" # just one callsite
echo " $site =_ " > /proc/dynamic_debug/control # clear it
# A B ~A ~B
for flg in +T +p "-T #broke here" -p; do
echo " $site $flg " > /proc/dynamic_debug/control
done;
# A B ~B ~A
for flg in +T +p "-p #broke here" -T; do
echo " $site $flg " > /proc/dynamic_debug/control
done
}
pt_test
Fixes: 84da83a6ffc0 ("dyndbg: combine flags & mask into a struct, simplify with it")
CC: vincent.whitchurch(a)axis.com
CC: stable(a)vger.kernel.org
Signed-off-by: Jim Cromie <jim.cromie(a)gmail.com>
---
lib/dynamic_debug.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index dd7f56af9aed..a56c1286ffa4 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -211,10 +211,11 @@ static int ddebug_change(const struct ddebug_query *query,
continue;
#ifdef CONFIG_JUMP_LABEL
if (dp->flags & _DPRINTK_FLAGS_PRINT) {
- if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
+ if (!(newflags & _DPRINTK_FLAGS_PRINT))
static_branch_disable(&dp->key.dd_key_true);
- } else if (modifiers->flags & _DPRINTK_FLAGS_PRINT)
+ } else if (newflags & _DPRINTK_FLAGS_PRINT) {
static_branch_enable(&dp->key.dd_key_true);
+ }
#endif
dp->flags = newflags;
v4pr_info("changed %s:%d [%s]%s =%s\n",
--
2.35.1
From: Paul Davey <paul.davey(a)alliedtelesis.co.nz>
On big endian architectures the mhi debugfs files which report pm state
give "Invalid State" for all states. This is caused by using
find_last_bit which takes an unsigned long* while the state is passed in
as an enum mhi_pm_state which will be of int size.
Fix by using __fls to pass the value of state instead of find_last_bit.
Also the current API expects "mhi_pm_state" enumerator as the function
argument but the function only works with bitmasks. So as Alex suggested,
let's change the argument to u32 to avoid confusion.
Fixes: a6e2e3522f29 ("bus: mhi: core: Add support for PM state transitions")
Signed-off-by: Paul Davey <paul.davey(a)alliedtelesis.co.nz>
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
Reviewed-by: Hemant Kumar <hemantk(a)codeaurora.org>
Reviewed-by: Alex Elder <elder(a)linaro.org>
Cc: stable(a)vger.kernel.org
[mani: changed the function argument to u32]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/bus/mhi/core/init.c | 10 ++++++----
drivers/bus/mhi/core/internal.h | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/bus/mhi/core/init.c b/drivers/bus/mhi/core/init.c
index 046f407dc5d6..09394a1c29ec 100644
--- a/drivers/bus/mhi/core/init.c
+++ b/drivers/bus/mhi/core/init.c
@@ -77,12 +77,14 @@ static const char * const mhi_pm_state_str[] = {
[MHI_PM_STATE_LD_ERR_FATAL_DETECT] = "Linkdown or Error Fatal Detect",
};
-const char *to_mhi_pm_state_str(enum mhi_pm_state state)
+const char *to_mhi_pm_state_str(u32 state)
{
- unsigned long pm_state = state;
- int index = find_last_bit(&pm_state, 32);
+ int index;
- if (index >= ARRAY_SIZE(mhi_pm_state_str))
+ if (state)
+ index = __fls(state);
+
+ if (!state || index >= ARRAY_SIZE(mhi_pm_state_str))
return "Invalid State";
return mhi_pm_state_str[index];
diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
index e2e10474a9d9..3508cbbf555d 100644
--- a/drivers/bus/mhi/core/internal.h
+++ b/drivers/bus/mhi/core/internal.h
@@ -622,7 +622,7 @@ void mhi_free_bhie_table(struct mhi_controller *mhi_cntrl,
enum mhi_pm_state __must_check mhi_tryset_pm_state(
struct mhi_controller *mhi_cntrl,
enum mhi_pm_state state);
-const char *to_mhi_pm_state_str(enum mhi_pm_state state);
+const char *to_mhi_pm_state_str(u32 state);
int mhi_queue_state_transition(struct mhi_controller *mhi_cntrl,
enum dev_st_transition state);
void mhi_pm_st_worker(struct work_struct *work);
--
2.25.1