From: Ard Biesheuvel <ardb(a)kernel.org>
When the host stage1 is configured for LPA2, the value currently being
programmed into TCR_EL2.T0SZ may be invalid unless LPA2 is configured
at HYP as well. This means kvm_lpa2_is_enabled() is not the right
condition to test when setting TCR_EL2.DS, as it will return false if
LPA2 is only available for stage 1 but not for stage 2.
Similary, programming TCR_EL2.PS based on a limited IPA range due to
lack of stage2 LPA2 support could potentially result in problems.
So use lpa2_is_enabled() instead, and set the PS field according to the
host's IPS, which is capped at 48 bits if LPA2 support is absent or
disabled. Whether or not we can make meaningful use of such a
configuration is a different question.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
arch/arm64/kvm/arm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a0d01c46e408..1d20d86bb9f5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -2005,8 +2005,7 @@ static int kvm_init_vector_slots(void)
static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
{
struct kvm_nvhe_init_params *params = per_cpu_ptr_nvhe_sym(kvm_init_params, cpu);
- u64 mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
- unsigned long tcr;
+ unsigned long tcr, ips;
/*
* Calculate the raw per-cpu offset without a translation from the
@@ -2020,6 +2019,7 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
params->mair_el2 = read_sysreg(mair_el1);
tcr = read_sysreg(tcr_el1);
+ ips = FIELD_GET(TCR_IPS_MASK, tcr);
if (cpus_have_final_cap(ARM64_KVM_HVHE)) {
tcr |= TCR_EPD1_MASK;
} else {
@@ -2029,8 +2029,8 @@ static void __init cpu_prepare_hyp_mode(int cpu, u32 hyp_va_bits)
tcr &= ~TCR_T0SZ_MASK;
tcr |= TCR_T0SZ(hyp_va_bits);
tcr &= ~TCR_EL2_PS_MASK;
- tcr |= FIELD_PREP(TCR_EL2_PS_MASK, kvm_get_parange(mmfr0));
- if (kvm_lpa2_is_enabled())
+ tcr |= FIELD_PREP(TCR_EL2_PS_MASK, ips);
+ if (lpa2_is_enabled())
tcr |= TCR_EL2_DS;
params->tcr_el2 = tcr;
--
2.47.0.277.g8800431eea-goog
The quilt patch titled
Subject: maple_tree: refine mas_store_root() on storing NULL
has been removed from the -mm tree. Its filename was
maple_tree-refine-mas_store_root-on-storing-null.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Wei Yang <richard.weiyang(a)gmail.com>
Subject: maple_tree: refine mas_store_root() on storing NULL
Date: Thu, 31 Oct 2024 23:16:26 +0000
Currently, when storing NULL on mas_store_root(), the behavior could be
improved.
Storing NULLs over the entire tree may result in a node being used to
store a single range. Further stores of NULL may cause the node and
tree to be corrupt and cause incorrect behaviour. Fixing the store to
the root null fixes the issue by ensuring that a range of 0 - ULONG_MAX
results in an empty tree.
Users of the tree may experience incorrect values returned if the tree
was expanded to store values, then overwritten by all NULLS, then
continued to store NULLs over the empty area.
For example possible cases are:
* store NULL at any range result a new node
* store NULL at range [m, n] where m > 0 to a single entry tree result
a new node with range [m, n] set to NULL
* store NULL at range [m, n] where m > 0 to an empty tree result
consecutive NULL slot
* it allows for multiple NULL entries by expanding root
to store NULLs to an empty tree
This patch tries to improve in:
* memory efficient by setting to empty tree instead of using a node
* remove the possibility of consecutive NULL slot which will prohibit
extended null in later operation
Link: https://lkml.kernel.org/r/20241031231627.14316-5-richard.weiyang@gmail.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Wei Yang <richard.weiyang(a)gmail.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
--- a/lib/maple_tree.c~maple_tree-refine-mas_store_root-on-storing-null
+++ a/lib/maple_tree.c
@@ -3447,9 +3447,20 @@ static inline void mas_root_expand(struc
return;
}
+/*
+ * mas_store_root() - Storing value into root.
+ * @mas: The maple state
+ * @entry: The entry to store.
+ *
+ * There is no root node now and we are storing a value into the root - this
+ * function either assigns the pointer or expands into a node.
+ */
static inline void mas_store_root(struct ma_state *mas, void *entry)
{
- if (likely((mas->last != 0) || (mas->index != 0)))
+ if (!entry) {
+ if (!mas->index)
+ rcu_assign_pointer(mas->tree->ma_root, NULL);
+ } else if (likely((mas->last != 0) || (mas->index != 0)))
mas_root_expand(mas, entry);
else if (((unsigned long) (entry) & 3) == 2)
mas_root_expand(mas, entry);
_
Patches currently in -mm which might be from richard.weiyang(a)gmail.com are
The patch titled
Subject: selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-hugetlb_dio-fixup-check-for-initial-conditions-to-skip-in-the-start.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Donet Tom <donettom(a)linux.ibm.com>
Subject: selftests: hugetlb_dio: fixup check for initial conditions to skip in the start
Date: Sun, 10 Nov 2024 00:49:03 -0600
This test verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping. To test this, we read the
count of free hugepages before and after the mmap, DIO, and munmap
operations, then check if the free hugepage count is the same.
Reading free hugepages before the test was removed by commit 0268d4579901
('selftests: hugetlb_dio: check for initial conditions to skip at the
start'), causing the test to always fail.
This patch adds back reading the free hugepages before starting the test.
With this patch, the tests are now passing.
Test results without this patch:
./tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 1 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 2 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 3 : Huge pages not freed!
# No. Free pages before allocation : 0
# No. Free pages after munmap : 100
not ok 4 : Huge pages not freed!
# Totals: pass:0 fail:4 xfail:0 xpass:0 skip:0 error:0
Test results with this patch:
/tools/testing/selftests/mm/hugetlb_dio
TAP version 13
1..4
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 1 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 2 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 3 : Huge pages freed successfully !
# No. Free pages before allocation : 100
# No. Free pages after munmap : 100
ok 4 : Huge pages freed successfully !
# Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Link: https://lkml.kernel.org/r/20241110064903.23626-1-donettom@linux.ibm.com
Fixes: 0268d4579901 ("selftests: hugetlb_dio: check for initial conditions to skip in the start")
Signed-off-by: Donet Tom <donettom(a)linux.ibm.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugetlb_dio.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/tools/testing/selftests/mm/hugetlb_dio.c~selftests-hugetlb_dio-fixup-check-for-initial-conditions-to-skip-in-the-start
+++ a/tools/testing/selftests/mm/hugetlb_dio.c
@@ -44,6 +44,13 @@ void run_dio_using_hugetlb(unsigned int
if (fd < 0)
ksft_exit_fail_perror("Error opening file\n");
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
/* Allocate a hugetlb page */
orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
if (orig_buffer == MAP_FAILED) {
_
Patches currently in -mm which might be from donettom(a)linux.ibm.com are
selftests-hugetlb_dio-fixup-check-for-initial-conditions-to-skip-in-the-start.patch
From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
Currently "timeout-sec" Device Tree property is being silently ignored:
even though watchdog_init_timeout() is being used, the driver always passes
"heartbeat" == DEFAULT_HEARTBEAT == 60 as argument.
Fix this by setting struct watchdog_device::timeout to DEFAULT_HEARTBEAT
and passing real module parameter value to watchdog_init_timeout() (which
may now be 0 if not specified).
Cc: stable(a)vger.kernel.org
Fixes: 2d63908bdbfb ("watchdog: Add K3 RTI watchdog support")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
drivers/watchdog/rti_wdt.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/watchdog/rti_wdt.c b/drivers/watchdog/rti_wdt.c
index f410b6e39fb6f..58c9445c0f885 100644
--- a/drivers/watchdog/rti_wdt.c
+++ b/drivers/watchdog/rti_wdt.c
@@ -61,7 +61,7 @@
#define MAX_HW_ERROR 250
-static int heartbeat = DEFAULT_HEARTBEAT;
+static int heartbeat;
/*
* struct to hold data for each WDT device
@@ -252,6 +252,7 @@ static int rti_wdt_probe(struct platform_device *pdev)
wdd->min_timeout = 1;
wdd->max_hw_heartbeat_ms = (WDT_PRELOAD_MAX << WDT_PRELOAD_SHIFT) /
wdt->freq * 1000;
+ wdd->timeout = DEFAULT_HEARTBEAT;
wdd->parent = dev;
watchdog_set_drvdata(wdd, wdt);
--
2.47.0
From: Yu Kuai <yukuai3(a)huawei.com>
Fix patch is patch 27, relied patches are from:
- patches from set [1] to add helpers to maple_tree, the last patch to
improve fork() performance is not backported;
- patches from set [2] to change maple_tree, and follow up fixes;
- patches from set [3] to convert offset_ctx from xarray to maple_tree;
Please notice that I'm not an expert in this area, and I'm afraid to
make manual changes. That's why patch 16 revert the commit that is
different from mainline and will cause conflict backporting new patches.
patch 28 pick the original mainline patch again.
(And this is what we did to fix the CVE in downstream kernels).
[1] https://lore.kernel.org/all/20231027033845.90608-1-zhangpeng.00@bytedance.c…
[2] https://lore.kernel.org/all/20231101171629.3612299-2-Liam.Howlett@oracle.co…
[3] https://lore.kernel.org/all/170820083431.6328.16233178852085891453.stgit@91…
Andrew Morton (1):
lib/maple_tree.c: fix build error due to hotfix alteration
Chuck Lever (5):
libfs: Re-arrange locking in offset_iterate_dir()
libfs: Define a minimum directory offset
libfs: Add simple_offset_empty()
maple_tree: Add mtree_alloc_cyclic()
libfs: Convert simple directory offsets to use a Maple Tree
Liam R. Howlett (12):
maple_tree: remove unnecessary default labels from switch statements
maple_tree: make mas_erase() more robust
maple_tree: move debug check to __mas_set_range()
maple_tree: add end of node tracking to the maple state
maple_tree: use cached node end in mas_next()
maple_tree: use cached node end in mas_destroy()
maple_tree: clean up inlines for some functions
maple_tree: separate ma_state node from status
maple_tree: remove mas_searchable()
maple_tree: use maple state end for write operations
maple_tree: don't find node end in mtree_lookup_walk()
maple_tree: mtree_range_walk() clean up
Lorenzo Stoakes (1):
maple_tree: correct tree corruption on spanning store
Peng Zhang (7):
maple_tree: add mt_free_one() and mt_attr() helpers
maple_tree: introduce {mtree,mas}_lock_nested()
maple_tree: introduce interfaces __mt_dup() and mtree_dup()
maple_tree: skip other tests when BENCH is enabled
maple_tree: preserve the tree attributes when destroying maple tree
maple_tree: add test for mtree_dup()
maple_tree: avoid checking other gaps after getting the largest gap
Yu Kuai (1):
Revert "maple_tree: correct tree corruption on spanning store"
yangerkun (1):
libfs: fix infinite directory reads for offset dir
fs/libfs.c | 129 ++-
include/linux/fs.h | 6 +-
include/linux/maple_tree.h | 356 +++---
include/linux/mm_types.h | 3 +-
lib/maple_tree.c | 1096 +++++++++++++------
lib/test_maple_tree.c | 218 ++--
mm/internal.h | 10 +-
mm/shmem.c | 4 +-
tools/include/linux/spinlock.h | 1 +
tools/testing/radix-tree/linux/maple_tree.h | 2 +-
tools/testing/radix-tree/maple.c | 390 ++++++-
11 files changed, 1564 insertions(+), 651 deletions(-)
--
2.39.2
Commit 60e3318e3e900 ("cifs: use fs_context for automounts") was
released in v6.1.54 and broke the failover when one of the servers
inside DFS becomes unavailable. We reproduced the problem on the EC2
instances of different types. Reverting aforementioned commint on top of
the latest stable verison v6.1.94 helps to resolve the problem.
Earliest working version is v6.2-rc1. There were two big merges of CIFS fixes:
[1] and [2]. We would like to ask for the help to investigate this problem and
if some of those patches need to be backported. Also, is it safe to just revert
problematic commit until proper fixes/backports will be available?
We will help to do testing and confirm if fix works, but let me also list the
steps we used to reproduce the problem if it will help to identify the problem:
1. Create Active Directory domain eg. 'corp.fsxtest.local' in AWS Directory
Service with:
- three AWS FSX file systems filesystem1..filesystem3
- three Windows servers; They have DFS installed as per
https://learn.microsoft.com/en-us/windows-server/storage/dfs-namespaces/dfs…:
- dfs-srv1: EC2AMAZ-2EGTM59
- dfs-srv2: EC2AMAZ-1N36PRD
- dfs-srv3: EC2AMAZ-0PAUH2U
2. Create DFS namespace eg. 'dfs-namespace' in Windows server 2008 mode
and three folders targets in it:
- referral-a mapped to filesystem1.corp.local
- referral-b mapped to filesystem2.corp.local
- referral-c mapped to filesystem3.corp.local
- local folders dfs-srv1..dfs-srv3 in C:\DFSRoots\dfs-namespace of every
Windows server. This helps to quickly define underlying server when
DFS is mounted.
3. Enabled cifs debug logs:
```
echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 7 > /proc/fs/cifs/cifsFYI
```
4. Mount DFS namespace on Amazon Linux 2023 instance running any vanilla
kernel v6.1.54+:
```
dmesg -c &>/dev/null
cd /mnt
mount -t cifs -o cred=/mnt/creds,echo_interval=5 \
//corp.fsxtest.local/dfs-namespace \
./dfs-namespace
```
5. List DFS root, it's also required to avoid recursive mounts that happen
during regular 'ls' run:
```
sh -c 'ls dfs-namespace'
dfs-srv2 referral-a referral-b
```
The DFS server is EC2AMAZ-1N36PRD, it's also listed in mount:
```
[root@ip-172-31-2-82 mnt]# mount | grep dfs
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
List files in first folder:
```
sh -c 'ls dfs-namespace/referral-a'
filea.txt.txt
```
6. Shutdown DFS server-2.
List DFS root again, server changed from dfs-srv2 to dfs-srv1 EC2AMAZ-2EGTM59:
```
sh -c 'ls dfs-namespace'
dfs-srv1 referral-a referral-b
```
7. Try to list files in another folder, this causes ls to fail with error:
```
sh -c 'ls dfs-namespace/referral-b'
ls: cannot access 'dfs-namespace/referral-b': No route to host```
Sometimes it's also 'Operation now in progress' error.
mount shows the same output:
```
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
I also attached kernel debug logs from this test.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Reported-by: Andrei Paniakin <apanyaki(a)amazon.com>
Bisected-by: Simba Bonga <simbarb(a)amazon.com>
---
#regzbot introduced: v6.1.54..v6.2-rc1
On x86 platform, kernel v5.10.228, perf-report command aborts due to "free():
invalid pointer" when perf-record command is run with taken branch stack
sampling enabled. This regression can be reproduced with the following steps:
- sudo perf record -b
- sudo perf report
The root cause is that bi[i].to.ms.maps does not always point to thread->maps,
which is a buffer dynamically allocated by maps_new(). Instead, it may point to
&machine->kmaps, while kmaps is not a pointer but a variable. The original
upstream commit c1149037f65b ("perf hist: Add missing puts to
hist__account_cycles") worked well because machine->kmaps had been refactored to
a pointer by the previous commit 1a97cee604dc ("perf maps: Use a pointer for
kmaps").
The memory leak issue, which the reverted patch intended to fix, has been solved
by commit cf96b8e45a9b ("perf session: Add missing evlist__delete when deleting
a session"). The root cause is that the evlist is not being deleted on exit in
perf-report, perf-script, and perf-data. Consequently, the reference count of
the thread increased by thread__get() in hist_entry__init() is not decremented
in hist_entry__delete(). As a result, thread->maps is not properly freed.
To this end,
- PATCH 1/2 reverts commit a83fc293acd5c5050a4828eced4a71d2b2fffdd3 to fix the
abort regression.
- PATCH 2/2 backports cf96b8e45a9b ("perf session: Add missing evlist__delete
when deleting a session") to fix memory leak issue.
Riccardo Mancini (1):
perf session: Add missing evlist__delete when deleting a session
Shuai Xue (1):
Revert "perf hist: Add missing puts to hist__account_cycles"
tools/perf/util/hist.c | 10 +++-------
tools/perf/util/session.c | 5 ++++-
2 files changed, 7 insertions(+), 8 deletions(-)
--
2.39.3