Hi all:
The core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency while respecting the
infrastructure limits. Consequently, AMD has redefined the concept of
the maximum frequency of a part: only a fraction of the cores can reach
the maximum frequency. To find the best process scheduling policy for a given
scenario, the OS needs to know the core ordering that the platform reports
through the highest performance capability register of the CPPC interface.
Earlier implementations of AMD Pstate Preferred Core supported only a static
core ranking and targeted performance. The preferred core can now change
dynamically based on the workload and platform conditions, accounting for
thermals and aging.
The AMD Pstate driver uses the functions and data structures provided by
the ITMT architecture to let the scheduler favor scheduling on cores
which can reach a higher frequency at a lower voltage.
We call it AMD Pstate Preferred Core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The AMD Pstate driver uses the highest performance value to indicate
the priority of a CPU: a higher value means a higher priority.
The AMD Pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, so that the OS chooses the cores with the
highest performance first when scheduling processes. When the AMD Pstate
driver receives a notification that the highest performance has changed, it
updates the core ranking.
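The ranking rule above can be modeled in a few lines of user-space C. This is only a sketch under stated assumptions: struct core_rank, rank_cores() and update_highest_perf() are hypothetical names made up for illustration, and the real driver hands priorities to the scheduler through sched_set_itmt_core_prio() rather than sorting an array.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-core record; not a structure from the amd-pstate driver. */
struct core_rank {
	int cpu;
	unsigned int highest_perf;	/* value read from the CPPC highest perf register */
};

/* Order by highest_perf, descending; break ties by CPU number, ascending. */
static int cmp_by_prio_desc(const void *a, const void *b)
{
	const struct core_rank *x = a;
	const struct core_rank *y = b;

	if (x->highest_perf != y->highest_perf)
		return (x->highest_perf < y->highest_perf) ? 1 : -1;
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Initial ordering, as would be established at boot time. */
static void rank_cores(struct core_rank *cores, size_t n)
{
	qsort(cores, n, sizeof(*cores), cmp_by_prio_desc);
}

/* Model of a "highest perf changed" notification: update one core, re-rank. */
static void update_highest_perf(struct core_rank *cores, size_t n,
				int cpu, unsigned int new_perf)
{
	for (size_t i = 0; i < n; i++) {
		if (cores[i].cpu == cpu)
			cores[i].highest_perf = new_perf;
	}
	rank_cores(cores, n);
}
```

After rank_cores(), the first array element is the core the scheduler would prefer; update_highest_perf() shows how a single changed value can reorder the whole ranking.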
Meng Li (6):
ACPI: CPPC: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable AMD Pstate Preferred Core Supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update AMD Pstate Preferred Core ranking
dynamically
Documentation: amd-pstate: introduce AMD Pstate Preferred Core
Documentation: introduce AMD Pstate Preferrd Core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 55 ++++++
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 181 ++++++++++++++++--
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 1 +
include/linux/cpufreq.h | 4 +
9 files changed, 267 insertions(+), 16 deletions(-)
--
2.34.1
Submit the top-level headers also from the kunit test module notifier
initialization callback, so external tools that are parsing dmesg for
kunit test output are able to tell how many test suites should be expected
and whether to continue parsing after complete output from the first test
suite is collected.
Extend kunit module notifier initialization callback with a processing
path for only listing the tests provided by a module if the kunit action
parameter is set to "list", so external tools can obtain a list of test
cases to be executed in advance and can do a better job of assigning
kernel messages interleaved with kunit output to specific tests.
Use test filtering functions in kunit module notifier callback functions,
so external tools are able to execute individual test cases from kunit
test modules in order to still better isolate their potential impact on
kernel messages that appear interleaved with output from other tests.
v5: Fix new name of a structure moved to kunit namespace not updated in
executor_test functions (lkp(a)intel.com),
- refresh on top of attributes filtering fix.
v4: Use kunit_exec_run_tests() (Mauro, Rae), but prevent it from
emitting the headers when called on load of non-test modules,
- don't use a different list format, use kunit_exec_list_tests() (Rae),
- refresh on top of newly introduced attributes patches, handle newly
introduced kunit.action=list_attr case (Rae).
v3: Fix CONFIG_GLOB, required by filtering functions, not selected when
building as a module (lkp(a)intel.com).
v2: Fix new name of a structure moved to kunit namespace not updated
across all uses (lkp(a)intel.com).
Janusz Krzysztofik (3):
kunit: Report the count of test suites in a module
kunit: Make 'list' action available to kunit test modules
kunit: Allow kunit test modules to use test filtering
include/kunit/test.h | 21 +++++++
lib/kunit/Kconfig | 2 +-
lib/kunit/executor.c | 115 ++++++++++++++++++++++----------------
lib/kunit/executor_test.c | 36 ++++++++----
lib/kunit/test.c | 37 +++++++++++-
5 files changed, 149 insertions(+), 62 deletions(-)
base-commit: 1c9fd080dffe5e5ad763527fbc2aa3f6f8c653e9
--
2.41.0
Make sv48 the default address space for mmap as some applications
currently depend on this assumption. Users can now select a
desired address space using a non-zero hint address to mmap. Previously,
requesting the default address space from mmap by passing zero as the hint
address would result in using the largest address space possible. Some
applications depend on empty bits in the virtual address space, like Go and
Java, so this patch provides more flexibility for application developers.
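From the application side, the hint-address behaviour described above could be probed with a sketch like the following. It assumes nothing beyond the standard mmap() call; map_anon_with_hint() and va_bits_used() are hypothetical helper names, and which hint values select which address space is defined by the series' documentation patch, not by this example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Map 'len' bytes of anonymous memory, passing 'hint' as the address hint.
 * A hint of 0 requests the default address space. Returns MAP_FAILED on error.
 */
static void *map_anon_with_hint(uintptr_t hint, size_t len)
{
	return mmap((void *)hint, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Number of bits needed to represent 'addr' (0 for addr == 0). */
static int va_bits_used(uint64_t addr)
{
	int bits = 0;

	while (addr) {
		bits++;
		addr >>= 1;
	}
	return bits;
}
```

An application that stores tags in the upper bits of pointers could call va_bits_used() on the address returned by map_anon_with_hint() to check that the mapping leaves those bits free.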
-Charlie
---
v8:
- Fix RV32 and the RV32 compat mode of RV64
- Extract out addr and base from the mmap macros
v7:
- Changing RLIMIT_STACK inside of an executing program does not trigger
arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a
script before executing tests. RLIMIT_STACK of infinity forces bottomup
mmap allocation.
- Make arch_get_mmap_base macro more readable by extracting out the rnd
calculation.
- Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap
attempts to allocate address smaller than DEFAULT_MAP_WINDOW.
- Fix incorrect wording in documentation.
v6:
- Rebase onto the correct base
v5:
- Minor wording change in documentation
- Change some parentheses in arch_get_mmap_ macros
- Added case for addr==0 in arch_get_mmap_ because without this, programs would
crash if RLIMIT_STACK was modified before executing the program. This was
tested using the libhugetlbfs tests.
v4:
- Split testcases/document patch into test cases, in-code documentation, and
formal documentation patches
- Modified the mmap_base macro to be more legible and better represent memory
layout
- Fixed documentation to better reflect the implementation
- Renamed DEFAULT_VA_BITS to MMAP_VA_BITS
- Added additional test case for rlimit changes
---
Charlie Jenkins (4):
RISC-V: mm: Restrict address space for sv39,sv48,sv57
RISC-V: mm: Add tests for RISC-V mm
RISC-V: mm: Update pgtable comment documentation
RISC-V: mm: Document mmap changes
Documentation/riscv/vm-layout.rst | 22 +++++++
arch/riscv/include/asm/elf.h | 2 +-
arch/riscv/include/asm/pgtable.h | 28 ++++++--
arch/riscv/include/asm/processor.h | 52 +++++++++++++--
tools/testing/selftests/riscv/Makefile | 2 +-
tools/testing/selftests/riscv/mm/.gitignore | 2 +
tools/testing/selftests/riscv/mm/Makefile | 15 +++++
.../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++
.../riscv/mm/testcases/mmap_default.c | 35 ++++++++++
.../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++
.../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++
11 files changed, 257 insertions(+), 12 deletions(-)
create mode 100644 tools/testing/selftests/riscv/mm/.gitignore
create mode 100644 tools/testing/selftests/riscv/mm/Makefile
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c
create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h
create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh
--
2.41.0
Here is a series with some fixes and cleanups to resctrl selftests.
v5:
- Improve changelogs
- Close fd_lm only in cat_val()
- Improve unmount error handling
v4:
- Move resctrlfs (unconditional) umount after resctrl fs support check
v3:
- Don't include rewritten CAT test into this series!
- Tweak wildcard style in Makefile
- Fix many changelog typos, remove some wrong claims, and generally
improve them.
- Add fix to PARENT_EXIT() to unmount resctrl FS
- Add unmounting resctrl FS before starting any tests
- Add fix for buf leak
- Add fix for perf fd closing
- Split mount/remount/umount patches differently
- Use size_t and %zu for span
- Keep MBM print as MB, only internally use span in bytes
- Drop start_buf global from fill_buf
v2 (was sent with CAT test rewrite which is no longer included in v3):
- Rebased on top of next to solve the conflicts
- Added 2 patches related to resctrl FS mount/umount (fix + cleanup)
- Consistently use "alloc" in cache_alloc_size()
- CAT test error handling tweaked
- Remove a spurious newline change from the CAT patch
- Small improvements to changelogs
Ilpo Järvinen (19):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Don't leak buffer in fill_cache()
selftests/resctrl: Unmount resctrl FS if child fails to run benchmark
selftests/resctrl: Close perf value read fd on errors
selftests/resctrl: Unmount resctrl FS before starting the first test
selftests/resctrl: Move resctrl FS mount/umount to higher level
selftests/resctrl: Refactor remount_resctrl(bool mum_resctrlfs) to
mount_resctrl()
selftests/resctrl: Remove mum_resctrlfs from struct resctrl_val_param
selftests/resctrl: Convert span to size_t
selftests/resctrl: Express span internally in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Remove unnecessary startptr global from fill_buf
selftests/resctrl: Improve parameter consistency in fill_buf
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Move CAT/CMT test global vars to function they are
used in
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Remove test type checks from cat_val()
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 66 +++++++-------
tools/testing/selftests/resctrl/cat_test.c | 28 ++----
tools/testing/selftests/resctrl/cmt_test.c | 29 ++-----
tools/testing/selftests/resctrl/fill_buf.c | 87 +++++++------------
tools/testing/selftests/resctrl/mba_test.c | 9 +-
tools/testing/selftests/resctrl/mbm_test.c | 17 ++--
tools/testing/selftests/resctrl/resctrl.h | 17 ++--
.../testing/selftests/resctrl/resctrl_tests.c | 83 ++++++++++++------
tools/testing/selftests/resctrl/resctrl_val.c | 7 +-
tools/testing/selftests/resctrl/resctrlfs.c | 64 +++++++-------
11 files changed, 178 insertions(+), 231 deletions(-)
--
2.30.2
This is against mm/mm-unstable, but everything except patch #6 and #7
should apply on current master. Especially patch #1 and #2 should go
upstream first, so we can let the other stuff mature a bit longer.
Handle the fallout of 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") where I accidentally missed that
follow_page() and smaps implicitly kept the FOLL_NUMA flag clear by not
setting it if FOLL_FORCE is absent, to not trigger faults on
PROT_NONE-mapped PTEs.
Patch #1 fixes the known issues by reintroducing FOLL_NUMA as
FOLL_HONOR_NUMA_FAULT and decoupling it from FOLL_FORCE.
Patch #2 is a cleanup that I think actually fixes some corner cases, so
I added a Fixes: tag.
Patch #3 makes KVM explicitly set FOLL_HONOR_NUMA_FAULT in the single
case where it is required, and documents the situation.
Patch #4 then stops implicitly setting FOLL_HONOR_NUMA_FAULT. But note that
for FOLL_WRITE we always implicitly honor NUMA hinting faults.
Patch #5 cleans up a comment.
Patch #6 improves the KVM functional tests such that patch #7 can
actually check for one of the known issues: KSM no longer working on
PROT_NONE mappings on x86-64 with CONFIG_NUMA_BALANCING.
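The decision that patch #1 decouples from FOLL_FORCE can be modeled as follows. This is a simplified user-space sketch, assuming the check keeps the shape of the existing gup_can_follow_protnone() helper; the SKETCH_* flag values and the function name are made up for illustration and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative flag values only; the kernel's actual values differ. */
#define SKETCH_FOLL_FORCE		0x01u
#define SKETCH_FOLL_HONOR_NUMA_FAULT	0x02u

/*
 * Decide whether a PROT_NONE-mapped PTE may be followed without faulting.
 * 'vma_accessible' is true when the VMA itself grants some access, so a
 * PROT_NONE PTE inside it can only be a NUMA hinting entry.
 */
static bool sketch_can_follow_protnone(bool vma_accessible, unsigned int flags)
{
	/* Caller did not ask to honor NUMA hinting faults: just follow. */
	if (!(flags & SKETCH_FOLL_HONOR_NUMA_FAULT))
		return true;

	/*
	 * In an inaccessible (PROT_NONE) VMA the PTE reflects the mapping's
	 * protection, not a NUMA hint, so following it is fine. In an
	 * accessible VMA the hinting fault has to be taken first.
	 */
	return !vma_accessible;
}
```

Note that SKETCH_FOLL_FORCE plays no role in the decision, which is the point of the decoupling: callers such as follow_page() and smaps no longer trigger faults merely because they do not set FOLL_FORCE.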
v2 -> v3:
* "mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT"
-> Squash one comment removal
-> Adjust the KSM comment
* smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
-> Move follow_trans_huge_pmd() to mm/internal.h
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: liubo <liubo254(a)huawei.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
David Hildenbrand (7):
mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
smaps: use vm_normal_page_pmd() instead of follow_trans_huge_pmd()
kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()
mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT
pgtable: improve pte_protnone() comment
selftest/mm: ksm_functional_tests: test in mmap_and_merge_range() if
anything got merged
selftest/mm: ksm_functional_tests: Add PROT_NONE test
fs/proc/task_mmu.c | 3 +-
include/linux/huge_mm.h | 3 -
include/linux/mm.h | 21 +++-
include/linux/mm_types.h | 9 ++
include/linux/pgtable.h | 16 ++-
mm/gup.c | 23 +++-
mm/huge_memory.c | 3 +-
mm/internal.h | 7 ++
.../selftests/mm/ksm_functional_tests.c | 106 ++++++++++++++++--
virt/kvm/kvm_main.c | 13 ++-
10 files changed, 171 insertions(+), 33 deletions(-)
--
2.41.0
As reported and suggested by Willy, the inline __sysret() helper
introduces three types of conversions and increases the size:
(1) the "unsigned long" argument to __sysret() forces a sign extension
from all sys_* functions that used to return 'int'
(2) the comparison with the error range now has to be performed on a
'unsigned long' instead of an 'int'
(3) the return value from __sysret() is a 'long' (note, a signed long)
which then has to be turned back to an 'int' before being returned by the
caller to satisfy the caller's prototype.
To fix this, firstly, use a macro instead of an inline function so that
the input type is preserved and the useless conversions (1) and (3) are
avoided.
Secondly, the comparison with -MAX_ERRNO is inflicted on all integer returns
where a simple sign comparison used to suffice, so use a new
is_signed_type() macro from include/linux/compiler.h to limit the
comparison with -MAX_ERRNO (2) to the cases that need it and keep a simple
sign comparison for most of the cases, as before.
Thirdly, fix up the following warning by an explicit conversion and let
__sysret() be able to accept the (void *) type of argument and return
value with the same (void *) type:
sysroot/powerpc/include/sys.h: In function 'sbrk':
sysroot/powerpc/include/sys.h:104:16: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
104 | return (void *)__sysret(-ENOMEM);
Fourthly, to further work around arguments whose type carries 'const', use
__auto_type with new enough gcc versions and fall back to 'long' with older
gcc versions, as before.
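The two error checks being chosen between can be illustrated in isolation. This is a minimal user-space sketch, not the nolibc code: IS_SIGNED_TYPE mirrors the macro quoted from include/linux/compiler.h, while is_error_signed() and is_error_unsigned() are hypothetical helpers that only show which comparison each branch of the macro performs.

```c
#include <stdint.h>

#define MAX_ERRNO 4095	/* upper bound of the errno range, as in the kernel */

/* (type)-1 is negative only for signed types, so this is 1 for them. */
#define IS_SIGNED_TYPE(type) (((type)(-1)) < (type)1)

/* Signed return values: a plain sign test is enough. */
static int is_error_signed(long ret)
{
	return ret < 0;
}

/*
 * Unsigned or pointer-sized return values: only the top MAX_ERRNO values
 * encode an error, so a range comparison is required.
 */
static int is_error_unsigned(uintptr_t ret)
{
	return ret >= (uintptr_t)-MAX_ERRNO;
}
```

Because IS_SIGNED_TYPE() is a compile-time constant for any given type, the compiler can drop the branch that does not apply, which is why most callers keep the cheap sign test.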
The size test results with nolibc-test are:
before:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27916 8 80 28004 6d64 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23276 64 64 23404 5b6c nolibc-test
after:
// ppc64le
$ size nolibc-test
text data bss dec hex filename
27736 8 80 27824 6cb0 nolibc-test
// mips
$ size nolibc-test
text data bss dec hex filename
23036 64 64 23164 5a7c nolibc-test
Suggested-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/lkml/20230806095846.GB10627@1wt.eu/
Link: https://lore.kernel.org/lkml/20230806134348.GA19145@1wt.eu/
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
Hi, Willy
To increase readability, v3 further defines a
__GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT macro for gcc >= 11.0
(ABI_VERSION >= 1016), which supports __auto_type with 'const'.
When this macro is defined, a __sysret version using __auto_type is
provided; otherwise, a fixed 'long' type is used as a fallback.
Tested for all of the nolibc supported architectures with Arnd's
13.2.0 toolchains, and also for x86_64 with gcc-4.8 and gcc-9: no
compile failures, no compile warnings, no runtime failures.
Changes from v2 --> v3:
* define a __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT for gcc >= 11.0 (ABI_VERSION >= 1016)
* split __sysret() to two versions by the macro instead of a mixed unified and unreadable version
* use shorter __ret instead of __sysret_arg
Changes from v1 --> v2:
* fix up argument with 'const' in the type
* support "void *" argument
v2: https://lore.kernel.org/lkml/95fe3e732f455fab653fe1427118d905e4d04257.16913…
v1: https://lore.kernel.org/lkml/20230806131921.52453-1-falcon@tinylab.org/
---
tools/include/nolibc/sys.h | 66 +++++++++++++++++++++++++++++++-------
1 file changed, 55 insertions(+), 11 deletions(-)
diff --git a/tools/include/nolibc/sys.h b/tools/include/nolibc/sys.h
index 56f63eb48a1b..b137f7771db9 100644
--- a/tools/include/nolibc/sys.h
+++ b/tools/include/nolibc/sys.h
@@ -35,15 +35,59 @@
* (src/internal/syscall_ret.c) and glibc (sysdeps/unix/sysv/linux/sysdep.h)
*/
-static __inline__ __attribute__((unused, always_inline))
-long __sysret(unsigned long ret)
-{
- if (ret >= (unsigned long)-MAX_ERRNO) {
- SET_ERRNO(-(long)ret);
- return -1;
- }
- return ret;
-}
+/*
+ * Whether 'type' is a signed type or an unsigned type. Supports scalar types,
+ * bool and also pointer types. (from include/linux/compiler.h)
+ */
+#define __is_signed_type(type) (((type)(-1)) < (type)1)
+
+/* __auto_type is used instead of __typeof__ to workaround the build error
+ * 'error: assignment of read-only variable' when the argument has 'const' in
+ * the type, but __auto_type is a new feature from newer gcc version and it
+ * only works with 'const' from gcc 11.0 (__GXX_ABI_VERSION = 1016)
+ * https://gcc.gnu.org/legacy-ml/gcc-patches/2013-11/msg01378.html
+ */
+
+#if __GXX_ABI_VERSION >= 1016
+#define __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT
+#endif
+
+#ifdef __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT
+#define __sysret(arg) \
+({ \
+ __auto_type __ret = (arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if (__ret < 0) { \
+ SET_ERRNO(-(long)__ret); \
+ __ret = (__typeof__(arg))(-1L); \
+ } \
+ } else { \
+ if ((unsigned long)__ret >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-(long)__ret); \
+ __ret = (__typeof__(arg))(-1L); \
+ } \
+ } \
+ __ret; \
+})
+
+#else /* ! __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT */
+#define __sysret(arg) \
+({ \
+ long __ret = (long)(arg); \
+ if (__is_signed_type(__typeof__(arg))) { \
+ if (__ret < 0) { \
+ SET_ERRNO(-__ret); \
+ __ret = -1L; \
+ } \
+ } else { \
+ if ((unsigned long)__ret >= (unsigned long)-MAX_ERRNO) { \
+ SET_ERRNO(-__ret); \
+ __ret = -1L; \
+ } \
+ } \
+ (__typeof__(arg))__ret; \
+})
+#endif /* ! __GXX_HAS_AUTO_TYPE_WITH_CONST_SUPPORT */
/* Functions in this file only describe syscalls. They're declared static so
* that the compiler usually decides to inline them while still being allowed
@@ -94,7 +138,7 @@ void *sbrk(intptr_t inc)
if (ret && sys_brk(ret + inc) == ret + inc)
return ret + inc;
- return (void *)__sysret(-ENOMEM);
+ return __sysret((void *)-ENOMEM);
}
@@ -682,7 +726,7 @@ void *sys_mmap(void *addr, size_t length, int prot, int flags, int fd,
static __attribute__((unused))
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset)
{
- return (void *)__sysret((unsigned long)sys_mmap(addr, length, prot, flags, fd, offset));
+ return __sysret(sys_mmap(addr, length, prot, flags, fd, offset));
}
static __attribute__((unused))
--
2.25.1
Silence the following warnings reported by the new -Wall -Wextra options
with pure assembly code.
In file included from sysroot/powerpc/include/stdio.h:13,
from nolibc-test.c:13:
sysroot/powerpc/include/arch.h: In function '_start':
sysroot/powerpc/include/arch.h:192:32: warning: unused variable 'r2' [-Wunused-variable]
192 | register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start;
| ^~
sysroot/powerpc/include/arch.h:187:97: warning: optimization may eliminate reads and/or writes to register variables [-Wvolatile-register-var]
187 | void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
| ^~~~~~
Only the elfv2 ABI requires saving the TOC/GOT pointer to the r2
register. With the elfv1 ABI, the old C code is simply ignored by the
compiler, but the compiler cannot ignore the inline assembly code, which
would introduce build failures or runtime segfaults. So add the new
assembly code only for the elfv2 ABI, by checking for
_CALL_ELF == 2.
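The compile-time ABI check can be tried outside nolibc with a sketch like this one. The function name sketch_ppc64_elf_abi() is hypothetical; __powerpc64__ and _CALL_ELF are the predefined compiler macros the patch itself tests, and the defined() guard is added here only so the example also builds on other architectures.

```c
/*
 * Report which PowerPC 64-bit ELF ABI this translation unit is built for.
 * Only elfv2 needs the TOC pointer set up in r2 at _start.
 */
static const char *sketch_ppc64_elf_abi(void)
{
#if defined(__powerpc64__) && defined(_CALL_ELF) && _CALL_ELF == 2
	return "elfv2";
#elif defined(__powerpc64__)
	return "elfv1";
#else
	return "not ppc64";
#endif
}
```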
Link: https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf
Link: https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf
Signed-off-by: Zhangjin Wu <falcon(a)tinylab.org>
---
Hi, Willy
When rebasing onto the latest 20230806-for-6.6-1 branch, -Wall -Wextra
reported the above warnings.
This patch uses volatile inline assembly code instead of C code to silence
the unused-variable and optimization warnings.
And since only elfv2 requires saving the TOC pointer to the r2 register,
the assembly code is added only for elfv2.
BR,
Zhangjin
---
tools/include/nolibc/arch-powerpc.h | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/tools/include/nolibc/arch-powerpc.h b/tools/include/nolibc/arch-powerpc.h
index 76c3784f9dc7..ac212e6185b2 100644
--- a/tools/include/nolibc/arch-powerpc.h
+++ b/tools/include/nolibc/arch-powerpc.h
@@ -187,9 +187,17 @@
void __attribute__((weak, noreturn, optimize("Os", "omit-frame-pointer"))) __no_stack_protector _start(void)
{
#ifdef __powerpc64__
- /* On 64-bit PowerPC, save TOC/GOT pointer to r2 */
- extern char TOC __asm__ (".TOC.");
- register volatile long r2 __asm__ ("r2") = (void *)&TOC - (void *)_start;
+#if _CALL_ELF == 2
+ /* with -mabi=elfv2, save TOC/GOT pointer to r2
+ * r12 is global entry pointer, we use it to compute TOC from r12
+ * https://www.llvm.org/devmtg/2014-04/PDFs/Talks/Euro-LLVM-2014-Weigand.pdf
+ * https://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.pdf
+ */
+ __asm__ volatile (
+ "addis 2, 12, .TOC. - _start@ha\n"
+ "addi 2, 2, .TOC. - _start@l\n"
+ );
+#endif /* _CALL_ELF == 2 */
__asm__ volatile (
"mr 3, 1\n" /* save stack pointer to r3, as arg1 of _start_c */
--
2.25.1