This patch series is a result of discussion at the refcount_t BOF
the Linux Plumbers Conference. In this discussion, we identified
a need for looking closely and investigating atomic_t usages in
the kernel when it is used strictly as a counter without it
controlling object lifetimes and state changes.
There are a number of atomic_t usages in the kernel where atomic_t api
is used strictly for counting and not for managing object lifetime. In
some cases, atomic_t might not even be needed.
The purpose of these counters is twofold: 1. clearly differentiate
atomic_t counters from atomic_t usages that guard object lifetimes,
hence prone to overflow and underflow errors. It allows tools that scan
for underflow and overflow on atomic_t usages to detect overflow and
underflows to scan just the cases that are prone to errors. 2. provides
non-atomic counters for cases where atomic isn't necessary.
Simple atomic and non-atomic counters api provides interfaces for simple
atomic and non-atomic counters that just count, and don't guard resource
lifetimes. Counters will wrap around to 0 when it overflows and should
not be used to guard resource lifetimes, device usage and open counts
that control state changes, and pm states.
Using counter_atomic to guard lifetimes could lead to use-after free
when it overflows and undefined behavior when used to manage state
changes and device usage/open states.
This patch series introduces Simple atomic and non-atomic counters.
Counter atomic ops leverage atomic_t and provide a sub-set of atomic_t
ops.
In addition this patch series converts a few drivers to use the new api.
The following criteria is used for select variables for conversion:
1. Variable doesn't guard object lifetimes, manage state changes e.g:
device usage counts, device open counts, and pm states.
2. Variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.
Changes since RFC:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
the patches with the tags.
-- Addressed Kees's comments:
1. Non-atomic counters renamed to counter_simple32 and counter_simple64
to clearly indicate size.
2. Added warning for counter_simple* usage and it should be used only
when there is no need for atomicity.
3. Renamed counter_atomic to counter_atomic32 to clearly indicate size.
4. Renamed counter_atomic_long to counter_atomic64 and it now uses
atomic64_t ops and indicates size.
5. Test updated for the API renames.
6. Added helper functions for test results printing
7. Verified that the test module compiles in kunit env. and test
module can be loaded to run the test.
8. Updated Documentation to reflect the intent to make the API
restricted so it can never be used to guard object lifetimes
and state management. I left _return ops for now, inc_return
is necessary for now as per the discussion we had on this topic.
-- Updated driver patches with API name changes.
-- We discussed if binder counters can be non-atomic. For now I left
them the same as the RFC patch - using counter_atomic32
-- Unrelated to this patch series:
The patch series review uncovered improvements could be made to
test_async_driver_probe and vmw_vmci/vmci_guest. I will track
these for fixing later.
Shuah Khan (11):
counters: Introduce counter_simple* and counter_atomic* counters
selftests:lib:test_counters: add new test for counters
drivers/base: convert deferred_trigger_count and probe_count to
counter_atomic32
drivers/base/devcoredump: convert devcd_count to counter_atomic32
drivers/acpi: convert seqno counter_atomic32
drivers/acpi/apei: convert seqno counter_atomic32
drivers/android/binder: convert stats, transaction_log to
counter_atomic32
drivers/base/test/test_async_driver_probe: convert to use
counter_atomic32
drivers/char/ipmi: convert stats to use counter_atomic32
drivers/misc/vmw_vmci: convert num guest devices counter to
counter_atomic32
drivers/edac: convert pci counters to counter_atomic32
Documentation/core-api/counters.rst | 174 +++++++++
MAINTAINERS | 8 +
drivers/acpi/acpi_extlog.c | 5 +-
drivers/acpi/apei/ghes.c | 5 +-
drivers/android/binder.c | 41 +--
drivers/android/binder_internal.h | 3 +-
drivers/base/dd.c | 19 +-
drivers/base/devcoredump.c | 5 +-
drivers/base/test/test_async_driver_probe.c | 23 +-
drivers/char/ipmi/ipmi_msghandler.c | 9 +-
drivers/char/ipmi/ipmi_si_intf.c | 9 +-
drivers/edac/edac_pci.h | 5 +-
drivers/edac/edac_pci_sysfs.c | 28 +-
drivers/misc/vmw_vmci/vmci_guest.c | 9 +-
include/linux/counters.h | 350 +++++++++++++++++++
lib/Kconfig | 10 +
lib/Makefile | 1 +
lib/test_counters.c | 276 +++++++++++++++
tools/testing/selftests/lib/Makefile | 1 +
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/test_counters.sh | 5 +
21 files changed, 913 insertions(+), 74 deletions(-)
create mode 100644 Documentation/core-api/counters.rst
create mode 100644 include/linux/counters.h
create mode 100644 lib/test_counters.c
create mode 100755 tools/testing/selftests/lib/test_counters.sh
--
2.25.1
These patch series adds below kselftests to test the user-space support for the
ARMv8.5 Memory Tagging Extension present in arm64 tree [1]. This patch
series is based on Linux v5.9-rc3.
1) This test-case verifies that the memory allocated by kernel mmap interface
can support tagged memory access. It first checks the presence of tags at
address[56:59] and then proceeds with read and write. The pass criteria for
this test is that tag fault exception should not happen.
2) This test-case crosses the valid memory to the invalid memory. In this
memory area valid tags are not inserted so read and write should not pass. The
pass criteria for this test is that tag fault exception should happen for all
the illegal addresses. This test also verfies that PSTATE.TCO works properly.
3) This test-case verifies that the memory inherited by child process from
parent process should have same tags copied. The pass criteria for this test is
that tag fault exception should not happen.
4) This test checks different mmap flags with PROT_MTE memory protection.
5) This testcase checks that KSM should not merge pages containing different
MTE tag values. However, if the tags are same then the pages may merge. This
testcase uses the generic ksm sysfs interfaces to verify the MTE behaviour, so
this testcase is not fullproof and may be impacted due to other load in the system.
6) Fifth test verifies that syscalls read/write etc works by considering that
user pointer has valid/invalid allocation tags.
Changes since v1 [2]:
* Redefined MTE kernel header definitions to decouple kselftest compilations.
* Removed gmi masking instructions in mte_insert_random_tag assembly
function. This simplifies the tag inclusion mask test with only GCR
mask register used.
* Created a new mte_insert_random_tag function with gmi instruction.
This is useful for the 6th test which reuses the original tag.
* Now use /dev/shm/* to hold temporary files.
* Updated the 6th test to handle the error properly in case of failure
in accessing memory with invalid tag in kernel.
* Code and comment clean-ups.
Thanks,
Amit Daniel
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/mte
[2]: https://patchwork.kernel.org/patch/11747791/
Amit Daniel Kachhap (6):
kselftest/arm64: Add utilities and a test to validate mte memory
kselftest/arm64: Verify mte tag inclusion via prctl
kselftest/arm64: Check forked child mte memory accessibility
kselftest/arm64: Verify all different mmap MTE options
kselftest/arm64: Verify KSM page merge for MTE pages
kselftest/arm64: Check mte tagged user address in kernel
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/mte/.gitignore | 6 +
tools/testing/selftests/arm64/mte/Makefile | 29 ++
.../selftests/arm64/mte/check_buffer_fill.c | 475 ++++++++++++++++++
.../selftests/arm64/mte/check_child_memory.c | 195 +++++++
.../selftests/arm64/mte/check_ksm_options.c | 159 ++++++
.../selftests/arm64/mte/check_mmap_options.c | 262 ++++++++++
.../arm64/mte/check_tags_inclusion.c | 185 +++++++
.../selftests/arm64/mte/check_user_mem.c | 111 ++++
.../selftests/arm64/mte/mte_common_util.c | 341 +++++++++++++
.../selftests/arm64/mte/mte_common_util.h | 118 +++++
tools/testing/selftests/arm64/mte/mte_def.h | 60 +++
.../testing/selftests/arm64/mte/mte_helper.S | 128 +++++
13 files changed, 2070 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/arm64/mte/.gitignore
create mode 100644 tools/testing/selftests/arm64/mte/Makefile
create mode 100644 tools/testing/selftests/arm64/mte/check_buffer_fill.c
create mode 100644 tools/testing/selftests/arm64/mte/check_child_memory.c
create mode 100644 tools/testing/selftests/arm64/mte/check_ksm_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_mmap_options.c
create mode 100644 tools/testing/selftests/arm64/mte/check_tags_inclusion.c
create mode 100644 tools/testing/selftests/arm64/mte/check_user_mem.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.c
create mode 100644 tools/testing/selftests/arm64/mte/mte_common_util.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_def.h
create mode 100644 tools/testing/selftests/arm64/mte/mte_helper.S
--
2.17.1
This version 3 of the mremap speed up patches previously posted at:
v1 - https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
v2 - https://lore.kernel.org/r/20201002162101.665549-1-kaleshsingh@google.com
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known to
be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
and HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
Changes in v2:
- Reduce mremap_test time by only validating a configurable
threshold of the remapped region, as per John.
- Use a random pattern for mremap validation. Provide pattern
seed in test output, as per John.
- Moved set_pud_at() to separate patch, per Kirill.
- Use switch() instead of ifs in move_pgt_entry(), per Kirill.
- Update commit message with description of Android
garbage collector use case for HAVE_MOVE_PUD, as per Joel.
- Fix build test error reported by kernel test robot in [1].
Changes in v3:
- Make lines 80 cols or less where they don’t need to be longer,
per John.
- Removed unused PATTERN_SIZE in mremap_test
- Added Reviewed-by tag for patch 1/5 (mremap kselftest patch).
- Use switch() instead of ifs in get_extent(), per Kirill
- Add BUILD_BUG() is get_extent() default case.
- Move get_old_pud() and alloc_new_pud() out of
#ifdef CONFIG_HAVE_MOVE_PUD, per Kirill.
- Have get_old_pmd() and alloc_new_pmd() use get_old_pud() and
alloc_old_pud(), per Kirill.
- Replace #ifdef CONFIG_HAVE_MOVE_PMD / PUD in move_page_tables()
with IS_ENABLED(CONFIG_HAVE_MOVE_PMD / PUD), per Kirill.
- Fold Add set_pud_at() patch into patch 4/5, per Kirill.
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4F…
Kalesh Singh (5):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 230 ++++++++++++---
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 344 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 558 insertions(+), 40 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
base-commit: 549738f15da0e5a00275977623be199fbbf7df50
--
2.28.0.806.g8561365e88-goog
v4: https://lkml.org/lkml/2020/9/2/356
v4-->v5
Based on comments from Artem Bityutskiy, evaluation of timer based
wakeup latencies may not be a fruitful measurement especially on the x86
platform which has the capability to pre-arm a CPU when a timer is set.
Hence, including only the IPI based tests for latency measurement to
acheive expected behaviour across platforms.
kernel module + bash selftest approach which presents lower deviations
and higher accuracy: https://lkml.org/lkml/2020/7/21/567
---
The patch series introduces a mechanism to measure wakeup latency for
IPI based interrupts.
The motivation behind this series is to find significant deviations
behind advertised latency values
To achieve this in the userspace, IPI latencies are calculated by
sending information through pipes and inducing a wakeup.
To account for delays from kernel-userspace interactions baseline
observations are taken on a 100% busy CPU and subsequent obervations
must be considered relative to that.
One downside of the userspace approach in contrast to the kernel
implementation is that the run to run variance can turn out to be high
in the order of ms; which is the scope of the experiments at times.
Another downside of the userspace approach is that it takes much longer
to run and hence a command-line option quick and full are added to make
sure quick 1 CPU tests can be carried out when needed and otherwise it
can carry out a full system comprehensive test.
Usage
---
./cpuidle --mode <full / quick / num_cpus> --output <output location>
full: runs on all CPUS
quick: run on a random CPU
num_cpus: Limit the number of CPUS to run on
Sample output snippet
---------------------
--IPI Latency Test---
SRC_CPU DEST_CPU IPI_Latency(ns)
...
0 5 256178
0 6 478161
0 7 285445
0 8 273553
Expected IPI latency(ns): 100000
Observed Average IPI latency(ns): 248334
Pratik Rajesh Sampat (1):
selftests/cpuidle: Add support for cpuidle latency measurement
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 7 +
tools/testing/selftests/cpuidle/cpuidle.c | 479 ++++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 1 +
4 files changed, 488 insertions(+)
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100644 tools/testing/selftests/cpuidle/cpuidle.c
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.26.2
Changes since v1:
* check_config.sh now invokes the compiler via the Makefile's ($CC),
thanks to Jason Gunthorpe for calling that out.
* Removed a misleading sentence from patch #6, as identified by Ira
Weiny.
* Removed a forward-looking sentence, about using -lpthread in
gup_test.c soon, from the commit message in patch #4, since I'm not yet
sure if my local pthread-based stress tests are actually worthwhile or
not.
Original cover letter, still accurate at this point:
This is based on the latest mmotm.
Summary: This series provides two main things, and a number of smaller
supporting goodies. The two main points are:
1) Add a new sub-test to gup_test, which in turn is a renamed version of
gup_benchmark. This sub-test allows nicer testing of dump_pages(), at
least on user-space pages.
For quite a while, I was doing a quick hack to gup_test.c whenever I
wanted to try out changes to dump_page(). Then Matthew Wilcox asked me
what I meant when I said "I used my dump_page() unit test", and I
realized that it might be nice to check in a polished up version of
that.
Details about how it works and how to use it are in the commit
description for patch #6.
2) Fixes a limitation of hmm-tests: these tests are incredibly useful,
but only if people actually build and run them. And it turns out that
libhugetlbfs is a little too effective at throwing a wrench in the
works, there. So I've added a little configuration check that removes
just two of the 21 hmm-tests, if libhugetlbfs is not available.
Further details in the commit description of patch #8.
Other smaller things that this series does:
a) Remove code duplication by creating gup_test.h.
b) Clear up the sub-test organization, and their invocation within
run_vmtests.sh.
c) Other minor assorted improvements.
John Hubbard (8):
mm/gup_benchmark: rename to mm/gup_test
selftests/vm: use a common gup_test.h
selftests/vm: rename run_vmtests --> run_vmtests.sh
selftests/vm: minor cleanup: Makefile and gup_test.c
selftests/vm: only some gup_test items are really benchmarks
selftests/vm: gup_test: introduce the dump_pages() sub-test
selftests/vm: run_vmtest.sh: update and clean up gup_test invocation
selftests/vm: hmm-tests: remove the libhugetlbfs dependency
Documentation/core-api/pin_user_pages.rst | 6 +-
arch/s390/configs/debug_defconfig | 2 +-
arch/s390/configs/defconfig | 2 +-
mm/Kconfig | 21 +-
mm/Makefile | 2 +-
mm/{gup_benchmark.c => gup_test.c} | 109 ++++++----
mm/gup_test.h | 32 +++
tools/testing/selftests/vm/.gitignore | 3 +-
tools/testing/selftests/vm/Makefile | 38 +++-
tools/testing/selftests/vm/check_config.sh | 31 +++
tools/testing/selftests/vm/config | 2 +-
tools/testing/selftests/vm/gup_benchmark.c | 137 -------------
tools/testing/selftests/vm/gup_test.c | 188 ++++++++++++++++++
tools/testing/selftests/vm/hmm-tests.c | 10 +-
.../vm/{run_vmtests => run_vmtest.sh} | 24 ++-
15 files changed, 404 insertions(+), 203 deletions(-)
rename mm/{gup_benchmark.c => gup_test.c} (59%)
create mode 100644 mm/gup_test.h
create mode 100755 tools/testing/selftests/vm/check_config.sh
delete mode 100644 tools/testing/selftests/vm/gup_benchmark.c
create mode 100644 tools/testing/selftests/vm/gup_test.c
rename tools/testing/selftests/vm/{run_vmtests => run_vmtest.sh} (91%)
--
2.28.0
This version 2 of the mremap speed up patches previously posted at:
https://lore.kernel.org/r/20200930222130.4175584-1-kaleshsingh@google.com
mremap time can be optimized by moving entries at the PMD/PUD level if
the source and destination addresses are PMD/PUD-aligned and
PMD/PUD-sized. Enable moving at the PMD and PUD levels on arm64 and
x86. Other architectures where this type of move is supported and known to
be safe can also opt-in to these optimizations by enabling HAVE_MOVE_PMD
and HAVE_MOVE_PUD.
Observed Performance Improvements for remapping a PUD-aligned 1GB-sized
region on x86 and arm64:
- HAVE_MOVE_PMD is already enabled on x86 : N/A
- Enabling HAVE_MOVE_PUD on x86 : ~13x speed up
- Enabling HAVE_MOVE_PMD on arm64 : ~ 8x speed up
- Enabling HAVE_MOVE_PUD on arm64 : ~19x speed up
Altogether, HAVE_MOVE_PMD and HAVE_MOVE_PUD
give a total of ~150x speed up on arm64.
Changes in v2:
- Reduce mremap_test time by only validating a configurable
threshold of the remapped region, as per John.
- Use a random pattern for mremap validation. Provide pattern
seed in test output, as per John.
- Moved set_pud_at() to separate patch, per Kirill.
- Use switch() instead of ifs in move_pgt_entry(), per Kirill.
- Update commit message with description of Android
garbage collector use case for HAVE_MOVE_PUD, as per Joel.
- Fix build test error reported by kernel test robot in [1].
[1] https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org/thread/CKPGL4F…
Kalesh Singh (6):
kselftests: vm: Add mremap tests
arm64: mremap speedup - Enable HAVE_MOVE_PMD
mm: Speedup mremap on 1GB or larger regions
arm64: Add set_pud_at() functions
arm64: mremap speedup - Enable HAVE_MOVE_PUD
x86: mremap speedup - Enable HAVE_MOVE_PUD
arch/Kconfig | 7 +
arch/arm64/Kconfig | 2 +
arch/arm64/include/asm/pgtable.h | 1 +
arch/x86/Kconfig | 1 +
mm/mremap.c | 220 +++++++++++++--
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 1 +
tools/testing/selftests/vm/mremap_test.c | 333 +++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 11 +
9 files changed, 547 insertions(+), 30 deletions(-)
create mode 100644 tools/testing/selftests/vm/mremap_test.c
base-commit: 472e5b056f000a778abb41f1e443de58eb259783
--
2.28.0.806.g8561365e88-goog
Hi,
Maybe this should really be an RFC, given that I don't fully understand
why the compaction_test.c program was mmap'ing 1 MB at a time. So
apologies in advance if I've mucked up something important, but if so,
maybe we can still find a way to get this fixed up to something better.
Anyway: there are 20+ tests in tools/testing/selftests/vm/. The entire
running time for these (via run_vmtest.sh) is about 56 seconds, of which
over half is due to just one test: compaction_test, which takes 27 sec!
(A runner-up is HMM, at 11 sec, so it's up for a look next.) The other
tests mostly take a few ms, and a few take 1.0 sec.
This drops the compaction_test run time from 27, to 3.3 sec. Enjoy. :)
thanks,
John Hubbard
NVIDIA
John Hubbard (1):
selftests/vm: 8x compaction_test speedup
tools/testing/selftests/vm/compaction_test.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
--
2.28.0