This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
and does not require tests to be written in userspace running on a host
kernel. Additionally, KUnit is fast: From invocation to completion KUnit
can run several dozen tests in under a second. Currently, the entire
KUnit test suite for KUnit runs in under a second from the initial
invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
## What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
## Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
## More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here:
https://google.github.io/kunit-docs/third_party/kernel/docs/
Additionally for convenience, I have applied these patches to a branch:
https://kunit.googlesource.com/linux/+/kunit/rfc/4.19/v3
The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/4.19/v3 branch.
## Changes Since Last Version
- Changed namespace prefix from `test_*` to `kunit_*` as requested by
Shuah.
- Started converting/cleaning up the device tree unittest to use KUnit.
- Started adding KUnit expectations with custom messages.
--
2.20.0.rc0.387.gc7a69e6b6c-goog
## TL;DR
This revision addresses comments from Linus[1] and Randy[2], by moving
top level `kunit/` directory to `lib/kunit/` and likewise moves top
level Kconfig entry under lib/Kconfig.debug, so the KUnit submenu now
shows up under the "Kernel Hacking" menu.
As a consequence of this, I rewrote patch 06/18 (kbuild: enable building
KUnit), and now needs to be re-acked/reviewed.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[3]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in about a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[4].
Additionally for convenience, I have applied these patches to a
branch[5]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/initial/v5.3/v16 branch.
[1] https://www.lkml.org/lkml/2019/9/20/696
[2] https://www.lkml.org/lkml/2019/9/20/738
[3] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[4] https://google.github.io/kunit-docs/third_party/kernel/docs/
[5] https://kunit.googlesource.com/linux/+/kunit/initial/v5.3/v16
---
Avinash Kondareddy (1):
kunit: test: add tests for KUnit managed resources
Brendan Higgins (16):
kunit: test: add KUnit test runner core
kunit: test: add test resource management API
kunit: test: add string_stream a std::stream like string builder
kunit: test: add assertion printing library
kunit: test: add the concept of expectations
lib: enable building KUnit in lib/
kunit: test: add initial tests
objtool: add kunit_try_catch_throw to the noreturn list
kunit: test: add support for test abort
kunit: test: add tests for kunit test abort
kunit: test: add the concept of assertions
kunit: defconfig: add defconfigs for building KUnit tests
Documentation: kunit: add documentation for KUnit
MAINTAINERS: add entry for KUnit the unit testing framework
MAINTAINERS: add proc sysctl KUnit test to PROC SYSCTL section
kunit: fix failure to build without printk
Felix Guo (1):
kunit: tool: add Python wrappers for running KUnit tests
Iurii Zaikin (1):
kernel/sysctl-test: Add null pointer test for sysctl.c:proc_dointvec()
Documentation/dev-tools/index.rst | 1 +
Documentation/dev-tools/kunit/api/index.rst | 16 +
Documentation/dev-tools/kunit/api/test.rst | 11 +
Documentation/dev-tools/kunit/faq.rst | 62 +
Documentation/dev-tools/kunit/index.rst | 79 +
Documentation/dev-tools/kunit/start.rst | 180 ++
Documentation/dev-tools/kunit/usage.rst | 576 +++++++
MAINTAINERS | 13 +
arch/um/configs/kunit_defconfig | 3 +
include/kunit/assert.h | 356 ++++
include/kunit/string-stream.h | 51 +
include/kunit/test.h | 1490 +++++++++++++++++
include/kunit/try-catch.h | 75 +
kernel/Makefile | 2 +
kernel/sysctl-test.c | 392 +++++
lib/Kconfig.debug | 13 +
lib/Makefile | 2 +
lib/kunit/Kconfig | 38 +
lib/kunit/Makefile | 9 +
lib/kunit/assert.c | 141 ++
lib/kunit/example-test.c | 88 +
lib/kunit/string-stream-test.c | 52 +
lib/kunit/string-stream.c | 217 +++
lib/kunit/test-test.c | 331 ++++
lib/kunit/test.c | 478 ++++++
lib/kunit/try-catch.c | 118 ++
tools/objtool/check.c | 1 +
tools/testing/kunit/.gitignore | 3 +
tools/testing/kunit/configs/all_tests.config | 3 +
tools/testing/kunit/kunit.py | 136 ++
tools/testing/kunit/kunit_config.py | 66 +
tools/testing/kunit/kunit_kernel.py | 149 ++
tools/testing/kunit/kunit_parser.py | 310 ++++
tools/testing/kunit/kunit_tool_test.py | 206 +++
.../test_is_test_passed-all_passed.log | 32 +
.../test_data/test_is_test_passed-crash.log | 69 +
.../test_data/test_is_test_passed-failure.log | 36 +
.../test_is_test_passed-no_tests_run.log | 75 +
.../test_output_isolated_correctly.log | 106 ++
.../test_data/test_read_from_file.kconfig | 17 +
40 files changed, 6003 insertions(+)
create mode 100644 Documentation/dev-tools/kunit/api/index.rst
create mode 100644 Documentation/dev-tools/kunit/api/test.rst
create mode 100644 Documentation/dev-tools/kunit/faq.rst
create mode 100644 Documentation/dev-tools/kunit/index.rst
create mode 100644 Documentation/dev-tools/kunit/start.rst
create mode 100644 Documentation/dev-tools/kunit/usage.rst
create mode 100644 arch/um/configs/kunit_defconfig
create mode 100644 include/kunit/assert.h
create mode 100644 include/kunit/string-stream.h
create mode 100644 include/kunit/test.h
create mode 100644 include/kunit/try-catch.h
create mode 100644 kernel/sysctl-test.c
create mode 100644 lib/kunit/Kconfig
create mode 100644 lib/kunit/Makefile
create mode 100644 lib/kunit/assert.c
create mode 100644 lib/kunit/example-test.c
create mode 100644 lib/kunit/string-stream-test.c
create mode 100644 lib/kunit/string-stream.c
create mode 100644 lib/kunit/test-test.c
create mode 100644 lib/kunit/test.c
create mode 100644 lib/kunit/try-catch.c
create mode 100644 tools/testing/kunit/.gitignore
create mode 100644 tools/testing/kunit/configs/all_tests.config
create mode 100755 tools/testing/kunit/kunit.py
create mode 100644 tools/testing/kunit/kunit_config.py
create mode 100644 tools/testing/kunit/kunit_kernel.py
create mode 100644 tools/testing/kunit/kunit_parser.py
create mode 100755 tools/testing/kunit/kunit_tool_test.py
create mode 100644 tools/testing/kunit/test_data/test_is_test_passed-all_passed.log
create mode 100644 tools/testing/kunit/test_data/test_is_test_passed-crash.log
create mode 100644 tools/testing/kunit/test_data/test_is_test_passed-failure.log
create mode 100644 tools/testing/kunit/test_data/test_is_test_passed-no_tests_run.log
create mode 100644 tools/testing/kunit/test_data/test_output_isolated_correctly.log
create mode 100644 tools/testing/kunit/test_data/test_read_from_file.kconfig
--
2.23.0.351.gc4317032e6-goog
Commit a745f7af3cbd ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.
To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".
Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
tools/testing/selftests/kselftest/runner.sh | 36 +++++++++++++++++++--
tools/testing/selftests/rtc/settings | 1 +
2 files changed, 34 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/rtc/settings
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 00c9020bdda8..84de7bc74f2c 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -3,9 +3,14 @@
#
# Runs a set of tests in a given subdirectory.
export skip_rc=4
+export timeout_rc=124
export logfile=/dev/stdout
export per_test_logging=
+# Defaults for "settings" file fields:
+# "timeout" how many seconds to let each test run before failing.
+export kselftest_default_timeout=45
+
# There isn't a shell-agnostic way to find the path of a sourced file,
# so we must rely on BASE_DIR being set to find other tools.
if [ -z "$BASE_DIR" ]; then
@@ -24,6 +29,16 @@ tap_prefix()
fi
}
+tap_timeout()
+{
+ # Make sure tests will time out if utility is available.
+ if [ -x /usr/bin/timeout ] ; then
+ /usr/bin/timeout "$kselftest_timeout" "$1"
+ else
+ "$1"
+ fi
+}
+
run_one()
{
DIR="$1"
@@ -32,6 +47,18 @@ run_one()
BASENAME_TEST=$(basename $TEST)
+ # Reset any "settings"-file variables.
+ export kselftest_timeout="$kselftest_default_timeout"
+ # Load per-test-directory kselftest "settings" file.
+ settings="$BASE_DIR/$DIR/settings"
+ if [ -r "$settings" ] ; then
+ while read line ; do
+ field=$(echo "$line" | cut -d= -f1)
+ value=$(echo "$line" | cut -d= -f2-)
+ eval "kselftest_$field"="$value"
+ done < "$settings"
+ fi
+
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
if [ ! -x "$TEST" ]; then
@@ -44,14 +71,17 @@ run_one()
echo "not ok $test_num $TEST_HDR_MSG"
else
cd `dirname $TEST` > /dev/null
- (((((./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
- (if [ $? -eq $skip_rc ]; then \
+ (rc=$?; \
+ if [ $rc -eq $skip_rc ]; then \
echo "not ok $test_num $TEST_HDR_MSG # SKIP"
+ elif [ $rc -eq $timeout_rc ]; then \
+ echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT"
else
- echo "not ok $test_num $TEST_HDR_MSG"
+ echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
fi)
cd - >/dev/null
fi
diff --git a/tools/testing/selftests/rtc/settings b/tools/testing/selftests/rtc/settings
new file mode 100644
index 000000000000..ba4d85f74cd6
--- /dev/null
+++ b/tools/testing/selftests/rtc/settings
@@ -0,0 +1 @@
+timeout=90
--
2.17.1
--
Kees Cook
On Tue, Sep 17, 2019 at 12:26 PM Shuah Khan <skhan(a)linuxfoundation.org> wrote:
>
> This Kselftest update for Linux 5.4-rc1 consists of several fixes to
> existing tests and adds KUnit, a lightweight unit testing and mocking
> framework for the Linux kernel from Brendan Higgins.
So I pulled this, but then I almost immediately unpulled it.
My reason for doing that may be odd, but it's because of the top-level
'kunit' directory. This shouldn't be on the top level.
The reason I react so strongly is that it actually breaks my finger
memory. I don't type out filenames - I auto-compete them. So "kernel/"
is "k<tab>", "drivers/" is "d<tab>" etc.
It already doesn't work for everything ("mm/" is actually "mm<tab>"
not because we have files in the git tree, but because the build
creates various "module" files), but this breaks a common pattern for
me.
> In the future KUnit will be linked to Kselftest framework to provide
> a way to trigger KUnit tests from user-space.
Can the kernel parts please move to lib/kunit/ or something like that?
Linus
Hey everyone,
This is the patchset coming out of the KSummit session Kees and I gave
in Lisbon last week (cf. [3] which also contains slides with more
details on related things such as deep argument inspection).
The simple idea is to extend the seccomp notifier to allow for the
continuation of a syscall. The rationale for this can be found in the
commit message to [1]. For the curious there is more detail in [2].
This patchset would unblock supervising an extended set of syscalls such
as mount() where a privileged process is supervising the syscalls of a
lesser privileged process and emulates the syscall for the latter in
userspace.
For more comments on security see [1].
Kees, if you prefer a pr the series can be pulled from:
git@gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux tags/seccomp-notify-syscall-continue-v5.5
For anyone who wants to play with this it's sitting in:
https://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux.git/log/?h=se…
/* v1 */
- Kees Cook <keescook(a)chromium.org>:
- dropped patch because it is already present in linux-next
[PATCH 2/4] seccomp: add two missing ptrace ifdefines
Link: https://lore.kernel.org/r/20190918084833.9369-3-christian.brauner@ubuntu.com
/* v0 */
Link: https://lore.kernel.org/r/20190918084833.9369-1-christian.brauner@ubuntu.com
Thanks!
Christian
/* References */
[1]: [PATCH 1/3] seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
[2]: https://lore.kernel.org/r/20190719093538.dhyopljyr5ns33qx@brauner.io
[3]: https://linuxplumbersconf.org/event/4/contributions/560
Christian Brauner (3):
seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE
seccomp: avoid overflow in implicit constant conversion
seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE
include/uapi/linux/seccomp.h | 20 ++++
kernel/seccomp.c | 28 ++++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 105 +++++++++++++++++-
3 files changed, 146 insertions(+), 7 deletions(-)
--
2.23.0
Hey everyone,
This is the patchset coming out of the KSummit session Kees and I gave
in Lisbon last week (cf. [3] which also contains slides with more
details on related things such as deep argument inspection).
The simple idea is to extend the seccomp notifier to allow for the
continuation of a syscall. The rationale for this can be found in the
commit message to [1]. For the curious there is more detail in [2].
This patchset would unblock supervising an extended set of syscalls such
as mount() where a privileged process is supervising the syscalls of a
lesser privileged process and emulates the syscall for the latter in
userspace.
For more comments on security see [1].
Thanks!
Christian
/* References */
[1]: [PATCH 1/4] seccomp: add SECCOMP_RET_USER_NOTIF_ALLOW
[2]: https://lore.kernel.org/r/20190719093538.dhyopljyr5ns33qx@brauner.io
[3]: https://linuxplumbersconf.org/event/4/contributions/560
Christian Brauner (4):
seccomp: add SECCOMP_RET_USER_NOTIF_ALLOW
seccomp: add two missing ptrace ifdefines
seccomp: avoid overflow in implicit constant conversion
seccomp: test SECCOMP_RET_USER_NOTIF_ALLOW
include/uapi/linux/seccomp.h | 2 +
kernel/seccomp.c | 24 +++-
tools/testing/selftests/seccomp/seccomp_bpf.c | 110 +++++++++++++++++-
3 files changed, 131 insertions(+), 5 deletions(-)
--
2.23.0
From: "George G. Davis" <george_davis(a)mentor.com>
The newly added optional file argument does not validate if the
file is indeed a watchdog, e.g.:
./watchdog-test -f /dev/zero
Watchdog Ticking Away!
Fix it by confirming that the WDIOC_GETSUPPORT ioctl succeeds.
Fixes: c3f2490d6e9257 ("selftests: watchdog: Add optional file argument")
Reported-by: Eugeniu Rosca <erosca(a)de.adit-jv.com>
Signed-off-by: George G. Davis <george_davis(a)mentor.com>
Signed-off-by: Eugeniu Rosca <erosca(a)de.adit-jv.com>
---
v3:
- Used v1 as starting point and simplified commit description
- Added Fixes tag (WARNING: commit id is from linux-next!)
- No change in the contents
- Applied cleanly to the same base as used in [v1]
v2:
- https://patchwork.kernel.org/patch/11147663/
v1:
- https://patchwork.kernel.org/patch/11136283/
- Applied/tested on commit ce54eab71e210f ("kunit: fix failure to build without printk") of
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git/l…
---
tools/testing/selftests/watchdog/watchdog-test.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/watchdog/watchdog-test.c b/tools/testing/selftests/watchdog/watchdog-test.c
index afff120c7be6..6ed822dc2222 100644
--- a/tools/testing/selftests/watchdog/watchdog-test.c
+++ b/tools/testing/selftests/watchdog/watchdog-test.c
@@ -97,6 +97,7 @@ int main(int argc, char *argv[])
int c;
int oneshot = 0;
char *file = "/dev/watchdog";
+ struct watchdog_info info;
setbuf(stdout, NULL);
@@ -118,6 +119,16 @@ int main(int argc, char *argv[])
exit(-1);
}
+ /*
+ * Validate that `file` is a watchdog device
+ */
+ ret = ioctl(fd, WDIOC_GETSUPPORT, &info);
+ if (ret) {
+ printf("WDIOC_GETSUPPORT error '%s'\n", strerror(errno));
+ close(fd);
+ exit(ret);
+ }
+
optind = 0;
while ((c = getopt_long(argc, argv, sopts, lopts, NULL)) != -1) {
--
2.23.0
Problem:
Currently tasks attempting to allocate more hugetlb memory than is available get
a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
However, if a task attempts to allocate hugetlb memory only more than its
hugetlb_cgroup limit allows, the kernel will allow the mmap/shmget call,
but will SIGBUS the task when it attempts to fault the memory in.
We have developers interested in using hugetlb_cgroups, and they have expressed
dissatisfaction regarding this behavior. We'd like to improve this
behavior such that tasks violating the hugetlb_cgroup limits get an error on
mmap/shmget time, rather than getting SIGBUS'd when they try to fault
the excess memory in.
The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time.
Thus, enforcing the hugetlb_cgroup limit only happens at fault time, and
the offending task gets SIGBUS'd.
Proposed Solution:
A new page counter named hugetlb.xMB.reservation_[limit|usage]_in_bytes. This
counter has slightly different semantics than
hugetlb.xMB.[limit|usage]_in_bytes:
- While usage_in_bytes tracks all *faulted* hugetlb memory,
reservation_usage_in_bytes tracks all *reserved* hugetlb memory.
- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than reservation_limit_in_bytes, the kernel will fail this
reservation.
This proposal is implemented in this patch, with tests to verify
functionality and show the usage.
Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to
the existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
duplication with hugetlb_cgroup. Keeping hugetlb related page counters under
hugetlb_cgroup seemed cleaner as well.
2. Instead of adding a new counter, we considered adding a sysctl that modifies
the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do accounting at
reservation time rather than fault time. Adding a new page_counter seems
better as userspace could, if it wants, choose to enforce different cgroups
differently: one via limit_in_bytes, and another via
reservation_limit_in_bytes. This could be very useful if you're
transitioning how hugetlb memory is partitioned on your system one
cgroup at a time, for example. Also, someone may find usage for both
limit_in_bytes and reservation_limit_in_bytes concurrently, and this
approach gives them the option to do so.
Caveats:
1. This support is implemented for cgroups-v1. I have not tried
hugetlb_cgroups with cgroups v2, and AFAICT it's not supported yet.
This is largely because we use cgroups-v1 for now. If required, I
can add hugetlb_cgroup support to cgroups v2 in this patch or
a follow up.
2. Most complicated bit of this patch I believe is: where to store the
pointer to the hugetlb_cgroup to uncharge at unreservation time?
Normally the cgroup pointers hang off the struct page. But, with
hugetlb_cgroup reservations, one task can reserve a specific page and another
task may fault it in (I believe), so storing the pointer in struct
page is not appropriate. Proposed approach here is to store the pointer in
the resv_map. See patch for details.
Testing:
- Added tests passing.
- libhugetlbfs tests mostly passing, but some tests have trouble with and
without this patch series. Seems environment issue rather than code:
- Overall results:
********** TEST SUMMARY
* 2M
* 32-bit 64-bit
* Total testcases: 84 0
* Skipped: 0 0
* PASS: 66 0
* FAIL: 14 0
* Killed by signal: 0 0
* Bad configuration: 4 0
* Expected FAIL: 0 0
* Unexpected PASS: 0 0
* Test not present: 0 0
* Strange test result: 0 0
**********
- Failing tests:
- elflink_rw_and_share_test("linkhuge_rw") segfaults with and without this
patch series.
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 32):
FAIL Address is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc
HUGETLB_MORECORE=yes malloc (2M: 32):
FAIL Address is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 32):
FAIL Address is not hugepage
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
HUGETLB_MORECORE=yes heapshrink (2M: 32):
FAIL Heap not on hugepages
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 32):
FAIL Heap not on hugepages
- HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAIL small_data is not hugepage
- HUGETLB_ELFMAP=RW HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32):
FAIL small_data is not hugepage
- alloc-instantiate-race shared (2M: 32):
Bad configuration: sched_setaffinity(cpu1): Invalid argument -
FAIL Child 1 killed by signal Killed
- shmoverride_linked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- HUGETLB_SHM=yes shmoverride_linked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- shmoverride_linked_static (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- HUGETLB_SHM=yes shmoverride_linked_static (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- LD_PRELOAD=libhugetlbfs.so shmoverride_unlinked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- LD_PRELOAD=libhugetlbfs.so HUGETLB_SHM=yes shmoverride_unlinked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
[1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
Changes in v4:
- Split up 'hugetlb_cgroup: add accounting for shared mappings' into 4 patches
for better isolation and context on the individual changes:
- hugetlb_cgroup: add accounting for shared mappings
- hugetlb: disable region_add file_region coalescing
- hugetlb: remove duplicated code
- hugetlb: region_chg provides only cache entry
- Fixed resv->adds_in_progress accounting.
- Retained behavior that region_add never fails, in earlier patchsets region_add
could return failure.
- Fixed libhugetlbfs failure.
- Minor fix to the added tests that was preventing them from running on some
environments.
Changes in v3:
- Addressed comments of Hillf Danton:
- Added docs.
- cgroup_files now uses enum.
- Various readability improvements.
- Addressed comments of Mike Kravetz.
- region_* functions no longer coalesce file_region entries in the resv_map.
- region_add() and region_chg() refactored to make them much easier to
understand and remove duplicated code so this patch doesn't add too much
complexity.
- Refactored common functionality into helpers.
Changes in v2:
- Split the patch into a 5 patch series.
- Fixed patch subject.
Mina Almasry (9):
hugetlb_cgroup: Add hugetlb_cgroup reservation counter
hugetlb_cgroup: add interface for charge/uncharge hugetlb reservations
hugetlb_cgroup: add reservation accounting for private mappings
hugetlb: region_chg provides only cache entry
hugetlb: remove duplicated code
hugetlb: disable region_add file_region coalescing
hugetlb_cgroup: add accounting for shared mappings
hugetlb_cgroup: Add hugetlb_cgroup reservation tests
hugetlb_cgroup: Add hugetlb_cgroup reservation docs
.../admin-guide/cgroup-v1/hugetlb.rst | 84 ++-
include/linux/hugetlb.h | 24 +-
include/linux/hugetlb_cgroup.h | 24 +-
mm/hugetlb.c | 516 +++++++++++-------
mm/hugetlb_cgroup.c | 189 +++++--
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 4 +
.../selftests/vm/charge_reserved_hugetlb.sh | 440 +++++++++++++++
.../selftests/vm/write_hugetlb_memory.sh | 22 +
.../testing/selftests/vm/write_to_hugetlbfs.c | 252 +++++++++
10 files changed, 1304 insertions(+), 252 deletions(-)
create mode 100755 tools/testing/selftests/vm/charge_reserved_hugetlb.sh
create mode 100644 tools/testing/selftests/vm/write_hugetlb_memory.sh
create mode 100644 tools/testing/selftests/vm/write_to_hugetlbfs.c
--
2.23.0.162.g0b9fbb3734-goog
This patchset is being developed here:
<https://github.com/cyphar/linux/tree/resolveat/master>
Patch changelog:
v12:
* Remove @how->reserved field from openat2(2), and instead use the
(struct, size) design for syscall extensions.
* Implement copy_struct_{to,from}_user() to unify (struct, size)
syscall extension designs (as well as make them slightly more
efficient by using memchr_inv() as well as using buffers and
avoiding repeated access_ok() checks for trailing byte operations).
* Port sched_setattr(), perf_event_attr(), and clone3() to use the
new helpers.
v11: <https://lore.kernel.org/lkml/20190820033406.29796-1-cyphar@cyphar.com/>
<https://lore.kernel.org/lkml/20190728010207.9781-1-cyphar@cyphar.com/>
v10: <https://lore.kernel.org/lkml/20190719164225.27083-1-cyphar@cyphar.com/>
v09: <https://lore.kernel.org/lkml/20190706145737.5299-1-cyphar@cyphar.com/>
v08: <https://lore.kernel.org/lkml/20190520133305.11925-1-cyphar@cyphar.com/>
v07: <https://lore.kernel.org/lkml/20190507164317.13562-1-cyphar@cyphar.com/>
v06: <https://lore.kernel.org/lkml/20190506165439.9155-1-cyphar@cyphar.com/>
v05: <https://lore.kernel.org/lkml/20190320143717.2523-1-cyphar@cyphar.com/>
v04: <https://lore.kernel.org/lkml/20181112142654.341-1-cyphar@cyphar.com/>
v03: <https://lore.kernel.org/lkml/20181009070230.12884-1-cyphar@cyphar.com/>
v02: <https://lore.kernel.org/lkml/20181009065300.11053-1-cyphar@cyphar.com/>
v01: <https://lore.kernel.org/lkml/20180929103453.12025-1-cyphar@cyphar.com/>
The need for some sort of control over VFS's path resolution (to avoid
malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a
revival of Al Viro's old AT_NO_JUMPS[1,2] patchset (which was a variant
of David Drysdale's O_BENEATH patchset[3] which was a spin-off of the
Capsicum project[4]) with a few additions and changes made based on the
previous discussion within [5] as well as others I felt were useful.
In line with the conclusions of the original discussion of AT_NO_JUMPS,
the flag has been split up into separate flags. However, instead of
being an openat(2) flag it is provided through a new syscall openat2(2)
which provides several other improvements to the openat(2) interface (see the
patch description for more details). The following new LOOKUP_* flags are
added:
* LOOKUP_NO_XDEV blocks all mountpoint crossings (upwards, downwards,
or through absolute links). Absolute pathnames alone in openat(2) do
not trigger this.
* LOOKUP_NO_MAGICLINKS blocks resolution through /proc/$pid/fd-style
links. This is done by blocking the usage of nd_jump_link() during
resolution in a filesystem. The term "magic-links" is used to match
with the only reference to these links in Documentation/, but I'm
happy to change the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
* LOOKUP_BENEATH disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed. Conceptually this flag is to
ensure you "stay below" a certain point in the filesystem tree --
but this requires some additional to protect against various races
that would allow escape using "..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done as
in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
* LOOKUP_NO_SYMLINKS does what it says on the tin. No symlink
resolution is allowed at all, including magic-links. Just as with
LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an
fd for the symlink as long as no parent path had a symlink
component.
* LOOKUP_IN_ROOT is an extension of LOOKUP_BENEATH that, rather than
blocking attempts to move past the root, forces all such movements
to be scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that chroot(2)
is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to cross
magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[6] when opening
paths in a potentially malicious container. There is a long list of
CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT
(such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
CVE-2019-5736, just to name a few).
And further, several semantics of file descriptor "re-opening" are now
changed to prevent attacks like CVE-2019-5736 by restricting how
magic-links can be resolved (based on their mode). This required some
other changes to the semantics of the modes of O_PATH file descriptor's
associated /proc/self/fd magic-links. openat2(2) has the ability to
further restrict re-opening of its own O_PATH fds, so that users can
make even better use of this feature.
Finally, O_EMPTYPATH was added so that users can do /proc/self/fd-style
re-opening without depending on procfs. The new restricted semantics for
magic-links are applied here too.
In order to make all of the above more usable, I'm working on
libpathrs[7] which is a C-friendly library for safe path resolution. It
features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Eric Biederman <ebiederm(a)xmission.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Christian Brauner <christian(a)brauner.io>
Cc: David Drysdale <drysdale(a)google.com>
Cc: Tycho Andersen <tycho(a)tycho.ws>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
[1]: https://lwn.net/Articles/721443/
[2]: https://lore.kernel.org/patchwork/patch/784221/
[3]: https://lwn.net/Articles/619151/
[4]: https://lwn.net/Articles/603929/
[5]: https://lwn.net/Articles/723057/
[6]: https://github.com/cyphar/filepath-securejoin
[7]: https://github.com/openSUSE/libpathrs
Aleksa Sarai (12):
lib: introduce copy_struct_{to,from}_user helpers
clone3: switch to copy_struct_from_user()
sched_setattr: switch to copy_struct_{to,from}_user()
perf_event_open: switch to copy_struct_from_user()
namei: obey trailing magic-link DAC permissions
procfs: switch magic-link modes to be more sane
open: O_EMPTYPATH: procfs-less file descriptor re-opening
namei: O_BENEATH-style path resolution flags
namei: LOOKUP_IN_ROOT: chroot-like path resolution
namei: aggressively check for nd->root escape on ".." resolution
open: openat2(2) syscall
selftests: add openat2(2) selftests
Documentation/filesystems/path-lookup.rst | 12 +-
arch/alpha/include/uapi/asm/fcntl.h | 1 +
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/include/uapi/asm/fcntl.h | 39 +-
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/include/uapi/asm/fcntl.h | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/fcntl.c | 2 +-
fs/internal.h | 1 +
fs/namei.c | 270 ++++++++++--
fs/open.c | 100 ++++-
fs/proc/base.c | 20 +-
fs/proc/fd.c | 23 +-
fs/proc/namespaces.c | 2 +-
include/linux/fcntl.h | 21 +-
include/linux/fs.h | 8 +-
include/linux/namei.h | 9 +
include/linux/syscalls.h | 14 +-
include/linux/uaccess.h | 5 +
include/uapi/asm-generic/fcntl.h | 4 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/fcntl.h | 42 ++
include/uapi/linux/sched.h | 2 +
kernel/events/core.c | 45 +-
kernel/fork.c | 34 +-
kernel/sched/core.c | 85 +---
lib/Makefile | 2 +-
lib/struct_user.c | 182 ++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/memfd/memfd_test.c | 7 +-
tools/testing/selftests/openat2/.gitignore | 1 +
tools/testing/selftests/openat2/Makefile | 8 +
tools/testing/selftests/openat2/helpers.c | 167 ++++++++
tools/testing/selftests/openat2/helpers.h | 118 +++++
.../testing/selftests/openat2/linkmode_test.c | 333 +++++++++++++++
.../testing/selftests/openat2/openat2_test.c | 106 +++++
.../selftests/openat2/rename_attack_test.c | 127 ++++++
.../testing/selftests/openat2/resolve_test.c | 402 ++++++++++++++++++
53 files changed, 1971 insertions(+), 248 deletions(-)
create mode 100644 lib/struct_user.c
create mode 100644 tools/testing/selftests/openat2/.gitignore
create mode 100644 tools/testing/selftests/openat2/Makefile
create mode 100644 tools/testing/selftests/openat2/helpers.c
create mode 100644 tools/testing/selftests/openat2/helpers.h
create mode 100644 tools/testing/selftests/openat2/linkmode_test.c
create mode 100644 tools/testing/selftests/openat2/openat2_test.c
create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
create mode 100644 tools/testing/selftests/openat2/resolve_test.c
--
2.23.0