Hello,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing is executed
and the test results are confusing.
There is no behavior change in the tests themselves.
[patch 1] Make write_schemata() run to set up schemata with 100% allocation
on the first run in the MBM test.
[patch 2] The MBA test result message is always output as "ok".
Make the output message "not ok" if the MBA check fails.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] Add a signal handler to clean up properly before exiting the
parent process if an error occurs after creating
a child process with fork() in the CAT test.
[patch 5] Before exiting each test (CMT/CAT/MBM/MBA), the functions that
clear the test result files, cat/cmt/mbm/mba_test_cleanup(), are called
twice. Remove one of the calls.
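
The buffering issue behind patch 3 can be reproduced outside the selftest.
The following is a minimal sketch, not code from the series: the helper
name lines_after_fork() and the use of tmpfile() are assumptions made for
illustration. It writes one line into a fully buffered stream, forks a
child that exits immediately, and counts how many lines reach the file.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper, not code from the series.
 * Returns the number of lines found in the file, or -1 on error.
 */
static int lines_after_fork(int flush_before_fork)
{
	char line[64];
	int lines = 0;
	pid_t pid;
	FILE *f = tmpfile();	/* regular file, so stdio fully buffers it */

	if (!f)
		return -1;

	/* This text stays in the stdio buffer until it is flushed. */
	fprintf(f, "pending message\n");

	if (flush_before_fork)
		fflush(f);	/* the fix: empty the buffer before fork() */

	pid = fork();
	if (pid < 0) {
		fclose(f);
		return -1;
	}
	if (pid == 0)
		exit(0);	/* exit() flushes the child's copy of the buffer */

	waitpid(pid, NULL, 0);
	fflush(f);		/* the parent writes out whatever it still holds */
	rewind(f);
	while (fgets(line, sizeof(line), f))
		lines++;
	fclose(f);
	return lines;
}
```

Without the flush, both the child and the parent write out their copy of
the buffer, so the message appears twice. With the flush, the buffer is
already empty when fork() copies it and the message appears once.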
This patch series is based on Linux v6.1-rc5
Difference from v3:
[patch 2]
Rename "failed" to "ret" to avoid confusion.
[patch 4]
- Use sigaction(2) instead of signal().
- Add a description of using global bm_pid in commit message.
- Add comments to clarify why the child is left to continue in its
infinite loop after write() fails.
[patch 5]
Make sure cat/cmt/mbm/mba_test_cleanup() runs to clear the test result
file before returning if an error occurs.
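
The cleanup approach described for patch 4 can be sketched as follows.
This is an illustration under assumptions, not the patch itself: only the
global bm_pid and the use of sigaction(2) come from the description above,
while cleanup_handler(), install_cleanup_handler() and run_cleanup_demo()
are hypothetical names, and raise(SIGTERM) merely stands in for the error
path of the parent.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Global so that the signal handler can reach the child's PID. */
static volatile pid_t bm_pid;
static volatile sig_atomic_t got_signal;

/* Hypothetical handler: stop the child before the parent goes away. */
static void cleanup_handler(int signo)
{
	got_signal = signo;
	if (bm_pid > 0)
		kill(bm_pid, SIGKILL);	/* kill() is async-signal-safe */
}

static int install_cleanup_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = cleanup_handler;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, NULL) || sigaction(SIGTERM, &sa, NULL))
		return -1;
	return 0;
}

/* Returns 1 if the child was terminated by the handler, -1 on error. */
static int run_cleanup_demo(void)
{
	int status;
	pid_t pid;

	if (install_cleanup_handler())
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		for (;;)	/* child: stands in for the benchmark loop */
			pause();
	}
	bm_pid = pid;

	raise(SIGTERM);		/* simulate the parent's error path */

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

Keeping the PID in a global is what lets the handler act without any
arguments. A real test would also remove its result files at this point.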
Previous versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (5):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 31 +++++++++++++------
tools/testing/selftests/resctrl/cmt_test.c | 7 ++---
tools/testing/selftests/resctrl/mba_test.c | 23 +++++++-------
tools/testing/selftests/resctrl/mbm_test.c | 20 ++++++------
.../testing/selftests/resctrl/resctrl_tests.c | 4 ---
tools/testing/selftests/resctrl/resctrl_val.c | 1 +
tools/testing/selftests/resctrl/resctrlfs.c | 5 ++-
7 files changed, 50 insertions(+), 41 deletions(-)
--
2.27.0
Changes in v6:
- Updated the interface and made cosmetic changes
Original Cover Letter in v5:
Hello,
This patch series implements an IOCTL on the pagemap procfs file to get
information about the page table entries (PTEs). The following operations
are supported by this IOCTL:
- Get information on whether the pages are soft-dirty, file mapped, present
or swapped.
- Clear the soft-dirty PTE bit of the pages.
- Get and clear the soft-dirty PTE bit of the pages atomically.
Soft-dirty PTE bit of the memory pages can be read by using the pagemap
procfs file. The soft-dirty PTE bit for the whole memory range of the
process can be cleared by writing to the clear_refs file. There are other
methods to mimic this information entirely in userspace with poor
performance:
- The mprotect syscall and SIGSEGV handler for bookkeeping
- The userfaultfd syscall with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series addresses two limitations
of the existing interfaces:
- The soft-dirty PTE bit status cannot be read and cleared in one atomic
operation.
- The soft-dirty PTE bit cannot be cleared for only a part of memory.
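
For comparison, the mprotect-based userspace method mentioned above can be
sketched like this. It is an illustration only: all names (region, dirty,
tracking_start(), touch_page()) are hypothetical, and calling mprotect()
from a signal handler is common practice on Linux but is not on the POSIX
list of async-signal-safe functions.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4

static char *region;
static long page_size;
static volatile sig_atomic_t dirty[NPAGES];	/* userspace bookkeeping */

/* The first write to a protected page lands here. */
static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
	uintptr_t addr = (uintptr_t)si->si_addr;
	uintptr_t base = (uintptr_t)region;
	size_t idx;

	(void)sig;
	(void)ctx;

	if (addr < base || addr >= base + (uintptr_t)(NPAGES * page_size)) {
		/* Not our region: let the retried access crash normally. */
		signal(SIGSEGV, SIG_DFL);
		return;
	}

	idx = (addr - base) / (uintptr_t)page_size;
	dirty[idx] = 1;
	/* Make the page writable so that the faulting write can be retried. */
	mprotect(region + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map the region, install the handler and write-protect every page. */
static int tracking_start(void)
{
	struct sigaction sa;
	int i;

	page_size = sysconf(_SC_PAGESIZE);
	region = mmap(NULL, NPAGES * page_size, PROT_READ | PROT_WRITE,
		      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = segv_handler;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, NULL))
		return -1;

	for (i = 0; i < NPAGES; i++)
		dirty[i] = 0;
	return mprotect(region, NPAGES * page_size, PROT_READ);
}

/* Volatile store so the write is not reordered around reads of dirty[]. */
static void touch_page(size_t idx)
{
	((volatile char *)region)[idx * page_size] = 1;
}
```

Every first write to a page costs a fault, a signal delivery and an
mprotect() call, which is where the poor performance of this method comes
from.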
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have a use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this tracking and clear mechanism
for a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows. This syscall is used by games to
keep track of dirty pages so that only the dirty pages are processed.
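
The existing procfs interface described above can be exercised with a few
helpers. This is a minimal sketch with hypothetical function names. It
relies on two details from the kernel documentation: bit 55 of a pagemap
entry is the soft-dirty bit, and writing "4" to clear_refs clears the
soft-dirty bits. A kernel with soft-dirty tracking enabled is assumed.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_SOFT_DIRTY_BIT 55	/* soft-dirty bit of a pagemap entry */

static int pagemap_entry_is_soft_dirty(uint64_t entry)
{
	return (int)((entry >> PM_SOFT_DIRTY_BIT) & 1);
}

/*
 * Returns 1 or 0 for the soft-dirty state of the page containing addr in
 * the calling process, or -1 on error.
 */
static int page_is_soft_dirty(const void *addr)
{
	uint64_t entry;
	uintptr_t psz = (uintptr_t)sysconf(_SC_PAGESIZE);
	/* One 64-bit entry per virtual page, indexed by page number. */
	off_t off = (off_t)((uintptr_t)addr / psz * sizeof(entry));
	ssize_t n;
	int fd = open("/proc/self/pagemap", O_RDONLY);

	if (fd < 0)
		return -1;
	n = pread(fd, &entry, sizeof(entry), off);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;
	return pagemap_entry_is_soft_dirty(entry);
}

/*
 * Clears the soft-dirty bits of the whole address space. The range
 * cannot be limited, which is the restriction discussed above.
 */
static int clear_all_soft_dirty(void)
{
	ssize_t n;
	int fd = open("/proc/self/clear_refs", O_WRONLY);

	if (fd < 0)
		return -1;
	n = write(fd, "4", 1);
	close(fd);
	return n == 1 ? 0 : -1;
}
```

Reading the status and clearing it are two separate steps here, so a write
that happens between them can be missed.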
The information on whether a page is file mapped, present or swapped
is required for the CRIU project[2][3]. The required mask, any mask,
excluded mask and return mask are also needed
for the CRIU project[2].
The IOCTL returns the addresses of the pages which match the specific masks.
The page addresses are returned in struct page_region in a compact form.
The max_pages is needed to support a use case where the user only wants to
get a specific number of pages, so there is no need to find all the pages of
interest in the range when max_pages is specified. The IOCTL returns when
the maximum number of pages has been found. The max_pages is optional. If
max_pages is specified, it must be equal to or greater than the vec_size.
This restriction is needed to handle the worst case, when each page_region
contains info about only one page and cannot be compacted. This is needed to
emulate the Windows getWriteWatch() syscall.
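
The compaction of page addresses into regions and the role of max_pages
can be illustrated with a small userspace model. This is a sketch under
assumptions, not the kernel code: struct region_sketch and compact_pages()
are hypothetical and do not reproduce the layout of struct page_region.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a compacted region: a run of adjacent pages. */
struct region_sketch {
	uint64_t start;	/* first page number of the run */
	uint64_t len;	/* number of pages in the run */
};

/*
 * pages: page numbers that matched, sorted in ascending order.
 * max_pages: stop after this many pages; 0 means no limit.
 * Returns the number of regions written to vec.
 */
static size_t compact_pages(const uint64_t *pages, size_t n,
			    struct region_sketch *vec, size_t vec_len,
			    size_t max_pages)
{
	size_t used = 0, taken = 0, i;

	for (i = 0; i < n; i++) {
		if (max_pages && taken == max_pages)
			break;

		if (used && vec[used - 1].start + vec[used - 1].len == pages[i]) {
			/* Adjacent to the previous run: extend it. */
			vec[used - 1].len++;
		} else {
			/* A new run needs a free slot in the vector. */
			if (used == vec_len)
				break;
			vec[used].start = pages[i];
			vec[used].len = 1;
			used++;
		}
		taken++;
	}
	return used;
}
```

For the pages 10, 11, 12, 20, 30, 31 this yields three regions. When no two
pages are adjacent, each region holds a single page, which is the worst
case mentioned above.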
Some non-dirty pages get marked as dirty because of the kernel's
internal activity (such as VMA merging, as the soft-dirty bit difference
isn't considered while deciding to merge VMAs). The dirty bit of the pages
is stored in the VMA flags and in the per-page flags. If either of these two
bits is set, the page is considered to be soft-dirty. Suppose you have
cleared the soft-dirty bit of half of a VMA, which is done by splitting the
VMA and clearing the soft-dirty flag in that half VMA and the pages in it.
Now the kernel may decide to merge the VMAs again, so the half VMA becomes
dirty again. This splitting/merging costs performance. The application
receives a lot of pages which aren't dirty in reality but are marked as
dirty. Performance is lost again here. Also, sometimes the user doesn't want
newly allocated memory to be marked as dirty. The PAGEMAP_NO_REUSED_REGIONS
flag solves both problems. It is used to not depend on the soft-dirty flag
in the VMA flags, so VMA splitting and merging doesn't happen. It only
depends on the soft-dirty bit of the individual pages. Thus, when using this
flag, there may be a scenario in which newly created memory regions
don't look dirty when seen through the IOCTL, but look dirty
when seen from procfs. This seems okay, as the user of this flag knows the
implications of using it.
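
The rule described above can be written down as a small predicate. This is
a simplified model, not kernel code: page_reported_dirty() and its boolean
parameters are hypothetical, and only the PAGEMAP_NO_REUSED_REGIONS flag
name comes from the series.

```c
#include <stdbool.h>

/*
 * vma_soft_dirty:    soft-dirty flag stored in the VMA flags
 * pte_soft_dirty:    soft-dirty bit of the individual page
 * no_reused_regions: models PAGEMAP_NO_REUSED_REGIONS being set
 */
static bool page_reported_dirty(bool vma_soft_dirty, bool pte_soft_dirty,
				bool no_reused_regions)
{
	if (no_reused_regions)
		return pte_soft_dirty;	/* the VMA flag is ignored */

	/* Default view: either bit marks the page as dirty. */
	return vma_soft_dirty || pte_soft_dirty;
}
```

A page in a freshly created or re-merged VMA has the VMA flag set but a
clean per-page bit, so it is reported as dirty only when the flag is not
used.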
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[3] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (3):
fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit
fs/proc/task_mmu: Implement IOCTL to get and/or the clear info about PTEs
selftests: vm: add pagemap ioctl tests
fs/proc/task_mmu.c | 410 +++++++++++-
include/uapi/linux/fs.h | 56 ++
tools/include/uapi/linux/fs.h | 56 ++
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 5 +-
tools/testing/selftests/vm/pagemap_ioctl.c | 698 +++++++++++++++++++++
6 files changed, 1193 insertions(+), 33 deletions(-)
create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
--
2.30.2
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Notes:
- This patch set addresses the kernel panic described below, and not the
broader issue of accessing kernel objects whose pointer is passed
as a parameter by LSM hooks
- Alternative approaches trying to limit return values at run-time either
in the security subsystem or in the eBPF JIT are not preferred by the
respective maintainers
- Although all eBPF selftests have been verified to pass, it still might
be cumbersome to get an eBPF program accepted by the eBPF
verifier (e.g. ANDing negative numbers causes existing bounds to be lost)
- The patch to store whether a register state changed due to an ALU64 or an
ALU32 operation might not be correct/complete, a review by eBPF
maintainers would be needed
- This patch set requires "lsm: make security_socket_getpeersec_stream()
sockptr_t safe", in lsm/next
- The modification of the LSM infrastructure to define allowed return
values for the LSM hooks could be replaced with an eBPF-only fix, with
the drawback of having to update the information manually each time a
new hook is added; allowing zero or negative values by default could be
reasonable, but there are already exceptions of LSM hooks accepting 0 or
1 (ismaclabel)
- The patches to fix the LSM infrastructure documentation are separated
from this patch set and available here:
https://lore.kernel.org/linux-security-module/20221128144240.210110-1-rober…
BPF LSM defines attachment points to allow security modules (eBPF programs
with type LSM) to provide their implementation of the desired LSM hooks.
Unfortunately, BPF LSM does not restrict which values security modules can
return (for non-void LSM hooks). If they return arbitrary values instead of
those stated in include/linux/lsm_hooks.h, they could cause serious problems.
For example, this simple eBPF program:
SEC("lsm/inode_permission")
int BPF_PROG(test_int_hook, struct inode *inode, int mask)
{
	return 1;
}
causes the following kernel panic:
[ 181.130807] BUG: kernel NULL pointer dereference, address: 0000000000000079
[ 181.131478] #PF: supervisor read access in kernel mode
[ 181.131942] #PF: error_code(0x0000) - not-present page
[ 181.132407] PGD 0 P4D 0
[ 181.132650] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 181.133054] CPU: 5 PID: 857 Comm: systemd-oomd Tainted: G OE 6.1.0-rc7+ #530
[ 181.133806] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 181.134601] RIP: 0010:do_sys_openat2+0x235/0x300
[...]
[ 181.136682] RSP: 0018:ffffc90001557ee0 EFLAGS: 00010203
[ 181.137154] RAX: 0000000000000001 RBX: ffffc90001557f20 RCX: ffff888112003380
[ 181.137790] RDX: 0000000000000000 RSI: ffffffff8280b026 RDI: ffffc90001557e28
[ 181.138432] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 181.139081] R10: ffffffff835097dc R11: 0000000000000000 R12: ffff888106118000
[ 181.139717] R13: 000000000000000c R14: 0000000000000000 R15: 0000000000000000
[ 181.140149] FS: 00007fa6ceb0bb40(0000) GS:ffff88846fb40000(0000) knlGS:0000000000000000
[ 181.140556] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 181.140865] CR2: 0000000000000079 CR3: 0000000135c50000 CR4: 0000000000350ee0
[ 181.141239] Call Trace:
[ 181.141373] <TASK>
[ 181.141495] do_sys_open+0x34/0x60
[ 181.141678] do_syscall_64+0x3b/0x90
[ 181.141875] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Avoid this situation by statically analyzing the eBPF programs attaching to
LSM hooks, and ensuring that their return values are compatible with the LSM
infrastructure conventions.
First, add a preliminary patch (patch 1) to fix a small code duplication
issue.
Extend the eBPF verifier to let BPF LSM determine whether it should check
estimated 64 bit values or the 32 bit ones (patch 2). Also, extend the LSM
infrastructure to record more precisely the allowed return values depending
on the documentation found in include/linux/lsm_hooks.h (patch 3). Add the
LSM_RET_NEG, LSM_RET_ZERO, LSM_RET_ONE, LSM_RET_GT_ONE flags to an LSM hook
if that hook allows, respectively, < 0, 0, 1, > 1 return values.
Then, extend BPF LSM to verify that return values, estimated by the
verifier by analyzing the eBPF program, fall in the allowed intervals found
from the return value flags of the LSM hook being attached to (patch 4).
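
The interval check described above can be sketched in plain C. This is an
illustration under assumptions, not the code of patch 4: the flag names
come from this cover letter, but their bit values and the helper
lsm_ret_range_allowed() are hypothetical, and the estimated range is
modeled as a signed [smin, smax] pair.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names from the cover letter; the bit values are assumed. */
#define LSM_RET_NEG	(1u << 0)	/* hook allows values < 0 */
#define LSM_RET_ZERO	(1u << 1)	/* hook allows 0 */
#define LSM_RET_ONE	(1u << 2)	/* hook allows 1 */
#define LSM_RET_GT_ONE	(1u << 3)	/* hook allows values > 1 */

/*
 * Accept the program only if every value in [smin, smax] falls in a
 * class that the hook's flags allow.
 */
static bool lsm_ret_range_allowed(unsigned int flags, int64_t smin,
				  int64_t smax)
{
	if (smin > smax)
		return false;
	if (smin < 0 && !(flags & LSM_RET_NEG))
		return false;
	if (smin <= 0 && smax >= 0 && !(flags & LSM_RET_ZERO))
		return false;
	if (smin <= 1 && smax >= 1 && !(flags & LSM_RET_ONE))
		return false;
	if (smax > 1 && !(flags & LSM_RET_GT_ONE))
		return false;
	return true;
}
```

For a hook that only allows zero or negative values, a program whose
return value is estimated as exactly 1 is rejected, while one estimated to
return a value between -13 and 0 is accepted.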
Finally, add new tests to ensure that the verifier enforces return values
correctly (patch 5), and slightly modify existing tests so that they follow
the LSM infrastructure conventions and are accepted by the
verifier (patches 6-7).
Changelog:
v1:
- Complete the documentation of return values in lsm_hooks.h
- Introduce return value flags in the LSM infrastructure
- Use those flags instead of the scattered logic (suggested by KP)
- Expose a single verification function to the verifier (suggested by KP)
- Add new patch to remove duplicated function definition
- Add new patch to let BPF LSM determine the appropriate register values
to use
Roberto Sassu (7):
bpf: Remove superfluous btf_id_set_contains() declaration
bpf: Mark ALU32 operations in bpf_reg_state structure
lsm: Redefine LSM_HOOK() macro to add return value flags as argument
bpf-lsm: Enforce return value limitations on security modules
selftests/bpf: Check if return values of LSM programs are allowed
selftests/bpf: Prevent positive ret values in test_lsm and
verify_pkcs7_sig
selftests/bpf: Change return value in test_libbpf_get_fd_by_id_opts.c
include/linux/bpf.h | 1 -
include/linux/bpf_lsm.h | 11 +-
include/linux/bpf_verifier.h | 1 +
include/linux/lsm_hook_defs.h | 780 ++++++++++--------
include/linux/lsm_hooks.h | 9 +-
kernel/bpf/bpf_lsm.c | 81 +-
kernel/bpf/verifier.c | 17 +-
security/bpf/hooks.c | 2 +-
security/security.c | 4 +-
tools/testing/selftests/bpf/progs/lsm.c | 4 +
.../bpf/progs/test_libbpf_get_fd_by_id_opts.c | 7 +-
.../bpf/progs/test_verify_pkcs7_sig.c | 11 +-
.../testing/selftests/bpf/verifier/lsm_ret.c | 148 ++++
13 files changed, 729 insertions(+), 347 deletions(-)
create mode 100644 tools/testing/selftests/bpf/verifier/lsm_ret.c
--
2.25.1