This patchset provides support for the SRv6 End.DT4 and SRv6 End.DT6 (VRF mode)
behaviors.
The SRv6 End.DT4 is used to implement multi-tenant IPv4 L3 VPN. It decapsulates
the received packets and performs an IPv4 routing lookup in the routing table of
the tenant. The SRv6 End.DT4 Linux implementation leverages a VRF device. The
SRv6 End.DT4 is defined in the SRv6 Network Programming [1].
The Linux kernel already offers an implementation of the SRv6 End.DT6
behavior which permits IPv6 L3 VPNs over SRv6 networks. This new
implementation of DT6 is based on the same VRF infrastructure already
exploited for implementing the SRv6 End.DT4 behavior. The new SRv6 End.DT6
in VRF mode aims to simplify the construction of IPv6 L3 VPN services in
multi-tenant environments.
Currently, the two SRv6 End.DT6 implementations (legacy and VRF mode)
coexist seamlessly and can be chosen according to the context and the user
preferences.
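As a rough illustration of the decapsulation step shared by End.DT4 and End.DT6, the user-space C sketch below locates the inner IPv4 or IPv6 packet behind the outer IPv6 header and an optional SRH, rejecting packets whose Segments Left is not zero. It assumes the standard header layouts (40-byte IPv6 header, SRH carried as Routing header type 4) and does not model the lookup in the tenant VRF table; srv6_inner_offset() is a hypothetical name, not a function from net/ipv6/seg6_local.c.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA next-header values used below. */
enum {
	NH_IPV4    = 4,   /* IPv4-in-IPv6: inner packet for End.DT4 */
	NH_IPV6    = 41,  /* IPv6-in-IPv6: inner packet for End.DT6 */
	NH_ROUTING = 43,  /* IPv6 Routing header (the SRH is routing type 4) */
};

#define IPV6_HDR_LEN 40

/*
 * Hypothetical helper: return the offset of the inner packet, or -1 if the
 * packet is malformed, still has segments left, or does not carry the
 * expected inner protocol (want_nh is NH_IPV4 for DT4, NH_IPV6 for DT6).
 */
long srv6_inner_offset(const uint8_t *pkt, size_t len, uint8_t want_nh)
{
	size_t off = IPV6_HDR_LEN;
	uint8_t nh;

	if (len < IPV6_HDR_LEN || (pkt[0] >> 4) != 6)
		return -1;
	nh = pkt[6];                      /* Next Header of the outer IPv6 header */

	if (nh == NH_ROUTING) {
		size_t ext_len;

		if (len < off + 8)
			return -1;
		/* SRH: byte 0 next header, byte 1 length in 8-octet units
		 * (excluding the first 8 octets), byte 2 routing type,
		 * byte 3 segments left. */
		if (pkt[off + 2] != 4 || pkt[off + 3] != 0)
			return -1;
		ext_len = ((size_t)pkt[off + 1] + 1) * 8;
		if (len < off + ext_len)
			return -1;
		nh = pkt[off];
		off += ext_len;
	}

	if (nh != want_nh || off >= len)
		return -1;
	return (long)off;
}
```

In the kernel, the packet found at that offset would then be routed in the table bound to the tenant's VRF device.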
- Patch 1 is needed to solve a pre-existing issue with tunneled packets
when a sniffer is attached;
- Patch 2 improves the management of the seg6local attributes used by the
SRv6 behaviors;
- Patch 3 adds support for optional attributes in SRv6 behaviors;
- Patch 4 introduces two callbacks used for customizing the
creation/destruction of an SRv6 behavior;
- Patch 5 is the core patch that adds support for the SRv6 End.DT4
behavior;
- Patch 6 introduces the VRF support for SRv6 End.DT6 behavior;
- Patch 7 adds the selftest for SRv6 End.DT4 behavior;
- Patch 8 adds the selftest for SRv6 End.DT6 (VRF mode) behavior;
- Patch 9 adds the vrftable attribute for End.DT4/DT6 behaviors in iproute2.
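Patches 3 and 4 describe required versus optional behavior attributes and per-behavior build/destroy callbacks. The following user-space sketch shows one way such a descriptor could be checked when a behavior is created. All names (action_desc, ATTR_*, behavior_create, dt4_like) are hypothetical and only loosely modeled on net/ipv6/seg6_local.c; silently ignoring attributes that are neither required nor optional is a choice made for this sketch.

```c
#include <stdint.h>

/* Hypothetical attribute bits; the real SEG6_LOCAL_* values live in
 * include/uapi/linux/seg6_local.h and are not reproduced here. */
enum {
	ATTR_TABLE    = 1u << 0,
	ATTR_VRFTABLE = 1u << 1,
	ATTR_OPT_DEMO = 1u << 2,   /* made-up optional attribute */
};

struct behavior;

/* Required and optional attribute masks plus optional callbacks that
 * build/destroy behavior-specific state. */
struct action_desc {
	const char *name;
	uint32_t attrs;      /* every one of these must be supplied */
	uint32_t optattrs;   /* these may be supplied */
	int (*build_state)(struct behavior *b);
	void (*destroy_state)(struct behavior *b);
};

struct behavior {
	const struct action_desc *desc;
	uint32_t used;       /* attributes this behavior actually keeps */
	int state_built;
};

int behavior_create(struct behavior *b, const struct action_desc *d,
		    uint32_t supplied)
{
	b->desc = d;
	b->used = 0;
	b->state_built = 0;
	if ((supplied & d->attrs) != d->attrs)
		return -1;               /* a required attribute is missing */
	/* keep required and optional attributes, ignore anything else */
	b->used = supplied & (d->attrs | d->optattrs);
	if (d->build_state) {
		if (d->build_state(b))
			return -1;
		b->state_built = 1;
	}
	return 0;
}

void behavior_destroy(struct behavior *b)
{
	if (b->state_built && b->desc->destroy_state)
		b->desc->destroy_state(b);
	b->state_built = 0;
}

/* Demo callbacks that only count how often they run. */
static int builds, destroys;
static int demo_build(struct behavior *b) { (void)b; builds++; return 0; }
static void demo_destroy(struct behavior *b) { (void)b; destroys++; }

/* A DT4-like descriptor: needs a VRF table, accepts one optional attribute. */
const struct action_desc dt4_like = {
	.name = "End.DT4 (sketch)",
	.attrs = ATTR_VRFTABLE,
	.optattrs = ATTR_OPT_DEMO,
	.build_state = demo_build,
	.destroy_state = demo_destroy,
};
```

Behaviors without special state simply leave both callbacks NULL.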
I would like to thank David Ahern for his support during the development of
this patchset.
Comments, suggestions and improvements are very welcome!
Thanks,
Andrea Mayer
v3
notes about the build bot:
- apparently the ',' (comma) in the subject prefix confused the build bot.
Removed the ',' in favor of ' ' (space).
Thanks to David Ahern and Konstantin Ryabitsev for shedding light on this
fact.
Thanks also to Nathan Chancellor for trying to build the patchset v2 by
simulating the bot issue.
add new patch for iproute2:
- [9/9] seg6: add support for vrftable attribute in End.DT4/DT6 behaviors
add new patch:
- [8/9] selftests: add selftest for the SRv6 End.DT6 (VRF) behavior
add new patch:
- [6/9] seg6: add VRF support for SRv6 End.DT6 behavior
add new patch:
- [3/9] seg6: add support for optional attributes in SRv6 behaviors
selftests: add selftest for the SRv6 End.DT4 behavior
- keep David Ahern's review tag since the code wasn't changed. Thanks to David
Ahern for his review.
seg6: add support for the SRv6 End.DT4 behavior
- remove useless error in seg6_end_dt4_build();
- remove #ifdef/#endif stubs for DT4 when CONFIG_NET_L3_MASTER_DEV is not
defined;
- fix coding style.
Thanks to Jakub Kicinski for his review and for all his suggestions.
seg6: add callbacks for customizing the creation/destruction of a behavior
- remove typedef(s) slwt_{build/destroy}_state_t;
- fix coding style: remove empty lines, trivial comments and rename labels in
the seg6_local_build_state() function.
Thanks to Jakub Kicinski for his review and for all his suggestions.
seg6: improve management of behavior attributes
- remove defensive programming approach in destroy_attr_srh(),
destroy_attr_bpf() and destroy_attrs();
- change the __destroy_attrs() function signature, renaming the 'end' argument
'parsed_max'. Now, the __destroy_attrs() keeps only the 'parsed_max' and
'slwt' arguments.
Thanks to Jakub Kicinski for his review and for all his suggestions.
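The 'parsed_max' argument of __destroy_attrs() suggests a rollback pattern: attributes are parsed in index order, and when one fails, only those with a lower index are torn down. A user-space sketch of that pattern follows; attr_ops, parse_attrs() and destroy_attrs_upto() are hypothetical names, and the destroy operation deliberately has no defensive NULL checks because it is only called on attributes that were set up.

```c
#include <stdint.h>
#include <stdlib.h>

#define NATTRS 4

/* Hypothetical per-attribute operations: parse may allocate, destroy undoes it. */
struct attr_ops {
	int (*parse)(void **slot);
	void (*destroy)(void **slot);
};

/* Undo every wanted attribute whose index is below parsed_max. */
static void destroy_attrs_upto(const struct attr_ops *ops, uint32_t wanted,
			       int parsed_max, void **slots)
{
	for (int i = 0; i < parsed_max; i++)
		if (wanted & (1u << i))
			ops[i].destroy(&slots[i]);
}

/* Parse wanted attributes in index order; on failure at index i,
 * roll back the ones in [0, i) and report the error. */
int parse_attrs(const struct attr_ops *ops, uint32_t wanted, void **slots)
{
	for (int i = 0; i < NATTRS; i++) {
		if (!(wanted & (1u << i)))
			continue;
		if (ops[i].parse(&slots[i])) {
			destroy_attrs_upto(ops, wanted, i, slots);
			return -1;
		}
	}
	return 0;
}

/* Demo operations that track how many allocations are still live. */
static int live_allocs;

static int parse_ok(void **slot)
{
	*slot = malloc(16);
	if (!*slot)
		return -1;
	live_allocs++;
	return 0;
}

static int parse_fail(void **slot)
{
	(void)slot;
	return -1;
}

static void destroy_free(void **slot)
{
	free(*slot);
	*slot = NULL;
	live_allocs--;
}
```

On success, the same helper with parsed_max = NATTRS releases everything when the behavior is destroyed.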
vrf: add mac header for tunneled packets when sniffer is attached
- keep David Ahern's review tag since the code wasn't changed.
Thanks to Jakub Kicinski for pointing it out and David Ahern for his review.
v2
no changes made: resubmitted after false build report.
v1
improve comments;
add new patch 2/5 titled: seg6: improve management of behavior attributes
seg6: add support for the SRv6 End.DT4 behavior
- remove the inline keyword in the definition of fib6_config_get_net().
selftests: add selftest for the SRv6 End.DT4 behavior
- add check for the vrf sysctl
[1] https://tools.ietf.org/html/draft-ietf-spring-srv6-network-programming
Andrea Mayer (8):
vrf: add mac header for tunneled packets when sniffer is attached
seg6: improve management of behavior attributes
seg6: add support for optional attributes in SRv6 behaviors
seg6: add callbacks for customizing the creation/destruction of a
behavior
seg6: add support for the SRv6 End.DT4 behavior
seg6: add VRF support for SRv6 End.DT6 behavior
selftests: add selftest for the SRv6 End.DT4 behavior
selftests: add selftest for the SRv6 End.DT6 (VRF) behavior
drivers/net/vrf.c | 78 ++-
include/uapi/linux/seg6_local.h | 1 +
net/ipv6/seg6_local.c | 593 +++++++++++++++++-
.../selftests/net/srv6_end_dt4_l3vpn_test.sh | 494 +++++++++++++++
.../selftests/net/srv6_end_dt6_l3vpn_test.sh | 502 +++++++++++++++
5 files changed, 1649 insertions(+), 19 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
create mode 100755 tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
Paolo Lungaroni (1):
seg6: add support for vrftable attribute in End.DT4/DT6 behaviors
include/uapi/linux/seg6_local.h | 1 +
ip/iproute_lwtunnel.c | 19 ++++++++++++++++---
2 files changed, 17 insertions(+), 3 deletions(-)
--
2.20.1
The eeh-basic test has its own 60-second timeout per breakable device
(defined in commit 414f50434aa2 "selftests/eeh: Bump EEH wait time to 60s").
We have discovered that the number of breakable devices varies across
hardware. The device recovery time ranges from 0 to 35 seconds. In our
test pool, the test takes about 30 seconds to run on a Power8 system with
5 breakable devices and 60 seconds to run on a Power9 system with 4
breakable devices.
Extend the timeout setting in the kselftest framework to 5 minutes
to give it a chance to finish.
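For context, the kselftest runner applies a default per-test timeout (45 seconds in runner.sh of this era, treated here as an assumption) unless a "settings" file in the test directory provides a "timeout=N" line. The real logic is shell code; the C sketch below, with the hypothetical helper settings_timeout(), only mimics that override. Note that 300 s is five times the 60-second per-device wait.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed kselftest default when no settings file overrides it. */
#define DEFAULT_TIMEOUT_S 45L

/* Return the timeout selected by settings text such as "timeout=300\n";
 * the last timeout= line wins, otherwise the default is used. */
long settings_timeout(const char *text)
{
	long timeout = DEFAULT_TIMEOUT_S;
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, "timeout=", 8) == 0)
			timeout = strtol(line + 8, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;   /* move past the newline */
	}
	return timeout;
}
```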
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/powerpc/eeh/Makefile | 2 +-
tools/testing/selftests/powerpc/eeh/settings | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/powerpc/eeh/settings
diff --git a/tools/testing/selftests/powerpc/eeh/Makefile b/tools/testing/selftests/powerpc/eeh/Makefile
index b397bab..ae963eb 100644
--- a/tools/testing/selftests/powerpc/eeh/Makefile
+++ b/tools/testing/selftests/powerpc/eeh/Makefile
@@ -3,7 +3,7 @@ noarg:
$(MAKE) -C ../
TEST_PROGS := eeh-basic.sh
-TEST_FILES := eeh-functions.sh
+TEST_FILES := eeh-functions.sh settings
top_srcdir = ../../../../..
include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/eeh/settings b/tools/testing/selftests/powerpc/eeh/settings
new file mode 100644
index 0000000..694d707
--- /dev/null
+++ b/tools/testing/selftests/powerpc/eeh/settings
@@ -0,0 +1 @@
+timeout=300
--
2.7.4
This patch set adds AF_XDP selftests based on veth to selftests/bpf.
# Topology:
# ---------
#                           -----------
#                         _ | Process | _
#                        /  -----------  \
#                       /        |        \
#                      /         |         \
#                -----------     |     -----------
#                | Thread1 |     |     | Thread2 |
#                -----------     |     -----------
#                     |          |          |
#                -----------     |     -----------
#                |  xskX   |     |     |  xskY   |
#                -----------     |     -----------
#                     |          |          |
#                -----------     |     -----------
#                |  vethX  | --------- |  vethY  |
#                -----------   peer    -----------
#                     |          |          |
#                namespaceX      |     namespaceY
These selftests test AF_XDP SKB and Native/DRV modes using veth Virtual
Ethernet interfaces.
The test program contains two threads, each using a single socket with
a unique UMEM. It validates in-order packet delivery and packet content
by having the threads send packets to each other.
Prerequisites set up by the script test_xsk_prerequisites.sh:
Set up veth interfaces as per the topology shown above:
* setup two veth interfaces and one namespace
** veth<xxxx> in root namespace
** veth<yyyy> in af_xdp<xxxx> namespace
** namespace af_xdp<xxxx>
* create a spec file veth.spec that includes this run-time configuration
that is read by test scripts - filenames prefixed with test_xsk_
*** xxxx and yyyy are randomly generated 4 digit numbers used to avoid
conflict with any existing interface
The following tests are provided:
1. AF_XDP SKB mode
Generic mode XDP is driver independent, used when the driver does
not have support for XDP. Works on any netdevice using sockets and
generic XDP path. XDP hook from netif_receive_skb().
a. nopoll - soft-irq processing
b. poll - using poll() syscall
c. Socket Teardown
Create a Tx and a Rx socket, Tx from one socket, Rx on another.
Destroy both sockets, then repeat multiple times. Only nopoll mode
is used.
d. Bi-directional Sockets
Configure sockets as bi-directional tx/rx sockets, sets up fill
and completion rings on each socket, tx/rx in both directions.
Only nopoll mode is used.
2. AF_XDP DRV/Native mode
Works on any netdevice with XDP_REDIRECT support, driver dependent.
Processes packets before SKB allocation. Provides better performance
than SKB. Driver hook available just after DMA of buffer descriptor.
a. nopoll
b. poll
c. Socket Teardown
d. Bi-directional Sockets
* Only copy mode is supported because veth does not currently support
zero-copy mode
Total tests: 8
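The two modes times four variants above give the eight tests, which map onto the test_xsk_<mode>_<variant>.sh scripts listed in the diffstat. A small sketch that enumerates those names; list_tests() is a hypothetical helper, not part of the patch set.

```c
#include <stddef.h>
#include <stdio.h>

/* Attach modes and test variants as listed above. */
static const char *const modes[] = { "skb", "drv" };
static const char *const variants[] = {
	"nopoll", "poll", "teardown", "bidirectional",
};

#define NMODES    (sizeof(modes) / sizeof(modes[0]))
#define NVARIANTS (sizeof(variants) / sizeof(variants[0]))

/* Write up to max script names into names[] and return how many were
 * written (NMODES * NVARIANTS when max is large enough). */
size_t list_tests(char names[][40], size_t max)
{
	size_t n = 0;

	for (size_t m = 0; m < NMODES; m++)
		for (size_t v = 0; v < NVARIANTS && n < max; v++)
			snprintf(names[n++], 40, "test_xsk_%s_%s.sh",
				 modes[m], variants[v]);
	return n;
}
```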
Flow:
* Single process spawns two threads: Tx and Rx
* Each of these two threads attaches to a veth interface within their
assigned namespaces
* Each thread creates one AF_XDP socket connected to a unique umem
for each veth interface
* Tx thread transmits 10k packets from veth<xxxx> to veth<yyyy>
* Rx thread verifies if all 10k packets were received and delivered
in-order, and have the right content
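A minimal sketch of the receive-side check described in the flow: each packet carries a sequence number, and the Rx side accepts a packet only if it is the next expected one and its payload matches a known pattern. The payload layout (a 4-byte big-endian sequence number followed by a derived byte pattern), PKT_LEN and all function names are assumptions for illustration, not the format used by xdpxceiver.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PKT_LEN 64   /* hypothetical payload size */

/* Build the payload for sequence number seq. */
void fill_payload(uint8_t *buf, uint32_t seq)
{
	buf[0] = (uint8_t)(seq >> 24);
	buf[1] = (uint8_t)(seq >> 16);
	buf[2] = (uint8_t)(seq >> 8);
	buf[3] = (uint8_t)seq;
	for (size_t i = 4; i < PKT_LEN; i++)
		buf[i] = (uint8_t)(seq + i);
}

struct rx_state {
	uint32_t expected;   /* next sequence number we should see */
};

/* Returns 0 if the packet is the next one in order with intact content. */
int rx_check(struct rx_state *st, const uint8_t *buf, size_t len)
{
	uint8_t want[PKT_LEN];
	uint32_t seq;

	if (len != PKT_LEN)
		return -1;
	seq = (uint32_t)buf[0] << 24 | (uint32_t)buf[1] << 16 |
	      (uint32_t)buf[2] << 8 | buf[3];
	if (seq != st->expected)
		return -1;        /* lost, duplicated or reordered */
	fill_payload(want, seq);
	if (memcmp(want, buf, PKT_LEN) != 0)
		return -1;        /* corrupted content */
	st->expected++;
	return 0;
}
```

After the Tx side has sent 10k packets, the test passes if every packet was accepted and st->expected equals 10000.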
v2 changes:
* Move selftests/xsk to selftests/bpf
* Remove Makefiles under selftests/xsk, and utilize selftests/bpf/Makefile
Structure of the patch set:
Patch 1: This patch adds XSK Selftests framework under selftests/bpf
Patch 2: Adds tests: SKB poll and nopoll mode, and mac-ip-udp debug
Patch 3: Adds tests: DRV poll and nopoll mode
Patch 4: Adds tests: SKB and DRV Socket Teardown
Patch 5: Adds tests: SKB and DRV Bi-directional Sockets
Thanks: Weqaar
Weqaar Janjua (5):
selftests/bpf: xsk selftests framework
selftests/bpf: xsk selftests - SKB POLL, NOPOLL
selftests/bpf: xsk selftests - DRV POLL, NOPOLL
selftests/bpf: xsk selftests - Socket Teardown - SKB, DRV
selftests/bpf: xsk selftests - Bi-directional Sockets - SKB, DRV
tools/testing/selftests/bpf/Makefile | 15 +-
.../bpf/test_xsk_drv_bidirectional.sh | 23 +
.../selftests/bpf/test_xsk_drv_nopoll.sh | 20 +
.../selftests/bpf/test_xsk_drv_poll.sh | 20 +
.../selftests/bpf/test_xsk_drv_teardown.sh | 20 +
.../selftests/bpf/test_xsk_prerequisites.sh | 127 ++
.../bpf/test_xsk_skb_bidirectional.sh | 20 +
.../selftests/bpf/test_xsk_skb_nopoll.sh | 20 +
.../selftests/bpf/test_xsk_skb_poll.sh | 20 +
.../selftests/bpf/test_xsk_skb_teardown.sh | 20 +
tools/testing/selftests/bpf/xdpxceiver.c | 1056 +++++++++++++++++
tools/testing/selftests/bpf/xdpxceiver.h | 158 +++
tools/testing/selftests/bpf/xsk_env.sh | 28 +
tools/testing/selftests/bpf/xsk_prereqs.sh | 119 ++
14 files changed, 1664 insertions(+), 2 deletions(-)
create mode 100755 tools/testing/selftests/bpf/test_xsk_drv_bidirectional.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_drv_nopoll.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_drv_poll.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_drv_teardown.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_prerequisites.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_skb_bidirectional.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_skb_nopoll.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_skb_poll.sh
create mode 100755 tools/testing/selftests/bpf/test_xsk_skb_teardown.sh
create mode 100644 tools/testing/selftests/bpf/xdpxceiver.c
create mode 100644 tools/testing/selftests/bpf/xdpxceiver.h
create mode 100755 tools/testing/selftests/bpf/xsk_env.sh
create mode 100755 tools/testing/selftests/bpf/xsk_prereqs.sh
--
2.20.1
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using the flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.
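The flow described above (create the fd with memfd_secret(), size it, then mmap() it) can be sketched in C as follows. Assumptions: the installed kernel headers define __NR_memfd_secret (the number assigned upstream may differ from the one used in this version of the series), flags are passed as 0, and the call can still fail at run time, for example with ENOSYS on kernels without the syscall or with an error when the mapping exceeds the caller's RLIMIT_MEMLOCK. map_secret() is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a secret memory area of len bytes and map it.  Returns 0 and
 * stores the address in *out, or a negative errno value on failure. */
int map_secret(size_t len, void **out)
{
#ifdef __NR_memfd_secret
	long fd = syscall(__NR_memfd_secret, 0);   /* flags = 0 */
	void *p;
	int err = 0;

	if (fd < 0)
		return -errno;
	/* Size the file before mapping it. */
	if (ftruncate((int)fd, (off_t)len) < 0) {
		err = errno;
		close((int)fd);
		return -err;
	}
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, (int)fd, 0);
	if (p == MAP_FAILED)
		err = errno;          /* save before close() can clobber it */
	close((int)fd);               /* the mapping keeps the file alive */
	if (err)
		return -err;
	*out = p;
	return 0;
#else
	(void)len;
	(void)out;
	return -ENOSYS;               /* headers predate the syscall */
#endif
}
```

The returned memory is used like any other mapping and released with munmap().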
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.
Additionally, in the future the secret mappings may be used as a means to
protect guest memory in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things: the first is to act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows (ab)use of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
To limit fragmentation of the direct map to splitting only PUD-size pages,
I've added an amortizing cache of PMD-size pages to each file descriptor
that is used as an allocation pool for the secret memory areas.
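The amortizing cache can be illustrated with a user-space analogy: hand out base pages from one PMD-size chunk and only fall back to a new large allocation (the point where the kernel would have to split the direct map) once the chunk is used up. Sizes assume x86-64 (4 KiB pages and 2 MiB PMD pages, so 512 pages per chunk, i.e. order 9); page_pool and its functions are hypothetical and do not model CMA, zeroing, or returning individual pages.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SZ       4096UL
#define PMD_SZ        (2UL << 20)
#define PAGES_PER_PMD (PMD_SZ / PAGE_SZ)   /* 512 on x86-64 */

/* Per-fd pool of large chunks; base pages are carved from the newest one. */
struct page_pool {
	unsigned char **chunks;
	size_t nchunks;
	size_t next;        /* next free base page in the newest chunk */
};

void *pool_alloc_page(struct page_pool *pool)
{
	if (pool->nchunks == 0 || pool->next == PAGES_PER_PMD) {
		/* Expensive path: taken once per PAGES_PER_PMD pages. */
		unsigned char **grown;
		unsigned char *chunk;

		grown = realloc(pool->chunks,
				(pool->nchunks + 1) * sizeof(*grown));
		if (!grown)
			return NULL;
		pool->chunks = grown;
		chunk = aligned_alloc(PMD_SZ, PMD_SZ);
		if (!chunk)
			return NULL;
		pool->chunks[pool->nchunks++] = chunk;
		pool->next = 0;
	}
	return pool->chunks[pool->nchunks - 1] + pool->next++ * PAGE_SZ;
}

void pool_destroy(struct page_pool *pool)
{
	for (size_t i = 0; i < pool->nchunks; i++)
		free(pool->chunks[i]);
	free(pool->chunks);
	pool->chunks = NULL;
	pool->nchunks = 0;
	pool->next = 0;
}
```

Allocating 1024 base pages touches the large allocator only twice, which is the amortization the cache is meant to provide.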
As the memory allocated by secretmem becomes unmovable, we use CMA to back
large page caches so that the page allocator won't be surprised by failed
attempts to migrate these pages.
v11:
* Drop support for uncached mappings
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
* Drop changes to arm64 compatibility layer
* Add Roman's Ack for memcg accounting
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
* Fix build with and without CONFIG_MEMCG
* Update memcg accounting to avoid copying memcg_data, per Roman comments
* Fix issues in secretmem_fault(), thanks Matthew
* Do not wire up syscall in arm64 compatibility layer
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
* Use CMA for all secretmem allocations as David suggested
* Update memcg accounting after transition to CMA
* Prevent hibernation when there are active secretmem users
* Add zeroing of the memory before releasing it back to cma/page allocator
* Rebase on v5.10-rc2-mmotm-2020-11-07-21-40
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
* Use set_direct_map() instead of __kernel_map_pages() to ensure error
handling in case the direct map update fails
* Add accounting of large pages used to reduce the direct map fragmentation
* Teach get_user_pages() and friends to refuse get/pin secretmem pages
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
* Silence the warning about missing syscall, thanks to Qian Cai
* Replace spaces with tabs in Kconfig additions, per Randy
* Add a selftest.
Older history:
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
Mike Rapoport (9):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
set_memory: allow set_direct_map_*_noflush() for multiple pages
mm: introduce memfd_secret system call to create "secret" memory areas
secretmem: use PMD-size pages to amortize direct map fragmentation
secretmem: add memcg accounting
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call were relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/arm64/include/asm/cacheflush.h | 4 +-
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/mm/pageattr.c | 10 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/Kconfig | 2 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/secretmem.h | 30 ++
include/linux/set_memory.h | 4 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 5 +
mm/Makefile | 1 +
mm/filemap.c | 3 +-
mm/gup.c | 10 +
mm/internal.h | 3 +
mm/mmap.c | 5 +-
mm/secretmem.c | 436 ++++++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 298 +++++++++++++++
tools/testing/selftests/vm/run_vmtests | 17 +
34 files changed, 863 insertions(+), 39 deletions(-)
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
base-commit: 9f8ce377d420db12b19d6a4f636fecbd88a725a5
--
2.28.0
Nowadays, there are increasing requirements to benchmark the performance
of dma_map and dma_unmap, particularly while the device is attached to an
IOMMU.
This patchset provides the benchmark infrastructure for streaming DMA
mapping. The architecture of the code is very similar to the GUP
benchmark:
* mm/gup_benchmark.c provides kernel interface;
* tools/testing/selftests/vm/gup_benchmark.c provides user program to
call the interface provided by mm/gup_benchmark.c.
In our case, kernel/dma/map_benchmark.c is like mm/gup_benchmark.c, and
tools/testing/selftests/dma/dma_map_benchmark.c is like
tools/testing/selftests/vm/gup_benchmark.c.
A major difference from the GUP benchmark is that the DMA_MAP benchmark
needs to run on a device. Consider a board with the following devices and
IOMMUs:
device A ------- IOMMU 1
device B ------- IOMMU 2
device C ------- non-IOMMU
Different devices might be attached to different IOMMUs or to no IOMMU at
all. To make the benchmark run, we can either
* create a virtual device and hack the kernel code to attach the virtual
device to IOMMU1, IOMMU2 or non-IOMMU; or
* use the existing driver_override mechanism: unbind device A, B, or C from
its original driver and bind it to the dma_map_benchmark platform driver or
PCI driver for benchmarking.
In this patchset, I prefer to use driver_override and avoid the ugly
hack in the kernel. We can dynamically switch devices behind different
IOMMUs to compare the performance with and without an IOMMU.
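The driver_override route boils down to a few sysfs writes. The sketch below only builds the (path, value) pairs for a device on a given bus and writes nothing; the bus, device and old driver names used with it are hypothetical, dma_map_benchmark is the driver name used in this series, and actually performing the writes requires root and a real device. Triggering a re-probe through drivers_probe is one option; writing the device name to the new driver's bind file is another. plan_rebind() is a hypothetical helper.

```c
#include <stdio.h>

/* One sysfs write: the file to write to and the string to write. */
struct sysfs_write {
	char path[256];
	char value[64];
};

/*
 * Fill steps[] with the writes that move dev to new_drv:
 *   1. /sys/bus/<bus>/devices/<dev>/driver_override <- new driver name
 *   2. /sys/bus/<bus>/drivers/<old>/unbind           <- device name
 *   3. /sys/bus/<bus>/drivers_probe                  <- device name
 * Returns the number of steps.
 */
int plan_rebind(struct sysfs_write steps[3], const char *bus, const char *dev,
		const char *old_drv, const char *new_drv)
{
	snprintf(steps[0].path, sizeof(steps[0].path),
		 "/sys/bus/%s/devices/%s/driver_override", bus, dev);
	snprintf(steps[0].value, sizeof(steps[0].value), "%s", new_drv);

	snprintf(steps[1].path, sizeof(steps[1].path),
		 "/sys/bus/%s/drivers/%s/unbind", bus, old_drv);
	snprintf(steps[1].value, sizeof(steps[1].value), "%s", dev);

	snprintf(steps[2].path, sizeof(steps[2].path),
		 "/sys/bus/%s/drivers_probe", bus);
	snprintf(steps[2].value, sizeof(steps[2].value), "%s", dev);
	return 3;
}
```

Each step would then be applied in order by writing value to path (for example with fopen() and fputs()).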
-v4:
* add dma direction support according to Christoph Hellwig's comment;
* add dma mask bit set according to Christoph Hellwig's comment;
* make the benchmark depend on DEBUG_FS according to John Garry's comment;
* strictly check parameters in ioctl
-v3:
* fix build issues reported by 0day kernel test robot
-v2:
* add PCI support; v1 supported platform devices only
* replace ssleep by msleep_interruptible() to permit users to exit
benchmark before it is completed
* many changes according to Robin's suggestions, thanks! Robin
- add standard deviation output to reflect the worst case
- check users' parameters strictly like the number of threads
- make cache dirty before dma_map
- fix unpaired dma_map_page and dma_unmap_single;
- remove redundant "long long" before ktime_to_ns();
- use devm_add_action()
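The standard-deviation output mentioned in the v2 notes can be computed without storing every sample by keeping a count, a sum and a sum of squares, which several threads can also accumulate independently. The sketch below uses integer arithmetic (so the results are truncated) and a digit-by-digit integer square root in the spirit of the kernel's int_sqrt(); the sample units and all names are assumptions, and this is not the code in kernel/dma/map_benchmark.c.

```c
#include <stdint.h>

/* Running statistics for per-iteration latencies (units are up to the caller). */
struct lat_stats {
	uint64_t n, sum, sum_sq;
};

void lat_add(struct lat_stats *s, uint64_t sample)
{
	s->n++;
	s->sum += sample;
	s->sum_sq += sample * sample;
}

/* floor(sqrt(x)) using the classic bit-by-bit method. */
static uint64_t isqrt64(uint64_t x)
{
	uint64_t r = 0;

	for (uint64_t bit = 1ULL << 62; bit; bit >>= 2) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
	}
	return r;
}

uint64_t lat_mean(const struct lat_stats *s)
{
	return s->n ? s->sum / s->n : 0;
}

/* Population standard deviation: sqrt(E[x^2] - E[x]^2). */
uint64_t lat_stddev(const struct lat_stats *s)
{
	uint64_t mean, mean_sq;

	if (!s->n)
		return 0;
	mean = s->sum / s->n;
	mean_sq = s->sum_sq / s->n;
	return mean_sq > mean * mean ? isqrt64(mean_sq - mean * mean) : 0;
}
```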
Barry Song (2):
dma-mapping: add benchmark support for streaming DMA APIs
selftests/dma: add test application for DMA_MAP_BENCHMARK
MAINTAINERS | 6 +
kernel/dma/Kconfig | 9 +
kernel/dma/Makefile | 1 +
kernel/dma/map_benchmark.c | 361 ++++++++++++++++++
tools/testing/selftests/dma/Makefile | 6 +
tools/testing/selftests/dma/config | 1 +
.../testing/selftests/dma/dma_map_benchmark.c | 123 ++++++
7 files changed, 507 insertions(+)
create mode 100644 kernel/dma/map_benchmark.c
create mode 100644 tools/testing/selftests/dma/Makefile
create mode 100644 tools/testing/selftests/dma/config
create mode 100644 tools/testing/selftests/dma/dma_map_benchmark.c
--
2.25.1
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using the flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will have desired protection bits set in the user page
table. For instance, current implementation allows uncached mappings.
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.
Additionally, in the future the secret mappings may be used as a means to
protect guest memory in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things: the first is to act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows (ab)use of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
To limit fragmentation of the direct map to splitting only PUD-size pages,
I've added an amortizing cache of PMD-size pages to each file descriptor
that is used as an allocation pool for the secret memory areas.
As the memory allocated by secretmem becomes unmovable, we use CMA to back
large page caches so that the page allocator won't be surprised by failed
attempts to migrate these pages.
v10:
* Drop changes to arm64 compatibility layer
* Add Roman's Ack for memcg accounting
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
* Fix build with and without CONFIG_MEMCG
* Update memcg accounting to avoid copying memcg_data, per Roman comments
* Fix issues in secretmem_fault(), thanks Matthew
* Do not wire up syscall in arm64 compatibility layer
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
* Use CMA for all secretmem allocations as David suggested
* Update memcg accounting after transition to CMA
* Prevent hibernation when there are active secretmem users
* Add zeroing of the memory before releasing it back to cma/page allocator
* Rebase on v5.10-rc2-mmotm-2020-11-07-21-40
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
* Use set_direct_map() instead of __kernel_map_pages() to ensure error
handling in case the direct map update fails
* Add accounting of large pages used to reduce the direct map fragmentation
* Teach get_user_pages() and friends to refuse get/pin secretmem pages
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
* Silence the warning about missing syscall, thanks to Qian Cai
* Replace spaces with tabs in Kconfig additions, per Randy
* Add a selftest.
Older history:
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
Mike Rapoport (9):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
set_memory: allow set_direct_map_*_noflush() for multiple pages
mm: introduce memfd_secret system call to create "secret" memory areas
secretmem: use PMD-size pages to amortize direct map fragmentation
secretmem: add memcg accounting
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call were relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/Kconfig | 7 +
arch/arm64/include/asm/cacheflush.h | 4 +-
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/mm/pageattr.c | 10 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/secretmem.h | 30 ++
include/linux/set_memory.h | 4 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/magic.h | 1 +
include/uapi/linux/secretmem.h | 8 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 5 +
mm/Makefile | 1 +
mm/filemap.c | 3 +-
mm/gup.c | 10 +
mm/internal.h | 3 +
mm/mmap.c | 5 +-
mm/secretmem.c | 446 ++++++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 298 +++++++++++++++
tools/testing/selftests/vm/run_vmtests | 17 +
36 files changed, 888 insertions(+), 38 deletions(-)
create mode 100644 include/linux/secretmem.h
create mode 100644 include/uapi/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
base-commit: 9f8ce377d420db12b19d6a4f636fecbd88a725a5
--
2.28.0