Introduce in-kernel headers and other artifacts which are made available
as an archive through proc (/proc/kheaders.txz file). This archive makes
it possible to build kernel modules, run eBPF programs, and other
tracing programs that need to extend the kernel for tracing purposes
without any dependency on the file system having headers and build
artifacts.
On Android and embedded systems, it is common to switch kernels but not
have kernel headers available on the file system. Raw kernel headers
also cannot be copied into the filesystem like they can be on other
distros, due to licensing and other issues. There's no linux-headers
package on Android. Further once a different kernel is booted, any
headers stored on the file system will no longer be useful. By storing
the headers as a compressed archive within the kernel, we can avoid these
issues that have been a hindrance for a long time.
The feature is also buildable as a module just in case the user desires
it not being part of the kernel image. This makes it possible to load
and unload the headers on demand. A tracing program, or a kernel module
builder can load the module, do its operations, and then unload the
module to save kernel memory. The total memory needed is 3.8MB.
The code to read the headers is based on /proc/config.gz code and uses
the same technique to embed the headers.
To build a module, the below steps have been tested on an x86 machine:
modprobe kheaders
rm -rf $HOME/headers
mkdir -p $HOME/headers
tar -xvf /proc/kheaders.txz -C $HOME/headers >/dev/null
cd my-kernel-module
make -C $HOME/headers M=$(pwd) modules
rmmod kheaders
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
Changes since RFC:
Both changes bring size down to 3.8MB:
- use xz for compression
- strip comments except SPDX lines
- Call out the module name in Kconfig
- Also added selftests in second patch to ensure headers are always
working.
Documentation/dontdiff | 1 +
arch/x86/Makefile | 2 ++
init/Kconfig | 11 ++++++
kernel/.gitignore | 2 ++
kernel/Makefile | 29 +++++++++++++++
kernel/kheaders.c | 74 +++++++++++++++++++++++++++++++++++++++
scripts/gen_ikh_data.sh | 19 ++++++++++
scripts/strip-comments.pl | 8 +++++
8 files changed, 146 insertions(+)
create mode 100644 kernel/kheaders.c
create mode 100755 scripts/gen_ikh_data.sh
create mode 100755 scripts/strip-comments.pl
diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 2228fcc8e29f..05a2319ee2a2 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -151,6 +151,7 @@ int8.c
kallsyms
kconfig
keywords.c
+kheaders_data.h*
ksym.c*
ksym.h*
kxgettext
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 88398fdf8129..ad176d669da4 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -240,6 +240,8 @@ archmacros:
ASM_MACRO_FLAGS = -Wa,arch/x86/kernel/macros.s
export ASM_MACRO_FLAGS
KBUILD_CFLAGS += $(ASM_MACRO_FLAGS)
+IKH_EXTRA += arch/x86/kernel/macros.s
+export IKH_EXTRA
###
# Kernel objects
diff --git a/init/Kconfig b/init/Kconfig
index a4112e95724a..b95d769b6098 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -549,6 +549,17 @@ config IKCONFIG_PROC
This option enables access to the kernel configuration file
through /proc/config.gz.
+config IKHEADERS_PROC
+ tristate "Enable kernel header artifacts through /proc/kheaders.txz"
+ select BUILD_BIN2C
+ depends on PROC_FS
+ help
+ This option enables access to the kernel header and other artifacts that
+ are generated during the build process. These can be used to build kernel
+ modules, and other in-kernel programs such as those generated by eBPF
+ and systemtap tools. If you build the headers as a module, a module
+ called kheaders.ko is built which can be loaded to get access to them.
+
config LOG_BUF_SHIFT
int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
range 12 25
diff --git a/kernel/.gitignore b/kernel/.gitignore
index b3097bde4e9c..6acf71acbdcb 100644
--- a/kernel/.gitignore
+++ b/kernel/.gitignore
@@ -3,5 +3,7 @@
#
config_data.h
config_data.gz
+kheaders_data.h
+kheaders_data.txz
timeconst.h
hz.bc
diff --git a/kernel/Makefile b/kernel/Makefile
index 7343b3a9bff0..aa2d3f9b9f49 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_UTS_NS) += utsname.o
obj-$(CONFIG_USER_NS) += user_namespace.o
obj-$(CONFIG_PID_NS) += pid_namespace.o
obj-$(CONFIG_IKCONFIG) += configs.o
+obj-$(CONFIG_IKHEADERS_PROC) += kheaders.o
obj-$(CONFIG_SMP) += stop_machine.o
obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
obj-$(CONFIG_AUDIT) += audit.o auditfilter.o
@@ -131,3 +132,31 @@ $(obj)/config_data.gz: $(KCONFIG_CONFIG) FORCE
targets += config_data.h
$(obj)/config_data.h: $(obj)/config_data.gz FORCE
$(call filechk,ikconfiggz)
+
+# Build a list of in-kernel headers for building kernel modules
+# Any other files will be stored in IKH_EXTRA variable.
+ikh_file_list := include/
+ikh_file_list += arch/$(ARCH)/Makefile
+ikh_file_list += arch/$(ARCH)/include/
+ikh_file_list += $(IKH_EXTRA)
+ikh_file_list += scripts/
+ikh_file_list += Makefile
+ikh_file_list += Module.symvers
+ifeq ($(CONFIG_STACK_VALIDATION), y)
+ikh_file_list += $(objtree)/tools/objtool/objtool
+endif
+
+$(obj)/kheaders.o: $(obj)/kheaders_data.h
+
+targets += kheaders_data.txz
+
+quiet_cmd_genikh = GEN $(obj)/kheaders_data.txz
+cmd_genikh = $(srctree)/scripts/gen_ikh_data.sh $@ $^ >/dev/null 2>&1
+$(obj)/kheaders_data.txz: $(ikh_file_list) FORCE
+ $(call cmd,genikh)
+
+filechk_ikheadersxz = (echo "static const char kernel_headers_data[] __used = KH_MAGIC_START"; cat $< | scripts/bin2c; echo "KH_MAGIC_END;")
+
+targets += kheaders_data.h
+$(obj)/kheaders_data.h: $(obj)/kheaders_data.txz FORCE
+ $(call filechk,ikheadersxz)
diff --git a/kernel/kheaders.c b/kernel/kheaders.c
new file mode 100644
index 000000000000..c39930f51202
--- /dev/null
+++ b/kernel/kheaders.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kernel/kheaders.c
+ * Provide headers and artifacts needed to build kernel modules.
+ * (Borrowed code from kernel/configs.c)
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
+#include <linux/init.h>
+#include <linux/uaccess.h>
+
+/*
+ * Define kernel_headers_data and kernel_headers_data_size, which contains the
+ * compressed kernel headers. The file is first compressed with xz and then
+ * bounded by two eight byte magic numbers to allow extraction from a binary
+ * kernel image:
+ *
+ * IKHD_ST
+ * <image>
+ * IKHD_ED
+ */
+#define KH_MAGIC_START "IKHD_ST"
+#define KH_MAGIC_END "IKHD_ED"
+#include "kheaders_data.h"
+
+
+#define KH_MAGIC_SIZE (sizeof(KH_MAGIC_START) - 1)
+#define kernel_headers_data_size \
+ (sizeof(kernel_headers_data) - 1 - KH_MAGIC_SIZE * 2)
+
+static ssize_t
+ikheaders_read_current(struct file *file, char __user *buf,
+ size_t len, loff_t *offset)
+{
+ return simple_read_from_buffer(buf, len, offset,
+ kernel_headers_data + KH_MAGIC_SIZE,
+ kernel_headers_data_size);
+}
+
+static const struct file_operations ikheaders_file_ops = {
+ .owner = THIS_MODULE,
+ .read = ikheaders_read_current,
+ .llseek = default_llseek,
+};
+
+static int __init ikheaders_init(void)
+{
+ struct proc_dir_entry *entry;
+
+ /* create the current headers file */
+ entry = proc_create("kheaders.txz", S_IFREG | S_IRUGO, NULL,
+ &ikheaders_file_ops);
+ if (!entry)
+ return -ENOMEM;
+
+ proc_set_size(entry, kernel_headers_data_size);
+
+ return 0;
+}
+
+static void __exit ikheaders_cleanup(void)
+{
+ remove_proc_entry("kheaders.txz", NULL);
+}
+
+module_init(ikheaders_init);
+module_exit(ikheaders_cleanup);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Joel Fernandes");
+MODULE_DESCRIPTION("Echo the kernel header artifacts used to build the kernel");
diff --git a/scripts/gen_ikh_data.sh b/scripts/gen_ikh_data.sh
new file mode 100755
index 000000000000..609196b5cea2
--- /dev/null
+++ b/scripts/gen_ikh_data.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+spath="$(dirname "$(readlink -f "$0")")"
+
+rm -rf $1.tmp
+mkdir $1.tmp
+
+for f in "${@:2}";
+ do find "$f" ! -name "*.c" ! -name "*.o" ! -name "*.cmd" ! -name ".*";
+done | cpio -pd $1.tmp
+
+for f in $(find $1.tmp); do
+ $spath/strip-comments.pl $f
+done
+
+tar -Jcf $1 -C $1.tmp/ . > /dev/null
+
+rm -rf $1.tmp
diff --git a/scripts/strip-comments.pl b/scripts/strip-comments.pl
new file mode 100755
index 000000000000..f8ada87c5802
--- /dev/null
+++ b/scripts/strip-comments.pl
@@ -0,0 +1,8 @@
+#!/usr/bin/perl -pi
+# SPDX-License-Identifier: GPL-2.0
+
+# This script removes /**/ comments from a file, unless such comments
+# contain "SPDX". It is used when building compressed in-kernel headers.
+
+BEGIN {undef $/;}
+s/\/\*((?!SPDX).)*?\*\///smg;
--
2.20.1.611.gfbb209baf1-goog
This series attempts to make the fsgsbase test in the x86 kselftest
report a stable result. On some Intel systems there are intermittent
failures in this testcase which have been reported and discussed
previously with the initial report and last meaningful discussion having
been about a year ago:
https://lore.kernel.org/lkml/20180126153631.ha7yc33fj5uhitjo@xps/
with the analysis concluding that this is a hardware issue affecting a
subset of systems but no fix has been merged as yet. In order to at
least make the test more solid for use in automated testing this series
modifies it to execute the test often enough to reproduce the problem
reliably, at least for the systems I have access to.
Mark Brown (2):
selftests/x86/fsgsbase: Indirect output through a wrapper function
selftests/x86/fsgsbase: Default to trying to run the test repeatedly
tools/testing/selftests/x86/fsgsbase.c | 79 +++++++++++++++++++-------
1 file changed, 58 insertions(+), 21 deletions(-)
--
2.20.1
Fix fw_filesystem.sh to run in an automated environment under busybox.
After this change, fw_run_tests.sh still fails at some point in fw_fallback.sh,
with error "usermode helper disabled so ignoring test". This is coming from
fw_lib.sh:verify_reqs() because $HAS_FW_LOADER_USER_HELPER is set to no.
Dan Rue (2):
selftests: firmware: remove use of non-standard diff -Z option
selftests: firmware: add CONFIG_FW_LOADER_USER_HELPER_FALLBACK to
config
tools/testing/selftests/firmware/config | 1 +
tools/testing/selftests/firmware/fw_filesystem.sh | 9 +++------
2 files changed, 4 insertions(+), 6 deletions(-)
--
2.19.1
At present this exposes a bug in do_proc_dointvec_minmax_conv() (it
fails to check for values that are too wide to fit in an int).
Signed-off-by: Zev Weiss <zev(a)bewilderbeest.net>
---
tools/testing/selftests/sysctl/sysctl.sh | 55 ++++++++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/tools/testing/selftests/sysctl/sysctl.sh b/tools/testing/selftests/sysctl/sysctl.sh
index 584eb8ea780a..780ce7123374 100755
--- a/tools/testing/selftests/sysctl/sysctl.sh
+++ b/tools/testing/selftests/sysctl/sysctl.sh
@@ -290,6 +290,58 @@ run_numerictests()
test_rc
}
+check_failure()
+{
+ echo -n "Testing that $1 fails as expected..."
+ reset_vals
+ TEST_STR="$1"
+ orig="$(cat $TARGET)"
+ echo -n "$TEST_STR" > $TARGET 2> /dev/null
+
+ # write should fail and $TARGET should retain its original value
+ if [ $? = 0 ] || [ "$(cat $TARGET)" != "$orig" ]; then
+ echo "FAIL" >&2
+ rc=1
+ else
+ echo "ok"
+ fi
+ test_rc
+}
+
+run_wideint_tests()
+{
+ # sysctl conversion functions receive a boolean sign and ulong
+ # magnitude; here we list the magnitudes we want to test (each of
+ # which will be tested in both positive and negative forms). Since
+ # none of these values fit in 32 bits, writing them to an int- or
+ # uint-typed sysctl should fail.
+ local magnitudes=(
+ # common boundary-condition values (zero, +1, -1, INT_MIN,
+ # and INT_MAX respectively) if truncated to lower 32 bits
+ # (potential for being falsely deemed in range)
+ 0x0000000100000000
+ 0x0000000100000001
+ 0x00000001ffffffff
+ 0x0000000180000000
+ 0x000000017fffffff
+
+ # these look like negatives, but without a leading '-' are
+ # actually large positives (should be rejected as above
+ # despite being zero/+1/-1/INT_MIN/INT_MAX in the lower 32)
+ 0xffffffff00000000
+ 0xffffffff00000001
+ 0xffffffffffffffff
+ 0xffffffff80000000
+ 0xffffffff7fffffff
+ )
+
+ for sign in '' '-'; do
+ for mag in "${magnitudes[@]}"; do
+ check_failure "${sign}${mag}"
+ done
+ done
+}
+
# Your test must accept digits 3 and 4 to use this
run_limit_digit()
{
@@ -556,6 +608,7 @@ sysctl_test_0001()
TEST_STR=$(( $ORIG + 1 ))
run_numerictests
+ run_wideint_tests
run_limit_digit
}
@@ -580,6 +633,7 @@ sysctl_test_0003()
TEST_STR=$(( $ORIG + 1 ))
run_numerictests
+ run_wideint_tests
run_limit_digit
run_limit_digit_int
}
@@ -592,6 +646,7 @@ sysctl_test_0004()
TEST_STR=$(( $ORIG + 1 ))
run_numerictests
+ run_wideint_tests
run_limit_digit
run_limit_digit_uint
}
--
2.20.1
Thanks for the patches, please include akpm(a)linux-foundation.org in the
future, as we can merge the changes through Andrew as well.
Also please Cc yzaikin(a)google.com, brendanhiggins(a)google.com in follow
ups for now. They are looking at the sysctl testing code as well.
Some feedback below:
In-Reply-To: <20181227111231.12912-2-zev(a)bewilderbeest.net>
On Thu, Dec 27, 2018 at 05:12:29AM -0600, Zev Weiss wrote:
> +run_wideint_tests()
> +{
> + # check negative and positive 64-bit values, with and without
> + # bits set in the lower 31, and with and without bit 31 (sign
> + # bit of a 32-bit int) set. None of these are representable
> + # in 32 bits, and hence all should fail.
> + check_failure 0x0000010000000000
> + check_failure 0x0000010080000000
> + check_failure 0x000001ff7fffffff
> + check_failure 0x000001ffffffffff
> + check_failure 0xffffffff7fffffff
> + check_failure 0xffffffffffffffff
This s64 version of -1
> + check_failure 0xffffff0000000000
> + check_failure 0xffffff0080000000
> +}
It was still unclear from the comments and manually looking at the
values why they are clear candidates to always test from all respective
64-bit values. A comment per each would be useful.
Luis
Hi,
This patch series adds optional support for using MSI interrupts instead
of NTB doorbells in ntb_transport. This is desirable seeing doorbells on
current hardware are quite slow and therefore switching to MSI interrupts
provides a significant performance gain. On switchtec hardware, a simple
apples-to-apples comparison shows ntb_netdev/iperf numbers going from
3.88Gb/s to 14.1Gb/s when switching to MSI interrupts.
To do this, a couple changes are required outside of the NTB tree:
1) The IOMMU must know to accept MSI requests from aliased bused numbers
seeing NTB hardware typically sends proxied request IDs through
additional requester IDs. The first patch in this series adds support
for the Intel IOMMU. A quirk to add these aliases for switchtec hardware
was already accepted. See commit ad281ecf1c7d ("PCI: Add DMA alias quirk
for Microsemi Switchtec NTB") for a description of NTB proxy IDs and why
this is necessary.
2) NTB transport (and other clients) may often need more MSI interrupts
than the NTB hardware actually advertises support for. However, seeing
these interrupts will not be triggered by the hardware but through an
NTB memory window, the hardware does not actually need support or need
to know about them. Therefore we add the concept of Virtual MSI
interrupts which are allocated just like any other MSI interrupt but
are not programmed into the hardware's MSI table. This is done in
Patch 2 and then made use of in Patch 3.
The remaining patches in this series add a library for dealing with MSI
interrupts, a test client and finally support in ntb_transport.
The series is based off of v5.0-rc4 and I've tested it on top of a
of the patches I've already sent to the NTB tree (though they are
independent changes). A git repo is available here:
https://github.com/sbates130272/linux-p2pmem/ ntb_transport_msi_v1
Thanks,
Logan
--
Logan Gunthorpe (9):
iommu/vt-d: Allow interrupts from the entire bus for aliased devices
PCI/MSI: Support allocating virtual MSI interrupts
PCI/switchtec: Add module parameter to request more interrupts
NTB: Introduce functions to calculate multi-port resource index
NTB: Rename ntb.c to support multiple source files in the module
NTB: Introduce MSI library
NTB: Introduce NTB MSI Test Client
NTB: Add ntb_msi_test support to ntb_test
NTB: Add MSI interrupt support to ntb_transport
drivers/iommu/intel_irq_remapping.c | 12 +
drivers/ntb/Kconfig | 10 +
drivers/ntb/Makefile | 3 +
drivers/ntb/{ntb.c => core.c} | 0
drivers/ntb/msi.c | 313 ++++++++++++++++++
drivers/ntb/ntb_transport.c | 134 +++++++-
drivers/ntb/test/Kconfig | 9 +
drivers/ntb/test/Makefile | 1 +
drivers/ntb/test/ntb_msi_test.c | 416 ++++++++++++++++++++++++
drivers/pci/msi.c | 51 ++-
drivers/pci/switch/switchtec.c | 12 +-
include/linux/msi.h | 1 +
include/linux/ntb.h | 139 ++++++++
include/linux/pci.h | 9 +
tools/testing/selftests/ntb/ntb_test.sh | 54 ++-
15 files changed, 1150 insertions(+), 14 deletions(-)
rename drivers/ntb/{ntb.c => core.c} (100%)
create mode 100644 drivers/ntb/msi.c
create mode 100644 drivers/ntb/test/ntb_msi_test.c
--
2.19.0