Here is a tiny new syscall, readfile, that makes it simpler to read small/medium sized files all in one shot, no need to do open/read/close. This is especially helpful for tools that poke around in procfs or sysfs, making a little bit of a less system load than before, especially as syscall overheads go up over time due to various CPU bugs being addressed.
There are 4 patches in this series, the first 3 are against the kernel tree, adding the syscall logic, wiring up the syscall, and adding some tests for it.
The last patch is agains the man-pages project, adding a tiny man page to try to describe the new syscall.
Greg Kroah-Hartman (3): readfile: implement readfile syscall arch: wire up the readfile syscall selftests: add readfile(2) selftests
arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + fs/open.c | 50 +++ include/linux/syscalls.h | 2 + include/uapi/asm-generic/unistd.h | 4 +- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/readfile/.gitignore | 3 + tools/testing/selftests/readfile/Makefile | 7 + tools/testing/selftests/readfile/readfile.c | 285 +++++++++++++++++ .../selftests/readfile/readfile_speed.c | 301 ++++++++++++++++++ 26 files changed, 671 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/readfile/.gitignore create mode 100644 tools/testing/selftests/readfile/Makefile create mode 100644 tools/testing/selftests/readfile/readfile.c create mode 100644 tools/testing/selftests/readfile/readfile_speed.c
It's a tiny syscall, meant to allow a user to do a single "open this file, read into this buffer, and close the file" all in a single shot.
Should be good for reading "tiny" files like sysfs, procfs, and other "small" files.
There is no restarting the syscall, this is a "simple" syscall, with the attempt to make reading "simple" files easier with less syscall overhead.
Cc: Alexander Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/open.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
diff --git a/fs/open.c b/fs/open.c index 6cd48a61cda3..4469faa9379c 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1370,3 +1370,53 @@ int stream_open(struct inode *inode, struct file *filp) }
EXPORT_SYMBOL(stream_open); + +static struct file *readfile_open(int dfd, const char __user *filename, + struct open_flags *op) +{ + struct filename *tmp; + struct file *f; + + tmp = getname(filename); + if (IS_ERR(tmp)) + return (struct file *)tmp; + + f = do_filp_open(dfd, tmp, op); + if (!IS_ERR(f)) + fsnotify_open(f); + + putname(tmp); + return f; +} + +SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename, + char __user *, buffer, size_t, bufsize, int, flags) +{ + struct open_flags op; + struct open_how how; + struct file *file; + loff_t pos = 0; + int retval; + + /* only accept a small subset of O_ flags that make sense */ + if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags) + return -EINVAL; + + /* add some needed flags to be able to open the file properly */ + flags |= O_RDONLY | O_LARGEFILE; + + how = build_open_how(flags, 0000); + retval = build_open_flags(&how, &op); + if (retval) + return retval; + + file = readfile_open(dfd, filename, &op); + if (IS_ERR(file)) + return PTR_ERR(file); + + retval = vfs_read(file, buffer, bufsize, &pos); + + filp_close(file, NULL); + + return retval; +}
Hi Greg,
On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
It's a tiny syscall, meant to allow a user to do a single "open this file, read into this buffer, and close the file" all in a single shot.
Should be good for reading "tiny" files like sysfs, procfs, and other "small" files.
There is no restarting the syscall, this is a "simple" syscall, with the attempt to make reading "simple" files easier with less syscall overhead.
Cc: Alexander Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Thanks for your patch!
--- a/fs/open.c +++ b/fs/open.c
+SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename,
char __user *, buffer, size_t, bufsize, int, flags)
+{
struct open_flags op;
struct open_how how;
struct file *file;
loff_t pos = 0;
int retval;
/* only accept a small subset of O_ flags that make sense */
if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags)
return -EINVAL;
/* add some needed flags to be able to open the file properly */
flags |= O_RDONLY | O_LARGEFILE;
how = build_open_how(flags, 0000);
retval = build_open_flags(&how, &op);
if (retval)
return retval;
file = readfile_open(dfd, filename, &op);
if (IS_ERR(file))
return PTR_ERR(file);
retval = vfs_read(file, buffer, bufsize, &pos);
Should there be a way for the user to be informed that the file doesn't fit in the provided buffer (.e.g. -EFBIG)?
filp_close(file, NULL);
return retval;
+}
Gr{oetje,eeting}s,
Geert
On Sat, Jul 04, 2020 at 04:02:47PM +0200, Greg Kroah-Hartman wrote:
- /* only accept a small subset of O_ flags that make sense */
- if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags)
return -EINVAL;
- /* add some needed flags to be able to open the file properly */
- flags |= O_RDONLY | O_LARGEFILE;
Should we also add O_NOCTTY to prevent any problems if this is called on a tty?
On Sat, Jul 4, 2020 at 4:03 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
It's a tiny syscall, meant to allow a user to do a single "open this file, read into this buffer, and close the file" all in a single shot.
Should be good for reading "tiny" files like sysfs, procfs, and other "small" files.
There is no restarting the syscall, this is a "simple" syscall, with the attempt to make reading "simple" files easier with less syscall overhead.
Cc: Alexander Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
fs/open.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
diff --git a/fs/open.c b/fs/open.c index 6cd48a61cda3..4469faa9379c 100644 --- a/fs/open.c +++ b/fs/open.c @@ -1370,3 +1370,53 @@ int stream_open(struct inode *inode, struct file *filp) }
EXPORT_SYMBOL(stream_open);
+static struct file *readfile_open(int dfd, const char __user *filename,
struct open_flags *op)
+{
struct filename *tmp;
struct file *f;
tmp = getname(filename);
if (IS_ERR(tmp))
return (struct file *)tmp;
f = do_filp_open(dfd, tmp, op);
if (!IS_ERR(f))
fsnotify_open(f);
putname(tmp);
return f;
+}
+SYSCALL_DEFINE5(readfile, int, dfd, const char __user *, filename,
char __user *, buffer, size_t, bufsize, int, flags)
+{
struct open_flags op;
struct open_how how;
struct file *file;
loff_t pos = 0;
int retval;
/* only accept a small subset of O_ flags that make sense */
if ((flags & (O_NOFOLLOW | O_NOATIME)) != flags)
return -EINVAL;
/* add some needed flags to be able to open the file properly */
flags |= O_RDONLY | O_LARGEFILE;
how = build_open_how(flags, 0000);
retval = build_open_flags(&how, &op);
if (retval)
return retval;
file = readfile_open(dfd, filename, &op);
if (IS_ERR(file))
return PTR_ERR(file);
retval = vfs_read(file, buffer, bufsize, &pos);
filp_close(file, NULL);
return retval;
Manpage says: "doing the sequence of open() and then read() and then close()", which is exactly what it does.
But then it goes on to say: "If the file is larger than the value provided in count then only count number of bytes will be copied into buf", which is only half true, it should be: "If the file is larger than the value provided in count then at most count number of bytes will be copied into buf", which is not a lot of information.
And "If the size of file is smaller than the value provided in count then the whole file will be copied into buf", which is simply a lie; for example seq_file will happily return a smaller-than-PAGE_SIZE chunk if at least one record fits in there. You'll have a very hard time explaining that in the man page. So I think there are two possible ways forward:
1) just leave the first explanation (it's an open + read + close equivalent) and leave out the rest
2) add a loop around the vfs_read() in the code.
I'd strongly prefer #2 because with the non-looping read it's impossible to detect whether the file was completely read or not, and that's just going to lead to surprises and bugs in userspace code.
Thanks, Miklos
On Sat, Jul 04, 2020 at 09:41:09PM +0200, Miklos Szeredi wrote:
And "If the size of file is smaller than the value provided in count then the whole file will be copied into buf", which is simply a lie; for example seq_file will happily return a smaller-than-PAGE_SIZE chunk if at least one record fits in there. You'll have a very hard time explaining that in the man page. So I think there are two possible ways forward:
- just leave the first explanation (it's an open + read + close
equivalent) and leave out the rest
- add a loop around the vfs_read() in the code.
3) don't bother with the entire thing, until somebody manages to demonstrate a setup where it does make a real difference (compared to than the obvious sequence of syscalls, that is). At which point we'll need to figure out what's going on and deal with the underlying problem of that setup.
On Sat, Jul 04, 2020 at 09:12:06PM +0100, Al Viro wrote:
On Sat, Jul 04, 2020 at 09:41:09PM +0200, Miklos Szeredi wrote:
And "If the size of file is smaller than the value provided in count then the whole file will be copied into buf", which is simply a lie; for example seq_file will happily return a smaller-than-PAGE_SIZE chunk if at least one record fits in there. You'll have a very hard time explaining that in the man page. So I think there are two possible ways forward:
- just leave the first explanation (it's an open + read + close
equivalent) and leave out the rest
- add a loop around the vfs_read() in the code.
- don't bother with the entire thing, until somebody manages to demonstrate
a setup where it does make a real difference (compared to than the obvious sequence of syscalls, that is). At which point we'll need to figure out what's going on and deal with the underlying problem of that setup.
Incidentally, if that's intended for use on _sysfs_, I would like to see the effects of that sucker being called by many processes in parallel, seeing that sysfs has, er, certain scalability problems in its lookups. And I would be very surprised if they were not heavier than said overhead of two extra syscalls.
This wires up the readfile syscall for all architectures
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/arm64/include/asm/unistd.h | 2 +- arch/arm64/include/asm/unistd32.h | 2 ++ arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/mips/kernel/syscalls/syscall_n32.tbl | 1 + arch/mips/kernel/syscalls/syscall_n64.tbl | 1 + arch/mips/kernel/syscalls/syscall_o32.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/linux/syscalls.h | 2 ++ include/uapi/asm-generic/unistd.h | 4 +++- 20 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl index 5ddd128d4b7a..4132380e997f 100644 --- a/arch/alpha/kernel/syscalls/syscall.tbl +++ b/arch/alpha/kernel/syscalls/syscall.tbl @@ -478,3 +478,4 @@ 547 common openat2 sys_openat2 548 common pidfd_getfd sys_pidfd_getfd 549 common faccessat2 sys_faccessat2 +550 common readfile sys_readfile diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl index d5cae5ffede0..454873892ba3 100644 --- a/arch/arm/tools/syscall.tbl +++ b/arch/arm/tools/syscall.tbl @@ -452,3 +452,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h index 3b859596840d..b3b2019f8d16 100644 --- a/arch/arm64/include/asm/unistd.h +++ b/arch/arm64/include/asm/unistd.h @@ -38,7 +38,7 @@ #define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5) #define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls 440 +#define __NR_compat_syscalls 441 #endif
#define __ARCH_WANT_SYS_CLONE diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h index 6d95d0c8bf2f..524d19779612 100644 --- a/arch/arm64/include/asm/unistd32.h +++ b/arch/arm64/include/asm/unistd32.h @@ -885,6 +885,8 @@ __SYSCALL(__NR_openat2, sys_openat2) __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) #define __NR_faccessat2 439 __SYSCALL(__NR_faccessat2, sys_faccessat2) +#define __NR_readfile 440 +__SYSCALL(__NR_readfile, sys_readfile)
/* * Please add new compat syscalls above this comment and update diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl index 49e325b604b3..b188f03736bb 100644 --- a/arch/ia64/kernel/syscalls/syscall.tbl +++ b/arch/ia64/kernel/syscalls/syscall.tbl @@ -359,3 +359,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl index f71b1bbcc198..ab24bcb91344 100644 --- a/arch/m68k/kernel/syscalls/syscall.tbl +++ b/arch/m68k/kernel/syscalls/syscall.tbl @@ -438,3 +438,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl index edacc4561f2b..46c06f800e8e 100644 --- a/arch/microblaze/kernel/syscalls/syscall.tbl +++ b/arch/microblaze/kernel/syscalls/syscall.tbl @@ -444,3 +444,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl index f777141f5256..552ba4dafbef 100644 --- a/arch/mips/kernel/syscalls/syscall_n32.tbl +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl @@ -377,3 +377,4 @@ 437 n32 openat2 sys_openat2 438 n32 pidfd_getfd sys_pidfd_getfd 439 n32 faccessat2 sys_faccessat2 +440 n32 readfile sys_readfile diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl index da8c76394e17..e12581bf900b 100644 --- a/arch/mips/kernel/syscalls/syscall_n64.tbl +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl @@ -353,3 +353,4 @@ 437 n64 openat2 sys_openat2 438 n64 pidfd_getfd sys_pidfd_getfd 439 n64 faccessat2 sys_faccessat2 +440 n64 readfile sys_readfile diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl index 13280625d312..67cb8f8fbdb2 100644 --- a/arch/mips/kernel/syscalls/syscall_o32.tbl +++ b/arch/mips/kernel/syscalls/syscall_o32.tbl @@ -426,3 +426,4 @@ 437 o32 openat2 sys_openat2 438 o32 pidfd_getfd sys_pidfd_getfd 439 o32 faccessat2 sys_faccessat2 +440 o32 readfile sys_readfile diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl index 5a758fa6ec52..775e5228ab51 100644 --- a/arch/parisc/kernel/syscalls/syscall.tbl +++ b/arch/parisc/kernel/syscalls/syscall.tbl @@ -436,3 +436,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl index f833a3190822..d452db708635 100644 --- a/arch/powerpc/kernel/syscalls/syscall.tbl +++ b/arch/powerpc/kernel/syscalls/syscall.tbl @@ -528,3 +528,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl index bfdcb7633957..7ab529813a42 100644 --- a/arch/s390/kernel/syscalls/syscall.tbl +++ b/arch/s390/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 437 common openat2 sys_openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 sys_faccessat2 +440 common readfile sys_readfile sys_readfile diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl index acc35daa1b79..ce8862cdb707 100644 --- a/arch/sh/kernel/syscalls/syscall.tbl +++ b/arch/sh/kernel/syscalls/syscall.tbl @@ -441,3 +441,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl index 8004a276cb74..d89e7224bb0f 100644 --- a/arch/sparc/kernel/syscalls/syscall.tbl +++ b/arch/sparc/kernel/syscalls/syscall.tbl @@ -484,3 +484,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl index d8f8a1a69ed1..6f8d0b0acb6a 100644 --- a/arch/x86/entry/syscalls/syscall_32.tbl +++ b/arch/x86/entry/syscalls/syscall_32.tbl @@ -443,3 +443,4 @@ 437 i386 openat2 sys_openat2 438 i386 pidfd_getfd sys_pidfd_getfd 439 i386 faccessat2 sys_faccessat2 +440 i386 readfile sys_readfile diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl index 78847b32e137..9c54081b7c14 100644 --- a/arch/x86/entry/syscalls/syscall_64.tbl +++ b/arch/x86/entry/syscalls/syscall_64.tbl @@ -360,6 +360,7 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile
# # x32-specific system call numbers start at 512 to avoid cache impact diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl index 69d0d73876b3..7b1f2ea76621 100644 --- a/arch/xtensa/kernel/syscalls/syscall.tbl +++ b/arch/xtensa/kernel/syscalls/syscall.tbl @@ -409,3 +409,4 @@ 437 common openat2 sys_openat2 438 common pidfd_getfd sys_pidfd_getfd 439 common faccessat2 sys_faccessat2 +440 common readfile sys_readfile diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index b951a87da987..9d338ef7802d 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -1005,6 +1005,8 @@ asmlinkage long sys_pidfd_send_signal(int pidfd, int sig, siginfo_t __user *info, unsigned int flags); asmlinkage long sys_pidfd_getfd(int pidfd, int fd, unsigned int flags); +asmlinkage long sys_readfile(int dfd, const char __user *filename, + char __user *buffer, size_t bufsize, int flags);
/* * Architecture-specific system calls diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h index f4a01305d9a6..81b677c01266 100644 --- a/include/uapi/asm-generic/unistd.h +++ b/include/uapi/asm-generic/unistd.h @@ -857,9 +857,11 @@ __SYSCALL(__NR_openat2, sys_openat2) __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) #define __NR_faccessat2 439 __SYSCALL(__NR_faccessat2, sys_faccessat2) +#define __NR_readfile 440 +__SYSCALL(__NR_readfile, sys_readfile)
#undef __NR_syscalls -#define __NR_syscalls 440 +#define __NR_syscalls 441
/* * 32 bit systems traditionally used different
On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This wires up the readfile syscall for all architectures
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
Acked-by: Geert Uytterhoeven geert@linux-m68k.org
Gr{oetje,eeting}s,
Geert
Test the functionality of readfile(2) in various ways.
Also provide a simple speed test program to benchmark using readfile() instead of using open()/read()/close().
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/readfile/.gitignore | 3 + tools/testing/selftests/readfile/Makefile | 7 + tools/testing/selftests/readfile/readfile.c | 285 +++++++++++++++++ .../selftests/readfile/readfile_speed.c | 301 ++++++++++++++++++ 5 files changed, 597 insertions(+) create mode 100644 tools/testing/selftests/readfile/.gitignore create mode 100644 tools/testing/selftests/readfile/Makefile create mode 100644 tools/testing/selftests/readfile/readfile.c create mode 100644 tools/testing/selftests/readfile/readfile_speed.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 1195bd85af38..82359233b945 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -46,6 +46,7 @@ TARGETS += ptrace TARGETS += openat2 TARGETS += rseq TARGETS += rtc +TARGETS += readfile TARGETS += seccomp TARGETS += sigaltstack TARGETS += size diff --git a/tools/testing/selftests/readfile/.gitignore b/tools/testing/selftests/readfile/.gitignore new file mode 100644 index 000000000000..f0e758d437e4 --- /dev/null +++ b/tools/testing/selftests/readfile/.gitignore @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 +readfile +readfile_speed diff --git a/tools/testing/selftests/readfile/Makefile b/tools/testing/selftests/readfile/Makefile new file mode 100644 index 000000000000..1bf1bdec40f8 --- /dev/null +++ b/tools/testing/selftests/readfile/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +CFLAGS += -g -I../../../../usr/include/ +CFLAGS += -O2 -Wl,-no-as-needed -Wall + +TEST_GEN_PROGS := readfile readfile_speed + +include ../lib.mk diff --git a/tools/testing/selftests/readfile/readfile.c b/tools/testing/selftests/readfile/readfile.c new file mode 100644 index 000000000000..f0736c6dfa69 --- /dev/null +++ b/tools/testing/selftests/readfile/readfile.c @@ -0,0 +1,285 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2020 Greg Kroah-Hartman gregkh@linuxfoundation.org + * Copyright (c) 2020 The Linux Foundation + * + * Test the readfile() syscall in various ways. + */ +#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <dirent.h> +#include <fcntl.h> +#include <limits.h> +#include <string.h> +#include <syscall.h> + +#include "../kselftest.h" + +//#ifndef __NR_readfile +//#define __NR_readfile -1 +//#endif + +#define __NR_readfile 440 + +#define TEST_FILE1 "/sys/devices/system/cpu/vulnerabilities/meltdown" +#define TEST_FILE2 "/sys/devices/system/cpu/vulnerabilities/spectre_v1" +#define TEST_FILE4 "/sys/kernel/debug/usb/devices" + +static int sys_readfile(int fd, const char *filename, unsigned char *buffer, + size_t bufsize, int flags) +{ + return syscall(__NR_readfile, fd, filename, buffer, bufsize, flags); +} + +/* + * Test that readfile() is even in the running kernel or not. + */ +static void test_readfile_supported(void) +{ + const char *proc_map = "/proc/self/maps"; + unsigned char buffer[10]; + int retval; + + if (__NR_readfile < 0) + ksft_exit_skip("readfile() syscall is not defined for the kernel this test was built against\n"); + + /* + * Do a simple test to see if the syscall really is present in the + * running kernel + */ + retval = sys_readfile(0, proc_map, &buffer[0], sizeof(buffer), 0); + if (retval == -1) + ksft_exit_skip("readfile() syscall not present on running kernel\n"); + + ksft_test_result_pass("readfile() syscall present\n"); +} + +/* + * Open all files in a specific sysfs directory and read from them + * + * This tests the "openat" type functionality of opening all files relative to a + * directory. We don't care at the moment about the contents. + */ +static void test_sysfs_files(void) +{ + static unsigned char buffer[8000]; + const char *sysfs_dir = "/sys/devices/system/cpu/vulnerabilities/"; + struct dirent *dirent; + DIR *vuln_sysfs_dir; + int sysfs_fd; + int retval; + + sysfs_fd = open(sysfs_dir, O_PATH | O_DIRECTORY); + if (sysfs_fd == -1) { + ksft_test_result_skip("unable to open %s directory\n", + sysfs_dir); + return; + } + + vuln_sysfs_dir = opendir(sysfs_dir); + if (!vuln_sysfs_dir) { + ksft_test_result_skip("%s unable to be opened, skipping test\n"); + return; + } + + ksft_print_msg("readfile: testing relative path functionality by reading files in %s\n", + sysfs_dir); + /* open all sysfs file in this directory and read the whole thing */ + while ((dirent = readdir(vuln_sysfs_dir))) { + /* ignore . and .. */ + if (strcmp(dirent->d_name, ".") == 0 || + strcmp(dirent->d_name, "..") == 0) + continue; + + retval = sys_readfile(sysfs_fd, dirent->d_name, &buffer[0], + sizeof(buffer), 0); + + if (retval <= 0) { + ksft_test_result_fail("readfile(%s) failed with %d\n", + dirent->d_name, retval); + goto exit; + } + + /* cut off trailing \n character */ + buffer[retval - 1] = 0x00; + ksft_print_msg(" '%s' contains "%s"\n", dirent->d_name, + buffer); + } + + ksft_test_result_pass("readfile() relative path functionality passed\n"); + +exit: + closedir(vuln_sysfs_dir); + close(sysfs_fd); +} + +/* Temporary directory variables */ +static int root_fd; /* test root directory file handle */ +static char tmpdir[PATH_MAX]; + +static void setup_tmpdir(void) +{ + char *tmpdir_root; + + tmpdir_root = getenv("TMPDIR"); + if (!tmpdir_root) + tmpdir_root = "/tmp"; + + snprintf(tmpdir, PATH_MAX, "%s/readfile.XXXXXX", tmpdir_root); + if (!mkdtemp(tmpdir)) { + ksft_test_result_fail("mkdtemp(%s) failed\n", tmpdir); + ksft_exit_fail(); + } + + root_fd = open(tmpdir, O_PATH | O_DIRECTORY); + if (root_fd == -1) { + ksft_exit_fail_msg("%s unable to be opened, error = %d\n", + tmpdir, root_fd); + ksft_exit_fail(); + } + + ksft_print_msg("%s created to use for testing\n", tmpdir); +} + +static void teardown_tmpdir(void) +{ + int retval; + + close(root_fd); + + retval = rmdir(tmpdir); + if (retval) { + ksft_exit_fail_msg("%s removed with return value %d\n", + tmpdir, retval); + ksft_exit_fail(); + } + ksft_print_msg("%s cleaned up and removed\n", tmpdir); + +} + +static void test_filesize(size_t size) +{ + char filename[PATH_MAX]; + unsigned char *write_data; + unsigned char *read_data; + int fd; + int retval; + size_t i; + + snprintf(filename, PATH_MAX, "size-%ld", size); + + read_data = malloc(size); + write_data = malloc(size); + if (!read_data || !write_data) + ksft_exit_fail_msg("Unable to allocate %ld bytes\n", size); + + fd = openat(root_fd, filename, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR); + if (fd < 0) + ksft_exit_fail_msg("Unable to create file %s\n", filename); + + ksft_print_msg("%s created\n", filename); + + for (i = 0; i < size; ++i) + write_data[i] = (unsigned char)(0xff & i); + + write(fd, write_data, size); + close(fd); + + retval = sys_readfile(root_fd, filename, read_data, size, 0); + + if (retval != size) { + ksft_test_result_fail("Read %d bytes but wanted to read %ld bytes.\n", + retval, size); + goto exit; + } + + if (memcmp(read_data, write_data, size) != 0) { + ksft_test_result_fail("Read data of buffer size %d did not match written data\n", + size); + goto exit; + } + + ksft_test_result_pass("readfile() of size %ld succeeded.\n", size); + +exit: + unlinkat(root_fd, filename, 0); + free(write_data); + free(read_data); +} + + +/* + * Create a bunch of differently sized files, and verify we read the correct + * amount of data from them. + */ +static void test_filesizes(void) +{ + setup_tmpdir(); + + test_filesize(0x10); + test_filesize(0x100); + test_filesize(0x1000); + test_filesize(0x10000); + test_filesize(0x100000); + test_filesize(0x1000000); + + teardown_tmpdir(); + +} + +static void readfile(const char *filename) +{ +// int root_fd; + unsigned char buffer[16000]; + int retval; + + memset(buffer, 0x00, sizeof(buffer)); + +// root_fd = open("/", O_DIRECTORY); +// if (root_fd == -1) +// ksft_exit_fail_msg("error with root_fd\n"); + + retval = sys_readfile(root_fd, filename, &buffer[0], sizeof(buffer), 0); + +// close(root_fd); + + if (retval <= 0) + ksft_test_result_fail("readfile() test of filename=%s failed with retval %d\n", + filename, retval); + else + ksft_test_result_pass("readfile() test of filename=%s succeeded with retval=%d\n", + filename, retval); +// buffer='%s'\n", +// filename, retval, &buffer[0]); + +} + + +int main(int argc, char *argv[]) +{ + ksft_print_header(); + ksft_set_plan(10); + + test_readfile_supported(); // 1 test + + test_sysfs_files(); // 1 test + + test_filesizes(); // 6 tests + + setup_tmpdir(); + + readfile(TEST_FILE1); + readfile(TEST_FILE2); +// readfile(TEST_FILE4); + + teardown_tmpdir(); + + if (ksft_get_fail_cnt()) + return ksft_exit_fail(); + + return ksft_exit_pass(); +} + diff --git a/tools/testing/selftests/readfile/readfile_speed.c b/tools/testing/selftests/readfile/readfile_speed.c new file mode 100644 index 000000000000..11ca79163131 --- /dev/null +++ b/tools/testing/selftests/readfile/readfile_speed.c @@ -0,0 +1,301 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2020 Greg Kroah-Hartman gregkh@linuxfoundation.org + * Copyright (c) 2020 The Linux Foundation + * + * Tiny test program to try to benchmark the speed of the readfile syscall vs. + * the open/read/close sequence it can replace. + */ +#define _GNU_SOURCE +#include <dirent.h> +#include <errno.h> +#include <fcntl.h> +#include <limits.h> +#include <stdarg.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <syscall.h> +#include <time.h> +#include <unistd.h> + +/* Default test file if no one wants to pick something else */ +#define DEFAULT_TEST_FILE "/sys/devices/system/cpu/vulnerabilities/meltdown" + +#define DEFAULT_TEST_LOOPS 1000 + +#define DEFAULT_TEST_TYPE "both" + +/* Max number of bytes that will be read from the file */ +#define TEST_BUFFER_SIZE 10000 +static unsigned char test_buffer[TEST_BUFFER_SIZE]; + +enum test_type { + TEST_READFILE, + TEST_OPENREADCLOSE, + TEST_BOTH, +}; + +/* Find the readfile syscall number */ +//#ifndef __NR_readfile +//#define __NR_readfile -1 +//#endif +#define __NR_readfile 440 + +static int sys_readfile(int fd, const char *filename, unsigned char *buffer, + size_t bufsize, int flags) +{ + return syscall(__NR_readfile, fd, filename, buffer, bufsize, flags); +} + +/* Test that readfile() is even in the running kernel or not. */ +static void test_readfile_supported(void) +{ + const char *proc_map = "/proc/self/maps"; + unsigned char buffer[10]; + int retval; + + if (__NR_readfile < 0) { + fprintf(stderr, + "readfile() syscall is not defined for the kernel this test was built against.\n"); + exit(1); + } + + /* + * Do a simple test to see if the syscall really is present in the + * running kernel + */ + retval = sys_readfile(0, proc_map, &buffer[0], sizeof(buffer), 0); + if (retval == -1) { + fprintf(stderr, + "readfile() syscall not present on running kernel.\n"); + exit(1); + } +} + +static inline long long get_time_ns(void) +{ + struct timespec t; + + clock_gettime(CLOCK_MONOTONIC, &t); + + return (long long)t.tv_sec * 1000000000 + t.tv_nsec; +} + +/* taken from all-io.h from util-linux repo */ +static inline ssize_t read_all(int fd, unsigned char *buf, size_t count) +{ + ssize_t ret; + ssize_t c = 0; + int tries = 0; + + while (count > 0) { + ret = read(fd, buf, count); + if (ret <= 0) { + if (ret < 0 && (errno == EAGAIN || errno == EINTR) && + (tries++ < 5)) { + usleep(250000); + continue; + } + return c ? c : -1; + } + tries = 0; + count -= ret; + buf += ret; + c += ret; + } + return c; +} + +static int openreadclose(const char *filename, unsigned char *buffer, + size_t bufsize) +{ + size_t count; + int fd; + + fd = openat(0, filename, O_RDONLY); + if (fd < 0) { + printf("error opening %s\n", filename); + return fd; + } + + count = read_all(fd, buffer, bufsize); + if (count < 0) { + printf("Error %ld reading from %s\n", count, filename); + } + + close(fd); + return count; +} + +static int run_test(enum test_type test_type, const char *filename) +{ + switch (test_type) { + case TEST_READFILE: + return sys_readfile(0, filename, &test_buffer[0], + TEST_BUFFER_SIZE, O_RDONLY); + + case TEST_OPENREADCLOSE: + return openreadclose(filename, &test_buffer[0], + TEST_BUFFER_SIZE); + default: + return -EINVAL; + } +} + +static const char * const test_names[] = { + [TEST_READFILE] = "readfile", + [TEST_OPENREADCLOSE] = "open/read/close", +}; + +static int run_test_loop(int loops, enum test_type test_type, + const char *filename) +{ + long long time_start; + long long time_end; + long long time_elapsed; + int retval = 0; + int i; + + fprintf(stdout, + "Running %s test on file %s for %d loops...\n", + test_names[test_type], filename, loops); + + /* Fill the cache with one run of the read first */ + retval = run_test(test_type, filename); + if (retval < 0) { + fprintf(stderr, + "test %s was unable to run with error %d\n", + test_names[test_type], retval); + return retval; + } + + time_start = get_time_ns(); + + for (i = 0; i < loops; ++i) { + retval = run_test(test_type, filename); + + if (retval < 0) { + fprintf(stderr, + "test failed on loop %d with error %d\n", + i, retval); + break; + } + } + time_end = get_time_ns(); + + time_elapsed = time_end - time_start; + + fprintf(stdout, "Took %lld ns\n", time_elapsed); + + return retval; +} + +static int do_read_file_test(int loops, enum test_type test_type, + const char *filename) +{ + int retval; + + if (test_type == TEST_BOTH) { + retval = do_read_file_test(loops, TEST_READFILE, filename); + retval = do_read_file_test(loops, TEST_OPENREADCLOSE, filename); + return retval; + } + return run_test_loop(loops, test_type, filename); +} + +static int check_file_present(const char *filename) +{ + struct stat sb; + int retval; + + retval = stat(filename, &sb); + if (retval == -1) { + fprintf(stderr, + "filename %s is not present\n", filename); + return retval; + } + + if ((sb.st_mode & S_IFMT) != S_IFREG) { + fprintf(stderr, + "filename %s must be a real file, not anything else.\n", + filename); + return -1; + } + return 0; +} + +static void usage(char *progname) +{ + fprintf(stderr, + "usage: %s [options]\n" + " -l loops Number of loops to run the test for.\n" + " default is %d\n" + " -t testtype Test type to run.\n" + " types are: readfile, openreadclose, both\n" + " default is %s\n" + " -f filename Filename to read from, full path, not relative.\n" + " default is %s\n", + progname, + DEFAULT_TEST_LOOPS, DEFAULT_TEST_TYPE, DEFAULT_TEST_FILE); +} + +int main(int argc, char *argv[]) +{ + char *progname; + char *testtype = DEFAULT_TEST_TYPE; + char *filename = DEFAULT_TEST_FILE; + int loops = DEFAULT_TEST_LOOPS; + enum test_type test_type; + int retval; + char c; + + progname = strrchr(argv[0], '/'); + progname = progname ? 1+progname : argv[0]; + + while (EOF != (c = getopt(argc, argv, "t:l:f:h"))) { + switch (c) { + case 'l': + loops = atoi(optarg); + break; + + case 't': + testtype = optarg; + break; + + case 'f': + filename = optarg; + break; + + case 'h': + usage(progname); + return 0; + + default: + usage(progname); + return -1; + } + } + + if (strcmp(testtype, "readfile") == 0) + test_type = TEST_READFILE; + else if (strcmp(testtype, "openreadclose") == 0) + test_type = TEST_OPENREADCLOSE; + else if (strcmp(testtype, "both") == 0) + test_type = TEST_BOTH; + else { + usage(progname); + return -1; + } + + test_readfile_supported(); + + retval = check_file_present(filename); + if (retval) + return retval; + + return do_read_file_test(loops, test_type, filename); +}
Hi Greg,
On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Test the functionality of readfile(2) in various ways.
Also provide a simple speed test program to benchmark using readfile() instead of using open()/read()/close().
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Any benchmark results to share?
--- /dev/null +++ b/tools/testing/selftests/readfile/readfile.c
+static void readfile(const char *filename) +{ +// int root_fd;
???
unsigned char buffer[16000];
int retval;
memset(buffer, 0x00, sizeof(buffer));
+// root_fd = open("/", O_DIRECTORY); +// if (root_fd == -1) +// ksft_exit_fail_msg("error with root_fd\n");
???
retval = sys_readfile(root_fd, filename, &buffer[0], sizeof(buffer), 0);
+// close(root_fd);
if (retval <= 0)
ksft_test_result_fail("readfile() test of filename=%s failed with retval %d\n",
filename, retval);
else
ksft_test_result_pass("readfile() test of filename=%s succeeded with retval=%d\n",
filename, retval);
+// buffer='%s'\n", +// filename, retval, &buffer[0]);
+}
+int main(int argc, char *argv[]) +{
ksft_print_header();
ksft_set_plan(10);
test_readfile_supported(); // 1 test
test_sysfs_files(); // 1 test
test_filesizes(); // 6 tests
setup_tmpdir();
readfile(TEST_FILE1);
readfile(TEST_FILE2);
+// readfile(TEST_FILE4);
???
teardown_tmpdir();
if (ksft_get_fail_cnt())
return ksft_exit_fail();
return ksft_exit_pass();
+}
Gr{oetje,eeting}s,
Geert
On Sat, Jul 04, 2020 at 08:38:26PM +0200, Geert Uytterhoeven wrote:
Hi Greg,
On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Test the functionality of readfile(2) in various ways.
Also provide a simple speed test program to benchmark using readfile() instead of using open()/read()/close().
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Any benchmark results to share?
Yes, the readfile_speed.c file will show you that on your machine, I'll post the results of that for two of my machines when I send the next version of this patch series.
--- /dev/null +++ b/tools/testing/selftests/readfile/readfile.c
+static void readfile(const char *filename) +{ +// int root_fd;
???
Ugh, sorry about that, I obviously didn't clean up my last tests from this file, thanks for catching that.
I should add more tests to validate the flag handling as well, will do all of that for the next version, thanks for noticing this.
greg k-h
Hi Greg,
On Sun, Jul 5, 2020 at 8:55 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sat, Jul 04, 2020 at 08:38:26PM +0200, Geert Uytterhoeven wrote:
On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Test the functionality of readfile(2) in various ways.
Also provide a simple speed test program to benchmark using readfile() instead of using open()/read()/close().
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- /dev/null +++ b/tools/testing/selftests/readfile/readfile.c
+static void readfile(const char *filename) +{ +// int root_fd;
???
Ugh, sorry about that, I obviously didn't clean up my last tests from this file, thanks for catching that.
Reading about seq_file behavior, did the commented-out test for "/sys/kernel/debug/usb/devices" work?
Gr{oetje,eeting}s,
Geert
On Sun, Jul 05, 2020 at 01:24:07PM +0200, Geert Uytterhoeven wrote:
Hi Greg,
On Sun, Jul 5, 2020 at 8:55 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sat, Jul 04, 2020 at 08:38:26PM +0200, Geert Uytterhoeven wrote:
On Sat, Jul 4, 2020 at 4:05 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
Test the functionality of readfile(2) in various ways.
Also provide a simple speed test program to benchmark using readfile() instead of using open()/read()/close().
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- /dev/null +++ b/tools/testing/selftests/readfile/readfile.c
+static void readfile(const char *filename) +{ +// int root_fd;
???
Ugh, sorry about that, I obviously didn't clean up my last tests from this file, thanks for catching that.
Reading about seq_file behavior, did the commented-out test for "/sys/kernel/debug/usb/devices" work?
Yes it did, which means I need to go dig to try to find a "problem" seq_file procfs file to try to debug that behavior...
thanks,
greg k-h
On 7/4/20 4:02 PM, Greg Kroah-Hartman wrote:
Test the functionality of readfile(2) in various ways.
Hello Greg,
I expect readfile() to generate fanotify events FAN_OPEN_PERM, FAN_OPEN, FAN_ACCESS_PERM, FAN_ACCESS, FAN_CLOSE_NOWRITE in this sequence.
Looking at patch 1/3 you took care of notifications. Would this deserve testing here?
Best regards
Heinrich
Also provide a simple speed test program to benchmark using readfile() instead of using open()/read()/close().
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
tools/testing/selftests/Makefile | 1 + tools/testing/selftests/readfile/.gitignore | 3 + tools/testing/selftests/readfile/Makefile | 7 + tools/testing/selftests/readfile/readfile.c | 285 +++++++++++++++++ .../selftests/readfile/readfile_speed.c | 301 ++++++++++++++++++ 5 files changed, 597 insertions(+) create mode 100644 tools/testing/selftests/readfile/.gitignore create mode 100644 tools/testing/selftests/readfile/Makefile create mode 100644 tools/testing/selftests/readfile/readfile.c create mode 100644 tools/testing/selftests/readfile/readfile_speed.c
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile index 1195bd85af38..82359233b945 100644 --- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -46,6 +46,7 @@ TARGETS += ptrace TARGETS += openat2 TARGETS += rseq TARGETS += rtc +TARGETS += readfile TARGETS += seccomp TARGETS += sigaltstack TARGETS += size diff --git a/tools/testing/selftests/readfile/.gitignore b/tools/testing/selftests/readfile/.gitignore new file mode 100644 index 000000000000..f0e758d437e4 --- /dev/null +++ b/tools/testing/selftests/readfile/.gitignore @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 +readfile +readfile_speed diff --git a/tools/testing/selftests/readfile/Makefile b/tools/testing/selftests/readfile/Makefile new file mode 100644 index 000000000000..1bf1bdec40f8 --- /dev/null +++ b/tools/testing/selftests/readfile/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +CFLAGS += -g -I../../../../usr/include/ +CFLAGS += -O2 -Wl,-no-as-needed -Wall
+TEST_GEN_PROGS := readfile readfile_speed
+include ../lib.mk diff --git a/tools/testing/selftests/readfile/readfile.c b/tools/testing/selftests/readfile/readfile.c new file mode 100644 index 000000000000..f0736c6dfa69 --- /dev/null +++ b/tools/testing/selftests/readfile/readfile.c @@ -0,0 +1,285 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- Copyright (c) 2020 Greg Kroah-Hartman gregkh@linuxfoundation.org
- Copyright (c) 2020 The Linux Foundation
- Test the readfile() syscall in various ways.
- */
+#define _GNU_SOURCE +#include <stdio.h> +#include <stdlib.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <dirent.h> +#include <fcntl.h> +#include <limits.h> +#include <string.h> +#include <syscall.h>
+#include "../kselftest.h"
+//#ifndef __NR_readfile +//#define __NR_readfile -1 +//#endif
+#define __NR_readfile 440
+#define TEST_FILE1 "/sys/devices/system/cpu/vulnerabilities/meltdown" +#define TEST_FILE2 "/sys/devices/system/cpu/vulnerabilities/spectre_v1" +#define TEST_FILE4 "/sys/kernel/debug/usb/devices"
+static int sys_readfile(int fd, const char *filename, unsigned char *buffer,
size_t bufsize, int flags)
+{
- return syscall(__NR_readfile, fd, filename, buffer, bufsize, flags);
+}
+/*
- Test that readfile() is even in the running kernel or not.
- */
+static void test_readfile_supported(void) +{
- const char *proc_map = "/proc/self/maps";
- unsigned char buffer[10];
- int retval;
- if (__NR_readfile < 0)
ksft_exit_skip("readfile() syscall is not defined for the kernel this test was built against\n");
- /*
* Do a simple test to see if the syscall really is present in the
* running kernel
*/
- retval = sys_readfile(0, proc_map, &buffer[0], sizeof(buffer), 0);
- if (retval == -1)
ksft_exit_skip("readfile() syscall not present on running kernel\n");
- ksft_test_result_pass("readfile() syscall present\n");
+}
+/*
- Open all files in a specific sysfs directory and read from them
- This tests the "openat" type functionality of opening all files relative to a
- directory. We don't care at the moment about the contents.
- */
+static void test_sysfs_files(void) +{
- static unsigned char buffer[8000];
- const char *sysfs_dir = "/sys/devices/system/cpu/vulnerabilities/";
- struct dirent *dirent;
- DIR *vuln_sysfs_dir;
- int sysfs_fd;
- int retval;
- sysfs_fd = open(sysfs_dir, O_PATH | O_DIRECTORY);
- if (sysfs_fd == -1) {
ksft_test_result_skip("unable to open %s directory\n",
sysfs_dir);
return;
- }
- vuln_sysfs_dir = opendir(sysfs_dir);
- if (!vuln_sysfs_dir) {
ksft_test_result_skip("%s unable to be opened, skipping test\n");
return;
- }
- ksft_print_msg("readfile: testing relative path functionality by reading files in %s\n",
sysfs_dir);
- /* open all sysfs file in this directory and read the whole thing */
- while ((dirent = readdir(vuln_sysfs_dir))) {
/* ignore . and .. */
if (strcmp(dirent->d_name, ".") == 0 ||
strcmp(dirent->d_name, "..") == 0)
continue;
retval = sys_readfile(sysfs_fd, dirent->d_name, &buffer[0],
sizeof(buffer), 0);
if (retval <= 0) {
ksft_test_result_fail("readfile(%s) failed with %d\n",
dirent->d_name, retval);
goto exit;
}
/* cut off trailing \n character */
buffer[retval - 1] = 0x00;
ksft_print_msg(" '%s' contains \"%s\"\n", dirent->d_name,
buffer);
- }
- ksft_test_result_pass("readfile() relative path functionality passed\n");
+exit:
- closedir(vuln_sysfs_dir);
- close(sysfs_fd);
+}
+/* Temporary directory variables */ +static int root_fd; /* test root directory file handle */ +static char tmpdir[PATH_MAX];
+static void setup_tmpdir(void) +{
- char *tmpdir_root;
- tmpdir_root = getenv("TMPDIR");
- if (!tmpdir_root)
tmpdir_root = "/tmp";
- snprintf(tmpdir, PATH_MAX, "%s/readfile.XXXXXX", tmpdir_root);
- if (!mkdtemp(tmpdir)) {
ksft_test_result_fail("mkdtemp(%s) failed\n", tmpdir);
ksft_exit_fail();
- }
- root_fd = open(tmpdir, O_PATH | O_DIRECTORY);
- if (root_fd == -1) {
ksft_exit_fail_msg("%s unable to be opened, error = %d\n",
tmpdir, root_fd);
ksft_exit_fail();
- }
- ksft_print_msg("%s created to use for testing\n", tmpdir);
+}
+static void teardown_tmpdir(void) +{
- int retval;
- close(root_fd);
- retval = rmdir(tmpdir);
- if (retval) {
ksft_exit_fail_msg("%s removed with return value %d\n",
tmpdir, retval);
ksft_exit_fail();
- }
- ksft_print_msg("%s cleaned up and removed\n", tmpdir);
+}
+static void test_filesize(size_t size) +{
- char filename[PATH_MAX];
- unsigned char *write_data;
- unsigned char *read_data;
- int fd;
- int retval;
- size_t i;
- snprintf(filename, PATH_MAX, "size-%ld", size);
- read_data = malloc(size);
- write_data = malloc(size);
- if (!read_data || !write_data)
ksft_exit_fail_msg("Unable to allocate %ld bytes\n", size);
- fd = openat(root_fd, filename, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
- if (fd < 0)
ksft_exit_fail_msg("Unable to create file %s\n", filename);
- ksft_print_msg("%s created\n", filename);
- for (i = 0; i < size; ++i)
write_data[i] = (unsigned char)(0xff & i);
- write(fd, write_data, size);
- close(fd);
- retval = sys_readfile(root_fd, filename, read_data, size, 0);
- if (retval != size) {
ksft_test_result_fail("Read %d bytes but wanted to read %ld bytes.\n",
retval, size);
goto exit;
- }
- if (memcmp(read_data, write_data, size) != 0) {
ksft_test_result_fail("Read data of buffer size %d did not match written data\n",
size);
goto exit;
- }
- ksft_test_result_pass("readfile() of size %ld succeeded.\n", size);
+exit:
- unlinkat(root_fd, filename, 0);
- free(write_data);
- free(read_data);
+}
+/*
- Create a bunch of differently sized files, and verify we read the correct
- amount of data from them.
- */
+static void test_filesizes(void) +{
- setup_tmpdir();
- test_filesize(0x10);
- test_filesize(0x100);
- test_filesize(0x1000);
- test_filesize(0x10000);
- test_filesize(0x100000);
- test_filesize(0x1000000);
- teardown_tmpdir();
+}
+static void readfile(const char *filename) +{ +// int root_fd;
- unsigned char buffer[16000];
- int retval;
- memset(buffer, 0x00, sizeof(buffer));
+// root_fd = open("/", O_DIRECTORY); +// if (root_fd == -1) +// ksft_exit_fail_msg("error with root_fd\n");
- retval = sys_readfile(root_fd, filename, &buffer[0], sizeof(buffer), 0);
+// close(root_fd);
- if (retval <= 0)
ksft_test_result_fail("readfile() test of filename=%s failed with retval %d\n",
filename, retval);
- else
ksft_test_result_pass("readfile() test of filename=%s succeeded with retval=%d\n",
filename, retval);
+// buffer='%s'\n", +// filename, retval, &buffer[0]);
+}
+int main(int argc, char *argv[]) +{
- ksft_print_header();
- ksft_set_plan(10);
- test_readfile_supported(); // 1 test
- test_sysfs_files(); // 1 test
- test_filesizes(); // 6 tests
- setup_tmpdir();
- readfile(TEST_FILE1);
- readfile(TEST_FILE2);
+// readfile(TEST_FILE4);
- teardown_tmpdir();
- if (ksft_get_fail_cnt())
return ksft_exit_fail();
- return ksft_exit_pass();
+}
diff --git a/tools/testing/selftests/readfile/readfile_speed.c b/tools/testing/selftests/readfile/readfile_speed.c new file mode 100644 index 000000000000..11ca79163131 --- /dev/null +++ b/tools/testing/selftests/readfile/readfile_speed.c @@ -0,0 +1,301 @@ +// SPDX-License-Identifier: GPL-2.0 +/*
- Copyright (c) 2020 Greg Kroah-Hartman gregkh@linuxfoundation.org
- Copyright (c) 2020 The Linux Foundation
- Tiny test program to try to benchmark the speed of the readfile syscall vs.
- the open/read/close sequence it can replace.
- */
+#define _GNU_SOURCE +#include <dirent.h> +#include <errno.h> +#include <fcntl.h> +#include <limits.h> +#include <stdarg.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> +#include <sys/stat.h> +#include <sys/syscall.h> +#include <sys/types.h> +#include <syscall.h> +#include <time.h> +#include <unistd.h>
+/* Default test file if no one wants to pick something else */ +#define DEFAULT_TEST_FILE "/sys/devices/system/cpu/vulnerabilities/meltdown"
+#define DEFAULT_TEST_LOOPS 1000
+#define DEFAULT_TEST_TYPE "both"
+/* Max number of bytes that will be read from the file */ +#define TEST_BUFFER_SIZE 10000 +static unsigned char test_buffer[TEST_BUFFER_SIZE];
+enum test_type {
- TEST_READFILE,
- TEST_OPENREADCLOSE,
- TEST_BOTH,
+};
+/* Find the readfile syscall number */ +//#ifndef __NR_readfile +//#define __NR_readfile -1 +//#endif +#define __NR_readfile 440
+static int sys_readfile(int fd, const char *filename, unsigned char *buffer,
size_t bufsize, int flags)
+{
- return syscall(__NR_readfile, fd, filename, buffer, bufsize, flags);
+}
+/* Test that readfile() is even in the running kernel or not. */ +static void test_readfile_supported(void) +{
- const char *proc_map = "/proc/self/maps";
- unsigned char buffer[10];
- int retval;
- if (__NR_readfile < 0) {
fprintf(stderr,
"readfile() syscall is not defined for the kernel this test was built against.\n");
exit(1);
- }
- /*
* Do a simple test to see if the syscall really is present in the
* running kernel
*/
- retval = sys_readfile(0, proc_map, &buffer[0], sizeof(buffer), 0);
- if (retval == -1) {
fprintf(stderr,
"readfile() syscall not present on running kernel.\n");
exit(1);
- }
+}
+static inline long long get_time_ns(void) +{
struct timespec t;
clock_gettime(CLOCK_MONOTONIC, &t);
return (long long)t.tv_sec * 1000000000 + t.tv_nsec;
+}
+/* taken from all-io.h from util-linux repo */ +static inline ssize_t read_all(int fd, unsigned char *buf, size_t count) +{
- ssize_t ret;
- ssize_t c = 0;
- int tries = 0;
- while (count > 0) {
ret = read(fd, buf, count);
if (ret <= 0) {
if (ret < 0 && (errno == EAGAIN || errno == EINTR) &&
(tries++ < 5)) {
usleep(250000);
continue;
}
return c ? c : -1;
}
tries = 0;
count -= ret;
buf += ret;
c += ret;
- }
- return c;
+}
+static int openreadclose(const char *filename, unsigned char *buffer,
size_t bufsize)
+{
- size_t count;
- int fd;
- fd = openat(0, filename, O_RDONLY);
- if (fd < 0) {
printf("error opening %s\n", filename);
return fd;
- }
- count = read_all(fd, buffer, bufsize);
- if (count < 0) {
printf("Error %ld reading from %s\n", count, filename);
- }
- close(fd);
- return count;
+}
+static int run_test(enum test_type test_type, const char *filename) +{
- switch (test_type) {
- case TEST_READFILE:
return sys_readfile(0, filename, &test_buffer[0],
TEST_BUFFER_SIZE, O_RDONLY);
- case TEST_OPENREADCLOSE:
return openreadclose(filename, &test_buffer[0],
TEST_BUFFER_SIZE);
- default:
return -EINVAL;
- }
+}
+static const char * const test_names[] = {
- [TEST_READFILE] = "readfile",
- [TEST_OPENREADCLOSE] = "open/read/close",
+};
+static int run_test_loop(int loops, enum test_type test_type,
const char *filename)
+{
- long long time_start;
- long long time_end;
- long long time_elapsed;
- int retval = 0;
- int i;
- fprintf(stdout,
"Running %s test on file %s for %d loops...\n",
test_names[test_type], filename, loops);
- /* Fill the cache with one run of the read first */
- retval = run_test(test_type, filename);
- if (retval < 0) {
fprintf(stderr,
"test %s was unable to run with error %d\n",
test_names[test_type], retval);
return retval;
- }
- time_start = get_time_ns();
- for (i = 0; i < loops; ++i) {
retval = run_test(test_type, filename);
if (retval < 0) {
fprintf(stderr,
"test failed on loop %d with error %d\n",
i, retval);
break;
}
- }
- time_end = get_time_ns();
- time_elapsed = time_end - time_start;
- fprintf(stdout, "Took %lld ns\n", time_elapsed);
- return retval;
+}
+static int do_read_file_test(int loops, enum test_type test_type,
const char *filename)
+{
- int retval;
- if (test_type == TEST_BOTH) {
retval = do_read_file_test(loops, TEST_READFILE, filename);
retval = do_read_file_test(loops, TEST_OPENREADCLOSE, filename);
return retval;
- }
- return run_test_loop(loops, test_type, filename);
+}
+static int check_file_present(const char *filename) +{
- struct stat sb;
- int retval;
- retval = stat(filename, &sb);
- if (retval == -1) {
fprintf(stderr,
"filename %s is not present\n", filename);
return retval;
- }
- if ((sb.st_mode & S_IFMT) != S_IFREG) {
fprintf(stderr,
"filename %s must be a real file, not anything else.\n",
filename);
return -1;
- }
- return 0;
+}
+static void usage(char *progname) +{
- fprintf(stderr,
"usage: %s [options]\n"
" -l loops Number of loops to run the test for.\n"
" default is %d\n"
" -t testtype Test type to run.\n"
" types are: readfile, openreadclose, both\n"
" default is %s\n"
" -f filename Filename to read from, full path, not relative.\n"
" default is %s\n",
progname,
DEFAULT_TEST_LOOPS, DEFAULT_TEST_TYPE, DEFAULT_TEST_FILE);
+}
+int main(int argc, char *argv[]) +{
- char *progname;
- char *testtype = DEFAULT_TEST_TYPE;
- char *filename = DEFAULT_TEST_FILE;
- int loops = DEFAULT_TEST_LOOPS;
- enum test_type test_type;
- int retval;
- char c;
- progname = strrchr(argv[0], '/');
- progname = progname ? 1+progname : argv[0];
- while (EOF != (c = getopt(argc, argv, "t:l:f:h"))) {
switch (c) {
case 'l':
loops = atoi(optarg);
break;
case 't':
testtype = optarg;
break;
case 'f':
filename = optarg;
break;
case 'h':
usage(progname);
return 0;
default:
usage(progname);
return -1;
}
- }
- if (strcmp(testtype, "readfile") == 0)
test_type = TEST_READFILE;
- else if (strcmp(testtype, "openreadclose") == 0)
test_type = TEST_OPENREADCLOSE;
- else if (strcmp(testtype, "both") == 0)
test_type = TEST_BOTH;
- else {
usage(progname);
return -1;
- }
- test_readfile_supported();
- retval = check_file_present(filename);
- if (retval)
return retval;
- return do_read_file_test(loops, test_type, filename);
+}
On Sun, Jul 05, 2020 at 03:41:48AM +0200, Heinrich Schuchardt wrote:
On 7/4/20 4:02 PM, Greg Kroah-Hartman wrote:
Test the functionality of readfile(2) in various ways.
Hello Greg,
I expect readfile() to generate fanotify events FAN_OPEN_PERM, FAN_OPEN, FAN_ACCESS_PERM, FAN_ACCESS, FAN_CLOSE_NOWRITE in this sequence.
Yes, it should, I don't think I do anything unique here when it comes to vfs accesses that would go around those events.
Looking at patch 1/3 you took care of notifications. Would this deserve testing here?
Possibly, do we have other in-tree tests of syscalls that validate those events properly being created?
thanks,
greg k-h
On 7/5/20 9:34 AM, Greg Kroah-Hartman wrote:
On Sun, Jul 05, 2020 at 03:41:48AM +0200, Heinrich Schuchardt wrote:
On 7/4/20 4:02 PM, Greg Kroah-Hartman wrote:
Test the functionality of readfile(2) in various ways.
Hello Greg,
I expect readfile() to generate fanotify events FAN_OPEN_PERM, FAN_OPEN, FAN_ACCESS_PERM, FAN_ACCESS, FAN_CLOSE_NOWRITE in this sequence.
Yes, it should, I don't think I do anything unique here when it comes to vfs accesses that would go around those events.
Looking at patch 1/3 you took care of notifications. Would this deserve testing here?
Possibly, do we have other in-tree tests of syscalls that validate those events properly being created?
There is an inotify test in tools/testing/selftests/cgroup/test_freezer.c
There is no fanotify test in tree test.
An fanotify test will require running with CAP_SYS_ADMIN. The kselftest documentation does not describe that tests should be run as root. So it may be preferable to test that the inotify events IN_OPEN, IN_ACCESS, IN_CLOSE_NOWRITE are created for readfile().
Example coding is included in the inotify.7 and fanotify.7 manpages.
Best regards
Heinrich
readfile(2) is a new syscall to remove the need to do the open/read/close dance for small virtual files in places like procfs or sysfs.
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org ---
This patch is for the man-pages project, not the kernel source tree
man2/readfile.2 | 159 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 man2/readfile.2
diff --git a/man2/readfile.2 b/man2/readfile.2 new file mode 100644 index 000000000000..449e722c3442 --- /dev/null +++ b/man2/readfile.2 @@ -0,0 +1,159 @@ +." This manpage is Copyright (C) 2020 Greg Kroah-Hartman; +." and Copyright (C) 2020 The Linux Foundation +." +." %%%LICENSE_START(VERBATIM) +." Permission is granted to make and distribute verbatim copies of this +." manual provided the copyright notice and this permission notice are +." preserved on all copies. +." +." Permission is granted to copy and distribute modified versions of this +." manual under the conditions for verbatim copying, provided that the +." entire resulting derived work is distributed under the terms of a +." permission notice identical to this one. +." +." Since the Linux kernel and libraries are constantly changing, this +." manual page may be incorrect or out-of-date. The author(s) assume no +." responsibility for errors or omissions, or for damages resulting from +." the use of the information contained herein. The author(s) may not +." have taken the same level of care in the production of this manual, +." which is licensed free of charge, as they might when working +." professionally. +." +." Formatted or processed versions of this manual, if unaccompanied by +." the source, must acknowledge the copyright and authors of this work. +." %%%LICENSE_END +." +.TH READFILE 2 2020-07-04 "Linux" "Linux Programmer's Manual" +.SH NAME +readfile - read a file into a buffer +.SH SYNOPSIS +.nf +.B #include <unistd.h> +.PP +.BI "ssize_t readfile(int " dirfd ", const char *" pathname ", void *" buf \ +", size_t " count ", int " flags ); +.fi +.SH DESCRIPTION +.BR readfile () +attempts to open the file specified by +.IR pathname +and to read up to +.I count +bytes from the file into the buffer starting at +.IR buf . +It is to be a shortcut of doing the sequence of +.BR open () +and then +.BR read () +and then +.BR close () +for small files that are read frequently, such as those in +.B procfs +or +.BR sysfs . +.PP +If the size of file is smaller than the value provided in +.I count +then the whole file will be copied into +.IR buf . +.PP +If the file is larger than the value provided in +.I count +then only +.I count +number of bytes will be copied into +.IR buf . +.PP +The argument +.I flags +may contain one of the following +.IR "access modes" : +.BR O_NOFOLLOW ", or " O_NOATIME . +.PP +If the pathname given in +.I pathname +is relative, then it is interpreted relative to the directory +referred to by the file descriptor +.IR dirfd . +.PP +If +.I pathname +is relative and +.I dirfd +is the special value +.BR AT_FDCWD , +then +.I pathname +is interpreted relative to the current working +directory of the calling process (like +.BR openat ()). +.PP +If +.I pathname +is absolute, then +.I dirfd +is ignored. +.SH RETURN VALUE +On success, the number of bytes read is returned. +It is not an error if this number is smaller than the number of bytes +requested; this can happen if the file is smaller than the number of +bytes requested. +.PP +On error, -1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +.TP +.B EFAULT +.I buf +is outside your accessible address space. +.TP +.B EINTR +The call was interrupted by a signal before any data was read; see +.BR signal (7). +.TP +.B EINVAL +.I flags +was set to a value that is not allowed. +.TP +.B EIO +I/O error. +This will happen for example when the process is in a +background process group, tries to read from its controlling terminal, +and either it is ignoring or blocking +.B SIGTTIN +or its process group +is orphaned. +It may also occur when there is a low-level I/O error +while reading from a disk or tape. +A further possible cause of +.B EIO +on networked filesystems is when an advisory lock had been taken +out on the file descriptor and this lock has been lost. +See the +.I "Lost locks" +section of +.BR fcntl (2) +for further details. +.SH CONFORMING TO +None, this is a Linux-specific system call at this point in time. +.SH NOTES +The type +.I size_t +is an unsigned integer data type specified by POSIX.1. +.PP +On Linux, +.BR read () +(and similar system calls) will transfer at most +0x7ffff000 (2,147,479,552) bytes, +returning the number of bytes actually transferred. +." commit e28cc71572da38a5a12c1cfe4d7032017adccf69 +(This is true on both 32-bit and 64-bit systems.) +.SH BUGS +None yet! +.SH SEE ALSO +.BR close (2), +.BR open (2), +.BR openat (2), +.BR read (2), +.BR fread (3)
On 7/4/20 4:02 PM, Greg Kroah-Hartman wrote:
readfile(2) is a new syscall to remove the need to do the open/read/close dance for small virtual files in places like procfs or sysfs.
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
This patch is for the man-pages project, not the kernel source tree
man2/readfile.2 | 159 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 159 insertions(+) create mode 100644 man2/readfile.2
diff --git a/man2/readfile.2 b/man2/readfile.2 new file mode 100644 index 000000000000..449e722c3442 --- /dev/null +++ b/man2/readfile.2 @@ -0,0 +1,159 @@ +." This manpage is Copyright (C) 2020 Greg Kroah-Hartman; +." and Copyright (C) 2020 The Linux Foundation +." +." %%%LICENSE_START(VERBATIM) +." Permission is granted to make and distribute verbatim copies of this +." manual provided the copyright notice and this permission notice are +." preserved on all copies. +." +." Permission is granted to copy and distribute modified versions of this +." manual under the conditions for verbatim copying, provided that the +." entire resulting derived work is distributed under the terms of a +." permission notice identical to this one. +." +." Since the Linux kernel and libraries are constantly changing, this +." manual page may be incorrect or out-of-date. The author(s) assume no +." responsibility for errors or omissions, or for damages resulting from +." the use of the information contained herein. The author(s) may not +." have taken the same level of care in the production of this manual, +." which is licensed free of charge, as they might when working +." professionally. +." +." Formatted or processed versions of this manual, if unaccompanied by +." the source, must acknowledge the copyright and authors of this work. +." %%%LICENSE_END +." +.TH READFILE 2 2020-07-04 "Linux" "Linux Programmer's Manual" +.SH NAME +readfile - read a file into a buffer +.SH SYNOPSIS +.nf +.B #include <unistd.h> +.PP +.BI "ssize_t readfile(int " dirfd ", const char *" pathname ", void *" buf \ +", size_t " count ", int " flags ); +.fi +.SH DESCRIPTION +.BR readfile () +attempts to open the file specified by +.IR pathname +and to read up to +.I count +bytes from the file into the buffer starting at +.IR buf . +It is to be a shortcut of doing the sequence of
Just my personal preference for concise language:
It replaces the sequence of
+.BR open () +and then
%s/and then /, /
+.BR read () +and then
%s/and then/, and/
+.BR close () +for small files that are read frequently, such as those in
". It reduces system call overheads especially for small files, like those in"
readfile() makes sense even if each individual file is only read once, not frequently.
Below you describe that file up to 2GiB can be read. So readfile() seems to be a shortcut for larger files too.
+.B procfs +or +.BR sysfs .
Executing readfile() generates the same file notification events as said individual calls (cf. fanotify.7, inotify.7).
+.PP +If the size of file is smaller than the value provided in +.I count +then the whole file will be copied into +.IR buf . +.PP +If the file is larger than the value provided in +.I count +then only +.I count +number of bytes will be copied into +.IR buf . +.PP +The argument +.I flags +may contain one of the following +.IR "access modes" : +.BR O_NOFOLLOW ", or " O_NOATIME . +.PP +If the pathname given in +.I pathname +is relative, then it is interpreted relative to the directory +referred to by the file descriptor +.IR dirfd . +.PP +If +.I pathname +is relative and +.I dirfd +is the special value +.BR AT_FDCWD , +then +.I pathname +is interpreted relative to the current working +directory of the calling process (like +.BR openat ()). +.PP +If +.I pathname +is absolute, then +.I dirfd +is ignored.
readfile() blocks until either the whole file has been read, the buffer is completely filled, or the system specific limit (see below) has been reached.
+.SH RETURN VALUE +On success, the number of bytes read is returned. +It is not an error if this number is smaller than the number of bytes +requested; this can happen if the file is smaller than the number of +bytes requested.
"It is not an error ..." is very vague. Are there any other cases where a file is only partially read and the number of bytes returned is less then the minimum of buffer size and file size? How would I discover truncation?
Or can I rely on the call returning either an error or said minimum number of bytes? In the latter case:
"When reading from a block device this always equals the minimum of the buffer size specified by 'count', the file size, and the system specific limit for read.2 calls (see below)."
+.PP +On error, -1 is returned, and +.I errno +is set appropriately. +.SH ERRORS +.TP +.B EFAULT +.I buf +is outside your accessible address space. +.TP +.B EINTR +The call was interrupted by a signal before any data was read; see +.BR signal (7). +.TP +.B EINVAL +.I flags +was set to a value that is not allowed. +.TP +.B EIO +I/O error. +This will happen for example when the process is in a +background process group, tries to read from its controlling terminal, +and either it is ignoring or blocking
Can we copy the description from read.2 which gives more information or refer to it?
+.B SIGTTIN +or its process group +is orphaned. +It may also occur when there is a low-level I/O error +while reading from a disk or tape. +A further possible cause of +.B EIO +on networked filesystems is when an advisory lock had been taken +out on the file descriptor and this lock has been lost. +See the +.I "Lost locks" +section of +.BR fcntl (2) +for further details.
EPERM is missing in this section. Cf. fanotify.7.
Best regards
Heinrich
+.SH CONFORMING TO +None, this is a Linux-specific system call at this point in time. +.SH NOTES +The type +.I size_t +is an unsigned integer data type specified by POSIX.1. +.PP +On Linux, +.BR read () +(and similar system calls) will transfer at most +0x7ffff000 (2,147,479,552) bytes, +returning the number of bytes actually transferred. +." commit e28cc71572da38a5a12c1cfe4d7032017adccf69 +(This is true on both 32-bit and 64-bit systems.) +.SH BUGS +None yet! +.SH SEE ALSO +.BR close (2), +.BR open (2), +.BR openat (2), +.BR read (2), +.BR fread (3)
On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote:
Here is a tiny new syscall, readfile, that makes it simpler to read small/medium sized files all in one shot, no need to do open/read/close. This is especially helpful for tools that poke around in procfs or sysfs, making a little bit of a less system load than before, especially as syscall overheads go up over time due to various CPU bugs being addressed.
Nice series, but you are 3 months late with it... Next AFD, perhaps?
Seriously, the rationale is bollocks. If the overhead of 2 extra syscalls is anywhere near the costs of the real work being done by that thing, we have already lost and the best thing to do is to throw the system away and start with saner hardware.
On Sat, Jul 04, 2020 at 08:30:40PM +0100, Al Viro wrote:
On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote:
Here is a tiny new syscall, readfile, that makes it simpler to read small/medium sized files all in one shot, no need to do open/read/close. This is especially helpful for tools that poke around in procfs or sysfs, making a little bit of a less system load than before, especially as syscall overheads go up over time due to various CPU bugs being addressed.
Nice series, but you are 3 months late with it... Next AFD, perhaps?
Perhaps :)
Seriously, the rationale is bollocks. If the overhead of 2 extra syscalls is anywhere near the costs of the real work being done by that thing, we have already lost and the best thing to do is to throw the system away and start with saner hardware.
The real-work the kernel does is almost neglegant compared to the open/close overhead of the syscalls on some platforms today. I'll post benchmarks with the next version of this patch series to hopefully show that. If not, then yeah, this isn't worth it, but it was fun to write.
thanks,
greg k-h
On Sat, Jul 04, 2020 at 04:02:46PM +0200, Greg Kroah-Hartman wrote:
Here is a tiny new syscall, readfile, that makes it simpler to read small/medium sized files all in one shot, no need to do open/read/close. This is especially helpful for tools that poke around in procfs or sysfs, making a little bit of a less system load than before, especially as syscall overheads go up over time due to various CPU bugs being addressed.
There are 4 patches in this series, the first 3 are against the kernel tree, adding the syscall logic, wiring up the syscall, and adding some tests for it.
The last patch is agains the man-pages project, adding a tiny man page to try to describe the new syscall.
General question, using this series as an illustration only:
At the risk of starting a flamewar, why is this needed? Is there a realistic usecase that would get significant benefit from this?
A lot of syscalls seem to get added that combine or refactor the functionality of existing syscalls without justifying why this is needed (or even wise). This case feels like a solution, not a primitive, so I wonder if the long-term ABI fragmentation is worth the benefit.
I ask because I'd like to get an idea of the policy on what is and is not considered a frivolous ABI extension.
(I'm sure a usecase must be in mind, but it isn't mentioned here. Certainly the time it takes top to dump the contents of /proc leaves something to be desired.)
Cheers ---Dave
linux-kselftest-mirror@lists.linaro.org