It was brought to my attention that this bug from 2018 was still unresolved: 32 bit emulators like QEMU were given 64 bit hashes when running 32 bit emulation on 64 bit systems.
The personality(2) system call supports to let processes indicate that they are 32 bit Linux to the kernel. This was suggested by Teo in the original thread, so I just wired it up and it solves the problem.
Programs that need the 32 bit hash only need to issue the personality(PER_LINUX32) call and things start working.
I made a test program like this:
#include <dirent.h> #include <errno.h> #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/personality.h>
int main(int argc, char** argv) { DIR* dir; personality(PER_LINUX32); dir = opendir("/boot"); printf("dir=%p\n", dir); printf("readdir(dir)=%p\n", readdir(dir)); printf("errno=%d: %s\n", errno, strerror(errno)); return 0; }
This was compiled with an ARM32 toolchain from Bootlin using glibc 2.28 and thus suffering from the bug.
Before the patch:
$ ./readdir-bug dir=0x86000 readdir(dir)=(nil) errno=75: Value too large for defined data type
After the patch:
$ ./readdir-bug dir=0x86000 readdir(dir)=0x86020 errno=0: Success
Problem solved.
Cc: Florian Weimer fw@deneb.enyo.de Cc: Peter Maydell peter.maydell@linaro.org Cc: Andy Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Suggested-by: Theodore Ts'o tytso@mit.edu Link: https://bugs.launchpad.net/qemu/+bug/1805913 Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957 Signed-off-by: Linus Walleij linus.walleij@linaro.org --- fs/ext4/dir.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 9aa1f75409b0..3faf9edf3e92 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -27,6 +27,7 @@ #include <linux/slab.h> #include <linux/iversion.h> #include <linux/unicode.h> +#include <linux/personality.h> #include "ext4.h" #include "xattr.h"
@@ -618,6 +619,14 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
static int ext4_dir_open(struct inode * inode, struct file * filp) { + /* + * If we are currently running e.g. a 32 bit emulator on + * a 64 bit machine, the emulator will indicate that it needs + * a 32 bit personality and thus 32 bit hashes from the file + * system. + */ + if (personality(current->personality) == PER_LINUX32) + filp->f_mode |= FMODE_32BITHASH; if (IS_ENCRYPTED(inode)) return fscrypt_get_encryption_info(inode) ? -EACCES : 0; return 0;
* Linus Walleij:
It was brought to my attention that this bug from 2018 was still unresolved: 32 bit emulators like QEMU were given 64 bit hashes when running 32 bit emulation on 64 bit systems.
The personality(2) system call supports to let processes indicate that they are 32 bit Linux to the kernel. This was suggested by Teo in the original thread, so I just wired it up and it solves the problem.
Programs that need the 32 bit hash only need to issue the personality(PER_LINUX32) call and things start working.
I made a test program like this:
#include <dirent.h> #include <errno.h> #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/personality.h>
int main(int argc, char** argv) { DIR* dir; personality(PER_LINUX32); dir = opendir("/boot"); printf("dir=%p\n", dir); printf("readdir(dir)=%p\n", readdir(dir)); printf("errno=%d: %s\n", errno, strerror(errno)); return 0; }
This was compiled with an ARM32 toolchain from Bootlin using glibc 2.28 and thus suffering from the bug.
Just be sure: Is it possible to move the PER_LINUX32 setting into QEMU? (I see why not.)
However, this does not solve the issue with network file systems and other scenarios. I still think need to add a workaround to the glibc implementation.
On Tue, Mar 17, 2020 at 12:53 PM Florian Weimer fw@deneb.enyo.de wrote:
Just be sure: Is it possible to move the PER_LINUX32 setting into QEMU? (I see why not.)
I set it in the program explicitly, but what actually happens when I run it is that the binfmt handler invokes qemu-user so certainly that program can set the flag, any process can.
Yours, Linus Walleij
On Tue, 17 Mar 2020 at 11:31, Linus Walleij linus.walleij@linaro.org wrote:
It was brought to my attention that this bug from 2018 was still unresolved: 32 bit emulators like QEMU were given 64 bit hashes when running 32 bit emulation on 64 bit systems.
The personality(2) system call supports to let processes indicate that they are 32 bit Linux to the kernel. This was suggested by Teo in the original thread, so I just wired it up and it solves the problem.
Thanks for having a look at this. I'm not sure this is what QEMU needs, though. When QEMU runs, it is not a 32-bit process, it's a 64-bit process. Some of the syscalls it makes are on behalf of the guest and would need 32-bit semantics (including this one of wanting 32-bit hash sizes in directory reads). But some syscalls it makes for itself (either directly, or via libraries it's linked against including glibc and glib) -- those would still want the usual 64-bit semantics, I would have thought.
Programs that need the 32 bit hash only need to issue the personality(PER_LINUX32) call and things start working.
What in particular does this personality setting affect? My copy of the personality(2) manpage just says:
PER_LINUX32 (since Linux 2.2) [To be documented.]
which isn't very informative.
thanks -- PMM
On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell peter.maydell@linaro.org wrote:
On Tue, 17 Mar 2020 at 11:31, Linus Walleij linus.walleij@linaro.org wrote:
It was brought to my attention that this bug from 2018 was still unresolved: 32 bit emulators like QEMU were given 64 bit hashes when running 32 bit emulation on 64 bit systems.
The personality(2) system call supports to let processes indicate that they are 32 bit Linux to the kernel. This was suggested by Teo in the original thread, so I just wired it up and it solves the problem.
Thanks for having a look at this. I'm not sure this is what QEMU needs, though. When QEMU runs, it is not a 32-bit process, it's a 64-bit process. Some of the syscalls it makes are on behalf of the guest and would need 32-bit semantics (including this one of wanting 32-bit hash sizes in directory reads). But some syscalls it makes for itself (either directly, or via libraries it's linked against including glibc and glib) -- those would still want the usual 64-bit semantics, I would have thought.
The "personality" thing is a largely unused facility that a process can use to say it has this generic behaviour. In this case we say we have the PER_LINUX32 personality so it will make the process evoke 32bit behaviours inside the kernel when dealing with this process.
Which I (loosely) appreciate that we want to achieve.
Programs that need the 32 bit hash only need to issue the personality(PER_LINUX32) call and things start working.
What in particular does this personality setting affect? My copy of the personality(2) manpage just says:
PER_LINUX32 (since Linux 2.2) [To be documented.]
which isn't very informative.
It's not a POSIX thing (not part of the Single Unix Specification) so as with most Linux things it has some fuzzy semantics defined by the community...
I usually just go to the source.
If you grep the kernel what turns up is a bunch of architecture-specific behaviors on arm64, ia64, mips, parisc, powerpc, s390, sparc. They are very minor.
On x86_64 the semantic effect is none so this would work for any kind of 32bit userspace on x86_64. It would be the first time this flag has any effect at all on x86_64, but arguably a useful one.
I would not be surprised if running say i586 on x86_64 has the same problem, just that noone has reported it yet. But what do I know. If they have the same problem they can use the same solution. Hm QEMU supports emulating i586 or even i386 ... maybe you could test? Or tell me how to test? We might be solving a bigger issue here.
Yours, Linus Walleij
On Thu, 19 Mar 2020 at 15:13, Linus Walleij linus.walleij@linaro.org wrote:
On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell peter.maydell@linaro.org wrote:
What in particular does this personality setting affect? My copy of the personality(2) manpage just says:
PER_LINUX32 (since Linux 2.2) [To be documented.]
which isn't very informative.
It's not a POSIX thing (not part of the Single Unix Specification) so as with most Linux things it has some fuzzy semantics defined by the community...
I usually just go to the source.
If we're going to decide that this is the way to say "give me 32-bit semantics" we need to actually document that and define in at least broad terms what we mean by it, so that when new things are added that might or might not check against the setting there is a reference defining whether they should or not, and so that userspace knows what it's opting into by setting the flag. The kernel loves undocumented APIs but userspace consumers of them are not so enamoured :-)
As a concrete example, should "give me 32-bit semantics via PER_LINUX32" mean "mmap should always return addresses within 4GB" ? That would seem like it would make sense -- but on the other hand it would make it absolutely unusable for QEMU's purposes, because we want to be able to do full 64-bit mmap() for our own internal allocations. (This is a specific example of the general case that I'm dubious about having this be a process-wide switch, because QEMU is a single process which sometimes makes syscalls on its own behalf and sometimes makes syscalls on behalf of the guest program it is emulating. We want 32-bit semantics for the latter but if we also get them for the former then there might be unintended side effects.)
I would not be surprised if running say i586 on x86_64 has the same problem, just that noone has reported it yet. But what do I know. If they have the same problem they can use the same solution. Hm QEMU supports emulating i586 or even i386 ... maybe you could test?
Native i586 code on x86-64 should be fine, because it will be using the compat syscalls, which ext4 already ensures get the 32-bit sized hash (see hash2pos() and friends).
thanks -- PMM
On Thu, Mar 19, 2020 at 4:25 PM Peter Maydell peter.maydell@linaro.org wrote:
On Thu, 19 Mar 2020 at 15:13, Linus Walleij linus.walleij@linaro.org wrote:
On Tue, Mar 17, 2020 at 12:58 PM Peter Maydell peter.maydell@linaro.org wrote:
What in particular does this personality setting affect? My copy of the personality(2) manpage just says:
PER_LINUX32 (since Linux 2.2) [To be documented.]
which isn't very informative.
It's not a POSIX thing (not part of the Single Unix Specification) so as with most Linux things it has some fuzzy semantics defined by the community...
I usually just go to the source.
If we're going to decide that this is the way to say "give me 32-bit semantics" we need to actually document that and define in at least broad terms what we mean by it, so that when new things are added that might or might not check against the setting there is a reference defining whether they should or not, and so that userspace knows what it's opting into by setting the flag. The kernel loves undocumented APIs but userspace consumers of them are not so enamoured :-)
OK I guess we can at least take this opportunity to add some kerneldoc to the include file.
As a concrete example, should "give me 32-bit semantics via PER_LINUX32" mean "mmap should always return addresses within 4GB" ? That would seem like it would make sense --
Incidentally that thing in particular has its own personality flag (personalities are additive, it's a bit schizophrenic) so PER_LINUX_32BIT is defined as: PER_LINUX_32BIT = 0x0000 | ADDR_LIMIT_32BIT, and that is specifically for limiting the address space to 32bit.
There is also PER_LINUX32_3GB for a 3GB lowmem limit.
Since the personality is kind of additive, if we want a flag *specifically* for indicating that we want 32bit hashes from the file system, there are bits left so we can provide that.
Is this what we want to do? I just think we shouldn't decide on that lightly as we will be using up personality bug bits, but sometimes you have to use them.
PER_LINUX32 as it stands means 32bit personality but very specifically does not include memory range limitations since that has its own flags.
Yours, Linus Walleij
On Thu, Mar 19, 2020 at 11:23:33PM +0100, Linus Walleij wrote:
OK I guess we can at least take this opportunity to add some kerneldoc to the include file.
As a concrete example, should "give me 32-bit semantics via PER_LINUX32" mean "mmap should always return addresses within 4GB" ? That would seem like it would make sense --
Incidentally that thing in particular has its own personality flag (personalities are additive, it's a bit schizophrenic) so PER_LINUX_32BIT is defined as: PER_LINUX_32BIT = 0x0000 | ADDR_LIMIT_32BIT, and that is specifically for limiting the address space to 32bit.
There is also PER_LINUX32_3GB for a 3GB lowmem limit.
Since the personality is kind of additive, if we want a flag *specifically* for indicating that we want 32bit hashes from the file system, there are bits left so we can provide that.
Is this what we want to do? I just think we shouldn't decide on that lightly as we will be using up personality bug bits, but sometimes you have to use them.
I've been looking at the personality bug bits more detailed, and it looks... messy. Do we pick a new personality, or do we grab another unique flag?
Another possibility, which would be messier for qemu, would be use a flag set via fcntl. That would require qemu from noticing when the guest is calling open, openat, or openat2, and then inserting a fcntl system call to set the 32-bit readdir mode. That's cleaner from the kernel interface complexity perspective, but it's messier for qemu.
- Ted
On Tue, 24 Mar 2020 at 02:34, Theodore Y. Ts'o tytso@mit.edu wrote:
Another possibility, which would be messier for qemu, would be use a flag set via fcntl. That would require qemu from noticing when the guest is calling open, openat, or openat2, and then inserting a fcntl system call to set the 32-bit readdir mode. That's cleaner from the kernel interface complexity perspective, but it's messier for qemu.
On the contrary, that would be a much better interface for QEMU. We always know when we're doing an open-syscall on behalf of the guest, and it would be trivial to make the fcntl() call then. That would ensure that we don't accidentally get the '32-bit semantics' on file descriptors QEMU opens for its own purposes, and wouldn't leave us open to the risk in future that setting the PER_LINUX32 flag for all of QEMU causes unexpected extra behaviour in future kernels that would be correct for the guest binary but wrong/broken for QEMU's own internals.
thanks -- PMM
On Tue, Mar 24, 2020 at 09:29:58AM +0000, Peter Maydell wrote:
On the contrary, that would be a much better interface for QEMU. We always know when we're doing an open-syscall on behalf of the guest, and it would be trivial to make the fcntl() call then. That would ensure that we don't accidentally get the '32-bit semantics' on file descriptors QEMU opens for its own purposes, and wouldn't leave us open to the risk in future that setting the PER_LINUX32 flag for all of QEMU causes unexpected extra behaviour in future kernels that would be correct for the guest binary but wrong/broken for QEMU's own internals.
If using a flag set by fcntl is better for qemu, then by all means let's go with that instead of using a personality flag/number.
Linus, do you have what you need to do a respin of the patch?
- Ted
On Tue, Mar 24, 2020 at 7:48 PM Theodore Y. Ts'o tytso@mit.edu wrote:
On Tue, Mar 24, 2020 at 09:29:58AM +0000, Peter Maydell wrote:
On the contrary, that would be a much better interface for QEMU. We always know when we're doing an open-syscall on behalf of the guest, and it would be trivial to make the fcntl() call then. That would ensure that we don't accidentally get the '32-bit semantics' on file descriptors QEMU opens for its own purposes, and wouldn't leave us open to the risk in future that setting the PER_LINUX32 flag for all of QEMU causes unexpected extra behaviour in future kernels that would be correct for the guest binary but wrong/broken for QEMU's own internals.
If using a flag set by fcntl is better for qemu, then by all means let's go with that instead of using a personality flag/number.
Linus, do you have what you need to do a respin of the patch?
Absolutely, I'm a bit occupied this week but I will try to get to it early next week!
Thanks a lot for the directions here, it's highly valuable.
Yours, Linus Walleij
On Mar 17, 2020, at 5:31 AM, Linus Walleij linus.walleij@linaro.org wrote:
It was brought to my attention that this bug from 2018 was still unresolved: 32 bit emulators like QEMU were given 64 bit hashes when running 32 bit emulation on 64 bit systems.
The personality(2) system call supports to let processes indicate that they are 32 bit Linux to the kernel. This was suggested by Teo in the original thread, so I just wired it up and it solves the problem.
Programs that need the 32 bit hash only need to issue the personality(PER_LINUX32) call and things start working.
I'm generally with with this from the ext4 point of view.
That said, I'd think it would be preferable for ease of use and compatibility that applications didn't have to be modified (e.g. have QEMU or glibc internally set PER_LINUX32 for this process before the 32-bit syscall is called, given that it knows whether it is emulating a 32-bit runtime or not).
The other way to handle this would be for ARM32 to check the PER_LINUX32 flag via is_compat_task() so that there wouldn't need to be any changes to the ext4 code at all?
Cheers, Andreas
I made a test program like this:
#include <dirent.h> #include <errno.h> #include <stdio.h> #include <string.h> #include <sys/types.h> #include <sys/personality.h>
int main(int argc, char** argv) { DIR* dir; personality(PER_LINUX32); dir = opendir("/boot"); printf("dir=%p\n", dir); printf("readdir(dir)=%p\n", readdir(dir)); printf("errno=%d: %s\n", errno, strerror(errno)); return 0; }
This was compiled with an ARM32 toolchain from Bootlin using glibc 2.28 and thus suffering from the bug.
Before the patch:
$ ./readdir-bug dir=0x86000 readdir(dir)=(nil) errno=75: Value too large for defined data type
After the patch:
$ ./readdir-bug dir=0x86000 readdir(dir)=0x86020 errno=0: Success
Problem solved.
Cc: Florian Weimer fw@deneb.enyo.de Cc: Peter Maydell peter.maydell@linaro.org Cc: Andy Lutomirski luto@kernel.org Cc: stable@vger.kernel.org Suggested-by: Theodore Ts'o tytso@mit.edu Link: https://bugs.launchpad.net/qemu/+bug/1805913 Link: https://lore.kernel.org/lkml/87bm56vqg4.fsf@mid.deneb.enyo.de/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205957 Signed-off-by: Linus Walleij linus.walleij@linaro.org
fs/ext4/dir.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 9aa1f75409b0..3faf9edf3e92 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -27,6 +27,7 @@ #include <linux/slab.h> #include <linux/iversion.h> #include <linux/unicode.h> +#include <linux/personality.h> #include "ext4.h" #include "xattr.h"
@@ -618,6 +619,14 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
static int ext4_dir_open(struct inode * inode, struct file * filp) {
- /*
* If we are currently running e.g. a 32 bit emulator on
* a 64 bit machine, the emulator will indicate that it needs
* a 32 bit personality and thus 32 bit hashes from the file
* system.
*/
- if (personality(current->personality) == PER_LINUX32)
if (IS_ENCRYPTED(inode)) return fscrypt_get_encryption_info(inode) ? -EACCES : 0; return 0;filp->f_mode |= FMODE_32BITHASH;
-- 2.24.1
Cheers, Andreas
On Tue, Mar 17, 2020 at 11:29 PM Andreas Dilger adilger@dilger.ca wrote:
That said, I'd think it would be preferable for ease of use and compatibility that applications didn't have to be modified (e.g. have QEMU or glibc internally set PER_LINUX32 for this process before the 32-bit syscall is called, given that it knows whether it is emulating a 32-bit runtime or not).
I think the plan is that QEMU set this, either directly when you run qemu-user or through the binfmt handler. https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html
IIUC the binfmt handler is invoked when you try to execute ELF so-and-so-for-arch-so-and-so when you are not that arch yourself. So that can set up this personality.
Not that I know how the binfmt handler works, I am just assuming that thing is some program that can issue syscalls. It JustWorks for me after installing the QEMU packages...
The problem still stands that userspace need to somehow inform kernelspace that 32bit is going on, and this was the best I could think of.
Yours, Linus Walleij
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag. The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.5.9, v5.4.25, v4.19.109, v4.14.173, v4.9.216, v4.4.216.
v5.5.9: Build OK! v5.4.25: Build OK! v4.19.109: Failed to apply! Possible dependencies: 592ddec7578a ("ext4: use IS_ENCRYPTED() to check encryption status") b886ee3e778e ("ext4: Support case-insensitive file name lookups")
v4.14.173: Failed to apply! Possible dependencies: 2ee6a576be56 ("fs, fscrypt: add an S_ENCRYPTED inode flag") 69fe2d75dd91 ("btrfs: make the delalloc block rsv per inode") 79f015f21653 ("btrfs: cleanup extent locking sequence") 8b62f87bad9c ("Btrfs: rework outstanding_extents") ae5e165d855d ("fs: new API for handling inode->i_version") b886ee3e778e ("ext4: Support case-insensitive file name lookups") ee73f9a52a34 ("ext4: convert to new i_version API")
v4.9.216: Failed to apply! Possible dependencies: 39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution") 5e9d0e3d9ea6 ("powerpc/lib: Fix randconfig build failure in sstep.c") 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl") 7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally") 9cf09d68b89a ("arm64: xen: Enable user access before a privcmd hvc call") b886ee3e778e ("ext4: Support case-insensitive file name lookups") bd38967d406f ("arm64: Factor out PAN enabling/disabling into separate uaccess_* macros") ee73f9a52a34 ("ext4: convert to new i_version API")
v4.4.216: Failed to apply! Possible dependencies: 39bc88e5e38e ("arm64: Disable TTBR0_EL1 during normal kernel execution") 4dffbfc48d65 ("arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP") 57f4959bad0a ("arm64: kernel: Add support for User Access Override") 6c94f27ac847 ("arm64: switch to relative exception tables") 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl") 7c0f6ba682b9 ("Replace <asm/uaccess.h> with <linux/uaccess.h> globally") 7dd01aef0557 ("arm64: trap userspace "dc cvau" cache operation on errata-affected core") 9b7365fc1c82 ("ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support") 9e8e865bbe29 ("arm64: unify idmap removal") b886ee3e778e ("ext4: Support case-insensitive file name lookups") d5370f754875 ("arm64: prefetch: add alternative pattern for CPUs without a prefetcher") e5bc22a42e4d ("arm64/efi: split off EFI init and runtime code for reuse by 32-bit ARM") e7227d0e528f ("arm64: Cleanup SCTLR flags") ee73f9a52a34 ("ext4: convert to new i_version API") f7d924894265 ("arm64/efi: refactor EFI init and runtime code for reuse by 32-bit ARM")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
linux-stable-mirror@lists.linaro.org