The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
Fix this by explicitly casting the expressions to loff_t.
Cc: linux-kernel@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: stable@vger.kernel.org Reviewed-by: Anton Altaparmakov anton@tuxera.com Signed-off-by: Ari Sundholm ari@tuxera.com --- fs/read_write.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/read_write.c b/fs/read_write.c index 0074afa7ecb3..64166e74adc5 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1431,7 +1431,8 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in, return -ETXTBSY;
/* Ensure offsets don't wrap. */ - if (pos_in + count < pos_in || pos_out + count < pos_out) + if ((loff_t)(pos_in + count) < pos_in || + (loff_t)(pos_out + count) < pos_out) return -EOVERFLOW;
/* Shorten the copy to EOF */
On Mon, Feb 07, 2022 at 02:07:11PM +0200, Ari Sundholm wrote:
The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
I must be slow this morning, but... which values of pos_in and count are caught by your check, but not by the original?
- if (pos_in + count < pos_in || pos_out + count < pos_out)
- if ((loff_t)(pos_in + count) < pos_in ||
(loff_t)(pos_out + count) < pos_out)
Example, please. Why do you need that comparison to be signed?
On Mon, Feb 07, 2022 at 02:58:59PM +0000, Al Viro wrote:
On Mon, Feb 07, 2022 at 02:07:11PM +0200, Ari Sundholm wrote:
The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
I must be slow this morning, but... which values of pos_in and count are caught by your check, but not by the original?
- if (pos_in + count < pos_in || pos_out + count < pos_out)
- if ((loff_t)(pos_in + count) < pos_in ||
(loff_t)(pos_out + count) < pos_out)
Example, please. Why do you need that comparison to be signed?
Note that we explicitly truncate count so we won't get past the EOF of file_in right below that check and the check in generic_write_check_limits() truncates count so we won't get past ->s_maxbytes on the filesystem we are writing to.
If both source and destination allow arbitrary offsets, we should not fail on copy that crosses from 2^63-1 to 2^63. Your variant will do just that.
It's multiples of 2^64 that we should never attempt to cross, no matter what.
IOW, what values of pos_in, pos_out, count, input file size and output filesystem file size limit do you think should be rejected with -EOVERFLOW here?
Hello, Al,
On 2/7/22 16:58, Al Viro wrote:
On Mon, Feb 07, 2022 at 02:07:11PM +0200, Ari Sundholm wrote:
The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
I must be slow this morning, but... which values of pos_in and count are caught by your check, but not by the original?
Thank you for your response and questions.
Assuming an x86-64 target platform, please consider:
loff_t pos_out = 0x7FFFFFFFFFFEFFFFLL; and uint64_t count = 65537;
The type of the expression (pos_out + count) is a 64-bit unsigned type, by C's integer promotion rules. Its value is 0x8000000000000000ULL, that is, bit 63 is set.
The comparison (pos_out + count) < pos_out, again due to C's integer promotion rules, is unsigned. Thus, the comparison, in this case, is equivalent to:
0x8000000000000000ULL < 0x7FFFFFFFFFFEFFFFULL,
which is false. Please note that the LHS is not expressible as a positive integer of type loff_t. With larger values for count, the problem should become quite obvious, as some the offsets within the file would not be expressible as positive integers of type loff_t. But I digress. As we can see above, the overflow is missed.
With the LHS explicitly cast to loff_t, the comparison is equivalent to:
0x8000000000000000LL < 0x7FFFFFFFFFFEFFFFLL,
which is true, as the LHS is negative.
This has also been verified in practice, and was detected when running tests on special cases of the copy_file_range syscall on different filesystems.
- if (pos_in + count < pos_in || pos_out + count < pos_out)
- if ((loff_t)(pos_in + count) < pos_in ||
(loff_t)(pos_out + count) < pos_out)
Example, please. Why do you need that comparison to be signed?
Please see the above.
I also created a small test program one can try on Compiler Explorer: https://godbolt.org/z/e76rb3Ec9
Please let me know if there are any further concerns.
Best regards, Ari Sundholm ari@tuxera.com
On Mon, Feb 07, 2022 at 06:44:55PM +0200, Ari Sundholm wrote:
Hello, Al,
On 2/7/22 16:58, Al Viro wrote:
On Mon, Feb 07, 2022 at 02:07:11PM +0200, Ari Sundholm wrote:
The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
I must be slow this morning, but... which values of pos_in and count are caught by your check, but not by the original?
Thank you for your response and questions.
Assuming an x86-64 target platform, please consider:
loff_t pos_out = 0x7FFFFFFFFFFEFFFFLL; and uint64_t count = 65537;
The type of the expression (pos_out + count) is a 64-bit unsigned type, by C's integer promotion rules. Its value is 0x8000000000000000ULL, that is, bit 63 is set.
The comparison (pos_out + count) < pos_out, again due to C's integer promotion rules, is unsigned. Thus, the comparison, in this case, is equivalent to:
0x8000000000000000ULL < 0x7FFFFFFFFFFEFFFFULL,
which is false. Please note that the LHS is not expressible as a positive integer of type loff_t. With larger values for count, the problem should become quite obvious, as some the offsets within the file would not be expressible as positive integers of type loff_t. But I digress. As we can see above, the overflow is missed.
With the LHS explicitly cast to loff_t, the comparison is equivalent to:
0x8000000000000000LL < 0x7FFFFFFFFFFEFFFFLL,
which is true, as the LHS is negative.
This has also been verified in practice, and was detected when running tests on special cases of the copy_file_range syscall on different filesystems.
Er... I still don't see the problem here. If the destination filesystem explicitly allows offsets in excess of 2^63, what's the point in that -EOVERFLOW? And if it doesn't, you'll get count truncated by generic_write_check_limits(), down to the amount remaining until the fs limit...
Same on the input side - if your source file is at least 2^63, what's the problem? And if not, you'll get count capped by file size - pos_in, right under that check...
Which filesystems had been involved and what was the test?
On 2/7/22 7:07 PM, Al Viro wrote:
On Mon, Feb 07, 2022 at 06:44:55PM +0200, Ari Sundholm wrote:
Hello, Al,
On 2/7/22 16:58, Al Viro wrote:
On Mon, Feb 07, 2022 at 02:07:11PM +0200, Ari Sundholm wrote:
The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
I must be slow this morning, but... which values of pos_in and count are caught by your check, but not by the original?
Thank you for your response and questions.
Assuming an x86-64 target platform, please consider:
loff_t pos_out = 0x7FFFFFFFFFFEFFFFLL; and uint64_t count = 65537;
The type of the expression (pos_out + count) is a 64-bit unsigned type, by C's integer promotion rules. Its value is 0x8000000000000000ULL, that is, bit 63 is set.
The comparison (pos_out + count) < pos_out, again due to C's integer promotion rules, is unsigned. Thus, the comparison, in this case, is equivalent to:
0x8000000000000000ULL < 0x7FFFFFFFFFFEFFFFULL,
which is false. Please note that the LHS is not expressible as a positive integer of type loff_t. With larger values for count, the problem should become quite obvious, as some the offsets within the file would not be expressible as positive integers of type loff_t. But I digress. As we can see above, the overflow is missed.
With the LHS explicitly cast to loff_t, the comparison is equivalent to:
0x8000000000000000LL < 0x7FFFFFFFFFFEFFFFLL,
which is true, as the LHS is negative.
This has also been verified in practice, and was detected when running tests on special cases of the copy_file_range syscall on different filesystems.
Er... I still don't see the problem here. If the destination filesystem explicitly allows offsets in excess of 2^63, what's the point in that -EOVERFLOW? And if it doesn't, you'll get count truncated by generic_write_check_limits(), down to the amount remaining until the fs limit...
Same on the input side - if your source file is at least 2^63, what's the problem? And if not, you'll get count capped by file size - pos_in, right under that check...
Oops. You are of course correct - the patch is broken for files with unsigned offsets. This stems from the mistaken assumption that the comparison was supposed to be a signed one all along. Regardless, it seems an appropriate amount of flagellation is in order for disregarding the existence of files with unsigned offsets.
However, my concerns with the behavior of generic_copy_file_checks() are not limited to just this line, although just improving the check would be a considerable step forward. If I may, I would like to make some points to clarify what I find the problems with generic_copy_file_checks().
At the surface level, the comparison in question, be it signed or unsigned, should consider and match the signedness of each of the files. *Both* the original code and the patch seem wrong to me here. It is also remarkable that the far more common case of signed offsets is the one not considered here.
As for more fundamental issues, first, the behavior is inconsistent with pread64 and pwrite64, for instance. Using the same offset and length as those of the out file in the example given, both pread64 and pwrite64 immediately return -EINVAL, as rw_verify_area(), which performs these checks correctly, is called in vfs_read() and vfs_write() *before* tampering with the length of the read or write request.
The second fundamental issue is, indeed, the very tampering with the length of the copy before properly checking the ranges. Especially so in the case of signed offsets. The operation, as a whole, must fail, as the requested range within the output file exceeds what can be expressed using loff_t. I think it would be wise to follow the lead of p{read,write}64 here, and fail early, as the range(s) being invalid is independent of any filesystem-specific considerations (apart from the offsets being signed). Returning partial success is not very useful, as userspace cannot tell anything is wrong, and will call copy_file_range() again in an attempt to complete the copy, inevitably forcing an error to be returned.
(Basically, what I am trying to argue here that this is *categorically* a different case from errors arising from the state of a specific volume, such as a write failing due to ENOSPC, or exceeding some limit dictated by the environment or filesystem specification. The range(s) involved in the operation are simply inexpressible with the datatypes used.)
Which filesystems had been involved and what was the test?
To demonstrate the variation in behavior both between copy_file_range and p{read,write}64 and different filesystem implementations, I carried out a small experiment, where the equivalent of the following sequence was run from userspace (pseudocode, most error handling omitted):
const loff_t orig_pos_in = 0; const loff_t orig_pos_out = 0x7FFFFFFFFFFEFFFFLL; const size_t orig_length = 65537; loff_t pos_in = orig_pos_in; loff_t pos_out = orig_pos_out; size_t length = orig_length; ssize_t ret, written = 0; int fd_in = mkstemp(...); int fd_out = mkstemp(...); const char buf[128k] = { whatever };
write(fd_in, buf, 128k); /* Magically always succeeds. */
while (written < orig_length && (ret = copy_file_range(fd_in, &pos_in, fd_out, &pos_out, length, 0)) > 0) { written += ret; length -= ret; }
pos_out = orig_pos_out; length = orig_length; written = 0; while (written < orig_length && (ret = pwrite(fd_out, buf, length, pos_out) > 0) { written += ret; pos_out += ret; length -= ret; }
(The range parameters are shamelessly stolen from xfstests test case generic/564.)
Then, the behavior for various filesystems was observed:
btrfs, xfs: - copy_file_range is called twice. - First copy_file_range copies 65536 bytes. Partial success. - Second copy_file_range returns -EFBIG. - pwrite is called once, and returns -EINVAL, having written nothing.
exfat, ext{2,3,4}, f2fs, fuse2fs, ntfs3, reiserfs, vfat: - copy_file_range is called once, and returns -EFBIG. - pwrite is called once, and returns -EINVAL, having written nothing.
hfsplus: - copy_file_range is called once, and returns -EINVAL. - pwrite is called once, and returns -EINVAL, having written nothing.
Reading the source code of vfs_read() and a few manual test runs show that pread64 has similar behavior to pwrite64 when invoked with similar parameters.
Unlike in the case of pread64 and pwrite64, the copy range operation (after the length was reduced) is allowed to continue on to filesystem-specific code, where there is some variation in how the operation continues. Remarkably, with the sole exception of hfsplus, the end result is the same regardless of whether the filesystem implementation and specification support sparse files. Sure, it is perfectly POSIX'y to allow the filesystem implementation perform a partial write and only return an error on the second call, but the difference in behavior with pread64 and pwrite64 is very odd, and seems unnecessary.
Please note that, had the experiment been run on files with unsigned offsets, and the copy parameters been set in an analogous manner so that they cause unsigned overflow, generic_copy_file_checks() would reject the operation with -EOVERFLOW. Why is the same not done for signed offsets? Why is there a difference with p{read,write}64?
Best regards, Ari Sundholm ari@tuxera.com
From: Ari Sundholm
Sent: 07 February 2022 12:07
The function generic_copy_file_checks() checks that the ends of the input and output file ranges do not overflow. Unfortunately, there is an issue with the check itself.
Due to the integer promotion rules in C, the expressions (pos_in + count) and (pos_out + count) have an unsigned type because the count variable has the type uint64_t. Thus, in many cases where we should detect signed integer overflow to have occurred (and thus one or more of the ranges being invalid), the expressions will instead be interpreted as large unsigned integers. This means the check is broken.
Fix this by explicitly casting the expressions to loff_t.
...
fs/read_write.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/read_write.c b/fs/read_write.c index 0074afa7ecb3..64166e74adc5 100644 --- a/fs/read_write.c +++ b/fs/read_write.c @@ -1431,7 +1431,8 @@ static int generic_copy_file_checks(struct file *file_in, loff_t pos_in, return -ETXTBSY;
/* Ensure offsets don't wrap. */
- if (pos_in + count < pos_in || pos_out + count < pos_out)
- if ((loff_t)(pos_in + count) < pos_in ||
return -EOVERFLOW;(loff_t)(pos_out + count) < pos_out)
Hard to convince myself that is right. The old code is the standard check for unsigned addition overflow. The new one is just odd.
If pos_in is guaranteed to be +ve in a signed variable you can check: count < (1ull << 63) - pos_in since the RHS is then guaranteed not to wrap.
David
- Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
linux-stable-mirror@lists.linaro.org