Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the disk size, and trigger a BUG in ext4_ind_remove_space() [3]. According to the comment doc in ext4_ind_remove_space() the 'end' block parameter needs to be one block after the last block to remove. In the case when the BUG is triggered it points to the last block on a 4GB virtual disk image. This is calculated in ext4_ind_remove_space() in [4]. This patch adds a check that ensure the length + offest to be within the valid range and returns -ENOSPC error code in case it is invalid.
LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d72133... LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 LINK: [3] https://elixir.bootlin.com/linux/v5.17-rc8/source/fs/ext4/indirect.c#L1244 LINK: [4] https://elixir.bootlin.com/linux/v5.17-rc8/source/fs/ext4/indirect.c#L1234
Cc: Theodore Ts'o tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Ritesh Harjani riteshh@linux.ibm.com Cc: linux-ext4@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org
Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality") Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk tadeusz.struk@linaro.org -- v2: Change sbi->s_blocksize to inode->i_sb->s_blocksize in maxlength computation. --- fs/ext4/inode.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 01c9e4f743ba..355384007d11 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3924,7 +3924,8 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) struct super_block *sb = inode->i_sb; ext4_lblk_t first_block, stop_block; struct address_space *mapping = inode->i_mapping; - loff_t first_block_offset, last_block_offset; + loff_t first_block_offset, last_block_offset, max_length; + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); handle_t *handle; unsigned int credits; int ret = 0, ret2 = 0; @@ -3967,6 +3968,16 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) offset; }
+ /* + * For punch hole the length + offset needs to be at least within + * one block before last + */ + max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize; + if (offset + length >= max_length) { + ret = -ENOSPC; + goto out_mutex; + } + if (offset & (sb->s_blocksize - 1) || (offset + length) & (sb->s_blocksize - 1)) { /*
On Tue, Mar 15, 2022 at 02:54:39PM -0700, Tadeusz Struk wrote:
Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the disk size, and trigger a BUG in ext4_ind_remove_space() [3]. According to the comment doc in ext4_ind_remove_space() the 'end' block parameter needs to be one block after the last block to remove. In the case when the BUG is triggered it points to the last block on a 4GB virtual disk image. This is calculated in ext4_ind_remove_space() in [4]. This patch adds a check that ensure the length + offest to be within the valid range and returns -ENOSPC error code in case it is invalid.
Why is the check in vfs_fallocate() not working for this?
https://elixir.bootlin.com/linux/v5.17-rc8/source/fs/open.c#L300
Also why do other file systems not fail? Is it because ext4 is special due to the end block needing to be one block after the last. That seems to imply the last block can't be used or there is some off by one issue somewhere?
Ira
LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d72133... LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000 LINK: [3] https://elixir.bootlin.com/linux/v5.17-rc8/source/fs/ext4/indirect.c#L1244 LINK: [4] https://elixir.bootlin.com/linux/v5.17-rc8/source/fs/ext4/indirect.c#L1234
Cc: Theodore Ts'o tytso@mit.edu Cc: Andreas Dilger adilger.kernel@dilger.ca Cc: Ritesh Harjani riteshh@linux.ibm.com Cc: linux-ext4@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org
Fixes: a4bb6b64e39a ("ext4: enable "punch hole" functionality") Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com Signed-off-by: Tadeusz Struk tadeusz.struk@linaro.org -- v2: Change sbi->s_blocksize to inode->i_sb->s_blocksize in maxlength computation.
fs/ext4/inode.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 01c9e4f743ba..355384007d11 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3924,7 +3924,8 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) struct super_block *sb = inode->i_sb; ext4_lblk_t first_block, stop_block; struct address_space *mapping = inode->i_mapping;
- loff_t first_block_offset, last_block_offset;
- loff_t first_block_offset, last_block_offset, max_length;
- struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); handle_t *handle; unsigned int credits; int ret = 0, ret2 = 0;
@@ -3967,6 +3968,16 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) offset; }
- /*
* For punch hole the length + offset needs to be at least within
* one block before last
*/
- max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize;
- if (offset + length >= max_length) {
ret = -ENOSPC;
goto out_mutex;
- }
- if (offset & (sb->s_blocksize - 1) || (offset + length) & (sb->s_blocksize - 1)) { /*
-- 2.35.1
Hi Ira, On 3/22/22 09:37, Ira Weiny wrote:
On Tue, Mar 15, 2022 at 02:54:39PM -0700, Tadeusz Struk wrote:
Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the disk size, and trigger a BUG in ext4_ind_remove_space() [3]. According to the comment doc in ext4_ind_remove_space() the 'end' block parameter needs to be one block after the last block to remove. In the case when the BUG is triggered it points to the last block on a 4GB virtual disk image. This is calculated in ext4_ind_remove_space() in [4]. This patch adds a check that ensure the length + offest to be within the valid range and returns -ENOSPC error code in case it is invalid.
Why is the check in vfs_fallocate() not working for this?
https://elixir.bootlin.com/linux/v5.17-rc8/source/fs/open.c#L300
Good question. From reading the comment: https://elixir.bootlin.com/linux/v5.17/source/fs/ext4/file.c#L225
it is possible that, for the bitmap-format, the limit might be smaller than the s_maxbytes.
But even for a extent-mapped file the offest+len needs to be within the first to last-1 block range for fallocate(fd, FALLOC_FL_PUNCH_HOLE, ...) If it points to the last one then it is invalid, no?
The check you pointed to in vfs code checks if offest+len goes beyond maximal file size.
Also why do other file systems not fail? Is it because ext4 is special due to the end block needing to be one block after the last. That seems to imply the last block can't be used or there is some off by one issue somewhere?
According to the comment in https://elixir.bootlin.com/linux/v5.17/source/fs/ext4/indirect.c#L1214 it has to be one block after the last to be removed.
On 3/15/22 14:54, Tadeusz Struk wrote:
Syzbot found an issue [1] in ext4_fallocate(). The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul, and offset 0x1000000ul, which, when added together exceed the disk size, and trigger a BUG in ext4_ind_remove_space() [3]. According to the comment doc in ext4_ind_remove_space() the 'end' block parameter needs to be one block after the last block to remove. In the case when the BUG is triggered it points to the last block on a 4GB virtual disk image. This is calculated in ext4_ind_remove_space() in [4]. This patch adds a check that ensure the length + offest to be within the valid range and returns -ENOSPC error code in case it is invalid.
Hi, Any feedback on this?
On Tue, Mar 15, 2022 at 02:54:39PM -0700, Tadeusz Struk wrote:
@@ -3967,6 +3968,16 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) offset; }
- /*
* For punch hole the length + offset needs to be at least within
* one block before last
*/
- max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize;
- if (offset + length >= max_length) {
ret = -ENOSPC;
goto out_mutex;
- }
I wonder if we would be better off just simply capping length to max_length? If length is set to some large value, such as LONG_MAX, it's pretty clear what the intention should be, which is to simply do the equivalent of truncating the file at offset. Perhaps we should just do that?
That being said, we should be consistent with what other file systems do when they are asked to punch a hole starting at offset and extending out to LONG_MAX.
Also, if we are going to return an error, I don't think ENOSPC is the correct error to be returning.
- Ted
On 3/31/22 07:43, Theodore Ts'o wrote:
On Tue, Mar 15, 2022 at 02:54:39PM -0700, Tadeusz Struk wrote:
@@ -3967,6 +3968,16 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length) offset; }
- /*
* For punch hole the length + offset needs to be at least within
* one block before last
*/
- max_length = sbi->s_bitmap_maxbytes - inode->i_sb->s_blocksize;
- if (offset + length >= max_length) {
ret = -ENOSPC;
goto out_mutex;
- }
I wonder if we would be better off just simply capping length to max_length? If length is set to some large value, such as LONG_MAX, it's pretty clear what the intention should be, which is to simply do the equivalent of truncating the file at offset. Perhaps we should just do that?
Don't think that would be the correct behavior. ftrucnate (or truncate) modify the file size, but fallocate with FALLOC_FL_PUNCH_HOLE should not.
man 2 fallocate says: "... The FALLOC_FL_PUNCH_HOLE flag must be ORed with FALLOC_FL_KEEP_SIZE in mode; in other words, even when punching off the end of the file, the file size (as reported by stat(2)) does not change. " that is enforced by vfs: https://elixir.bootlin.com/linux/v5.17.1/source/fs/open.c#L245
That being said, we should be consistent with what other file systems do when they are asked to punch a hole starting at offset and extending out to LONG_MAX.
For all the supported file systems, apart from ext4, only btrfs, gfs2, and xfs support fallocate and FALLOC_FL_PUNCH_HOLE mode. Looking at what they do is they round the length of the space to be freed i.e. offset + length to valid value and then perform the operation using the valid values.
https://elixir.bootlin.com/linux/v5.17.1/source/fs/gfs2/bmap.c#L2424 https://elixir.bootlin.com/linux/v5.17.1/source/fs/btrfs/file.c#L2506
For ext4 this would mean that one could only deallocate space up to the one before last block. I will change this to do the same in the next version if that's ok with you.
Also, if we are going to return an error, I don't think ENOSPC is the correct error to be returning.
I took it from man 2 fallocate, my first suspicion was that it crashed because the disk on my VM wasn't big enough. It was a bad choice.
linux-stable-mirror@lists.linaro.org