On Fri, Aug 2, 2019 at 5:43 PM Theodore Y. Ts'o tytso@mit.edu wrote:
On Fri, Aug 02, 2019 at 12:39:41PM +0200, Arnd Bergmann wrote:
Is it correct to assume that this kind of file would have to be created using the ext3.ko file system implementation that was removed in linux-4.3, but not using ext2.ko or ext4.ko (which would always set the extended timestamps even in "-t ext2" or "-t ext3" mode)?
Correct. Some of the enterprise distros were using ext4 to support "mount -t ext3" even before 4.3. There's a CONFIG option to enable using ext4 for ext2 or ext3 if they aren't enabled.
If we check for s_min_extra_isize instead of s_inode_size to determine s_time_gran/s_time_max, we would warn at mount time and also consistently truncate all timestamps to full 32-bit seconds, regardless of whether there is actually space or not.
Alternatively, we could warn if s_min_extra_isize is too small, but use s_inode_size to determine s_time_gran/s_time_max anyway.
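To make the two alternatives concrete, here is a rough standalone sketch, not actual ext4 code. Every identifier in it (sketch_time_limits, the SKETCH_* names) is made up for illustration. It also assumes that 16 bytes of extra inode space are needed to cover the ctime/mtime/atime extra fields, and that extended timestamps give 34-bit seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All identifiers here are hypothetical; this is not kernel code. */
#define SKETCH_GOOD_OLD_INODE_SIZE 128u   /* classic ext2/ext3 inode size */
#define SKETCH_EXTRA_TS_BYTES       16u   /* assumed: extra bytes up to the end of the atime extra field */
#define SKETCH_NSEC_PER_SEC 1000000000u

enum sketch_policy {
	SKETCH_BY_INODE_SIZE,      /* decide from s_inode_size only, never warn */
	SKETCH_BY_MIN_EXTRA_ISIZE, /* decide from s_min_extra_isize, warn if too small */
	SKETCH_WARN_ONLY,          /* warn on s_min_extra_isize, decide from s_inode_size */
};

struct sketch_limits {
	uint32_t time_gran; /* timestamp granularity in nanoseconds */
	int64_t  time_max;  /* largest representable seconds value */
	bool     warn;      /* whether a mount-time warning would be printed */
};

static struct sketch_limits sketch_time_limits(uint32_t inode_size,
					       uint32_t min_extra_isize,
					       enum sketch_policy policy)
{
	struct sketch_limits l;
	bool size_ok, min_ok, extended;

	/* Inodes smaller than the classic size are not valid input here. */
	assert(inode_size >= SKETCH_GOOD_OLD_INODE_SIZE);

	/* Is the on-disk inode large enough to hold the extra fields at all? */
	size_ok = inode_size >= SKETCH_GOOD_OLD_INODE_SIZE + SKETCH_EXTRA_TS_BYTES;
	/* Does the superblock additionally ask for that much extra space? */
	min_ok = size_ok && min_extra_isize >= SKETCH_EXTRA_TS_BYTES;

	extended = (policy == SKETCH_BY_MIN_EXTRA_ISIZE) ? min_ok : size_ok;

	l.time_gran = extended ? 1u : SKETCH_NSEC_PER_SEC;
	/* Assumed 34-bit seconds range, offset so it starts at INT32_MIN. */
	l.time_max = extended ? (((int64_t)1 << 34) - 1 + INT32_MIN) : INT32_MAX;
	l.warn = (policy != SKETCH_BY_INODE_SIZE) && !min_ok;
	return l;
}
```

With 256-byte inodes and s_min_extra_isize of 0, the first alternative would warn and fall back to 32-bit seconds with one-second granularity, while the second would warn but keep nanosecond granularity and the extended range.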
Even with ext4, s_min_extra_isize doesn't guarantee that we will be able to expand the inode. This can fail if (a) we aren't able to expand the existing transaction handle because there isn't enough space in the journal, or (b) there is already an external xattr block which is also full, so there is no space to evacuate an extended attribute out of the inode's extra space.
I must have misunderstood what the field says. I expected that with s_min_extra_isize set beyond the nanosecond fields, there would be a guarantee that all inodes have at least as many extra bytes already allocated. What circumstances would lead to an i_extra_isize smaller than s_min_extra_isize?
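For reference, the per-inode question is whether a given inode's i_extra_isize reaches past each extra timestamp field. The following is a simplified sketch along the lines of the kernel's EXT4_FITS_IN_INODE check, not the real macro. The struct is a hypothetical model of the bytes that follow the first 128 bytes of an on-disk inode, with the field order assumed from ext4's layout, and the sketch_/SKETCH_ names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the area that follows the first 128 bytes of an
 * on-disk inode. Field order is an assumption based on ext4's layout.
 */
struct sketch_inode_extra {
	uint16_t i_extra_isize;  /* how many extra bytes this inode really uses */
	uint16_t i_checksum_hi;
	uint32_t i_ctime_extra;
	uint32_t i_mtime_extra;
	uint32_t i_atime_extra;
};

/* Guard against unexpected padding in the sketch layout. */
static_assert(offsetof(struct sketch_inode_extra, i_atime_extra) == 12,
	      "unexpected padding in sketch layout");

/* True if 'field' lies entirely within the first 'extra_isize' extra bytes. */
#define SKETCH_FITS(extra_isize, field)                              \
	(offsetof(struct sketch_inode_extra, field) +                \
	 sizeof(((struct sketch_inode_extra *)0)->field) <=          \
	 (size_t)(extra_isize))

/* True only if all three extra timestamp fields fit in this inode. */
static bool sketch_all_timestamps_fit(uint16_t extra_isize)
{
	return SKETCH_FITS(extra_isize, i_ctime_extra) &&
	       SKETCH_FITS(extra_isize, i_mtime_extra) &&
	       SKETCH_FITS(extra_isize, i_atime_extra);
}
```

Under this model an inode with i_extra_isize of 8 has room for the ctime extra field but not for mtime or atime, which is the kind of partially covered inode a mount-time check on the superblock alone cannot see.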
We could be more aggressive by trying to make room in the inode in ext4_iget (when we're reading in the inode, assuming the file system isn't mounted read/only), instead of in the middle of mark_inode_dirty(). That will eliminate failure mode (a) --- which is statistically rare --- but it won't eliminate failure mode (b).
Ultimately, the question is which is worse: having a timestamp be wrong, or randomly dropping an xattr from the inode to make room for the extended timestamp. We've come down on it being less harmful to have the timestamp be wrong.
But again, this is a pretty rare case. I'm not convinced it's worth stressing about, since it's going to require multiple things to go wrong before a timestamp will be bad.
Agreed, I'm not overly worried about this happening frequently, I'd just feel better if we could reliably warn about the few instances that might theoretically be affected.
Arnd