On Sat, Aug 3, 2019 at 6:03 PM Theodore Y. Ts'o tytso@mit.edu wrote:
On Sat, Aug 03, 2019 at 11:30:22AM +0200, Arnd Bergmann wrote:
I see in the ext4 code that we always try to expand i_extra_size to s_want_extra_isize in ext4_mark_inode_dirty(), and that s_want_extra_isize is always at least s_min_extra_isize, so we constantly try to expand the inode to fit.
Yes, we *try*. But we may not succeed. There may actually be a problem here if the cause is due to there simply is no space in the external xattr block, so we might try and try every time we try to modify that inode, and it would be a performance mess. If it's due to there being no room in the current transaction, then it's highly likely it will succeed the next time.
Did older versions of ext4 or ext3 ignore s_min_extra_isize when creating inodes despite EXT4_FEATURE_RO_COMPAT_EXTRA_ISIZE, or is there another possibility I'm missing?
s_min_extra_isize could get changed in order to make room for some new file system feature --- such as extended timestamps.
Ok, that explains it. I assumed s_min_extra_isize was meant to not be modifiable, and did not find a way to change it using the kernel or tune2fs, but now I can see that debugfs can set it.
If you want to pretend that file systems never get upgraded, then life is much simpler. The general approach is that for less-sophisticated customers (e.g., most people running enterprise distros) file system upgrades are not a thing. But for sophisticated users, we do try to make thing work for people who are aware of the risks / caveats / rough edges. Google won't have been able to upgrade thousands and thousands of servers in data centers all over the world if we limited ourselves to Red Hat's support restrictions. Backup / reformat / restore really isn't a practical rollout strategy for many exabytes of file systems.
It sounds like your safety checks / warnings are mostly targeted at low-information customers, no?
Yes, that seems like a reasonable compromise: just warn based on s_min_extra_isize, and assume that anyone who used debugfs to set s_min_extra_isize to a higher value from an ext3 file system during the migration to ext4 was aware of the risks already.
That leaves the question of what we should set the s_time_gran and s_time_max to on a superblock with s_min_extra_isize<16 and s_want_extra_isize>=16.
If we base it on s_min_extra_isize, we never try to set a timestamp later than 2038 and so will never fail, but anyone with a grandfathered s_min_extra_isize from ext3 won't be able to set extended timestamps on any files any more. Based on s_want_extra_isize we would keep the current behavior, but could add a custom warning in the ext4 code about the small s_min_extra_isize indicating a theoretical problem.
Arnd