On Thursday 14 January 2016 08:04:36 Dave Chinner wrote:
On Wed, Jan 13, 2016 at 08:33:16AM -0800, Deepa Dinamani wrote:
On Tue, Jan 12, 2016 at 07:29:57PM +1100, Dave Chinner wrote:
On Mon, Jan 11, 2016 at 09:42:36PM -0800, Deepa Dinamani wrote:
On Jan 11, 2016, at 04:33, Dave Chinner <david@fromorbit.com> wrote:
On Wed, Jan 06, 2016 at 09:35:59PM -0800, Deepa Dinamani wrote:
- How to achieve a seamless transition? Is the inode_timespec solution agreed upon to achieve 1a?
No. Just convert direct to timespec64.
The hard part here is how to split that change into logical patches per file system. We have already discussed all sorts of ways to do that, but there is no ideal solution, as you usually end up either having some really large patches, or you have to modify the same lines multiple times.
The most promising approaches are:
a) In Deepa's current patch set, some infrastructure is first introduced by changing the type from timespec to an identical inode_timespec, which lets us convert one file system at a time to inode_timespec and then change the type once they are all done. The downside is that all file systems have to be touched twice before we end up with timespec64 everywhere.
b) A variation of that which I would do is to have a smaller set of infrastructure first, so we can change one file system at a time to timespec64 while leaving the common structures to use timespec until all file systems are converted. The downside is the use of some conversion macros when accessing the times in the inode. When the common code is changed, those accessor macros get turned into trivial assignments that can be cleaned up later or changed in the same patch.
c) The opposite direction from b) is to first change the common code, but then any direct assignment between a timespec in a file system and the timespec64 in the inode/iattr/kstat/etc first needs a conversion helper so we can build cleanly, and then we do one file system at a time to remove them all again while changing the internal structures in the file system from timespec to timespec64.
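To make approach b) concrete, here is a minimal user-space sketch, assuming a mock inode that still holds a struct timespec while one file system has already been converted to timespec64. The macro names inode_get_mtime64()/inode_set_mtime64(), the struct inode_sketch and the myfs_* names are hypothetical; the two conversion helpers only roughly mirror what timespec_to_timespec64() and timespec64_to_timespec() do in the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* User-space stand-in for the kernel's struct timespec64 (illustrative). */
struct timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical inode while the common code still uses struct timespec. */
struct inode_sketch {
	struct timespec i_mtime;
};

/* Conversion helpers. The narrowing direction silently truncates tv_sec
 * where time_t is only 32 bits wide. */
static struct timespec64 ts_to_ts64(struct timespec ts)
{
	struct timespec64 r = { (int64_t)ts.tv_sec, ts.tv_nsec };
	return r;
}

static struct timespec ts64_to_ts(struct timespec64 ts64)
{
	struct timespec r;
	r.tv_sec = (time_t)ts64.tv_sec;
	r.tv_nsec = ts64.tv_nsec;
	return r;
}

/* Hypothetical accessors used by a file system already converted to
 * timespec64. Once the common inode switches to timespec64 they become
 * plain assignments:
 *   #define inode_get_mtime64(inode)     ((inode)->i_mtime)
 *   #define inode_set_mtime64(inode, ts) ((inode)->i_mtime = (ts))
 */
#define inode_get_mtime64(inode)     ts_to_ts64((inode)->i_mtime)
#define inode_set_mtime64(inode, ts) ((inode)->i_mtime = ts64_to_ts(ts))

/* Hypothetical on-disk inode of a file system storing 64-bit seconds. */
struct myfs_disk_inode {
	int64_t mtime_sec;
	uint32_t mtime_nsec;
};

static void myfs_read_inode(struct inode_sketch *inode,
			    const struct myfs_disk_inode *raw)
{
	struct timespec64 ts = { raw->mtime_sec, (long)raw->mtime_nsec };

	assert(raw->mtime_nsec < 1000000000u); /* on-disk value must be normalized */
	inode_set_mtime64(inode, ts);
}
```

The file system code only ever sees timespec64, so when the common structures change, only the two macro definitions need to be touched.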
An alternate approach is included in the cover letter.
3. policy for handling out of range timestamps: There was no conclusion on this from the previous series as noted in the cover letter.
a. sysadmin through sysctl (Arnd's suggestion)
b. have default vfs handlers with an option for individual fs to override.
c. clamp and ignore
I think it's a mix - if the timestamps come in from userspace, fail with ERANGE. That could be controlled by sysctl via the VFS part of the ->setattr operation, or in each of the individual FS implementations. If they come from the kernel (e.g. atime update) then the generic behaviour is to warn and continue; filesystems can otherwise select their own policy for kernel updates via ->update_time.
I'd prefer not to have it done by the individual file system implementation, so that we get consistent behavior. Normally you either care about correct time stamps, or you care about interoperability and you don't want to have errors returned here.
It could be done per mount, but that seems overly complicated for rather little to be gained.
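A minimal sketch of the split policy Dave describes, assuming a hypothetical per-file-system range (struct fs_limits) and a hypothetical origin flag: out-of-range times supplied by userspace fail with -ERANGE and are left untouched, while kernel-generated ones are clamped with a warning. With the single global sysctl Arnd prefers, the origin argument would instead be replaced by one system-wide switch. None of these names are existing kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Hypothetical per-file-system range of representable seconds. */
struct fs_limits {
	int64_t time_min;
	int64_t time_max;
};

/* Where the timestamp came from: a setattr-style request from userspace,
 * or an internal kernel update such as atime. */
enum ts_origin { TS_FROM_USER, TS_FROM_KERNEL };

/* Returns 0 or -ERANGE. Only kernel-originated values are modified. */
static int check_timestamp(const struct fs_limits *lim, struct ts64 *ts,
			   enum ts_origin origin)
{
	assert(lim->time_min <= lim->time_max);

	if (ts->tv_sec >= lim->time_min && ts->tv_sec <= lim->time_max)
		return 0;
	if (origin == TS_FROM_USER)
		return -ERANGE;

	/* Kernel update: warn and continue with the nearest valid second. */
	fprintf(stderr, "warning: timestamp %lld out of range, clamping\n",
		(long long)ts->tv_sec);
	ts->tv_sec = ts->tv_sec < lim->time_min ? lim->time_min : lim->time_max;
	ts->tv_nsec = 0;
	return 0;
}
```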
d. disable expired fs at compile time (Arnd's suggestion)
Not really an option, because it means we can't use filesystems that interop with other systems (e.g. cameras, etc) because they won't support y2038k timestamps for a long time, if ever (e.g. vfat).
Let me clarify what my idea is here: I want a global kernel option that disables all code that has known y2038 issues. If anyone tries to build an embedded system with support beyond 2038, that should disable all of those things, including file systems, drivers and system calls, so we can reasonably assume that everything that works today with that kernel build will keep working in the future and not break in random ways.
For a file system, this can be done in a number of ways:
* Most file systems today interpret the time as an unsigned 32-bit number (as opposed to signed, as ext3, xfs and a few others do), so as long as we use timespec64 in the syscalls, we are ok.
* Some legacy file systems (maybe hfs) can remain disabled, as nobody cares about them any more.
* If we still care about them (e.g. ext2), we can make them support only read-only mode. In ext4, this would mean forbidding write access to file systems that don't have the extended inode format enabled.
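The signed/unsigned distinction in the first point can be shown with a small sketch: the same 32-bit on-disk field decodes to different ranges. Signed covers roughly 1901-12-13 to 2038-01-19 03:14:07 UTC; unsigned covers 1970-01-01 to 2106-02-07 06:28:15 UTC but cannot represent dates before 1970. The helper names are hypothetical; which interpretation applies is fixed by each file system's on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a raw 32-bit seconds field as a signed value. */
static int64_t decode_signed32(uint32_t raw)
{
	/* Portable sign extension of a two's-complement 32-bit value. */
	return (raw & 0x80000000u) ? (int64_t)raw - 0x100000000LL : (int64_t)raw;
}

/* Decode the same field as an unsigned value. */
static int64_t decode_unsigned32(uint32_t raw)
{
	return (int64_t)raw;
}

/* Encoding produces the same bit pattern either way; callers must range
 * check first (clamp, or refuse the write on a read-only-only format). */
static uint32_t encode32(int64_t sec)
{
	assert(sec >= INT32_MIN && sec <= (int64_t)UINT32_MAX);
	return (uint32_t)sec;
}
```

For a value past 2038 such as 3000000000, the unsigned decoder round-trips it, while the signed decoder reads the same bits back as a date in 1928.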
Normal users that don't care about not breaking in 2038 obviously won't set the option, and have the same level of backwards compatibility support as today.
The problem really is that there is more than one way of updating these attributes (timestamps in this particular case). The side effect of this is that we don't always call timespec_trunc() before assigning timestamps, which can lead to inconsistencies between on-disk and in-memory inode timestamps.
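For reference, the granularity step that some update paths skip amounts to rounding tv_nsec down to the file system's s_time_gran. The sketch below is a user-space model of the kernel's timespec_trunc() behaviour applied to a hypothetical 64-bit struct ts64; the kernel WARNs on an illegal granularity, whereas this sketch asserts.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000L

struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Round tv_nsec down to a multiple of gran nanoseconds.
 * gran == 1: nanosecond timestamps, nothing to do.
 * gran == NSEC_PER_SEC: whole-second timestamps. */
static struct ts64 ts64_trunc(struct ts64 t, unsigned gran)
{
	assert(gran >= 1 && gran <= (unsigned)NSEC_PER_SEC);

	if (gran == (unsigned)NSEC_PER_SEC)
		t.tv_nsec = 0;
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % (long)gran;
	return t;
}
```

If one path stores the raw clock value (say 123456789 ns) in the inode while a file system with 1 s granularity writes 0 ns to disk, the timestamp visibly changes once the inode is evicted and read back.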
That's a problem that can be fixed independently of y2038 support. Indeed, we can be quite lazy about updating timestamps - by intent and design we usually have different timestamps in memory compared to on disk, which is one of the reasons why there are so many different ways to change and update timestamps....
This has nothing to do with lazy updates. This is about writing wrong granularities and non-clamped values to the in-memory inode.
Which really shouldn't happen because we should be clamping and/or truncating timestamps at the creation/entry point into the VFS/filesystem.
e.g. current_fs_time(sb) is how filesystems grab the current kernel time for timestamp updates. Add an equivalent current_fs_time64(sb) that returns a timespec64 and does clamping and limit warnings, and now you have a simple vehicle for converting the VFS and filesystems to support y2038k-clean date formats.
I think the current patch series does this already.
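A user-space sketch of what such a current_fs_time64(sb) could look like, assuming a mock superblock. s_time_gran mirrors the existing superblock field; s_time_min and s_time_max are hypothetical fields added here to carry each file system's representable range. The existing current_fs_time() roughly truncates the current kernel time to s_time_gran; this version also clamps to the range and warns, before truncating.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

struct ts64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Stand-in for struct super_block (s_time_min/s_time_max hypothetical). */
struct sb_sketch {
	unsigned s_time_gran;   /* granularity in ns, 1..NSEC_PER_SEC */
	int64_t s_time_min;     /* earliest representable second */
	int64_t s_time_max;     /* latest representable second */
};

/* Clamp to the file system's range with a warning, then truncate. */
static struct ts64 fs_time64_adjust(const struct sb_sketch *sb, struct ts64 t)
{
	assert(sb->s_time_gran >= 1 && sb->s_time_gran <= (unsigned)NSEC_PER_SEC);
	assert(sb->s_time_min <= sb->s_time_max);

	if (t.tv_sec < sb->s_time_min || t.tv_sec > sb->s_time_max) {
		fprintf(stderr, "timestamp %lld outside [%lld, %lld], clamping\n",
			(long long)t.tv_sec, (long long)sb->s_time_min,
			(long long)sb->s_time_max);
		t.tv_sec = t.tv_sec < sb->s_time_min ? sb->s_time_min : sb->s_time_max;
		t.tv_nsec = 0;
	}
	/* tv_nsec < NSEC_PER_SEC, so gran == NSEC_PER_SEC yields 0 here too. */
	t.tv_nsec -= t.tv_nsec % (long)sb->s_time_gran;
	return t;
}

/* Hypothetical current_fs_time64(): read the clock, then adjust. */
static struct ts64 current_fs_time64(const struct sb_sketch *sb)
{
	struct timespec now;
	struct ts64 t;

	clock_gettime(CLOCK_REALTIME, &now);
	t.tv_sec = (int64_t)now.tv_sec;
	t.tv_nsec = now.tv_nsec;
	return fs_time64_adjust(sb, t);
}
```

Keeping the adjustment separate from the clock read means the same clamp-and-truncate step could also be applied to timestamps arriving through ->setattr, which is the single entry-point check Dave argues for.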
If there are places where filesystems are receiving or using unchecked timestamps then those are bugs that need fixing. Those fixes need to be in patches separate from the y2038k support...
Fair enough, but that probably means that patch series will have to come first. This will also reduce the number of places in which a separate type conversion function needs to be added.
Arnd