On Fri, Jun 2, 2017 at 10:18 PM, Arnd Bergmann arnd@arndb.de wrote:
On Fri, Jun 2, 2017 at 2:18 PM, Yan, Zheng ukernel@gmail.com wrote:
On Fri, Jun 2, 2017 at 7:33 PM, Arnd Bergmann arnd@arndb.de wrote:
On Fri, Jun 2, 2017 at 1:18 PM, Yan, Zheng ukernel@gmail.com wrote:
What I meant is another related problem in ceph_mkdir() where the i_ctime field of the parent inode is different between the persistent representation in the mds and the in-memory representation.
I don't see any problem in the mkdir case. The parent inode's i_ctime in the mds is set to r_stamp. When the client receives the request reply, it sets its in-memory inode's ctime to the same timestamp.
Ok, I see it now, thanks for the clarification. Most other file systems do this the other way round and update all fields in the in-memory inode structure first and then write that to persistent storage, so I was getting confused about the order of events here.
If I understand it all right, we have three different behaviors in ceph now, though the differences are very minor and probably don't ever matter:
- in setattr(), we update ctime in the in-memory inode first and then send the same time to the mds, and expect to set it again when the reply comes.
- in ceph_write_iter write() and mmap/page_mkwrite(), we first call file_update_time() to set i_mtime and i_ctime to the same timestamp once a write is observed by the fs, then take two other timestamps that we send to the mds, and update the in-memory inode a second time when the reply comes. ctime is never older than mtime here, as far as I can tell, but it may be newer if the timer interrupt happens between taking the two stamps.
We don't use a request to send i_mtime/i_ctime to the mds in this case. Instead, we use a cap flush message; i_mtime/i_ctime are encoded directly in it. When the mds receives the cap flush message, it writes i_mtime/i_ctime to persistent storage and sends a cap flush ack message to the client. (When the client receives the cap flush ack message, it does not update i_mtime/i_ctime.) So the issue you described does not arise.
- in all other calls, we only update the inode (and/or parent inode) after the reply arrives.
There are two cases.
1. The client updates the in-memory inode's ctime and sends the new ctime to the mds through a cap flush message.
2. The client sets the mds request's r_stamp and sends the request to the mds. The mds updates the relevant inodes' ctime and sends a reply to the client. The client updates the in-memory inodes' ctime according to the reply.
Regards
Yan, Zheng
Arnd
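
The two update paths described in the thread can be sketched as a small user-space model. This is only an illustration under stated assumptions: struct model_inode, struct model_mds, cap_flush_path() and request_path() are hypothetical names invented here, not the real ceph client or mds structures, and message passing is reduced to plain assignments. The point it tries to show is the ordering: in case 1 the client stamps first and the mds persists what it is sent, while in case 2 the mds applies r_stamp first and the client copies it from the reply.

```c
#include <time.h>

/* Hypothetical model types; NOT the real ceph structures. */
struct model_inode { struct timespec ctime; }; /* client in-memory inode */
struct model_mds   { struct timespec ctime; }; /* persistent copy on the mds */

/* Returns nonzero when both timestamps are identical. */
static int ts_equal(const struct timespec *a, const struct timespec *b)
{
	return a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec;
}

/*
 * Case 1 (assumed model of the cap flush path): the client updates its
 * in-memory ctime first, then the cap flush message carries that value
 * to the mds, which persists it. The ack does not change ctime again.
 */
static void cap_flush_path(struct model_inode *in, struct model_mds *mds,
			   struct timespec now)
{
	in->ctime = now;        /* client stamps the in-memory inode */
	mds->ctime = in->ctime; /* cap flush: mds stores what it was sent */
	/* cap flush ack: client leaves ctime untouched */
}

/*
 * Case 2 (assumed model of the request path): the client only fills in
 * r_stamp; the mds applies it, and the client copies the value from the
 * reply into its in-memory inode.
 */
static void request_path(struct model_inode *in, struct model_mds *mds,
			 struct timespec r_stamp)
{
	mds->ctime = r_stamp;   /* mds sets ctime to r_stamp */
	in->ctime = mds->ctime; /* reply: client updates in-memory ctime */
}
```

In both paths of this model the in-memory and persistent ctime end up equal; the paths differ only in which side is updated first.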