On Sun 20-10-19 21:21:05, Theodore Y. Ts'o wrote:
On Fri, Oct 04, 2019 at 12:05:49AM +0200, Jan Kara wrote:
When ext4_mkdir() fails to add entry into directory, it ends up dropping freshly created inode under the running transaction and thus inode truncation happens under that transaction. That breaks assumptions that ext4_evict_inode() does not get called from a transaction context (although I'm not aware of any real issue) and is completely unnecessary. Just stop the transaction before dropping inode reference.
CC: stable@vger.kernel.org Signed-off-by: Jan Kara jack@suse.cz
If we call ext4_journal_stop(handle) before calling iput(inode), there's a chance that we could crash with the inode with i_link_counts == 0, but we won't have yet call ext4_evict_inode() to mark the inode as free in the inode bitmap. This would result in a inode leak.
Also, this isn't the only place where we can enter ext4_evict_inode() with an active handle; the same situation arise in ext4_add_nondir(), and for the same reason.
So I think the code is right as is. Do you agree?
Correct on both points. Thanks for spotting this! Now I still don't think that calling iput() with running transaction is good. It complicates matters with revoke record reservation but it is also fragile for other reasons - e.g. flush worker could find the allocated inode just before we will call iput() on it, try to write it out, block on starting transaction and we get a deadlock with inode_wait_for_writeback() inside evict(). Now inode *probably* won't be dirty yet by the time we get to ext4_add_nondir() or similar, that's why I say above it's just fragile, not an outright bug.
So I'd still prefer to do the iput() outside of transaction and we can protect from leaking the inode in case of crash by adding it to orphan list. I'll update the patch. Thanks for review!
Honza