On Thu 04-10-18 13:50:12, Lukas Czerner wrote:
On Thu, Oct 04, 2018 at 12:46:40PM +0200, Jan Kara wrote:
The code cleaning transaction's lists of checkpoint buffers has a bug where it increases bh refcount only after releasing journal->j_list_lock. Thus the following race is possible:
CPU0                                    CPU1
jbd2_log_do_checkpoint()
                                        jbd2_journal_try_to_free_buffers()
                                          __journal_try_to_free_buffer(bh)
  ...
  while (transaction->t_checkpoint_io_list)
  ...
    if (buffer_locked(bh)) {

<-- IO completes now, buffer gets unlocked -->

      spin_unlock(&journal->j_list_lock);
                                            spin_lock(&journal->j_list_lock);
                                            __jbd2_journal_remove_checkpoint(jh);
                                            spin_unlock(&journal->j_list_lock);
                                          try_to_free_buffers(page);
      get_bh(bh) <-- accesses freed bh
Fix the problem by grabbing bh reference before unlocking journal->j_list_lock.
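
For readers following along, the ordering described above can be modelled in plain userspace C. This is only a single-threaded sketch, not the jbd2 code: struct buf, list_lock(), checkpoint_wait(), other_cpu_try_to_free() and the race_hook callback are all hypothetical names made up for illustration, and the "other CPU" is simulated by a callback invoked at the point where the race window would open. It shows that a reference taken while the lock is still held keeps the buffer from being freed in that window.

```c
#include <assert.h>

/* Hypothetical stand-in for struct buffer_head (not the kernel type). */
struct buf {
    int refcount;           /* models bh->b_count */
    int locked;             /* models buffer_locked(bh) */
    int on_checkpoint_list; /* models membership in a checkpoint list */
    int freed;              /* set instead of really freeing, so it can be inspected */
};

/* Single-threaded model of journal->j_list_lock: it only checks pairing. */
static int list_lock_held;
void list_lock(void)   { assert(!list_lock_held); list_lock_held = 1; }
void list_unlock(void) { assert(list_lock_held);  list_lock_held = 0; }

/*
 * Models the CPU1 side: drop the buffer from the checkpoint list under
 * the lock, then try to free it. Freeing only succeeds when nobody
 * holds a reference and the buffer is not locked. Returns 1 if freed.
 */
int other_cpu_try_to_free(struct buf *b)
{
    list_lock();
    b->on_checkpoint_list = 0;
    list_unlock();
    if (b->refcount == 0 && !b->locked) {
        b->freed = 1;
        return 1;
    }
    return 0;
}

/* Callback standing in for whatever runs on the other CPU in the window. */
typedef void (*race_hook)(struct buf *);

/*
 * Models the CPU0 side with the fixed ordering: the reference is taken
 * BEFORE the lock is released, so the buffer is pinned during the window.
 */
void checkpoint_wait(struct buf *b, race_hook other_cpu)
{
    list_lock();
    if (b->locked) {
        b->refcount++;      /* models get_bh(), done under the lock */
        list_unlock();
        if (other_cpu)
            other_cpu(b);   /* race window: the other CPU may run here */
        /* waiting for IO on the buffer would happen here */
        b->refcount--;      /* drop our reference when done */
        return;
    }
    list_unlock();
}

/* The interleaving from the diagram: IO completes, then CPU1 tries to free. */
void io_completes_then_free(struct buf *b)
{
    b->locked = 0;
    other_cpu_try_to_free(b);
}
```

Calling checkpoint_wait(&b, io_completes_then_free) on a locked buffer that sits on the checkpoint list leaves b.freed at 0 in this model, because the reference is already held when the simulated CPU1 runs. Moving the refcount increment to after list_unlock() and the hook would reproduce the buggy ordering, where the free succeeds before the reference is taken.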
Hi Jan,
nice catch. The patch looks good, you can add
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Btw, do you by any chance have a reproducer for this?
No, syzbot hit it, but the race window is really small so I don't think you can create a reasonably reliable reproducer...
Honza