On Thu, Aug 15, 2019 at 01:09:11AM +1000, Monthero Ronald wrote:
The patch checks for a NULL buffer_head returned from page_buffers() and also places a check within the list traversal loop for the next buffer_head structs.
Crash scenario: block_invalidatepage_range() does not check the buffer_head returned from page_buffers(). The struct buffer_head * returned by page_buffers(page) was 0x0, although the page had its PG_private flag set and was expected to have buffer_head structs attached. The NULL buffer_head was then dereferenced in block_invalidatepage_range() at bh->b_size.
The stack frames were truncate_inode_page() => do_invalidatepage_range() => xfs_vm_invalidatepage() => [exception RIP: block_invalidatepage_range+132]
The inode being truncated in this case was valid, with inode.i_state = 0x20 (FREEING), and had a valid address space mapped to xfs. The struct page passed to block_invalidatepage_range() had its PG_private flag set, but page.private was 0x0, so page_buffers(page) returned 0x0, hence the crash. This patch performs a NULL pointer check on the returned buffer_head. Applies to 3.16 and later kernels.
... and adds BUG_ON() for that. The only real difference from an oops is that it's a bit easier to recognize. Which may or may not be a good debugging strategy, but what's the point of having it in -stable? Or anywhere other than the build on the boxen you are testing on...
It doesn't fix the underlying bug. It doesn't tell where the problem is. It's definitely *not* a way to fix any bugs. And while we are at it, the stuff in -stable ought to be backports from mainline.
Can you reproduce your crashes on mainline?
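
For anyone following along, the state described in the report (PG_private set, page.private == 0x0) can be modelled in userspace. This is only a rough sketch under stated assumptions: the mock_* types, MOCK_PG_PRIVATE and mock_walk_buffers() are hypothetical stand-ins, not the kernel's struct page, struct buffer_head or block_invalidatepage_range(). It shows where the walk over the circular b_this_page list would dereference NULL, and that a check at that point only reports the inconsistent page; it says nothing about what cleared page.private in the first place.

```c
#include <stddef.h>

/* Hypothetical userspace stand-ins; NOT the kernel definitions. */
#define MOCK_PG_PRIVATE (1UL << 0)

struct mock_buffer_head {
	size_t b_size;                         /* bytes covered by this buffer */
	struct mock_buffer_head *b_this_page;  /* circular list of the page's buffers */
};

struct mock_page {
	unsigned long flags;                   /* MOCK_PG_PRIVATE => buffers expected */
	struct mock_buffer_head *private;      /* head of the buffer ring, or NULL */
};

/*
 * Walk the buffer ring the way the invalidate loop does.
 * Returns the total bytes covered, 0 if the page has no buffers,
 * or -1 if the flag claims buffers but the ring is missing or broken.
 */
static long mock_walk_buffers(const struct mock_page *page)
{
	const struct mock_buffer_head *head, *bh;
	long total = 0;

	if (!(page->flags & MOCK_PG_PRIVATE))
		return 0;

	head = page->private;
	if (head == NULL)
		return -1;  /* without this check, bh->b_size below would fault */

	bh = head;
	do {
		total += (long)bh->b_size;
		bh = bh->b_this_page;
		if (bh == NULL)
			return -1;  /* ring not closed: same symptom, mid-traversal */
	} while (bh != head);

	return total;
}
```

A page with two 512-byte buffers linked into a ring yields 1024; a page with the flag set and private == NULL yields -1. Either way the check sees only the symptom.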