Hi Matthew,
On Thu, Aug 20, 2020 at 12:34:48PM +0100, Matthew Wilcox wrote:
> On Thu, Aug 20, 2020 at 12:53:23PM +0800, Gao Xiang wrote:
> > SWP_FS is used to make swap_{read,write}page() go through the filesystem, and it's only used for swap files over NFS. So, !SWP_FS means non NFS for now, it could be either file backed or device backed. Something similar goes with legacy SWP_FILE.
> > So in order to achieve the goal of the original patch, SWP_BLKDEV should be used instead.
> This is clearly confusing. I think we need to rename SWP_FS to SWP_FS_OPS.
> More generally, the swap code seems insane. I appreciate that it's an inherited design from over twenty-five years ago, and nobody wants to touch it, but it's crazy that it cares about how the filesystem has mapped file blocks to disk blocks. I understand that the filesystem has to know not to allocate memory in order to free memory, but this is already something filesystems have to understand. It's also useful for filesystems to know that this is data which has no meaning after a power cycle (so it doesn't need to be journalled or snapshotted or ...), but again, that's useful functionality which we could stand to present to userspace anyway.
> I suppose the tricky thing about it is that working on the swap code is not as sexy as working on a filesystem, and doing the swap code right is essentially writing a filesystem, so everybody who's capable already has something better to do.
Yeah, I agree with your point. After looking into the swap code a bit (swapfile.c and swap.c), I think it really needs to be cleaned up... But I lack the motivation to do it, since I couldn't guarantee not to introduce new regressions, and honestly I don't care much about this piece of code.
Maybe some new projects based on this could help clean that up as well. :)
Anyway, we really need a quick fix to avoid such FS corruption, which looks dangerous on the consumer side.
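To spell out the flag semantics discussed above, here is a minimal userspace sketch, not kernel code. Only the names SWP_FS and SWP_BLKDEV come from the kernel. The bit values and the two helper functions are made up for illustration, and the assumption is simply that a swap file on a local filesystem has neither flag set, as described in the quoted text.

```c
#include <stdbool.h>

/*
 * Illustrative bit values only (assumption); the real definitions
 * live in include/linux/swap.h.
 */
enum {
	SWP_BLKDEV = 1 << 0,	/* swap area is a block device */
	SWP_FS     = 1 << 1,	/* swap I/O goes through the fs (NFS only, for now) */
};

/* Hypothetical helper: the check that is easy to misread. */
static bool not_fs_ops(unsigned int flags)
{
	/* True for block devices AND for swap files on a local filesystem. */
	return !(flags & SWP_FS);
}

/* Hypothetical helper: the check that actually means "block device". */
static bool is_blkdev(unsigned int flags)
{
	return (flags & SWP_BLKDEV) != 0;
}
```

For a swap file on a local filesystem (flags == 0 in this sketch), not_fs_ops() returns true while is_blkdev() returns false, so only the SWP_BLKDEV test separates it from a real block device.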
> Anyway, Gao, please can you submit a follow-on patch to rename SWP_FS?
OK. Anyway, that is a separate matter and may need its own thread. I will find some time to send out a patch for further discussion later.
Thanks,
Gao Xiang