On Mon, Sep 20, 2021 at 02:36:38PM -0700, Bart Van Assche wrote:
> On 9/17/21 10:04 PM, Luis Chamberlain wrote:
> > A sketch of how this can happen follows:
> >
> > CPU A                              CPU B
> >                                    whatever_store()
> > module_unload
> >   mutex_lock(foo)
> >                                    mutex_lock(foo)
> >   del_gendisk(zram->disk);
> >     device_del()
> >       device_remove_groups()
> >
> > In this situation whatever_store() is waiting for the mutex foo to become unlocked, but that won't happen until module removal is complete. But module removal won't complete until the sysfs file being poked completes which is waiting for a lock already held.
>
> If I remember correctly I encountered the deadlock scenario described above for the first time about ten years ago while working on the SCST project. We solved this deadlock by removing the sysfs attributes from the module unload code before grabbing mutex_lock(foo), e.g. by calling sysfs_remove_file().
Well, the sysfs attributes in zram do tons of funky mucking around, so unfortunately no. It's not the only driver where this can happen. That is why I decided to work on a generic solution instead.
> This works because calling sysfs_remove_file() multiple times in a row is safe. Is that solution good enough for the zram driver?
The sysfs attributes are group attributes that are part of the block device, and so they are removed on behalf of the driver by del_gendisk(). So unfortunately no, this would not be a good solution in this case.
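
To make the ordering argument concrete, here is a rough userspace model of the two orderings discussed above. It is not kernel code and calls no kernel API: struct state, deadlocked(), lock_then_remove() and remove_then_lock() are hypothetical names made up for this illustration, and the model assumes only what the sketch states, namely that attribute removal waits for an in-flight store to return and that the store needs mutex foo. It treats a deadlock as a cycle in the wait-for graph between the two tasks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The two tasks from the sketch (hypothetical model, not kernel code). */
enum task { STORE, UNLOAD, NTASKS };
#define NOBODY (-1)

/* Snapshot of the system once both tasks are in flight. */
struct state {
	int  mutex_owner;           /* task holding mutex foo, or NOBODY */
	bool store_wants_mutex;     /* whatever_store() needs mutex foo */
	bool store_active;          /* whatever_store() has not returned yet */
	bool unload_removing_attrs; /* unload is waiting for the attrs to go away */
};

/* Fill waits_for[t] with the task that t is blocked on, or NOBODY. */
void build_waits(const struct state *s, int waits_for[NTASKS])
{
	waits_for[STORE] = NOBODY;
	waits_for[UNLOAD] = NOBODY;

	/* The store blocks on foo only if another task holds it. */
	if (s->store_wants_mutex && s->mutex_owner != NOBODY &&
	    s->mutex_owner != STORE)
		waits_for[STORE] = s->mutex_owner;

	/* Attribute removal blocks until the in-flight store returns. */
	if (s->unload_removing_attrs && s->store_active)
		waits_for[UNLOAD] = STORE;
}

/* A deadlock is a cycle in the wait-for graph. */
bool deadlocked(const struct state *s)
{
	int waits_for[NTASKS];

	assert(s != NULL);
	build_waits(s, waits_for);

	for (int start = 0; start < NTASKS; start++) {
		int t = start;

		for (int hops = 0; hops < NTASKS && t != NOBODY; hops++) {
			t = waits_for[t];
			if (t == start)
				return true;
		}
	}
	return false;
}

/* Ordering from the sketch: unload takes foo, then removes the attrs. */
struct state lock_then_remove(void)
{
	struct state s = {
		.mutex_owner = UNLOAD,
		.store_wants_mutex = true,
		.store_active = true,
		.unload_removing_attrs = true,
	};
	return s;
}

/* Suggested ordering: remove the attrs first, take foo afterwards. */
struct state remove_then_lock(void)
{
	struct state s = {
		.mutex_owner = STORE, /* foo was free, so the store got it */
		.store_wants_mutex = true,
		.store_active = true,
		.unload_removing_attrs = true,
	};
	return s;
}
```

In this model deadlocked() reports a cycle for lock_then_remove() and none for remove_then_lock(). The second ordering is only available when the driver can remove the attributes itself before taking the lock; with zram the removal happens inside del_gendisk(), so the model shows why the ordering matters, not a fix for zram.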
Luis