On Tue, 5 Jan 2021, Greg KH wrote:
I think it can be considered a bug fix.
Today, if you boot an SEV encrypted guest running 5.4 and it requires atomic DMA allocations, you'll get the "sleeping function called from invalid context" bugs. We see this in our Cloud because there is a reliance on atomic allocations through the DMA API by the NVMe driver. Likely nobody else has triggered this because they don't have such driver dependencies.
No previous kernel version worked properly since SEV guest support was introduced in 4.14.
So since this has never worked, it is not a regression that is being fixed, but rather, a "new feature". And because of that, if you want it to work properly, please use a new kernel that has all of these major changes in it.
Hmm, maybe :) AMD shipped guest support in 4.14 and host support in 4.16 for the SEV feature. It turns out that a subset of drivers (for Google, NVMe) would run into "scheduling while atomic" bugs because they do GFP_ATOMIC allocations through the DMA API, which uses set_memory_decrypted() for SEV, and that can block. I'd argue that's a bug in the SEV feature for a subset of configs.
So this never worked correctly for a subset of drivers until I added atomic DMA pools in 5.7, which was the preferred way of fixing it. But SEV as a feature works for everybody not using this subset of drivers. I wouldn't call the fix a "new feature": it's the means by which we provide unencrypted DMA memory to atomic allocators that, because of their context, can't make the transition from encrypted to unencrypted during allocation. It specifically addresses the bug.
What distro based on 5.4 that follows the upstream stable trees has not already included these patches in its releases? And what prevents them from using a newer kernel release entirely for this new feature their customers are requesting?
I'll defer to Peter, who would have a far better understanding of the base kernel versions that our customers use with SEV.
Thanks