On Mon, 4 Jan 2021, Greg KH wrote:
> > The series of commits certainly expanded from my initial set that I asked about in a thread with the subject "DMA API stable backports for AMD SEV" on May 19. Turns out that switching how DMA memory is allocated based on various characteristics of the allocation and device is trickier than originally thought :) A number of fixes were needed for subtleties and corner cases that folks ran into, but they have been addressed and merged by Linus. I believe it's stable in upstream and that we've been thorough in compiling a full set of changes that are required for 5.4.
> > Note that without this series, all SEV-enabled guests will run into the "sleeping function called from invalid context" issue in the vmalloc layer that Peter cites when using certain drivers. For such configurations, there is no way to avoid the "BUG" messages in the guest kernel when using AMD SEV unless this series is merged into an LTS kernel that the distros will then pick up.
> > For my 13 patches in the 30 patch series, I fully stand by Peter's backports and rationale for merging into 5.4 LTS.
> Given that this "feature" has never worked in the 5.4 or older kernels, why should this be backported there? This isn't a bugfix from what I can tell, is it? And if so, what kernel version did work properly?
I think it can be considered a bug fix.
Today, if you boot an SEV-encrypted guest running 5.4 and it requires atomic DMA allocations, you'll get the "sleeping function called from invalid context" bugs. We see this in our Cloud because the NVMe driver relies on atomic allocations through the DMA API. Likely nobody else has triggered this because they don't have such driver dependencies.
No previous kernel version worked properly since SEV guest support was introduced in 4.14.
> And if someone really wants this new feature, why can't they just use a newer kernel release?
This is more of a product question that I'll defer to Peter and he can loop the necessary people in if required.
Since the SEV feature provides confidentiality for guest-managed memory, running an unmodified guest vs a guest modified by the cloud provider to avoid these bugs is a very different experience from the perspective of the customer trying to protect their data.
These customers are running standard distros that may be slow to upgrade to new kernels released over the past few months. We could certainly work with the distros to backport this support directly to them on a case-by-case basis. The thought, though, was to first attempt to fix this in 5.4 stable for everybody, so that they receive the fixes necessary for running a non-buggy SEV-encrypted guest that way, rather than having multiple distros each do the backport so they can run with SEV.