On Mon, Jan 04, 2021 at 02:37:00PM -0800, David Rientjes wrote:
On Mon, 4 Jan 2021, Greg KH wrote:
The series of commits certainly expanded from my initial set that I asked about in a thread with the subject "DMA API stable backports for AMD SEV" on May 19. It turns out that switching how DMA memory is allocated based on various characteristics of the allocation and device is trickier than originally thought :) A number of fixes were needed for subtleties and corner cases that folks ran into, but they were addressed and have been merged by Linus. I believe it's stable upstream and that we've been thorough in compiling the full set of changes required for 5.4.
Note that without this series, all SEV-enabled guests will run into the "sleeping function called from invalid context" issue in the vmalloc layer that Peter cites when using certain drivers. For such configurations, there is no way to avoid the "BUG" messages in the guest kernel when using AMD SEV unless this series is merged into an LTS kernel that the distros will then pick up.
For my 13 patches in the 30 patch series, I fully stand by Peter's backports and rationale for merge into 5.4 LTS.
Given that this "feature" has never worked in the 5.4 or older kernels, why should this be backported there? This isn't a bugfix from what I can tell, is it? And if so, what kernel version did work properly?
I think it can be considered a bug fix.
Today, if you boot an SEV-encrypted guest running 5.4 and it requires atomic DMA allocations, you'll get the "sleeping function called from invalid context" bugs. We see this in our cloud because the NVMe driver relies on atomic allocations through the DMA API. Likely nobody else has triggered this because they don't have such driver dependencies.
No previous kernel version worked properly since SEV guest support was introduced in 4.14.
So since this has never worked, it is not a regression that is being fixed, but rather, a "new feature". And because of that, if you want it to work properly, please use a new kernel that has all of these major changes in it.
And if someone really wants this new feature, why can't they just use a newer kernel release?
This is more of a product question that I'll defer to Peter and he can loop the necessary people in if required.
If you want to make a "product" of a new feature, using an old kernel base, then yes, you have to backport this and you are on your own here. That's just totally normal for all "products" that do not want to use the latest kernel release.
Since the SEV feature provides confidentiality for guest managed memory, running an unmodified guest vs a guest modified to avoid these bugs by the cloud provider is a very different experience from the perspective of the customer trying to protect their data.
These customers are running standard distros that may be slow to upgrade to the new kernels released over the past few months. We could certainly work with the distros to backport this support directly on a case-by-case basis, but the thought was to first attempt to fix this in 5.4 stable for everybody. That way, every distro receives the fixes necessary for running a non-buggy SEV-encrypted guest, rather than multiple distros each doing the backport themselves so they can run with SEV.
What distro based on 5.4 that follows the upstream stable trees has not already included these patches in its releases? And what prevents them from using a newer kernel release entirely for this new feature their customers are requesting?
thanks,
greg k-h