On Wed, Apr 02, 2025 at 02:36:37PM -0700, Dan Williams wrote:
Naveen N Rao wrote:
On Tue, Apr 01, 2025 at 10:07:18AM -0500, Tom Lendacky wrote:
On 4/1/25 02:57, Kirill Shutemov wrote:
On Mon, Mar 31, 2025 at 04:14:40PM -0700, Dan Williams wrote:
Nikolay reports [1] that accessing BIOS data (first 1MB of the physical address space) via /dev/mem results in an SEPT violation.
The cause is ioremap() (via xlate_dev_mem_ptr()) establishing an unencrypted mapping where the kernel had established an encrypted mapping previously.
Teach __ioremap_check_other() that this address space shall always be mapped as encrypted as historically it is memory resident data, not MMIO with side-effects.
I am not sure if all AMD platforms would survive that.
Tom?
I haven't tested this, yet, but with SME the BIOS is not encrypted, so that would need an unencrypted mapping.
Could you qualify your mapping with a TDX check? Or can you do something in the /dev/mem support to map appropriately?
I'm adding @Naveen since he is preparing a patch to prevent /dev/mem from accessing ROM areas under SNP as those can trigger #VC for a page that is mapped encrypted but has not been validated. He's looking at possibly adding something to x86_platform_ops that can be overridden. The application would get a bad return code vs an exception.
The thought with x86_platform_ops was that TDX may want to differ and setup separate ranges to deny access to. For SEV-SNP, we primarily want to disallow the video ROM range at this point. Something like the below.
If this is not something TDX wants, then we should be able to add a check for SNP in devmem_is_allowed() directly without the x86_platform_ops.
So I think there are 2 problems is a range consistently mapped by early init code + various ioremap callers, and for encrypted mappings is there potential unvalidated access that needs to be prevented outright.
The theoretical use case I have in mind is that userspace PCI drivers have no real reason to be blocked in a confidential VM. Most of the validation work to transition MMIO from shared to private is driven by userspace anyway so it is unfortunate that after the end of that conversion devmem and PCI-sysfs still block mappings.
However, there is no need to do pre-enabling for a theoretical use case. So I am ok if devmem_is_allowed() globally says no for TVMs and then see who screams with a practical problem that causes.
That makes sense. I have posted that patch with some changes: https://lore.kernel.org/all/20250403120228.2344377-1-naveen@kernel.org/T/#u
It should be trivial to add a change for Intel to block the first 1MB for TVMs.
Thanks, Naveen