On 02.01.20 20:47, Volodymyr Babchuk wrote:
Hello community,
I want to discuss the next big feature for virtualized OP-TEE. As you know, OP-TEE in virtualization mode currently does not support access to real hardware such as RPMB partitions or crypto accelerators, because all TEE instances are equal and there is no locking mechanism (among other problems).
I have written a small paper that discusses different approaches to this problem, taking RPMB as an example. You can find it here:
https://xen-troops.github.io/papers/optee-virt-rpmb.pdf
For ease of quoting and discussion, I'm posting its text below:
RPMB without virtualization
OP-TEE does not have direct access to the RPMB device, because it is part of the (e)MMC card and this card is used mostly by the REE. Fortunately, the RPMB specification employs HMAC to ensure that only trusted code can read and write the RPMB partition, so it is perfectly fine to communicate with the RPMB through the Normal World. This is how it happens:
OP-TEE -> Linux kernel -> Supplicant -> Linux kernel -> RPMB
The Linux kernel provides ioctls to communicate with the RPMB partition on the eMMC. The OP-TEE supplicant receives RPMB request RPCs from OP-TEE and uses these ioctls to access the RPMB partition.
It should be noted that during initialization OP-TEE reads the size of the RPMB partition and the write counter. At runtime it then maintains the write counter and returns an error if the expected write counter does not match the one returned by the RPMB. As OP-TEE is the only user of the RPMB, this is perfectly fine.
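For reference, here is a minimal C sketch of the standard RPMB data frame (as defined by the JEDEC eMMC specification) together with the write-counter check described above. The field layout matches the spec; the check itself is simplified for illustration.

#include <stdint.h>

/* RPMB data frame as defined by the eMMC (JEDEC) specification: 512 bytes,
 * all multi-byte fields big-endian on the wire. */
struct rpmb_data_frame {
	uint8_t  stuff_bytes[196];
	uint8_t  key_mac[32];       /* HMAC-SHA256 over the tail of the frame */
	uint8_t  data[256];
	uint8_t  nonce[16];
	uint32_t write_counter;
	uint16_t address;           /* in 256-byte half sectors */
	uint16_t block_count;
	uint16_t op_result;
	uint16_t msg_type;          /* request/response type */
};

/* Simplified version of the runtime check described above: OP-TEE keeps its
 * own copy of the write counter and rejects a response whose counter does
 * not match the expected value. */
static int check_write_counter(uint32_t expected, const struct rpmb_data_frame *rsp)
{
	uint32_t counter = __builtin_bswap32(rsp->write_counter); /* BE -> host */

	return (counter == expected) ? 0 : -1; /* -1: write counter out of sync */
}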
RPMB with virtualization
Sharing a single RPMB partition
Now suppose we want to share a single RPMB partition between multiple guests. The simplest solution is to provide each guest with its own RPMB-capable device, but in most cases the platform will have only one eMMC device, so all guests have to share it in some way. Also, this device will be physically available to only one guest; let's call it the "Driver Domain" or "DomD". Writes to the RPMB should be atomic, but the nature of the RPMB driver in OP-TEE requires multiple writes if the RPMB is not capable of writing a big buffer at once. This leads to a series of writes, with the write counter increasing for every write request, which in turn leads to de-synchronization of the write counter values stored in different TEE instances. It can also lead to a race condition when two or more TEE instances submit batches of write/read requests at the same time.
I'm sure I'm missing something here, but if the secure storage driver is part of the nexus, and as such handles all requests to the RPMB, then there should be no synchronization issues or races here?
Communication between OP-TEE instances
As other domains can't work with the eMMC directly, we will require a PV protocol to access a foreign RPMB partition. The problem is how to divide the RPMB partition between multiple guests. We want each OP-TEE instance to operate on the physical RPMB partition, but on the other hand we want to ensure maximum isolation between TEE instances. Clearly, we need one designated TEE instance that is capable of accessing the RPMB. This is the instance associated with DomD.
We also need some communication method between TEE instances. As OP-TEE is scheduled by the Normal World, we can't have direct communication between TEE instances, because that way one guest could lock up another guest.
So we need a secure way to exchange messages between TEE instances that is still scheduled by the NW. This involves inter-guest communication in the NW using mechanisms provided by the hypervisor. We are not required to send the actual data, only to signal to the other TEE instance that there is a request in some shared buffer. This is what is called a "doorbell". The flow looks like this:
- optee_domU writes request to shm
- optee_domU issues "doorbell" RPC to own supplicant (in domU)
- supplicant in domU uses standard Xen PV protocol to call frontend in DomD
- frontend in DomD calls pTA of optee_DomD
- optee_DomD reads request from shm (written in the first step)
- optee_DomD issues the RPMB request in the standard way
As you can see, the client TEE instance writes the request to a shared buffer inside secure memory. Then it issues an RPC request back to its own supplicant. The supplicant uses a hypervisor-provided mechanism to signal the frontend server in DomD. In the case of Xen, inter-domain events can be used; this is a common mechanism for implementing PV drivers in Xen. Virtio also provides similar signaling methods. The PV server in DomD opens a session to a pTA in the OP-TEE instance that belongs to DomD. This pTA accesses the previously written shared memory and reads the request from DomU. Then it can handle the request, i.e. issue the RPMB command. This provides a secure way to transmit requests from one TEE instance to another without adding inter-domain locking mechanisms. Obviously the same mechanism can be used not only for RPMB requests but for any inter-domain communication.
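To make the DomD side a bit more concrete, here is a sketch of what the PV server could do when the doorbell fires, using the standard GlobalPlatform TEE Client API. The pTA UUID and command ID are placeholders I made up; the actual interface of such a pTA would still have to be defined.

#include <err.h>
#include <stdint.h>
#include <string.h>
#include <tee_client_api.h>

/* Placeholder UUID and command ID for the hypothetical "forwarding" pTA in
 * the DomD OP-TEE instance; not an existing OP-TEE interface. */
#define FWD_PTA_UUID \
	{ 0x00000000, 0x0000, 0x0000, \
	  { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 } }
#define FWD_PTA_CMD_HANDLE_REQUEST	0

/* Called by the PV server in DomD when the doorbell from DomU fires: it asks
 * the DomD OP-TEE instance to pick up the request that the DomU OP-TEE
 * instance has already placed in the shared buffer in secure memory. */
static void handle_doorbell(void)
{
	TEEC_UUID uuid = FWD_PTA_UUID;
	TEEC_Context ctx;
	TEEC_Session sess;
	TEEC_Operation op;
	uint32_t origin;
	TEEC_Result res;

	res = TEEC_InitializeContext(NULL, &ctx);
	if (res != TEEC_SUCCESS)
		errx(1, "TEEC_InitializeContext: %#x", res);

	res = TEEC_OpenSession(&ctx, &sess, &uuid, TEEC_LOGIN_PUBLIC,
			       NULL, NULL, &origin);
	if (res != TEEC_SUCCESS)
		errx(1, "TEEC_OpenSession: %#x", res);

	/* No parameters: the request itself already lives in secure memory. */
	memset(&op, 0, sizeof(op));
	op.paramTypes = TEEC_PARAM_TYPES(TEEC_NONE, TEEC_NONE,
					 TEEC_NONE, TEEC_NONE);

	res = TEEC_InvokeCommand(&sess, FWD_PTA_CMD_HANDLE_REQUEST, &op, &origin);
	if (res != TEEC_SUCCESS)
		warnx("TEEC_InvokeCommand: %#x", res);

	TEEC_CloseSession(&sess);
	TEEC_FinalizeContext(&ctx);
}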
Partitioning RPMB Area
The mechanism above answers the question "how to access the RPMB from multiple domains". But there is another question: "how to share the space on the RPMB partition?". Currently OP-TEE has a static number of domains configured at compile time (the ``CFG_VIRT_GUEST_COUNT`` configuration option). So we can divide the RPMB partition into ``CFG_VIRT_GUEST_COUNT`` (equal) parts and grant every guest access to one of them. But how can we make sure that guest A will get access to the partition for guest A, not the one for guest B? We can't rely on the VM ID received from the hypervisor, at least because VMs can be restarted and will get a new ID each time. Also, IDs will vary depending on VM creation order. Clearly we need a persistent Virtual Machine ID (a GUID would be fine). But I can't see how a TEE instance can be sure that it communicates with the same VM on every boot, as the hypervisor could swap GUIDs between "good" and "rogue" VMs.
The behavior related to the VM ID looks hypervisor-specific to me. In practice, whether it's a GUID or a single digit makes no difference to OP-TEE, as long as the hypervisor guarantees that it won't reassign an identifier to a different guest. In fact, what is passed as a VM ID to OP-TEE does not even have to be what the hypervisor uses internally to enumerate its guests, and VM IDs don't need to be sequential either. As mentioned below, this is potentially a security issue, and I would add that it also affects the current implementation. So it's probably the mediator implementation that should guarantee that the VM IDs it passes are consistent with guests, using GUIDs or otherwise.
Actually, the same thing is possible with multiple CAs accessing the same TA, as there is no mechanism to authenticate a CA. This should be kept in mind while developing Trusted Applications.
So, as I see it, the RPMB partition should have a partition table somewhere. The very beginning of the partition should be fine.
This table should hold at least the following values (see the sketch after this list):
Magic value (might serve as the version number also)
Total count of the partitions
Table of partitions, each entry holding:
GUID of the VM
Offset of the partition in RPMB write blocks
Size of the partition in RPMB write blocks
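To make the layout concrete, one possible C representation of such a table is sketched below. The magic value, the field widths and the 16-byte GUID encoding are my assumptions, not part of the proposal.

#include <stdint.h>

#define RPMB_PT_MAGIC	0x5054424dU	/* arbitrary example value, doubles as version */

/* One entry per guest; offsets and sizes are expressed in RPMB write blocks
 * (256 bytes each on eMMC), as suggested above. */
struct rpmb_pt_entry {
	uint8_t  vm_guid[16];	/* persistent GUID of the VM */
	uint32_t offset_blocks;	/* start of the guest's area */
	uint32_t size_blocks;	/* size of the guest's area */
} __attribute__((packed));

/* Table stored at the very beginning of the RPMB partition. */
struct rpmb_pt_header {
	uint32_t magic;			/* magic value / version number */
	uint32_t num_partitions;	/* total count of the partitions */
	struct rpmb_pt_entry entries[];	/* num_partitions entries follow */
} __attribute__((packed));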
Extending TEE virtualization API
With the features above, we need to extend the virtualization-related APIs. The ``OPTEE_SMC_VM_CREATED`` call should be extended with the following information:
GUID of the virtual machine
Flag to indicate that this machine has access to the real RPMB partition
In the future the set of flags can be extended, for example to denote the ability to access hardware accelerators.
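As a rough illustration of what such an extension could look like at the SMC level, assuming the GUID is passed in four 32-bit register slots and the flags in a fifth; the register assignment and flag name below are purely hypothetical.

/*
 * Hypothetical extension of OPTEE_SMC_VM_CREATED (sketch only).
 * Today the call passes the VM/client ID in a1; everything below is a
 * proposed addition, not an existing ABI.
 *
 * a0		OPTEE_SMC_VM_CREATED
 * a1		VM/client ID (as today)
 * a2..a5	128-bit persistent GUID of the VM, 32 bits per register
 * a6		capability flags, e.g.:
 */
#define OPTEE_SMC_VM_CAP_RPMB	(1U << 0)	/* VM may access the real RPMB */
/* Room for future flags, e.g. access to crypto accelerators. */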
I would be wary of passing device configuration through registers, as device configuration complexity will eventually increase.
Some examples:
- Different configurations of a certain device type, e.g. a single RPMB device vs. multi-RPMB-capable UFS / NVMe.
- Configuration of other types of devices. Although RPMB and crypto engines are perhaps the highest priority, we see more and more drivers arriving in the TEE.
- Assignment of different devices to different guests. Guest A needs access to device X, Guest B needs access to device Y, and Guest C needs access to both X and Y.
These are all valid scenarios derived from actual use cases. To address them, device configuration needs to be more flexible. I am well aware that since OPTEE_SMC_VM_CREATED is a fast call, there is no SHM to pass these parameters, so another way would be needed.
One possible candidate for this could be the dts. In embedded environments where the guest layout is usually fixed, devices could be pre-configured in the dts. When it comes to more dynamic environments like the cloud, where guests come and go, one approach could be for the hypervisor or some other entity to be responsible for updating the dts before the new guest gets announced to OP-TEE.
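If the dts route were taken, OP-TEE could parse a per-guest node with something like the libfdt-based sketch below. The node path and the "guid" / "rpmb-access" property names are invented for illustration, not an agreed-upon binding.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <libfdt.h>

/*
 * Sketch: look up a hypothetical per-guest node such as
 *
 *	guests {
 *		guest@0 {
 *			guid = [ ...16 bytes... ];
 *			rpmb-access;
 *		};
 *	};
 *
 * and extract its GUID and RPMB capability.
 */
static int get_guest_config(const void *fdt, const char *node_path,
			    uint8_t guid[16], bool *rpmb_access)
{
	int node, len;
	const void *prop;

	node = fdt_path_offset(fdt, node_path);
	if (node < 0)
		return node;

	prop = fdt_getprop(fdt, node, "guid", &len);
	if (!prop || len != 16)
		return -FDT_ERR_NOTFOUND;
	memcpy(guid, prop, 16);

	/* Boolean property: present means the guest may use the real RPMB. */
	*rpmb_access = fdt_getprop(fdt, node, "rpmb-access", NULL) != NULL;

	return 0;
}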
In any case, I think that device configuration should be addressed in general before looking at specific devices.
We also need a pTA for RPMB partition table management, so that the hypervisor (or a trusted domain) can configure the table, e.g. assign GUIDs to table entries. It is debatable whether it should also be able to wipe out partitions to re-assign GUIDs later.
Another pTA, along with some shared memory mechanism, is needed to enable communication between TEE instances as described in the subsection "Communication between OP-TEE instances".
So, only one TEE instance should enable these two pTAs.
Virtual RPMB partition
There is a completely different approach to RPMB available. The next version of the virtio interface specification provides the virtio-rpmb interface, which can be used to "forward" RPMB requests to another VM. The same mechanism can be used to enable software-emulated RPMB partitions inside some "trusted" domain. This approach requires absolutely no changes to the OP-TEE code. We need to write a software emulator for RPMB, like the one used in the OP-TEE supplicant, but with a virtio-rpmb interface. Then every TEE instance will see its own virtual RPMB device and work with it as usual, provided that the kernel has virtio-rpmb frontend code, which will be implemented anyway as part of the virtio specification.
Actually, I believe that it is possible to use the hardware RPMB to ensure replay protection for the emulated RPMB devices. For example, we can store hashes of the virtual RPMB data in the hardware-backed RPMB device.
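Roughly, the trusted domain that emulates the virtio-rpmb devices could keep a digest of each virtual RPMB image anchored in the hardware RPMB and refuse to serve an image whose digest no longer matches. A sketch using OpenSSL's SHA256() is shown below; the store_digest_in_hw_rpmb() / load_digest_from_hw_rpmb() helpers are hypothetical callouts that would go through the normal hardware RPMB write/read flow.

#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <openssl/sha.h>

/* Hypothetical helpers: persist/fetch a per-image digest in the
 * hardware-backed RPMB (implemented elsewhere in the trusted domain). */
int store_digest_in_hw_rpmb(unsigned int slot, const uint8_t digest[SHA256_DIGEST_LENGTH]);
int load_digest_from_hw_rpmb(unsigned int slot, uint8_t digest[SHA256_DIGEST_LENGTH]);

/* Before exposing a virtual RPMB image to a guest, verify that its contents
 * still match the digest anchored in the hardware RPMB (rollback detection). */
static int verify_virtual_rpmb(unsigned int slot, const uint8_t *image, size_t len)
{
	uint8_t current[SHA256_DIGEST_LENGTH];
	uint8_t anchored[SHA256_DIGEST_LENGTH];

	SHA256(image, len, current);
	if (load_digest_from_hw_rpmb(slot, anchored))
		return -1;

	return memcmp(current, anchored, sizeof(current)) ? -1 : 0;
}

/* After every committed write to the virtual image, re-anchor its digest. */
static int commit_virtual_rpmb(unsigned int slot, const uint8_t *image, size_t len)
{
	uint8_t digest[SHA256_DIGEST_LENGTH];

	SHA256(image, len, digest);
	return store_digest_in_hw_rpmb(slot, digest);
}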
I believe there is work to be done in the OP-TEE driver in this case too: virtio-rpmb only provides a way to relay RPMB frames over the VIRTIO transport. Frames are delivered signed (HMAC-SHA256) by the OP-TEE driver, which is also the only party, other than the device itself, that knows the RPMB key. This means that no other entity can update these frames. So even if virtio-rpmb is used, the OP-TEE Secure Storage driver needs to be modified to become virtualization-aware, so that it takes care of what is described in the "Partitioning RPMB Area" section, i.e. be aware of the maximum number of guests, set the address somewhere within the area that corresponds to the current guest, generate additional frames to update the partition table, etc.
Multiple hardware RPMB devices
If the platform has more than one RPMB partition (for example, NVMe devices can support up to 8 independent RPMB partitions), then it is possible to assign each RPMB partition to a different VM. There are two possible ways.
Hardware partitioning
Some platforms allow access to a certain device to be "given" to a certain VM. For example, if a platform has 3 independent MMC controllers, every controller can be assigned to its own VM, which will then have exclusive access to that MMC. In this case no changes are needed in either OP-TEE or the Linux driver; it is a matter of hypervisor configuration.
Virtio-rpmb assisted partitioning
It is possible that the RPMB devices can't be exclusively assigned to VMs. For example, all RPMBs may belong to one NVMe drive, or the platform may have tightly bound MMC controllers, so there is no chance to assign them to different VMs.
In this case the virtio-rpmb protocol can be used. We can configure the hypervisor to provide each VM with its own physical RPMB partition via virtio-rpmb. This setup is similar to the one described in the section "Virtual RPMB partition", but with real HW instead of emulation.
Comparison of proposed approaches
The comparison is made in the form of a table. Please see it in the PDF version at https://xen-troops.github.io/papers/optee-virt-rpmb.pdf
Overall I would be in favor of a more generic rather than a bespoke, per-case approach. Sharing devices is a problem that has already been pestering the Normal World for a while, and virtio seems like a well tested solution to this problem. Moreover, in AArch64 the Secure World seems to be moving towards Secure Partitions. It would be nice to have drivers in different secure partitions that communicate with OP-TEE using VIRTIO over SPCI.
For RPMB, this would allow the driver implementation in the secure partition to handle requests while abstracting away the device capabilities, i.e. whether it supports one or multiple RPMB partitions, or whether it uses assistance from the Normal World or not.
When it comes to other devices, I'm sure that after RPMB the next device to support will be crypto engines. virtio-crypto has already solved the problem of how to share this type of device. People are already familiar with it, and it also provides a clean placeholder for OEMs to port their drivers to.
--
Michalis Pappas, Senior Software Engineer
OpenSynergy GmbH, Rotherstr. 20, 10245 Berlin
p: +49 (30) 60 98 54 0 - 0 f: +49 (30) 60 98 54 0 - 99
e: michalis.pappas@opensynergy.com
w: www.opensynergy.com