Hi Volodymyr,
On 17/01/2019 15:21, Volodymyr Babchuk wrote:
Julien Grall writes:
Hi Volodymyr,
On 18/12/2018 21:11, Volodymyr Babchuk wrote:
From: Volodymyr Babchuk vlad.babchuk@gmail.com
The main way to communicate with OP-TEE is to issue standard SMCCC call. "Standard" is a SMCCC term and it means that call can be interrupted and OP-TEE can return control to NW before completing the call.
In contrast with fast calls, where arguments and return values are passed in registers, standard calls use shared memory. Register pair a1,a2 holds 64-bit PA of command buffer, where all arguments are stored and which is used to return data. OP-TEE internally copies contents of this buffer into own secure memory before accessing and validating any data in command buffer. This is done to make sure that NW will not change contents of the validated parameters.
Mediator needs to do the same for number of reasons:
- To make sure that guest will not change data after validation.
- To translate IPAs to PAs in the command buffer (this is not done in this patch).
- To hide translated address from guest, so it will not be able to do IPA->PA translation by misusing mediator.
During standard call OP-TEE can issue multiple "RPC returns", asking NW to do some work for OP-TEE. NW then issues special call OPTEE_SMC_CALL_RETURN_FROM_RPC to resume handling of the original call. Thus, mediator needs to maintain context for original standard call during multiple SMCCC calls.
Standard call is considered complete, when returned value is not a RPC request.
Signed-off-by: Volodymyr Babchuk vlad.babchuk@gmail.com
Changes from v2: - renamed struct domain_ctx to struct optee_domain - fixed coding style - Now I use access_guest_memory_by_ipa() instead of mappings to read command buffer - Added tracking for in flight calls, so guest can't resume the same call from two CPUs simultaniously
xen/arch/arm/tee/optee.c | 319 ++++++++++++++++++++++++++++++++++- xen/include/asm-arm/domain.h | 3 + 2 files changed, 320 insertions(+), 2 deletions(-)
diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c index 584241b03a..dc90e2ed8e 100644 --- a/xen/arch/arm/tee/optee.c +++ b/xen/arch/arm/tee/optee.c @@ -12,6 +12,8 @@ */ #include <xen/device_tree.h> +#include <xen/domain_page.h> +#include <xen/guest_access.h> #include <xen/sched.h> #include <asm/smccc.h> #include <asm/tee/tee.h> @@ -22,11 +24,38 @@ /* Client ID 0 is reserved for hypervisor itself */ #define OPTEE_CLIENT_ID(domain) (domain->domain_id + 1) +/*
- Maximal number of concurrent standard calls from one guest. This
- corresponds to OPTEE configuration option CFG_NUM_THREADS, because
- OP-TEE spawns a thread for every standard call.
Looking at OP-TEE, CFG_NUM_THREADS will vary depending on the platform. Is there any way to probe that number of threads from Xen?
Yes, this is per-platform configuration. Easiest way is to add Fast SMC to OP-TEE that will report this parameter. Jens, what do you think about adding additional call?
In any case, I think we should update the comment to reflect that this seems to be the maximum CFG_NUM_THREADS supported by any upstream platform.
Actually, OP-TEE protocol have possibility to handle limited number of threads correctly. OP-TEE can report that all threads are busy and client will wait for a free one. XEN can do the same, although this is not implemented in this patch series. But I can add this.
Could you expand by wait? Will it block in OP-TEE/Xen or does it return to the guest?
Basically this means that all can work correctly even with MAX_STD_CALLS==1. It just will be not so efficient.
Given the OS is not aware of the number of threads, The problem would be the same without Xen between. Am I right?
[...]
- /*
* Command buffer should start at page boundary.
* This is OP-TEE ABI requirement.
*/
- if ( call->guest_arg_ipa & (OPTEE_MSG_NONCONTIG_PAGE_SIZE - 1) )
return false;
- call->xen_arg = _xmalloc(OPTEE_MSG_NONCONTIG_PAGE_SIZE,
OPTEE_MSG_NONCONTIG_PAGE_SIZE);
- if ( !call->xen_arg )
return false;
- BUILD_BUG_ON(OPTEE_MSG_NONCONTIG_PAGE_SIZE > PAGE_SIZE);
As you use _xmalloc, you should not need this. This is only necessary if you use alloc_xenheap_page.
Yes, right. This is leftover from time when I mapped guest page into XEN. I'll move it down, to place where I map that page.
I am wondering whether it is wise to allocate the memory from xenheap and not domheap. While on Arm64 (for now) xenheap and domheap are the same, on Arm32 they are different. The xenheap is at most 1GB, so pretty limited.
Honestly, I have no opinion there. What are limitations of domheap in this case?
domheap pages may not always be mapped in Xen page-tables. So you have to call map_domain_page/unmap_domain_page at every use.
In practice, on Arm64, those operations are today a NOP because the memory is always mapped. On Arm32, domheap is never mapped so those operations will require to modify the page-tables.
There would be potentially ways to optimize the Arm32 case. So I think this is not a big concern as it would allow to account the memory to the domain and take advantage of potential new safety feature around domheap.
BTW, I am not asking to implement the accounting :). You can still allocate domheap memory without a domain assigned. I am only giving the advantages of using domheap over xenheap :).
- xen_addr = virt_to_maddr(call->xen_arg);
- set_user_reg(regs, 1, xen_addr >> 32);
- set_user_reg(regs, 2, xen_addr & 0xFFFFFFFF);
- return true;
+}
+static void copy_std_request_back(struct optee_domain *ctx,
struct cpu_user_regs *regs,
struct optee_std_call *call)
Can you add a comment on top of the function explaining what it does?
Yes, sure.
"Copy OP-TEE response back into guest's request buffer" will be sufficient?
See more below.
+{
- struct optee_msg_arg *guest_arg;
- struct page_info *page;
- unsigned int i;
- uint32_t attr;
- /* copy_std_request() validated IPA for us */
Not really, the guest is free to modify the stage-2 mapping on another vCPU while this is happening. I agree that the guest will shoot himself, but we at least need to not have weird behavior happening.
In that case, I would check that the type is p2m_ram_rw as you don't want to write in read-only or foreign mapping.
How I can do this atomically? I.e. what if guest will mangle p2m right after the check?
What you want to prevent is Xen writing to the wrong page. The guest should not play with page that are shared with an higher exception level.
get_page_from_gfn() takes a reference on the current page, that will be release by put_page(). Between that you are sure the page can not disappear under your feet.
Furthermore, AFAIK, there are no way for an Arm guest to modify the p2m type of a page once inserted. It can only remove or replace with a newly allocated page the mapping. If the guest instructs to - remove the page, as you have a reference that page will not disappear. - replace the page with a new one, then the guest will not be able to see the result. Tough luck, but it was not meant to do that :).
Also, as copy_std_request() and copy_std_request_back may not be called in the same "thread" it would be useful if you specify a bit more the interaction.
I not sure what do you mean there.
What I meant is the following can happen:
guest vCPU A | Xen Initiate call 1 copy_std_request() call OP-TEE -> Call "preempted" return to guest Resume call 1 resume call in OP-TEE copy_std_back_request()
AFAICT, the call could even resume from a different vCPU. It is not entirely trivial to understand this from just reading the code and the comment "copy_std_request() validated IPA for us" leads to think copy_std_request() was called right before. This may not be the case. So can you detail a bit more the interaction in the code?
- page = get_page_from_gfn(current->domain, paddr_to_pfn(call->guest_arg_ipa),
Please use gfn_x(gaddr_to_gfn(...)) to clarify this is a gfn. The gfn_x will be unnecessary soon with a cleanup that is currently under review.
So there are chances, that it will be removed when time will come to merge this patch series? :)
There is a chance to be merged. Even if it is wasn't, I would prefer to use gaddr_to_gfn(..) as your are dealing with a guest physical address. Ideally I would like paddr_to_pfn completely disappear in Arm code as the return is not typesafe (GFN vs MFN).
[...]
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 175de44927..88b48697bd 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -97,6 +97,9 @@ struct arch_domain struct vpl011 vpl011; #endif +#ifdef CONFIG_TEE
- void *tee;
+#endif
Did you look whether there are any hole in arch_domain that could be re-used?
I thought about that. But what are chances to find 64bit-wide, 64bit-aligned hole in a structure? If I remember C standard correctly, there are no reasons for compiler to leave such holes.
It depends on the alignment requested for each structure. Have a look at pahole to see the number (and size) of holes we have in some structures ;).
Anyway, I can't see anything promising in p2m_domain so far.
Cheers,