Hi Paul, Julien,
Volodymyr hasn't come back with an update to this patch, but I think it is good enough as-is as a bug fix and I would rather have it in its current form in 4.14 than not having it at all leaving the bug unfixed.
I think Julien agrees.
Paul, are you OK with this?
On Mon, 11 May 2020, Stefano Stabellini wrote:
On Mon, 11 May 2020, Andrew Cooper wrote:
On 11/05/2020 10:34, Julien Grall wrote:
Hi Volodymyr,
On 06/05/2020 02:44, Volodymyr Babchuk wrote:
Normal World can share buffer with OP-TEE for two reasons:
- Some client application wants to exchange data with TA
- OP-TEE asks for shared buffer for internal needs
The second case was handle more strictly than necessary:
- In RPC request OP-TEE asks for buffer
- NW allocates buffer and provides it via RPC response
- Xen pins pages and translates data
- Xen provides buffer to OP-TEE
- OP-TEE uses it
- OP-TEE sends request to free the buffer
- NW frees the buffer and sends the RPC response
- Xen unpins pages and forgets about the buffer
The problem is that Xen should forget about buffer in between stages 6 and 7. I.e. the right flow should be like this:
- OP-TEE sends request to free the buffer
- Xen unpins pages and forgets about the buffer
- NW frees the buffer and sends the RPC response
This is because OP-TEE internally frees the buffer before sending the "free SHM buffer" request. So we have no reason to hold reference for this buffer anymore. Moreover, in multiprocessor systems NW have time to reuse buffer cookie for another buffer. Xen complained about this and denied the new buffer registration. I have seen this issue while running tests on iMX SoC.
So, this patch basically corrects that behavior by freeing the buffer earlier, when handling RPC return from OP-TEE.
Signed-off-by: Volodymyr Babchuk volodymyr_babchuk@epam.com
xen/arch/arm/tee/optee.c | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c index 6a035355db..af19fc31f8 100644 --- a/xen/arch/arm/tee/optee.c +++ b/xen/arch/arm/tee/optee.c @@ -1099,6 +1099,26 @@ static int handle_rpc_return(struct optee_domain *ctx, if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_ALLOC ) call->rpc_buffer_type = shm_rpc->xen_arg->params[0].u.value.a; + /* + * OP-TEE signals that it frees the buffer that it requested + * before. This is the right for us to do the same. + */ + if ( shm_rpc->xen_arg->cmd == OPTEE_RPC_CMD_SHM_FREE ) + { + uint64_t cookie = shm_rpc->xen_arg->params[0].u.value.b;
+ free_optee_shm_buf(ctx, cookie);
+ /* + * This should never happen. We have a bug either in the + * OP-TEE or in the mediator. + */ + if ( call->rpc_data_cookie && call->rpc_data_cookie != cookie ) + gprintk(XENLOG_ERR, + "Saved RPC cookie does not corresponds to OP-TEE's (%"PRIx64" != %"PRIx64")\n",
s/corresponds/correspond/
+ call->rpc_data_cookie, cookie);
IIUC, if you free the wrong SHM buffer then your guest is likely to be running incorrectly afterwards. So shouldn't we crash the guest to avoid further issue?
No - crashing the guest prohibits testing of the interface, and/or the guest realising it screwed up and dumping enough state to usefully debug what is going on.
Furthermore, if userspace could trigger this path, we'd have to issue an XSA.
Crashing the guest is almost never the right thing to do, and definitely not appropriate for a bad parameter.
Maybe we want to close the OPTEE interface for the guest, instead of crashing the whole VM. I.e. freeing the OPTEE context for the domain (d->arch.tee)?
But I think the patch is good as it is honestly.